Measurement, assessment, and evaluation


  Measurement, assessment, and evaluation mean very different things, and yet most of my students are unable to adequately explain the differences. So, in keeping with the ADPRIMA approach to explaining things in as straightforward and meaningful a way as possible, here are what I think are useful descriptions of these three fundamental terms.

Measurement refers to the process by which the attributes or dimensions of some physical object are determined. One exception seems to be in the use of the word measure in determining the IQ of a person. The phrase "this test measures IQ" is commonly used, and measuring such things as attitudes or preferences also applies. However, when we measure, we generally use some standard instrument to determine how big, tall, heavy, voluminous, hot, cold, fast, or straight something actually is. Standard instruments include rulers, scales, thermometers, pressure gauges, and the like. We measure to obtain information about what is. Such information may or may not be useful, depending on the accuracy of the instruments we use and our skill at using them. Few instruments in the social sciences approach the validity and reliability of, say, a 12" ruler. We measure how big a classroom is in terms of square feet, we measure the temperature of the room by using a thermometer, and we use a multimeter to determine the voltage, amperage, and resistance in a circuit. In all of these examples, we are not assessing anything; we are simply collecting information relative to some established rule or standard. Assessment is therefore quite different from measurement, and has uses that suggest very different purposes. When used in a learning objective, the definition provided on the ADPRIMA site for the behavioral verb measure is: To apply a standard scale or measuring device to an object, series of objects, events, or conditions, according to practices accepted by those who are skilled in the use of the device or scale.

Assessment is a process by which information is obtained relative to some known objective or goal. We assess at the end of a lesson or unit. We assess progress at the end of a school year through testing, and we assess verbal and quantitative skills through such instruments as the SAT and GRE. Whether implicit or explicit, assessment is most usefully connected to some goal or objective for which the assessment is designed. An assessment is another way of saying a test; a test or assessment yields information relative to an objective or goal. In that sense, we test or assess to determine whether or not an objective or goal has been attained. Assessment of skill attainment is rather straightforward: either the skill exists at some acceptable level or it doesn't, and skills are readily demonstrable. Assessment of understanding is much more difficult and complex. Skills can be practiced; understandings cannot. We can assess a person's knowledge in a variety of ways, but there is always a leap, an inference that we make about what a person does in relation to what it signifies about what he knows. In the section on this site on behavioral verbs, to assess means: To stipulate the conditions by which the behavior specified in an objective may be ascertained. Such stipulations are usually in the form of written descriptions.

Evaluation is perhaps the most complex and least understood of the three terms. Inherent in the idea of evaluation is "value." When we evaluate, we are engaging in some process that is designed to provide information that will help us make a judgment about a given situation. Generally, any evaluation process requires information about the situation in question. A situation is an umbrella term that takes into account such ideas as objectives, goals, standards, procedures, and so on. When we evaluate, we are saying that the process will yield information regarding the worthiness, appropriateness, goodness, validity, legality, etc., of something for which a reliable measurement or assessment has been made. For example, I often tell my students that if they wanted to determine the temperature of the classroom, they would need to get a thermometer, take several readings at different spots, and perhaps average the readings. That is simple measuring. The average temperature tells us nothing about whether or not it is appropriate for learning. To determine that, students would have to be polled in some reliable and valid way. That polling process is what evaluation is all about. A classroom average temperature of 75 degrees is simply information. It is the context of the temperature for a particular purpose that provides the criteria for evaluation. A temperature of 75 degrees may not be very good for some students, while for others it is ideal for learning. We evaluate every day. Teachers, in particular, are constantly evaluating students, and such evaluations are usually done in the context of comparisons between what was intended (learning, progress, behavior) and what was obtained. When used in a learning objective, the definition provided on the ADPRIMA site for the behavioral verb evaluate is: To classify objects, situations, people, conditions, etc., according to defined criteria of quality. Indication of quality must be given in the defined criteria of each class category. Evaluation differs from general classification only in this respect.
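The thermometer example can be sketched in a few lines of code. The function names and the "comfort range" used as the evaluation criterion are illustrative assumptions, not part of the original example; the point is only that measurement yields a number, while evaluation judges that number against a criterion:

```python
def measure_average_temperature(readings):
    """Measurement: reduce several thermometer readings to one number."""
    return sum(readings) / len(readings)

def evaluate_for_learning(avg_temp, comfort_range=(68, 76)):
    """Evaluation: judge the measured value against a criterion (the 'value')."""
    low, high = comfort_range
    return "appropriate for learning" if low <= avg_temp <= high else "not appropriate"

readings = [74, 75, 76, 75]                  # readings taken at different spots
avg = measure_average_temperature(readings)  # 75.0 -- simply information
judgment = evaluate_for_learning(avg)        # the criterion supplies the judgment
```

Note that the same measured value of 75 degrees could be judged differently under a different comfort range: the criterion, not the number, does the evaluating.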

To sum up, we measure distance, we assess learning, and we evaluate results in terms of some set of criteria. These three terms are certainly connected, but it is useful to think of them as separate but connected ideas and processes.

Here is a great link that offers different ideas about these three terms, with well-written explanations. Unfortunately, most information on the Internet concerning this topic amounts to little more than advertisements for services.



 General Directorate of Measurement and Evaluation /Saudi Arabia




Assessment, measurement, research, and evaluation are part of the processes of science, and issues related to each topic often overlap. Assessment refers to the collection of data to describe or better understand an issue; measurement is the process of quantifying assessment data; research refers to the use of data for describing, predicting, and controlling as a means toward better understanding the phenomena under consideration; and evaluation refers to the comparison of data to a standard for the purpose of judging worth or quality. Assessment and/or measurement are done with respect to variables (phenomena that can take on more than one value or level). For example, the variable "gender" has the values or levels of male and female, and data could be collected relative to this variable. Data on variables are normally collected by one or more of four methods: paper/pencil, systematic observation, participant observation, and clinical. Three types of research studies are normally performed: descriptive, correlational, and experimental.

Collecting data (assessment), quantifying that data (measurement), making judgments (evaluation), and developing understanding about the data (research) always raise issues of reliability and validity. Reliability attempts to answer concerns about the consistency of the information (data) collected, while validity focuses on accuracy or truth. The relationship between reliability and validity can be confusing because measurements (e.g., scores on tests, recorded statements about classroom behavior) can be reliable (consistent) without being valid (accurate or true). However, the reverse is not true: measurements cannot be valid without being reliable.

The same statement applies to findings from research studies. Findings may be reliable (consistent across studies) without being valid (accurate or true statements about relationships among variables), but findings cannot be valid unless they are reliable. At a minimum, for an instrument to be reliable it must produce a consistent set of data each time it is used; for a research study to be reliable, it should produce consistent results each time it is performed.
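A small simulation can make the reliable-but-not-valid distinction concrete. The readings, thresholds, and function names below are illustrative assumptions; the point is only that a consistently miscalibrated instrument passes a consistency check while failing an accuracy check:

```python
import statistics

true_value = 75.0
biased_readings = [77.1, 76.9, 77.0, 77.0, 77.1]   # tight spread, wrong center
erratic_readings = [70.0, 80.0, 75.0, 72.0, 78.0]  # right center, wide spread

def is_reliable(readings, max_spread=0.5):
    """Consistency: do repeated measurements agree with each other?"""
    return statistics.stdev(readings) <= max_spread

def is_valid(readings, truth, tolerance=0.5, max_spread=0.5):
    """Accuracy presupposes consistency: a valid measure must first be reliable."""
    return (is_reliable(readings, max_spread)
            and abs(statistics.mean(readings) - truth) <= tolerance)

# The biased instrument is reliable but not valid; the erratic one is neither,
# because validity cannot exist without reliability.
```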


Bill Huitt, John Hummel, and Dan Kaeck
Department of Psychology, Counseling & Guidance
Valdosta State University









  Performance-Based Assessment





KNOWLEDGE (skills or attitudes) ASSESSMENT = a systematic examination procedure, carried out by testing, with the goal of establishing desired characteristics and gathering evidence about the level and quality of the acquired knowledge (skills or attitudes).

The required characteristics (of knowledge, skills or attitudes) are defined by instructional goals, so that the type of examination depends on educational goals. Goals clearly and concretely define what knowledge and which skills or attitudes a student should have at the end of the instructional process. Therefore, the selection of goals is the most important decision made by experts when planning a curriculum for a certain educational profile, and the teachers should follow it!
The content and range are chosen from three areas (see Bloom's Taxonomy): 
1) cognitive area of knowledge and understanding
    - knowledge is defined as a systematic overview of acquired and permanently memorized facts
    - cognitive knowledge is defined as knowledge related to mental ability or function.
2) affective area of attitudes
3) psycho-motor area of skills
Examination is an integral part of every instructional process!


The purposes of examination are:
1) grading the student's success at the end of the instructional and learning process
2) feedback (to both the student and professor) during the learning process
3) improvement of the quality of instruction.
Examination can be:
1) oral
2) written
3) practical
10 Golden Rules for Writing Multiple Choice Questions


         In a classical multiple choice question, a student should choose the correct answer from among several (optimally 5) offered answers.

 Multiple choice questions consist of three obligatory parts:


1. the question ("body of the question")
2. the correct answer ("the key of the question")
3. several incorrect alternatives (the so called "distracters")

and one optional part (especially valuable in self-assessment):
a comment on the student's answer.



    Writing a good exam question with multiple answers is a skill that usually comes with experience (often bitter :-) ). Feedback gathered through analysis of student answers ("item analysis") is very important for the authors of the test. There are several rules we can follow to improve the quality of this type of written examination. 
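The "item analysis" mentioned above typically includes two classical statistics: a difficulty index (the proportion of students who answered the item correctly) and a discrimination index (the difference in that proportion between the top- and bottom-scoring groups). The sketch below uses invented data and the common top/bottom-27% convention; the function names are illustrative:

```python
def difficulty_index(item_correct):
    """Proportion of all students who answered the item correctly (0..1)."""
    return sum(item_correct) / len(item_correct)

def discrimination_index(item_correct, total_scores, fraction=0.27):
    """Upper-group minus lower-group difficulty, using the top/bottom ~27%."""
    n = max(1, round(len(total_scores) * fraction))
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    lower, upper = order[:n], order[-n:]
    p_upper = sum(item_correct[i] for i in upper) / n
    p_lower = sum(item_correct[i] for i in lower) / n
    return p_upper - p_lower

# 1 = correct, 0 = incorrect on one item, for ten students:
item = [1, 1, 1, 0, 1, 0, 1, 0, 0, 1]
totals = [38, 35, 30, 12, 28, 10, 33, 15, 9, 31]  # each student's total score

p = difficulty_index(item)            # 0.6 -- a moderately difficult item
d = discrimination_index(item, totals)  # 1.0 -- strong scorers got it, weak did not
```

An item with a discrimination index near zero (or negative) is a candidate for rewriting, whatever its difficulty.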

1. Examine only the important facts!
Make sure that every question examines only the important knowledge. Avoid detailed questions - each question has to be relevant for the previously set instructional goals of the course.

2. Use simple language!
Use simple language, taking care of spelling and grammar. Spelling and grammar mistakes (unless you are testing spelling or grammar) only confuse students. Remember that you are examining knowledge about your subject and not language skills.

3. Make the questions brief and clear!
Clear the text of the body of the question from all superfluous words and irrelevant content. It helps students to understand exactly what is expected of them. It is desirable to formulate a question in such way that the main part of the text is in the body of the question, without being repeated in the answers.

4. Form the questions correctly!
Be careful that the formulation of the question does not (indirectly) hide the key to the correct answer. Students adept at solving tests will recognize it easily and find the right answer from word combinations, grammar, etc., rather than from real knowledge.

5. Take into consideration the independence of questions!
Be careful not to repeat content and terms related to the same theme, since the answer to one question can become the key to solving another.

6. Offer uniform answers!
All offered answers should be uniform, clear, and realistic. For example, an implausible option, or answers of noticeably uneven length, can point to the right answer; such a question does not test real knowledge. The position of the key should be random. If the answers are numbers, they should be listed in ascending order.

7. Avoid asking negative questions!
If you use negative questions, negation must be emphasized by using CAPITAL letters, e.g. "Which of the following IS NOT correct..." or "All of the following statements are true, EXCEPT...".

8. Avoid distracters in the form of "All the answers are correct" or "None of the answers is correct"!
Teachers use these statements most frequently when they run out of ideas for distracters. Students, knowing what is behind such questions, are rarely misled by them. Therefore, if you do use such statements, sometimes use them as the key answer. Furthermore, if a student recognizes that two answers (out of 5 options) are correct, they can conclude that the key is "all the answers are correct" without knowing the accuracy of the other distracters.

9. Distracters must be significantly different from the right answer (key)!
Distracters which only slightly differ from the key answer are bad distracters. Good or strong distracters are statements which themselves seem correct, but are not the correct answer to a particular question.

10. Offer an appropriate number of distracters!
The greater the number of distracters, the lower the possibility that a student could guess the right answer (key). In higher-education tests, questions with 5 answers (1 key + 4 distracters) are used most often, which means a student has a 20% chance of guessing the right answer.
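The arithmetic behind rule 10 is simple: with k offered answers, the chance of guessing one item is 1/k, and over an n-item test the expected score from pure guessing is n/k. A small sketch with illustrative numbers and function names:

```python
def guess_probability(num_answers):
    """Chance of guessing a single item with `num_answers` options."""
    return 1 / num_answers

def expected_guessed_items(num_items, num_answers):
    """Expected number of items answered correctly by pure guessing."""
    return num_items * guess_probability(num_answers)

# With 5 answers (1 key + 4 distracters), a student guesses 20% of items;
# on a 50-item test that is 10 items, on average, with no knowledge at all.
p = guess_probability(5)
expected = expected_guessed_items(50, 5)
```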




Writing and taking MCTs


saudienglish Forum


 Writing Multiple-Choice Test Items (For teachers)


Taking Multiple-Choice Tests (For students)




 Testing & Assessment

Educational Standards U.S. / by state
International Comparisons of Student Performance World Bank
Assessment Australia
Assessment Canada
Assessment Glossary questionmark
Skill-Level Descriptions englishschool.org.uk
Proficiency Guidelines ACTFL / gwu.edu/~slavic/actfl.htm



 Types of written questions


Selection type (one of the offered answers should be selected):
- Multiple choice question (MCQ): only one of the offered answers is correct
- Multiple response: more than one answer can be correct
- True / False: the question is a statement whose accuracy should be evaluated (only two options)
- Matching questions: notions written in two columns should be matched

Supply type (an answer should be written):
- Essay: the student writes a longer free-form answer (described in more detail below)
- Textual entry: no answers are offered; the student has to write the answer into an empty field
- Numerical entry: the student writes a numerical answer into an empty field
- Gap filling: the student fills in the words missing from the text


 Multiple choice question - MCQ


Advantages:
- the assessment/grading system is simple, precise and objective; it is especially convenient for computer-awarded points, which significantly reduces and simplifies the professor's grading work
- convenient for self-assessment, as well as for summative exams
- it is possible to examine different cognitive levels of knowledge according to Bloom's taxonomy of educational goals (acquiring facts, understanding, application of factual knowledge, analysis, synthesis, evaluation)
- quick and simple statistical analysis of the whole test, as well as of the difficulty of individual questions and discriminative ability of a question ("item analysis")
- MCQ tests examine the subject content more comprehensively than essay type questions.
- simple composition and analysis of MCQ tests enables frequent testing, so professors (and students) get regular feedback about students' success in mastering the matter
- in MCQ tests students don't have the possibility to skirt or simplify the theme (unlike in essay tests); the professor can determine the scope ("depth") of the exam in detail

Disadvantages:
- writing good quality questions is complex and time-consuming
- professors are prone to writing questions which examine only the memorizing of facts since that type of questions is the easiest to create
- these tests often don't measure real knowledge, since the right answer can be deduced by eliminating the incorrect ones
- a certain percentage of answers can be guessed
- it is very difficult to test creativity with this type of questions (it is tested best through essay type questions)

 Multiple response questions

- their advantage is easy automated result analysis 
- their downside is the usually low level of knowledge being tested
- and a greater possibility of guessing the right answers

 True / False questions

- subcategory of MCQ tests with only two choices - "true" or "false"
- a question is written in the form of a statement which needs to be evaluated as true or false
- this type of questions is suitable for short formative self-assessment which gives a student (or a professor) quick feedback on the level of acquired factual knowledge
- suitable for factual knowledge examination
- not suitable for examination of higher cognitive knowledge (see Bloom taxonomy)
- the statement should be written using different vocabulary from the one used in the textbook or instructional materials, so that simply memorizing the text will not be sufficient to select the right answer
- when writing the statement, avoid words that could suggest the answer; e.g. "never", "nothing", "always", "everything" tend to appear in false statements, while adverbs such as "usually", "generally", "sometimes" or "often" suggest true ones
- it is desirable to avoid negative statements
- it is recommended to have approximately the same number of true and false statements in a test
- the statistical probability of guessing the right answer without actual knowledge (at random) is 50%, so the minimum passing score should be at least 75% (with 5-option MCQs the minimum is 60%)
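The quoted minimums fit a simple pattern: each sits midway between the chance level and a perfect score. This midpoint reading is an observation about the numbers above, not a cited standard; a sketch:

```python
def chance_level(num_options):
    """Probability of a random correct answer with `num_options` choices."""
    return 1 / num_options

def minimum_passing_score(num_options):
    """Midpoint between the chance level and 100%."""
    return (chance_level(num_options) + 1) / 2

# True/False (2 options): chance level 50%, minimum passing score 75%
# MCQ (5 options):        chance level 20%, minimum passing score 60%
```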

 Matching list

- subcategory of MCQ where notions in two columns must be mutually matched
- these questions are suitable for both formative self-assessment through which students and professors get quick feedback about the level of acquired knowledge, and for summative exams
- it enables examination of factual knowledge, and adept question construction can enable examination of higher cognitive levels of knowledge such as causal connection (see Bloom's taxonomy)
- it is important that information in both columns are as homogenous as possible
- it should be made clear whether an answer may be used more than once (reusability of answers)
- it is recommended to have more possible answers than questions


     Text match

- constructed-response questions are formed similarly to MCQs, but without offered answers; instead, students have to write the answer into the free space provided
- unlike MCQ where students choose between offered answers (there is a possibility that they do not have the necessary knowledge, but choose the correct answer by elimination or randomly), with this type of questions the students must know the correct answer
- if the problem is numerical and the answer is a number, all variants which are considered correct should be anticipated (e.g. the number of decimal places)
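Anticipating every acceptable variant of a numerical answer can be done by normalizing and parsing the student's text rather than listing each string. A hedged sketch; the function name, comma-to-dot normalization, and tolerance are illustrative assumptions (a real grader would need to handle formats such as thousands separators as well):

```python
def is_correct_numeric(answer_text, correct_value, tolerance=1e-9):
    """Accept any textual variant that parses to (about) the correct number."""
    try:
        value = float(answer_text.strip().replace(",", "."))  # allow "0,5"
    except ValueError:
        return False  # not a number at all
    return abs(value - correct_value) <= tolerance

# All of these variants of one half are accepted:
results = [is_correct_numeric(s, 0.5) for s in ["0.5", ".5", "0.50", "0,5"]]
```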

    Fill in the blank

- advantage: it is easy to examine higher cognitive levels
- downside: harder automated analysis, since all the forms of the correct answer have to be predicted


        There are two types of essays:
- long, extended essay which is used to examine one sample of subject matter on a higher cognitive knowledge level (creativity)
- short, limited essay - several short essays can be used for a more superficial processing of a larger number of samples of subject matter
- it is the only examination method which can be used to assess the students' ability to explain a certain answer independently and in a written form
- higher cognitive levels of knowledge are examined (analysis, synthesis, evaluation)
- relatively simple development and formulation of questions
- time-consuming analysis and grading
- potential unreliability and subjectivity in grading
- questionable validity and reliability
- use essays only to examine the types of activities which cannot be successfully assessed through another type of examination
- select a task suitable for the instructional goals and representative of the subject matter
- guide the students with clear instructions (e.g. explain the mechanism through which A influences B; analyse relations between A and B)
 Item Analysis after the Test, and Before the Results are Published


What Does Research Say About Assessment?


Purposes of Assessment

Effects of Traditional Tests

Language Assessment

What It Measures and How 

Jill Kerper Mora, Ed.D.
San Diego State University



16 slides (graphics/text), including types of second-language learner errors



Characteristics of Good Assessment

Trends Stemming from the Behavioral to Cognitive Shift

Checklist for Excellence in Assessment And glossary



Thorndike and Hagen (1986) define measurement as "the process of quantifying observations [or descriptions] about a quality or attribute of a thing or person" (p. 5).

The process of measurement involves three steps:

  1. identifying and defining the quality or attribute that is to be measured;
  2. determining a set of operations by which the attribute may be made manifest and perceivable; and
  3. establishing a set of procedures or definitions for translating observations into quantitative statements of degree or amount. (p. 9)

Methods of data collection

Data are generally collected through one or more of the following methods:

    1. Paper/pencil--Collection of data through self-reports, interviews, questionnaires, tests or other instruments
    2. Systematic observation--Researcher looks for specific actions or activities, but is not involved in the actions being observed
    3. Participant observation--Researcher is actively involved in the process being described and writes up observations at a later time
    4. Clinical--Data are collected by specialists in the process of treatment


  • Thorndike, R., & Hagen, E. (1986). Measurement and evaluation in psychology and education (4th ed.). New York: Wiley.


Evaluation includes the process of making judgments about the value of data collected through observations and descriptions. It is closely related to the concept of assessment, which is defined as "the process of collecting, interpreting, and synthesizing information in order to make decisions" (Gage & Berliner, 1991, p. 568). It is generally agreed that it is better to base judging and decision making on quantitative data as much as possible.

There are a variety of issues related to measurement and evaluation that are relevant to classroom and school settings.



  More links !





In general, a rubric is a scoring guide used in subjective assessments. A rubric implies that a rule defining the criteria of an assessment system is followed in evaluation. It can be an explicit description of performance characteristics corresponding to points on a rating scale; a scoring rubric makes explicit the expected qualities of performance at each point on the scale.

Rubrics are explicit schemes for classifying products or behaviors into categories that vary along a continuum. They can be used to classify virtually any product or behavior, such as essays, research reports, portfolios, works of art, recitals, oral presentations, performances, and group activities. Judgments can be self-assessments by students; or judgments can be made by others, such as faculty, other students, or field-work supervisors. Rubrics can be used to provide formative feedback to students, to grade students, and/or to assess programs.
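A rubric of this kind can be represented as an explicit data structure: named criteria, each with descriptions along a continuum of levels. A minimal sketch with invented criteria, level descriptions, and function names:

```python
# Each criterion maps to level descriptions, ordered from lowest to highest.
RUBRIC = {
    "organization": ["no discernible structure",     # level 1
                     "some structure, weak flow",    # level 2
                     "clear, logical structure"],    # level 3
    "evidence":     ["claims unsupported",
                     "some claims supported",
                     "all claims well supported"],
}

def score_essay(ratings):
    """Sum the level chosen for each criterion (criterion -> level, 1-based)."""
    for criterion, level in ratings.items():
        assert 1 <= level <= len(RUBRIC[criterion]), f"bad level for {criterion}"
    return sum(ratings.values())

total = score_essay({"organization": 3, "evidence": 2})  # 5 of a possible 6
```

Because every level carries a written description, two raters applying this structure are judging against the same criteria rather than against each other's impressions, which is the criterion-referenced property discussed below.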

Rubrics have many strengths:

  • Complex products or behaviors can be examined efficiently.
  • Developing a rubric helps to precisely define faculty expectations.
  • Well-trained reviewers apply the same criteria and standards, so rubrics are useful for assessments involving multiple reviewers.
  • Summaries of results can reveal patterns of student strengths and areas of concern.
  • Rubrics are criterion-referenced, rather than norm-referenced. Raters ask, "Did the student meet the criteria for level 5 of the rubric?" rather than "How well did this student do compared to other students?" This is more compatible with cooperative and collaborative learning environments than competitive grading schemes and is essential when using rubrics for program assessment because you want to learn how well students have met your standards.
  • Ratings can be done by students to assess their own work, or they can be done by others, such as peers, fieldwork supervisors, or faculty.


"Rubrics" allow teachers to be more objective in scoring/grading complex student performances. Moreover, they help students understand more clearly just what is expected of them in an assignment or activity. Students and teachers can compose rubrics together, and revise them according to actual performance. They give a reference point and language for raising expectations and achievement.


rubrics links

rubrics advantages here

 Language Arts Rubrics links



Rubrics for Teachers

This Week's Featured Rubric

Elementary School Rubric

Descriptive Writing Rubric

Teacher Self-Evaluation

Science Lab Report Rubrics

Generic Listening Rubric

Middle School Phys. Ed. Rubrics

Blank Rubric Templates

                               Rubrics by Term


Understanding Rubrics


Teachers &Self-Assessment


Class Participation Rubrics


Organization Rubrics


Presentation Rubrics


Behavior Rubrics






HCC Assessment Website Information on HCC assessment endeavors.
Classroom Assessment Techniques Techniques for better teaching and learning.
Classroom Assessment Examples Five examples from the previous article.
Quizzes, Tests, and Exams Types, Bloom bases, guidelines, construction.
Assessment is More than Keeping Score Moving from inquiry, through interpretation, to action.
Test Item Bias Review When decisions are made based on test scores, it is critical to avoid bias.
The Knowledge Survey A Tool for All Reasons.
Portfolio Assessment Using a collection of student work representing a selection of performance.
Student Passports A formal document presenting student mastery of skills.
A Mid-Semester Survey Use this simple survey to get feedback from your students.