Test validity

In very simple terms, validity tells us whether a test works well for the purpose that we will use it for.

People often talk about a test being good or bad, or about whether it is fit for purpose. In technical language, they’re actually talking about validity. Test developers often claim that their test is valid, or that it’s been validated. But what is actually meant by the term valid? In fact, there are two camps here. One argues that validity is a feature of a test: the test does what it claims to do. The other claims that it is a feature of the decisions that are made based on test performance. Let’s think about those two positions for a moment. If validity is a feature of a test, then we only need to gather evidence that the test is measuring a particular trait or ability.

 Text by: Helen Papadopoulou

If validity is a feature of the decisions, then the scope of the information required to demonstrate validity is far broader. It should include evidence that the test is working appropriately within specific educational and social contexts, and with specific test takers.


Since the latter view is the more widely accepted, let’s stick with it. On this view, validity is an argument built around a body of evidence, which is expected to support any decisions that are made based on test performance.


Even before the birth of the testing industry early in the 20th century, people were beginning to think about the quality of tests. By the 1950s, the modern concept of validity had emerged.


At that time, the feeling was that particular types of evidence were required, including content (what the test contained) and criterion (how the test outcome compared with other measures of the same skill, for example from another test or from a teacher).


In the 1990s, it was widely agreed that a unitary approach was needed, with evidence from a variety of sources expected to contribute to a single validation argument. By then, the types of evidence expected had grown so complex that, although the approach became the theoretical norm in the 21st century, it has never been possible to apply it fully.


Around the turn of the century, the socio-cognitive approach emerged. This approach attempted to balance the social and cognitive aspects of language ability while making clear how the different types of evidence fitted together to form a fuller picture of validity. The approach asks that evidence be gathered from the beginning of the development process, focusing on the test taker, the test system, the questions and activities included in the test, as well as things such as timing and the scoring system, including all aspects of scoring and awarding a grade or mark.


Evidence should also come from test stakeholders. Stakeholders are those people affected by a test, and can include test takers, parents, teachers, education officials, policymakers, and others. Once all the evidence has been gathered, it must be put together to form a convincing argument to support test-related decisions.


The most important thing is that a logical and comprehensive set of evidence is presented in a way that all stakeholders can understand.


How do we know if a test works?

If we want to know whether a test ‘works’, we need to think about the test’s purpose.

Questions we need to ask about a test include:

  • Are the test tasks relevant to students’ real-life language needs? For example, if students need to write academic essays, is this type of writing included in the test?
  • Do the test tasks encourage students to think like they would in real-life communication? For example, if students need to be able to have interactive conversations, does the test require them to speak spontaneously, as they would in real life?
  • Does the test copy the social conditions in which the students will communicate? For example, if we want to assess students’ ability to talk with their peers, we might prefer to test them in pairs or groups, rather than have them interviewed by a teacher.
  • Do the test results give us useful information about students’ ability? For example, can a foreign university be confident that the test results show ability to use English for academic study?

Of course, no test is perfect. For practical reasons, we often need to make compromises, but we should aim to get as many of these things right as possible.

These things all contribute to the validity of a test. Validity is a difficult concept. However, when teachers create a classroom test, or choose a test for students to take, the most important questions are: Does this really test the language abilities that I want to know about? Is the content of the test relevant to what we have done in class? Is it age appropriate? Does it encourage further learning? Has it been constructed to cater for diverse groups of students and learning styles?

This is only a short list. You can add as many questions as you wish.