Adaptive testing and the LanguageCert Test of English

In the eyes of most teachers and students, a test is usually a linear operation: a set number of items that a student grinds through from the first to the last. This article describes the background to, and the functioning of, the adaptive (non-linear) version of the LanguageCert Test of English (LTE). An adaptive test adapts to a student’s responses, moving to a higher or lower level to home in on the student’s actual level of proficiency. After a brief overview of computer adaptive testing (CAT) and of how CATs function and operate, the article describes the LanguageCert adaptive test and outlines the algorithm used by the LTE CAT. It closes with a comment on what taking a CAT means for a student, and on how the LTE CAT compares with its pen-and-paper sister.

by Nigel Pike and David Coniam

The LanguageCert Test of English is a testing system built on a very large bank of calibrated test items. Two types of test may be drawn from this item bank. The first is the linear test, with a fixed number of items: one version reports results from CEFR A1 to B2, and a second, longer version reports results from A1 to C2. The second is the non-linear adaptive test, which measures from A1 to C2.

COMPUTER ADAPTIVE TESTING: BACKGROUND

The concept underpinning a computer adaptive test (CAT) is that each student takes a unique test directly tailored to their ability level. Rather than working through a linear test, students are offered items pitched as closely as possible to their own ability level (Wise & Kingsbury, 2000). CATs therefore offer a number of benefits (see Chalhoub-Deville & Deville, 1999), which are particularly useful with groups of students at differing levels of language ability:

A CAT quickly homes in on a student’s ability level.
A CAT requires fewer items than a typical pen-and-paper test to estimate a student’s ability, while still providing a high degree of precision.
A CAT offers each student a consistent, realistic challenge, as the student is not forced to answer items that are much too easy or too difficult for them.
The CAT algorithm enhances test security, because each student sits a different set of test items.

Over the past two decades, the development and use of CATs in language assessment has grown considerably, with adaptive tests developed to measure a range of language skills: listening (Coniam, 2006), reading (Chalhoub-Deville, 1999; Kaya-Carton et al., 1991) and vocabulary (Tseng, 2016).

CAT FUNCTIONING AND OPERATIONAL ISSUES

Developing an adaptive test requires a considerable amount of groundwork, including setting specifications, producing and vetting items, and pretesting and calibrating them, in order to build the eventual item bank. Weiss & Kingsbury (1984) outline some key issues integral to the construction of a CAT:

A calibrated item pool
The starting point, or entry level, of the first item offered to a student
The item selection algorithm by which additional items are offered to a student, and the scoring procedure
The ‘termination’ point, at which the student’s ability has been determined and no further items need to be taken.
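The four components above can be illustrated with a minimal simulation. This sketch is not the LTE algorithm, which the article does not specify; it assumes a Rasch (one-parameter) model for the calibrated pool, a fixed starting estimate, maximum-information item selection, expected a posteriori (EAP) scoring with a standard normal prior, and a fixed-length termination rule. All function names and parameter values (such as `n_items=20`) are illustrative assumptions.

```python
import math
import random


def p_correct(theta, b):
    """Rasch (1PL) probability that a student of ability theta answers an
    item of difficulty b correctly (both on the same logit scale)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def eap_estimate(responses, grid=None):
    """Expected a posteriori (EAP) ability estimate and posterior SD, using a
    standard normal prior evaluated on a grid of theta values.
    responses: list of (difficulty, score) pairs, score 1 = right, 0 = wrong."""
    if grid is None:
        grid = [x / 10 for x in range(-40, 41)]  # theta from -4.0 to 4.0
    weights = []
    for t in grid:
        w = math.exp(-t * t / 2)  # unnormalised N(0, 1) prior
        for b, u in responses:
            p = p_correct(t, b)
            w *= p if u == 1 else 1.0 - p  # likelihood of each observed score
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)


def next_item(theta, bank, used):
    """Maximum-information selection: under the Rasch model an item is most
    informative when its difficulty equals theta, so pick the unused item
    whose difficulty is closest to the current estimate."""
    candidates = [i for i in range(len(bank)) if i not in used]
    return min(candidates, key=lambda i: abs(bank[i] - theta))


def run_cat(bank, answer, start_theta=0.0, n_items=20):
    """Administer a fixed-length CAT. bank: list of calibrated difficulties;
    answer(b) returns the student's 0/1 score on an item of difficulty b."""
    theta, se = start_theta, float("inf")
    used, responses = set(), []
    for _ in range(n_items):
        i = next_item(theta, bank, used)
        used.add(i)
        responses.append((bank[i], answer(bank[i])))
        theta, se = eap_estimate(responses)  # re-score after every item
    return theta, se, responses


if __name__ == "__main__":
    random.seed(1)
    bank = [random.uniform(-3, 3) for _ in range(300)]  # hypothetical calibrated pool
    true_theta = 1.0  # simulated student's ability
    theta, se, log = run_cat(bank, lambda b: int(random.random() < p_correct(true_theta, b)))
    print(f"estimate {theta:.2f}, SE {se:.2f}, items {len(log)}")
```

Under the Rasch model a student has a 50% chance of answering an item pitched exactly at their ability, which is where the item is most informative; that is why selection here simply takes the nearest difficulty. Operational CATs usually layer content-coverage and item-exposure constraints on top of a rule like this.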

The success of a CAT program depends on the size of the item bank and on its items covering a wide range of item types, topics, testing points, etc. Other practical considerations that determine the appropriate size of a bank include how many levels are to be tested, the number of task types, and the number of items per task type. Together, these factors dictate how many items the adaptive test bank requires, how quickly (or slowly) a student converges on their ability level, and how wide a range around that ability level the items offered to a student may span.

CATs vary in their length and in the criteria used to select subsequent items. Some CATs are designed so that all students take a set number of items, say, 50. Others are set up so that students take as many items as the CAT algorithm needs to arrive at a ‘terminal’ score. In the first, fixed-length approach, the selection of each item, or set of items, changes according to the student’s previous scores. The second approach, in which students are presented with items one at a time, adapts more readily to student ability but generally results in tests of variable length, as the test keeps offering items at differing levels of difficulty until the terminal score is reached.
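The trade-off between the two approaches can be made concrete with a little arithmetic. Assuming a Rasch model, each item contributes p(1 - p) to the test information at the student’s ability, so a perfectly targeted item (p = 0.5) contributes 0.25, and the standard error after n such items is 1/√(0.25n) = 2/√n. A precision-based CAT aiming for a standard error of 0.3 would therefore need at least 45 perfectly targeted items (2/√44 ≈ 0.302, 2/√45 ≈ 0.298), and more when items are less well targeted, which is why such tests vary in length. The hypothetical stopping rules below contrast the two designs; the threshold values are assumptions, not figures from any operational test.

```python
import math


def p_correct(theta, b):
    """Rasch (1PL) probability of a correct answer."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def standard_error(theta, difficulties):
    """SE of a Rasch ability estimate: 1 / sqrt(test information), where each
    item contributes p * (1 - p) at the current ability estimate."""
    info = sum(p_correct(theta, b) * (1 - p_correct(theta, b)) for b in difficulties)
    return 1.0 / math.sqrt(info)


def should_stop(n_taken, se, rule="fixed", max_items=50, min_items=10, se_target=0.3):
    """Two hypothetical termination rules.
    'fixed': stop after a set number of items (e.g. 50), whatever the SE.
    'precision': variable length; stop once the SE falls to se_target
    (after a minimum number of items), or at max_items as a safety cap."""
    if rule == "fixed":
        return n_taken >= max_items
    if rule == "precision":
        return n_taken >= max_items or (n_taken >= min_items and se <= se_target)
    raise ValueError(f"unknown rule: {rule}")


if __name__ == "__main__":
    for n in (44, 45):
        se = standard_error(0.0, [0.0] * n)  # n perfectly targeted items
        print(n, round(se, 3), should_stop(n, se, rule="precision"))
    # 44 0.302 False
    # 45 0.298 True
```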

THE LANGUAGECERT CAT

The LTE adaptive test is a ‘level-agnostic’ test of listening and reading, in that students do not apply to take a test at a specific level; it reports results from CEFR A1 to C2. Development began in 2019 with an item bank of approximately 800 items, comprising a range of listening and reading items and testlets (mini-tasks of 2-5 connected items) that assess different listening and reading constructs. The adaptive test first went live, with a much larger bank, in April 2020. Items are continually being added to the item bank, and each adaptive test bank is drawn from this item pool.

The LanguageCert algorithm is set up so that all students are presented with approximately 58 items, normally 28 listening items and 30 reading items. The listening component has four sections, each with between five and eight items of different types targeting different constructs: for example, understanding detailed information; following the sequence of an exchange and providing contextually and functionally appropriate responses (pragmatic competence); understanding longer spoken texts; and, at the higher CEFR levels, appreciating speaker intention, making inferences and summarizing. The first item presented to a student is at approximately B1 level. The reading component has six sections, and students move between item types depending on their performance and predicted ability. These sections have different testing focuses, for example reading and understanding short notices, vocabulary use in context, lexico-grammatical awareness, and understanding longer texts. The longer texts tap into different reading sub-skills depending on the student’s level, from understanding factual information at lower levels to inference and understanding writer intention at higher levels.
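A fixed-length design with content constraints, as described above, might be organized around a blueprint of section quotas, with adaptive selection operating within each section. In the sketch below, the per-section item counts, the logit value used for the B1 starting point, and the simple Elo-style update rule are all hypothetical: the article gives only the totals and the number of sections, and does not describe how the LTE updates its ability estimate. For simplicity, sections are taken in a fixed order, which may not match how the LTE routes students between reading item types.

```python
import math
import random

# Hypothetical blueprint. The article gives only totals (28 listening items in
# four sections, 30 reading items in six sections); these per-section counts
# are invented for illustration.
BLUEPRINT = [("listening", 7)] * 4 + [("reading", 5)] * 6

B1_START = 0.0  # hypothetical logit standing in for "approximately B1"


def p_correct(theta, b):
    """Rasch (1PL) probability of a correct answer."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def update(theta, b, score, step=0.4):
    """Deliberately crude Elo-style update: move theta up after a surprising
    success and down after a surprising failure. A stand-in for a proper
    IRT estimator, not the LTE's method."""
    return theta + step * (score - p_correct(theta, b))


def pick(pool, theta, used):
    """Unused item in this section's pool whose difficulty is nearest theta."""
    return min((i for i in pool if i not in used), key=lambda i: abs(pool[i] - theta))


def administer(pools, answer, start=B1_START):
    """pools[s] maps item ids to difficulties for section s of BLUEPRINT;
    answer(item_id) returns 0 or 1. Sections follow blueprint order, but the
    item chosen within each section depends on the running estimate."""
    theta, used, log = start, set(), []
    for s, (component, quota) in enumerate(BLUEPRINT):
        for _ in range(quota):
            i = pick(pools[s], theta, used)
            used.add(i)
            score = answer(i)
            theta = update(theta, pools[s][i], score)
            log.append((component, s, i, score))
    return theta, log


if __name__ == "__main__":
    random.seed(0)
    # 21 hypothetical items per section, difficulties from -2.5 to 2.5 logits
    pools = {s: {f"s{s}-{j}": (j - 10) / 4 for j in range(21)} for s in range(len(BLUEPRINT))}
    difficulty = {i: b for pool in pools.values() for i, b in pool.items()}
    true_theta = 1.0  # simulated student's ability
    theta, log = administer(pools, lambda i: int(random.random() < p_correct(true_theta, difficulty[i])))
    print(round(theta, 2), len(log))
```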

Once a student has completed the 58 items, their level is determined as a specific point on the CEFR ability scale from A1 to C2. As of mid-2021, tens of thousands of students had already taken the LTE adaptive test. The LTE and the item bank have been calibrated together so that a student receives the same grade whether they take a paper-based test or the computer adaptive test.
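Reporting a CEFR level from the final ability estimate typically involves cut scores on the calibrated scale. The sketch below assumes hypothetical logit cut scores, chosen so that the B1 band is centred on 0; the actual LTE cut scores are not given in the article.

```python
import bisect

# Hypothetical cut scores on a logit scale: the lower bounds of A2, B1, B2,
# C1 and C2 respectively. These are NOT LanguageCert's values.
CUTS = [-1.5, -0.5, 0.5, 1.5, 2.5]
LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def cefr_level(theta):
    """Map a final ability estimate to a CEFR band. bisect_right counts how
    many cut scores are <= theta, so a score exactly on a cut falls into the
    higher band."""
    return LEVELS[bisect.bisect_right(CUTS, theta)]


if __name__ == "__main__":
    for t in (-2.0, 0.0, 1.2, 3.0):
        print(t, cefr_level(t))
    # -2.0 A1
    # 0.0 B1
    # 1.2 B2
    # 3.0 C2
```

Keeping the underlying estimate alongside the band preserves the "specific point" on the scale, so two students in the same band can still be distinguished.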

IN CLOSING

From a teacher’s perspective, the advantage of an adaptive test such as LanguageCert’s LTE is that it can be taken at any time, anywhere, by students of any level. Because each student receives a different test form, ‘cheating’ or sharing of test papers is much harder. In addition, results are available almost immediately.

The LanguageCert CAT takes about 50-60 minutes to complete. It is a largely seamless online experience, as the software and algorithm direct students through the test and present them with items. Each student receives an individual test, and all tests cover a range of functions and vocabulary/grammar points and tap into a wide range of reading and listening sub-skills.

The LanguageCert CAT is intended (as far as this is possible with a test!) to be comparatively test-taker-friendly. It uses accessible task types that students will be familiar with from coursebooks, classroom activities, etc.; no specific preparation is needed beyond understanding how the test works and what task types students will see. Preparation materials and mock tests are available on the LanguageCert website: https://www.languagecert.org/en/language-exams/english.

Finally, the LanguageCert CAT is not a standalone test. As mentioned above, the LanguageCert Test of English (LTE) was set up so that both linear pen-and-paper tests and online adaptive tests can be generated from a single item bank. The two types of test have been calibrated and validated so that a student will receive a similar result whichever version of the LTE they take.

REFERENCES

Chalhoub-Deville, M., & Deville, C. (1999). Computer adaptive testing in second language contexts. Annual Review of Applied Linguistics, 19, 273-299.

Chang, H. H. (2015). Psychometrics behind computerized adaptive testing. Psychometrika, 80(1), 1-20.

Coniam, D. (2006). Evaluating computer-based and paper-based versions of an English-language listening test. ReCALL, 18(2), 193-211.

Kaya-Carton, E., Carton, A. S., & Dandonoli, P. (1991). Developing a computer-adaptive test of French reading proficiency. In P. Dunkel (Ed.), Computer assisted language learning and testing: Research issues and practice. New York: Newbury House.

Rudner, L. M. (2009). Implementing the Graduate Management Admission Test computerized adaptive test. In Elements of adaptive testing (pp. 151-165). New York, NY: Springer.

Tseng, W. T. (2016). Measuring English vocabulary size via computerized adaptive testing. Computers & Education, 97, 69-85.

Weiss, D. J., & Kingsbury, G. G. (1984). Application of computerized adaptive testing to educational problems. Journal of Educational Measurement, 21(4), 361-375.

Wise, S. L., & Kingsbury, G. G. (2000). Practical issues in developing and maintaining a computerized adaptive testing program. Psicológica, 21, 135-155.