Cognitive Labs

Validity is everything in assessment. It means that an assessment does what it says: it measures what it is intended to measure and provides accurate, meaningful results. During assessment development, C3Be uses a rigorous process that promotes validity. Cognitive labs are powerful tools for finding and removing threats to an assessment's validity.
What is a cognitive lab?
Cognitive labs are a tool that can be used to investigate a variety of research problems. The main component of a cognitive lab is an activity called a “think-aloud.” In a think-aloud interview, participants are prompted to recount their complete thought process as they encounter and respond to an item (Ericsson & Simon, 1993). An interviewer facilitates the test-taker's verbal report, reminding them to think aloud and probing for reasoning without influencing or directing their thinking.
Uses for cognitive labs
Rooted in cognitive psychology, cognitive labs were originally developed to explore the thought processes behind human behavior. In assessment and measurement, they have been used to support the development and validation of knowledge and attitudinal assessments (Desimone & Le Floch, 2004; Howell et al., 2013). Today, cognitive labs play a crucial role in assessment validation, allowing a test developer to check whether an item is working as intended.
There are two main ways that cognitive labs support assessment validity:
- Provide information about whether the test-taker's response to an assessment task is aligned with the intended construct.
- Identify anything unclear or confusing that might interfere with the test-taker's ability to do the task.
In addition to supporting assessment validity, C3Be relies on cognitive labs as a tool to include users and stakeholders in the assessment development process. Aligned with our design thinking approach, cognitive labs serve as a pilot test, giving us valuable insights into how real test-takers will react to assessment tasks without the stakes associated with a real testing situation.
Refining Assessments Based on Feedback from Cognitive Labs
The following examples come from a recent project where C3Be designed assessments to measure readiness for careers in cybersecurity. The assessment was designed to be taken on a computer, and we conducted the cognitive labs remotely over Zoom. Each participant shared their screen while thinking aloud. This gave us a full sense of the user experience for each assessment task.
Example 1: More than one pattern
Problem identified
A test-taker provided a wrong answer, despite demonstrating evidence of the intended construct.
The question

Intended construct and answer
The item was designed so that test-takers would recognize patterns within the columns:
- First column: multiples of 3
- Second column: multiples of 4
- Third column: multiples of 5
- Correct answers: 3, 8, and 15
By recognizing and completing the pattern within columns, they would demonstrate analytical reasoning.
Insight from the cognitive lab
During the cognitive lab session, a participant provided the response 1, 9, and 13, which was considered incorrect. When asked why, the participant explained they had identified a different but logically valid pattern moving across the rows rather than down the columns:
- Rows 1, 2, and 3: add 3 to the first column's value to get the second column's value, then add 1 to that to get the third column's value
Their explanation demonstrated that they had recognized a pattern and applied analytical reasoning, exactly the skills the item aimed to measure, yet the original scoring key would have penalized the test-taker.
Solution
To eliminate the threat to validity, the item needed revision in one of two ways:
- Adjust the scoring to allow multiple correct answers, recognizing that there is more than one way to demonstrate the intended construct (a scoring sketch follows this list).
- Revise the item so that there is only one correct answer.
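One way to carry out the first option is to score a response against a set of acceptable answer patterns rather than a single key. The Python sketch below is purely illustrative, not the project's actual scoring code; the function and variable names are assumptions, and the only answer sets used are the two discussed above.

```python
# Illustrative sketch: accept any answer set that reflects a logically valid pattern.
# Hypothetical names; not the assessment's actual scoring engine.

ACCEPTABLE_KEYS = {
    (3, 8, 15),   # intended answers: completing the pattern within each column
    (1, 9, 13),   # alternative answers: the row-wise pattern surfaced in the cognitive lab
}

def score_item(response: tuple[int, int, int]) -> int:
    """Return 1 (full credit) if the response matches any acceptable key, else 0."""
    return 1 if tuple(response) in ACCEPTABLE_KEYS else 0

print(score_item((3, 8, 15)))  # 1
print(score_item((1, 9, 13)))  # 1 -- credited under the revised scoring
print(score_item((2, 7, 11)))  # 0
```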
Example 2: Dire consequences of language
Problem identified
Test-takers were distracted by extreme language in a response option, preventing them from choosing a correct answer.
The question
Intended construct and answer
This assessment item measured attention to detail by asking test-takers to examine an email for signs of a phishing attempt. One of the correct answers was (E): the email states that a deadline for renewal was missed.
Insight from the cognitive lab
Participants felt the phrase “dire consequences” was too extreme for a missed SiriusXM subscription renewal. Test-takers hesitated to select the correct answer because they doubted whether missing a subscription renewal truly qualified as a dire consequence. One participant even remarked, “A SiriusXM subscription is not dire.” The language unintentionally led test-takers away from a correct answer, threatening the validity of the item.
Solution
Revise the language. The phrase “dire consequences” was replaced with more neutral and realistic wording.
Conclusion
For any widely used assessment, the items should provide trustworthy information about a broad range of people. As a psychometrician, I’ve learned that test-takers often interpret assessment tasks in unexpected ways. Cognitive labs help us navigate this challenge by uncovering these variations early, ensuring assessments accurately reflect the intended constructs before they are widely implemented. At C3Be, we use cognitive labs to gather rich information about the test-taker's experience and uncover any issues before an assessment goes live. The resulting assessment tasks are more engaging and valid representations of the skills being measured.
References
Desimone, L. M., & Le Floch, K. C. (2004). Are we asking the right questions? Using cognitive interviews to improve surveys in education research. Educational Evaluation and Policy Analysis, 26(1), 1–22. https://doi.org/10.3102/01623737026001001
Ericsson, K. A., & Simon, H. A. (1993). Protocol analysis: Verbal reports as data. Cambridge, MA: MIT Press.
Howell, H., Phelps, G., Croft, A. J., Kirui, D., & Gitomer, D. (2013). Cognitive interviews as a tool for investigating the validity of content knowledge for teaching assessments (Research Report RR-13-19). Princeton, NJ: Educational Testing Service.