2019 MAP® Growth™ Technical Report Page 96
2016; Wolf et al., 1995). Research has demonstrated that the structure of item response time
distributions allows examinee behavior to be classified as either rapid-guessing or solution
behavior (Wise & Kong, 2005) and aggregated into a composite measure of a test-taker's
engagement during a test event (Wise, 2006).
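The classification described above can be sketched in a few lines of code. This is a minimal illustration, not NWEA's implementation: it assumes a fixed response-time threshold per item, and the function names, thresholds, and data are invented for the example.

```python
# Sketch of response-time-based engagement scoring (cf. Wise & Kong, 2005;
# Wise, 2006). A response is classified as a rapid guess when its response
# time falls below an item-specific threshold; the test-level engagement
# index is the proportion of responses showing solution behavior.
# All names and values here are illustrative, not from the report.

def classify_response(response_time, threshold):
    """Return 'rapid-guess' or 'solution' for one item response."""
    return "rapid-guess" if response_time < threshold else "solution"

def response_time_effort(response_times, thresholds):
    """Proportion of responses classified as solution behavior (0.0 to 1.0)."""
    labels = [classify_response(rt, th)
              for rt, th in zip(response_times, thresholds)]
    return sum(label == "solution" for label in labels) / len(labels)

# Illustrative data: response times (seconds) and per-item thresholds.
times = [2.1, 14.0, 30.5, 1.2, 22.8]
cutoffs = [3.0, 5.0, 5.0, 3.0, 5.0]
print(response_time_effort(times, cutoffs))  # 0.6 (3 of 5 solution behaviors)
```

In practice, thresholds are derived from the shape of each item's response-time distribution rather than fixed by hand, but the aggregation step is the same.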
A lack of student motivation has been shown to reduce mean scores by more than half a
standard deviation (Wise & DeMars, 2005). Strategies for reducing this effect on a student's
score include statistical score adjustments (Wang & Xu, 2015; Wise & DeMars, 2006) and effort
monitoring. Score adjustments are applied after a test event has concluded, whereas effort
monitoring occurs during testing, intervening with messages to the student or prompting a
proctor to encourage test-taking engagement. Messages to disengaged students have been shown to
positively affect student engagement and overall test performance (Kong, Wise, Harmes, &
Yang, 2006; Wise, Bhola, & Yang, 2006). Research with MAP Growth has also shown that
proctor notification improves test-taking engagement, test performance, and convergent validity
evidence (Wise, Kuhfeld, & Soland, in press).
NWEA provides engagement information on score reports and employs multiple strategies for
enhancing engagement, including student messages, test pauses, and proctor notification. The
work of Wise, Kuhfeld, and Soland (in press) demonstrates the benefit of these strategies.
8.3.2. Differential Item Functioning (DIF)
A fundamental assumption in the Rasch model is that the probability of a correct response to a
test item is a function of the item’s difficulty and the student’s ability. This function is expected to
remain invariant to other person characteristics such as gender and ethnicity. Therefore, if two
students with the same ability respond to the same item, they are assumed to have an equal
probability of answering the item correctly. To test this assumption, responses to items by
students at one level of a person characteristic (e.g., gender) are compared with responses
to the same items by students at a different level of that characteristic (e.g., males vs.
females). The group representing students in a specific demographic group (usually a minority
group) is referred to as the focal group. The group composed of students from outside this
group is referred to as the reference group.
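The invariance assumption described above can be stated explicitly. For the standard dichotomous Rasch model (the notation below is conventional, not taken from this report), the probability of a correct response is

```latex
P(X_{ij} = 1 \mid \theta_j, b_i) = \frac{\exp(\theta_j - b_i)}{1 + \exp(\theta_j - b_i)}
```

where \(\theta_j\) is the ability of student \(j\) and \(b_i\) is the difficulty of item \(i\). Invariance means this probability depends only on \(\theta_j\) and \(b_i\): no person characteristic such as gender or ethnicity appears in the model, so two students with equal ability must have equal probabilities of success on the same item.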
When students with the same ability from two different groups of interest have different
probabilities of correctly answering an item, the item is said to exhibit DIF, a statistical
characteristic of an item that shows the extent to which the item might be measuring different
ability for different student subgroups. DIF indicates a violation of a major assumption of the
Rasch model and signals a potential lack of fairness at the item level. The presence of DIF
in an item suggests that the item is functioning differently for the groups included in the
comparison. The cause of this unexpected functioning is not revealed by a DIF analysis. It may be
that item content is inadvertently providing an advantage or disadvantage to members of one of
the two groups. Content experts who have special knowledge of the groups involved are often in a
good position to identify a cause of this type. DIF may also result from differential instruction
closely associated with group membership.
The Mantel-Haenszel (MH) procedure (Mantel & Haenszel, 1959) is the most cited and studied
method for detecting DIF. It stratifies examinees by a composite test score, compares the item
performance of reference and focal group members within each stratum, and then pools these
comparisons over all strata. The MH procedure is easy to implement and is featured in most
statistical software.
NWEA applied the MH method to assess DIF of the MAP Growth item pool in this report.
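The stratify-compare-pool logic of the MH procedure can be sketched as follows. This is a minimal illustration of the standard MH common odds ratio, not NWEA's implementation; the 2x2 tables below are invented for the example.

```python
import math

# Sketch of the Mantel-Haenszel common odds ratio for one item.
# strata is a list of 2x2 tables, one per score stratum:
#   (ref_correct, ref_incorrect, focal_correct, focal_incorrect)
# All data below are illustrative, not MAP Growth results.

def mh_odds_ratio(strata):
    """Pool the reference-vs-focal comparison over all score strata."""
    num = den = 0.0
    for a, b, c, d in strata:   # a, b = reference; c, d = focal
        t = a + b + c + d       # stratum total
        num += a * d / t        # ref-correct x focal-incorrect
        den += b * c / t        # ref-incorrect x focal-correct
    return num / den

def mh_delta(alpha):
    """ETS delta-scale DIF statistic: MH D-DIF = -2.35 ln(alpha)."""
    return -2.35 * math.log(alpha)

# One 2x2 table per score stratum (three strata here).
tables = [(40, 10, 35, 15), (30, 20, 25, 25), (20, 30, 15, 35)]
alpha = mh_odds_ratio(tables)   # alpha > 1: item favors the reference group
print(round(alpha, 3), round(mh_delta(alpha), 3))
```

An odds ratio of 1 (delta of 0) indicates no DIF; in operational screening, the delta-scale statistic is typically combined with a significance test before an item is flagged.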