Evaluating item fit statistic thresholds in PISA

Periodical: Educational Measurement: Issues and Practice
Year: 2020
Relates to study/studies: PISA 2015, PISA 2018

Analysis of cross‐country comparability of cognitive items

Abstract

In the Programme for International Student Assessment (PISA), item response theory (IRT) scaling is used to examine the psychometric properties of items and scales and to provide comparable test scores across participating countries and over time. To balance the comparability of IRT item parameter estimates across countries with the best possible model fit, PISA uses a partial invariance approach: international (common) item parameters are estimated for the majority of items, while unique (country‐specific) item parameters are allowed for item‐country combinations in which misfit to the common parameters can be identified. The goal of the current study is to establish item fit statistic thresholds for identifying such misfits. To evaluate the impact of various thresholds on scale and score estimation, we systematically examined the number of unique item parameters and the country performance distributions, and we compared overall model fit statistics using data from PISA 2015 and 2018. Results showed that a threshold of RMSD = .10 provides the best‐fitting model while still yielding stable parameter estimates and sufficient comparability across groups. Applications and implications of the results are discussed.
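To make the flagging rule concrete, the sketch below computes an RMSD item fit statistic of the kind the abstract refers to: a latent-density-weighted root mean squared deviation between a country's (pseudo-)observed item response curve and the curve implied by the common international parameters. This is a minimal illustration, not the operational PISA procedure; the 2PL model, the grid discretization, and all parameter values here are assumptions chosen for demonstration.

```python
import numpy as np

def irf_2pl(theta, a, b):
    """2PL item response function: P(correct | theta) for discrimination a, difficulty b."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def rmsd_item_fit(theta_grid, density, p_observed, a_common, b_common):
    """Density-weighted RMSD between an observed item response curve and the
    curve implied by the common (international) item parameters."""
    p_model = irf_2pl(theta_grid, a_common, b_common)
    return float(np.sqrt(np.sum(density * (p_observed - p_model) ** 2)))

# Discretized standard-normal weights over an ability grid (illustrative choice).
theta = np.linspace(-4.0, 4.0, 81)
w = np.exp(-0.5 * theta ** 2)
w /= w.sum()

# Hypothetical country whose item difficulty deviates from the common value.
p_country = irf_2pl(theta, a=1.2, b=0.5)
rmsd = rmsd_item_fit(theta, w, p_country, a_common=1.2, b_common=0.0)

# Under the RMSD = .10 threshold examined in the study, a flagged item-country
# combination would receive unique (country-specific) parameters.
flagged = rmsd > 0.10
```

An item whose country-level curve matches the common parameters exactly yields RMSD = 0; larger deviations push the statistic toward the threshold, at which point the partial invariance approach releases that item-country combination from the common parameters.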