Detecting Examinees With Item Preknowledge in Large-Scale Testing Using Extreme Gradient Boosting (XGBoost)

Research output: Contribution to journal › Article

Abstract

Researchers frequently use machine-learning methods in many fields. In the area of detecting fraud in testing, there have been relatively few studies that have used these methods to identify potential testing fraud. In this study, a technical review of a recently developed state-of-the-art algorithm, Extreme Gradient Boosting (XGBoost), is provided and the utility of XGBoost in detecting examinees with potential item preknowledge is investigated using a real data set that includes examinees who engaged in fraudulent testing behavior, such as illegally obtaining live test content before the exam. Four different XGBoost models were trained using different sets of input features based on (a) only dichotomous item responses, (b) only nominal item responses, (c) both dichotomous item responses and response times, and (d) both nominal item responses and response times. The predictive performance of each model was evaluated using the area under the receiver operating characteristic curve and several classification measures such as the false-positive rate, true-positive rate, and precision. For comparison purposes, the results from two person-fit statistics on the same data set were also provided. The results indicated that XGBoost successfully classified the honest test takers and fraudulent test takers with item preknowledge. Particularly, the classification performance of XGBoost was reasonably good when the response time information and item responses were both taken into account.
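
The evaluation measures named in the abstract can be illustrated with a minimal sketch. It is not the study's code or data: the labels, scores, threshold, and function names below are hypothetical, and the scores merely stand in for the predicted probabilities a trained classifier such as XGBoost would produce. The sketch assumes that 1 marks an examinee with item preknowledge and 0 an honest examinee.

```python
def classification_measures(labels, scores, threshold):
    """Return (TPR, FPR, precision) when examinees scoring >= threshold are flagged."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        flagged = s >= threshold
        if flagged and y == 1:
            tp += 1          # fraudulent examinee correctly flagged
        elif flagged and y == 0:
            fp += 1          # honest examinee wrongly flagged
        elif not flagged and y == 1:
            fn += 1          # fraudulent examinee missed
        else:
            tn += 1          # honest examinee correctly cleared
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return tpr, fpr, precision


def auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive case outscores a randomly chosen negative case (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 3 examinees with preknowledge, 5 honest examinees.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.1, 0.05]

if __name__ == "__main__":
    tpr, fpr, precision = classification_measures(labels, scores, threshold=0.5)
    print(f"TPR={tpr:.3f} FPR={fpr:.3f} precision={precision:.3f}")
    print(f"AUC={auc(labels, scores):.3f}")
```

The area under the curve does not depend on any single cutoff, whereas the true-positive rate, false-positive rate, and precision all change with the threshold chosen for flagging examinees, which is presumably why the abstract reports both kinds of measure.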

Original language: English (US)
Journal: Educational and Psychological Measurement
DOI: 10.1177/0013164419839439
State: Published - Jan 1 2019

Fingerprint

Boosting
Reaction Time
Fraud
Extremes
Gradient
Testing
Response Time
Categorical or nominal
learning method
performance
Learning systems
Research Personnel
Statistics
Characteristic Curve
Operating Characteristics
False Positive
human being
Machine Learning

Keywords

  • extreme gradient boosting
  • item compromise
  • item preknowledge
  • machine learning
  • test security
  • XGBoost

ASJC Scopus subject areas

  • Education
  • Developmental and Educational Psychology
  • Applied Psychology
  • Applied Mathematics

Cite this

@article{984ba0c34090478db89aabed1922ff4e,
title = "Detecting Examinees With Item Preknowledge in Large-Scale Testing Using Extreme Gradient Boosting (XGBoost)",
abstract = "Researchers frequently use machine-learning methods in many fields. In the area of detecting fraud in testing, there have been relatively few studies that have used these methods to identify potential testing fraud. In this study, a technical review of a recently developed state-of-the-art algorithm, Extreme Gradient Boosting (XGBoost), is provided and the utility of XGBoost in detecting examinees with potential item preknowledge is investigated using a real data set that includes examinees who engaged in fraudulent testing behavior, such as illegally obtaining live test content before the exam. Four different XGBoost models were trained using different sets of input features based on (a) only dichotomous item responses, (b) only nominal item responses, (c) both dichotomous item responses and response times, and (d) both nominal item responses and response times. The predictive performance of each model was evaluated using the area under the receiver operating characteristic curve and several classification measures such as the false-positive rate, true-positive rate, and precision. For comparison purposes, the results from two person-fit statistics on the same data set were also provided. The results indicated that XGBoost successfully classified the honest test takers and fraudulent test takers with item preknowledge. Particularly, the classification performance of XGBoost was reasonably good when the response time information and item responses were both taken into account.",
keywords = "extreme gradient boosting, item compromise, item preknowledge, machine learning, test security, XGBoost",
author = "Cengiz Zopluoglu",
year = "2019",
month = "1",
day = "1",
doi = "10.1177/0013164419839439",
language = "English (US)",
journal = "Educational and Psychological Measurement",
issn = "0013-1644",
publisher = "SAGE Publications Inc.",

}

TY - JOUR

T1 - Detecting Examinees With Item Preknowledge in Large-Scale Testing Using Extreme Gradient Boosting (XGBoost)

AU - Zopluoglu, Cengiz

PY - 2019/1/1

Y1 - 2019/1/1

AB - Researchers frequently use machine-learning methods in many fields. In the area of detecting fraud in testing, there have been relatively few studies that have used these methods to identify potential testing fraud. In this study, a technical review of a recently developed state-of-the-art algorithm, Extreme Gradient Boosting (XGBoost), is provided and the utility of XGBoost in detecting examinees with potential item preknowledge is investigated using a real data set that includes examinees who engaged in fraudulent testing behavior, such as illegally obtaining live test content before the exam. Four different XGBoost models were trained using different sets of input features based on (a) only dichotomous item responses, (b) only nominal item responses, (c) both dichotomous item responses and response times, and (d) both nominal item responses and response times. The predictive performance of each model was evaluated using the area under the receiver operating characteristic curve and several classification measures such as the false-positive rate, true-positive rate, and precision. For comparison purposes, the results from two person-fit statistics on the same data set were also provided. The results indicated that XGBoost successfully classified the honest test takers and fraudulent test takers with item preknowledge. Particularly, the classification performance of XGBoost was reasonably good when the response time information and item responses were both taken into account.

KW - extreme gradient boosting

KW - item compromise

KW - item preknowledge

KW - machine learning

KW - test security

KW - XGBoost

UR - http://www.scopus.com/inward/record.url?scp=85063932871&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85063932871&partnerID=8YFLogxK

U2 - 10.1177/0013164419839439

DO - 10.1177/0013164419839439

M3 - Article

AN - SCOPUS:85063932871

JO - Educational and Psychological Measurement

JF - Educational and Psychological Measurement

SN - 0013-1644

ER -