Examining the efficacy of generative artificial intelligence in item generation: comparative analysis of human-developed and AI-generated reading tests

Shin, Dongkwang; Kwon, Suh Keong; Lee, Yongsang

doi:10.1007/s10639-025-13683-6

Citations: Web of Science 1; Scopus 1

Abstract

This study examined the utility of questions created with artificial intelligence (AI) compared to those created by human item writers for English reading tests. Despite the rapid adoption of generative AI, such as ChatGPT, in educational settings, there is limited empirical evidence comparing AI-generated and human-generated items based on actual test administration. Given the practical need for empirical validation of AI-generated test items, the present study analysed item difficulty and perceived quality using test data from high school students in South Korea. The findings revealed that AI-generated items were slightly more difficult due to occasional inappropriate language choices, although the difference was not statistically significant. Test-takers generally perceived AI-generated items as comparable to human-developed ones, indicating cautious optimism regarding the integration of generative AI into language test development and preparation.

Keywords

ChatGPT; Automated item generation; Item difficulty; Test equation

Published: 2025-11

Type: Article

Journal: Education and Information Technologies

Volume: 30

Issue: 16

Pages: 23981-24007