Deep Learning in Automated Essay Scoring for Islamic Education: A Systematic Review
DOI: https://doi.org/10.58524/oler.v5i2.753

Keywords: Assessment, Automated Essay Scoring, Deep Learning, Islamic Education, Systematic Review

Abstract
Automated Essay Scoring (AES) is a computer-based system that combines artificial intelligence and natural language processing (NLP) to score student essays and generate feedback automatically, offering evaluators both convenience and efficiency. This study analyzes which algorithmic models are most effective in terms of the accuracy and reliability of AES systems, particularly in the context of Islamic religious education assessment, and examines their strengths and limitations in supporting objective and efficient learning evaluation. The study follows the Systematic Literature Review (SLR) approach under the PRISMA protocol: 31 relevant articles published between 2020 and 2025, drawn from the Scopus and Springer databases, were analyzed to evaluate the use and effectiveness of algorithms in AES development. The results show that transformer-based models, specifically BERT, are the most effective algorithms in current AES implementations. BERT excels because it captures bidirectional context and semantic depth in text; models built on it produce accurate scores and automated feedback that approaches the quality of human judgment. Its main drawbacks are the need for large training data and substantial computing resources. Despite these costs, the application of BERT-based AES in Islamic education highlights its potential to support more objective, consistent, and scalable assessment of students' essays.
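To make the reviewed approach concrete, the sketch below shows one common way a BERT-based AES model is set up: a pre-trained BERT encoder with a single-output head used as a score regressor. This is a minimal illustration using the Hugging Face transformers library, not the pipeline of any reviewed study; the checkpoint name, the score_essay helper, and the sample input are assumptions for demonstration only.

```python
# Minimal sketch (illustrative, not the reviewed studies' implementation):
# a BERT encoder with a single-output head for essay score regression.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed checkpoint; a multilingual or language-specific model may suit
# Islamic-education essays (e.g., Arabic or Indonesian) better.
MODEL_NAME = "bert-base-uncased"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# num_labels=1 makes the sequence-classification head a single-output
# regressor, so the model predicts one continuous essay score.
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=1)
model.eval()

def score_essay(text: str) -> float:
    """Return the model's raw score for one essay (untrained head here)."""
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, 1)
    return logits.item()

print(score_essay("Zakat purifies wealth and strengthens social welfare..."))
```

In practice, such a model would first be fine-tuned on human-scored essays (for example with a mean-squared-error loss against rater scores) and then evaluated by its agreement with human raters, commonly via quadratic weighted kappa.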
License
Copyright (c) 2025 Rokhmatul Khoiro Amin Putri, Kusaeri Kusaeri, Suparto Suparto

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.