Classifying Reading-Level Difficulty from Text

See the full report here: Classifying Reading-Level Difficulty from Text

The goal of this month-long project was to explore whether reading level can be assessed automatically. Accurate automatic assessment could empower students to choose from a wider range of materials (recent novels, current-events articles) while staying confident that the text's difficulty remains within reach.

Working through various methods, hyperparameter configurations, and feature representations, my partner and I built one of the top-performing classification models in our machine learning class. Our final model combined Google’s BERT embeddings with a neural network tuned by grid search, achieving strong predictive power despite limited training data.
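To make the general shape of that approach concrete, here is a minimal sketch of a grid search over a small neural network trained on fixed-length text embeddings. It assumes scikit-learn, and it is not the project's actual code: the random vectors merely stand in for 768-dimensional BERT embeddings, and the three labels, the grid values, and all variable names are illustrative placeholders.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier

# Placeholder data: random vectors stand in for BERT embeddings,
# and integer labels stand in for reading levels.
rng = np.random.default_rng(0)
n_samples, n_dims, n_levels = 120, 768, 3
y = rng.integers(0, n_levels, size=n_samples)
# Shift each class's vectors so the toy problem is learnable.
X = rng.normal(size=(n_samples, n_dims)) + y[:, None] * 0.5

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Hypothetical grid: these are not the values searched in the project.
param_grid = {
    "hidden_layer_sizes": [(32,), (64,)],
    "alpha": [1e-4, 1e-2],
}

# Cross-validated grid search over the network's hyperparameters.
search = GridSearchCV(
    MLPClassifier(max_iter=300, random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)

best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)  # accuracy on held-out data
print(best_params, test_accuracy)
```

In a real pipeline, X would hold one embedding per passage produced by a pretrained BERT model. Keeping a held-out test split separate from the cross-validation folds matters most when training data is limited, since the grid search itself can overfit to the folds.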

Our dataset comes from research work by Jordan J. Bird, released in December 2024. His raw dataset, UK Key Stage Readability for English Texts, is available on Kaggle under an MIT license.