Intelligent Thematic Classification of Books Based on Summary Analysis and Supervised Learning

Ghadimi Nik, Maryam; Hajimohammadi, Heliehsadat

doi:10.22034/jkrs.2026.69800.1201

Articles in Press

Document Type : Original Article

Abstract

Objective:
This research aims to develop a data-driven model to automatically identify the subject of a book using machine learning algorithms and text mining techniques. The study utilizes a structured dataset containing book titles, summaries, and subjects. The ultimate goal is to build a model that is technically accurate and practical for real-world applications like book recommendation engines and digital reading platforms.

Method:
Data were collected through web scraping and crawling from reliable sources, including Goodreads, Ketabrah, and Fidibo. The raw data were preprocessed by removing special characters, applying morphological stemming, and eliminating stop words. Features were then extracted from the cleaned summaries using TF-IDF weighting. Several supervised classifiers, including Logistic Regression and Support Vector Machines (SVM), were trained to model the relationship between a book’s summary and its subject.
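
The preprocessing and feature-extraction steps described above can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: the stop-word list, the suffix-stripping rules, the English sample summaries, and the function names `preprocess` and `tfidf` are all assumptions made for illustration, and the TF-IDF variant shown (term frequency times log(N / df)) is only one common formulation.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list and suffix rules, for illustration only.
STOP_WORDS = {"a", "an", "the", "of", "and", "in", "to", "is"}
SUFFIXES = ("ing", "es", "s")


def preprocess(text):
    """Lowercase, drop special characters, remove stop words, crudely stem."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    stemmed = []
    for tok in tokens:
        for suf in SUFFIXES:
            # Strip at most one suffix, and only if a stem of 3+ characters remains.
            if tok.endswith(suf) and len(tok) - len(suf) >= 3:
                tok = tok[: -len(suf)]
                break
        stemmed.append(tok)
    return stemmed


def tfidf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append(
            {t: (c / total) * math.log(n / df[t]) for t, c in counts.items()}
        )
    return vectors


# Hypothetical summaries standing in for the scraped data.
summaries = [
    "A detective investigates a murder in London.",
    "The history of the Roman empire and its wars.",
    "A detective story: murders, clues, and suspects!",
]
tokens = [preprocess(s) for s in summaries]
vectors = tfidf(tokens)
```

In this toy corpus a term that appears in a single summary, such as "london", receives a higher weight than one shared by two summaries, such as "detective", which is the behaviour that makes TF-IDF useful for separating subjects. For real data, an established implementation such as scikit-learn's `TfidfVectorizer` and a language-appropriate stemmer would likely be preferable to these hand-written rules.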

Findings:
Using an 80/20 split between training and test data, Logistic Regression, Linear SVM, and RBF SVM achieved accuracies of 80%, 79%, and 79.7%, respectively. With a 90/10 split, the accuracies were 82.2%, 78.6%, and 79.3%, respectively.
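
The hold-out protocol behind these figures can be sketched as follows. This is an illustrative outline only: the toy dataset, the helper names `holdout_split` and `accuracy`, the fixed random seed, and the majority-class placeholder model are assumptions, and the placeholder stands in for the Logistic Regression and SVM classifiers used in the study.

```python
import random
from collections import Counter


def holdout_split(items, test_fraction, seed=0):
    """Shuffle a copy of items and hold out the last test_fraction as test data."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Hypothetical toy data: (summary, subject) pairs standing in for the real dataset.
data = [(f"summary {i}", "fiction" if i % 3 else "history") for i in range(100)]

results = {}
for test_fraction in (0.2, 0.1):
    train, test = holdout_split(data, test_fraction)
    # Placeholder model: always predict the most common training subject.
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    predictions = [majority for _ in test]
    results[test_fraction] = accuracy([label for _, label in test], predictions)
    print(
        f"{1 - test_fraction:.0%}/{test_fraction:.0%} split: "
        f"{len(train)} train, {len(test)} test, "
        f"accuracy {results[test_fraction]:.3f}"
    )
```

Replacing the placeholder with a trained classifier and repeating the loop per model would yield a comparison of the kind reported above. Note that a 90/10 split leaves a smaller test set, so accuracy estimates from it tend to be noisier than those from an 80/20 split.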

Conclusion:
The results indicate that Logistic Regression achieved the highest prediction accuracy under both splits. Together with its fast training and prediction times, this makes it a suitable choice for textual analysis and multi-class classification tasks such as book subject identification.

Keywords

Main Subjects

Software and Hardware Dimensions of Data, Information and Knowledge Studies

Journal of Knowledge-Research Studies

Articles in Press, Accepted Manuscript
Available Online from 09 April 2026
