
Classification of Hindi News Articles Using Machine Learning Models with Challenges and Solutions

EasyChair Preprint 15739

12 pages. Date: January 20, 2025

Abstract

In today's digitized world, a large number of Hindi text documents are generated and shared across many sectors, including public organizations, news portals, government webpages, and the commercial sector. These news documents need to be classified into distinct classes such as business, health, science, politics, and sports. Text classification is essential because of the overwhelming amount of unorganized data. Hindi news agencies still rely on manual sorting because no dedicated Hindi text classifier is available. While English text classification is well established and has ample resources, Indian languages, particularly Hindi, lack standardized benchmarks. Hindi, one of the most widely used languages in the world, poses challenges in text processing. Despite progress in text summarization, keyword extraction, and information retrieval, classifiers for sorting Hindi news articles into predefined categories are still lacking in several areas. This paper addresses this gap by preprocessing a collection of standard Hindi news articles at various levels: word, sentence, paragraph, and document. The paper also explores feature extraction techniques and applies machine learning classifiers to categorize the articles. Classifying Hindi news articles presents unique difficulties due to the language's intricate letter combinations, conjuncts, sentence structures, and multi-sense words.
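The abstract does not name specific features or classifiers, so the following is only a minimal sketch of the kind of pipeline it describes, under stated assumptions: splitting on blank lines for paragraphs and on the danda (।) for sentences, bag-of-words counts as the feature representation, and a multinomial Naive Bayes classifier with add-one smoothing. The function and class names (`split_levels`, `NaiveBayes`) and the four toy training sentences are hypothetical illustrations, not the authors' method or data.

```python
import math
import re
from collections import Counter, defaultdict

# Devanagari letters, signs and digits; the danda marks (U+0964, U+0965) are excluded.
WORD_RE = re.compile(r"[\u0900-\u0963\u0966-\u097F]+")


def split_levels(document):
    """Split a document into paragraphs, sentences, and words (assumed rules)."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]
    sentences = [s.strip() for p in paragraphs
                 for s in re.split(r"[।?!]", p) if s.strip()]
    words = [w for s in sentences for w in WORD_RE.findall(s)]
    return paragraphs, sentences, words


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts, add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(split_levels(doc)[2])
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        words = split_levels(doc)[2]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each word.
            score = math.log(n_docs / total_docs)
            score += sum(math.log((counts[w] + 1) / denom) for w in words)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up training data for illustration only (not from the paper).
train_docs = [
    "भारत ने क्रिकेट मैच जीता। टीम ने शानदार खेल दिखाया।",
    "खिलाड़ी ने फुटबॉल मैच में गोल किया।",
    "शेयर बाजार में आज तेजी रही। निवेशकों को लाभ हुआ।",
    "कंपनी ने बाजार में नया निवेश किया।",
]
train_labels = ["sports", "sports", "business", "business"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict("टीम ने मैच जीता"))
```

A real system would likely need more than this sketch offers, for example stop-word removal, stemming, and handling of the conjuncts and multi-sense words that the abstract identifies as difficulties.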

Keyphrases: NLP, machine learning, text classifier

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@booklet{EasyChair:15739,
  author    = {Subhashini Spurjeon Kashi and Ani Thomas},
  title     = {Classification of Hindi News Articles Using Machine Learning Models with Challenges and Solutions},
  howpublished = {EasyChair Preprint 15739},
  year      = {EasyChair, 2025}}