
A Framework for the Classification and Exploration of Semi-Structured Data

EasyChair Preprint no. 14186

26 pages
Date: July 26, 2024

Abstract

In recent years, advances in natural language processing (NLP) have considerably broadened the traditional boundaries of proficiency in artificial intelligence, particularly with respect to tasks demanding high levels of cognitive sophistication. While the evolution of NLP has been impressive, a significant gap remains in the efficacy of NLP with respect to the classification and exploration of semi-structured data. These data comprise a blend of structured and unstructured features, and hold considerable potential, especially when traditional NLP classification tasks are complemented with structured meta-data. A generic framework, called the Classification and Exploration of Semi-structured Data (CESD) framework, is proposed in this article for enhancing the efficacy of classification tasks based on unstructured data by incorporating insights gleaned from accompanying structured data. The versatility of the framework allows users to modify it or to add framework components, as per their specific requirements. The overarching goals are to equip users with a holistic understanding of the best-performing classification models and to elucidate the inherent characteristics of semi-structured input data sets based on their structured and unstructured features. A computerised instantiation of the CESD framework is validated by applying it to a real-world data set. The case study data pertain to the severity of software error logs in a production environment, and comprise both structured and unstructured data describing these errors.

Keyphrases: Clustering, NLP classification, semi-structured data

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@Booklet{EasyChair:14186,
  author = {Louis Burger and Jan van Vuuren},
  title = {A Framework for the Classification and Exploration of Semi-Structured Data},
  howpublished = {EasyChair Preprint no. 14186},
  year = {EasyChair, 2024}}