An empirical evaluation of text classification and feature selection methods

Muazzam Ahmed Siddiqui

doi:10.5430/air.v5n2p70

Abstract

This paper presents an extensive empirical evaluation of classifiers and feature selection methods for text categorization. More than 500 models were trained and tested using different combinations of corpora, term weighting schemes, numbers of features, feature selection methods and classifiers. The experiments used five benchmark corpora, three term weighting schemes, three feature selection methods and four classifiers. Performance was measured by the micro-averaged F measure and by classifier training time. Using all the features gave only a slight performance improvement over using just 20% of the features selected with Information Gain and Chi Square, and this improvement was not statistically significant. The Support Vector Machine with a linear kernel performed best on text categorization tasks, producing the highest F measures and low training times even in the presence of high class skew. The difference in performance between the Support Vector Machine and the other classifiers on text categorization problems was statistically significant.
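To make two of the ingredients named in the abstract concrete, the following is a minimal sketch, not the author's code, of Chi Square feature scoring and the micro-averaged F measure in plain Python. It assumes documents are represented as sets of tokens with a single label each, that a term's score is its maximum Chi Square value over all classes (one common convention; the abstract does not say which aggregation was used), and that ties are broken alphabetically. The function names and the toy data are hypothetical.

```python
def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term/class contingency table.

    n11: documents in the class that contain the term
    n10: documents outside the class that contain the term
    n01: documents in the class that lack the term
    n00: documents outside the class that lack the term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if denom == 0:
        return 0.0  # a degenerate table carries no evidence of association
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def select_top_fraction(docs, labels, fraction=0.2):
    """Return the top `fraction` of vocabulary terms ranked by chi-square.

    Assumption: a term's score is its maximum chi-square over all classes.
    """
    vocab = sorted(set().union(*docs))
    classes = sorted(set(labels))
    scores = {}
    for term in vocab:
        best = 0.0
        for c in classes:
            n11 = sum(1 for d, y in zip(docs, labels) if term in d and y == c)
            n10 = sum(1 for d, y in zip(docs, labels) if term in d and y != c)
            n01 = sum(1 for d, y in zip(docs, labels) if term not in d and y == c)
            n00 = len(docs) - n11 - n10 - n01
            best = max(best, chi_square(n11, n10, n01, n00))
        scores[term] = best
    k = max(1, round(fraction * len(vocab)))
    # Highest score first; alphabetical order breaks ties deterministically.
    ranked = sorted(vocab, key=lambda t: (-scores[t], t))
    return ranked[:k]


def micro_f1(y_true, y_pred):
    """Micro-averaged F measure for single-label predictions.

    True positives, false positives and false negatives are pooled over
    all classes before precision and recall are computed.
    """
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp += 1
        else:
            fp += 1  # the predicted class gains a false positive
            fn += 1  # the true class gains a false negative
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical toy corpus, for illustration only.
    docs = [{"goal", "match"}, {"goal", "team"},
            {"vote", "party"}, {"vote", "election"}]
    labels = ["sport", "sport", "politics", "politics"]
    print(select_top_fraction(docs, labels, fraction=0.34))
    print(micro_f1(["a", "b", "a", "c"], ["a", "b", "c", "c"]))
```

Because every document here carries exactly one label, each error counts once as a false positive and once as a false negative, so the micro-averaged F measure coincides with accuracy. In a multi-label setting the pooled counts would differ and the two measures would no longer be equal.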

DOI: https://doi.org/10.5430/air.v5n2p70

Artificial Intelligence Research

ISSN 1927-6974 (Print) ISSN 1927-6982 (Online)

Copyright © Sciedu Press

Artificial Intelligence Research
International Peer-reviewed and Open Access Journal for the Artificial Intelligence Specialists
