Header menu link for other important links
A comparative study of various supervised feature selection methods for spam classification
Trivedi S K,
Published in Association for Computing Machinery
Pages: 1 - 6
Classification of the spam from bunch of the email files is a challenging research area in text mining domain. However, machine learning based approaches are widely experimented in the literature with enormous success. For excellent learning of the classifiers, few numbers of informative features are important. This researh presents a comparative study between various supervised feature selection methods such as Document Frequency (DF), Chi-Squared (χ2), Information Gain (IG), Gain Ratio (GR), Relief F (RF), and One R (OR). Two corpuses (Enron and SpamAssassin) are selected for this study where enron is main corpus and spamAssassin is used for validation of the results. Bayesian Classifier is taken to classify the given corpuses with the help of features selected by above feature selection techniques. Results of this study shows that RF is the excellent feature selection technique amongst other in terms of classification accuracy and false positive rate whereas DF and X2 were not so effective methods. Bayesian classifier has proven its worth in this study in terms of good performance accuracy and low false positives.
About the journal
JournalData powered by TypesetACM International Conference Proceeding Series
PublisherData powered by TypesetAssociation for Computing Machinery
Open AccessNo