This research presents the effects of the interaction between various kernel functions and different feature selection techniques on the learning capability of the Support Vector Machine (SVM) for detecting email spam. The interaction of four SVM kernel functions, i.e. “Normalised Polynomial Kernel (NP)”, “Polynomial Kernel (PK)”, “Radial Basis Function Kernel (RBF)”, and “Pearson VII Function-Based Universal Kernel (PUK)”, with three feature selection techniques, i.e. “Gain Ratio (GR)”, “Chi-Squared (χ²)”, and “Latent Semantic Indexing (LSI)”, has been tested on the “Enron Email Data Set”. The results reveal some interesting facts regarding how the performance of the kernel functions varies with the number of features (or dimensions) in the data. NP performs the best across a wide range of dimensionality, for all the feature selection techniques tested. The PUK kernel works well with low-dimensional data and is the second best in performance (after NP), but performs poorly on high-dimensional data. Latent Semantic Indexing (LSI) appears to be the best among all the tested feature selection techniques. However, for high-dimensional data, all the feature selection techniques perform almost equally well.
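The experimental setup described above (pairing SVM kernels with feature selection at varying dimensionality) can be sketched as follows. This is a minimal illustration, not the authors' code: it uses scikit-learn with a synthetic stand-in for the Enron corpus, and since the Normalised Polynomial and PUK kernels are not built into scikit-learn, only the PK and RBF kernels from the abstract are shown, paired here with Chi-Squared feature selection.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Synthetic stand-in for bag-of-words email features; chi2 requires
# non-negative inputs, so we take absolute values as mock term counts.
X, y = make_classification(n_samples=300, n_features=200,
                           n_informative=20, random_state=0)
X = np.abs(X)

# Pair each kernel with chi-squared selection at several dimensionalities,
# mirroring the paper's kernel-by-feature-count comparison.
for kernel in ("poly", "rbf"):       # PK and RBF from the abstract
    for k in (10, 50, 150):          # number of selected features
        pipe = make_pipeline(SelectKBest(chi2, k=k), SVC(kernel=kernel))
        score = cross_val_score(pipe, X, y, cv=5).mean()
        print(f"kernel={kernel:4s} k={k:3d} mean accuracy={score:.3f}")
```

Swapping `SelectKBest(chi2, ...)` for `TruncatedSVD` would give an LSI-style reduction instead; Gain Ratio selection would need an external implementation such as Weka's.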
Journal: International Journal of Computer Applications
Publisher: Foundation of Computer Science (FCS)