Impact of Feature Selection on Classifier Testing Validity

Inese Poļaka; Arkādijs Borisovs

Proceedings of the 17th International Conference on Soft Computing MENDEL 2011

This paper studies how feature selection, a pre-processing stage in data mining, influences classifier testing, in particular when data mining techniques are applied in bioinformatics (here, a classification and pattern recognition task using antibody display data). The study experimentally evaluates whether classifier testing remains valid when the test data set has already been used for feature selection during pre-processing, since this may corrupt the classification model and adapt it to the test data. The experiments employ ten feature selection methods: four subset selection methods (correlation-based, consistency evaluator and two types of wrappers) and six feature ranking methods (Chi-square statistic, Gain Ratio, Information Gain, OneR, ReliefF and SVM). Four classifiers (C4.5, Random Forest, SVM and Naive Bayes) are evaluated both on data sets that were used in feature selection and on fully independent test sets.
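The effect described in the abstract can be illustrated with a minimal sketch. It is not the authors' experimental setup: it assumes synthetic pure-noise data instead of antibody display data, a simple mean-difference feature ranker instead of the ten listed methods, and a nearest-centroid classifier instead of the four listed classifiers. All function names and parameters (`compare_protocols`, `rank_features`, and so on) are hypothetical and chosen for illustration only.

```python
import random


def mean(values):
    return sum(values) / len(values)


def make_noise_data(n_samples, n_features, rng):
    """Pure-noise data: the features carry no information about the labels."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]  # balanced binary labels
    rng.shuffle(y)
    return X, y


def rank_features(X, y, k):
    """Indices of the k features with the largest absolute class-mean difference."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append((abs(mean(pos) - mean(neg)), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]


def nearest_centroid_accuracy(X_train, y_train, X_test, y_test, features):
    """Train a nearest-centroid classifier on the given features and score it."""
    centroids = {}
    for label in (0, 1):
        rows = [row for row, l in zip(X_train, y_train) if l == label]
        centroids[label] = [mean([r[j] for r in rows]) for j in features]
    correct = 0
    for row, label in zip(X_test, y_test):
        dists = {
            c: sum((row[j] - m) ** 2 for j, m in zip(features, centroids[c]))
            for c in (0, 1)
        }
        correct += min(dists, key=dists.get) == label
    return correct / len(y_test)


def compare_protocols(n_repeats=20, n_samples=40, n_features=500, k=10,
                      n_test=10, seed=0):
    """Mean test accuracy when selection sees the test set vs. when it does not."""
    rng = random.Random(seed)
    leaked, clean = [], []
    for _ in range(n_repeats):
        X, y = make_noise_data(n_samples, n_features, rng)
        X_train, y_train = X[:-n_test], y[:-n_test]
        X_test, y_test = X[-n_test:], y[-n_test:]
        leaked_feats = rank_features(X, y, k)             # selection saw the test rows
        clean_feats = rank_features(X_train, y_train, k)  # selection saw training rows only
        leaked.append(nearest_centroid_accuracy(X_train, y_train, X_test, y_test, leaked_feats))
        clean.append(nearest_centroid_accuracy(X_train, y_train, X_test, y_test, clean_feats))
    return mean(leaked), mean(clean)


if __name__ == "__main__":
    leaked_acc, clean_acc = compare_protocols()
    print(f"selection on all data:      {leaked_acc:.2f}")
    print(f"selection on training only: {clean_acc:.2f}")
```

Because the labels here are independent of the features, any accuracy clearly above chance in the first protocol comes only from the feature selection step having seen the test rows. This is the kind of optimistic bias the study sets out to measure on real data.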

Keywords
feature selection, classification, classification result validation, classifier evaluation, data mining, antibody display

Poļaka, I., Borisovs, A. Impact of Feature Selection on Classifier Testing Validity. In: Proceedings of the 17th International Conference on Soft Computing MENDEL, Czech Republic, Brno, 15-17 June 2011. Brno: Nosova Hana, 2011, pp. 411-418. ISBN 9788021443020.

Publication language
English (en)

Publication type
Publications in conference proceedings indexed in Web of Science and/or SCOPUS
Funding attributed to core activity
Unknown
Research field
2. Engineering and technology
Research subfield
2.2. Electrical engineering, electronic engineering, information and communication technologies
ID: 10343