Data Mining Module

Project team: Sandro Babić, Jovana Brestovac, Matko Burul, Dejan Ćućić, Nikola Domazet, Erika Fafanđel, Bojan Filipović, Mateo Hrastnik, Vanja Jansky, Samir Jugo, Nikola Lacković, Teo Manojlović, Igor Opačak, Domagoj Pinčić, Domagoj Poljančić, Nataša Prodić, Matej Raguzin, Filip Stojanac, Ozren Šejić

System Description: The data mining module is an addition to the BuCo Analyzer tool that supports a complete research workflow in the area of Software Defect Prediction (SDP). It lets the user import data collected with BuCo Analyzer, prepare the data, build a model, and statistically evaluate its performance across a range of metrics.

Data is imported from a .csv file, and the import is guided by a simple GUI menu. The choice of data preprocessing techniques is motivated by issues that often affect SDP data, such as data imbalance and a large number of features, both of which can hurt classification performance.
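
As an illustration of the import step, the following is a minimal sketch of reading such a .csv file with the Python standard library. The column names (loc, complexity, faulty) and the function name load_dataset are hypothetical and are not taken from an actual BuCo Analyzer report; the sketch also assumes that every non-label column is numeric.

```python
import csv
import io


def load_dataset(handle, label_column):
    """Read a CSV stream into numeric feature rows and a list of 0/1 labels."""
    reader = csv.DictReader(handle)
    features, labels = [], []
    for record in reader:
        # Separate the class label from the remaining (assumed numeric) features.
        labels.append(int(record.pop(label_column)))
        features.append({name: float(value) for name, value in record.items()})
    return features, labels


# Hypothetical two-row report; a real file would be opened with open(path, newline="").
sample = io.StringIO("loc,complexity,faulty\n120,7,1\n45,2,0\n")
features, labels = load_dataset(sample, label_column="faulty")
```

Keeping the label separate from the features at import time makes the later preprocessing and classification steps simpler to apply.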

After a preprocessing technique and a classification algorithm for predicting faulty software units have been chosen, the results can be statistically analyzed. The analysis looks for significant differences between groups of results obtained with different data preprocessing techniques or classification algorithms. Each group is first submitted to a normality test. If all groups are normally distributed, a one-way ANOVA is applied. If any group is not normally distributed, a Kruskal-Wallis test is applied instead.
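
The test selection described above can be sketched as follows. This is an illustration under stated assumptions, not the module's actual code: the text does not say which normality test is used, so the Jarque-Bera test is swapped in here because its asymptotic p-value has a closed form (it is unreliable for small samples). The function names and the 0.05 significance level are also assumptions. Only the test statistics are computed; p-values for F and H would normally come from a statistics library, and the H statistic omits the tie correction.

```python
import math
from statistics import mean


def jarque_bera_p(xs):
    """Asymptotic p-value of the Jarque-Bera normality test (chi-square, 2 df)."""
    n = len(xs)
    m = mean(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n
    if m2 == 0:
        return 0.0  # constant sample: treat as non-normal
    skew = sum((x - m) ** 3 for x in xs) / n / m2 ** 1.5
    kurt = sum((x - m) ** 4 for x in xs) / n / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    return math.exp(-jb / 2)  # survival function of chi-square with 2 df


def anova_f(groups):
    """F statistic of a one-way ANOVA."""
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    grand = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def kruskal_h(groups):
    """Kruskal-Wallis H statistic (tied values share their average rank)."""
    pooled = sorted(x for g in groups for x in g)
    rank = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    n = len(pooled)
    total = sum(sum(rank[x] for x in g) ** 2 / len(g) for g in groups)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


def compare_groups(groups, alpha=0.05):
    """Pick ANOVA if every group passes the normality test, else Kruskal-Wallis."""
    if all(jarque_bera_p(g) > alpha for g in groups):
        return "one-way ANOVA", anova_f(groups)
    return "Kruskal-Wallis", kruskal_h(groups)
```

Each group here would hold one set of results, for example the AUC values obtained over repeated runs with one classification algorithm.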

Development of the data mining module was divided into several subprojects. The technical details of each are described in the posters given below:

Data preprocessing offers:

data import from a .csv report
feature selection
oversampling and undersampling
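
To illustrate the resampling options, here is a minimal sketch of random oversampling and random undersampling. The text does not specify which resampling variants the module implements, so the random variants and the function names below are assumptions chosen for simplicity.

```python
import random


def _group_by_class(rows, labels):
    """Collect the rows belonging to each class label."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    return by_class


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows until every class matches the largest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        out_rows.extend(members + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to match the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        out_rows.extend(rng.sample(members, target))
        out_labels.extend([label] * target)
    return out_rows, out_labels
```

In SDP data the faulty units are typically the minority class, so oversampling would duplicate faulty units while undersampling would discard non-faulty ones.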

Classification algorithms can be chosen from:

AdaBoost
logistic regression
naive Bayes
nearest neighbours
random forest
rotation forest
support vector machine

Performance can be evaluated in terms of:

accuracy
precision
F-measure
AUC
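
The metrics listed above can be computed from true labels, predicted labels and prediction scores, as in the following sketch. It assumes binary labels with 1 marking a faulty unit, takes the F-measure to be the balanced F1 score, and computes AUC as the probability that a randomly chosen faulty unit receives a higher score than a randomly chosen non-faulty one. The function names are illustrative and not taken from the module.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def precision(y_true, y_pred):
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if tp + fp else 0.0


def recall(y_true, y_pred):
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0


def f_measure(y_true, y_pred):
    """Harmonic mean of precision and recall (F1)."""
    p, r = precision(y_true, y_pred), recall(y_true, y_pred)
    return 2 * p * r / (p + r) if p + r else 0.0


def auc(y_true, scores):
    """Fraction of positive/negative pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical scores, thresholded at 0.5 to obtain predicted labels.
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.6, 0.2, 0.8, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
```

Accuracy alone can be misleading on imbalanced data, which is one reason to report precision, F-measure and AUC alongside it.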
