Statistical Evaluation & Reporting Framework for Machine Learning Results

View the Project on GitHub cguckelsberger/statistical-evaluation-for-machine-learning

STATSREP-ML is an open-source solution for automating the evaluation of machine-learning results. It calculates qualitative statistics, performs the appropriate tests and reports them in a comprehensive way. It relies largely, but not exclusively, on well-tested and robust statistics implementations in R, and uses the tests that the machine-learning community has largely agreed upon.

- Straightforward configuration, either programmatically or using an XML file.
- Support for sample input from k-fold cross-validation, repeated cross-validation on one or multiple datasets, and train-test splits.
- Support for two or more than two values of the independent variable, i.e. either classifiers or features. The appropriate tests for the number of groups are selected automatically.
- Support for sample sets with two independent variables, by automatically splitting the data along a predefined fixed independent variable value.
- Integration of both parametric and non-parametric omnibus and post-hoc tests that are commonly used for comparing machine-learning results.
- Integration of specific parametric and non-parametric tests to compare multiple models against a baseline, and support for input data annotations to indicate the baseline model.
- Automatic p-value correction, with several adjustment techniques to choose from, ranging from more to less conservative.
- Testing of parametric test assumptions such as normality and sphericity, so that parametric tests are only applied where their assumptions hold.
- Generation of both a plain-text and a better-structured LaTeX report, comprising sample tables, qualitative statistics, basic graphs and the evaluation results.
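
The automatic test selection described in the list above can be pictured as a small decision rule. The following Python sketch is illustrative only and is not STATSREP-ML's actual implementation or API: the function name `select_tests` and its parameters are hypothetical, and the mapping from conditions to tests (in particular the choice of post-hoc test when a baseline is present) is an assumption based on common practice, using only test names listed in this document.

```python
def select_tests(n_groups, parametric_ok, has_baseline=False):
    """Pick (omnibus_test, post_hoc_test) names for dependent samples.

    Hypothetical helper, not part of STATSREP-ML.
    n_groups      -- number of values of the independent variable
    parametric_ok -- True if assumptions such as normality and sphericity hold
    has_baseline  -- True if one model is marked as the baseline
    """
    if n_groups < 2:
        raise ValueError("need at least two groups to compare")

    # Two groups: a single paired test suffices, no post-hoc step is needed.
    if n_groups == 2:
        if parametric_ok:
            return ("dependent t-test", None)
        return ("Wilcoxon signed-rank test", None)

    # More than two groups: omnibus test first, then a post-hoc test.
    if parametric_ok:
        omnibus = "repeated-measures one-way ANOVA"
        # Assumption: Dunnett for comparisons against a baseline, Tukey otherwise.
        post_hoc = "Dunnett test" if has_baseline else "Tukey HSD test"
    else:
        omnibus = "Friedman test"
        # Assumption: pairwise Wilcoxon against a baseline, Nemenyi otherwise.
        post_hoc = (
            "pairwise Wilcoxon signed-rank test" if has_baseline else "Nemenyi test"
        )
    return (omnibus, post_hoc)


if __name__ == "__main__":
    print(select_tests(2, parametric_ok=True))
    print(select_tests(5, parametric_ok=False))
    print(select_tests(4, parametric_ok=True, has_baseline=True))
```

In practice the `parametric_ok` flag would itself be the outcome of assumption tests, which is why the framework checks normality and sphericity before choosing between the parametric and non-parametric branch.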

- Parametric, omnibus: dependent t-test, repeated-measures one-way ANOVA
- Parametric, post-hoc: Dunnett test, Tukey HSD test
- Non-parametric, omnibus: Wilcoxon Signed-Rank test, Friedman's test, McNemar test
- Non-parametric, post-hoc: Nemenyi test, pairwise Wilcoxon Signed-Rank test
- P-value correction methods: Bonferroni, Hochberg, Holm, Hommel, Benjamini-Hochberg, Benjamini-Yekutieli
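
To illustrate how the listed correction methods differ in conservativeness, here is a minimal, self-contained Python sketch of three of them: Bonferroni, Holm and Benjamini-Hochberg. It uses the standard textbook definitions of adjusted p-values and does not show STATSREP-ML's own code, which delegates to R or Java implementations; the function names are chosen for this example only.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]


def holm(p_values):
    """Holm step-down adjustment (controls the family-wise error rate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p-values
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: adjusted values never decrease along the order.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg step-up adjustment (controls the false discovery rate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for k, idx in enumerate(order):
        rank = m - k  # 1-based rank of this p-value in ascending order
        candidate = p_values[idx] * m / rank
        # Enforce monotonicity from the largest p-value downwards.
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03]  # made-up p-values, for illustration only
    for name, method in [("Bonferroni", bonferroni),
                         ("Holm", holm),
                         ("Benjamini-Hochberg", benjamini_hochberg)]:
        print(name, [round(p, 4) for p in method(raw)])
```

For any input, the Holm-adjusted values are never larger than the Bonferroni-adjusted ones, and the Benjamini-Hochberg values are never larger than the Holm values, which is the sense in which the methods range from more to less conservative.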

Additional tests can easily be integrated by calling the corresponding R packages or by implementing them natively in Java (see the Wiki).

Please have a look at our Wiki for a quick introduction to get you started, and more information on setting up and extending STATSREP-ML.

If you use STATSREP-ML in research, please cite the following paper:

Christian Guckelsberger, Axel Schulz (2014). STATSREP-ML: Statistical Evaluation & Reporting Framework for Machine Learning Results. Technical Report. Published by tuprints [http://tuprints.ulb.tu-darmstadt.de/id/eprint/4294].

While most STATSREP-ML modules are available under the Apache Software License (ASL) version 2, there are a few modules that depend on external libraries and are thus licensed under the GPL. The license of each individual module is specified in its LICENSE file.

Note that while each component's own source code is licensed under the ASL or GPL, individual components might make use of third-party libraries or products that are not licensed under the ASL or GPL. Please make sure that you are aware of the third-party licenses and respect them.

This project was initiated under the auspices of Prof. Dr. Max Mühlhäuser, Telecooperation Lab (TK), Technische Universität Darmstadt.