SubROC is a machine learning model evaluation method based on Subgroup Discovery in the Exceptional Model Mining framework. It provides interpretable descriptions of a given model's strengths and weaknesses on structured data.
This is the code repository accompanying the paper "SubROC: AUC-Based Discovery of Exceptional Subgroup Performance for Binary Classifiers". The code is intended to enable the reproduction of our results. For general use of our method, we plan to integrate the implementation into the pysubgroup library for subgroup discovery.
To quickly reproduce one of the experiments, perform the following steps:
- Ensure you are working on a Linux system with Docker installed and set up in rootless mode.
- Clone this repository and `cd` into its root directory.
- Run the following two commands in your terminal, replacing `<experiment-name>` with the name of the experiment you would like to run:

```bash
bash experiments/run/build.sh
bash experiments/run/start.sh <experiment-name>
```

The output will appear in `./experiments/outputs/<experiment-name>`.
You can start multiple experiments in parallel by repeatedly issuing the second command with different experiment names.
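For example, to run two experiments side by side, issue the start command once per experiment, e.g. from two separate terminals or tmux windows (experiment names are listed in the legend below):

```bash
bash experiments/run/start.sh exp-27
bash experiments/run/start.sh exp-84
```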
See below for a legend of the experiment names and for the script arguments, such as those limiting the computational resources of an experiment run.
The code execution is organized into experiments. Each experiment runs code inside a Docker container, as specified by an experiment definition. An experiment consists of stages, which are executed sequentially. Stages are made up of steps, which are executed in parallel. Steps are defined through bash scripts.
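For orientation, a hypothetical layout of one experiment definition is sketched below. It is inferred from the path `experiments/definitions/exp-48/stage-03/base_config.yaml` mentioned in the notes further down; the actual stage contents and step file names may differ.

```
experiments/definitions/<experiment-name>/
├── stage-00/
│   ├── base_config.yaml   # stage configuration (placement inferred from exp-48)
│   ├── step-a.sh          # steps in the same stage run in parallel
│   └── step-b.sh
└── stage-01/              # stages run sequentially
    └── ...
```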
- `exp-27`: skew plots on synthetic data (Figure 3)
- `exp-34`: Adult case study (supp. Tables 6-11)
- `exp-33`, `exp-68` through `exp-82`, `exp-101` through `exp-116`: measuring optimistic estimate runtime speedups (Table 2, supp. Table 12 and Tables 18-21, supp. Figure 8; when changing the depth to 3 (set `PARAM_DEPTH: 3`; see the sketch after this list) also supp. Tables 13-17, supp. Figure 7)
- `exp-44` through `exp-51`: result set properties (Table 1, supp. Tables 4-5)
- `exp-84`: subgroup injection (Figure 2, supp. Figures 5-6)
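For the depth change mentioned above, `PARAM_DEPTH` would presumably be set in the corresponding experiment's `base_config.yaml`; a minimal sketch, assuming the key sits at the top level of that file:

```yaml
# hypothetical excerpt of a stage's base_config.yaml;
# surrounding keys in the real file may differ
PARAM_DEPTH: 3
```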
Notes:
- Experiments using the Census KDD dataset (exp-48, exp-71, exp-79, exp-105, exp-113) require a large amount of memory (about 170 GB per parallel run, of which there are up to 12).
- We changed the search algorithm in exp-48 to BestFirstSearch to reduce memory consumption. Results might therefore differ slightly from those in the paper. For exact reproduction, change this back to Apriori in `experiments/definitions/exp-48/stage-03/base_config.yaml` (see the sketch after these notes).
- Sometimes the Adult dataset fails to load because OpenML is temporarily unavailable. Try again later in that case.
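For the exp-48 note above, the change would presumably be a one-line edit in `experiments/definitions/exp-48/stage-03/base_config.yaml`. The key name below is a guess for illustration only; consult the actual file for the real one:

```yaml
# hypothetical excerpt; the actual key name may differ
search_algorithm: Apriori  # changed back from BestFirstSearch
```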
When results are present in `experiments/outputs`, the notebooks in `experiments/meta_reports` may be used to combine them into overview tables similar to those in the paper.
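A minimal way to open them, assuming Jupyter is installed on the host (it is not described as part of the experiment image):

```bash
jupyter notebook experiments/meta_reports
```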
Requirements:
- Linux system (with terminal access)
- Docker (rootless)
- recommended: tmux (start with `experiments/tmux_init.sh`, then run experiments inside the tmux session and use the `resource-monitor` window to observe resource consumption)
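Assuming `tmux_init.sh` sets up the session and the `resource-monitor` window as described above, the tmux route looks roughly like this:

```bash
bash experiments/tmux_init.sh
# then, inside the tmux session:
bash experiments/run/start.sh <experiment-name>
```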
The experiments are intended to run in Docker rootless mode. This way, output files on the host are owned by your host user, while the user inside the container can be root without implying root access on the host system.
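To check whether your Docker client talks to a rootless daemon, one option is to inspect the reported security options; in rootless mode the output includes `name=rootless`:

```bash
docker info --format '{{.SecurityOptions}}'
```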
An experiment can be started with the following two commands, which leave all options at their defaults except for the experiment name. The first command runs a script that builds the Docker image serving as the environment in which the experiments run. This took 224 seconds for us. The second command runs a script that starts the execution of the specified experiment using the previously built image. Depending on the experiment, this took anywhere between 3 h (exp-27) and 33 h (exp-48) for us. If the image is already available to Docker, the second command suffices.
```bash
bash experiments/run/build.sh
bash experiments/run/start.sh <experiment-name>
```
Another way to make the image available to Docker is to load it from a file, if you have one available. The full process then looks like this:

```bash
docker image load -i image.tar
bash experiments/run/start.sh <experiment-name>
```
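If you need to produce such a file yourself from an already built image, the counterpart is `docker image save` (replace `<image-tag>` with the tag assigned by `build.sh`):

```bash
docker image save -o image.tar <image-tag>
```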
The full sets of available arguments to the build and start scripts are as follows:

```bash
bash experiments/run/build.sh \
    -d <definitions-directory-path> \
    -b <data-directory-path> \
    -p <code-directory-path>

bash experiments/run/start.sh \
    -n <experiment-name> \
    -a <host-definitions-path> \
    -o <outputs-directory-path> \
    -m <container-memory-limit> \
    -w <container-swap-memory-limit> \
    -c <container-cpu-limit> \
    -s <start-stage-index> \
    -e <num-stages-to-run>
```
All arguments except `-n <experiment-name>` are optional. The default values are:

- `-n <experiment-name>`: `$1` (if only the name is given as an unnamed argument)
- `-d <container-definitions-path>`: `$(pwd)/experiments/definitions`
- `-b <data-directory-path>`: `$(pwd)/experiments/data`
- `-p <code-directory-path>`: `$(pwd)/code`
- `-a <cache-directory-path>`: `$(pwd)/experiments/cache`
- `-o <container-outputs-path>`: `$(pwd)/experiments/outputs`
- `-m <container-memory-limit>`: (empty)
- `-w <container-swap-memory-limit>`: (empty)
- `-c <container-cpu-limit>`: (empty)
- `-s <start-stage-index>`: `0`
- `-e <num-stages-to-run>`: `999`
Empty values for the resource limits mean that the corresponding resource-limiting option of the `docker run` command is omitted entirely. `<num-stages-to-run>` stages are run, starting with the stage at `<start-stage-index>`.
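Putting these options together, a start command that caps resources and runs only the first two stages of exp-27 could look like this. The limit values are purely illustrative, and we assume they are passed through in the formats `docker run` accepts (e.g. `64g` for memory):

```bash
bash experiments/run/start.sh \
    -n exp-27 \
    -m 64g \
    -c 16 \
    -s 0 \
    -e 2
```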
This project is licensed under the Apache 2.0 License.