The MONKEY challenge: Machine-learning for Optimal detection of iNflammatory cells in KidnEY transplant biopsies
This repository contains all tutorials and code in connection with the MONKEY challenge, run on Grand Challenge.
The folder `tutorials` contains the code to get started with the MONKEY challenge. The Jupyter notebooks show you how to preprocess the data, train a model, and run inference.
The folder `docker` contains the code to create the inference Docker image that can be submitted to the MONKEY challenge. `test_run.sh` lets you test the Docker image locally. After that, you can save it with `save.sh` and submit it to the MONKEY challenge. If you want to see the log files of your submission, you have to submit to the Debugging Phase of the challenge.
The folder `evaluation` contains the code to evaluate the results of the MONKEY challenge. The exact same script is used for the leaderboard evaluation.
- Put the ground truth json files in the folder `evaluation/ground_truth/` with the file name format `case-id_inflammatory-cells.json`, `case-id_lymphocytes.json` and `case-id_monocytes.json` for the respective cell types. These files are provided along with the `xml` files for the ground truth annotations (how to access the data).
- Put the output of your algorithm in the folder `evaluation/test/` in a separate folder for each case with a subfolder `output`, i.e. `case-id/output/` as folder names. In each of these folders, put the json files with the detection output and the name format `detected-inflammatory-cells.json`, `detected-lymphocytes.json` and `detected-monocytes.json` for the respective cell types (a sketch of the expected file contents follows this list). Additionally, you will need to provide the json file `evaluation/test/output/predictions.json`, which helps to distribute the jobs.
- Run `evaluation.py`. The script will compute the evaluation metrics for each case as well as overall and save them to `evaluation/test/output/metrics.json`. It will create an additional output file `monkey-evaluation-details.json` for the more extensive metrics list (i.e. per slide and to plot the FROC curve).
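For reference, below is a minimal sketch of writing one of the detection files. The field names (`name`, `type`, `points`, `point`, `probability`, `version`) follow the Grand Challenge multiple-points json convention and are assumptions here; check the example output files provided with the challenge (or produced by the inference docker) for the exact schema.

```python
# Hypothetical sketch of writing a detection file such as detected-lymphocytes.json.
# The structure below is an assumption based on the Grand Challenge
# multiple-points format; verify it against the example files of the challenge.
import json

detections = {
    "name": "lymphocytes",
    "type": "Multiple points",
    "points": [
        {
            "name": "Point 0",
            # [x, y, z] coordinates of a detected cell; units/spacing are an
            # assumption here and must match what the evaluation expects.
            "point": [13245.0, 4087.0, 0.5],
            "probability": 0.87,  # prediction probability used for thresholding
        },
    ],
    "version": {"major": 1, "minor": 0},
}

# Path matching the folder structure shown further below (hypothetical case id).
with open("evaluation/test/input/A_P000001/output/detected-lymphocytes.json", "w") as f:
    json.dump(detections, f, indent=4)
```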
The evaluation script will compute the following metrics (a simplified sketch of the definitions follows the list):
- FROC score: This is derived from the FROC curve by calculating the sensitivity at the pre-selected FP/mm² values [10, 20, 50, 100, 200, 300].
- Precision@threshold: The precision at the thresholds of 0.4 and 0.9 (prediction probability of a point).
- Recall@threshold: The recall at the thresholds of 0.4 and 0.9 (prediction probability of a point).
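As an illustration of these definitions only, the sketch below shows how the metrics could be computed for a single case, assuming each detection has already been matched to the ground truth (an `is_true_positive` flag per detection) and that the FROC score is the mean sensitivity at the listed FP/mm² rates. The real `evaluation.py` additionally has to match detected points to annotated cells, which this sketch skips.

```python
# Simplified illustration of the metrics above, NOT the challenge's evaluation code.
# It assumes detections are given as (probability, is_true_positive) pairs and that
# the tissue area in mm^2 is known; point-to-cell matching is not shown here.
import numpy as np


def precision_recall_at(detections, n_ground_truth, threshold):
    """Precision and recall over detections with probability >= threshold."""
    flags = [is_tp for prob, is_tp in detections if prob >= threshold]
    tp = sum(flags)
    precision = tp / len(flags) if flags else 0.0
    recall = tp / n_ground_truth if n_ground_truth else 0.0
    return precision, recall


def froc_score(detections, n_ground_truth, area_mm2,
               fp_rates=(10, 20, 50, 100, 200, 300)):
    """Mean sensitivity at the given false-positives-per-mm^2 rates (assumed definition)."""
    # Sweep the probability threshold by sorting detections from most to least confident.
    detections = sorted(detections, key=lambda d: d[0], reverse=True)
    tps = np.cumsum([is_tp for _, is_tp in detections])
    fps = np.cumsum([not is_tp for _, is_tp in detections])
    sensitivities = tps / max(n_ground_truth, 1)
    fps_per_mm2 = fps / area_mm2
    # Highest sensitivity reached while staying at or below each target FP/mm^2 rate.
    sens_at = [float(sensitivities[fps_per_mm2 <= rate][-1]) if np.any(fps_per_mm2 <= rate) else 0.0
               for rate in fp_rates]
    return float(np.mean(sens_at))


# Example: two detections (one correct) against 3 annotated cells on 2 mm^2 of tissue.
# precision_recall_at([(0.95, True), (0.30, False)], n_ground_truth=3, threshold=0.4)
# froc_score([(0.95, True), (0.30, False)], n_ground_truth=3, area_mm2=2.0)
```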
The examples provided are the three files that are used for the Debugging Phase of the challenge.
Note: The evaluation script, along with the leaderboards, was updated on 23.5.25 to include precision and recall as metrics.
The output is now also saved in two different files: `metrics.json` for the leaderboard metrics and `monkey-evaluation-details.json` for the more extensive metrics list (i.e. per slide and to plot the FROC curve).
The previous evaluation script has been renamed to `evaluation_old.py`.
The expected folder structure for the evaluation is:
```
.
├── ground_truth/
│   ├── A_P000001_inflammatory-cells.json
│   ├── A_P000001_lymphocytes.json
│   ├── A_P000001_monocytes.json
│   └── (...)
└── test/
    ├── input/
    │   ├── A_P000001/
    │   │   └── output/
    │   │       ├── detected-inflammatory-cells.json
    │   │       ├── detected-lymphocytes.json
    │   │       └── detected-monocytes.json
    │   ├── A_P000002/
    │   │   └── (...)
    │   └── (...)
    ├── predictions.json
    └── output/
        ├── metrics.json
        └── monkey-evaluation-details.json
```
The folder `utils` contains other useful functions:
- `json_to_xml.py`: Script that converts the output json files from grand-challenge back to xml files compatible with ASAP. There is also an optional `prob_cutoff` argument that lets you filter out annotations below a probability threshold, which is helpful for visualising your results in ASAP.
- `plot_froc.py`: Script that plots the FROC curve from the `metrics.json` file generated by the evaluation script (this is also available for download on grand-challenge for the validation set cases).
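If you want to plot the FROC curve yourself instead of using `plot_froc.py`, a minimal matplotlib sketch could look like the following. The file layout and key names (`fp_per_mm2`, `sensitivity`, one entry per cell type) are assumptions for illustration; check the json actually produced by `evaluation.py` (or the `plot_froc.py` source) for the real field names.

```python
# Hypothetical FROC plot from the evaluation output; the json structure assumed
# here (a dict of per-cell-type curves with "fp_per_mm2" and "sensitivity" lists)
# is an illustration only and may differ from the actual evaluation output.
import json

import matplotlib.pyplot as plt

with open("evaluation/test/output/monkey-evaluation-details.json") as f:
    details = json.load(f)

for cell_type, curve in details.items():
    plt.plot(curve["fp_per_mm2"], curve["sensitivity"], label=cell_type)

plt.xscale("log")
plt.xlabel("False positives per mm²")
plt.ylabel("Sensitivity")
plt.legend()
plt.title("FROC curve")
plt.savefig("froc_curve.png", dpi=150)
```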