-
Notifications
You must be signed in to change notification settings - Fork 1.2k
1904 Add early stop handler #1921
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
/black |
Signed-off-by: monai-bot <[email protected]>
Signed-off-by: Nic Ma <[email protected]>
Signed-off-by: Nic Ma <[email protected]>
/black |
Signed-off-by: Nic Ma <[email protected]>
/black |
Signed-off-by: Nic Ma <[email protected]>
/black |
/black |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks!
Hi @vfdev-5 , Could you please help check the error message in the CI:
Do you have any experience with it? Thanks in advance. |
Hi @Nic-Ma , yes this looks very strange. Usually, this can happen if an old version is installed, e.g. <0.4 but checking the logs seems like it is using 0.4.4 version. |
Signed-off-by: Nic Ma <[email protected]>
Thanks for your check. Thanks. |
@Nic-Ma the issue is related to A minimal repro could be setup_logger, _ = optional_import("ignite.utils", "0.4.4", exact_version, "setup_logger")
logger = setup_logger("test")
they are rather different. EarlyStopping sets up a logger by default. By the way, do you use ignite's |
strange I couldn't replicate the issue locally: Python 3.7.9 (default, Aug 31 2020, 12:42:55)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from monai.utils import exact_version, optional_import
>>> setup_logger, _ = optional_import("ignite.utils", "0.4.4", exact_version, "setup_logger")
>>> logger = setup_logger("test")
>>> logger.info("hello")
2021-04-02 10:50:54,156 test INFO: hello
>>> import ignite
>>> ignite.__version__
'0.4.4'
>>> |
@wyli thanks for testing it. Yes, from my side I couldn't repro it neither and even init of from monai.utils import exact_version, optional_import
Engine, _ = optional_import("ignite.engine", "0.4.4", exact_version, "Engine")
EarlyStopping, _ = optional_import("ignite.handlers", "0.4.4", exact_version, "EarlyStopping")
engine = Engine(lambda e, b: None)
handler = EarlyStopping(4, lambda e: 0.0, engine) This is also a bit unsurprising as May be related to the machine env on those |
Hi @vfdev-5 , Seems it's a random issue in the tests, should we import Thanks. |
@vfdev-5 , maybe Thanks. |
@Nic-Ma yes, I see that it fails on some of GPU tests and passes on others... Yes, ignite.distributed is intended to simplify things while working with distibuted configuration and GPUs (we also have GPU tests and haven't seen such problems). I wonder if the environment (e.g. |
I couldn't replicate the issue locally using exactly the same configure as |
I'm able to replicate this with some randomness: docker run --gpus all --rm -ti --ipc=host --net=host -v ~/Documents/MONAI:/opt/monai nvcr.io/nvidia/pytorch:20.09-py3
# within the docker container:
python -m pip install --upgrade pip wheel
python -m pip install -r requirements-dev.txt
BUILD_MONAI=0 ./runtests.sh --quick --unittests but I couldn't locate the issue further... some debug info:
|
@wyli i tried as well to repro locally. Results are flaky: on the 3rd run I could have this ignite.distributed issue. Will investigate further... There is also another reproducible issue with warp (if used with
I also remarked this exception (also present in the logs here): Starting test: test_dataloader (tests.test_thread_buffer.TestDataLoader)...
Exception in thread Thread-1327:
Traceback (most recent call last):
File "/opt/conda/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/opt/conda/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/__w/MONAI/MONAI/monai/data/thread_buffer.py", line 45, in enqueue_values
for src_val in self.src:
TypeError: 'super' object is not iterable |
right, |
@wyli I passed some time inspecting what happens and here are my conclusions: If we run The failure is related to MONAI/monai/config/deviceconfig.py Line 71 in d9e68c8
and Line 266 in d9e68c8
Phew, it is not ignite's failure :) Other remarks:
Otherwise, tests order is arbitrary. A fix could be:
- ${cmdPrefix}${cmd} ./tests/runner.py -p "test_((?!integration).)"
+ ${cmdPrefix}${cmd} -u ./tests/runner.py -p "test_((?!integration).)" Thanks |
@vfdev-5 really nice investigations, many thanks for looking into these. cc @Nic-Ma @rijobro (incidentally, I feel it might be helpful to have shuffled test cases if we assume they should work independently.) |
Thanks so much for your debug and investigation!! Thanks in advance. |
Hi @vfdev-5 I couldn't open the above link, could you please update? I'll look into it. Thanks! |
Hi @wyli
The link is on the full log of one of the previously failling jobs. The same thing is present in one of current jobs:
I understand your point on that, however it could be more complicated to reproduce the failure as it has happend here. PS: thanks a lot for the invitation to join github org ! |
* [DLMED] add EarlyStop handler Signed-off-by: Nic Ma <[email protected]> * [MONAI] python code formatting Signed-off-by: monai-bot <[email protected]> * [DLMED] enhance validation handler Signed-off-by: Nic Ma <[email protected]> * [DLMED] add set_trainer support Signed-off-by: Nic Ma <[email protected]> * [DLMED] add more check Signed-off-by: Nic Ma <[email protected]> * [DLMED] fix flake8 issue Signed-off-by: Nic Ma <[email protected]> * [DLMED] update according to comments Signed-off-by: Nic Ma <[email protected]> * [DLMED] fix flake8 issue Signed-off-by: Nic Ma <[email protected]> Co-authored-by: monai-bot <[email protected]> Co-authored-by: Wenqi Li <[email protected]> Signed-off-by: Neha Srivathsa <[email protected]>
Fixes #1904 .
Description
This PR added the
EarlyStopHandler
for workflows.Status
Ready
Types of changes
./runtests.sh -f -u --net --coverage
../runtests.sh --quick --unittests
.make html
command in thedocs/
folder.