Fix bad input and deployment container crash error in notebook tests #3609
base: main
Conversation
The image-object-detection test still fails later, at the deployment step; that will be handled in a separate PR. As far as the bad-input failures are concerned, this change works.
Please merge once the gates are successful.
@@ -85,6 +85,7 @@ jobs:
source "${{ github.workspace }}/infra/bootstrapping/init_environment.sh";
bash "${{ github.workspace }}/infra/bootstrapping/sdk_helpers.sh" generate_workspace_config "../../.azureml/config.json";
bash "${{ github.workspace }}/infra/bootstrapping/sdk_helpers.sh" replace_template_values "automl-image-object-detection-task-fridge-items.ipynb";
sed -i 's/max_trials=2/max_trials=3/g' automl-image-object-detection-task-fridge-items.ipynb
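For reference, here is a minimal Python sketch of what the `sed` substitution above does to the notebook text. The helper name `bump_max_trials` and the sample cell are hypothetical, used only for illustration; the workflow itself uses `sed`, not this code.

```python
def bump_max_trials(notebook_text: str) -> str:
    """Mirror sed 's/max_trials=2/max_trials=3/g': replace every literal occurrence."""
    return notebook_text.replace("max_trials=2", "max_trials=3")


# Hypothetical sample: the set_limits call as it appears in the notebook source.
cell = (
    "image_object_detection_job.set_limits(\n"
    "    max_trials=2,\n"
    "    max_concurrent_trials=2,\n"
    ")\n"
)

patched = bump_max_trials(cell)
print(patched)
```

Because the match is a literal string, `max_concurrent_trials=2` is left untouched: it does not contain the contiguous text `max_trials=2`.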
Why does this fail with max_trials=2 but work with max_trials=3?
The documentation describes max_trials as follows:
Parameter for maximum number of configurations to sweep. Must be an integer between 1 and 1000. When exploring just the default hyperparameters for a given model algorithm, set this parameter to 1. Default value is 1.
I would like to understand why 2 is rejected.
Description
This PR fixes a couple of issues.
The image-object-detection notebook test run is failing because of bad input.
image_object_detection_job = automl.image_object_detection(
    compute=compute_name,
    experiment_name=exp_name,
    training_data=my_training_data_input,
    validation_data=my_validation_data_input,
    target_column_name="label",
    primary_metric="mean_average_precision",
    tags={"my_custom_tag": "My custom value"},
)
image_object_detection_job.set_limits(
    max_trials=2,
    max_concurrent_trials=2,
)
Issue: the sweep pipeline job runs with max_trials=2, which throws "Error: Input request is invalid", as can be seen in the job.
Updating max_trials to 3 fixes it.
Job
Failed build
The deployment container crash comes from the following error in the deployment logs:
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
Job
Fix: use a GPU machine for the deployment.
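For context, the error message itself points at an alternative, which this PR does not take: loading the checkpoint with `map_location` so it also works on a CPU-only machine. Below is a minimal sketch of that pattern, assuming PyTorch is installed. The in-memory checkpoint is a stand-in for illustration, not the model the deployment actually loads.

```python
import io

import torch

# Pick the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in for a real checkpoint: a tiny state dict saved to an in-memory buffer.
buffer = io.BytesIO()
torch.save({"weight": torch.ones(2, 2)}, buffer)
buffer.seek(0)

# map_location remaps saved storages (including CUDA ones) onto `device` at load time.
state = torch.load(buffer, map_location=device)
print(state["weight"].device)
```

Whether this is viable depends on the scoring script, which is not shown here; switching the deployment to a GPU machine avoids touching the loading code.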
Checklist