-
-
Notifications
You must be signed in to change notification settings - Fork 65
Add check_bfmi function #500
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #500 +/- ##
==========================================
- Coverage 93.11% 91.97% -1.14%
==========================================
Files 12 12
Lines 3237 3214 -23
==========================================
- Hits 3014 2956 -58
- Misses 223 258 +35
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jsocolar! I made a few small suggestions in review comments, but this looks good. Besides those suggestions we just need a test in https://github.com/stan-dev/cmdstanr/blob/master/tests/testthat/test-utils.R.
We also need to decide how to use this function (but that doesn't necessarily have to be decided before merging this). Currently this is implemented in utils.R but then not used anywhere so there won't actually be any warnings thrown. One option is to add this check to all the places we do the divergence and treedepth checks (that probably makes sense). We could also combine the divergence, treedepth and ebfmi checks into a fit$diagnose()
method. Thoughts? Also tagging @rok-cesnovar to get his opinion.
For now, I think we should add it to the two places where other checks are used. In fact I just pushed a commit that does that. But then we need to do the fit$diagnose() yes, but that should be done once this is in. See my comment in the issue: #205 (comment) |
And thanks Jacob for working on this! |
@rok-cesnovar Any idea why tests are failing? Currently I don't think anything in this PR should affect the tests that used to be passing. I can look into it if it's mysterious to you too. |
One cause of failures is the edge case of iter_sampling = 1
I think one issue is var(x) in that case? This is a weird edge case and I am not sure what ebfmi should be in that case.
It will run after sampling in all tests (where validate_csv isnt manually set to FALSE). |
Good point. I don't think we should compute it for 1 iteration. |
Thanks a bunch guys, especially for spotting the iter_sampling = 1 issue (I was mystified about the checks!). I may not have time to take care of this tomorrow (Sunday), but Monday should be doable. |
No rush at all! Thanks! |
Just a heads up that I merged in master and fixed a conflict introduced by a different PR that was merged. |
Co-authored-by: Jonah Gabry <[email protected]>
This now passes checks (again) locally |
I like this one. |
I think it's fine to just not run any of the diagnostics if there are no iterations (we can't diverge or hit max treedepth either in that case). @rok-cesnovar? |
Agreed. No sampler diagnostics => no checks. |
R/utils.R
Outdated
) | ||
if (any(ebfmi < ebfmi_threshold)) { | ||
message(paste0(sum(ebfmi < ebfmi_threshold), " of ", length(ebfmi), " chains had energy-based Bayesian fraction ", | ||
"of missing information (E-BFMI) less than ", ebfmi_threshold, ", which may indicate poor exploration of the ", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we last discussed this @martinmodrak made a suggestion in
https://discourse.mc-stan.org/t/issues-with-e-bfmi-missing-doc-confusing-name-etc/22553/17
that we just use the name E-BFMI and don't bother with the full "energy-based Bayesian fraction of missing information", which we all agree is confusing. I think ultimately Martin is probably right, but for this PR I'm ok with either leaving it as or changing it.
Agree it's mysterious. I have a theory about it that I'm looking into. Will report back shortly. |
Still mysterious. I thought perhaps it was due to the recently released R 4.1.0, but I just installed it and still can't reproduce the error locally. |
The tests should be fixed now. The issues were handling the check when no samples were output and a test that had hard coded order of the sampler diagnostics names (the order has now changed because we retrieve |
I want to work on the Changes:
This is ready for another look. No rush, I know its vacation season. I can work on the fit$diagnose() on top of this branch anyways now that its close to the finish line. |
Sorry for the delay on this. I just pushed a few minor edits and I think this is ready to go. But I'm wondering if we should wait to merge it until we have #505 so that we have a resource for users when they see these warnings. At the moment I think only a tiny subset of users will have any idea what to do when they see an E-BFMI warning. Not saying we should definitely wait, just wondering out loud. |
Definitely fine with waiting. If this becomes the last thing left for 1.0 then we can re-think. |
Going to leave this open for now, but we probably won't need to actually merge this since all of @jsocolar's commits are included in #585. Also, @jsocolar in the branch for PR #585 I'm going to add you to the DESCRIPTION file as a contributor. I realized you're not listed but I think we should list you, both for code contributions (ebfmi , future #565 code, probably other stuff you've done that I don't recall at the moment) and especially for contributing to discussions about the development of the package, which can be just as important as actually contributing code. (If you'd rather not be included let me know and I can undo it.) |
Submission Checklist
Summary
Added function
check_bfmi
, which takes as input the output ofsampler_diagnostics
, computes the estimated Bayesian fraction of missing information (E-BFMI) for each chain, and prints a message if any chain has E-BFMI less than 0.3. Note that this uses the threshold of 0.3, consistent with cmdstan's diagnose.cpp (see line 154), but different from the 0.2 threshold used in rstan'scheck_energy
function (see line 259).Copyright and Licensing
Please list the copyright holder for the work you are submitting
(this will be you or your assignee, such as a university or company):
Jacob B. Socolar
By submitting this pull request, the copyright holder is agreeing to
license the submitted work under the following licenses: