Track model logp during sampling #3121


Closed
wants to merge 3 commits

Conversation

eigenfoo
Member

Closes #2971.

This PR adds the model's log-probability (logp) to the tracked variables during sampling. The Stan developers explain why this is desirable.

Currently, the SMC sampler adds the model logp manually, but it appears that none of the other samplers do. I moved the logic into the BaseTrace class, so that we don't need to repeat this code in all our samplers.

Still a WIP, will probably need a lot more work.

@eigenfoo
Member Author

Hm, this caused a lot more tests to fail than I was expecting; I'm actually kind of bewildered. Will take another stab sometime soon.

@junpenglao junpenglao requested a review from aloctavodia July 28, 2018 06:44
@aloctavodia
Member

Hi @eigenfoo thanks for taking the time to work on this.

I am working on a new version of SMC (a pretty big refactoring), so I suggest not spending time on changes to smc.py, since (I hope) that code will change soon. It would probably be a better idea to focus on the diagnostics for the model logp. What do you think @junpenglao?

@junpenglao
Member

In that case, I would suggest not logging the logp during sampling, but computing it post-sampling. This should also break much less code.
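
Post-sampling, that could be as simple as this rough sketch (assuming a PyMC3 model in scope, where model.logp is the compiled joint log-probability function over point dicts, and trace is the MultiTrace returned by pm.sample):

import numpy as np

logp_fn = model.logp  # compiled function: point dict -> joint log-probability
# evaluate the logp at every draw in the (default) chain
model_logp = np.array([logp_fn(trace.point(i)) for i in range(len(trace))])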

@eigenfoo
Member Author

Can I make a different suggestion? I still think it's more elegant to add the model logp as a pm.Deterministic, and keep track of it in the trace during sampling. After all, it's no different from any other deterministic variable, and computing it post-sampling sounds like a quick fix that incurs technical debt.

Instead, could we add some logic to pm.sample? Right before we sample, we check whether the model already includes a model logp variable; if not, we add it. We could even add a compute_model_logp flag (defaulting to True) so that the user can opt out of tracking this variable if necessary.
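
Concretely, something like this minimal sketch (model.logpt is the model's joint log-probability tensor in PyMC3; the toy model and the name 'model_logp' are just illustrative):

import pymc3 as pm

with pm.Model() as model:
    mu = pm.Normal('mu', mu=0., sd=1.)
    obs = pm.Normal('obs', mu=mu, sd=1., observed=[0.1, -0.3, 0.2])
    # track the joint logp like any other deterministic variable
    pm.Deterministic('model_logp', model.logpt)
    trace = pm.sample(1000)

trace['model_logp']  # the model logp at each sample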

@junpenglao @aloctavodia does this sound reasonable to you?

@eigenfoo
Member Author

In other words, something along the lines of the PR above ^

@eigenfoo
Member Author

It looks like it passes most tests, and the failed tests look like easy fixes. I'll invest more time into that PR if you guys think this is a good idea.

@junpenglao
Member

Hmm, if you are going to log it during sampling, treating it as a Deterministic might be suboptimal, as you would be computing the logp twice: once in the sampler for the MCMC update, and once for the Deterministic. If the logp is expensive to compute, the overhead could be significant. If we were to do this properly, it might be better to treat it as a sampler stat and log it there. I imagine compound steps will make this difficult, though. Thoughts?

@eigenfoo
Member Author

eigenfoo commented Jul 30, 2018

@junpenglao hm, I see how this might be inefficient. I agree that adding the model logp to the trace while sampling (as a sampling stat) would be better.

I guess I don't have a good enough understanding of PyMC3 internals to understand:

  1. where sampler stats logging takes place, or what the best way to track model logp is. My intuition says the step_methods directory should have the code, but I can't find anything there. Could you point me to something?

  2. a) why compound steps would make this logging any harder. Is there a good resource for me to familiarize myself with compound step sampling?
    b) Do any other sampler stats become harder to track when using compound steps? If the model logp is really the only sampler stat that compound steps make tricky, I would agree with you that computing it post-sampling is the better choice. Per the zen:

    Special cases aren't special enough to break the rules.
    Although practicality beats purity.

In the meantime, I'll close the previous PR.

@ColCarroll
Member

Late to the discussion, and I think being able to access the logp would be great, but I also want to mention that a few utility functions would be affected by adding a deterministic variable (get_default_varnames, at least), and a bunch of places that don't use that code might start picking up a deterministic variable.

It sounds like there's a good alternative way forward, but I wanted to mention this reason too!

@ColCarroll
Member

A good place to start looking at sampler statistics is how they are implemented in Hamiltonian Monte Carlo. You can then trace that back through

  • the step method base classes to see how stats are emitted if they are supported, then
  • the sampling code which accepts the stats and sends them to the backend if the backend knows how to handle them, and then
  • NDArray, which knows how to handle all the statistics.

It is a bunch of steps, but trying to add a sampler stat to NUTS or HMC will throw errors until it doesn't, and then it will make more sense 😄 .
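
Put together, the moving parts look roughly like this simplified sketch (not the real HMC code; the class, its stub logic, and the wiring via model.fastlogp are illustrative assumptions that mirror how the simpler ArrayStep samplers are set up):

import numpy as np
from pymc3.step_methods.arraystep import ArrayStep

class DummyStep(ArrayStep):
    # advertise that this step emits stats, so the backends allocate storage
    generates_stats = True
    stats_dtypes = [{'model_logp': np.float64}]

    def __init__(self, vars, model):
        # pass the compiled logp so astep receives it as an argument
        super().__init__(vars, [model.fastlogp])

    def astep(self, q0, logp):
        # a real step method would propose and accept/reject here;
        # this stub stays at q0 and just records the logp as a sampler stat
        stats = {'model_logp': logp(q0)}
        return q0, [stats]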

@junpenglao
Member

junpenglao commented Jul 30, 2018

CompoundStep might be challenging because the logp is evaluated multiple times in each Gibbs sweep, but we only want the last one, where the logp is evaluated at the final accepted point (not the intermediate ones).
Maybe something like this, looking at
https://github.com/pymc-devs/pymc3/blob/452d5e2eaa74dcdc8fecec95095c2630d2c44ee1/pymc3/step_methods/compound.py#L21-L34

  1. Add logp to the sampler stats, so that every sampler has a stats method.
  • Maybe remove self.generates_stats, since every sampler would then have stats properties? (We also need to make sure that if a user writes a custom step method without logging the logp, we add it for them.)
  2. At the end of the for-loop, log the final logp:
states = []
for method in self.methods:
    point, state = method.step(point)
    states.extend(state)
# after the loop, keep only the logp evaluated at the final accepted point
states[-1]['model_logp'] = states[-1]['logp']

@eigenfoo
Member Author

I see, thanks for all the help! I'll read through and whip up another PR. In the meantime, I'll close this PR.

I also just realized that computing the model logp post-sampling is similarly inefficient: the logp would be computed for the MCMC update, and then recomputed after sampling finishes. So it looks like tracking it as a sampler stat is the only logical choice here!

Unfortunately I'm a bit tied up this week, but I'll have plenty of time to tinker with this on my upcoming (interminable) flight 😄
