Skip to content

Possible inefficency of $unconstrain_variables() #920

Closed
@crsh

Description

@crsh

For context, I have implemented methods to support CmdStanFit-objects in bridgesampling::bridge_sampler(). While the new methods work as expected, they are very slow compared to the stanfit-methods for rstan. On my 2017 MacBook Pro, My example takes about 1.000 ms with rstan::unconstrain_pars() but almost 20.000 ms with fit$unconstrain_pars():

library("cmdstanr")
library("rstan")
library("microbenchmark")

set.seed(12345)

mu <- 0
tau2 <- 0.5
sigma2 <- 1

n <- 20
theta <- rnorm(n, mu, sqrt(tau2))
y <- rnorm(n, theta, sqrt(sigma2))

### set prior parameters ###
mu0 <- 0
tau20 <- 1
alpha <- 1
beta <- 1

# models
stan_code_H0 <- 'data {
int<lower=1> n; // number of observations
vector[n] y; // observations
real<lower=0> alpha;
real<lower=0> beta;
real<lower=0> sigma2;
}
parameters {
real<lower=0> tau2; // group-level variance
vector[n] theta; // participant effects
}
model {
target += inv_gamma_lpdf(tau2 | alpha, beta);
target += normal_lpdf(theta | 0, sqrt(tau2));
target += normal_lpdf(y | theta, sqrt(sigma2));
}
'

# RStan
rstan_model_H0 <- suppressWarnings(
  stan_model(model_code = stan_code_H0, model_name="stanmodel")
)

rstan_object_H0 <- sampling(
  rstan_model_H0, data = list(y = y, n = n, alpha = alpha, beta = beta, sigma2 = sigma2),
  iter = 3500, warmup = 500, chains = 4, show_messages = FALSE, refresh = 0
)

# CmdStanR
cmd_stan_model_H0 <- write_stan_file(stan_code_H0) |>
  cmdstan_model(force_recompile = TRUE)

cmdstan_fit_H0 <- cmd_stan_model_H0$sample(
  data = list(y = y, n = n, alpha = alpha, beta = beta, sigma2 = sigma2)
  , iter_sampling = 3500, iter_warmup = 500, chains = 4, show_messages = FALSE, refresh = 0
)

cmdstan_fit_H0$init_model_methods()

# Compare unconstraining
microbenchmark(
  upars <- cmdstan_fit_H0$unconstrain_draws(draws = cmdstan_fit_H0$draws(format = "draws_df"))
  , {
    pars <- rstan::extract(rstan_object_H0, permute = FALSE)
    skeleton <- rstan:::create_skeleton(rstan_object_H0@model_pars, rstan_object_H0@par_dims)
    upars <- apply(pars, 1:2, FUN = function(theta) {
      unconstrain_pars(rstan_object_H0, rstan:::rstan_relist(theta, skeleton))
    })
  }
  , times = 3
)
        min         lq       mean    median        uq       max neval cld
 19371.8549 20333.6410 20795.4397 21295.427 21507.232 21719.037     3   a 
   806.4729   913.1353   998.0147  1019.798  1093.786  1167.774     3   b

Note that this does not include the additional compilation time required by $init_model_methods().

I have spent quite some time profiling this issue and from what I can tell, it is this line that is responsible for most of the difference:

private$model_methods_env_$unconstrain_variables(private$model_methods_env_$model_ptr_, stan_pars)

I know next to nothing about C++, so I was wondering whether this is an issue with the implementation of $unconstrain_variables(), whether there is a problem with my setup, or whether I'm possibly making a mistake somewhere. Any pointers would be appreciated.

CmdStanR version number

packageVersion("cmdstanr")                                       
[1] ‘0.7.1’

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions