Commit 8608e67: minor edits to opencl vignette
Parent: 4905dd8


vignettes/opencl.Rmd

Lines changed: 40 additions & 40 deletions
@@ -1,5 +1,5 @@
 ---
-title: "Run Stan on the GPU with OpenCL"
+title: "Running Stan on the GPU with OpenCL"
 author: "Rok Češnovar and Jonah Gabry"
 output:
   rmarkdown::html_vignette:
@@ -8,46 +8,50 @@ output:
 params:
   EVAL: !r identical(Sys.getenv("CMDSTANR_OPENCL_TESTS"), "true")
 vignette: >
-  %\VignetteIndexEntry{Run Stan on the GPU with OpenCL}
+  %\VignetteIndexEntry{Running Stan on the GPU with OpenCL}
   %\VignetteEngine{knitr::rmarkdown}
   %\VignetteEncoding{UTF-8}
 ---
 
 ## Introduction
 
-This vignette demonstrates how to use the OpenCL capabilites of CmdStan with
-CmdStanR. The vignette requires CmdStan 2.26.1 or newer.
+This vignette demonstrates how to use the OpenCL capabilities of CmdStan with
+CmdStanR. The functionality described in this vignette requires CmdStan 2.26.1
+or newer.
 
 As of version 2.26.1, users can expect speedups with OpenCL when using vectorized
-probability distribution/mass functions (functions with the `_lpdf` or `_lpmf`
-suffix). You can expect speedups when the input variables contain 20.000 or more elements.
+probability distribution functions (functions with the `_lpdf` or `_lpmf`
+suffix) and when the input variables contain at least 20,000 elements.
 
-The actual speedup for a model will depend on whether the `lpdf/lpmf` functions
-are the bottlenecks of the model and the `lpdf/lpmf` function used.
-The more computationally complex the function is, the larger the expected speedup.
-The biggest speedups are expected when using the GLM functions.
+The actual speedup for a model will depend on the particular `lpdf/lpmf`
+functions used and whether the `lpdf/lpmf` functions are the bottlenecks of the
+model. The more computationally complex the function is, the larger the expected
+speedup. The biggest speedups are expected when using the specialized GLM
+functions.
 
-Use [profiling](../profiling.Rmd) in order to establish the bottlenecks in your model.
+In order to establish the bottlenecks in your model we recommend using
+[profiling](../profiling.Rmd), which was introduced in Stan version 2.26.0.
 
 ## OpenCL runtime
 
 OpenCL is supported on most modern CPUs and GPUs. In order to use
 OpenCL in CmdStanR, an OpenCL runtime for the target device must be installed.
-A guide for the most common devices is available [here](https://mc-stan.org/docs/2_26/cmdstan-guide/parallelization.html#opencl).
+A guide for the most common devices is available in the CmdStan manual's
+[chapter on parallelization](https://mc-stan.org/docs/2_26/cmdstan-guide/parallelization.html#opencl).
 
 ## Compiling a model with OpenCL
 
-By default, models in CmdStanR are compiled without OpenCL support. Once OpenCL
+By default, models in CmdStanR are compiled *without* OpenCL support. Once OpenCL
 support is enabled, a CmdStan model will make use of OpenCL if the functions
 in the model support it. Technically no changes to a model are required to
 support OpenCL since the choice of using OpenCL is handled by the compiler,
 but it can still be useful to rewrite a model to be more OpenCL-friendly by
-using vectorization as much as possible.
+using vectorization as much as possible when using probability distributions.
 
 Consider a simple logistic regression with parameters `alpha` and `beta`,
 covariates `X`, and outcome `y`.
 
-```stan
+```
 data {
   int<lower=1> k;
   int<lower=0> n;
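
As an aside on the profiling recommendation in the hunk above: a minimal sketch of reading profiling results back in CmdStanR, assuming a fitted object `fit` produced from a Stan program whose expensive statements are wrapped in `profile("...")` blocks (the `fit` object and the `"glm"` section name here are hypothetical, not part of the vignette):

```r
library(cmdstanr)

# Hypothetical sketch: `fit` is a CmdStanMCMC object from a Stan program whose
# model block wraps the likelihood in a profile section, e.g.
#   profile("glm") { y ~ bernoulli_logit_glm(X, alpha, beta); }
profs <- fit$profiles()  # list of data frames with per-section timings, one per chain
head(profs[[1]])         # timings (name, total_time, ...) for chain 1
```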
@@ -74,11 +78,8 @@ library(cmdstanr)
 n <- 200000
 k <- 20
 X <- matrix(rnorm(n * k), ncol = k)
-
-y <- 3 * X[,1] - 2 * X[,2] + 1
-p <- runif(n)
-y <- ifelse(p < (1 / (1 + exp(-y))), 1, 0)
-mdata <- list(k = ncol(X), n = nrow(X), y = y, X = X)
+y <- rbinom(n, size = 1, prob = plogis(3 * X[,1] - 2 * X[,2] + 1))
+mdata <- list(k = k, n = n, y = y, X = X)
 ```
 
 In this model, most of the computation will be handled by the
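
A note on the data-simulation change in the hunk above: `plogis()` is the logistic CDF, so drawing `y` with `rbinom()` and `plogis()` is equivalent in distribution to the old `runif()`/`ifelse()` construction. A quick standalone base-R check, not part of the vignette itself:

```r
# plogis(eta) equals 1 / (1 + exp(-eta)), the inverse logit used previously.
eta <- c(-2, 0, 1.5)
all.equal(plogis(eta), 1 / (1 + exp(-eta)))
#> [1] TRUE
```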
@@ -88,56 +89,55 @@ it should be possible to accelerate it with OpenCL. Check
 with OpenCL support.
 
 To build the model with OpenCL support, add
-`cpp_options = list(stan_opencl = TRUE)` to the model compile.
+`cpp_options = list(stan_opencl = TRUE)` at the compilation step.
 
 ```{r compile-opencl, message=FALSE, results='hide'}
 # Compile the model with STAN_OPENCL=TRUE
-model_cl <- cmdstan_model("opencl-files/bernoulli_logit_glm.stan",
-                          cpp_options = list(stan_opencl = TRUE))
+mod_cl <- cmdstan_model("opencl-files/bernoulli_logit_glm.stan",
+                        cpp_options = list(stan_opencl = TRUE))
 ```
 
 ## Running models with OpenCL
 
 Running models with OpenCL requires specifying the OpenCL platform and device
 on which to run the model (there can be multiple). If the system has one GPU
 and no OpenCL CPU runtime, the platform and device IDs of the GPU are typically
-both `0`, but the `clinfo` tool can be used to figure out for sure what devices
+both `0`, but the `clinfo` tool can be used to figure out for sure which devices
 are available.
 
 On an Ubuntu system with both CPU and GPU OpenCL support, `clinfo -l` outputs:
 
-```{bash eval=FALSE}
+```
 Platform #0: AMD Accelerated Parallel Processing
  `-- Device #0: gfx906+sram-ecc
 Platform #1: Intel(R) CPU Runtime for OpenCL(TM) Applications
  `-- Device #0: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
 ```
 
-The GPU is platform ID 0 and device ID 0, while the CPU is platform ID 1,
-device ID 0 . These can be specified with the `opencl_ids` argument
-when running a model. The `opencl_ids` is supplied as a vector of
-length 2, where the first element is the platform ID and the second
-argument is the device ID.
+On this system the GPU is platform ID 0 and device ID 0, while the CPU is
+platform ID 1, device ID 0. These can be specified with the `opencl_ids`
+argument when running a model. The `opencl_ids` is supplied as a vector of
+length 2, where the first element is the platform ID and the second argument is
+the device ID.
 
 ```{r fit-opencl}
-fit_cl <- model_cl$sample(data = mdata, chains = 4, parallel_chains = 4,
-                          opencl_ids = c(0, 0))
+fit_cl <- mod_cl$sample(data = mdata, chains = 4, parallel_chains = 4,
+                        opencl_ids = c(0, 0))
 ```
 
-Lets run a version without OpenCL:
+We'll also run a version without OpenCL and compare the run times.
 
 ```{r fit-cpu, message=FALSE}
 # no OpenCL version
-model <- cmdstan_model("opencl-files/bernoulli_logit_glm.stan")
-fit_cpu <- model$sample(data = mdata, chains = 4, parallel_chains = 4,
-                        refresh = 0)
+mod <- cmdstan_model("opencl-files/bernoulli_logit_glm.stan")
+fit_cpu <- mod$sample(data = mdata, chains = 4, parallel_chains = 4, refresh = 0)
 ```
 
 The speedup of the OpenCL model is:
 ```{r time-ratio, message=FALSE}
-fit_cpu$time()$total/fit_cl$time()$total
+fit_cpu$time()$total / fit_cl$time()$total
 ```
 
-This speedup will be determined by the GPU/CPU used, the input problem
-sizes (data as well as parameters) and if the model uses functions
-that can be run on the GPU or other OpenCL device.
+This speedup will be determined by the particular GPU/CPU used, the input
+problem sizes (data as well as parameters) and if the model uses functions that
+can be run on the GPU or other OpenCL devices.
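
Beyond the total-time ratio at the end of the diff, per-chain warmup and sampling times can also be compared. A small sketch using the `fit_cpu` and `fit_cl` objects from the vignette (the exact columns returned by `$time()$chains` may vary with the cmdstanr version):

```r
# Per-chain breakdown of warmup and sampling time for both fits.
fit_cpu$time()$chains
fit_cl$time()$chains
```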
