---
title: "Running Stan on the GPU with OpenCL"
author: "Rok Češnovar and Jonah Gabry"
output:
  rmarkdown::html_vignette:
params:
  EVAL: !r identical(Sys.getenv("CMDSTANR_OPENCL_TESTS"), "true")
vignette: >
  %\VignetteIndexEntry{Running Stan on the GPU with OpenCL}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

## Introduction

This vignette demonstrates how to use the OpenCL capabilities of CmdStan with
CmdStanR. The functionality described in this vignette requires CmdStan 2.26.1
or newer.
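
If you are unsure which version of CmdStan your CmdStanR installation is using,
one quick way to check from R (a small sketch; it assumes CmdStan is already
installed and registered with CmdStanR) is:

```
library(cmdstanr)

# prints the version of the CmdStan installation CmdStanR is set up to use,
# which should be "2.26.1" or newer for the features described in this vignette
cmdstan_version()
```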

As of version 2.26.1, users can expect speedups with OpenCL when using vectorized
probability distribution functions (functions with the `_lpdf` or `_lpmf`
suffix) and when the input variables contain at least 20,000 elements.

The actual speedup for a model will depend on the particular `lpdf/lpmf`
functions used and whether those functions are the bottlenecks of the model.
The more computationally complex the function is, the larger the expected
speedup. The biggest speedups are expected when using the specialized GLM
functions.

In order to establish the bottlenecks in your model we recommend using
[profiling](../profiling.Rmd), which was introduced in Stan version 2.26.0.
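
As a small sketch of that workflow (the file name `logistic_profiling.stan` and
the data list `data_list` are placeholders, and the model is assumed to wrap
parts of its `model` block in named `profile()` sections), the profiling results
can be read back into R after sampling:

```
library(cmdstanr)

# hypothetical model whose model block uses profile("...") sections
mod_profiling <- cmdstan_model("logistic_profiling.stan")
fit <- mod_profiling$sample(data = data_list)

# list with one data frame per chain, timing each named profile section
fit$profiles()
```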

## OpenCL runtime

OpenCL is supported on most modern CPUs and GPUs. In order to use
OpenCL in CmdStanR, an OpenCL runtime for the target device must be installed.
A guide for the most common devices is available in the CmdStan manual's
[chapter on parallelization](https://mc-stan.org/docs/2_26/cmdstan-guide/parallelization.html#opencl).

## Compiling a model with OpenCL

By default, models in CmdStanR are compiled *without* OpenCL support. Once OpenCL
support is enabled, a CmdStan model will make use of OpenCL if the functions
in the model support it. Technically no changes to a model are required to
support OpenCL since the choice of using OpenCL is handled by the compiler,
but it can still be useful to rewrite a model to be more OpenCL-friendly by
using vectorization as much as possible when using probability distributions.

Consider a simple logistic regression with parameters `alpha` and `beta`,
covariates `X`, and outcome `y`.

```
data {
  int<lower=1> k;
  int<lower=0> n;
  matrix[n, k] X;
  int y[n];
}
parameters {
  vector[k] beta;
  real alpha;
}
model {
  target += std_normal_lpdf(beta);
  target += std_normal_lpdf(alpha);
  target += bernoulli_logit_glm_lpmf(y | X, alpha, beta);
}
```

Below we generate some fake data for this model:

```{r, message=FALSE}
library(cmdstanr)

n <- 200000
k <- 20
X <- matrix(rnorm(n * k), ncol = k)
y <- rbinom(n, size = 1, prob = plogis(3 * X[,1] - 2 * X[,2] + 1))
mdata <- list(k = k, n = n, y = y, X = X)
```

In this model, most of the computation will be handled by the
`bernoulli_logit_glm_lpmf` function, so it should be possible to accelerate it
with OpenCL. Check the CmdStan and Stan Math documentation for the list of
functions that can currently be used with OpenCL support.

To build the model with OpenCL support, add
`cpp_options = list(stan_opencl = TRUE)` at the compilation step.

```{r compile-opencl, message=FALSE, results='hide'}
# Compile the model with STAN_OPENCL=TRUE
mod_cl <- cmdstan_model("opencl-files/bernoulli_logit_glm.stan",
                        cpp_options = list(stan_opencl = TRUE))
```
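
Passing `cpp_options` works on a per-model basis. If you would rather enable
OpenCL for every model compiled with this CmdStan installation, one option
(shown here only as a sketch; rebuilding CmdStan can take several minutes) is to
write the flag into CmdStan's `make/local` file and rebuild:

```
# optional: enable OpenCL for all models built with this CmdStan installation
cmdstan_make_local(cpp_options = list("STAN_OPENCL" = TRUE))
rebuild_cmdstan()
```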

## Running models with OpenCL

Running models with OpenCL requires specifying the OpenCL platform and device
on which to run the model (there can be multiple). If the system has one GPU
and no OpenCL CPU runtime, the platform and device IDs of the GPU are typically
both `0`, but the `clinfo` tool can be used to figure out for sure which devices
are available.
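
If you prefer to stay inside R, the same listing can be produced with a system
call (assuming the `clinfo` utility is installed on the machine):

```
# list the available OpenCL platforms and devices
system("clinfo -l")
```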

On an Ubuntu system with both CPU and GPU OpenCL support, `clinfo -l` outputs:

```
Platform #0: AMD Accelerated Parallel Processing
 `-- Device #0: gfx906+sram-ecc
Platform #1: Intel(R) CPU Runtime for OpenCL(TM) Applications
 `-- Device #0: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
```

On this system the GPU is platform ID 0 and device ID 0, while the CPU is
platform ID 1, device ID 0. These can be specified with the `opencl_ids`
argument when running a model. The `opencl_ids` argument is supplied as a vector
of length 2, where the first element is the platform ID and the second element
is the device ID.

```{r fit-opencl}
fit_cl <- mod_cl$sample(data = mdata, chains = 4, parallel_chains = 4,
                        opencl_ids = c(0, 0))
```

We'll also run a version without OpenCL and compare the run times.

```{r fit-cpu, message=FALSE}
# no OpenCL version
mod <- cmdstan_model("opencl-files/bernoulli_logit_glm.stan")
fit_cpu <- mod$sample(data = mdata, chains = 4, parallel_chains = 4, refresh = 0)
```

The speedup of the OpenCL model is:

```{r time-ratio, message=FALSE}
fit_cpu$time()$total / fit_cl$time()$total
```

This speedup will be determined by the particular GPU/CPU used, the input
problem sizes (data as well as parameters) and whether the model uses functions
that can be run on the GPU or other OpenCL devices.
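
For a more detailed comparison than the ratio of total times, the per-chain
warmup and sampling times of the two fits can also be inspected:

```
# per-chain warmup and sampling times for the OpenCL and CPU-only fits
fit_cl$time()$chains
fit_cpu$time()$chains
```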