Introduction
Some languages now support Unicode (mostly UTF8) in source code. It would be great if Stan source could use Unicode as well. (Comments in UTF8, or in any other encoding that is a superset of ASCII, already work, in the sense that the parser simply ignores their contents.)
Broadly, there are two possible levels of support:
- in variable and function names (e.g. `ϕ`), and
- in operators (e.g. `≤`), which provide synonyms for existing ones (e.g. `<=`)
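The second level could, in principle, be handled before parsing by rewriting each Unicode operator to its ASCII synonym. A minimal Python sketch of that idea follows. The `OPERATOR_SYNONYMS` table and the `to_ascii_operators` helper are hypothetical. Only the `≤`/`<=` pair comes from this proposal, and the sketch does not treat string literals or block comments specially.

```python
# Hypothetical table of Unicode operators and their ASCII synonyms.
# Only the "≤" -> "<=" pair is taken from the proposal; the rest are assumptions.
OPERATOR_SYNONYMS = {
    "≤": "<=",
    "≥": ">=",
    "≠": "!=",
}


def to_ascii_operators(source: str) -> str:
    """Rewrite Unicode operators to ASCII, leaving `//` line comments untouched."""
    out_lines = []
    for line in source.split("\n"):
        # Everything after the first "//" is a comment and is kept verbatim.
        code, sep, comment = line.partition("//")
        for uni, ascii_op in OPERATOR_SYNONYMS.items():
            code = code.replace(uni, ascii_op)
        out_lines.append(code + sep + comment)
    return "\n".join(out_lines)


print(to_ascii_operators("if (x ≤ y) z = 1; // x ≤ y"))
```

Rewriting before parsing would keep the grammar itself ASCII-only; supporting the operators directly in the lexer is the other obvious option.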
Example
This is how the 8 schools example would look in Unicode:
```stan
data {
  int<lower=0> J;       // number of schools
  real y[J];            // estimated treatment effect (school j)
  real<lower=0> σ[J];   // std err of effect estimate (school j)
}
parameters {
  real μ;
  real θ[J];
  real<lower=0> τ;
}
model {
  θ ~ normal(μ, τ);
  y ~ normal(θ, σ);
}
```
Possible benefits
- more compact source code
- better mapping to equations in papers
Possible downsides
- editor/entry support
- font support
- possibly corrupted files
The first two are mitigated by the fact that ASCII is a subset of UTF8, so using the feature is optional.
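The third item is not covered by that argument. One way a tool might guard against it is to check that a file decodes as UTF8 before parsing it. A minimal Python sketch; `is_valid_utf8` is a hypothetical helper, not part of Stan or any of its interfaces.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


good = "real<lower=0> τ;".encode("utf-8")
# Dropping the last two bytes cuts the two-byte encoding of "τ" in half,
# which imitates a file truncated in the middle of a character.
truncated = good[:-2]

print(is_valid_utf8(good), is_valid_utf8(truncated))
```

A pure-ASCII file always passes this check, so existing programs would be unaffected.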
UTF8 support in various languages which have interfaces for Stan
language | literals | identifiers | operators | would UTF8 variables work for interfacing with Stan? |
---|---|---|---|---|
R | yes | yes | no | yes |
Python | yes | only from version 3 | no | yes, even in Python 2, as they are used as literal keys |
Julia | yes | yes | yes | yes |
Matlab | yes | yes, but needs to be enabled | no | yes |
Stata | yes | yes, from version 14 | no | probably? |
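The Python row rests on the observation that data is handed to Stan as a mapping from variable names to values, so the names are string keys rather than Python identifiers. A sketch under that assumption follows. It uses the usual eight schools values and a JSON round trip as a stand-in for passing the data to an interface; no actual Stan interface is called.

```python
import json

# Data for the Unicode 8 schools example. "σ" is an ordinary string key,
# so it does not need to be a valid identifier in the host language.
schools_data = {
    "J": 8,
    "y": [28, 8, -3, 7, -1, 1, 18, 12],
    "σ": [15, 10, 16, 11, 9, 11, 10, 18],
}

# ensure_ascii=False keeps "σ" as-is instead of escaping it as \u03c3.
encoded = json.dumps(schools_data, ensure_ascii=False).encode("utf-8")
decoded = json.loads(encoded.decode("utf-8"))

print(decoded == schools_data)
```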
Editor support
Emacs
See this list for various UTF8 implementations using autocomplete, company-mode, and quail.