-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Update gaussian processes guide #6693
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
6 commits
Select commit
Hold shift + click to select a range
d3e0063
moved and cleaned up GP guide
daniel-saunders-phil 5a36f03
added GP guide to core_notebooks index
daniel-saunders-phil 6319835
implement oriol's suggestions
daniel-saunders-phil ddb7b66
implement oriol's suggestions
daniel-saunders-phil 50cf1ff
Update docs/source/learn/core_notebooks/Gaussian_Processes.rst
daniel-saunders-phil 23d4c76
Update docs/source/learn/core_notebooks/Gaussian_Processes.rst
daniel-saunders-phil File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
265 changes: 265 additions & 0 deletions
265
docs/source/learn/core_notebooks/Gaussian_Processes.rst
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,265 @@ | ||
.. _gp_guide: | ||
|
||
****************** | ||
Gaussian Processes | ||
****************** | ||
|
||
GP Basics | ||
========= | ||
|
||
Sometimes an unknown parameter or variable in a model is not a scalar value or | ||
a fixed-length vector, but a *function*. A Gaussian process (GP) can be used | ||
as a prior probability distribution whose support is over the space of | ||
continuous functions. A GP prior on the function :math:`f(x)` is usually written, | ||
|
||
.. math:: | ||
|
||
f(x) \sim \mathcal{GP}(m(x), \, k(x, x')) \,. | ||
|
||
The function values are modeled as a draw from a multivariate normal | ||
distribution that is parameterized by the mean function, :math:`m(x)`, and the | ||
covariance function, :math:`k(x, x')`. Gaussian processes are a convenient | ||
choice as priors over functions due to the marginalization and conditioning | ||
properties of the multivariate normal distribution. Usually, the marginal | ||
distribution over :math:`f(x)` is evaluated during the inference step. The | ||
conditional distribution is then used for predicting the function values | ||
:math:`f(x_*)` at new points, :math:`x_*`. | ||
|
||
The joint distribution of :math:`f(x)` and :math:`f(x_*)` is multivariate | ||
normal, | ||
|
||
.. math:: | ||
|
||
\begin{bmatrix} f(x) \\ f(x_*) \\ \end{bmatrix} \sim | ||
\text{N}\left( | ||
\begin{bmatrix} m(x) \\ m(x_*) \\ \end{bmatrix} \,, | ||
\begin{bmatrix} k(x,x') & k(x_*, x) \\ | ||
k(x_*, x) & k(x_*, x_*') \\ \end{bmatrix} | ||
\right) \,. | ||
|
||
Starting from the joint distribution, one obtains the marginal distribution | ||
of :math:`f(x)`, as :math:`\text{N}(m(x),\, k(x, x'))`. The conditional | ||
distribution is | ||
|
||
.. math:: | ||
|
||
f(x_*) \mid f(x) \sim \text{N}\left( k(x_*, x) k(x, x)^{-1} [f(x) - m(x)] + m(x_*) ,\, | ||
k(x_*, x_*) - k(x, x_*) k(x, x)^{-1} k(x, x_*) \right) \,. | ||
|
||
.. note:: | ||
|
||
For more information on GPs, check out the book `Gaussian Processes for | ||
Machine Learning <http://www.gaussianprocess.org/gpml/>`_ by Rasmussen & | ||
Williams, or `this introduction <http://www.inference.org.uk/mackay/gpB.pdf>`_ | ||
by D. Mackay. | ||
|
||
PyMC is a great environment for working with fully Bayesian Gaussian Process | ||
models. GPs in PyMC have a clear syntax and are highly composable, and many | ||
predefined covariance functions (or kernels), mean functions, and several GP | ||
implementations are included. GPs are treated as distributions that can be | ||
used within larger or hierarchical models, not just as standalone regression | ||
models. | ||
|
||
Mean and covariance functions | ||
============================= | ||
|
||
Those who have used the GPy or GPflow Python packages will find the syntax for | ||
construction mean and covariance functions somewhat familiar. When first | ||
instantiated, the mean and covariance functions are parameterized, but not | ||
given their inputs yet. The covariance functions must additionally be provided | ||
with the number of dimensions of the input matrix, and a list that indexes | ||
which of those dimensions they are to operate on. The reason for this design | ||
is so that covariance functions can be constructed that are combinations of | ||
other covariance functions. | ||
|
||
For example, to construct an exponentiated quadratic covariance function that | ||
operates on the second and third column of a three column matrix representing | ||
three predictor variables:: | ||
|
||
ls = [2, 5] # the lengthscales | ||
cov_func = pm.gp.cov.ExpQuad(input_dim=3, ls=ls, active_dims=[1, 2]) | ||
|
||
Here the :code:`ls`, or lengthscale, parameter is two dimensional, allowing the second | ||
and third dimension to have a different lengthscale. The reason we have to | ||
specify :code:`input_dim`, the total number of columns of :code:`X`, and | ||
:code:`active_dims`, which of those columns or dimensions the covariance | ||
function will act on, is because :code:`cov_func` hasn't actually seen the | ||
input data yet. The :code:`active_dims` argument is optional, and defaults to | ||
all columns of the matrix of inputs. | ||
|
||
Covariance functions in PyMC closely follow the algebraic rules for kernels, | ||
which allows users to combine covariance functions into new ones, for example: | ||
|
||
- The sum of two covariance functions is also a covariance function:: | ||
|
||
|
||
cov_func = pm.gp.cov.ExpQuad(...) + pm.gp.cov.ExpQuad(...) | ||
|
||
- The product of two covariance functions is also a covariance function:: | ||
|
||
|
||
cov_func = pm.gp.cov.ExpQuad(...) * pm.gp.cov.Periodic(...) | ||
|
||
- The product (or sum) of a covariance function with a scalar is a | ||
covariance function:: | ||
|
||
|
||
cov_func = eta**2 * pm.gp.cov.Matern32(...) | ||
|
||
|
||
|
||
After the covariance function is defined, it is now a function that is | ||
evaluated by calling :code:`cov_func(x, x)` (or :code:`mean_func(x)`). Since | ||
PyMC is built on top of PyTensor, it is relatively easy to define and experiment | ||
with non-standard covariance and mean functions. For more information check out | ||
the :ref:`tutorial <GP-MeansAndCovs>` on covariance functions. | ||
|
||
|
||
GP Implementations | ||
================== | ||
|
||
PyMC includes several GP implementations, including marginal and latent | ||
variable models and also some fast approximations. Their usage all follows a | ||
similar pattern: First, a GP is instantiated with a mean function and a | ||
covariance function. Then, GP objects can be added together, allowing for | ||
function characteristics to be carefully modeled and separated. Finally, one | ||
of `prior`, `marginal_likelihood` or `conditional` methods is called on the GP | ||
object to actually construct the PyMC random variable that represents the | ||
function prior. | ||
|
||
Using :code:`gp.Latent` for the example, the syntax to first specify the GP | ||
is:: | ||
|
||
gp = pm.gp.Latent(mean_func, cov_func) | ||
|
||
The first argument is the mean function and the second is the covariance | ||
function. We've made the GP object, but we haven't made clear which function | ||
it is to be a prior for, what the inputs are, or what parameters it will be | ||
conditioned on. | ||
|
||
.. note:: | ||
|
||
The :code:`gp.Marginal` class and similar don't have a :code:`prior` method. | ||
Instead they have a :code:`marginal_likelihood` method that is used similarly, | ||
but has additional required arguments, such as the observed data, noise, | ||
or other, depending on the implementation. See the notebooks for examples. | ||
The :code:`conditional` method works similarly. | ||
|
||
Calling the `prior` method will create a PyMC random variable that represents | ||
the latent function :math:`f(x) = \mathbf{f}`:: | ||
|
||
f = gp.prior("f", X) | ||
|
||
:code:`f` is a random variable that can be used within a PyMC model like any | ||
other type of random variable. The first argument is the name of the random | ||
variable representing the function we are placing the prior over. | ||
The second argument is the inputs to the function that the prior is over, | ||
:code:`X`. The inputs are usually known and present in the data, but they can | ||
also be PyMC random variables. If the inputs are an PyTensor tensor or a | ||
PyMC random variable, the :code:`shape` needs to be given. | ||
|
||
Usually at this point, inference is performed on the model. The | ||
:code:`conditional` method creates the conditional, or predictive, | ||
distribution over the latent function at arbitrary :math:`x_*` input points, | ||
:math:`f(x_*)`. To construct the conditional distribution we write:: | ||
|
||
f_star = gp.conditional("f_star", X_star) | ||
|
||
.. _additive_gp: | ||
|
||
Additive GPs | ||
============ | ||
|
||
The GP implementation in PyMC is constructed so that it is easy to define | ||
additive GPs and sample from individual GP components. We can write:: | ||
|
||
gp1 = pm.gp.Marginal(mean_func1, cov_func1) | ||
gp2 = pm.gp.Marginal(mean_func2, cov_func2) | ||
gp3 = gp1 + gp2 | ||
|
||
The GP objects have to have the same type, :code:`gp.Marginal` cannot | ||
be added to :code:`gp.Latent`. | ||
|
||
Consider two independent GP distributed functions, :math:`f_1(x) \sim | ||
\mathcal{GP}\left(m_1(x),\, k_1(x, x')\right)` and :math:`f_2(x) \sim | ||
\mathcal{GP}\left( m_2(x),\, k_2(x, x')\right)`. The joint distribution of | ||
:math:`f_1,\, f_1^*,\, f_2,\, f_2^*,\, f_1 + f_2` and :math:`f_1^* + f_2^*` is | ||
|
||
.. math:: | ||
|
||
\begin{bmatrix} f_1 \\ f_1^* \\ f_2 \\ f_2^* | ||
\\ f_1 + f_2 \\ f_1^* + f_2^* \end{bmatrix} \sim | ||
\text{N}\left( | ||
\begin{bmatrix} m_1 \\ m_1^* \\ m_2 \\ m_2^* \\ | ||
m_1 + m_2 \\ m_1^* + m_2^* \\ \end{bmatrix} \,,\, | ||
\begin{bmatrix} | ||
K_1 & K_1^* & 0 & 0 & K_1 & K_1^* \\ | ||
K_1^{*^T} & K_1^{**} & 0 & 0 & K_1^* & K_1^{**} \\ | ||
0 & 0 & K_2 & K_2^* & K_2 & K_2^{*} \\ | ||
0 & 0 & K_2^{*^T} & K_2^{**} & K_2^{*} & K_2^{**} \\ | ||
K_1 & K_1^{*} & K_2 & K_2^{*} & K_1 + K_2 & K_1^{*} + K_2^{*} \\ | ||
K_1^{*^T} & K_1^{**} & K_2^{*^T} & K_2^{**} & K_1^{*^T}+K_2^{*^T} & K_1^{**}+K_2^{**} | ||
\end{bmatrix} | ||
\right) \,. | ||
|
||
Using the joint distribution to obtain the conditional distribution of :math:`f_1^*` | ||
with the contribution due to :math:`f_1 + f_2` factored out, we get | ||
|
||
.. math:: | ||
f_1^* \mid f_1 + f_2 \sim \text{N}\left( | ||
m_1^* + K_1^{*^T}(K_1 + K_2)^{-1}\left[f_1 + f_2 - m_1 - m_2\right] \,,\, | ||
K_1^{**} - K_1^{*^T}(K_1 + K_2)^{-1}K_1^* \right) \,. | ||
|
||
|
||
These equations show how to break down GP models into individual components to see how each | ||
contributes to the data. For more information, check out `David Duvenaud's PhD | ||
thesis <https://www.cs.toronto.edu/~duvenaud/thesis.pdf>`_. | ||
|
||
The GP objects in PyMC keeps track of these marginals automatically. The | ||
following code sketch shows how to define the conditional distribution of | ||
:math:`f_2^*`. We use `gp.Marginal` in the example, but the same works for | ||
other implementations. The first block fits the GP prior. We denote | ||
:math:`f_1 + f_2` as just :math:`f` for brevity:: | ||
|
||
with pm.Model() as model: | ||
gp1 = pm.gp.Marginal(mean_func1, cov_func1) | ||
gp2 = pm.gp.Marginal(mean_func2, cov_func2) | ||
|
||
# gp represents f1 + f2. | ||
gp = gp1 + gp2 | ||
|
||
f = gp.marginal_likelihood("f", X, y, sigma) | ||
|
||
idata = pm.sample(1000) | ||
|
||
|
||
To construct the conditional distribution of :code:`gp1` or :code:`gp2`, we | ||
also need to include the additional arguments, :code:`X`, :code:`y`, and | ||
:code:`sigma`:: | ||
|
||
with model: | ||
# conditional distributions of f1 and f2 | ||
f1_star = gp1.conditional("f1_star", X_star, | ||
given={"X": X, "y": y, "sigma": sigma, "gp": gp}) | ||
f2_star = gp2.conditional("f2_star", X_star, | ||
given={"X": X, "y": y, "sigma": sigma, "gp": gp}) | ||
|
||
# conditional of f1 + f2, `given` not required | ||
f_star = gp.conditional("f_star", X_star) | ||
|
||
This second block produces the conditional distributions. Notice that extra | ||
arguments are required for conditionals of :math:`f1` and :math:`f2`, but not | ||
:math:`f`. This is because those arguments are cached when | ||
:code:`.marginal_likelihood` is called on :code:`gp`. | ||
|
||
.. note:: | ||
When constructing conditionals, the additional arguments :code:`X`, :code:`y`, | ||
:code:`sigma` and :code:`gp` must be provided as a dict called `given`! | ||
|
||
Since the marginal likelihoood method of :code:`gp1` or :code:`gp2` weren't called, | ||
their conditionals need to be provided with the required inputs. In the same | ||
fashion as the prior, :code:`f_star`, :code:`f1_star` and :code:`f2_star` are random | ||
variables that can now be used like any other random variable in PyMC. | ||
|
||
Check the :ref:`notebooks <gaussian_processes>` | ||
for detailed demonstrations of the usage of GP functionality in PyMC. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -10,4 +10,5 @@ model_comparison | |
posterior_predictive | ||
dimensionality | ||
pymc_pytensor | ||
Gaussian_Processes | ||
::: |
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.