In this appendix we deduce the online learning rules for a restricted GP in which only the diagonal elements of the matrix C parametrising the posterior kernel are nonzero, similarly to the parametrisation of the covariances in kernel space proposed by [88]. The simplification is made by including the constraint in the learning rule: we project onto the subspace of GPs whose kernel is specified using only the diagonal elements, i.e.

$$ K_{post}(x,x') \;=\; K_0(x,x') \;+\; \sum_{i} K_0(x,x_i)\, C_{ii}\, K_0(x_i,x') ,$$

and if we use matrix notation and the design matrix $\Phi = [\phi_{1},\ldots,\phi_{t}]$, then we can write the posterior covariance matrix in the feature space specified by the design matrix and the diagonal matrix $C$ as

$$ \Sigma \;=\; I_F \;+\; \Phi\, C\, \Phi^T ,$$

with $I_F$ the identity of the feature space.
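The construction can be illustrated with a small numerical sketch (purely illustrative and not part of the original derivation; the kernel, inputs, and diagonal values below are arbitrary choices):

```python
import numpy as np

# Toy prior kernel (RBF) evaluated on a handful of inputs -- illustrative values only.
X = np.linspace(0.0, 1.0, 5)[:, None]
K0 = np.exp(-0.5 * (X - X.T) ** 2 / 0.1)        # prior Gram matrix K_0(x_i, x_j)

# Diagonal parameter matrix C: only the elements C_ii are retained.
c = np.array([-0.20, -0.05, -0.15, -0.10, -0.25])

# Posterior kernel under the diagonal parametrisation:
# K_post(x_i, x_j) = K_0(x_i, x_j) + sum_k K_0(x_i, x_k) C_kk K_0(x_k, x_j)
K_post = K0 + K0 @ np.diag(c) @ K0
print(K_post)
```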
In the online learning setup the KL-divergence between the true posterior and the projected one is minimised. Differentiating the KL-divergence from eq. (74) with respect to a diagonal element $C_{ii}$ leads to the expression
$$ 0 \;=\; \phi_i^T \Sigma_{t+1}^{-1} \phi_i \;-\; \phi_i^T \Sigma_{post}^{-1} \phi_i $$
$$ \Sigma_{post} \;=\; \Sigma_t \;+\; r^{(t+1)}\, \Sigma_t\, \phi_{t+1} \phi_{t+1}^T\, \Sigma_t $$
with $r^{(t+1)}$ the scalar coefficient obtained using the online learning rule, $\phi_{t+1}$ the feature vector corresponding to the new datum, and $\Sigma_{t+1}$ the covariance of the projected GP, written in the form above with the diagonal matrix $C_{t+1}$.
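The stationarity condition can be checked numerically. The sketch below is a finite-difference check under the assumptions of the reconstruction above: the KL-divergence is taken with the projected, diagonal-$C$ process as its first argument, the means are set to zero (the mean term does not depend on $C$), and all numerical values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
F, n = 7, 4                                    # feature-space dimension, number of inputs
Phi = rng.normal(size=(F, n))                  # columns phi_1, ..., phi_n

A = rng.normal(size=(n, n))
C_post = 0.1 * (A @ A.T)                       # some fixed, full parameter matrix
S_post = np.eye(F) + Phi @ C_post @ Phi.T      # covariance of the unconstrained posterior

def S_diag(c):                                 # covariance with a diagonal C only
    return np.eye(F) + Phi @ np.diag(c) @ Phi.T

def kl(c):                                     # KL( N(0, S_diag(c)) || N(0, S_post) )
    S1, S2 = S_diag(c), S_post
    return 0.5 * (np.trace(np.linalg.solve(S2, S1)) - F
                  + np.linalg.slogdet(S2)[1] - np.linalg.slogdet(S1)[1])

c = np.array([0.3, 0.1, 0.2, 0.4])
i, eps = 2, 1e-6
numeric = (kl(c + eps * np.eye(n)[i]) - kl(c - eps * np.eye(n)[i])) / (2 * eps)
phi_i = Phi[:, i]
analytic = 0.5 * (phi_i @ np.linalg.solve(S_post, phi_i)
                  - phi_i @ np.linalg.solve(S_diag(c), phi_i))
print(numeric, analytic)                       # the two derivatives should agree
```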
We thus have $t+1$ equations for the $t+1$ unknown diagonal elements, but the system is not linear. Substituting the forms of the two covariances and using the matrix inversion lemma leads to a system of equations that involves a projection with respect to the unknown matrix $C_{t+1}$, giving no analytic solution. Using $C_{post}$ from eq. (219), the solution is written
$$ \mathrm{diag}\!\left[\left(K_B^{-1} + C_{t+1}\right)^{-1}\right] \;=\; \mathrm{diag}\!\left[\left(K_B^{-1} + C_{post}\right)^{-1}\right] $$
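A minimal numerical sketch of what solving this equation entails is given below (illustrative only; the kernel matrix, $C_{post}$, and the use of a generic root finder are arbitrary choices, not the procedure of [88]). The diagonal entries of $C_{t+1}$ are obtained iteratively, and every evaluation of the residual requires inverting a full matrix:

```python
import numpy as np
from scipy.optimize import root

rng = np.random.default_rng(1)
n = 5

# Toy kernel (Gram) matrix K_B of the retained inputs and an arbitrary full C_post.
X = np.linspace(0.0, 1.0, n)[:, None]
K_B = np.exp(-0.5 * (X - X.T) ** 2 / 0.1) + 1e-6 * np.eye(n)
A = rng.normal(size=(n, n))
C_post = 0.1 * (A @ A.T)

K_B_inv = np.linalg.inv(K_B)
target = np.diag(np.linalg.inv(K_B_inv + C_post))    # right-hand side of the equation

def residual(c):
    # Each evaluation needs the inverse of a full (t+1) x (t+1) matrix.
    return np.diag(np.linalg.inv(K_B_inv + np.diag(c))) - target

sol = root(residual, x0=np.diag(C_post))             # iterative solution for diag(C_{t+1})
print(sol.success, sol.x)
print(residual(sol.x))                               # approximately zero at the solution
```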
We see that, to obtain the ``simplified'' solution, we still need to invert full matrices, and the posterior covariance is not diagonal either. As a consequence we would be required to perform iterative approximations, as also considered in [88]. From this we conclude that the diagonalisation of the parameter matrix C is not feasible, since it does not introduce any computational benefit, and we believe that keeping the set at a reasonable size is a better alternative than diagonalising a possibly larger set.