Footnotes

... itself.¹

This is a reasonable assumption if we want to extract information from the data or, equivalently, to make predictions based on the dataset. A uniform distribution of the data would be completely non-informative.

... Gaussians ²

I thank D. Saad for pointing this out. However, studying the KL-distance between the original and a slightly perturbed density function shows that the KL-distance relates only to the diagonal elements of the Fisher information matrix; this is in fact an exercise in [14, page 334].

...SchoBurSmo99.³

A dedicated web page containing tutorials on the SVM is available at www.kernel-machines.org.

....⁴

The data in the feature space are assumed to have zero mean. Subtracting the mean would make no conceptual difference; it has been omitted for clarity.
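As an illustration of why the mean can be handled separately, the following is a minimal sketch of the standard way to subtract the feature-space mean using only the kernel matrix. The function name `center_kernel` and the use of NumPy are assumptions made for this example; they are not part of the original text.

```python
import numpy as np

def center_kernel(K):
    """Return the kernel matrix of the mean-subtracted feature vectors.

    Hypothetical helper: K is an (n, n) kernel matrix with
    K[i, j] = <phi(x_i), phi(x_j)>. The result corresponds to the
    features phi(x_i) - mean_k phi(x_k), computed without ever
    forming the features explicitly.
    """
    K = np.asarray(K, dtype=float)
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)  # matrix with every entry 1/n
    return K - one_n @ K - K @ one_n + one_n @ K @ one_n

# Check with a linear kernel, where the features are available explicitly.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
K_centered = center_kernel(X @ X.T)
X_centered = X - X.mean(axis=0)
```

For the linear kernel the result coincides with the Gram matrix of the explicitly centered inputs, which is an easy way to verify the formula.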

... is:⁵

There might be no exact inverse for K_mm. This is solved by adding a "jitter" factor to the diagonal elements of the original kernel matrix, which ensures that it is positive definite.
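A minimal sketch of the jitter idea is given below, assuming NumPy. The function name `add_jitter`, the starting value `1e-8`, and the strategy of growing the jitter tenfold until a Cholesky factorisation succeeds are illustrative assumptions, not the procedure used in the original work.

```python
import numpy as np

def add_jitter(K, jitter=1e-8, max_tries=10):
    """Return (K + jitter * I, jitter) with K + jitter * I positive definite.

    Hypothetical helper: the jitter is multiplied by 10 after each
    failed Cholesky factorisation, up to max_tries attempts.
    """
    K = np.asarray(K, dtype=float)
    eye = np.eye(K.shape[0])
    for _ in range(max_tries):
        K_jit = K + jitter * eye
        try:
            np.linalg.cholesky(K_jit)  # succeeds only for positive definite input
            return K_jit, jitter
        except np.linalg.LinAlgError:
            jitter *= 10.0
    raise np.linalg.LinAlgError("matrix is not positive definite even with jitter")

# Two identical inputs give a singular kernel matrix (all entries equal to 1).
K_singular = np.ones((2, 2))
K_jit, used_jitter = add_jitter(K_singular)
L = np.linalg.cholesky(K_jit)
```

The jitter should stay small relative to the diagonal of the kernel matrix, since it acts like a small amount of additional noise variance.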

... code.⁶

The operations in eq. (130) might involve inversions of almost singular matrices. A possible way to deal with the singular matrices is to introduce the auxiliary matrix $U = P^T \Lambda P$ and to rewrite eq. (130) as:

$\begin{displaymath}\begin{split} {\boldsymbol { \alpha } }= &{\boldsymbol { K }... ...ymbol { K } }^{-1}\right)^{-1} {\boldsymbol { U } } \end{split}\end{displaymath}$

... price.⁷

Available from http://lib.stat.cmu.edu/boston.

...Ripley96 ⁸

Available at http://www.stats.ox.ac.uk/pub/PRNN

...GormanSejnowski88.⁹

Available from http://www.ics.uci.edu/mlearn/MLRepository

... patterns.¹⁰

Available from http://www.kernel-machines.org/data/

... parameters.¹¹

The introduction of $\lambda$, a parameter to be estimated from the data, into the structure of the prior GP makes the estimation inconsistent with the Bayesian framework. This is not a problem in this section, since we are using MAP approximations to the density function.
