In mathematics, a general principle for studying objects is to move from the study of an object itself to the study of relationships between objects. In functional data analysis, the most important tool for studying a single functional data set is functional principal component analysis (FPCA), while the relationship between two functional data sets is usually studied through various types of regression analysis. This post focuses only on FPCA. The central idea of FPCA is dimension reduction by means of a spectral decomposition of the covariance operator, which yields functional principal components, with the corresponding coefficient vectors representing the random curves in the sample.

First of all, let us define FPCA. Suppose we observe functions $X_1(\cdot), X_2(\cdot), \cdots, X_n(\cdot)$. We want to find an orthonormal basis $\phi_1(\cdot), \cdots, \phi_K(\cdot)$ such that

$\sum_{i=1}^n\big\|X_i-\sum_{k=1}^K\langle X_i, \phi_k\rangle\phi_k\big\|^2$

is minimized. Once such a basis is found, we can replace each curve $X_i$ by $\sum_{k=1}^K\langle X_i, \phi_k\rangle\phi_k$ to a good approximation. This means that instead of working with infinite-dimensional curves $X_i$, we can work with the $K$-dimensional vectors $(\langle X_i, \phi_1\rangle, \cdots, \langle X_i, \phi_K\rangle)^\intercal$. The functions $\phi_k$ are collectively called the optimal empirical orthonormal basis, or empirical functional principal components. Note that once we have the functional principal components, we can compute the so-called FPC scores $\langle X_i, \phi_k\rangle$ to approximate the curves.

For FPCA, we usually adopt the so-called "smooth-first-then-estimate" approach: we first pre-process the discrete observations by smoothing to obtain smoothed functional data, and then conduct FPCA using the empirical estimators of the mean and covariance based on the smoothed data.

For the smoothing step, we proceed curve by curve. For each realization, we can use a basis expansion (a polynomial basis is unstable; a Fourier basis is suitable for periodic functions; a B-spline basis is flexible and useful), a smoothing penalty (which leads to smoothing splines by the Smoothing Spline Theorem), or local polynomial smoothing:

• Basis expansion: assuming each realization of the underlying true process has the form $X_i(t)=\sum_{k=1}^Kc_{ik}B_k(t)$, where $\{B_k(\cdot), k=1,2,\cdots, K\}$ are the basis functions, we estimate the coefficients by solving

$\min_{\{c_{ik}\}_{k=1}^K}\sum_{j=1}^{m_i}\big(Y_{ij}-\sum_{k=1}^Kc_{ik}B_k(t_{ij})\big)^2.$

• Smoothing penalties: $\min\sum_{j=1}^{m_i}\big(Y_{ij}-X_i(t_{ij})\big)^2+\lambda J(X_i(\cdot))$, where $J(\cdot)$ measures the roughness of a function.
• Local linear smoothing: for $t$ near $t_0$, assume $X_i(t)\approx X_i(t_0)+X_i'(t_0)(t-t_0)$; then $X_i(t_0)$ is estimated by the $\hat{a}$ solving

$\min_{a,b}\sum_{j=1}^{m_i}\big(Y_{ij}-a-b(t_{ij}-t_0)\big)^2K_h(t_{ij}-t_0).$
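As a concrete illustration of the basis expansion option, here is a minimal Python sketch for smoothing one curve. The B-spline construction via scipy, the knot placement, and the simulated noisy observations are all illustrative assumptions, not prescribed by the text:

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_design(t, n_basis, degree=3, domain=(0.0, 1.0)):
    """Design matrix whose column k is B_k evaluated at the points t."""
    n_interior = n_basis - degree - 1
    interior = np.linspace(*domain, n_interior + 2)[1:-1]
    knots = np.concatenate([[domain[0]] * (degree + 1), interior,
                            [domain[1]] * (degree + 1)])
    eye = np.eye(n_basis)
    return np.column_stack([BSpline(knots, eye[k], degree)(t)
                            for k in range(n_basis)])

rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 1.0, 60))                      # points t_{ij} for one curve
y = np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal(60)   # noisy observations Y_{ij}
B = bspline_design(t, n_basis=8)
# least squares for the coefficients c_{ik}, as in the minimization above
c_hat, *_ = np.linalg.lstsq(B, y, rcond=None)
x_smooth = B @ c_hat                                        # smoothed values X_i(t_{ij})
```

The same least-squares structure applies to any basis; only the design matrix changes.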

Once we have the smoothed functional data, denoted $X_i(t), i=1,2,\cdots,n$, the empirical estimators of the mean and covariance are

$\bar{X}(t)=\frac{1}{n}\sum_{i=1}^nX_i(t), \hat{C}(s,t)=\frac{1}{n}\sum_{i=1}^n\big(X_i(t)-\bar{X}(t)\big)\big(X_i(s)-\bar{X}(s)\big).$
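On a common dense grid these estimators are simple averages; a minimal numpy sketch (the grid and the synthetic curves are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 100, 50
t = np.linspace(0.0, 1.0, N)
# synthetic smoothed curves X_i evaluated on the grid: random amplitudes
# times a fixed shape (purely illustrative)
X = rng.standard_normal((n, 1)) * np.sin(np.pi * t)   # shape (n, N)

X_bar = X.mean(axis=0)          # \bar{X}(t) on the grid
Xc = X - X_bar                  # centered curves X_i - \bar{X}
C_hat = Xc.T @ Xc / n           # \hat{C}(s, t) evaluated on the grid, (N, N)
```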

The empirical functional principal components are then the eigenfunctions of the above sample covariance operator $\hat{C}$ (for a proof, see page 39 of the book "Inference for Functional Data with Applications"). Note that this estimation procedure for the mean and covariance functions requires fairly dense functional data, since otherwise the smoothing step is unstable. Hence other estimators have been proposed, such as the local linear estimators of the mean and covariance functions proposed by Professor Yehua Li from ISU, which have the advantage of covering all types of functional data: sparse (i.e. longitudinal), dense, or in-between. The remaining question is how to conduct FPCA based on $\hat{C}(s,t)$ in practice. It is in fact the following classic mathematical problem:

$\int \hat{C}(s,t)\xi(s)ds=\lambda\xi(t), \text{ i.e. } \hat{C}\xi=\lambda\xi,$

where $\hat{C}$ is the integral operator with symmetric kernel $\hat{C}(s,t)$. Computing the eigenvalues and eigenfunctions of an integral operator with a symmetric kernel is a well-studied problem in applied mathematics, so one can refer to standard numerical methods to solve it.
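One simple numerical approach is to discretize the kernel on a grid and solve the resulting matrix eigenproblem, with the grid spacing $h$ standing in for the integral. A minimal sketch, using illustrative rank-one simulated data so the first eigenfunction is known in closed form:

```python
import numpy as np

rng = np.random.default_rng(2)
n, N = 200, 101
t, h = np.linspace(0.0, 1.0, N, retstep=True)
# every curve is a random multiple of sqrt(2) sin(pi t), so the first
# eigenfunction of the covariance operator is sqrt(2) sin(pi t)
scores = rng.standard_normal(n)
X = np.outer(scores, np.sqrt(2.0) * np.sin(np.pi * t))

Xc = X - X.mean(axis=0)
C = Xc.T @ Xc / n
# \int C(s,t) xi(t) dt is approximated by h * C @ xi, so the operator
# eigenproblem becomes the matrix eigenproblem (h * C) v = lambda v
evals, evecs = np.linalg.eigh(h * C)
lam = evals[::-1]                        # eigenvalues, largest first
phi = evecs[:, ::-1].T / np.sqrt(h)      # rows orthonormal in the L^2 sense
```

Dividing the eigenvectors by $\sqrt{h}$ converts unit Euclidean norm into (approximate) unit $L^2$ norm, so the discrete eigenfunctions satisfy $h\sum_j \phi(t_j)^2 \approx 1$.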

However, the two methods commonly used in statistics are described in Section 8.4 of the fundamental functional data analysis book by Professors J. O. Ramsay and B. W. Silverman. One is the discretizing method and the other is the basis function method. For the discretizing method, we simply discretize the smoothed functions $X_i(t)$ on a fine grid of $N$ equally spaced values spanning the interval, apply traditional PCA, and then interpolate to points not on the selected grid. For the basis function method, we illustrate it assuming the mean function is 0:

1. Basis expansion: $X_i(t)=\sum_{k=1}^Ka_{ik}\phi_k(t)$, and then $X=(X_1,X_2,\cdots, X_n)^\intercal=A\phi$, where $A=((a_{ik}))\in{R}^{n\times K}$ and $\phi=(\phi_1,\phi_2,\cdots,\phi_K)^\intercal\in{R}^{K}$;
2. Covariance function: $\hat{C}(s,t)=\frac{1}{n}X^\intercal(s) X(t)=\frac{1}{n}\phi(s)^\intercal A^\intercal A\phi(t)$;
3. Eigenfunction expansion: assume the eigenfunction $\xi(t)=\sum_{k=1}^Kb_k\phi_k(t)$;
4. Problem Simplification: The above basis expansions lead to

$\big(\hat{C}\xi\big)(s)=\int\frac{1}{n}\phi(s)^\intercal A^\intercal A\phi(t)\xi(t)dt= \frac{1}{n}\phi(s)^\intercal A^\intercal A\int\phi(t)\xi(t)dt$

$=\frac{1}{n}\phi(s)^\intercal A^\intercal A\sum_{k=1}^Kb_k\int\phi(t)\phi_k(t)dt=\frac{1}{n}\phi(s)^\intercal A^\intercal AWb$,

where $W=\int\phi(t)\phi^\intercal(t)dt\in{R}^{K\times K}$ and $b=(b_1, b_2, \cdots, b_K)^\intercal\in{R}^{K}$. Hence the eigen problem boils down to $\frac{1}{n}\phi(s)^\intercal A^\intercal AWb=\lambda\phi(s)^\intercal b, \forall s$, which is equivalent to

$\frac{1}{n}A^\intercal AWb=\lambda b.$

Note that the requirement that the eigenfunctions be orthonormal is equivalent to $b_i^\intercal W b_i=1$ and $b_i^\intercal W b_j=0$ for $i\neq j$. Let $u=W^{1/2}b$; then the above problem becomes

$n^{-1}W^{1/2}A^\intercal AW^{1/2}W^{1/2}b=\lambda W^{1/2}b, \text{ i.e. } n^{-1}W^{1/2}A^\intercal AW^{1/2}u=\lambda u,$

which is a traditional eigenproblem for the symmetric matrix $n^{-1}W^{1/2}A^\intercal AW^{1/2}$.

Two special cases deserve particular attention. One is an orthonormal basis, which leads to $W=I$. The other is taking the smoothed functional data $X_i(t)$ themselves as the basis functions, which leads to $A=I$.
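For the first special case ($W=I$), the problem reduces to the eigendecomposition of $n^{-1}A^\intercal A$. A minimal sketch, where the coefficient matrix $A$ is simulated purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
n, K = 300, 5
# coefficients a_{ik} with decreasing variances, so in the population the
# eigenfunctions are the basis functions themselves (illustrative setup)
sd = np.array([2.0, 1.0, 0.5, 0.25, 0.1])
A = rng.standard_normal((n, K)) * sd
A -= A.mean(axis=0)              # mean function assumed 0 after centering

# with an orthonormal basis W = I, so W^{1/2} = I and u = b
M = A.T @ A / n                  # n^{-1} W^{1/2} A^T A W^{1/2}
evals, U = np.linalg.eigh(M)
order = np.argsort(evals)[::-1]
lam = evals[order]               # eigenvalues, largest first
Bcoef = U[:, order]              # column k holds b for the k-th eigenfunction
```

Each eigenfunction is then $\xi_k(t)=\sum_j b_{jk}\phi_j(t)$; for general $W$, replace $M$ with $n^{-1}W^{1/2}A^\intercal AW^{1/2}$ and recover $b=W^{-1/2}u$.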

Note that the empirical functional principal components are proved to be the eigenfunctions of the sample covariance operator. This fact connects FPCA with the so-called Karhunen–Loève (KL) expansion:

$X_i(t)=\mu(t)+\sum_{k}\xi_{ik}\phi_k(t), \quad Cov(X_i(t), X_i(s))=\sum_k\lambda_k\phi_k(s)\phi_k(t),$

where the $\xi_{ik}$ are uncorrelated random variables with mean 0 and variance $\lambda_k$, with $\sum_k\lambda_k<\infty$ and $\lambda_1\geq\lambda_2\geq\cdots$. For simplicity we assume $\mu(t)=0$. Then the connection between the KL expansion and FPCA is clear: $\{\phi_k(\cdot)\}_{k}$ is the sequence of orthonormal basis functions, and $\{\langle X_i, \phi_k\rangle=\xi_{ik}, i=1, 2, \cdots, n, k=1, 2, \cdots\}$ are the FPC scores.

So far we have only discussed how to obtain the empirical functional principal components, i.e. the eigenfunctions/orthonormal basis functions. But to represent the functional data, we also need the coefficients, namely the FPC scores $\{\langle X_i, \phi_k\rangle=\xi_{ik}, i=1, 2, \cdots, n, k=1, 2, \cdots\}$. The simplest way to obtain them is numerical integration:

$\hat\xi_{ik}=\int X_i(t)\hat\phi_k(t)dt.$
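On a dense common grid this integral can be approximated by a simple Riemann sum. A minimal sketch, where the estimated eigenfunction and the curves are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
n, N = 50, 201
t, h = np.linspace(0.0, 1.0, N, retstep=True)
phi1 = np.sqrt(2.0) * np.sin(np.pi * t)   # a hypothetical estimated eigenfunction
xi = rng.standard_normal(n)               # true first scores, for checking
X = np.outer(xi, phi1)                    # centered smoothed curves on the grid

# \hat\xi_{i1} = \int X_i(t) \hat\phi_1(t) dt, approximated by a Riemann sum
xi_hat = (X * phi1).sum(axis=1) * h
```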

Note that estimating the FPC scores via numerical integration requires the smoothed functional data $X_i(t)$, so if we only have sparsely observed functional data, this method will not provide reasonable approximations. Professor Fang Yao et al. proposed the so-called PACE (principal component analysis through conditional expectation) method to deal with such longitudinal data.