[FDA Book] Chapter 8.3 ~ 8.5

8.3 Visualizing the results

8.3.1 Plotting components as perturbations of the mean (mean의 작은 변동 요소를 plotting)

Overall mean function과 PC function의 적절한 배수를 더하거나 나눈 function을 plotting해서 확인하는 것이 유용한 방법이다. (Figure 8.2)
위의 plot을 그릴 때, 어떤 PC function의 배수를 사용해야 할까?
Define a constant $C$ to be the root-mean-square difference between $\hat{\mu}$ and overall time average

$$ C^2 = \frac{1}{T}\lvert\lvert\hat{\mu}-\bar{\mu}\lvert\lvert^2 $$

where $\bar{\mu}=\frac{1}{T}\int \hat{\mu}(t)dt$
plot $\hat{\mu}$ and $\hat{\mu} \pm 0.2C\hat{\gamma}_j,$ where constant $0.2$는 결과 해석을 쉽게 하기 위해 정한 값
various modes of variability를 쉽게 비교하기 위해서는 PC functions에 같은 constant를 사용하는 것이 좋다.

PCA의 중요한 특징은 component별로 각 curve의 score $f_{im}$을 평가하는 것
Figure 8.4

8.2에서 the weight function $\xi_m$을 $PCASSE$를 minimize하는 $K$개의 orthonormal functions을 expansion하여 표현할 수 있음
Good orthonormal set(rotated orthonormal basis function) is defined by

$$ \psi = \mathbf{T}\xi $$

where $\mathbf{T}$ : any orthonormal matrix of order $K$ ($\mathbf{T’T=TT'=I}_{K \times K}$),

$\xi$ : the vector of function $(\xi_1,…,\xi_k)'$
기하학적으로, the vector of functions $\psi$는 rigid(엄격한) rotation of $\xi$
즉, rotation 후에는, $\psi_1$이 the largest component of variation으로 보기 어렵다.
하지만 중요한 점은 the orthonormal basis functions $\psi_1, …, \psi_K$가 unrotated된 funtions보다 $K$-dim original curve에 더 잘 근사하는 점에서 유용하다.
interprete하기 쉬운 some rotated runctions 찾을 수 있을까?

=> $VARIMAX$ rotation
Let $\mathbf{B}$ : $K \times n$ matrix representing the first $K$ PC functions $\xi_1,…,\xi_K$,

지금은 $\mathbf{B}_{m \times n}$의 $m$th row가 구간 $\boldsymbol{\tau}$에서 $n$개의 equally spaced argument values, $\xi_m(t_1),…,\xi_m(t_n)$를 가진다고 가정

values of the rotated basis functions $\psi=\mathbf{T}\xi$의 대응되는 matrix $\mathbf{A}$ will be given by

$$ \mathbf{A}=\mathbf{TB} $$
orthonormal rotation matrix $\mathbf T$를 고르기 위한 $VARIMAX$ strategy는 the values $a_{mj}^2$를 single vector로 늘어뜨린 것의 variation을 maximize하는 것이다.

$\mathbf T$가 rotation matrix이기 때문에, 어떤 rotation을 하더라도 the overall sum of squared values(sum of PC’s variance)는 변하지 않는다.

$$ \sum_m\sum_j a_{mj}^2 = trace(\mathbf{A’A})=trace(\mathbf{B’T’TB})=trace(\mathbf{B’B}) $$
그러므로, $Var(a_{mj}^2)$를 maximize하는 것은 이 값이 상대적으로 크거나 0에 가까울 때 일어나는 경향이 있다. (즉, $a_{mj}$는 strongly positive 또는 near zero 또는 strongly negative)
$VARIMAX$ criterion을 maximize하는 $\mathbf T$를 계산하는 빠르고 안정적인 computational techniques 존재

=> $C$ function for computing the $VARIMAX$ rotation - website
$VARIMAX$ rotation은 orthonormality를 유지하면서 $\psi_m$의 midium-sized values를 억제한다.

=> rotated PC scores는 더이상 uncorrelated가 아니지만 sum of variance는 같다.
$VARIMAX$의 idea를 사용하는 또 다른 방법은 $\mathbf B$가 $n$개의 basis function $\phi$의 측면에서 $\xi_m$을 expansion하는 coefficients를 포함하도록 정의하는 것이다.

=> $\xi_m$ 값을 rotating하는 것 대신, 각 $\xi_m$의 basis expansion coefficients를 rotate

8.4.1 Discretizing the functions

$x_i \rightarrow s_j,$ where $s_j$ is $n$ equally spaced values that spans the interval $\boldsymbol \tau$

=> 변환된 $N \times n$ matrix $\mathbf X$로 standard multivariate PCA를 할 수 있음

$$ \mathbf{Vu=\lambda u} $$
If $n \gg N$ (# variables $\gg$ sample size),
- $n \times n$ matrix $\mathbf V$ 대신에 $\mathbf X$의 $SVD$인 $\mathbf{UDW'}$를 사용
- variance matrix는 $N\mathbf{V=WD^2W'}$를 만족 ($\because \mathbf V=\frac{1}{N}\mathbf{X’X}$)
  
  => $\mathbf V$의 nonzero eigen values는 $\mathbf X$의 singular values의 제곱값과 같고, 대응되는 eigenvectors는 $\mathbf U$의 columns이다.

Functional case

Let
- sample var-cov matrix $\mathbf V=\frac{1}{N}\mathbf{X’X}$ have elements $v(s_j,s_k)$ where $v(s,t)$ is the sample covariance function.
- $\tilde{\boldsymbol \xi}$ : $n$-vector of values $\xi(s_j)$ => $n \times 1$ vector (after discritization)
- $w=\frac{T}{n}$ where $T$ is the length of the interval $\boldsymbol \tau$ (discretization된 points의 간격)
For each $s_j,$

$$ V\xi(s_j)=\int v(s_j,s)\xi(s)ds \approx w\sum v(s_j,s_k) \tilde{\boldsymbol \xi_k}, $$

=> functional eigen equation $V\xi=\rho\xi$ has the approximte discrete form

$$ w\mathbf V \tilde{\boldsymbol \xi}=\rho\tilde{\boldsymbol \xi} $$
위의 eigen equation은 $(8.15)$의 solution과 대응됨
- eigen values $\rho=w\lambda$
- $\int \xi(s)^2ds=1 \Longleftrightarrow w \lVert \tilde{\boldsymbol \xi} \rVert^2=1$ => $\tilde{\boldsymbol \xi}=w^{-1/2}\mathbf u$ if $\mathbf u$ is a normalized eigen vector of $\mathbf V$
Discrete values $\tilde{\boldsymbol \xi}$로부터 the eigen function $\xi$의 approximate을 구하기 위해서는 interpolation을 해야함
- 만약 the discretization values $s_j$의 간격이 좁다면(closely spaced), interpolation method 선택이 큰 효과를 주지 못한다.(어떤 방법을 쓰더라도 비슷비슷하다)

Eigen equation을 discrete or matrix form으로 줄이는 방법은 각 function $x_i$를 알려진 basis functions $\phi_k$의 linear combination으로 표현하는 것이다.
# of basis functions $K$ 결정을 위한 고려사항
- original data의 discrete sampling points의 개수 $n$을 어떻게 결정할지?
- $K<n$ 일 때, 어느 정도의 smoothing level을 사용할지?
- basis function이 original function(curve)을 estimate할 때, 얼마나 efficient하고 powerful한지?

Computation

Each function has basis expansion

$$ x_i(t)=\sum_{k=1}^K c_{ik}\phi_k(t) $$
Matrix version (the simultaneous expansion of all $N$ curves)

$$ \mathbf x = \mathbf {C\phi} $$

where the vector-valued function $\mathbf x=(x_1,…,x_N)'$

and the vector-valued function $\mathbf \phi=(\phi_1,…,\phi_K)'$

and the coefficient matrix $\mathbf C$ is $N \times K$ matrix
Variance-covariance function (matrix form; $N^{-1}X’X$ 형태)

$$ v(s,t)=\frac{1}{N}\boldsymbol \phi(s)'\mathbf{C’C} \boldsymbol\phi(t) $$

Define the order $K$ symmetric matrix $W$ to have entries(성분)

$$ w_{k_1,k_2}=\int \phi_{k_1}\phi_{k_2} $$

or

$$ \mathbf W=\int \boldsymbol{\phi\phi'} \ \text{(matrix version)} $$

The eigen function $\xi$ for the eigen equation has an expansion

$$ \xi(s)=\sum_{k=1}^Kb_k\phi_k(s) $$

or

$$ \boldsymbol \xi(s) = \boldsymbol \phi(s)'\mathbf b \ \text{(matrix version)} $$

This yields

$$ \begin{align} \int v(s,t)\xi(t)dt &= \int \frac{1}{N}\boldsymbol \phi(s)'\mathbf{C’C} \boldsymbol\phi(t)\boldsymbol \phi(t)'\mathbf b dt \\ &= \boldsymbol \phi(s)‘N^{-1}\mathbf{C’CWb} \end{align} $$

Therefore the eigen equation $\int v(s,t)\xi(t)dt=\rho \xi(s)$ can be expressed as

$$ \boldsymbol \phi(s)‘N^{-1}\mathbf{C’CWb} = \rho\phi(s)'\mathbf b $$

Since this equation must hold for all $s$, this implies the purely matrix equation

$$ N^{-1}\mathbf{C’CWb} = \rho \mathbf b $$

But $ \lVert \xi \rVert=1$ 을 통해 다음을 알 수 있다.

$ \lVert \xi \rVert=1\Leftrightarrow \mathbf{b’Wb}=1$
two functions $\xi_1$ and $\xi_2$ will be orthogonal $iff$ the corresponding vectors of coefficients satisfy $\mathbf {b_1’Wb_2}=0$
To get the required PCs, define $\mathbf u = \mathbf{W^{1/2}b}$, solve the equivalent symmetric eigenvalue problem

$$ N^{-1}\mathbf{W}^{1/2}\mathbf{C’C}\mathbf{W}^{1/2}\mathbf{u} = \rho \mathbf u $$

and compute $\mathbf {b=W}^{-1/2}\mathbf{u}$ for each eigen vector.

2가지 경우, 특별한 주의를 기울여야한다.
the basis($ \phi $)가 orthonormal ($\mathbf W=I$)인 경우, functional PCA problem은 결국 coefficient array $\mathbf C$의 standard multivariate PCA로 감소하게 되고, order $K$ symmetric array $N^{-1}\mathbf{C’C}$의 eigen analysis만 수행하면 된다. ($N^{-1}\mathbf{C’C}\mathbf{u} = \rho \mathbf u$)
Observed function의 개수가 너무 많지 않을 경우, observed function $x_i$를 their own basis expansion으로 보는 것이다. (즉, $N$개의 basis functions가 존재하고 그것이 observed functions이다.)

즉, $\mathbf{C=I}$가 되고, problem은 $w_{ij}=\int x_i x_j$의 entry(성분)를 가지는 symmetric matrix $N^{-1}\mathbf W$의 eigen analysis 중 하나가 된다. ($N^{-1}\mathbf{Wu}=\rho\mathbf{u}$)
위의 entry $w_{ij}(=\int x_ix_j)$는 quadrature technique(구적법)으로 계산 가능
Basis function approach로 구할 수 있는 eigen function의 최대 개수는 원칙적으로 basis의 dimension인 $K$이다.
그러나 basis expansion이 observed functions의 근사치를 포함하고 있다면, $K$ eigen functions의 small proportion을 계산하기 위해 $K$항까지 확장하는 것은 적절하지 않다.

Numerical integration or quadrature

$$ \int f(s) ds \approx \sum_{j=1}^nw_jf(s_j) $$
다양한 objectives에 적용하도록 조작할 수 있는 근사치의 3가지 측면
- $n$ : the number of discrete argument values $s_j$
- $s_j$ : the argument values, called $quadrature \ points$
- $w_j$ : the weights, called $quadrature \ weights$, attached to each function value in the sum.

$Trapezoidal \ rule$ (사다리꼴 공식)

Integration interval을 너비 $h$의 equal space로 나누는 방법 ($n-1$ intervals)
$s_j$ : interval의 boundaries, $s_1$ and $s_n$ (lower and upper limits of integration)
The approximation is

$$ \int f(s)ds \approx h[f(s_1)/2+\sum_{j=2}^{n-1}f(s_j)+f(s_n)/2] $$
weights $w_j$ : $h/2,h,…,h,h/2$
$n$에 의해 accuracy가 결정됨
Some important advantages
- raw data는 보통 equally spaced value
- weight는 trivial(사소한)
- 다른 방법들에 비해 accuracy가 낮을 수 있지만, objectives에 충분히 수렴
Section 8.4.1이 trapzoidal rule과 비슷하고 periodic boundary conditions를 사용하면 같은 방법이 된다. ($\because$ $f(s_n)$ and $f(s_1)$ are identical)

Gaussian quadrature schemes

Integrand(피적분함수)의 적절한 additional conditions 하에, fixed $n$에서 더 높은 accuracy를 구하는 quadrature weights와 points를 정의
또 다른 절차는 피적분함수의 curvature(곡률)이 높은 지역에서 잘 분해될 수 있게 적절한 the quadrature points(사분위점)를 고르는 것이다.
즉, 분석에서 고려되는 모든 함수에서 the quadrature points(사분위점)을 정해야한다.
(8.17)에 quadrature schemes를 적용하면

$$ V\xi \approx \mathbf{VW}\tilde{\boldsymbol{\xi}} $$

where matrix $\mathbf V$ contains the values $v(s_j,s_k)$ of the covariance function at the quadrature points

and $\tilde{\boldsymbol{\xi}} = (\xi(s_1),…,\xi(s_n))$

and matrix $\mathbf W$ is a diagonal matrix with diagonal values being the quadrature weights $w_j$

The approximately matrix eigen analysis problem is

$$ \mathbf{VW}\tilde{\boldsymbol{\xi}}=\rho \tilde{\boldsymbol{\xi}} $$

where the orthonormality condition is

$$ \tilde{\boldsymbol \xi}_m’\mathbf{W}\tilde{\boldsymbol \xi}_m = 1 \text{ and } \tilde{\boldsymbol \xi}_{m_1}’ \mathbf{W} \tilde{\boldsymbol \xi}_{m_2} = 0, ~ m_1 \ne m_2 $$

대부분의 quadrature schemes는 positive weights를 쓰기 때문에 8.4.2의 계산과 유사하게 approximate eigen equation의 standard form을 얻을 수 있다.

$$ \mathbf{W^{1/2}VW^{1/2}u}=\rho\mathbf u $$

where $\mathbf u=\mathbf W^{1/2} \tilde{\boldsymbol{\xi}}$ and $\mathbf{u’u}=1$

Procedure
1. Choose $n$, the $w_j$’s, and the $s_j$’s
2. Compute the eigen values $\rho_m$ and eigen vectors $\mathbf u_m$ of $\mathbf{W^{1/2}VW^{1/2}}$
3. Compute $\tilde{\boldsymbol{\xi}}_m=\mathbf{W^{-1/2}u}_m$
4. If needed, use an interpolation technique to convert each vector $\tilde \xi_m$ to a function $\xi_m$
If $n \ll N$, $n$ 이상의 approximate eigen functions을 구할 수 없다. ($n$ : # of quadrature points, $N$ : # of curves)
그러나 대부분의 PCA 방법이 적은 개수의 leading eigen functions이 필요하고, 합리적으로 large $n$이 유용하다.

8.5.1 Defining multivariate functional PCA

Set bivariate functional data using the hip and knee angle data
Notations
- $\texttt{Hip}_1,\texttt{Hip}_2,…,\texttt{Hip}_n$ : the observed hip angle curves
- $\texttt{Knee}_1,\texttt{Knee}_2,…,\texttt{Knee}_n$ : the observed knee angle curves
- $\texttt{Hipmn}$ and $\texttt{Kneemn}$ : estimates of the mean function of $\texttt{Hip}$ and $\texttt{Knee}$ processes
- $v_{HH}$ and $v_{KK}$ : the covariance operator of the $\texttt{Hip}_i$ and $\texttt{Knee}_i$
- $v_{HK}$ : the cross-covariance function, $v_{KH}(t,s)=v_{HK}(s,t)$
- $\xi=(\xi^H,\xi^K)'$ : a principal component of weight function with $\xi^H$(variation in the $\texttt{Hip}$ curves) and $\xi^K$(variation in the $\texttt{Knee}$ curves)
- $\texttt{Angles}_i=(\texttt{Hip}_i,\texttt{Knee}_i)$
Defininiton of an inner product between bivariate functions
- 각 inner product(squared norm)의 합으로 표현
$$ \langle \xi_1,\xi_2 \rangle = \int \xi_1^H\xi_2^H + \int \xi_1^K \xi_2^K $$
- The weighted linear combination
  
  $$ f_i = \langle \xi, \texttt{Angles}_i \rangle = \int \xi^H\texttt{Hip}_i + \int \xi^K \texttt{Knee}_i $$
The eigen equation ($V\xi=\rho\xi$)

$ \int v_{HH}(s,t)\xi^H(t)dt + \int v_{HK}(s,t)\xi^K(t)dt=\rho\xi^H(s) $

$ \int v_{KH}(s,t)\xi^H(t)dt + \int v_{KK}(s,t)\xi^K(t)dt=\rho\xi^K(s) $
적당한 expansion에 있어 points or coefficients의 fine grid에서 각 함수 $\texttt{Hip}_i$와 $\texttt{Knee}_i$를 vector로 대체하여 계산
- $Z_i$ : a single long vector concatenated two vectors, $\texttt{Hip}_i$ and $\texttt{Knee}_i$
$Z_i$에 standard PCA를 적용하고 result PC vectors을 $\texttt{Hip}$과 $\texttt{Knee}$에 대응되는 부분으로 분리
필요하다면 적당한 inverse transform을 적용하여 분석을 완료
만약 한 curve의 variability가 다른 것들에 비해 상당히 클 경우, 대응되는 inner product term의 down-weighting을 고려하고 나머지 절차에서 변경된 부분을 고려하는 것이 바람직하다.

functional data analysis approach to PCA의 한가지 특징은 inner product가 적절히 정의되면 multivariate analysis의 conventional vectors이든 8.2.2의 scalar function이든 8.5.1의 vector-valued functions이든 PCA는 같게 된다.

Reference

Ramsay. & Silverman. (2005), Functional Data Analysis 2nd edition. Springer