Vertex Component Analysis in Hyperspectral Unmixing

Edward Leaver
Icarus Resources LLC

July 10, 2013
Best viewed with MathJax or Firefox

1 Introduction
2 Image Classification and Statistics
  2.1 Image Mean Vector and Covariance Matrix
  2.2 Principal Component Transformation
  2.3 Minimum Noise Fraction
3 Virtual Dimensionality
  3.1 HFC
  3.2 NWHFC
  3.3 NSP
  3.4 OSP
4 Vertex Component Analysis
  4.1 VCA Algorithm

1 Introduction

Remote Sensing Analyis of Alteration Minerology Associated With Natural Acid Drainage
in the Grizzly Peak Caldera, Sawatch Range, Colorado
David W. Coulter
Ph.D. Thesis Colorado School of Mines, 2006

[Picture]

The Gizzly Peak Caldera is located Colorado’s Mineral Belt approximately 15 miles southeast of Aspen. It is the product of a volcanic eruption $\sim 35$ million years ago that ejected some 600 ${km}^{3}$ of magma. With a volcanic explosive index of 7, it was at least four times larger than Tambora.[3, sec 1.5]

Figure 1: West Red (Ruby Mountain) From Enterprise Peak

Coulter sought to identify acidic conditions by the different weathering states of iron oxide, which forms Hematite (red) in acidic conditions, and Goethite (lite yellow) and Jarosite (dark yellow) as pH increases, Jarosite being the most immediate oxidation product of iron sulfide (pyrite).

Figure 2: West Red Iron Endmembers from Aviris:

Red: Hematite	Green: Goethite	Blue: Jarosite
(high pH)		(low pH)

Note on Partial Unmixing:

”Unmixing is critical in imaging spectrometry since virtually every pixel contains some macroscopic mixture of materials. The theory of, and methods for unmixing of spectroscopic signatures are found in a number of sources. Hapke (1993) provides models for linear and non-linear spectral mixing and a discussion of the criteria for using each approach. In Earth remote sensing, a linear mixing model is typically used. Full unmixing which assumes that the spectra of all possible components are known, is described by van der Meer (2000) and Boardman (1989). Since it is often impossible to identify, a priori, all possible components, partial unmixing is an important tool. Match Filtering (MF) (Harsanyi and Chang 1994), Constrained Energy Minimization (CEM – similar to Match Filtering) (Farrand and Harsanyi 1997; Farrand 2001), and Mixture Tuned Match Filtering ( ${MTMF}^{tm}$ ) (Boardman et al. 1995) are commonly used methods for partial unmixing. ${MTMF}^{tm}$ is probably the most popular unmixing method used for geologic remote sensing.”[3, Coulter sec 1.4.4]

(Emphasis Added.)

2 Image Classification and Statistics

2.1 Image Mean Vector and Covariance Matrix

Image Mean Vector and Covariance Matrix [10, White 2005] If image has $P$ bands and $N$ pixels, the mean vector is

{\vec{m}}_{p} = {(m_{1}, m_{2}, . . . m_{P})}^{T} = \frac{1}{N} \sum_{j = 1}^{N} {\vec{f}}_{j}

(1)

where ${\vec{f}}_{j}$ is the jth pixel vector of the image

{\vec{f}}_{j} = {(f_{i}, f_{2}, . . . f_{P})}_{j}^{T}

(2)

The image covariance matrix $C_{PxP}$ is[2, Chang 2013 (6.6)]

C_{PxP} = \frac{1}{N} (F_{PxN} - M_{PxN}) {(F_{PxN} - M_{PxN})}^{T}

(3)

where $F_{PxN}$ is the matrix of $N$ pixel vectors each of length $P$

F_{PxN} = {\vec{f}}_{1}, {\vec{f}}_{2}, {\vec{f}}_{3}, . . ., {\vec{f}}_{N}

(4)

$M_{PxN}$ is the matrix of $N$ identical mean vectors ( $P$ rows by $N$ columns):

M_{PxN} = {\vec{m}}_{P}, {\vec{m}}_{P}, {\vec{m}}_{P}, . . ., {\vec{m}}_{P} = {\vec{m}}_{P} 1_{PxN}

(5)

where $1_{PxN}$ is an $P \times N$ matrix of ones.

2.2 Principal Component Transformation

Principal Component Transformation [7, Smith 1985] Karhunen-Loeve Transformation[11, White 2005] GRASS imagery i.pca: Let

X_{PxN} = (F_{PxN} - M_{PxN}) zero-mean image matrix

(6)

Z_{PxN} = A_{PxP} X_{PxN}

(7)

$F_{PxN}$: input-image multi-pixel vector ( $P$ bands by $N$ pixels)
$M_{PxN}$: mean vector matrix,
$Z_{PxN}$: output-image multi-pixel vector,
$A_{PxP}$: $P \times P$ matrix whose rows are the eigenvectors of the covariance matrix $C_{PxP}$ , arranged by decreasing magnitude of eigenvalue, as typically returned by SVD routines.

\begin{array}{rcl} Z_{i k} & = & \sum_{j = 1}^{P} a_{i j} X_{j k} i = 1, 2 \dots P; k = 1, 2 \dots N & (8) \\ {\vec{Z}}_{k}^{T} & = & A_{PxP} {\vec{X}}_{k}^{T} orthogonal w.r.t the N pixels: & (9) \end{array}

\begin{array}{rcl} λ_{i} δ_{i l} & = & \sum_{k = 1}^{N} Z_{i k} Z_{l k} & (10) \\ λ_{i} & = & \sum_{k = 1}^{N} (\sum_{j = 1}^{P} a_{i j} X_{j k}) (\sum_{m = 1}^{P} a_{l m} X_{m k}) & (11) \\ = & \sum_{j = 1}^{P} \sum_{m = 1}^{P} a_{i j} a_{l m} [\sum_{k = 1}^{N} X_{j k} X_{m k}] & (12) \\ \equiv & \sum_{j = 1}^{P} \sum_{m = 1}^{P} a_{i j} a_{l m} C_{j m} & (13) \\ = & a_{i}^{T} C_{PxP} a_{l} & (14) \end{array}

$C_{PxP}$	the symmetric positive-definite image covariance matrix
${\vec{a}}_{i}$	are its orthonormal eigenvectors with eigenvalue $λ_{i}$ .

Magnitudes of $λ_{i}$ impose ordering on transformed component vectors $Z_{i k} = \sum_{j = 1}^{P} a_{i j} X_{j k}$ . Those with the largest $λ_{i}$ s.t. $λ_{i} ∕ λ_{m a x} > t o l$ are the Principal Components. Tolerance tol should be related to the noise floor.

2.3 Minimum Noise Fraction

Minimum Noise Fraction[9, pg. 38] [4, ] We wish to find a particular coefficient matrix ${a_{i j}; i, j = 1, \dots P}$ that in some sense maximizes the image S/N, assuming the image pixel vectors ${{\vec{X}}_{k}, k = 1, \dots N}$ are the sum of uncorrelated signal and noise:

\begin{array}{rcl} X_{i k} & = & S_{i k} + N_{i k} i = 1, \dots P k = 1, \dots N & (15) \\ Z_{i k} & = & \sum_{j = 1}^{P} a_{i j} X_{j k} = \sum_{j = 1}^{P} a_{i j} (S_{j k} + N_{j k}) & (16) \\ Z_{i}^{T} & = & a_{i}^{T} X_{PxN} where a_{i}^{T} = (a_{i 1}, a_{i 2}, \dots a_{i P}) & (17) \\ = & a_{i}^{T} S_{PxN} + a_{i}^{T} N_{PxN} & (18) \end{array}

Maximize

R = \frac{V a r (Z_{i}^{signal})}{V a r (Z_{i}^{noise})} = \frac{(a_{i}^{T} S_{PxN}) {(a_{i}^{T} S_{PxN})}^{T}}{(a_{i}^{T} N_{PxN}) {(a_{i}^{T} N_{PxN})}^{T}}

(19)

\begin{array}{rcl} R & = & \frac{(a_{i}^{T} S_{PxN}) {(a_{i}^{T} S_{PxN})}^{T}}{(a_{i}^{T} N_{PxN}) {(a_{i}^{T} N_{PxN})}^{T}} = \frac{a_{i}^{T} (S_{PxN} S_{PxN}^{T}) a_{i}}{a_{i}^{T} (N_{PxN} N_{PxN}^{T}) a_{i}} & (20) \\ = & \frac{a_{i}^{T} C_{PxP}^{S} a_{i}}{a_{i}^{T} C_{PxP}^{N} a_{i}} = \frac{a_{i}^{T} (C_{PxP} - C_{PxP}^{N}) a_{i}}{a_{i}^{T} C_{PxP}^{N} a_{i}} & (21) \\ = & \frac{a_{i}^{T} C_{PxP} a_{i}}{a_{i}^{T} C_{PxP}^{N} a_{i}} - 1; IF C_{PxP} = C_{PxP}^{S} + C_{PxP}^{N} & (22) \\ = & λ_{i} - 1 & (23) \end{array}

where $λ_{i}$ is generalized eigenvalue of $C_{PxP}$ wrt $C_{PxP}^{N}$ , and $a_{i}$ are corresponding generalized eigenvectors. Compare with PCA:

λ_{i}^{P C A} = a_{i}^{T} C_{PxP} a_{l} (eq. 14)

(24)

Noise Covariance $C_{PxP}^{N}$ Green suggests $C_{PxP}$ be of unit variance and band-to-band uncorrelated.

Sensor error estimate: (Aviris .rcc file) Call supplied error vector ${e_{j}; j = 1, \dots P}$ . Then
$C_{l m}^{N} = e_{m}^{n} δ_{l m} (n = 1 or 2)$ (25)

is completely uncorrelated. In ideal case all $e_{m}$ are equal:
$\begin{array}{rcl} C_{l m}^{N} & = & e I, and & (26) \\ a_{i}^{T} C_{PxP} a_{i}^{T} & = & λ_{i} a_{i}^{T} C_{PxP}^{N} a_{i} = λ_{i} e (a_{i}^{T} I a_{i}) & (27) \\ = & λ_{i} e (a_{i}^{T} \cdot a_{i}) = λ_{i} e = λ_{i}^{P C A} & (28) \end{array}$

since the eigenvectors $a_{i}$ are orthonormal. $λ_{i} = λ_{i}^{P C A}$ if the variance $e \equiv 1$ .

Homogeneous Area Method [8, sec 2.9.1] If possible find homogenous area of $N_{h}$ pixels in image:

\begin{array}{rcl} {\vec{M}}_{L} & = & \frac{1}{N_{k}} \sum_{k = 1}^{N_{h}} {\vec{X}}_{k} local mean, vector over bands & (29) \\ {\vec{σ}}_{L} & = & \frac{1}{N_{k} - 1} [\sum_{k = 1}^{N_{h}} {({\vec{X}}_{k} - {\vec{M}}_{L})}^{2}]^{1 ∕ 2} & (30) \\ σ_{L i} & = & \frac{1}{N_{k} - 1} {[\sum_{k = 1}^{N_{h}} {(X_{i k} - M_{L i})}^{2}]}^{1 ∕ 2} & (31) \\ C_{L i j} & = & \frac{1}{N_{k} - 1} \sum_{k = 1}^{N_{h}} (X_{i k} - M_{L i}) (X_{j k} - M_{L j}) (general) & (32) \\ = & \frac{δ_{i j}}{N_{k} - 1} \sum_{k = 1}^{N_{h}} {(X_{i k} - M_{L i})}^{2} (zero band-to-band) & (33) \end{array}

Local Means and Local Variances [8, sec 2.9.2]

1.

Divide image into small

N_{b}

pixel blocks (4x4, 5x5,...)

2.

For each (block, band) get local mean and variance:

\begin{array}{rcl} M_{L i} & = & \frac{1}{N_{b}} \sum_{k = 1}^{N_{b}} X_{i k} i = 1, 2, \dots P bands & (34) \\ {σ_{L i}}^{2} & = & \frac{1}{N_{b} - 1} \sum_{k = 1}^{N_{b}} {(X_{i k} - M_{L i})}^{2} & (35) \\ C_{L i j} & = & \frac{(δ_{i j})}{N_{k} - 1} \sum_{k = 1}^{N_{b}} (X_{i k} - M_{L i}) (X_{j k} - M_{L j}) & (36) \end{array}

$C$ is the local $N_{b} \times N_{b}$ covariance matrix.

3.

bin

{{σ_{L i}}^{2}}

into classes between band min and max values.

4.

bin with most blocks represents mean noise of image. Hope this bin is same for all bands...

Local Means and Local Variances (con’t)

5.

Suppose ”most popular” bin is a P-cube, each side ranging

[σ_{i}^{*}, σ_{i}^{*} + △ σ_{i}^{*}]

, contains

N^{*}

points. Then the average value over the bin

\bar{C_{L i j}} = \frac{1}{N^{*}} \sum_{k = 1}^{N^{*}} {(C_{L i j})}_{k}

(37)

is desired noise covariance matrix.

6.

Caveat: Assumes image is ”slowly varying enough” that enough of the

N_{total}

blocks are homogeneous, i.e. their covariance is really due to noise, not true features in the image. Blocks and bins must both be ”small enough” – but not too small!

Other methods: ”unsupervised training” derived-endmember classification schemes e.g. LAS’ search ([12, ]) and GRASS’ cluster/maxlik are based upon local covariance minimization.

3 Virtual Dimensionality

We implement the Harsani-Farrand-Chang (HFC) and noise-whitened Harsani-Farrand-Chang (NWHFC) esimators of the image virtual dimensionality VD = $p \leq L$ as described in [1, C.-I Chang and Du 2004] and [2, C.-I Chang 2013] Chapter 5.3.6.

3.1 HFC

Let $R_{LxN} \equiv [r_{1}, r_{2}, \dots r_{N}]$ be the input image matrix of $N$ pixel (sample) vectors $r_{n}$ each of length $L$ spectral bands: $r_{i} = {(r_{i 1}, r_{i 2}, \dots r_{i, L})}^{T}; 1 \leq i \leq N$ . The image spectral correlation matrix $R_{LxL}$ and covariance matrix $K_{LxL}$ are then

\begin{array}{rcl} R_{LxL} & = & \frac{1}{N} R_{LxN} R_{LxN}^{T} & (38) \\ K_{LxL} & = & \frac{1}{N} (R_{LxN} - μ) {(R_{LxN} - μ)}^{T} where & (39) \\ μ_{l} & = & \frac{1}{N} \sum_{i = 1}^{N} r_{i l} is the vector of image spectral means & (40) \end{array}

Let ${{\hat{λ}}_{1} \geq {\hat{λ}}_{2} \geq \dots \geq {\hat{λ}}_{L}}$ and ${λ_{1} \geq λ_{2} \geq \dots \geq λ_{L}}$ denote the (ordered) correlation and covariance eigenvalues. By assuming that signal sources are nonrandom unknown positive constants and noise is white with zero mean, we may expect

\begin{array}{rcl} {\hat{λ}}_{l} & > & λ_{l} for l = 1, \dots, VD & (41) \\ {\hat{λ}}_{l} & = & λ_{l} = σ_{n l}^{2} for l = VD + 1, \dots, L & (42) \\ (43) \end{array}

where $σ_{n l}^{2}$ is the noise variance in the $l$ th spectral channel. Formulate the VD determination as a binary hypothesis problem:

\begin{array}{rcl} H_{0} : z_{l} & = & {\hat{λ}}_{l} - λ_{l} = 0 versus & (44) \\ H_{1} : z_{l} & = & {\hat{λ}}_{l} - λ_{l} > 0 for l = 1, 2, \dots, L & (45) \end{array}

The null hypothesis $H_{0}$ and the alternative hypothesis $H_{1}$ represent the case that the correlation eigenvalue is equal to its corresponding covariance eigenvalue (no signal), and the case that the corrleation eigenvalue is greater than its corresponding covariance eigenvalue. When $H_{1}$ is true (i.e. $H_{0}$ fails), it implies there is an endmember contributing to the correlation eigenvalue in addition to noise, since the noise energy represented by the eigenvalue of $R_{LxL}$ in that particular component is the same as the one represented by the eigenvalue of $K_{LxL}$ in its corresponding component.

If we assume the noise in each spectral dimension has zero mean and varance $σ_{n l}^{2}$ , then ${\hat{λ}}_{l} = μ_{l}^{2} + σ_{n l}^{2}$ and $λ_{l} = σ_{n l}^{2}$ where $μ_{l}$ is the sample mean in the $l$ th spectral dimension and $σ_{n l}^{2}$ is the channel noise. Despite the fact the $\hat{λ}$ and $λ$ are unknown constants, we can model each pair of eigenvalues $(\hat{λ}, λ)$ under hypothesis $H_{0}$ and $H_{1}$ as random variables with asymptotic probability densities given by

\begin{array}{rcl} p_{0} (z_{l}) & = & p (z_{l} | H_{0}) ≃ N (0, σ_{z l}^{2}) and & (46) \\ p_{1} (z_{l}) & = & p (z_{l} | H_{1}) ≃ N (μ_{l}, σ_{z l}^{2}) for l = 1, 2, \dots, L & (47) \\ N (μ, σ^{2}) & = & \frac{1}{σ \sqrt{2 π}} \exp [\frac{- {(x - μ)}^{2}}{2 σ^{2}}] & (48) \end{array}

respectively, where $μ_{l}$ is an unknown constant for each $l$ and the variance $σ_{z l}^{2}$ is given by

σ_{z l}^{2} = Var [{\hat{λ}}_{l} - λ_{l}] = Var [{\hat{λ}}_{l}] + Var [λ_{l}] - 2 Cov ({\hat{λ}}_{l}, λ_{l})

(49)

“When the total number of samples $N$ is sufficiently large, $Var [{\hat{λ}}_{l}] ≃ 2 {\hat{λ}}_{l}^{2} ∕ N$ and $Var [λ_{l}] ≃ 2 λ_{l}^{2} ∕ N$ . Therefore, the noise variance $σ_{z l}^{2}$ in (42) can be estimated and approximated using (49).” (Although its not obvious exactly how. Play statisitics on collections of subimages, perhaps? In his MATLAB code, Chang approximates $σ_{z l}^{2} ≃ (2 ∕ N) ({\hat{λ}}_{l}^{2} + λ_{l}^{2})$ , so we do as well.)

Chang and Du ([1, eq. 8]) use Schwartz’ inequality to bound

\begin{array}{rcl} Cov ({\hat{λ}}_{l}, λ_{l}) & \leq & \sqrt{Var [{\hat{λ}}_{l}] Var [λ_{l}]} ≃ \frac{2}{N} ({\hat{λ}}_{l} \cdot λ_{l}) so that & (50) \\ σ_{z l}^{2} & \geq & \frac{2}{N} {({\hat{λ}}_{l} - λ_{l})}^{2} & (51) \\ σ_{z l}^{2} & \leq & \frac{2}{N} ({\hat{λ}}_{l}^{2} + λ_{l}^{2}) & (52) \\ (53) \end{array}

From (46), (47), and (49), we define the false alarm probability and detection probability as

\begin{array}{rcl} P_{F} & = & \int_{τ_{l}}^{\infty} p_{0} (z) d z & (54) \\ P_{D} & = & \int_{τ_{l}}^{\infty} p_{1} (z) d z & (55) \end{array}

We then choose a given false-alarm probability $P_{F}$ (for example 0.001) and invert (54) to give

\begin{array}{rcl} τ_{l} & = & σ_{l} \sqrt{2} erfc_inv (2 P_{F}) where & (56) \\ σ_{l} & ≃ & \sqrt{2 ({\hat{λ}}_{l}^{2} + λ_{l}^{2}) ∕ N} & (57) \end{array}

A case of ${\hat{λ}}_{l} - λ_{l} > τ_{l}$ fails the null hypothesis test, and indicates there is signal energy assumed to contribute to the eigenvalue ${\hat{λ}}_{l}$ in the $l$ th spectral dimension. Do note the threshold $τ_{l}$ is different for each spectral dimension $l$ .

3.2 NWHFC

As noted by Chang ([2, Section 5.3.6]), the signature variance $σ_{s l}^{2}$ is generally very small and can be influenced by interband covariances. Interband covariance may be reduced by “noise whitening”: preprocessing by multiplying the input image by a noise covariance matrix estimated by a technique developed by Roger and Arnold, which may be performed prior to the HFC method. This is discussed far too sparsely in ([1, Chang and Du]) and hardly at all in ([2, Chang 2013]), so we’ll merely reference the former and translate Matlab code from the latter (Section A.1).

Noise Estimation :

\begin{array}{rcl} K_{inv} & = & K_{LxL}^{- 1} & (58) \\ κ & = & diag (K_{inv}) & (59) \\ K_{noise} & = & diag (1 . ∕ κ) & (60) \\ (61) \end{array}

The noise covariance matrix $K_{noise}$ is the LxL diagonal matrix whose elements are the inverse of the diagonal elements of $K_{LxL}^{- 1}$ . For noise whitening we want the square root of its inverse, denoted $K_{n}^{- (1 ∕ 2)}$ , also diagonal:

\begin{array}{rcl} K_{n}^{- (1 ∕ 2)} & = & {(\sqrt{K_{noise}})}^{- 1} & (62) \\ {K_{n}^{- (1 ∕ 2)}}_{l, l} & = & \sqrt{K_{l, l}^{- 1}} & (63) \\ Y_{LxN} & = & K_{n}^{- (1 ∕ 2)} R_{LxN} & (64) \\ VD & = & HFC (Y, t) & (65) \end{array}

$Y$ is the noise-whitened image matrix. The NWHFC VD estimate is obtained simply by running the HFC algorithm on $Y$ rather than $R$ .

3.3 NSP

The Noise Subspace Projection method ([1, ]) estimates VD solely from the spectral covariance matrix $K_{LxL}$ . Let the matrix $K_{n}^{- (1 ∕ 2)}$ be as above and define the noise-whitened covariance matrix

\begin{array}{rcl} \bar{K} & = & K_{n}^{- (1 ∕ 2)} K_{LxL} K_{n}^{- (1 ∕ 2)} & (66) \end{array}

As result, the noise variance of each band in the whitened $\bar{K}$ is reduced to unity. Let ${u_{l}}_{l = 1}^{L}$ be a set of eigenvectors generated by $\bar{K}$ , which can then be expressed as

\begin{array}{rcl} \bar{K} & = & \sum_{l = 1}^{VD} {\bar{λ}}_{l} u_{l} u_{l}^{T} + \sum_{l = VD + 1}^{L} {\bar{λ}}_{l} u_{l} u_{l}^{T} & (67) \end{array}

where ${u_{l}}_{l = 1}^{VD}$ and ${u_{l}}_{l = VD + 1}^{L}$ span signal subspace and noise subspace respectively. The variances of the noise components of (67) have been whitened and normalized to unity, so ${\bar{λ}}_{l} = 1; l = VD + 1, \dots, L$ and $\bar{λ} > 1$ otherwise. The problem of VD estimation can then be formulated as a binary hypothesis testing problem:

\begin{array}{rcl} H_{0} : y_{l} & = & {\bar{λ}}_{l} = 1 versus & (68) \\ H_{1} : y_{l} & = & {\bar{λ}}_{l} > 1 for l = 1, 2, \dots, L & (69) \end{array}

where

\begin{array}{rcl} p_{0} (y_{l}) & = & p (y_{l} | H_{0}) ≃ N (1, σ_{y l}^{2}) and & (70) \\ p_{1} (y_{l}) & = & p (y_{l} | H_{1}) ≃ N (μ_{l}, σ_{y l}^{2}) for l = 1, 2, \dots, L & (71) \\ N (μ, σ^{2}) & = & \frac{1}{σ \sqrt{2 π}} \exp [\frac{- {(x - μ)}^{2}}{2 σ^{2}}] & (72) \end{array}

where $μ_{l}$ is an unknown constant for each $l$ and the variance $σ_{y l}^{2}$ is given by

σ_{y l}^{2} = Var [{\bar{λ}}_{l}] ≃ \frac{2 {\bar{λ}}_{l}^{2}}{N}

(73)

which can be further reduced under hypothesis $H_{0}$ to

σ_{y l}^{2} ≃ \frac{2}{N}

(74)

Finally, find the Neyman-Pearson detector $δ_{NP}$ for (68) to determine VD. The false-alarm probability $P_{F}$ is the probability that we think we’ve detected a signal when none was actually present. In terms of eigenvalues ${\bar{λ}}_{l}$ this the probability of measuring a value ${\bar{λ}}_{l} > 1$ when the “true” value was its minimum noise-only value ${\bar{λ}}_{l} = 1$ . For a desired $P_{F}$ we seek a threshold $τ_{l}$ such that

\begin{array}{rcl} P_{F} & = & \int_{1 + τ_{l}}^{\infty} p_{0} (y) d y ≃ \int_{1 + τ_{l}}^{\infty} N (1, σ_{l}^{2}) d y & (75) \\ = & \frac{1}{σ_{l} \sqrt{2 π}} \int_{1 + τ_{l}}^{\infty} \exp [\frac{- {(y - 1)}^{2}}{2 σ_{l}^{2}}] d y & (76) \\ = & \frac{1}{σ_{l} \sqrt{2 π}} \int_{τ_{l}}^{\infty} \exp [\frac{- x^{2}}{2 σ_{l}^{2}}] d x & (77) \\ = & \frac{1}{\sqrt{π}} \int_{τ_{l} ∕ (σ_{l} \sqrt{2})}^{\infty} \exp [- z^{2}] d z & (78) \\ = & \frac{1}{2} erfc (τ_{l} ∕ (σ_{l} \sqrt{2})) & (79) \end{array}

with solution

\begin{array}{rcl} τ_{l} & = & (σ_{l} \sqrt{2}) erfc_inv (2 P_{F}) & (80) \\ ≃ & \frac{2 {\bar{λ}}_{l}}{\sqrt{N}} erfc_inv (2 P_{F}) & (81) \\ ≃ & \frac{2}{\sqrt{N}} erfc_inv (2 P_{F}) & (82) \end{array}

The virtual dimensionality VD is then the number of eigenvalues ${\bar{λ}}_{l}$ whose value exceeds the threshold $1 + τ_{l}$ : ${\bar{λ}}_{l} \geq 1 + τ_{l}; l = 1, 2, \dots, VD$ .

3.4 OSP

Orthogonal Subspace Projection ([2, 5.4.1]). OSP is a general technique that also works well as part of VCA (4).

Assume that there are $p$ signal sources with L bands, ${s_{1}, s_{2}, \dots s_{p}}$ , present in the data to be processed and every data sample vector $r_{i}$ can be expressed by a linear mixture of these p signal sources as

r_{i} = S_{p} α_{i} + n_{i}

(83)

where $S_{p} = [s_{1} s_{2} \dots s_{p}]$ is a signal matrix made up of the p signal sources, ${s_{1}, s_{2}, \dots s_{p}}$ , and $n_{i}$ can be interpreted as the noise vector or model error vector.

Let $P_{p} = S_{p} {(S_{p}^{T} S_{p})}^{- 1} S_{p}^{T}$ be the p-signal projection matrix formed by the signal matrix $S_{p}$ . It can then be used to map all data sample vectors ${r_{1}, r_{2}, \dots, r_{N}}$ into the spectral space linearly spanned by the $p$ signal sources ${s_{1}, s_{2}, \dots s_{p}}$ . In other words, every data sample vector $r_{i}$ can be expressed by a linear mixture of the $p$ signal sources, ${s_{1}, s_{2}, \dots s_{p}}$ , specified by (83) via the $p$ -signal projection matrix $P_{p}$ where the noise $n_{i}$ in (83) is included to account for the linear mixture model error:

\begin{array}{rcl} α_{i} & = & S_{p}^{- 1} (r_{i} - n_{i}) & (84) \\ = & {(S_{p}^{T} S_{p})}^{- 1} S_{p}^{T} (r_{i} - n_{i}) & (85) \end{array}

Since each sample vector produces a different model error and residual, the sample mean vector is used to represent an averaged model error and residual. In this case, from (83) the sample mean vector $μ$ can be expressed by

\begin{array}{rcl} μ & = & (1 ∕ N) [S_{p} \sum_{i = 1}^{N} α_{i} + \sum_{i = 1}^{N} n_{i}] & (86) \\ = & S_{p} ((1 ∕ N) \sum_{i = 1}^{N} α_{i}) + (1 ∕ N) \sum_{i = 1}^{N} n_{i} & (87) \\ = & S_{p} {\bar{α}}_{p} + \bar{n} & (88) \end{array}

where ${\bar{α}}_{p} = (1 ∕ N) \sum_{i = 1}^{N} α_{i}$ and $\bar{n} = (1 ∕ N) \sum_{i = 1}^{N} n_{i}$ . There are two ways to find ${VD}^{OSP}$ .

\begin{array}{rcl} OSP (p) & = & E [{(P_{p} μ)}^{T} (P_{p} μ)] or & (89) \\ OSP (p) & ≃ & {\bar{α}}_{p}^{T} S_{p}^{T} S_{p} {\bar{α}}_{p} & (90) \end{array}

without involving noise covariance matrix estimation. Theoretically, the value of OSP( $p$ ) in (89) increases as the value of $p$ increases. For any given error threshold $𝜖$ , VD can be determined by a stopping rule by $𝜖$ . The value of $p$ determining the OSP( $p$ ) in (89) or (90) is denoted by ${VD}^{OSP}$ . Two criteria are developed to detect the abrupt change of the OSP( $p$ ) value. One that is based on the gradient, denoted by “ $\nabla$ ”, is defined by

\begin{array}{rcl} {VD}_{algorithm}^{OSP, \nabla} (𝜖) & = & \arg {\min_{1 \leq p \leq L} | \frac{OSP (p + 1)}{OSP (p)} - \frac{OSP (p)}{OSP (p - 1)} | < 𝜖} . & (91) \end{array}

The other is based on the difference and is denoted by minus “-”:

\begin{array}{rcl} {VD}_{algorithm}^{OSP, -} (𝜖) & = & \arg {\min_{1 \leq p \leq L} | OSP (p + 1) - OSP (p) | < 𝜖} . & (92) \end{array}

For the difference criterion, the sample mean vector $μ$ is normalized before orthogonal projection so that the values of threshold $𝜖$ for these two criteria are comparable for analysis. The threshold $𝜖$ in (91) and (92) is generally selected according to a sudden drop or a clear gap between two consecutive values of $p$ in plots of the gradient in (91) and difference (92) versus the value of $p$ .

It should be noted the above ${VD}^{OSP}$ definitions involve two key parameters: one is the error threshold $𝜖$ ,and the other is the algorithm used to produce the $p$ -signal matrix $S_{p}$ , which is not specified in (91) and (92). We shall use “VCA” for “algorithm” at least initially, but there are other choices including target-specific algorithms such as ATGP, as well as generic SVD. See ([2, 5.4.1] for further discussion.

4 Vertex Component Analysis

Vertex Component Analysis is an “unsupervised training” derived-endmember classification scheme. We use the notation of [6, Nascimento and Dias], which is slightly different than that used in previous sections.

4.1 VCA Algorithm

Assume linear mixing, and let

: $N$ number of pixels in image
: $L$ number of spectral bands recorded at each pixel
: $p$ number of endmembers, $p \leq L$ , and usually $p ≪ L$ for hyperspectral images.
: $m_{j}$ is an endmember spectral vector of length $L$ , $1 \leq j \leq p$
: $γ_{i}$ is a scale factor modeling illumination variability due to surface topography at pixel $i$
: $α_{i} = {[α_{i 1}, α_{i 2}, . . . α_{i p}]}^{T}$ is the abundance vector containing the fractions of each endmember at pixel $i$ . Positivity: $α_{i k} \geq 0$ and $1_{p}^{T} α_{i} = 1$ , that is $\sum_{k = 1}^{p} α_{i k} = 1$ .
: $x_{i}$ is the “true” noise-free spectral vector of length $L$ at pixel $i \leq N$ .
: $M_{Lxp} = [m_{1}, m_{2}, \dots m_{p}]$ is an Lxp mixing matrix that maps an abundance vector $α$ to a “true” spectral vector $x$ .

Then the recorded spectral vector at pixel $i$ may be given by

\begin{array}{rcl} r_{i} & = & x_{i} + n_{i} = M γ_{i} α_{i} + n_{i} i = 1, \dots N & (93) \\ = & x_{i} + n_{i} = M s_{i} + n_{i} i = 1, \dots N & (94) \\ s_{i} & \equiv & γ_{i} α_{i} scaled abundance vector & (95) \end{array}

Our goal is to find a abundance vectors ${α_{i}, i = 1, \dots N}$ corresponding to some endmember set ${m_{j}, j = 1, \dots p}$ . An appropriate endmember set ${m}$ is to be determined as part of the VCA algorithm. Endmember identification – matching one or more of the VCA generated endmembers to actual specimen spectral samples such as from USGS spectral library or field ground-truth sampling – may be done in subsequent processing steps.

Since the set ${α \in ℜ^{p} : 1^{T} α = 1, α ≽ 0}$ is a simplex, the set $S_{z} = {z \in ℜ^{L} : z = M α, 1^{T} α = 1, α ≽ 0; γ \geq 0}$ is also a simplex. However, even assuming $n = 0$ , the observed vector set belongs to convex cone $C_{p} = {r \in ℜ^{L} : r = M γ α, 1^{T} α = 1, α ≽ 0; γ \geq 0}$ owing to different scale factors $γ$ .

But the projective projection of the convex cone $C_{p}$ onto a properly chosen hyperplane is a simplex with vertices corresponding to the vertices of simplex $S_{x}$ . The simplex $S_{p} = {y \in ℜ^{L} : y = r ∕ (r^{T} u), r \in C_{p}}$ is the projective projection of the convex cone $C_{p}$ onto the hyperplane $r^{T} u = 1$ , where the choice of $u$ assures there are no observed vectors $r$ parallel to the hyperplane ( orthogonal to its defining vector $u$ ): $r_{i}^{T} u ∕ | u | < | r_{i} |, i = 1, . . . N$ .

Geometry Note : I’m going to suggest there’s just a shade more going on here. Suppose we neglect noise and normalize all the pixel values such that $| r_{i} | \equiv 1$ , which is easy enough to do. Consider a three band image, such as an rgb color image. the ${r_{i}}$ are now neatly mapped onto the surface of the first (all axis non-negative) octant of the rgb unit sphere – a pixel’s intensity of any color cannot be negative. For convenience of visualization, assume we may choose $u = (1, 1, 1) ∕ \sqrt{3}$ which, if there were pure pixels lying on each unit axis ie (1,0,0), (0,1,0), (0,0,1) assures the simplex of points projected onto the $u$ plane is a pure equilateral triangle whose verteces correspond to the “1” values on the rgb unit axis.

But real photos rarely if ever span the complete rgb bandwidth. Their normalizations will usually map to some sub-patch of that first unit octant. Their projections onto the $u$ plane will still form a simplex, but will it have more vertices than a triangle? The VCA algorithm asserts “no”, and starts by finding the vertices of that projected simplex on the $u$ plane.

4.1.1 SNR

VCA algorithm accuracy is dependent on image SNR. Several ways to estimate:

Classic definition: SNR = Power(signal)/Power(noise). Select homogeneous blocks from image, areas where the image is expected to be flat so that variations are due to noise. Calculation of data SNR using a (Mean)/(Standard Deviation) method for a homogeneous target. $a_{i} = (1 ∕ N_{b}) \sum_{k = 1}^{N_{b}} r_{i k}$ is the average intensity of the i-th spectral band in the block with $N_{b}$ pixels. $σ_{i} = \sqrt{(1 ∕ N_{b}) \sum_{k = 1}^{N_{b}} {(r_{i k} - a_{i})}^{2}}$ is the standard deviation. Then $S N R_{i} = a_{i} ∕ σ_{i}$ . We may manually select homogeneous regions from the image, such as still water (lakes, ponds) or snowfields, or we may use GRASS i.cluster/i.maklik to (perhaps) find such regions automatically.
VCA does an image SVD or PCA as an initial dimensionality reduction step. We might try looking at the ratio of the Power in the SVD (and/or PCA) components with large singular values – the ones of interest – with the Power in the components with small singular values. Compare this with the ratio of the sum of the large eigenvalues with the sum of the small eigenvalues – or sums of squares of each, to see if there’s an easy way to compute an “effective SNR” that is usable to select between SVD and PCA in the first step of the VCA algorithm.
Compare the above with sensor data e.g. AVRIS .rcc files Aviris has very high SNR, over 500 on a good day[5, Sec. 3.3] Satellites e.g. Hyperion might have a tenth this, SNR is an important consideration and worth some synthetic image modelling. Nascimento and Dias use $1 0^{1.5} L$ for their ${SNR}_{th}$ and SNR estimator. We note with an SNR greater than 500 (on a good day), AVIRIS is considered extemely good.[5, Kruse JPL 2002] But with 224 bands it will spectacularly fail this particular test, as $1 0^{1.5} \times 224 = 31.62 \times 224 = 7083$ . We speculate this test is insufficent for any hyperspectral image, and something better related to the number of significant principal components (usually d) might be in order. Nascimento and Dias have quite a bit on the topics of estimating SNR and the virtual dimensionality (VD) of an image. See [6, eqs 10 - 13] and C.-I Chang and Q. Du, Estimation of number of spectrally distinct signal sources in hyperspectral imagery, IEEE Trans. Geosci. Remote Sens., vol. 42, no. 3, pp. 608-619, Mar. 2004. [1, C.-I Chang and Du].

4.1.2 Algorithm 1: Vertex Component Analyis (VCA)

Inputs:

: $R_{LxN} \equiv [r_{1}, r_{2}, \dots r_{N}]$ is the image matrix of $N$ pixel vectors $r_{n}$ each of length $L$ spectral bands.
: $p$ , desired dimensionality of problem, the number of endmembers. $p$ may be estimated as the number of “large” eigenvalues of the image covariance matrix. $p \leq L$ .
: ${SNR}_{th}$ is the fixed threshold value $1 0^{1.5} L$ of the SNR measured wrt the signal subspace. Different values might be used for different SNR estimators. See above.

Outputs:

: $\hat{M}$ is the $L \times p$ estimated mixing matrix returned by VCA.

Notations:

{[\hat{M}]}_{:, j}

is the jth column of

\hat{M}

{[\hat{M}]}_{:, i : k}

are the ith through kth columns of

\hat{M}

{[X]}_{:, [i n d e x]}

is the matrix formed from the columns of

X

listed in index vector

[i n d e x]

A^{+}

denotes the pseudo-inverse of pxp matrix

A

as computed by SVD.

A^{+}

is what you think it is. Write the SVD as

A = U Σ V^{T}

where

Σ

is the pxp diagonal matrix whose diagonal elements are the (ordered) singular values:

Σ = diag {w_{1}, w_{2}, \dots w_{p}}

Then the pseudo-inverse may be written

A^{+} = V Σ^{+} U^{T}

where

Σ^{+} = diag {w_{1}^{+}, w_{2}^{+}, \dots w_{p}^{+}}

and the pseudo-inverse of any scalar

w^{+} = 0

w ≃ 0

and

w^{+} = w^{- 1}

otherwise. Thus

A^{+} = A^{- 1}

A

is non-singular. If

P = A A^{+}

and

Q = A^{+} A

, then

$P$ is the orthogonal projector onto the range of $A$
$Q$ is the orthogonal projector onto the range of $A^{T}$
$(I - P)$ is the orthogonal projector onto the kernel (null-space) of $A^{T}$
$(I - Q)$ is the orthogonal projector onto the kernel (null-space) of $A$
$P = A A^{+} = U Σ V^{T} V Σ^{+} U^{T} = U Σ Σ^{+} U^{T}$ and $Σ Σ^{+} = diag {1, 1, \dots 0, \dots}$ has d non-zero diagonal entries, $d \leq p$ . $P$ is a full pxp symetric matrix, none of whose components contain values from lower right p-d block of $U$ . If p=d then $P$ and $Q$ are the identity matrix.

See Moore-Penrose_pseudoinverse In particular “ the pseudoinverse for matrices related to $A$ can be computed by applying the Sherman-Morrison-Woodbury formula to update the inverse of the correlation matrix, which may need less work. In particular, if the related matrix differs from the original one by only a changed, added or deleted row or column, incremental algorithms[12][13] exist that exploit the relationship.” – a fact that might be quite useful here.

U_{d}

is an Lxd matrix whose columns are the first d eigenvectors of either

R^{T} R

(which is LxL) or the LxL spectral covariance matrix

C

U_{d}

is the left eigenmatrix computed by SVD whose columns are ordered (left to right) in order of decreasing singular value. Its first d columns are the most significant (principal) relative to the rest. We usually select d such that the singular values

ω_{i} : {i > d}

are in some sense “too small to be significant”.

1: Compute SNR and Virtual Dimensionality $p$
2: if $(SNR > {SNR}_{th})$ then

\begin{array}{rcl} d & : = & p; & (96) \\ X & : = & U_{d}^{T} R; U_{d} obtained by SVD of the correlation matrix of R . R is LxN and X is dxN & (97) \\ u & : = & mean (X); u is a 1 x d vector & (98) \\ {[Y]}_{:, j} & : = & {[X]}_{:, j} ∕ ({[X]}_{:, j}^{T} u); {projective projection} & (99) \end{array}

else

\begin{array}{rcl} d & : = & p - 1; & (100) \\ {[X]}_{:, j} & : = & U_{d}^{T} ({[R]}_{: j} - {\bar{r}}_{j}); U_{d} obtained by PCA of (R - \bar{r}); R is LxN and X is dxN & (101) \\ c & : = & \arg \max_{j = 1 \dots N} ∥ {[X]}_{:, j} ∥; & (102) \\ c & : = & [c, c \dots c]; c is a 1 x N vector & (103) \\ Y & : = & [\overset{X}{c}] & (104) \end{array}

Here ${\bar{r}}_{j}$ is the sample mean of ${[R]}_{: j}$ , for $j = 1, \dots, N$ .
end if

14: $A : = [0, 0, \dots 0];$ initialize $A$ , a pxp auxiliary matrix, and its pseudo inverse $A^{+}$
$Xcor : = dxd spectral correlation matrix of X$ , and Uxcor, its SVD eigenvectors.
15: for ( $i = 1 to p$ ) do

\begin{array}{rcl} w & : = & Uxcor [i]; {the i th left SVD eigenvector of Xcor} & (105) \\ f & : = & ((I - A A^{+}) w) ∕ (∥ (I - A A^{+}) w ∥); {f is a vector orthonormal to the subspace spanned by {[A]}_{:, 1 : i}} & (106) \\ v & : = & f^{T} Y; & (107) \\ k & : = & \arg \max_{j = 1, \dots, N} | v_{j} |; \overset{{find the projection extreme, the index of the component of Y}{which has the maximum projection in the f direction}} & (108) \\ {[A]}_{:, i} & : = & {[Y]}_{:, k}; (if i = 1 find A^{+} by SVD, else by an update algorithm) & (109) \\ {[i n d i c e]}_{i} & : = & k; {store the pixel index of the extreme, i.e. the i-th endmember} & (110) \end{array}

22: end for
Stopping Criteria: if $p$ in 15: above has been pre-selected e.g. by HFC as described in section (3.1), we’re done. But if OSP (or similar) is desired, there are other tests that might be applied.

23: if $(SNR > {SNR}_{th})$ then
24: $\hat{M} : = U_{d} {[X]}_{:, [i n d i c e]}; \hat{M}$ is a $L \times p$ estimated mixing matrix.
25: else
26: $\hat{M} : = U_{d} {[X]}_{:, [i n d i c e]} + \bar{r}; \hat{M}$ is a $L \times p$ estimated mixing matrix.
27: end if

4.1.3 VCA Programming Notes

There’s a fair bit of SVD here, we’ll use LAPACK routines LAPACKE_sgesvd(..) and friends. See http://www.netlib.org/lapack/lug/node32.html
local implementations are in /usr/lib64/liblapacke.so and /usr/include/lapacke/lapacke.h. The LAPACK ”e” variant is an Intel implementation. We might eventually compile a custom version with local compiler optimization switches, but this one will get us going. Be sure to -DHAVE_LAPACK_CONFIG_H and use the #included primitive data types.
Image comparison: do “man compare” – mathematically and visually annotate the difference between an image and its reconstruction. From ImageMagick, looks pretty sophisticated

4.1.4 Icarus Image GUI Notes

Needs:

Opts.write(output_configfile) Need to be able to save Options configuration at any time with a “save options” button. Likewise will need a “read options” button that is active whenever user wants to read in previous saved values. A separate “Options Editor” window might be appropriate. (Done)
direct hyperspectral image display in false colors. See ColorDisplayforHyperspectralImagery-TGRS-VIS2.pdf in RemoteSensing/doc. (TBD. current hyperspectral display is true-color only from visible bands)
overlay on separate high-res USGS aerial photo if available. (Done)
overlay Various virt_endmembers (virtual endmembers) in false color (TBD. Got to get VCA working for images without pure endmembers first.)
virt_endmember to chem_endmember (chemical endmembers) conversion e.g. by constreained least-squares from selected spectral libraries (TBD)
overlay Various chem_endmembers in false color on original false-color image and/or aerial photo. (TBD)
highlight “weak target” pixels. (TBD)
Wavelength (nm) to Hue: Hue = (650 - wavelength)*240/(650-475);
Adjust this formula for the different spectral ranges you want for true or false images. http://www.mathworks.com/matlabcentral/answers/17011-color-wave-length-and-hue
http://www.physics.uoguelph.ca/applets/Intro_physics/refraction/LightRefract.java (Partly done. We only specify a fixed visible range at this point (true color only).
HSV to RGB is complicated but the C code is at http://www.cs.rit.edu/ ncs/color/t_convert.html
But its also in Qt: QColor Class, whose methods also include alpha. We may also implement alpha blending in software. See setOverlayImage.cpp and alphablend.h.
QColorMap, QGraphicsScene, QImage, QImageIOHandler, QImageIOPlugin QImageReader QImageWriter
Qt has convient conversion methods.
CIE. But what you really want – conversion from spectral power distribution (SPD) to RGB is at
http://hyperphysics.phy-astr.gsu.edu/hbase/vision/cie.html
Google “CIE color matching functions”
http://jcgt.org/published/0002/02/01/
XYZ to RGB: http://www.easyrgb.com/index.php?X=MATH&H=01#text1 (Done)

References

[1] C.-I Chang and Q. Du. Estimation of number of spectrally distinct signal sources in hyperspectral imagery. IEEE Transactions on Geoscience and Remote Sensing, 42(3):608–619, March 2004.

[2] Chein-I Chang. Hyperspectral Data Processing: Algorithm Design and Analysis. John Wiley & Sons, 111 River St, Hoboken, NJ 07030, 2013.

[3] David W. Coulter. Remote Sensing Analysis of Alteration Mineralogy Associated with Natural Acid Drainage in the Grizzly Peak Caldera, Sawatch Range, Colorado. PhD thesis, Colorado School of Mines, Golden, Colorado, 2006.

[4] A.A. Green, M. Berman, P. Switzer, and M.D. Graig. A transformation for ordering multispectral data in terms of image quality with implications for noise removal. Journal of Geophysical Reseach, 90:797 – 804, 1988.

[5] Fred A. Kruse. Comparison of aviris and hyperion for hyperspectral mineral mapping. http://w.hgimaging.com/PDF/Kruse_JPL2002_AVIRIS_Hyperion.pdf, 2002.

[6] José M. P. Nascimento and José M. Bioucas Dias. Vertex component analysis: A fast algorithm to unmix hyperspectral data. IEEE Transactions on Geoscience and Remote Sensing, 43(4), April 2005.

[7] M.O. Smith, P.E. Johnson, and J.B. Adams. Quantitative determination of mineral types and abundances from reflectance spectra using principal component analysis. IEEE Transactions on Geoscience and Remote Sensing, 36:65 – 74, 1985.

[8] Frank D. van der Meer and Steven M. de Jong. Imaging Spectroscopy. Kluwer Academic Publishers, Dordrecht, Boston, London, 2001.

[9] Frank D. van der Meer, Steven M. de Jong, and W. Bakker. Imaging Spectroscopy: Basic analytical techniques, pages 17–62. Kluwer Academic Publishers, Dordrecht, Boston, London, 2001.

[10] R. A. White. Image mean and covariance: http://dbwww.essc.psu.edu/lasdoc/user/covar.html, 2005.

[11] R. A. White. Karhunen-loeve transformation: http://dbwww.essc.psu.edu/lasdoc/user/karlov.html, 2005.

[12] R. A. White. Search unsupervised training site selection: http://dbwww.essc.psu.edu/lasdoc/user/search.html, 2005.