A fast, header-only C library for Singular Spectrum Analysis with Python bindings, FFT-accelerated operations, and Intel MKL optimization.
17-30× faster than Rssa (the gold-standard R implementation).
Singular Spectrum Analysis decomposes time series into:
- Trends - Long-term movements
- Periodic components - Cycles, seasonality
- Noise - Random fluctuations
Unlike Fourier analysis, SSA is non-parametric and adapts to your data's structure. Ideal for financial analysis, signal processing, and anywhere you need clean signal extraction.
SSA recovers clean signal from heavy noise (SNR: 4.1 → 26.1 dB).
Ensemble SSA extracts smooth trend from volatile Bitcoin prices (r = 0.987).
2-year ahead forecast capturing trend and 12-month seasonality (RMSE: 47.5).
Benchmarked with benchmark_vs_rssa.py on Intel Core i9-14900KF, MKL 2025.0.3:
- Components: k = 50
- Window length: L = N/4 (standard SSA choice)
- Signal: Synthetic (trend + periodic + noise; see the sketch below)
| N | L | k | SSA-Opt (ms) | Rssa (ms) | Speedup | Correlation |
|---|---|---|---|---|---|---|
| 500 | 125 | 50 | 2.0 | 33.9 | 17.0× | 0.9767 |
| 1000 | 250 | 50 | 1.6 | 46.8 | 28.5× | 0.9938 |
| 2000 | 500 | 50 | 2.5 | 64.3 | 25.3× | 0.9982 |
| 5000 | 1250 | 50 | 7.2 | 148.3 | 20.5× | 0.9995 |
| 10000 | 2500 | 50 | 12.2 | 370.8 | 30.5× | 0.9999 |
| 20000 | 5000 | 50 | 26.3 | 618.9 | 23.6× | 1.0000 |
Summary: Average 24× speedup, peak 30.5× at N=10,000. Throughput: 625 SSA/sec at N=1000.
Rssa uses PROPACK (Lanczos bidiagonalization). SSA-Opt uses randomized SVD with MKL.
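For context, a minimal NumPy sketch of the kind of synthetic test signal described above (trend + periodic + noise). The exact amplitudes, period, and noise level used by benchmark_vs_rssa.py are not reproduced here; these values are illustrative assumptions:

```python
import numpy as np

# Hypothetical stand-in for the benchmark's synthetic signal:
# slow trend + one periodic component + Gaussian noise.
rng = np.random.default_rng(0)
N = 1000
t = np.arange(N)
trend = 0.002 * t                       # slow linear trend
periodic = np.sin(2 * np.pi * t / 50)   # 50-sample cycle
noise = 0.5 * rng.standard_normal(N)    # illustrative noise level
signal = trend + periodic + noise
```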
Using mkl_config.h for hybrid CPU optimization (P-core affinity, thread tuning):
| N | Default MKL | Optimized | Improvement |
|---|---|---|---|
| 1000 | 2.7 ms | 2.1 ms | 1.25× |
| 5000 | 10.7 ms | 8.9 ms | 1.20× |
| 10000 | 23.4 ms | 20.7 ms | 1.13× |
| 20000 | 41.3 ms | 36.1 ms | 1.14× |
The mkl_config.h header auto-detects hybrid CPUs (Intel 12th-14th gen) and pins threads to P-cores only, avoiding E-core slowdown.
| Test Signal | Correlation with True Signal |
|---|---|
| N=500, SNR=7dB | 0.9895 |
| N=1000, SNR=7dB | 0.9973 |
| N=10000, SNR=7dB | 0.9999 |
| N=20000, SNR=7dB | 1.0000 |
- 3 Decomposition Methods - Sequential power iteration, block power iteration, and randomized SVD
- FFT-Accelerated - O(N log N) Hankel matrix-vector products via convolution theorem
- R2C Optimization - Real-to-complex FFT exploits Hermitian symmetry (50% memory savings)
- Malloc-Free Hot Path - `prepare()` + `decompose_randomized()` for zero-allocation streaming
- Grouped Reconstruction - Combine arbitrary component subsets
- W-Correlation Matrix - Fast DSYRK-based weighted correlation for component grouping
- Automatic Pair Detection - Find sine/cosine pairs via singular value + W-correlation matching
- Variance Explained - Per-component and cumulative variance statistics
- Component Statistics - Scree plot data, gap detection, automatic rank selection
- R-Forecast (LRF) - Linear Recurrent Formula for extrapolation
- V-Forecast - Alternative vector forecast method
- Verticality Check - Stability indicator for forecast reliability
- ESPRIT - Frequency/period estimation from eigenvectors
- Cadzow Iterations - Iterative rank reduction for enhanced denoising
- Gap Filling - Missing value imputation (iterative + simple methods)
- MSSA - Multivariate SSA for multi-channel/correlated series
- MKL-Optimized - Intel MKL for FFT, BLAS, LAPACK, and RNG
- Batched FFT - Process multiple vectors in single MKL call
- Cached FFTs - Pre-computed eigenvector FFTs for fast reconstruction
- OpenMP Ready - Parallel-friendly design
- Header-Only - Single `#include "ssa_opt.h"`
- Python Bindings - Full ctypes wrapper with NumPy integration
- Minimal Dependencies - Just Intel MKL (or compatible BLAS/LAPACK)
```python
import numpy as np
from ssa_wrapper import SSA

# Load your data
prices = np.array([...])  # Your time series

# Decompose
ssa = SSA(prices, L=250)
ssa.decompose(k=30)  # Uses fast randomized SVD

# Extract components
trend = ssa.reconstruct([0])
cycle = ssa.reconstruct([1, 2])

# Forecast
forecast = ssa.forecast([0, 1, 2], n_forecast=50)

# Analysis
print(f"Variance explained: {ssa.variance_explained(0, 2):.1%}")
pairs = ssa.find_periodic_pairs()
```

Or call the C API directly:

```c
#define SSA_OPT_IMPLEMENTATION
#include "ssa_opt_r2c.h"
#include "mkl_config.h"
#include <stdlib.h>

int main() {
    // Initialize MKL (call once at startup)
    mkl_config_ssa_full(1);  // 1 = verbose

    // Setup
    int N = 1000, L = 250, k = 30;  // illustrative sizes
    double signal[1000] = {0};      // your time series, length N

    SSA_Opt ssa = {0};
    ssa_opt_init(&ssa, signal, N, L);

    // Decompose (randomized is fastest)
    ssa_opt_decompose_randomized(&ssa, k, 8);

    // Reconstruct trend
    int trend_group[] = {0};
    double* trend = malloc(N * sizeof(double));
    ssa_opt_reconstruct(&ssa, trend_group, 1, trend);

    // Cleanup
    ssa_opt_free(&ssa);
    free(trend);
    return 0;
}
```

Access results directly via the struct:
```c
ssa->sigma[i]        // Singular values (descending order)
ssa->eigenvalues[i]  // Squared singular values (variance explained)
ssa->U[i * L]        // Left singular vector i (length L)
ssa->V[i * K]        // Right singular vector i (length K)
```
SSA embeds a 1D time series into a 2D trajectory matrix. Given signal x = [x₀, x₁, ..., x_{N-1}] and window length L, we construct an L × K Hankel matrix where K = N - L + 1:

```
          Column 0   Column 1   Column 2   ...   Column K-1
         ┌──────────────────────────────────────────────────┐
Row 0    │  x₀         x₁         x₂        ...   x_{K-1}   │
Row 1    │  x₁         x₂         x₃        ...   x_K       │
Row 2    │  x₂         x₃         x₄        ...   x_{K+1}   │
 ⋮       │  ⋮          ⋮          ⋮         ⋱     ⋮         │
Row L-1  │  x_{L-1}    x_L        x_{L+1}   ...   x_{N-1}   │
         └──────────────────────────────────────────────────┘

H[i,j] = x[i+j]   (constant along anti-diagonals)
```

This matrix is never explicitly formed - it would require O(L × K) ≈ O(N²) memory.
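A toy NumPy sketch of the embedding, forming H explicitly only because N is tiny (as noted above, the library never materializes it):

```python
import numpy as np

x = np.arange(8.0)     # x_0 .. x_7, so N = 8
L = 4
K = len(x) - L + 1     # K = 5

# Explicit trajectory matrix: H[i, j] = x[i + j].
# Fine for a toy example; at real sizes this O(N^2) matrix is never built.
H = np.array([[x[i + j] for j in range(K)] for i in range(L)])
print(H)
# [[0. 1. 2. 3. 4.]
#  [1. 2. 3. 4. 5.]
#  [2. 3. 4. 5. 6.]
#  [3. 4. 5. 6. 7.]]
```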
We compute the top-k singular triplets of H:
```
H ≈ σ₀·u₀·v₀ᵀ + σ₁·u₁·v₁ᵀ + ... + σₖ₋₁·uₖ₋₁·vₖ₋₁ᵀ
```

Each triplet (σᵢ, uᵢ, vᵢ) represents one component. Larger σ = more signal energy.
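Continuing the toy example, a sketch verifying that the rank-k sum of triplets recovers H (this particular H has exact rank 2, so k = 2 is lossless):

```python
import numpy as np

# Reusing the toy H from the previous sketch (L = 4, K = 5)
x = np.arange(8.0)
L, K = 4, 5
H = np.array([[x[i + j] for j in range(K)] for i in range(L)])

U, sigma, Vt = np.linalg.svd(H, full_matrices=False)

# Sum of the top-k rank-1 terms sigma_i * u_i * v_i^T
k = 2
H_k = sum(sigma[i] * np.outer(U[:, i], Vt[i]) for i in range(k))
print(np.allclose(H, H_k))  # True: H[i,j] = i + j has rank 2
```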
Each component forms a rank-1 matrix, which is then averaged back to a 1D series along the anti-diagonals:
```
t=0:  x̃[0] = X[0,0]
t=1:  x̃[1] = (X[0,1] + X[1,0]) / 2
t=2:  x̃[2] = (X[0,2] + X[1,1] + X[2,0]) / 3
...
```
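A direct (non-FFT) NumPy sketch of this averaging step, just to pin down the indexing:

```python
import numpy as np

def diagonal_average(X):
    """Average a component matrix along its anti-diagonals (Hankelization)."""
    L, K = X.shape
    N = L + K - 1
    out = np.zeros(N)
    counts = np.zeros(N)
    for i in range(L):
        for j in range(K):
            out[i + j] += X[i, j]     # accumulate along anti-diagonal t = i + j
            counts[i + j] += 1        # how many entries hit index t
    return out / counts

# A true Hankel matrix averages back to exactly its generating series
x = np.sin(np.arange(8.0))
H = np.array([[x[i + j] for j in range(5)] for i in range(4)])
print(np.allclose(diagonal_average(H), x))  # True
```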
SSA signals satisfy a linear recurrence. The LRF extracts coefficients from the left singular vectors to predict future values:

```
x[n] = a₁·x[n-1] + a₂·x[n-2] + ... + a_{L-1}·x[n-L+1]
```
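A sketch of applying a given recurrence to extend a series. Estimating the coefficients aᵢ from the left singular vectors is the part the library's R-forecast implements; this toy sidesteps it by using a sinusoid whose exact 2-term LRF is known in closed form:

```python
import numpy as np

def lrf_extend(x, a, n_forecast):
    """Extend series x by n_forecast points via the recurrence
    x[n] = a[0]*x[n-1] + a[1]*x[n-2] + ... + a[m-1]*x[n-m]."""
    x = list(x)
    m = len(a)
    for _ in range(n_forecast):
        x.append(sum(a[i] * x[-1 - i] for i in range(m)))
    return np.array(x)

# A pure sinusoid of period P satisfies
# x[n] = 2*cos(2*pi/P)*x[n-1] - x[n-2], so this 2-term LRF is exact:
P = 12
t = np.arange(48)
x = np.sin(2 * np.pi * t / P)
a = [2 * np.cos(2 * np.pi / P), -1.0]
ext = lrf_extend(x, a, 24)
print(np.allclose(ext[48:], np.sin(2 * np.pi * np.arange(48, 72) / P)))  # True
```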
Hankel matrix-vector multiplication equals convolution:

```
Forward:  y = H·v                        Adjoint:  z = Hᵀ·u
y[i] = Σⱼ x[i+j]·v[j]                    z[j] = Σᵢ x[i+j]·u[i]
     = (x ∗ flip(v))[K-1+i]                   = (x ∗ flip(u))[L-1+j]

┌─────────────────────────────────────────────────┐
│  conv(a,b) = IFFT( FFT(a) ⊙ FFT(b) )            │
│                                                 │
│  Complexity: O(N²) direct → O(N log N) with FFT │
└─────────────────────────────────────────────────┘
```

Pre-compute FFT(signal) once at init. Each matvec then costs 1 FFT + 1 pointwise multiply + 1 IFFT.
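A NumPy sketch of the forward matvec y = H·v done both ways, confirming the FFT route matches the direct O(L·K) sum:

```python
import numpy as np

rng = np.random.default_rng(0)
N, L = 64, 16
K = N - L + 1
x = rng.standard_normal(N)
v = rng.standard_normal(K)

# Direct: y[i] = sum_j x[i+j] * v[j]  (cross-correlation, 'valid' part)
y_direct = np.correlate(x, v, mode="valid")        # length L

# FFT route: correlate(x, v) = conv(x, flip(v)); take the 'valid' slice
n = N + K - 1                                      # full linear-conv length
fx = np.fft.rfft(x, n)
fv = np.fft.rfft(v[::-1], n)
y_fft = np.fft.irfft(fx * fv, n)[K - 1 : K - 1 + L]

print(np.allclose(y_direct, y_fft))  # True
```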
Process b vectors simultaneously instead of sequentially:

```
Sequential (b=1):                    Block (b=32):
┌───┐                                ┌───────────────────┐
│ v │ → H·v → u → Hᵀ·u → v           │ v₀ v₁ v₂ ... v₃₁  │ = V_block
└───┘                                └───────────────────┘
× k components                                ↓
× 100 iterations                     H·V_block   (32 FFTs batched)
= 6400 individual FFTs                        ↓
                                     QR(U_block)
                                              ↓
                                     Hᵀ·U_block (32 FFTs batched)
                                              ↓
                                     = 200 batched calls total
```

Benefits:
- MKL batched FFT: single function call for 32 transforms
- Memory locality: V_block columns contiguous in memory
- BLAS3: QR uses GEMM instead of GEMV

Re-orthogonalization is also amortized by running QR only every 5 iterations:

```
Every iteration:       vs.    Every 5 iterations:
iter 1: QR                    iter 1: (skip)
iter 2: QR                    iter 2: (skip)
iter 3: QR                    iter 3: (skip)
iter 4: QR                    iter 4: (skip)
iter 5: QR                    iter 5: QR ←
  ⋮                             ⋮
100 QR calls                  20 QR calls (5× reduction)
```

Stability is maintained by:
- A final QR before Rayleigh-Ritz extraction
- Deflation orthogonalization against previous blocks each iteration

After block iteration converges, the vectors span the correct subspace but aren't individually optimal:
```
1. Project:    M = U_blockᵀ · (H · V_block)    [b × b]

2. Small SVD:  M = Uₛ · Σ · Vₛᵀ                [b × b]

3. Rotate:     U_final = U_block · Uₛ
               V_final = V_block · Vₛ
               σ_final = diag(Σ)
```

Cost: one 32×32 SVD (microseconds) → optimal singular values.
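A NumPy sketch of these three steps on an explicit matrix. The library applies them with implicit FFT matvecs; the perturbed "converged" bases here are synthetic stand-ins for the output of block iteration:

```python
import numpy as np

rng = np.random.default_rng(0)
L, K, b = 40, 60, 8
H = rng.standard_normal((L, K))

# Stand-ins for converged block-iteration bases: orthonormal bases that
# span (almost) the dominant left/right singular subspaces
U_full, s_full, Vt_full = np.linalg.svd(H, full_matrices=False)
U_block = np.linalg.qr(U_full[:, :b] + 1e-6 * rng.standard_normal((L, b)))[0]
V_block = np.linalg.qr(Vt_full[:b].T + 1e-6 * rng.standard_normal((K, b)))[0]

# 1. Project onto the small b x b subspace
M = U_block.T @ (H @ V_block)
# 2. SVD of the small matrix
Us, sigma, Vst = np.linalg.svd(M)
# 3. Rotate the bases to individually optimal singular vectors
U_final = U_block @ Us
V_final = V_block @ Vst.T

print(np.allclose(sigma, s_full[:b], rtol=1e-3))  # True: near the true top-b values
```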
All multi-vector operations use BLAS Level 3:

| Operation | BLAS2 (sequential) | BLAS3 (block) |
|---|---|---|
| Orthogonalize against previous | k × GEMV | Single GEMM |
| Project out components | k × AXPY | Single GEMM |
GEMM achieves near-peak FLOPS due to cache reuse and SIMD.
For very large problems, randomized SVD is fastest:
```
1. Random projection:    Y = H · Ω,   where Ω is a K × (k+p) random matrix
2. Orthogonalize:        Q = qr(Y)
3. Project:              B = Qᵀ · H
4. SVD of small matrix:  B = U_B · Σ · Vᵀ
5. Recover:              U = Q · U_B
```

Complexity: O((k+p) × N log N) vs O(k × iterations × N log N) for power iteration.
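The same five steps in NumPy on an explicit low-rank matrix; SSA-Opt performs steps 1 and 3 with implicit FFT-based products instead of dense GEMMs:

```python
import numpy as np

def randomized_svd(H, k, p=8, rng=None):
    """Basic randomized SVD (Halko et al. 2011), no power iterations."""
    rng = rng or np.random.default_rng(0)
    K = H.shape[1]
    Omega = rng.standard_normal((K, k + p))   # 1. random test matrix
    Y = H @ Omega                             #    range sampling
    Q, _ = np.linalg.qr(Y)                    # 2. orthonormal range basis
    B = Q.T @ H                               # 3. project to (k+p) x K
    U_B, sigma, Vt = np.linalg.svd(B, full_matrices=False)  # 4. small SVD
    U = Q @ U_B                               # 5. recover left vectors
    return U[:, :k], sigma[:k], Vt[:k]

rng = np.random.default_rng(1)
H = rng.standard_normal((200, 15)) @ rng.standard_normal((15, 300))  # rank 15
U, s, Vt = randomized_svd(H, k=10, p=8)
s_true = np.linalg.svd(H, compute_uv=False)[:10]
print(np.allclose(s, s_true))  # True: rank(H) <= k+p, so the sketch is exact
```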
Real-to-Complex FFT exploits conjugate symmetry (see the sketch after this list):
- Output: N/2+1 complex values (vs N for C2C)
- 50% less memory for FFT buffers
- ~17% faster than complex-to-complex
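NumPy's rfft shows the same symmetry; only the N/2+1 non-redundant bins are stored:

```python
import numpy as np

x = np.random.default_rng(0).standard_normal(1024)  # real input, N = 1024

full = np.fft.fft(x)     # 1024 complex values (half are redundant)
half = np.fft.rfft(x)    # 513 complex values = N/2 + 1

print(len(half))                                 # 513
print(np.allclose(full[:513], half))             # True: identical non-negative bins
print(np.allclose(x, np.fft.irfft(half, 1024)))  # True: lossless round trip
```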
W-correlation matrix requires FFTs of all component reconstructions. With caching:
| Operation | Without Cache | With Cache |
|---|---|---|
| wcorr_matrix | 2n forward FFTs | 0 forward FFTs |
| Speedup | 1× | 3× |
Call `ssa_opt_cache_ffts()` once after decomposition if computing W-correlation multiple times.
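For reference, a plain-NumPy sketch of the W-correlation itself, using the standard anti-diagonal weights w[t] = min(t+1, L, K, N−t); the library computes the full matrix via DSYRK with cached FFTs:

```python
import numpy as np

def wcorr(a, b, L):
    """Weighted correlation of two reconstructed series of length N,
    where w[t] = min(t+1, L, K, N-t) counts how often index t appears
    on the trajectory matrix's anti-diagonals."""
    N = len(a)
    K = N - L + 1
    t = np.arange(N)
    w = np.minimum.reduce([t + 1, np.full(N, L), np.full(N, K), N - t])
    inner = lambda u, v: np.sum(w * u * v)
    return inner(a, b) / np.sqrt(inner(a, a) * inner(b, b))

# Well-separated components have near-zero w-correlation
t = np.arange(200.0)
s = np.sin(2 * np.pi * t / 20)
trend = 0.01 * t
print(wcorr(s, s, L=50))                 # 1.0 (self-correlation)
print(abs(wcorr(s, trend, L=50)) < 0.1)  # True (separable pair)
```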
```bash
# Build the shared library
cd MKL
mkdir build && cd build
cmake .. -DUSE_MKL=ON
cmake --build . --config Release

# Copy ssa.dll/libssa.so to py/ folder, then:
cd ../py
python -c "from ssa_wrapper import SSA; print('OK')"
```

For C, it's header-only:

```c
#define SSA_OPT_IMPLEMENTATION
#include "ssa_opt_r2c.h"
```

Link with MKL:
```bash
# Windows
cl /O2 your_code.c /I"%MKLROOT%\include" mkl_rt.lib

# Linux
gcc -O3 your_code.c -I${MKLROOT}/include -lmkl_rt -lm
```

Run the benchmark suite:
```bash
cd py

# Speed comparison vs Rssa
python benchmark_vs_rssa.py --speed

# Compare MKL configurations
python benchmark_vs_rssa.py --mkl-config

# Internal method comparison (seq/block/randomized)
python benchmark_vs_rssa.py --internal

# All benchmarks
python benchmark_vs_rssa.py --all
```

Parameter sensitivity tests:
```bash
python ssa_parameter_tests.py --snr         # Noise robustness
python ssa_parameter_tests.py --window      # L sensitivity
python ssa_parameter_tests.py --components  # k sweep
python ssa_parameter_tests.py --all
```

- Memory: O(k × N) for storing singular vectors
- Not for streaming: Batch processing only
- Minimum size: N > 100 recommended
- Golyandina, N., & Zhigljavsky, A. (2013). Singular Spectrum Analysis for Time Series. Springer.
- Korobeynikov, A. (2010). Computation- and space-efficient implementation of SSA. Statistics and Its Interface, 3(3).
- Halko, N., Martinsson, P. G., & Tropp, J. A. (2011). Finding structure with randomness. SIAM Review, 53(2).
GPL 3.0 - See LICENSE file.