A fast, header-only C library for Singular Spectrum Analysis with Python bindings, FFT-accelerated operations, and Intel MKL optimization.
17-30× faster than Rssa (the gold-standard R implementation).
Singular Spectrum Analysis decomposes time series into:
- Trends - Long-term movements
- Periodic components - Cycles, seasonality
- Noise - Random fluctuations
Unlike Fourier analysis, SSA is non-parametric and adapts to your data's structure. Ideal for financial analysis, signal processing, and anywhere you need clean signal extraction.
SSA recovers clean signal from heavy noise (SNR: 4.1 → 26.1 dB).
Ensemble SSA extracts smooth trend from volatile Bitcoin prices (r = 0.987).
2-year ahead forecast capturing trend and 12-month seasonality (RMSE: 47.5).
Benchmarked with benchmark_vs_rssa.py on Intel Core i9-14900KF, MKL 2025.0.3:
- Components: k = 50
- Window length: L = N/4 (standard SSA choice)
- Signal: Synthetic (trend + periodic + noise; see the sketch below)
| N | L | k | SSA-Opt (ms) | Rssa (ms) | Speedup | Correlation |
|---|---|---|---|---|---|---|
| 500 | 125 | 50 | 2.0 | 33.9 | 17.0× | 0.9767 |
| 1000 | 250 | 50 | 1.6 | 46.8 | 28.5× | 0.9938 |
| 2000 | 500 | 50 | 2.5 | 64.3 | 25.3× | 0.9982 |
| 5000 | 1250 | 50 | 7.2 | 148.3 | 20.5× | 0.9995 |
| 10000 | 2500 | 50 | 12.2 | 370.8 | 30.5× | 0.9999 |
| 20000 | 5000 | 50 | 26.3 | 618.9 | 23.6× | 1.0000 |
Summary: Average 24× speedup, peak 30.5× at N=10,000. Throughput: 625 SSA/sec at N=1000.
Rssa uses PROPACK (Lanczos bidiagonalization). SSA-Opt uses randomized SVD with MKL.
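For context, a minimal NumPy sketch of the kind of synthetic test signal described above (trend + periodic + noise). The exact amplitudes, period, and noise level used by benchmark_vs_rssa.py are not reproduced here; these values are illustrative assumptions:

```python
import numpy as np

# Hypothetical stand-in for the benchmark's synthetic signal:
# slow trend + one periodic component + Gaussian noise.
rng = np.random.default_rng(0)
N = 1000
t = np.arange(N)
trend = 0.002 * t                       # slow linear trend
periodic = np.sin(2 * np.pi * t / 50)   # 50-sample cycle
noise = 0.5 * rng.standard_normal(N)    # illustrative noise level
signal = trend + periodic + noise
```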
Using mkl_config.h for hybrid CPU optimization (P-core affinity, thread tuning):
| N | Default MKL | Optimized | Improvement |
|---|---|---|---|
| 1000 | 2.7 ms | 2.1 ms | 1.25× |
| 5000 | 10.7 ms | 8.9 ms | 1.20× |
| 10000 | 23.4 ms | 20.7 ms | 1.13× |
| 20000 | 41.3 ms | 36.1 ms | 1.14× |
The mkl_config.h header auto-detects hybrid CPUs (Intel 12th-14th gen) and pins threads to P-cores only, avoiding E-core slowdown.
| Test Signal | Correlation with True Signal |
|---|---|
| N=500, SNR=7dB | 0.9895 |
| N=1000, SNR=7dB | 0.9973 |
| N=10000, SNR=7dB | 0.9999 |
| N=20000, SNR=7dB | 1.0000 |
- 3 Decomposition Methods - Sequential power iteration, block power iteration, and randomized SVD
- FFT-Accelerated - O(N log N) Hankel matrix-vector products via convolution theorem
- R2C Optimization - Real-to-complex FFT exploits Hermitian symmetry (50% memory savings)
- Malloc-Free Hot Path - `prepare()` + `decompose_randomized()` for zero-allocation streaming
- Grouped Reconstruction - Combine arbitrary component subsets
- W-Correlation Matrix - Fast DSYRK-based weighted correlation for component grouping
- Automatic Pair Detection - Find sine/cosine pairs via singular value + W-correlation matching
- Variance Explained - Per-component and cumulative variance statistics
- Component Statistics - Scree plot data, gap detection, automatic rank selection
- R-Forecast (LRF) - Linear Recurrent Formula for extrapolation
- V-Forecast - Alternative vector forecast method
- Verticality Check - Stability indicator for forecast reliability
- ESPRIT - Frequency/period estimation from eigenvectors
- Cadzow Iterations - Iterative rank reduction for enhanced denoising
- Gap Filling - Missing value imputation (iterative + simple methods)
- MSSA - Multivariate SSA for multi-channel/correlated series
- MKL-Optimized - Intel MKL for FFT, BLAS, LAPACK, and RNG
- Batched FFT - Process multiple vectors in single MKL call
- Cached FFTs - Pre-computed eigenvector FFTs for fast reconstruction
- OpenMP Ready - Parallel-friendly design
- Header-Only - Single `#include "ssa_opt.h"`
- Python Bindings - Full ctypes wrapper with NumPy integration
- Minimal Dependencies - Just Intel MKL (or compatible BLAS/LAPACK)
```python
import numpy as np
from ssa_wrapper import SSA

# Load your data
prices = np.array([...])  # Your time series

# Decompose
ssa = SSA(prices, L=250)
ssa.decompose(k=30)  # Uses fast randomized SVD

# Extract components
trend = ssa.reconstruct([0])
cycle = ssa.reconstruct([1, 2])

# Forecast
forecast = ssa.forecast([0, 1, 2], n_forecast=50)

# Analysis
print(f"Variance explained: {ssa.variance_explained(0, 2):.1%}")
pairs = ssa.find_periodic_pairs()
```

Or call the C API directly:

```c
#define SSA_OPT_IMPLEMENTATION
#include "ssa_opt_r2c.h"
#include "mkl_config.h"
#include <stdlib.h>

int main() {
    // Initialize MKL (call once at startup)
    mkl_config_ssa_full(1);  // 1 = verbose

    // Setup
    int N = 1000, L = 250, k = 30;  // illustrative sizes
    double signal[1000] = {0};      // your time series, length N

    SSA_Opt ssa = {0};
    ssa_opt_init(&ssa, signal, N, L);

    // Decompose (randomized is fastest)
    ssa_opt_decompose_randomized(&ssa, k, 8);

    // Reconstruct trend
    int trend_group[] = {0};
    double* trend = malloc(N * sizeof(double));
    ssa_opt_reconstruct(&ssa, trend_group, 1, trend);

    // Cleanup
    ssa_opt_free(&ssa);
    free(trend);
    return 0;
}
```

Access results directly via the struct:
```c
ssa->sigma[i]        // Singular values (descending order)
ssa->eigenvalues[i]  // Squared singular values (variance explained)
ssa->U[i * L]        // Left singular vector i (length L)
ssa->V[i * K]        // Right singular vector i (length K)
```
SSA embeds a 1D time series into a 2D trajectory matrix. Given signal x = [x₀, x₁, ..., x_{N-1}] and window length L, we construct an L × K Hankel matrix where K = N - L + 1:

```
          Column 0   Column 1   Column 2   ...   Column K-1
         ┌──────────────────────────────────────────────────┐
Row 0    │  x₀         x₁         x₂        ...   x_{K-1}   │
Row 1    │  x₁         x₂         x₃        ...   x_K       │
Row 2    │  x₂         x₃         x₄        ...   x_{K+1}   │
 ⋮       │  ⋮          ⋮          ⋮         ⋱     ⋮         │
Row L-1  │  x_{L-1}    x_L        x_{L+1}   ...   x_{N-1}   │
         └──────────────────────────────────────────────────┘

H[i,j] = x[i+j]   (constant along anti-diagonals)
```

This matrix is never explicitly formed - it would require O(L × K) ≈ O(N²) memory.
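A toy NumPy sketch of the embedding, forming H explicitly only because N is tiny (as noted above, the library never materializes it):

```python
import numpy as np

x = np.arange(8.0)     # x_0 .. x_7, so N = 8
L = 4
K = len(x) - L + 1     # K = 5

# Explicit trajectory matrix: H[i, j] = x[i + j].
# Fine for a toy example; at real sizes this O(N^2) matrix is never built.
H = np.array([[x[i + j] for j in range(K)] for i in range(L)])
print(H)
# [[0. 1. 2. 3. 4.]
#  [1. 2. 3. 4. 5.]
#  [2. 3. 4. 5. 6.]
#  [3. 4. 5. 6. 7.]]
```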
We compute the top-k singular triplets of H:
```
H ≈ σ₀·u₀·v₀ᵀ + σ₁·u₁·v₁ᵀ + ... + σₖ₋₁·uₖ₋₁·vₖ₋₁ᵀ
```

Each triplet (σᵢ, uᵢ, vᵢ) represents one component. Larger σ = more signal energy.
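Continuing the toy example, a sketch verifying that the rank-k sum of triplets recovers H (this particular H has exact rank 2, so k = 2 is lossless):

```python
import numpy as np

# Reusing the toy H from the previous sketch (L = 4, K = 5)
x = np.arange(8.0)
L, K = 4, 5
H = np.array([[x[i + j] for j in range(K)] for i in range(L)])

U, sigma, Vt = np.linalg.svd(H, full_matrices=False)

# Sum of the top-k rank-1 terms sigma_i * u_i * v_i^T
k = 2
H_k = sum(sigma[i] * np.outer(U[:, i], Vt[i]) for i in range(k))
print(np.allclose(H, H_k))  # True: H[i,j] = i + j has rank 2
```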
Each component forms a rank-1 matrix, which is then averaged back to a 1D series along the anti-diagonals:
```
t=0:  x̃[0] = X[0,0]
t=1:  x̃[1] = (X[0,1] + X[1,0]) / 2
t=2:  x̃[2] = (X[0,2] + X[1,1] + X[2,0]) / 3
...
```
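A direct (non-FFT) NumPy sketch of this averaging step, just to pin down the indexing:

```python
import numpy as np

def diagonal_average(X):
    """Average a component matrix along its anti-diagonals (Hankelization)."""
    L, K = X.shape
    N = L + K - 1
    out = np.zeros(N)
    counts = np.zeros(N)
    for i in range(L):
        for j in range(K):
            out[i + j] += X[i, j]     # accumulate along anti-diagonal t = i + j
            counts[i + j] += 1        # how many entries hit index t
    return out / counts

# A true Hankel matrix averages back to exactly its generating series
x = np.sin(np.arange(8.0))
H = np.array([[x[i + j] for j in range(5)] for i in range(4)])
print(np.allclose(diagonal_average(H), x))  # True
```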
SSA signals satisfy a linear recurrence. The LRF extracts coefficients from the left singular vectors to predict future values:

```
x[n] = a₁·x[n-1] + a₂·x[n-2] + ... + a_{L-1}·x[n-L+1]
```
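A sketch of applying a given recurrence to extend a series. Estimating the coefficients aᵢ from the left singular vectors is the part the library's R-forecast implements; this toy sidesteps it by using a sinusoid whose exact 2-term LRF is known in closed form:

```python
import numpy as np

def lrf_extend(x, a, n_forecast):
    """Extend series x by n_forecast points via the recurrence
    x[n] = a[0]*x[n-1] + a[1]*x[n-2] + ... + a[m-1]*x[n-m]."""
    x = list(x)
    m = len(a)
    for _ in range(n_forecast):
        x.append(sum(a[i] * x[-1 - i] for i in range(m)))
    return np.array(x)

# A pure sinusoid of period P satisfies
# x[n] = 2*cos(2*pi/P)*x[n-1] - x[n-2], so this 2-term LRF is exact:
P = 12
t = np.arange(48)
x = np.sin(2 * np.pi * t / P)
a = [2 * np.cos(2 * np.pi / P), -1.0]
ext = lrf_extend(x, a, 24)
print(np.allclose(ext[48:], np.sin(2 * np.pi * np.arange(48, 72) / P)))  # True
```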
Hankel matrix-vector multiplication equals convolution:

```
Forward:  y = H·v                        Adjoint:  z = Hᵀ·u
y[i] = Σⱼ x[i+j]·v[j]                    z[j] = Σᵢ x[i+j]·u[i]
     = (x ∗ flip(v))[K-1+i]                   = (x ∗ flip(u))[L-1+j]

┌─────────────────────────────────────────────────┐
│  conv(a,b) = IFFT( FFT(a) ⊙ FFT(b) )            │
│                                                 │
│  Complexity: O(N²) direct → O(N log N) with FFT │
└─────────────────────────────────────────────────┘
```

Pre-compute FFT(signal) once at init. Each matvec then costs 1 FFT + 1 pointwise multiply + 1 IFFT.
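A NumPy sketch of the forward matvec y = H·v done both ways, confirming the FFT route matches the direct O(L·K) sum:

```python
import numpy as np

rng = np.random.default_rng(0)
N, L = 64, 16
K = N - L + 1
x = rng.standard_normal(N)
v = rng.standard_normal(K)

# Direct: y[i] = sum_j x[i+j] * v[j]  (cross-correlation, 'valid' part)
y_direct = np.correlate(x, v, mode="valid")        # length L

# FFT route: correlate(x, v) = conv(x, flip(v)); take the 'valid' slice
n = N + K - 1                                      # full linear-conv length
fx = np.fft.rfft(x, n)
fv = np.fft.rfft(v[::-1], n)
y_fft = np.fft.irfft(fx * fv, n)[K - 1 : K - 1 + L]

print(np.allclose(y_direct, y_fft))  # True
```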
Process b vectors simultaneously instead of sequentially:

```
Sequential (b=1):                    Block (b=32):
┌───┐                                ┌───────────────────┐
│ v │ → H·v → u → Hᵀ·u → v           │ v₀ v₁ v₂ ... v₃₁  │ = V_block
└───┘                                └───────────────────┘
× k components                                ↓
× 100 iterations                     H·V_block   (32 FFTs batched)
= 6400 individual FFTs                        ↓
                                     QR(U_block)
                                              ↓
                                     Hᵀ·U_block (32 FFTs batched)
                                              ↓
                                     = 200 batched calls total
```

Benefits:
- MKL batched FFT: single function call for 32 transforms
- Memory locality: V_block columns contiguous in memory
- BLAS3: QR uses GEMM instead of GEMV

Re-orthogonalization is also amortized by running QR only every 5 iterations:

```
Every iteration:       vs.    Every 5 iterations:
iter 1: QR                    iter 1: (skip)
iter 2: QR                    iter 2: (skip)
iter 3: QR                    iter 3: (skip)
iter 4: QR                    iter 4: (skip)
iter 5: QR                    iter 5: QR ←
  ⋮                             ⋮
100 QR calls                  20 QR calls (5× reduction)
```

Stability is maintained by:
- A final QR before Rayleigh-Ritz extraction
- Deflation orthogonalization against previous blocks each iteration

After block iteration converges, the vectors span the correct subspace but aren't individually optimal:
```
1. Project:    M = U_blockᵀ · (H · V_block)    [b × b]

2. Small SVD:  M = Uₛ · Σ · Vₛᵀ                [b × b]

3. Rotate:     U_final = U_block · Uₛ
               V_final = V_block · Vₛ
               σ_final = diag(Σ)
```

Cost: one 32×32 SVD (microseconds) → optimal singular values.
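A NumPy sketch of these three steps on an explicit matrix. The library applies them with implicit FFT matvecs; the perturbed "converged" bases here are synthetic stand-ins for the output of block iteration:

```python
import numpy as np

rng = np.random.default_rng(0)
L, K, b = 40, 60, 8
H = rng.standard_normal((L, K))

# Stand-ins for converged block-iteration bases: orthonormal bases that
# span (almost) the dominant left/right singular subspaces
U_full, s_full, Vt_full = np.linalg.svd(H, full_matrices=False)
U_block = np.linalg.qr(U_full[:, :b] + 1e-6 * rng.standard_normal((L, b)))[0]
V_block = np.linalg.qr(Vt_full[:b].T + 1e-6 * rng.standard_normal((K, b)))[0]

# 1. Project onto the small b x b subspace
M = U_block.T @ (H @ V_block)
# 2. SVD of the small matrix
Us, sigma, Vst = np.linalg.svd(M)
# 3. Rotate the bases to individually optimal singular vectors
U_final = U_block @ Us
V_final = V_block @ Vst.T

print(np.allclose(sigma, s_full[:b], rtol=1e-3))  # True: near the true top-b values
```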
All multi-vector operations use BLAS Level 3:

| Operation | BLAS2 (sequential) | BLAS3 (block) |
|---|---|---|
| Orthogonalize against previous | k × GEMV | Single GEMM |
| Project out components | k × AXPY | Single GEMM |
GEMM achieves near-peak FLOPS due to cache reuse and SIMD.
For very large problems, randomized SVD is fastest:
```
1. Random projection:    Y = H · Ω,   where Ω is a K × (k+p) random matrix
2. Orthogonalize:        Q = qr(Y)
3. Project:              B = Qᵀ · H
4. SVD of small matrix:  B = U_B · Σ · Vᵀ
5. Recover:              U = Q · U_B
```

Complexity: O((k+p) × N log N) vs O(k × iterations × N log N) for power iteration.
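The same five steps in NumPy on an explicit low-rank matrix; SSA-Opt performs steps 1 and 3 with implicit FFT-based products instead of dense GEMMs:

```python
import numpy as np

def randomized_svd(H, k, p=8, rng=None):
    """Basic randomized SVD (Halko et al. 2011), no power iterations."""
    rng = rng or np.random.default_rng(0)
    K = H.shape[1]
    Omega = rng.standard_normal((K, k + p))   # 1. random test matrix
    Y = H @ Omega                             #    range sampling
    Q, _ = np.linalg.qr(Y)                    # 2. orthonormal range basis
    B = Q.T @ H                               # 3. project to (k+p) x K
    U_B, sigma, Vt = np.linalg.svd(B, full_matrices=False)  # 4. small SVD
    U = Q @ U_B                               # 5. recover left vectors
    return U[:, :k], sigma[:k], Vt[:k]

rng = np.random.default_rng(1)
H = rng.standard_normal((200, 15)) @ rng.standard_normal((15, 300))  # rank 15
U, s, Vt = randomized_svd(H, k=10, p=8)
s_true = np.linalg.svd(H, compute_uv=False)[:10]
print(np.allclose(s, s_true))  # True: rank(H) <= k+p, so the sketch is exact
```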
Real-to-Complex FFT exploits conjugate symmetry (see the sketch after this list):
- Output: N/2+1 complex values (vs N for C2C)
- 50% less memory for FFT buffers
- ~17% faster than complex-to-complex
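NumPy's rfft shows the same symmetry; only the N/2+1 non-redundant bins are stored:

```python
import numpy as np

x = np.random.default_rng(0).standard_normal(1024)  # real input, N = 1024

full = np.fft.fft(x)     # 1024 complex values (half are redundant)
half = np.fft.rfft(x)    # 513 complex values = N/2 + 1

print(len(half))                                 # 513
print(np.allclose(full[:513], half))             # True: identical non-negative bins
print(np.allclose(x, np.fft.irfft(half, 1024)))  # True: lossless round trip
```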
W-correlation matrix requires FFTs of all component reconstructions. With caching:
| Operation | Without Cache | With Cache |
|---|---|---|
| wcorr_matrix | 2n forward FFTs | 0 forward FFTs |
| Speedup | 1× | 3× |
Call `ssa_opt_cache_ffts()` once after decomposition if computing W-correlation multiple times.
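For reference, a plain-NumPy sketch of the W-correlation itself, using the standard anti-diagonal weights w[t] = min(t+1, L, K, N−t); the library computes the full matrix via DSYRK with cached FFTs:

```python
import numpy as np

def wcorr(a, b, L):
    """Weighted correlation of two reconstructed series of length N,
    where w[t] = min(t+1, L, K, N-t) counts how often index t appears
    on the trajectory matrix's anti-diagonals."""
    N = len(a)
    K = N - L + 1
    t = np.arange(N)
    w = np.minimum.reduce([t + 1, np.full(N, L), np.full(N, K), N - t])
    inner = lambda u, v: np.sum(w * u * v)
    return inner(a, b) / np.sqrt(inner(a, a) * inner(b, b))

# Well-separated components have near-zero w-correlation
t = np.arange(200.0)
s = np.sin(2 * np.pi * t / 20)
trend = 0.01 * t
print(wcorr(s, s, L=50))                 # 1.0 (self-correlation)
print(abs(wcorr(s, trend, L=50)) < 0.1)  # True (separable pair)
```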
```bash
# Build the shared library
cd MKL
mkdir build && cd build
cmake .. -DUSE_MKL=ON
cmake --build . --config Release

# Copy ssa.dll/libssa.so to py/ folder, then:
cd ../py
python -c "from ssa_wrapper import SSA; print('OK')"
```

For C, it's header-only:

```c
#define SSA_OPT_IMPLEMENTATION
#include "ssa_opt_r2c.h"
```

Link with MKL:
```bash
# Windows
cl /O2 your_code.c /I"%MKLROOT%\include" mkl_rt.lib

# Linux
gcc -O3 your_code.c -I${MKLROOT}/include -lmkl_rt -lm
```

Run the benchmark suite:
```bash
cd py

# Speed comparison vs Rssa
python benchmark_vs_rssa.py --speed

# Compare MKL configurations
python benchmark_vs_rssa.py --mkl-config

# Internal method comparison (seq/block/randomized)
python benchmark_vs_rssa.py --internal

# All benchmarks
python benchmark_vs_rssa.py --all
```

Parameter sensitivity tests:
```bash
python ssa_parameter_tests.py --snr         # Noise robustness
python ssa_parameter_tests.py --window      # L sensitivity
python ssa_parameter_tests.py --components  # k sweep
python ssa_parameter_tests.py --all
```

- Memory: O(k × N) for storing singular vectors
- Not for streaming: Batch processing only
- Minimum size: N > 100 recommended
- Golyandina, N., & Zhigljavsky, A. (2013). Singular Spectrum Analysis for Time Series. Springer.
- Korobeynikov, A. (2010). Computation- and space-efficient implementation of SSA. Statistics and Its Interface, 3(3).
- Halko, N., Martinsson, P. G., & Tropp, J. A. (2011). Finding structure with randomness. SIAM Review, 53(2).
GPL 3.0 - See LICENSE file.