Mirror from this repository
This is an implementation of the CANDECOMP/PARAFAC (CP) model for tensor factorization of non-negative data. Missing values are handled by marginalization, i.e. they are ignored during optimization. The code was used in a project at the Technical University of Denmark as part of one of the authors' master's degree, which ultimately led to the publication "Non-negative Tensor Factorization with missing data for the modeling of gene expressions in the Human Brain". All implementation was done in MATLAB.
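To illustrate what ignoring missing values during optimization can look like, here is a minimal sketch for a 3-way tensor. It builds a CP reconstruction from three factor matrices and sums the squared error over observed (non-NaN) entries only. The variable names (`A`, `B`, `C`, `Xhat`, `observed`, `sse`) are assumptions made for this example and are not taken from `cpNonNeg.m`, whose internal handling may differ.

```matlab
% Sketch: CP reconstruction and masked squared error for a 3-way tensor.
% All names here are illustrative assumptions, not the repository's code.
N = [4 3 2];  D = 2;                   % tensor size and number of components
A = rand(N(1), D);  B = rand(N(2), D);  C = rand(N(3), D);

% CP model: Xhat(i,j,k) = sum_d A(i,d) * B(j,d) * C(k,d)
Xhat = zeros(N);
for d = 1:D
    outerAB = A(:, d) * B(:, d)';      % N(1) x N(2) rank-one matrix
    for k = 1:N(3)
        Xhat(:, :, k) = Xhat(:, :, k) + outerAB * C(k, d);
    end
end

X = Xhat + 0.1 * rand(N);              % noisy non-negative data
X([2 7 15]) = NaN;                     % mark three entries as missing

observed = ~isnan(X);                  % logical mask of observed entries
resid = X(observed) - Xhat(observed);  % residuals over observed entries only
sse = sum(resid .^ 2);                 % missing entries contribute nothing
```

Because the NaN entries are excluded by the mask, they never enter the error, so a fit driven by `sse` is shaped only by the data that were actually observed.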
- cpNonNeg.m - main MATLAB function
- cpNonNeg_sub.m - NMF solver for CP-subproblem
- krprod.m - Khatri-Rao product for tensors
- matricizing.m - matricizing operation
- tmult.m - tensor multiplication (mode specific)
- unmatricizing.m - tensor reconstruction from matrix
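As a rough illustration of how the Khatri-Rao product and matricizing relate in a CP model, the sketch below checks that the mode-1 unfolding of a 3-way CP tensor equals the first factor times the transposed Khatri-Rao product of the other two. It uses only built-in functions (`kron`, `reshape`). The variable names and the unfolding convention (column-major flattening of modes 2 and 3) are assumptions for this example; `krprod.m`, `matricizing.m` and `unmatricizing.m` may order modes differently.

```matlab
% Sketch: Khatri-Rao product and mode-1 matricization for a 3-way CP tensor.
% Names and conventions are assumptions; the repository's helpers may differ.
N = [4 3 2];  D = 2;
A = rand(N(1), D);  B = rand(N(2), D);  C = rand(N(3), D);

% Khatri-Rao product: column-wise Kronecker product of C and B
KR = zeros(N(3) * N(2), D);
for d = 1:D
    KR(:, d) = kron(C(:, d), B(:, d));
end

% Build the CP tensor element by element from its definition
X = zeros(N);
for i = 1:N(1)
    for j = 1:N(2)
        for k = 1:N(3)
            X(i, j, k) = sum(A(i, :) .* B(j, :) .* C(k, :));
        end
    end
end

% Mode-1 matricization: mode 1 indexes rows, modes 2 and 3 are flattened
X1 = reshape(X, N(1), N(2) * N(3));

% The unfolding equals A times the transposed Khatri-Rao product
maxAbsDiff = max(max(abs(X1 - A * KR')));

% Unmatricizing: reshape the matrix back to the original tensor size
Xback = reshape(X1, N);
```

Here `maxAbsDiff` should be at the level of floating-point round-off. This identity is what lets each factor of a CP model be updated through an ordinary matrix problem, one mode at a time.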
The following script (available in the repository) shows basic usage of the code. NB! The script terminates quickly and will not necessarily give meaningful results, since the data are random.
```matlab
% example script
%% Generate synthetic data
D_true = 5;
N = [1000 50 25]; % Tensor dimensions
Nx = length(N);
F = cell(Nx,1);
for i = 1:Nx
    F{i} = rand(N(i),D_true);
end
% Diagonal identity tensor
I = zeros(D_true*ones(1,Nx));
for j = 1:D_true
    I(j,j,j) = 1;
end
Y = tmult(I,F{1},1); % Data tensor
for ip = 2:Nx
    Y = tmult(Y,F{ip},ip);
end
sig2 = 0.5; % noise level
C = 5; % affine transformation to ensure non-negativity
X = Y + sqrt(sig2)*randn(N) + C*ones(N);
assert(min(X(:))>0);

%% Holdout missing data
p = 0.20; % holdout fraction (missing data)
NE = prod(size(X)); % number of elements in tensor
R = rand(NE,1)>(1-p); % holdout logical indices
X(R) = nan; % missing values are treated as NaN

%% Model specification
D = 5; % number of latent components in the model
Finit = cell(Nx,1); % initialization of factors (default)
scale = nanstd(X(:)); % scale of data
for i = 1:Nx
    Finit{i} = (scale.^(1/Nx))*rand(N(i),D);
end
% options
options.maxiter = 250; % number of iterations
options.mu = 0; % no multiplicative update steps are taken
options.hals = 1; % hierarchical alternating least squares steps are taken

%% Run
[FACT,SSEv,CPUt] = cpNonNeg(X,D,Finit,options);
% FACT gives back factors in a cell array just as Finit was initialized
% SSEv is the sum of squared (reconstruction) errors in each iteration
% (vector)
% CPUt is the CPU time used in each iteration
```

Written by: Søren Føns Vind Nielsen and Morten Mørup
CogSys, Technical University of Denmark, May 2014