Multiplying S and V is exactly what you have to do to perform dimensionality reduction with SVD/LSA.
>>> import numpy as np
>>> C = np.array([[1, 0, 1, 0, 0, 0],
...               [0, 1, 0, 0, 0, 0],
...               [1, 1, 0, 0, 0, 0],
...               [1, 0, 0, 1, 1, 0],
...               [0, 0, 0, 1, 0, 1]])
>>> from scipy.linalg import svd
>>> U, s, VT = svd(C, full_matrices=False)
>>> s[2:] = 0
>>> np.dot(np.diag(s), VT)
array([[ 1.61889806,  0.60487661,  0.44034748,  0.96569316,  0.70302032,  0.26267284],
       [-0.45671719, -0.84256593, -0.29617436,  0.99731918,  0.35057241,  0.64674677],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,  0.        ]])
This gives a matrix where all but the first two rows are zeros, so the zero rows can be dropped, and in practice this smaller matrix is the one you would use in applications:
>>> np.dot(np.diag(s[:2]), VT[:2])
array([[ 1.61889806,  0.60487661,  0.44034748,  0.96569316,  0.70302032,  0.26267284],
       [-0.45671719, -0.84256593, -0.29617436,  0.99731918,  0.35057241,  0.64674677]])
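As a usage sketch (the query vector q below is made up, not part of the original example): to fold a new document or query into this reduced space, project its raw term-count vector with U[:, :2].T. This works because C = U.dot(np.diag(s)).dot(VT) and U is orthogonal, so U.T.dot(C) recovers np.diag(s).dot(VT) column by column:

>>> # Hypothetical fold-in: project a new term-count vector into the
>>> # same 2-d space as the columns of np.dot(np.diag(s[:2]), VT[:2]).
>>> q = np.array([1, 0, 0, 0, 1])   # made-up query containing terms 0 and 4
>>> q2 = np.dot(U[:, :2].T, q)      # its 2-d coordinates
>>> q2.shape
(2,)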
What the PDF describes on page 10 is the recipe for getting a low-rank reconstruction of the input C. Rank != dimensionality: the reconstruction has the same shape as C, and its sheer size and density make it impractical to use in LSA; its purpose is mostly mathematical. One thing you can do with it is check how good the reconstruction is for various values of k:
>>> U, s, VT = svd(C, full_matrices=False)
>>> C2 = np.dot(U[:, :2], np.dot(np.diag(s[:2]), VT[:2]))
>>> from scipy.spatial.distance import euclidean
>>> euclidean(C2.ravel(), C.ravel())   # Frobenius norm of C2 - C
1.6677932876555255
>>> C3 = np.dot(U[:, :3], np.dot(np.diag(s[:3]), VT[:3]))
>>> euclidean(C3.ravel(), C.ravel())
1.0747879905228703
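These distances can also be read directly off the singular values: by the Eckart–Young theorem (a standard fact about truncated SVD, not something the PDF spells out), the Frobenius error of the rank-k truncation is the root-sum-square of the discarded singular values:

>>> # Frobenius error of the rank-k truncation = sqrt(sum of squared
>>> # discarded singular values); should match the distances above
>>> # up to floating-point rounding.
>>> err2 = np.sqrt(np.sum(s[2:] ** 2))   # ~1.6678
>>> err3 = np.sqrt(np.sum(s[3:] ** 2))   # ~1.0748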
Sanity check against scikit-learn's TruncatedSVD (full disclosure: I wrote that):
>>> from sklearn.decomposition import TruncatedSVD
>>> TruncatedSVD(n_components=2).fit_transform(C.T)
array([[ 1.61889806, -0.45671719],
       [ 0.60487661, -0.84256593],
       [ 0.44034748, -0.29617436],
       [ 0.96569316,  0.99731918],
       [ 0.70302032,  0.35057241],
       [ 0.26267284,  0.64674677]])
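Note that C.T is passed because TruncatedSVD expects documents as rows (samples × features), so its output is the transpose of the np.dot(np.diag(s[:2]), VT[:2]) matrix above. In general the two can differ in the signs of individual columns, since singular vectors are only determined up to sign.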