python - Different results when using sklearn RandomizedPCA with sparse and dense matrices
I am getting different results when using RandomizedPCA with sparse and dense matrices:

    import numpy as np
    import scipy.sparse as scsp
    from sklearn.decomposition import RandomizedPCA

    x = np.matrix([[1, 2, 3, 2, 0, 0, 0, 0],
                   [2, 3, 1, 0, 0, 0, 0, 3],
                   [1, 0, 0, 0, 2, 3, 2, 0],
                   [3, 0, 0, 0, 4, 5, 6, 0],
                   [0, 0, 4, 0, 0, 5, 6, 7],
                   [0, 6, 4, 5, 6, 0, 0, 0],
                   [7, 0, 5, 0, 7, 9, 0, 0]])
    csr_x = scsp.csr_matrix(x)

    # Fit on the sparse matrix
    s_pca = RandomizedPCA(n_components=2)
    s_pca_scores = s_pca.fit_transform(csr_x)
    s_pca_weights = s_pca.explained_variance_ratio_

    # Fit on the dense matrix
    d_pca = RandomizedPCA(n_components=2)
    d_pca_scores = d_pca.fit_transform(x)
    d_pca_weights = d_pca.explained_variance_ratio_

    print('sparse matrix scores {}'.format(s_pca_scores))
    print('dense matrix scores {}'.format(d_pca_scores))
    print('sparse matrix weights {}'.format(s_pca_weights))
    print('dense matrix weights {}'.format(d_pca_weights))

Result:

    sparse matrix scores [[  1.90912166   2.37266113]
     [  1.98826835   0.67329466]
     [  3.71153199  -1.00492408]
     [  7.76361811  -2.60901625]
     [  7.39263662  -5.8950472 ]
     [  5.58268666   7.97259172]
     [ 13.19312194   1.30282165]]
    dense matrix scores [[-4.23432815  0.43110596]
     [-3.87576857 -1.36999888]
     [-0.05168291 -1.02612363]
     [ 3.66039297 -1.38544473]
     [ 1.48948352 -7.0723618 ]
     [-4.97601287  5.49128164]
     [ 7.98791603  4.93154146]]
    sparse matrix weights [ 0.74988508  0.25011492]
    dense matrix weights [ 0.55596761  0.44403239]

The dense version gives the same results as a normal PCA, but what is going on when the matrix is sparse? Why are the results different?
In the case of sparse data, RandomizedPCA does not center the data (mean removal), because centering might blow up the memory usage. That probably explains what you observe.
I agree this "feature" is poorly documented. Please feel free to report an issue on GitHub to track this and improve the doc.
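
To see the effect of centering in isolation, here is a minimal NumPy-only sketch that takes the matrix from the question and compares the top two singular directions with and without mean removal. The helper `top_variance_ratio` is a hypothetical name, and normalizing over the two kept components only is an assumption (chosen because the weights printed in the question sum to 1), not a statement about how the library computes them.

```python
import numpy as np

# Matrix from the question, as floats so the column means can be subtracted.
x = np.array([[1, 2, 3, 2, 0, 0, 0, 0],
              [2, 3, 1, 0, 0, 0, 0, 3],
              [1, 0, 0, 0, 2, 3, 2, 0],
              [3, 0, 0, 0, 4, 5, 6, 0],
              [0, 0, 4, 0, 0, 5, 6, 7],
              [0, 6, 4, 5, 6, 0, 0, 0],
              [7, 0, 5, 0, 7, 9, 0, 0]], dtype=float)


def top_variance_ratio(data, n_components=2):
    """Hypothetical helper: squared top singular values, normalized to sum to 1.

    Assumption: the ratio is taken over the kept components only.
    """
    singular_values = np.linalg.svd(data, compute_uv=False)
    variances = singular_values[:n_components] ** 2
    return variances / variances.sum()


# No mean removal: what happens when the data is left uncentered.
uncentered_ratio = top_variance_ratio(x)

# Column means removed first: what an ordinary PCA does.
centered_ratio = top_variance_ratio(x - x.mean(axis=0))

print('uncentered weights {}'.format(uncentered_ratio))
print('centered weights   {}'.format(centered_ratio))
```

Without centering, the first direction largely captures the mean of the data rather than the variation around it, so the two sets of weights should not be expected to agree.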
Edit: we fixed that discrepancy in scikit-learn 0.15: RandomizedPCA is now deprecated for sparse data. Instead, use TruncatedSVD, which does the same as PCA without trying to center the data.
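
As a rough sketch of that recommendation, assuming a recent scikit-learn where `TruncatedSVD` and `PCA` are both available in `sklearn.decomposition`: `TruncatedSVD` accepts the sparse matrix directly and treats sparse and dense input the same way (no centering), while `PCA` on the dense matrix matches `TruncatedSVD` applied to manually centered data. The `random_state=0` setting and the comparison on absolute values (to ignore sign flips of components) are choices made for this illustration.

```python
import numpy as np
import scipy.sparse as scsp
from sklearn.decomposition import PCA, TruncatedSVD

# Matrix from the question.
x = np.array([[1, 2, 3, 2, 0, 0, 0, 0],
              [2, 3, 1, 0, 0, 0, 0, 3],
              [1, 0, 0, 0, 2, 3, 2, 0],
              [3, 0, 0, 0, 4, 5, 6, 0],
              [0, 0, 4, 0, 0, 5, 6, 7],
              [0, 6, 4, 5, 6, 0, 0, 0],
              [7, 0, 5, 0, 7, 9, 0, 0]], dtype=float)
csr_x = scsp.csr_matrix(x)

# TruncatedSVD never centers, so sparse and dense input are treated alike.
sparse_scores = TruncatedSVD(n_components=2, random_state=0).fit_transform(csr_x)
dense_scores = TruncatedSVD(n_components=2, random_state=0).fit_transform(x)

# PCA centers the data first; centering by hand before TruncatedSVD
# should reproduce it up to the sign of each component.
pca_scores = PCA(n_components=2, svd_solver='full').fit_transform(x)
centered_scores = TruncatedSVD(n_components=2, random_state=0).fit_transform(
    x - x.mean(axis=0))

print('sparse vs dense TruncatedSVD agree: {}'.format(
    np.allclose(np.abs(sparse_scores), np.abs(dense_scores), atol=1e-6)))
print('PCA vs centered TruncatedSVD agree: {}'.format(
    np.allclose(np.abs(pca_scores), np.abs(centered_scores), atol=1e-6)))
```

Centering by hand is only practical here because the matrix is tiny; for large sparse data, subtracting the mean produces a dense matrix, which is the memory concern mentioned above.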