python - Different results when using sklearn RandomizedPCA with sparse and dense matrices


I'm getting different results when using RandomizedPCA with sparse and dense matrices:

    import numpy as np
    import scipy.sparse as scsp
    from sklearn.decomposition import RandomizedPCA

    x = np.matrix([[1,2,3,2,0,0,0,0],
                   [2,3,1,0,0,0,0,3],
                   [1,0,0,0,2,3,2,0],
                   [3,0,0,0,4,5,6,0],
                   [0,0,4,0,0,5,6,7],
                   [0,6,4,5,6,0,0,0],
                   [7,0,5,0,7,9,0,0]])

    csr_x = scsp.csr_matrix(x)

    # Fit on the sparse (CSR) version of the matrix
    s_pca = RandomizedPCA(n_components=2)
    s_pca_scores = s_pca.fit_transform(csr_x)
    s_pca_weights = s_pca.explained_variance_ratio_

    # Fit a separate model on the dense version of the same matrix
    d_pca = RandomizedPCA(n_components=2)
    d_pca_scores = d_pca.fit_transform(x)
    d_pca_weights = d_pca.explained_variance_ratio_

    print('sparse matrix scores {}'.format(s_pca_scores))
    print('dense matrix scores {}'.format(d_pca_scores))
    print('sparse matrix weights {}'.format(s_pca_weights))
    print('dense matrix weights {}'.format(d_pca_weights))

Result:

    sparse matrix scores [[  1.90912166   2.37266113]
     [  1.98826835   0.67329466]
     [  3.71153199  -1.00492408]
     [  7.76361811  -2.60901625]
     [  7.39263662  -5.8950472 ]
     [  5.58268666   7.97259172]
     [ 13.19312194   1.30282165]]
    dense matrix scores [[-4.23432815  0.43110596]
     [-3.87576857 -1.36999888]
     [-0.05168291 -1.02612363]
     [ 3.66039297 -1.38544473]
     [ 1.48948352 -7.0723618 ]
     [-4.97601287  5.49128164]
     [ 7.98791603  4.93154146]]
    sparse matrix weights [ 0.74988508  0.25011492]
    dense matrix weights [ 0.55596761  0.44403239]

The dense version gives the same results as normal PCA. What is going on when the matrix is sparse? Why are the results different?

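In the case of sparse data, RandomizedPCA does not center the data (mean removal), because doing so might blow up memory usage: subtracting each column's mean turns the implicit zeros into nonzero values, so the centered matrix is effectively dense. That explains what you observe: on sparse input you get a decomposition of the uncentered data rather than a true PCA.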
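A minimal sketch with plain NumPy and SciPy (not the RandomizedPCA internals) illustrates both points on the matrix from the question: centering fills in almost all of the zeros, and the top components of the uncentered data differ from those of the centered data. The helper `top2_scores` is hypothetical, written only for this illustration, and it uses an exact (non-randomized) SVD, so its numbers will not match the randomized output above digit for digit.

```python
import numpy as np
import scipy.sparse as sp

# The 7x8 matrix from the question, as a plain ndarray instead of np.matrix.
X = np.array([[1, 2, 3, 2, 0, 0, 0, 0],
              [2, 3, 1, 0, 0, 0, 0, 3],
              [1, 0, 0, 0, 2, 3, 2, 0],
              [3, 0, 0, 0, 4, 5, 6, 0],
              [0, 0, 4, 0, 0, 5, 6, 7],
              [0, 6, 4, 5, 6, 0, 0, 0],
              [7, 0, 5, 0, 7, 9, 0, 0]], dtype=float)

X_sparse = sp.csr_matrix(X)
X_centered = X - X.mean(axis=0)

# Centering subtracts each column mean from every entry, zeros included,
# so nearly every zero that a sparse format would skip becomes nonzero.
print("stored nonzeros before centering:", X_sparse.nnz)
print("nonzeros after centering:        ", np.count_nonzero(X_centered))


def top2_scores(A):
    """Hypothetical helper: project A onto its top-2 right singular vectors (U * S)."""
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :2] * S[:2]


# True PCA: SVD of the centered data.
pca_scores = top2_scores(X_centered)
# What a decomposition that skips mean removal computes: SVD of the raw data.
uncentered_scores = top2_scores(X)

print("centered (PCA) scores:\n", np.round(pca_scores, 3))
print("uncentered scores:\n", np.round(uncentered_scores, 3))
```

Because every entry of this matrix is nonnegative, the leading singular vector of the uncentered data can be chosen with nonnegative entries, so the first uncentered score of every row has the same sign. That is consistent with the all-positive first column of the sparse scores in the question, whereas the centered PCA scores have zero mean in each column.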

I agree this "feature" is poorly documented. Please feel free to report an issue on GitHub to track it and improve the documentation.

Edit: this discrepancy was fixed in scikit-learn 0.15: RandomizedPCA is now deprecated for sparse data. Use TruncatedSVD instead, which performs the same decomposition as PCA but without trying to center the data.
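A sketch of the suggested replacement, assuming a recent scikit-learn release in which RandomizedPCA has been removed entirely (for dense data, `PCA(svd_solver="randomized")` took its place). `TruncatedSVD` accepts the sparse matrix as-is and never centers it; centering a dense copy yourself, when that fits in memory, makes it agree with `PCA`. The `algorithm="arpack"` setting and the `same_up_to_sign` helper are illustrative choices, not part of the original answer. Component signs are arbitrary in any SVD, hence the sign-insensitive comparison.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.decomposition import PCA, TruncatedSVD

# The matrix from the question.
X = np.array([[1, 2, 3, 2, 0, 0, 0, 0],
              [2, 3, 1, 0, 0, 0, 0, 3],
              [1, 0, 0, 0, 2, 3, 2, 0],
              [3, 0, 0, 0, 4, 5, 6, 0],
              [0, 0, 4, 0, 0, 5, 6, 7],
              [0, 6, 4, 5, 6, 0, 0, 0],
              [7, 0, 5, 0, 7, 9, 0, 0]], dtype=float)
X_sparse = sp.csr_matrix(X)

# Ordinary PCA on the dense array: it centers the data internally.
pca = PCA(n_components=2, svd_solver="full")
pca_scores = pca.fit_transform(X)

# TruncatedSVD takes the sparse matrix directly and does not center it,
# which corresponds to the "sparse" behaviour described above.
svd_sparse = TruncatedSVD(n_components=2, algorithm="arpack")
svd_sparse_scores = svd_sparse.fit_transform(X_sparse)

# Centering a dense copy first makes TruncatedSVD compute the same thing as PCA.
svd_centered = TruncatedSVD(n_components=2, algorithm="arpack")
svd_centered_scores = svd_centered.fit_transform(X - X.mean(axis=0))


def same_up_to_sign(a, b):
    """Hypothetical helper: True if each column of a equals the matching column of b or its negation."""
    return all(np.allclose(a[:, j], b[:, j]) or np.allclose(a[:, j], -b[:, j])
               for j in range(a.shape[1]))


print("PCA vs centered TruncatedSVD:", same_up_to_sign(pca_scores, svd_centered_scores))
print("PCA vs sparse TruncatedSVD:  ", same_up_to_sign(pca_scores, svd_sparse_scores))
print("explained variance ratios:", pca.explained_variance_ratio_,
      svd_centered.explained_variance_ratio_)
```

For genuinely large sparse inputs the centered variant is usually not an option, for exactly the memory reason given in the answer, so `TruncatedSVD` on the raw sparse matrix is the practical choice there.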

