python - Different results when using sklearn RandomizedPCA with sparse and dense matrices

March 15, 2012

I'm getting different results when running RandomizedPCA on sparse and dense matrices:

```python
import numpy as np
import scipy.sparse as scsp
# RandomizedPCA is from the scikit-learn of that era; later releases removed it.
from sklearn.decomposition import RandomizedPCA

x = np.matrix([[1,2,3,2,0,0,0,0],
               [2,3,1,0,0,0,0,3],
               [1,0,0,0,2,3,2,0],
               [3,0,0,0,4,5,6,0],
               [0,0,4,0,0,5,6,7],
               [0,6,4,5,6,0,0,0],
               [7,0,5,0,7,9,0,0]])

csr_x = scsp.csr_matrix(x)

# Fit on the sparse (CSR) version of the matrix
s_pca = RandomizedPCA(n_components=2)
s_pca_scores = s_pca.fit_transform(csr_x)
s_pca_weights = s_pca.explained_variance_ratio_

# Fit a separate estimator on the dense version
d_pca = RandomizedPCA(n_components=2)
d_pca_scores = d_pca.fit_transform(x)
d_pca_weights = d_pca.explained_variance_ratio_

print('sparse matrix scores {}'.format(s_pca_scores))
print('dense matrix scores {}'.format(d_pca_scores))
print('sparse matrix weights {}'.format(s_pca_weights))
print('dense matrix weights {}'.format(d_pca_weights))
```

Result:

```
sparse matrix scores [[  1.90912166   2.37266113]
 [  1.98826835   0.67329466]
 [  3.71153199  -1.00492408]
 [  7.76361811  -2.60901625]
 [  7.39263662  -5.8950472 ]
 [  5.58268666   7.97259172]
 [ 13.19312194   1.30282165]]
dense matrix scores [[-4.23432815  0.43110596]
 [-3.87576857 -1.36999888]
 [-0.05168291 -1.02612363]
 [ 3.66039297 -1.38544473]
 [ 1.48948352 -7.0723618 ]
 [-4.97601287  5.49128164]
 [ 7.98791603  4.93154146]]
sparse matrix weights [ 0.74988508  0.25011492]
dense matrix weights [ 0.55596761  0.44403239]
```

The dense version gives the same results as normal PCA. What is going on when the matrix is sparse? Why are the results different?

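In the case of sparse data, RandomizedPCA does not center the data (mean removal), because subtracting the column means would turn the sparse matrix into a dense one and might blow up memory usage. That explains what you observe.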
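A minimal NumPy sketch of the effect, using the exact `np.linalg.svd` as a stand-in for the randomized solver (an assumption for illustration): PCA scores come from the SVD of the column-centered matrix, while skipping mean removal amounts to an SVD of the raw matrix. Consistent with this, the dense scores printed in the question sum to roughly zero in each column, while the sparse scores do not. The helper name `top_scores` is hypothetical.

```python
import numpy as np

# The 7x8 matrix from the question, as a plain float ndarray.
X = np.array([[1, 2, 3, 2, 0, 0, 0, 0],
              [2, 3, 1, 0, 0, 0, 0, 3],
              [1, 0, 0, 0, 2, 3, 2, 0],
              [3, 0, 0, 0, 4, 5, 6, 0],
              [0, 0, 4, 0, 0, 5, 6, 7],
              [0, 6, 4, 5, 6, 0, 0, 0],
              [7, 0, 5, 0, 7, 9, 0, 0]], dtype=float)


def top_scores(A, n_components=2):
    """Hypothetical helper: project A onto its top right singular vectors.

    No centering happens here; U * s equals A @ V for the kept components.
    """
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, :n_components] * s[:n_components]


# PCA: SVD of the column-centered data (what the dense code path does).
pca_scores = top_scores(X - X.mean(axis=0))
# Skipping mean removal (what the sparse code path did): SVD of the raw data.
raw_scores = top_scores(X)

# Centered scores have zero column means; uncentered scores do not.
print("PCA column means:", np.round(pca_scores.mean(axis=0), 6))
print("raw column means:", np.round(raw_scores.mean(axis=0), 6))
```

Because every entry of `X` is non-negative, the first uncentered component mostly captures the overall offset from zero rather than the spread around the mean, which is why the two sets of scores and variance ratios disagree.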

I agree that this "feature" is poorly documented. Please feel free to report an issue on GitHub to track it and improve the documentation.

Edit: this discrepancy is fixed in scikit-learn 0.15: RandomizedPCA is now deprecated for sparse data. Instead, use TruncatedSVD, which does the same as PCA but without trying to center the data.
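A short sketch of the suggested replacement, assuming a recent scikit-learn in which `RandomizedPCA` is no longer available (`PCA(svd_solver="randomized")` covers that role). `TruncatedSVD` accepts the sparse matrix directly and never centers it; centering by hand first and then running `TruncatedSVD` recovers the `PCA` scores up to the sign of each component.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.decomposition import PCA, TruncatedSVD

# The matrix from the question, in dense and sparse (CSR) form.
X = np.array([[1, 2, 3, 2, 0, 0, 0, 0],
              [2, 3, 1, 0, 0, 0, 0, 3],
              [1, 0, 0, 0, 2, 3, 2, 0],
              [3, 0, 0, 0, 4, 5, 6, 0],
              [0, 0, 4, 0, 0, 5, 6, 7],
              [0, 6, 4, 5, 6, 0, 0, 0],
              [7, 0, 5, 0, 7, 9, 0, 0]], dtype=float)
X_sparse = sp.csr_matrix(X)

# TruncatedSVD works on the sparse matrix as-is, with no mean removal.
svd_scores = TruncatedSVD(n_components=2, random_state=0).fit_transform(X_sparse)

# PCA centers the dense data before decomposing it.
pca_scores = PCA(n_components=2, svd_solver="full").fit_transform(X)

# Centering by hand and then running TruncatedSVD reproduces the PCA scores
# up to per-component sign. Note that X_centered is dense.
X_centered = X - X.mean(axis=0)
centered_svd_scores = TruncatedSVD(n_components=2, random_state=0).fit_transform(X_centered)

print(np.round(np.abs(pca_scores), 4))
print(np.round(np.abs(centered_svd_scores), 4))
```

Centering by hand densifies the matrix, which is exactly the memory cost the sparse code path avoids, so for large sparse inputs the uncentered `TruncatedSVD` result is the practical option.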
