How to take PCA of large sparse matrices without losing a row

Question

Amigo on 29 Oct 2019

https://se.mathworks.com/matlabcentral/answers/487985-how-to-take-pca-of-large-sparse-matrices-without-losing-a-row

Answered: arushi on 14 Aug 2024

I am working on a project where I need to take the PCA of a sparse matrix that, when converted to dense, is 553 * 26315. The matrix was formed by taking the TF-IDF of 553 * 25. Running PCA as

[coeff,score,latent,~,explained] = pca(X)

returns coeff as 26315 * 552, score as 553 * 552, and latent and explained as 552 * 1. I want to confirm whether I am supposed to transpose X before calling the function, or transpose coeff afterwards. I plan to use the output as features, so with a target array of 553 values I need 553 rows. Also, if I set 'Economy' to false, MATLAB fails to run because the matrix is too large. Is there a way to make a sparse matrix smaller without losing information before applying PCA?

Not exactly part of this topic, but could someone comment on whether using PSO (before PCA, or instead of it) makes sense?

Answer 1

arushi on 14 Aug 2024

https://se.mathworks.com/matlabcentral/answers/487985-how-to-take-pca-of-large-sparse-matrices-without-losing-a-row#answer_1498814

Hi Amigo,

When working with Principal Component Analysis (PCA) on large sparse matrices, it's important to ensure that the data is oriented correctly and that the PCA implementation is efficient. Here are some key points and steps to address your concerns:

1. Matrix Orientation - MATLAB's pca expects the rows of the input matrix X to be observations (samples) and the columns to be variables (features), so X should be passed as-is, without transposing. Given your data dimensions:

X is 553 (samples) by 26315 (features).

The output dimensions you mentioned:

coeff: 26315 x 552
score: 553 x 552
latent and explained: 552 x 1

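These are the expected dimensions, so neither X nor coeff needs transposing. score has one row per observation (553 rows), so it lines up with your 553-element target: use score, or its first k columns, as the feature matrix. There are 552 components rather than 553 because pca centers each column; after the column means are subtracted, 553 observations span at most 552 dimensions. No row has been lost.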
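A minimal sketch of that workflow, assuming the Statistics and Machine Learning Toolbox is installed. The random sparse X and the binary target y are hypothetical stand-ins with the same shape as your data, and the 95% explained-variance cutoff is an arbitrary illustrative choice:

```matlab
% Hypothetical stand-in for the TF-IDF matrix: 553 documents (rows) by
% 26315 terms (columns). Replace X and y with your own data.
rng(0);                                   % reproducible random data
X = sprand(553, 26315, 0.005);
y = randi([0 1], 553, 1);                 % hypothetical target, one value per row

% pca treats rows as observations and columns as variables, so X is passed
% without a transpose. full() is used so the sketch does not depend on how
% pca handles sparse input.
[coeff, score, ~, ~, explained] = pca(full(X));

% score has one row per observation, aligned with y.
fprintf('coeff: %d x %d, score: %d x %d\n', size(coeff), size(score));

% Keep the fewest leading components that explain at least 95% of the variance.
k = find(cumsum(explained) >= 95, 1);
features = score(:, 1:k);                 % 553 x k feature matrix for predicting y
```

Because the columns of score are ordered by decreasing variance, taking the first k columns keeps the most informative components while preserving all 553 rows.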

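2. Sparse Matrix Handling - PCA centers each column by subtracting its mean, which turns most of the zeros into nonzeros. The centered matrix is therefore dense, so keeping X sparse does not save memory once it is centered. For your data the dense matrix takes about 116 MB (553 x 26315 x 8 bytes), which is manageable; for larger datasets, consider methods that work on the sparse matrix directly, or dimensionality reduction techniques better suited to sparse data.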
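One sparse-friendly alternative, sketched here as a swapped-in technique rather than as what pca does internally: when there are far fewer observations than variables (553 vs 26315), exact PCA scores can be obtained from the 553 x 553 Gram matrix of the centered data. It is formed from the sparse product X*X' using the identity Xc*Xc' = H*(X*X')*H, with the centering matrix H = I - ones(n)/n, so the dense centered matrix Xc is never built. If Xc = U*S*V', the eigenvectors of this Gram matrix are U, its eigenvalues are the squared singular values, and the scores are U*S. X is again hypothetical random data; the resulting scores should match pca's up to the sign of each column.

```matlab
% Hypothetical stand-in for the sparse TF-IDF matrix (553 x 26315).
rng(0);
X = sprand(553, 26315, 0.005);
n = size(X, 1);

% Uncentered Gram matrix: a sparse-times-sparse product, stored as n x n.
G = full(X * X');

% Center it with H = I - ones(n)/n, which gives Xc*Xc' without forming Xc.
H = eye(n) - ones(n) / n;
Gc = H * G * H;
Gc = (Gc + Gc') / 2;                      % enforce exact symmetry for eig

% Eigendecomposition, sorted from largest to smallest eigenvalue.
[U, L] = eig(Gc);
[lambda, order] = sort(diag(L), 'descend');
U = U(:, order);

% Centering leaves at most n-1 components with nonzero variance.
d = n - 1;
lambda = max(lambda(1:d), 0);             % clip tiny negative round-off
scoreGram  = U(:, 1:d) .* sqrt(lambda)';  % scores = U*S (implicit expansion, R2016b+)
latentGram = lambda / (n - 1);            % same scaling as pca's latent output
```

Only n x n matrices are ever dense here, so memory grows with the number of documents rather than the number of terms.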

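3. Economy Mode - With 'Economy' set to false, pca returns coeff as a full 26315 x 26315 matrix instead of 26315 x 552. The extra 25763 columns belong to components with zero variance (after centering, your data span only 552 dimensions), so they carry no information but require several gigabytes of memory. Keep 'Economy' at its default value, true.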
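The memory difference can be estimated directly, assuming double precision (8 bytes per element) and counting only coeff; score and pca's intermediate arrays add to the real peak:

```matlab
n = 553;          % observations (documents)
p = 26315;        % variables (terms)
bytesPerDouble = 8;

% Economy true: coeff keeps only the n-1 = 552 components with nonzero variance.
coeffEconomyGB = p * (n - 1) * bytesPerDouble / 1e9;

% Economy false: coeff is a full p-by-p matrix.
coeffFullGB = p * p * bytesPerDouble / 1e9;

fprintf('coeff with Economy=true:  %.2f GB\n', coeffEconomyGB);   % 0.12 GB
fprintf('coeff with Economy=false: %.2f GB\n', coeffFullGB);      % 5.54 GB
```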

4. Feature Reduction Before PCA - To reduce the size of the matrix without losing significant information, you can use techniques such as:

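Truncated Singular Value Decomposition (SVD): This can be applied directly to sparse matrices (for example with MATLAB's svds) and is often used for large-scale PCA. Applied to the uncentered TF-IDF matrix, as is common in text analysis (latent semantic analysis), it is close to PCA but not identical to it.
Feature Selection: Select a subset of features based on some criterion (e.g., a variance threshold).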
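Both options can be sketched on the sparse matrix without densifying it. X is hypothetical random data, and the number of kept terms (m = 2000) and of components (k = 100) are arbitrary illustrative values. Column variances are computed as mean(x.^2) - mean(x)^2 (population variance, which is enough for ranking), and svds is applied to the uncentered matrix, so its output is a truncated SVD rather than exact PCA:

```matlab
% Hypothetical stand-in for the sparse TF-IDF matrix (553 x 26315).
rng(0);
X = sprand(553, 26315, 0.005);

% Option A: variance-based feature selection, computed on the sparse matrix.
mu = full(mean(X, 1));                    % column means
v  = full(mean(X.^2, 1)) - mu.^2;         % population variance of each column
m  = 2000;                                % hypothetical number of terms to keep
[~, idx] = sort(v, 'descend');
Xsel = X(:, idx(1:m));                    % still sparse, 553 x m

% Option B: truncated SVD of the full sparse matrix (no centering).
k = 100;                                  % hypothetical number of components
[U, S, V] = svds(X, k);
features = U * S;                         % 553 x k, one row per document
```

The two options are independent: Xsel can be passed to pca, and features can be used directly as a 553-row feature matrix.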

Hope this helps.
