fit
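Syntax
IncrementalMdl = fit(IncrementalMdl,X)
IncrementalMdl = fit(IncrementalMdl,X,Weights=weights)
[IncrementalMdl,Xtransformed] = fit(IncrementalMdl,X)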
Description
The incremental fit function fits an incremental principal component analysis (PCA) object (incrementalPCA) to streaming data.
IncrementalMdl = fit(IncrementalMdl,X) returns an incremental PCA model IncrementalMdl, which represents the input incremental PCA model IncrementalMdl fit using the predictor data X. Specifically, the incremental fit function fits the model to the incoming data and stores the updated PCA properties in the output model IncrementalMdl.
IncrementalMdl = fit(IncrementalMdl,X,Weights=weights) also sets the observation weights weights.
[IncrementalMdl,Xtransformed] = fit(IncrementalMdl,X) additionally returns the principal component scores Xtransformed.
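The three call forms can be sketched on synthetic data as follows. The matrix X, the weight vector w, and the variable name Mdl are illustrative assumptions rather than part of this reference page; the sketch uses only the syntaxes described above and a default incrementalPCA model.

```matlab
% Hypothetical synthetic chunk: 200 observations of 5 variables.
rng(0)                          % for reproducibility
X = randn(200,5);
w = ones(200,1);                % one positive weight per observation

Mdl = incrementalPCA;           % default model with no prior PCA results

% Form 1: update the model with a chunk of data.
Mdl = fit(Mdl,X);

% Form 2: update the model and supply observation weights.
Mdl = fit(Mdl,X,Weights=w);

% Form 3: also request the principal component scores for the chunk.
[Mdl,Xtransformed] = fit(Mdl,X);

% The scores have one row per observation in the chunk.
numScoreRows = size(Xtransformed,1);
```

Each call returns a new model object, so the output is assigned back to the same variable to carry the updated properties forward.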
Examples
Perform Incremental Principal Component Analysis Using Initial Model
Perform principal component analysis (PCA) on an initial data chunk, and then create an incremental PCA model that incorporates the results of the analysis. Fit the incremental model to streaming data and analyze how the model evolves during training.
Load and Preprocess Data
Load the human activity data set.
load humanactivity
For details on the human activity data set, enter Description at the command line.
The data set includes observations containing 60 variables. To simulate streaming data, split the data set into an initial chunk of 1000 observations and a second chunk of 10,000 observations.
Xinitial = feat(1:1000,:);
Xstream = feat(1001:11000,:);
Perform Initial PCA
Perform PCA on the initial data chunk by using the pca function. Specify to center the data and keep 10 principal components. Return the principal component coefficients (coeff), principal component variances (latent), and estimated means of the variables (mu).
[coeff,~,latent,~,~,mu]=pca(Xinitial,Centered=true,NumComponents=10);
Create Incremental PCA Model
Create a model for incremental PCA that incorporates the PCA results from the initial data chunk.
IncrementalMdl = incrementalPCA(Coefficients=coeff,Latent=latent, ...
Means=mu,NumObservations=1000);
details(IncrementalMdl)
incrementalPCA with properties:
    IsWarm: 1
    NumTrainingObservations: 0
    WarmupPeriod: 0
    Mu: [0.7764 0.4931 -0.3407 0.1108 0.0707 0.0485 0.3931 -1.1100 0.0646 0.1703 -1.1020 0.0283 0.0836 -1.0797 0.0139 0.9328 1.2892 1.6731 2.0729 2.5181 2.9511 0.0128 0.0062 0.0039 0.0027 0.0020 0.0016 0.9322 ... ] (1x60 double)
    Sigma: []
    ExplainedVariance: [10x1 double]
    EstimationPeriod: 0
    Latent: [10x1 double]
    Coefficients: [60x10 double]
    VariableWeights: [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
    NumComponents: 10
    NumPredictors: 60
IncrementalMdl is an incrementalPCA model object. All its properties are read-only. Because Coefficients and Latent are specified, the model is warm, meaning that the fit function returns transformed observations.
Fit Incremental Model
Fit the incremental model IncrementalMdl to the data by using the fit function. To simulate a data stream, fit the model in chunks of 100 observations at a time. At each iteration:
- Process 100 observations.
- Overwrite the previous incremental model with a new one fitted to the incoming observations.
- Store topEV, the explained variance value of the component with the highest variance, to see how it evolves during incremental fitting.
n = numel(Xstream(:,1));
numObsPerChunk = 100;
nchunk = floor(n/numObsPerChunk);
topEV = zeros(nchunk,1);
% Incremental fitting
for j = 1:nchunk
    ibegin = min(n,numObsPerChunk*(j-1) + 1);
    iend = min(n,numObsPerChunk*j);
    IncrementalMdl = fit(IncrementalMdl,Xstream(ibegin:iend,:));
    topEV(j) = IncrementalMdl.ExplainedVariance(1);
end
IncrementalMdl is an incrementalPCA model object fitted to all the data in the stream. The fit function fits the model to the data chunk and updates the model properties.
Analyze Incremental Model During Training
Plot the explained variance value of the component with the highest variance to see how it evolves during training.
figure
plot(topEV,".-")
ylabel("topEV")
xlabel("Iteration")
xlim([0 nchunk])
The highest explained variance value is 33% after the first iteration, and rapidly rises to 80% after five iterations. The value then gradually approaches 97%.
Perform Incremental Principal Component Analysis Without Prior Information
Create a model for incremental principal component analysis (PCA) and specify to standardize the data.
IncrementalMdl = incrementalPCA(StandardizeData=true);
details(IncrementalMdl)
incrementalPCA with properties:
    IsWarm: 0
    NumTrainingObservations: 0
    WarmupPeriod: 1000
    Mu: []
    Sigma: []
    ExplainedVariance: [0x1 double]
    EstimationPeriod: 1000
    Latent: [0x1 double]
    Coefficients: []
    VariableWeights: [1x0 double]
    NumComponents: 0
    NumPredictors: 0
IncrementalMdl is an incrementalPCA model object. All its properties are read-only. By default, the software sets the hyperparameter estimation period and the warm-up period to 1000 observations. The model must be warm before the incremental fit function outputs transformed data.
Load and Preprocess Data
Load the NYCHousing2015 sample data set.
load NYCHousing2015
The data set includes 10 variables with information on the sales of properties in New York City in 2015.
Preprocess the data set. Remove the categorical variables BOROUGH, NEIGHBORHOOD, and BUILDINGCLASSCATEGORY. Convert the datetime array (SALEDATE) to month numbers and change zeros in LANDSQUAREFEET, GROSSSQUAREFEET, SALEPRICE, and YEARBUILT to NaNs.
NYCHousing2015 = removevars(NYCHousing2015,["BOROUGH", ...
    "NEIGHBORHOOD","BUILDINGCLASSCATEGORY"]);
NYCHousing2015.SALEDATE = month(NYCHousing2015.SALEDATE);
NYCHousing2015.LANDSQUAREFEET(NYCHousing2015.LANDSQUAREFEET == 0) = NaN;
NYCHousing2015.GROSSSQUAREFEET(NYCHousing2015.GROSSSQUAREFEET == 0) = NaN;
NYCHousing2015.SALEPRICE(NYCHousing2015.SALEPRICE == 0) = NaN;
NYCHousing2015.YEARBUILT(NYCHousing2015.YEARBUILT == 0) = NaN;
The fit function of incrementalPCA does not use observations that contain a missing value. Remove these observations from the data set.
NYCHousing2015=rmmissing(NYCHousing2015);
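Because fit skips incomplete observations, it can help to count them before removal to see how many rows would actually contribute. The following is a minimal sketch on a small hypothetical numeric matrix (the values and variable names are illustrative, not taken from the housing data).

```matlab
% Hypothetical 4-by-3 chunk in which two rows contain a missing value.
X = [1 2 3; NaN 5 6; 7 8 9; 10 NaN 12];

% Flag the rows that contain at least one missing value.
incomplete = any(ismissing(X),2);
numDropped = nnz(incomplete);

% Remove the incomplete rows; for a matrix, rmmissing removes rows by default.
Xclean = rmmissing(X);
```

Here Xclean keeps only the first and third rows of X.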
The incrementalPCA functions do not accept data in table format. Convert the data set to array format and keep only the first 5000 observations.
streamingData = table2array(NYCHousing2015(1:end,:));
streamingData = streamingData(1:5000,:);
Fit Incremental Models
Fit the incremental model IncrementalMdl to the data by using the fit function. To simulate a data stream, fit the model in chunks of 100 observations at a time. At each iteration:
- Process 100 observations.
- Overwrite the previous incremental model with a new one fitted to the incoming observations.
- Store isWarm, the IsWarm property of IncrementalMdl, to see how it evolves during incremental fitting.
- Store topEV, the explained variance value of the component with the highest variance, to see how it evolves during incremental fitting.
- Store meanXtr, the mean of the transformed data output by the fit function, to see how it evolves during incremental fitting.
n = numel(streamingData(:,1));
numObsPerChunk = 100;
nchunk = floor(n/numObsPerChunk);
meanXtr = zeros(nchunk,1);
isWarm = zeros(nchunk,1);
topEV = zeros(nchunk,1); % preallocate so values from an earlier run do not persist
% Incremental fitting
for j = 1:nchunk
    ibegin = min(n,numObsPerChunk*(j-1) + 1);
    iend = min(n,numObsPerChunk*j);
    [IncrementalMdl,Xtr] = fit(IncrementalMdl,streamingData(ibegin:iend,:));
    isWarm(j) = IncrementalMdl.IsWarm;
    topEV(j) = IncrementalMdl.ExplainedVariance(1);
    meanXtr(j) = mean(Xtr(:));
end
IncrementalMdl is an incrementalPCA model object fitted to all the data in the stream. The fit function fits the model to the data chunk and outputs the transformed input data.
Analyze Incremental Model During Training
To see how the IsWarm indicator, the explained variance value of the component with the highest variance, and the mean of the transformed input data per chunk evolve during training, plot them on separate tiles.
figure
tiledlayout(3,1);
nexttile
plot(isWarm,".-")
ylabel("IsWarm")
xlabel("Iteration")
xlim([0 nchunk])
nexttile
plot(topEV,".-")
ylabel("Top EV")
xlabel("Iteration")
xlim([0 nchunk])
nexttile
plot(meanXtr,".-")
ylabel("Mean of Transformed Data")
xlabel("Iteration")
xlim([0 nchunk])
Because EstimationPeriod = 1000, fit processes 1000 observations to determine hyperparameters before updating the PCA properties of IncrementalMdl. After the estimation period, the top explained variance value initially fluctuates between 58% and 85%, and then gradually approaches 50%. Because WarmupPeriod = 1000, fit processes an additional 1000 observations after the estimation period before IncrementalMdl becomes warm and outputs transformed data. The mean of the transformed data fluctuates between –0.3 and 0.08.
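The timing described above can be checked with simple arithmetic. This sketch only restates the period lengths and chunk size from this example in hypothetical local variables; it does not query the model, and the exact iteration at which IsWarm switches should be read from the isWarm plot.

```matlab
% Values restated from the example (hypothetical local variable names).
estimationPeriod = 1000;    % observations used to estimate hyperparameters
warmupPeriod = 1000;        % additional observations before the model is warm
numObsPerChunk = 100;       % chunk size used in the fitting loop
numObs = 5000;              % observations kept in streamingData

% Chunks consumed by the estimation and warm-up periods together.
chunksBeforeWarm = ceil((estimationPeriod + warmupPeriod)/numObsPerChunk);

% Total number of chunks in the simulated stream.
totalChunks = floor(numObs/numObsPerChunk);

% Chunks left over once both periods have been consumed.
remainingChunks = totalChunks - chunksBeforeWarm;
```

Under these values, the two periods together consume 20 of the 50 chunks.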
Input Arguments
IncrementalMdl
— Incremental PCA model
incrementalPCA model object
Incremental PCA model, specified as an incrementalPCA model object. You can create IncrementalMdl by calling incrementalPCA directly.
X
— Chunk of predictor data
floating-point matrix
Chunk of predictor data, specified as a floating-point matrix of n observations and IncrementalMdl.NumPredictors variables. The rows of X correspond to observations, and the columns correspond to variables. The software ignores observations that contain at least one missing value.
Note
- If IncrementalMdl.NumPredictors = 0, fit infers the number of predictors from X, and sets the corresponding property of the output model. Otherwise, if the number of predictor variables in the streaming data changes from IncrementalMdl.NumPredictors, fit issues an error.
- fit supports only numeric input predictor data. If your input data includes categorical data, you must prepare an encoded version of the categorical data. Use dummyvar to convert each categorical variable to a numeric matrix of dummy variables. Then, concatenate all dummy variable matrices and any other numeric predictors. For more details, see Dummy Variables.
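The encoding step described in the note can be sketched as follows. The predictor values and the variable names x1, color, D, and X are hypothetical; the sketch assumes a single categorical predictor alongside one numeric predictor.

```matlab
% Hypothetical data: one numeric predictor and one categorical predictor.
x1 = [0.5; 1.2; 3.4; 2.2];
color = categorical(["red";"blue";"red";"green"]);

% Convert the categorical variable to a matrix of dummy variables,
% with one column per category.
D = dummyvar(color);

% Concatenate the numeric predictor and the dummy variables into one
% numeric matrix that can be passed to fit.
X = [x1 D];
```

With three categories, D has three columns, so X has four columns in total, and every later chunk must be encoded with the same category set to keep the number of predictors fixed.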
Data Types: single | double
weights
— Chunk of observation weights
floating-point vector of positive values
Chunk of observation weights, specified as a floating-point vector of positive values. fit weighs the observations in X with the corresponding values in weights. The size of weights must equal n, the number of observations in X.
By default, weights is ones(n,1).
Data Types: single | double
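A minimal sketch of checking a weight vector against these requirements before passing it to fit is shown below. The data, the weights, and the error identifiers are illustrative assumptions.

```matlab
% Hypothetical chunk and weights.
rng(1)                          % for reproducibility
X = randn(100,4);
w = rand(100,1) + 0.1;          % strictly positive weights

% The weight vector needs one entry per observation in X.
assert(isvector(w) && numel(w) == size(X,1), ...
    "example:weights:size","weights must have one element per row of X.");

% All weights must be positive.
assert(all(w > 0), ...
    "example:weights:sign","weights must be positive.");

% Fit a default model to the weighted chunk.
Mdl = incrementalPCA;
Mdl = fit(Mdl,X,Weights=w);
```

Checking the weights up front gives a clearer message than relying on the error issued by fit itself.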
Output Arguments
IncrementalMdl
— Updated incremental PCA model
incrementalPCA model object
Updated incremental PCA model, returned as an incrementalPCA model object.
Xtransformed
— Principal component scores
floating-point matrix
Principal component scores, returned as a floating-point matrix. The rows of Xtransformed correspond to observations, and the columns correspond to components. If IncrementalMdl is not warm (IsWarm=false), all values of Xtransformed are returned as NaN. The data type of Xtransformed is the same as X.
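Because the scores are all NaN until the model is warm, downstream code can guard on the IsWarm property before using them. The following sketch uses hypothetical data and variable names and makes no assumption about whether the model is warm after a single small chunk; it handles both cases.

```matlab
% Hypothetical small chunk passed to a default model.
rng(2)                          % for reproducibility
X = randn(50,3);
Mdl = incrementalPCA;
[Mdl,Xtr] = fit(Mdl,X);

if Mdl.IsWarm
    % The model is warm, so the scores can be used directly.
    scoresUsable = true;
else
    % The model is not warm yet, so every score is NaN and is skipped.
    scoresUsable = ~all(isnan(Xtr),"all");
end
```

In a streaming loop, the same guard lets early chunks update the model while their scores are ignored.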
Version History
Introduced in R2024a
See Also
incrementalPCA | pca | reset | transform