Updated 09 Jul 2019
This standalone version of the EOF function is no longer being maintained. It still works fine, but you'll find the most up-to-date version in the Climate Data Toolbox for MATLAB here: https://www.mathworks.com/matlabcentral/fileexchange/70338. If the eof function has been useful for you, please cite our Climate Data Toolbox for MATLAB paper!
This function simplifies the process of applying Empirical Orthogonal Functions (spatiotemporal principal component analysis) to 3D datasets such as climate data. EOF analysis is not terribly difficult to implement, but much time is often spent trying to figure out how to reshape a big 3D dataset, get the EOFs, and then un-reshape. This function does all the reshaping for you, and performs EOF analysis in a computationally efficient manner. The analysis method is a streamlined and optimized version of Guillame MAZE's caleof function, method 2.
For a full description and an in-depth tutorial describing how to perform EOF analysis on climate data, click on the Example tab above.
Greene, C. A., Thirumalai, K., Kearney, K. A., Delgado, J. M., Schwanghart, W., Wolfenbarger, N. S., et al. (2019). The Climate Data Toolbox for MATLAB. Geochemistry, Geophysics, Geosystems, 20. https://doi.org/10.1029/2019GC008392
Can you add a "rotated" EOF feature to the present function? The rotated EOF is useful to unravel dominant modes in large analysis domain.
Thans for sharing.^-^
Three new updates today in response to some issues that have been raised in this comments section:
1. The issue of sometimes complex solutions is now fixed. Previously, for some of the higher order modes, (e.g., around mode 790 and higher when solving for more than 800 modes), the eigenvalues were so small that there was nothing left but numerical noise. Naturally, some of that numerical noise was negative in sign, which resulted in complex eof maps and principal components for all modes to accommodate those few higher-order complex modes. The eof function now forces all negative eigenvalues to zero, thus preventing results from going complex.
2. "percent of variance explained" is no longer rounded. Previously, there was a line that used the fix command to round each value down to the nearest 10th place, and that prevented the sum(expv) from ever equaling 100. Philosophically it's a bit silly to let the eof function round to some arbitrary value, so if you'd like to round your explained variances you'll now have to do it yourself. Thanks to Aart Zwaan for identifying this issue.
3. I've added a section in the Tutorial on how to interpret the percent of variance explained.
@Aart, that's a great suggestion. I'll make the change in the next update. Thanks for tracking that down.
Really nice and useful function.
The only thing that was weird to me is why fix() is used for the expv. This leads to a cumulative explained variance that is not equal to 100%. I would suggest changing fix() to round().
I have been trying to define the significance of the modes by computing their sampling error as lambda(2/N)^1/2, where lambda is the eigenvalue and N is the no. of realizations (North et al. 1982), does anyone know how to compute sampling error.
I have been trying to generate AO/AAO index as described by CPC, does anyone know how do I project anomaly fields to generate Pseudo-PCs. Thanks
I was also testing the code in your examples and found that I know the problem raised by Qiao Sun on 4 May 2017 and Hao Guo 15 Jun 2017.
They claimed that the code in the examples has results of complex double. The reason is because that after excuting the code 'sst = detrend3(sst,t);' The order of rows in pc seems to be reversed. That is, the first mode (expv=36.5) shows up in the last row in pc (r:802). I'm afraid they are misleaded by the Description 'pc(1,:) is the time series of the first (dominant) mode of varibility.' Actually, they should check the expv for the correct order of where the first mode is in pc.
Actually, I'm one of the users of the mfile, not the author. They are wrote by Chad Greene.
I had analyzed SST by these codes. I can answer your question.
In your three cases, the common doubt is that when and why do you multiply '-1'.
You don't need to multiply '-1'. In the SST example, multiplying '-1' is just for matching the result of Messie and Chavez (2011).
Here, you should note that multiplying '-1' is for eof_maps and pc at the same time. For example,in your three cases, the first eof mode multiply '-1'. Accordingly, the PC1 multiply '-1' at the same time. Or, if you don't want to match the paper in Messie and Chavez (2011), you don't need multiply '-1'. The result of eof1 * pc1 (multiplication) is the same.
Hi can zu
I couldn't understand your idea regarding the multiplication of (-1). As I have understood, you are trying to match figure Messie and Chavez (2011). Please try to elaborate clearly for these following queries:
1) In the code of eof.m, there is
%% Flip signs to provide consistent results:
ind = sign(pc(:,1))<0;
eof_maps(:,:,ind) = eof_maps(:,:,ind)*(-1);
pc(ind,:) = pc(ind,:)*-1;
Whether this part is in general or somehow for matching the result with Messie and Chavez (2011)? I mean for an arbitrary data is it required to change?
2) In the example you have mentioned
s = [-1 1 -1 1 -1 1]; % (sign multiplier to match Messie and Chavez 2011)
this 's' is multiplied with first 6 eof.
Whether this part is in general or somehow for matching the result with Messie and Chavez (2011)? I mean for an arbitrary data how I will decide which one to be multiplied with "-1" and which one to be "1"? What for the others?
3) In another part you have mentioned
anomaly(t,-pc(1,:)) % First principal component is enso
Whether this is in general or somehow for matching the result with Messie and Chavez (2011)? I mean for an arbitrary data how to decide whether -1 should be multiplied with or not? whether that -1 should be multiplied with every pc?
Please try to clarify the above. My purpose is not to match the figures of Messie and Chavez (2011) but to understand what could I do for my data. Please reply with clear information with these above three cases.
I wonder that the spatial data need multiply the weights.
I loaded "PacOcean" and a simple problem appears when using the code "[eof_maps, pc, expv] = eof (sst)" :
Subscripted assignment dimension mismatch.
Why? The matrix (sst) is in 3d format (lat, lon,temporal data).
Help me, please!
(1).12 months is one year, four seasons.
(2).As we know,the sun reaches the tropic of cancer once a year in the northern hemisphere and vice versa in the southern hemisphere.It means the period of positive sst phase is semi-annual i.e. 6 months.And in the one cycle , it includes the positive sst phase and negative sst phase in one hemisphere. In this cycle, it is sst change of ditterent months,i.e. seasonal singal. The annual signal is for the change of different years.If you want to get it,you can do the 12-months running average(some papers use the method) i.e. nonseasonal singal.
(3).In the code, it multiplies -1 with both PC and EOF mode. You don't need to do this. The result are the same.Since the signal is product of PC and EOF mode i.e. PC1xMode1, multiplication with "-1" or not do not change for the signal.
Thank you very much for your answer. I couldn't understand how you are telling a season consists of 12 months i.e. one year? As I know in India there are four seasons defined as: Winter Season: January-February, Pre Monsoon Season: March-May, Southwest Monsoon Season: June-September, Post Monsoon Season: October - December. Will you want to mean one year as one season?
My 2nd query is what do you mean by "Semi-annual"? Is it having period of 6 months (i.e. 0.5 year)? But in the example shown here depicts 1 cycle/year i.e. the signal having period 1 year (i.e. 12 months) and hence it is an annual signal. How to distinguish?
The 3rd query is about the negative sign. In the example multiplication with -1 has been done to satisfy the figure Messie and Chavez (2011). I have seen in the code that multiplication with "-1" is done. My query is whether -1 multiplication is needed? If I am doing with some other datasets then is it required? how to decide which mode should be multiplied? is it require to multiply -1 with both PC and EOF for a particular mode?
Please clarrify these queries.
I'd like to point out some odd behaviour (arguably a bug) that this package's function mycaleof shares with the caleof function in Guillaume Maze's PCAtools. The variance fraction (expv or expvar) that is returned for each EOF is the fraction of the variance explained by all calculated EOFs, not of the total variance. For example, I take the code in "TUTORIAL: From raw climate reanalysis data to ENSO, PDO, etc" but call the eof function with n (the number of EOFs to be calculated) equal to 6. Modes 1 & 6 account for 57.3% & 5.2% of the variance. I then set rerun it with n equal to 801. All the EOF patterns are identical, but Modes 1 & 6 now account for only 36.5% & 3.3%. If I modify my code so that Guillaume Maze's caleof function is called instead of mycaleof, then I get the above results when using methods 2 and 4 (calling eigs and svds respectively) but with methods 1 and 3 (calling eig and svd respectively) the fraction of explained variance does not depend on the argument n and is equal to 36.5% & 3.3% for Modes 1 & 6. I conclude that (and I haven't looked at this section of the code in detail yet): a) the denominator in expvar is the variance of all the EOFs that have been calculated, not of the total variance; b) caleof methods 2 and 4, and mycaleof, calculate only the EOFs that they return, but caleof methods 1 and 3 calculated all eofs even if they return fewer.
I'm also learning the codes.In my opinion,the "Seasonal Variation" is the singal changing with different seasons (constitute by 12 different months).In the PC1 ,you can see the temporal variation of mode1.It's a semi-annual variations corresponding with SST changes caused by solar motion.
This is an excellent software to me as I have just started learning. One point is not clear to me which is mentioned below:
In the example you have shown that if EOF applied on the original sst then the 1st mode will show the seasonal variation with high explained variance. Why it is called "Seasonal Variation"? Because I have seen one cycle takes one year to complete. Will you want to mean a season is equal to one year?
To get rid of that variation basically "monthly anomaly" is found. I couldn't connect the two i.e. "Seasonal Variation" and "Monthly anomaly". Although this is an usual practice but will you please put some light on that?
Excellent code. I would like to suggest that the results are normalized, i.e. the temporal eigen-vectors to 1 and the corresponding eigen-values to the total of the variances, so when we multiply the amplitude of eigen-vector and eigen-value is equal to the actual amplitude variability of the time series (i.e. the sst).
Fig 10 in Deser, Clara, et al(2009) may be not EOF1,it doesn't mark. The SST just be anomaly.
Hi Chad Greene,
very great code! I wonder that the data (SST dataset in the example) need to be detrended, the mean removed, and the seasonal cycle removed,but in the eof subfunction (mycaleof) be detrended again (F = detrend(M,'constant');).
And another paper Deser, Clara, et al(2009) http://www.cgd.ucar.edu/staff/cdeser/docs/deser.sstvariability.annrevmarsci10.pdf
Fig 10, just do SST anomaly and get the EOF1 as PDO in North Pacific,this is a little different with Fig5 in Messie and Chavez (2011) which is EOF3 corresponding to PDO.
Hi Hao and Qiao,
Indeed, following the example file
[eof_maps,pc,expv] = eof(sst);
does produce a complex double result. However, take a look at the magnitude of the imaginary component relative to the total magnitude. Below I'm multiplying by 100 to plot the imaginary component as a percentage of the total magnitude:
The horizontal bands indicate there is exactly zero imaginary component for modes 1 through 795. It goes imaginary beyond mode 795, but in climate science we don't typically use any modes past the first few. And there's a reason for that: If you look at how much variance is explained by mode 796:
If you get a complex double result, check the magnitude of the imaginary component relative to the rest of the signal. And see how much percent variance is explained. If the imaginary component is negligible, you can get rid of it by
eof_maps = real(eof_maps);
pc = real(pc);
Hope this helps,
It is good and note worthy, however, I have the same problem with Qiao Sun, the example data has some problem with complex double for the results of eof_maps and pc.
It is great code. I used the example data PacOcean.mat you provided, but in the line [eof_maps,pc,expv] = eof(sst), the results of eof_maps and pc were complex double.
Fixed the issues that arose from rounding the explained variance values, fixed the issue of results going complex for large numbers of modes, updated and expanded the Tutorial.
Typo fix in the documentation.
Added a simple example in the tutorial.
Inspired by: PCAtool, PCA (Principial Component Analysis), Shade Anomaly, PCA (Principal Component Analysis), PCA and ICA Package, PCA, Empirical orthogonal function (PCA) estimation for EEG time series, Face recognition using PCA, Principal Component Analysis for large feature and small observation, trend, Fast SVD and PCA, pca, borders, Empirical Orthogonal Function (EOF) analysis, Empirical Orthogonal Function (EOF) with Spatiotemporal Convertion, cmocean perceptually-uniform colormaps, Principal Component Analysis, PCA, anomaly, detrend3