Sort data by day of year
5 views (last 30 days)
Show older comments
Hello,
I have a set of data with 12,000+ rows (and a number of columns, but I don't think is related to my question). Each row represents a day of data for some variable (starting on January 1 of some year, say 1960, the second row would be January 2nd, and so forth; it does this for over 30 years of data). So, the data might look something like:
- 25 (this would correspond to January 1 of the first year)
- 32 (this would correspond to January 2)
- 44 (this would correspond to January 3)
- 11
- .
- .
- .
- 29 (this would be the last day of data in the final year of the whole set)
What I need to do is sort this data so that I have all the January 1 data grouped together, all the Jan 2 data, etc (for all 365 days of the year). My goal is to take the average of all Jan 1 data, all Jan 2 data, and so forth, again, doing this for 365 days of the year for the whole dataset (30+ years). I should account for leap years, although it doesn't particularly matter which year was a leap year.
Any suggestion?
3 Comments
Answers (3)
Walter Roberson
on 25 Jan 2012
"the cyclist"'s questions are good ones.
Once you have a vector that corresponds the day number of the year for each line, with the day number being an integer from 1 onward, then you can use
accumarray(DayNumber(:), YourData(:,TheAppropriateColumn))
The default operation of accumarray is to find the mean.
You will, as you say, have to account for leap years, but I will presume here that you fold all necessary logic for that in to the calculation of the day number. All entries with matching day number will be averaged in the above call.
0 Comments
Andrei Bobrov
on 25 Jan 2012
EDIT 2 [14:16(UTC+4) 26.01.2012]
eg:
a = randi([11 44],1600,1);
k =datevec(datenum('01-Jan_1960')+(0:numel(a)-1).');
id = strfind(k(:,3)',[29 1]);
c = 366*ones(numel(a),1);
idx = setdiff(1:numel(a),id);
c(idx) = rem((1:numel(idx))'-1,365)+1;
out = accumarray(c,a,[],@mean);
0 Comments
the cyclist
on 25 Jan 2012
An extremely pedantic, non-clever solution, assuming you have to infer the dates, but know the starting date:
MONTHSPERYEAR = 12;
DAYSPERMONTH = 31;
START_YEAR = 1960;
START_MONTH = 1;
START_DAY = 1;
NUMBER_ROWS = 12000;
randomData = rand(NUMBER_ROWS,1);
startDateNumber = datenum(START_YEAR,START_MONTH,START_DAY);
dateNumbers = startDateNumber + (0:NUMBER_ROWS-1);
dateVectors = datevec(dateNumbers);
indexToYear = 1;
indexToMonth = 2;
indexToDay = 3;
thisDayOfTheYearData = cell(MONTHSPERYEAR,DAYSPERMONTH);
for nm = 1:MONTHSPERYEAR
for nd = 1:DAYSPERMONTH
indexToThisDate = (dateVectors(:,indexToMonth)==nm) & (dateVectors(:,indexToDay)==nd);
thisDayOfTheYearData{nm,nd} = randomData(indexToThisDate);
end
end
2 Comments
the cyclist
on 25 Jan 2012
This pulls all the daily data out into its own element of a cell array. You can then do any operation (e.g. the mean) on those data.
Jake_K
on 22 Feb 2018
Hi Cyclist.
I have a question since I'm not really looking through. What do I have to add to your code if I want to calculate for every month the difference of the first and last value of the month.
E.g. for January 2015 I measured 5°C on the 1/1/2015 and 13°C on the 31/1/2015. So I get back 8°C. So I can finally write into a matrix the differences. Something like this:
1/1/2015 8
1/2/2015 3
1/3/2015 5
And so on. I would be glad if you could help me. Thanks a lot!
See Also
Categories
Find more on Dates and Time in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!