Break array or timetable into smaller versions and find max value
5 views (last 30 days)
Show older comments
I have a 10000x2 array which can be converted into a timetable. What I want to do is break the timetable into each year and find the maximum value for each year in column 1 and repeat in column 2.
data = randi(100, 10000, 2); %create example data
%Turn it into a timetable
data_timetable = array2timetable(data, 'StartTime', datetime(1990, 10, 1), 'TimeStep', caldays(1));
%Now find max value for each year in each column
I thought of doing something like this
ystart = 1990;
max_vals = [];
for i = 1 :28 %1 to 28 because it goes to 2018
for j = 1:length(data)
if year(data_timetable.Time(j)) == ystart && data(j,1) > max_vals
max_vals = data(j,1);
elseif year(data_timetable.Time(j)) == (ystart+i) && data(j,1) > max_vals
max_vals = [max_vals, data(j,1)];
end
end
end
but I know the logic behind this code is wrong and it also shows the following error
Any suggestions?
EDIT:
data = randi(100, 10000, 2);
data_timetable = array2timetable (data, 'StartTime', datetime(1990, 10,1), 'TimeStep',caldays(1));
y = unique(year(data_timetable.Time));
c = cell(length(y),1);
for i = 1:length(c)
c{i} = data_timetable(year(data_timetable.Time) == y(i),:);
end
I managed to separate the data into a cell containing multiple timetables, but how do I get the maximum values in each timetable and the date when they occur?
0 Comments
Accepted Answer
Voss
on 3 Jan 2023
Edited: Voss
on 3 Jan 2023
Here's one way. This finds the first maximum value in each year (for each column of data), and stores the value and the date it occurred.
data = randi(100, 10000, 2);
data_timetable = array2timetable (data, 'StartTime', datetime(1990, 10,1), 'TimeStep',caldays(1));
yy = year(data_timetable.Time); % store the years to avoid having to call year() inside the loop
y = unique(yy);
Ny = numel(y);
c = cell(Ny,1);
Ncols = size(data_timetable,2);
max_vals = zeros(Ny,Ncols); % initialize max_vals
max_dates = NaT(Ny,Ncols); % initialize the dates when max_vals occur
for i = 1:Ny
c{i} = data_timetable(yy == y(i),:);
[max_vals(i,:),idx] = max(c{i}{:,:},[],1); % max() of contents of c{i}, along 1st dimension, outputting index as well as max value
max_dates(i,:) = c{i}.Time(idx); % date of maximum value is c{i}.Time(idx)
end
max_vals
max_dates
% plot the data and maxima:
plot(data_timetable.Time,data_timetable{:,:})
hold on
plot(max_dates(:,1),max_vals(:,1),'mo',max_dates(:,2),max_vals(:,2),'ro')
% zoom in on a portion, so we can see the max values:
xlim(data_timetable.Time([401 1000]))
ylim([90 105])
9 Comments
Voss
on 12 Jan 2023
You're welcome!
"Do you have to initialise max_vals with NaNs instead of zeros because there is no way to check if c{i} is empty inside the loop otherwise?"
No, you can check if c{i} is empty regardless of how max_vals is initialized.
"Because if you use zeros and an if statement to check for zeros and remove them, this would possibly cause problems within the data as some values might be 0."
That's it exactly!
More Answers (1)
Eric Sofen
on 4 Jan 2023
Normally, groupsummary or varfun should be the starting point for grouped calculations like this, but your particuar problem is a bit tricky because you also want the datetime when the max occurred. So you need rowfun, similar to the last example in the "Grouped Calculations" example.
data = randi(100, 10000, 2); %create example data
%Turn it into a timetable
data_timetable = array2timetable(data, 'StartTime', datetime(1990, 10, 1), 'TimeStep', caldays(1));
data_timetable.Year = data_timetable.Time.Year; % Need Year as an explicit grouping variable for rowfun
% We want Time as a first-class variable, so convert to table.
data_table = timetable2table(data_timetable);
t1 = rowfun(@findMax, data_table, "GroupingVariable","Year", "InputVariables",["Time","data1"], "OutputVariableNames",["Max1","Max1Time"]);
t2 = rowfun(@findMax, data_table, "GroupingVariable","Year", "InputVariables",["Time","data2"], "OutputVariableNames",["Max2","Max2Time"])
t1.GroupCount = [];
t2.GroupCount = [];
tfinal = outerjoin(t1,t2,"Keys","Year","MergeKeys",true)
plot(data_timetable.Time,data_timetable{:,1:2})
hold on
plot(tfinal.Max1Time,tfinal.Max1,'go',tfinal.Max2Time,tfinal.Max2,'ro')
% zoom in on a portion, so we can see the max values:
xlim(data_timetable.Time([401 1000]))
ylim([90 105])
function [maxVal,maxTime] = findMax(times,vals)
% Return time at which maximum element of vals occurred
[maxVal,maxIndex] = max(vals);
if ~isnan(maxVal)
maxTime = times(maxIndex);
else
maxTime = NaT;
end
end
See Also
Categories
Find more on Calendar in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!