How I can eliminate days with incomplete hours?
1 view (last 30 days)
Show older comments
Hi everyone. I can't find a solution to my issue in the web. I have hourly averages data from a meteorological station. However there are some missing hours due to the lack of measurements. In this cases, I would like to remove the complete day, in order to keep only the days in which there are measurements in a whole day (24 data per day). Look at this example; The meteorological station is giving the next output, in which columns 1,2,3 and 4 corresponds to year, month, day and hour, respectively, and column 5 correspond to the measurement:
2010 1 1 0 12.3
2010 1 1 1 12.4
2010 1 1 2 3
2010 1 1 3 34.7
2010 1 1 4 54
2010 1 1 5 565
2010 1 1 6 2
2010 1 1 7 334
2010 1 1 8 23
2010 1 1 9 223
2010 1 1 10 32
2010 1 1 11 233
2010 1 1 12 544
2010 1 1 13 23
2010 1 1 16 12
2010 1 1 17 12
2010 1 1 18 3
2010 1 1 19 34
2010 1 1 20 54
2010 1 1 21 565
2010 1 1 22 2
2010 1 1 23 334
2010 1 2 0 23
2010 1 2 1 223
2010 1 2 2 32
2010 1 2 3 233
2010 1 2 4 544
2010 1 2 5 23
2010 1 2 6 12
2010 1 2 7 12
2010 1 2 8 3
2010 1 2 9 34
2010 1 2 10 54
2010 1 2 11 565
2010 1 2 12 2
2010 1 2 13 334
2010 1 2 14 23
2010 1 2 15 223
2010 1 2 16 32
2010 1 2 17 233
2010 1 2 18 544
2010 1 2 19 12
2010 1 2 20 12
2010 1 2 21 3
2010 1 2 22 34
2010 1 2 23 54
2010 1 3 0 565
2010 1 3 1 2
2010 1 3 2 334
2010 1 3 3 12
2010 1 3 5 12
2010 1 3 6 3
2010 1 3 7 34
2010 1 3 8 54
2010 1 3 9 565
2010 1 3 10 2
2010 1 3 11 334
2010 1 3 12 23
2010 1 3 13 223
2010 1 3 14 32
2010 1 3 15 233
2010 1 3 16 544
2010 1 3 17 23
2010 1 3 18 12
2010 1 3 19 12
2010 1 3 23 3
2010 1 4 0 34
2010 1 4 1 54
2010 1 4 2 565
2010 1 4 3 2.2
2010 1 4 4 334
2010 1 4 5 23
2010 1 4 6 223
2010 1 4 7 32
2010 1 4 8 233
2010 1 4 9 544
2010 1 4 10 12
2010 1 4 11 12
2010 1 4 12 3
2010 1 4 13 34
2010 1 4 14 54.7
2010 1 4 15 565
2010 1 4 16 2
2010 1 4 17 334
2010 1 4 18 23
2010 1 4 19 223
2010 1 4 20 32
2010 1 4 21 233
2010 1 4 22 544.6
2010 1 4 23 344.4
In the table of above, there were some missing hours; in day 1: hour 14 and hour 15; in day 3: hour 4, hour 20, hour 21 and hour 22. For that reason, is necessary don't take into account day 1 and day 3, and generate the next output:
2010 1 2 0 23
2010 1 2 1 223
2010 1 2 2 32
2010 1 2 3 233
2010 1 2 4 544
2010 1 2 5 23
2010 1 2 6 12
2010 1 2 7 12
2010 1 2 8 3
2010 1 2 9 34
2010 1 2 10 54
2010 1 2 11 565
2010 1 2 12 2
2010 1 2 13 334
2010 1 2 14 23
2010 1 2 15 223
2010 1 2 16 32
2010 1 2 17 233
2010 1 2 18 544
2010 1 2 19 12
2010 1 2 20 12
2010 1 2 21 3
2010 1 2 22 34
2010 1 2 23 54
2010 1 4 0 34
2010 1 4 1 54
2010 1 4 2 565
2010 1 4 3 2
2010 1 4 4 334
2010 1 4 5 23
2010 1 4 6 223
2010 1 4 7 32
2010 1 4 8 233
2010 1 4 9 544
2010 1 4 10 12
2010 1 4 11 12
2010 1 4 12 3
2010 1 4 13 34
2010 1 4 14 54
2010 1 4 15 565
2010 1 4 16 2
2010 1 4 17 334
2010 1 4 18 23
2010 1 4 19 223
2010 1 4 20 32
2010 1 4 21 233
2010 1 4 22 544
2010 1 4 23 344
Thanks in advance for your help!
0 Comments
Answers (4)
Marieke Klijn
on 21 Sep 2016
Hi Miguel,
For your question I loaded your copied data and named it 'data'. I assume that every day contains 24 hours (0-23) and each day has a number which is unique per dataset. So I loop over each day and check if there are 24 hour entries. If this is not the case then the rows are removed from the dataset.
days = unique(data(:,3));
for ii = 1:length(days)
hours_in_day = find(data(:,3) == days(ii));
if length(hours_in_day) < 24
data(hours_in_day,:) = [];
end
clear hours_in_day
end
Hope this works for you!
Cheers, Marieke
3 Comments
Geoff Hayes
on 21 Sep 2016
Miguel - depending upon your version of MATLAB, you could use hist or http://www.mathworks.com/help/matlab/ref/histogram.html to determine the number of recorded hours in each day. For example, using your above data and hist we would see that
[hourCount,days] = hist(data(:,3),1:max(data(:,3)))
returns
hourCount =
22 24 20 24
days =
1 2 3 4
So we know that the first and third days are incomplete. We could then remove these days from the data by checking to see which element of column three is a member of those days that are incomplete and set it to an empty element (which removes it from the array)
data(ismember(data(:,3),days(hourCount ~= 24)),:) = [];
In the above, we rely on the fact that
ismember(data(:,3),days(hourCount ~= 24))
returns a logical array of ones and zeros for each row of data where a one indicate that the row is missing an hour and a zero indicates otherwise. There may be a more efficient way of doing the above..I seem to recall that ismember can be expensive.
George
on 21 Sep 2016
Edited: George
on 21 Sep 2016
- Find groups on year, month, day.
- Validate that your hours meets what you need (e.g., numel == 24 or somesuch)
- Extract valid dates
This should mostly do it.
weather = readtable('weather.xlsx');
G = findgroups(weather.Year, weather.Month, weather.Day);
validDays = splitapply(@(x) (numel(x) == 24), weather.Hour, G);
weather(validDays(G),:)
Where this may fall down would be if you had days with a dupe reading, but still 24 readings. You can do different anonymous functino if that concerns you.
0 Comments
Andrei Bobrov
on 22 Sep 2016
[~,~,c] = unique(EXAMPLE(:,1:3),'rows');
t = nonzeros((1:max(c))'.*(accumarray(c,1) == 24));
out = EXAMPLE(ismember(c,t),:);
0 Comments
See Also
Categories
Find more on Dates and Time in Help Center and File Exchange
Products
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!