Grouping times by end to start time

Ryan on 6 Apr 2023
Answered: Peter Perkins on 6 Apr 2023
Hey everyone, I have to group event times together and can't think of the best way to do it. I have two arrays of values: the first holds start times and the second holds end times. I need to group a collection of start and end times together if the gap between the group's end time and the next event's start time is less than some given value.

For example, with a max gap time of 5, start times of [5, 7, 17, 21, 35, 37] and end times of [12, 9, 22, 23, 38, 41], I need to group the start times like this: {[5, 7, 17, 21], [35, 37]}, and the end times like this: {[12, 9, 22, 23], [38, 41]}.

It works like this: we begin with {[5]}, {[12]}. Since event 1 has the earliest start time, it is stored first. Then the other events' start times are checked to see if they are within the gap time of the first event's end time. This leads to these start times: {[5, 7, 17]}, and these end times: {[12, 9, 22]}. Then the remaining events are checked again to see if their start times are within the gap time of the group's maximum end time (in this case 22). Accordingly, the new start times are {[5, 7, 17, 21]} and the end times are {[12, 9, 22, 23]}. The group's end time is now 23, and no other events have start times within the gap time, so the process repeats with the remaining events. Thank you everyone!
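The procedure described above can be sketched as a single pass over the events in start-time order. This is only a minimal illustration, not code from the thread: the names `maxGap`, `groupEnd`, `startGroups`, and `endGroups` are made up for the example, and it assumes that a gap exactly equal to the maximum still joins the group, since the worked example puts start time 17 in the same group as end time 12.

```matlab
startTimes = [5, 7, 17, 21, 35, 37];
endTimes   = [12, 9, 22, 23, 38, 41];
maxGap     = 5;   % assumption: a gap equal to maxGap still joins the group

% Visit the events in order of start time (the input may be unsorted).
[s, order] = sort(startTimes);
e = endTimes(order);

startGroups = {};
endGroups   = {};
groupEnd    = -Inf;   % latest end time seen so far in the current group

for k = 1:numel(s)
    if isempty(startGroups) || s(k) - groupEnd > maxGap
        % Gap to the current group is too large: open a new group.
        startGroups{end+1} = s(k);
        endGroups{end+1}   = e(k);
        groupEnd = e(k);
    else
        % Close enough: append the event and extend the group's end time.
        startGroups{end} = [startGroups{end}, s(k)];
        endGroups{end}   = [endGroups{end}, e(k)];
        groupEnd = max(groupEnd, e(k));
    end
end
```

Tracking the group's maximum end time, rather than the previous event's end time, is what keeps event 3 (start 17) in the first group even though event 2 ends at 9.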

Answers (4)

Chunru on 6 Apr 2023
t1 = [5, 7, 17, 21, 35, 37];
t2 = [12, 9, 22, 23, 38, 41];
t2_c = cummax(t2)
t2_c = 1×6
12 12 22 23 38 41
gap = [0 t1(2:end) - t2_c(1:end-1)]
gap = 1×6
0 -5 5 -1 12 -1
gap = gap > 5
gap = 1×6 logical array
0 0 0 0 1 0
idx = find(diff([-inf gap])> 0)
idx = 1×2
1 5
idx = [idx length(t1)+1];
for i=1:length(idx)-1
t1out{i} = t1(idx(i):idx(i+1)-1);
t2out{i} = t2(idx(i):idx(i+1)-1);
end
t1out, t2out
t1out = 1×2 cell array
{[5 7 17 21]} {[35 37]}
t2out = 1×2 cell array
{[12 9 22 23]} {[38 41]}
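The running-maximum comparison above relies on the events already being ordered by start time. A possible variant, sketched here under that observation, sorts first and builds a group label with `cumsum` instead of locating boundaries with `find`. The shuffled input order and the names `maxGap`, `isNewGroup`, and `groupId` are illustrative assumptions, not part of the answer, and the input is assumed to be non-empty.

```matlab
% Same events as in the thread, deliberately shuffled for illustration.
t1 = [17, 5, 35, 7, 37, 21];
t2 = [22, 12, 38, 9, 41, 23];
maxGap = 5;

% Sort by start time and carry the end times along.
[t1s, order] = sort(t1);
t2s = t2(order);

% Latest end time seen up to and including each event.
runningEnd = cummax(t2s);

% An event opens a new group when its start lies more than maxGap past
% the latest end time of all earlier events. The first event always does.
isNewGroup = [true, (t1s(2:end) - runningEnd(1:end-1)) > maxGap];
groupId = cumsum(isNewGroup);   % 1 1 1 1 2 2 for this data

% Collect each group's start and end times into cell arrays.
numGroups = groupId(end);
t1out = arrayfun(@(g) t1s(groupId == g), 1:numGroups, 'UniformOutput', false);
t2out = arrayfun(@(g) t2s(groupId == g), 1:numGroups, 'UniformOutput', false);
```

Keeping `groupId` around can be handy if the groups are needed as labels (for example with `splitapply`) rather than as cell arrays.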

Image Analyst on 6 Apr 2023
If you have the stats toolbox, I'd use dbscan
% Optional initialization steps
clc; % Clear the command window.
close all; % Close all figures (except those of imtool.)
clear; % Erase all existing variables. Or clearvars if you want.
workspace; % Make sure the workspace panel is showing.
format long g;
format compact;
fontSize = 14;
markerSize = 30;
startTimes = [5, 7, 17, 21, 35, 37];
endTimes = [12, 9, 22, 23, 38, 41];
subplot(2, 1, 1);
plot(startTimes, endTimes, 'b.', 'MarkerSize', markerSize);
grid on
xlabel('StartTimes', 'FontSize',fontSize)
ylabel('EndTimes', 'FontSize',fontSize)
title('Raw, unclassified points', 'FontSize',fontSize)
%--------------------------------------------------------------------------------------------------------------------
% Measure the distance between points.
xy = [startTimes(:), endTimes(:)]
xy = 6×2
     5    12
     7     9
    17    22
    21    23
    35    38
    37    41
distances = pdist2(xy, xy) % Just to see the distance between points.
distances = 6×6
                0   3.60555127546399   15.6204993518133   19.4164878389476   39.6988664825584   43.1856457633784
 3.60555127546399                  0   16.4012194668567   19.7989898732233   40.3112887414927   43.8634243989226
 15.6204993518133   16.4012194668567                  0   4.12310562561766   24.0831891575846   27.5862284482674
 19.4164878389476   19.7989898732233   4.12310562561766                  0   20.5182845286832   24.0831891575846
 39.6988664825584   40.3112887414927   24.0831891575846   20.5182845286832                  0   3.60555127546399
 43.1856457633784   43.8634243989226   27.5862284482674   24.0831891575846   3.60555127546399                  0
%--------------------------------------------------------------------------------------------------------------------
% Do clustering with the "dbscan" algorithm.
% [classNumbers, corepts] = dbscan(distances, searchRadius, minPointsPerCluster, 'Distance','precomputed')
searchRadius = 5; % It's in the same cluster if the point is within this of other points.
minPointsPerCluster = 2; % We need to have at least this many points to be considered a valid cluster.
[classNumbers, isACorePoint] = dbscan(xy, searchRadius, minPointsPerCluster)
classNumbers = 6×1
1 1 2 2 3 3
isACorePoint = 6×1 logical array
1 1 1 1 1 1
%--------------------------------------------------------------------------------------------------------------------
% Plot the clusters in unique colors.
subplot(2, 1, 2);
numClusters = max(classNumbers);
cMap = turbo(numClusters);
for k = 1 : numClusters
thisClustersIndexes = classNumbers == k;
plot(startTimes(thisClustersIndexes), endTimes(thisClustersIndexes), '.-', ...
'MarkerSize', markerSize, 'LineWidth', 3, 'Color', cMap(k, :))
hold on;
end
grid on
xlabel('StartTimes', 'FontSize',fontSize)
ylabel('EndTimes', 'FontSize',fontSize)
title('Now classified into groups', 'FontSize',fontSize)
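To get from the cluster labels to the cell-array layout asked for in the question, something like the following sketch might work. The `classNumbers` vector is copied from the output printed above so the snippet runs without the toolbox; `labels`, `clusterIds`, `startGroups`, and `endGroups` are illustrative names. Note that for this data the labels give three clusters rather than the two groups in the question, because the Euclidean distance between (start, end) points is not the same measure as the end-to-start gap: the points (5, 12) and (17, 22) are about 15.6 apart even though the gap from 12 to 17 is only 5.

```matlab
startTimes = [5, 7, 17, 21, 35, 37];
endTimes   = [12, 9, 22, 23, 38, 41];

% Labels copied from the dbscan output shown in this answer.
classNumbers = [1; 1; 2; 2; 3; 3];
labels = classNumbers(:).';   % row vector, to match startTimes/endTimes

% dbscan labels noise points as -1, so keep only the positive labels.
clusterIds = unique(labels(labels > 0));

% One cell per cluster, holding that cluster's start or end times.
startGroups = arrayfun(@(c) startTimes(labels == c), clusterIds, 'UniformOutput', false);
endGroups   = arrayfun(@(c) endTimes(labels == c),   clusterIds, 'UniformOutput', false);
```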

Image Analyst on 6 Apr 2023
Edited: Image Analyst on 6 Apr 2023
Not sure how you got your results but they don't seem to follow your definition of the gap time being the time between the start time and the end time: "the gap time between each group’s end time and start time". This is what I get:
startTimes = [5, 7, 17, 21, 35, 37];
endTimes = [12, 9, 22, 23, 38, 41];
% Measure "gaps" defined as difference between end times and start times.
gapTimes = endTimes - startTimes
gapTimes = 1×6
7 2 5 2 3 4
gapThreshold = 5;
% Find elements with a gap less than or equal to the threshold.
indexes = gapTimes <= gapThreshold;
startShort = startTimes(indexes)
startShort = 1×5
7 17 21 35 37
endShort = endTimes(indexes)
endShort = 1×5
9 22 23 38 41
% Find elements with a gap more than the threshold.
indexes = gapTimes > gapThreshold;
startLong = startTimes(indexes)
startLong = 5
endLong = endTimes(indexes)
endLong = 12
Or did you really mean "the gap time is the time between one group’s end time and the start time of the next group"?
  2 Comments
Ryan on 6 Apr 2023
You are correct, I explained it poorly, the gap time is the time between one group’s end time and the start time of another group.
Image Analyst on 6 Apr 2023
But some of your events overlap and have negative gap times:
startTimes = [ 5, 7, 17, 21, 35, 37];
endTimes = [12, 9, 22, 23, 38, 41];
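% Measure "gaps" defined as the next event's start time minus the current event's end time.
% A negative value means the next event starts before the current one ends (overlap).
gapTimes = startTimes(2:end) - endTimes(1:end-1)
gapTimes = 1×5
-5 8 -1 12 -1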
What do you want to do in the case that the events overlap?

Peter Perkins on 6 Apr 2023
This is a lot like hierarchical clustering, if you happen to have the Statistics and Machine Learning Toolbox.
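To illustrate the connection: single-linkage clustering cut at a distance threshold groups items that are joined by chains of links no longer than that threshold, which is the same as finding connected components. The sketch below is a toolbox-free stand-in for that idea, not the hierarchical clustering functions themselves. It uses base MATLAB's `graph` and `conncomp`, and it assumes that the "distance" between two events is the later start minus the earlier end, that every event ends no earlier than it starts, and that a gap equal to `maxGap` still links two events. The names `maxGap`, `gapMatrix`, `adjacent`, and `groupId` are illustrative.

```matlab
startTimes = [5, 7, 17, 21, 35, 37];
endTimes   = [12, 9, 22, 23, 38, 41];
maxGap     = 5;

s = startTimes(:);
e = endTimes(:);

% Pairwise gap between events i and j: later start minus earlier end.
% Zero or negative values mean the two events touch or overlap.
% (Uses implicit expansion, available from R2016b.)
gapMatrix = max(s, s.') - min(e, e.');

% Link every pair of distinct events whose gap is within the threshold.
adjacent = gapMatrix <= maxGap;
adjacent(logical(eye(numel(s)))) = false;   % drop self-links

% Each connected component of the link graph is one group of events.
groupId = conncomp(graph(adjacent));

numGroups   = max(groupId);
startGroups = arrayfun(@(g) startTimes(groupId == g), 1:numGroups, 'UniformOutput', false);
endGroups   = arrayfun(@(g) endTimes(groupId == g),   1:numGroups, 'UniformOutput', false);
```

For the data in the question this places events 1 to 4 in one component and events 5 and 6 in another. The pairwise matrix grows with the square of the number of events, so for long event lists a sorted single pass is likely the cheaper option.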
