filling missed timestamps using prediction

2 views (last 30 days)
Good day.
I have a file with timestamp records where timestamps were recorded every time my sensor detects a barrier and assigns to timestamp the moment the barrier passes through the detector and the moment it leaves, and writes both times (entering and leaving) to the logging file. However, i faced a problem with my computer shutting off randomly while my experiment was running while the barrier kept passing all the time through the detector and the computer was off. Therefore, I have many gaps in my data because every day I switch the computer on and continue logging the data.
Is there a way I can fill the gaps with a logic prediction using the data I have collected? I have attached the data file I have, I appreciate all the help in advance.
  3 Comments
ennes mulla
ennes mulla on 5 Jul 2021
Thank you for asking
the data in the file is recorded as follows. The 1st column is the date of the day, the 2nd column is the entering time of the barrier through the detector you can call it (gate opened). The 3rd column is when the barrier retract and leaves the detector (gate closed). For instance, the first line in the data file is.
Tue Jun 22 2021 16:07:49 PM 4:07:50 PM
[day date] --------- [gate opened]--[gate clsoed]
Basically, I have a small piece of metal that its tip passed through light barrier detector KY010 Module then retract back and repeat again. I used this to count the number of cycles of reciprocating mechanism i used for a fatigue test. The period of one cylce vareis between 1.3 to 4 seconds. I hope could make things clear
dpb
dpb on 5 Jul 2021
Gotcha. My eyes glazed over and failed to note the two times after the date stamp...

Sign in to comment.

Accepted Answer

dpb
dpb on 5 Jul 2021
Edited: dpb on 5 Jul 2021
tData=readtable('data.csv');
tData.Var1=datetime(tData.Var1,'InputFormat','eee MMM dd yyyy');
tData.Properties.VariableNames={'Date','Open','Close'};
tData.Span=seconds(tData.Close-tData.Open);
gives the preview of the data as
>> head(tData)
ans =
8×4 table
Date Open Close Span
___________ ________ ________ ____
22-Jun-2021 16:07:49 16:07:50 1
22-Jun-2021 16:07:54 16:07:57 3
22-Jun-2021 16:08:01 16:08:03 2
22-Jun-2021 16:08:06 16:08:09 3
22-Jun-2021 16:08:13 16:08:16 3
22-Jun-2021 16:09:01 16:09:02 1
22-Jun-2021 16:09:02 16:09:03 1
22-Jun-2021 16:09:04 16:09:04 0
>>
So, then let's look at the distribution of Span --
>> u=[unique(tData.Span) histc(tData.Span,unique(tData.Span))]
u =
-86399 5
-39549 1
-35965 1
-32346 1
-32339 1
0 40665
1 291387
2 353
3 3
14 1
68479 1
>>
From this my first conclusion is you need to be recording at least one more digit of resolution in the time stamps as you have quite a number of times that are recorded as identically zero with only one-second resolution.
Let's see what the oddball time frames look like:
>> tData(find(tData.Span>20)+[-2:2],:)
ans =
5×4 table
Date Open Close Span
___________ ________ ________ _____
27-Jun-2021 00:35:16 00:35:16 0
27-Jun-2021 00:35:17 00:35:18 1
27-Jun-2021 00:35:18 19:36:37 68479
27-Jun-2021 19:36:37 19:36:38 1
27-Jun-2021 19:36:38 19:36:39 1
>>
So, that is a time frame during one day with missing frames; the Close time of the timestamp before matches that of the Open when recording picked up so within the recording resolution that's permissible.
Next, what about the negative Spans --
tData(find(tData.Span==-86399)+[-2:2],:)
ans =
25×4 table
Date Open Close Span
___________ ________ ________ ______
22-Jun-2021 23:59:57 23:59:57 0
22-Jun-2021 23:59:58 23:59:59 1
22-Jun-2021 23:59:59 00:00:00 -86399
23-Jun-2021 00:00:00 00:00:01 1
23-Jun-2021 00:00:02 00:00:03 1
23-Jun-2021 23:59:56 23:59:57 1
23-Jun-2021 23:59:57 23:59:58 1
23-Jun-2021 23:59:59 00:00:00 -86399
24-Jun-2021 00:00:00 00:00:01 1
24-Jun-2021 00:00:01 00:00:02 1
28-Jun-2021 23:59:56 23:59:57 1
28-Jun-2021 23:59:58 23:59:58 0
28-Jun-2021 23:59:59 00:00:00 -86399
29-Jun-2021 00:00:00 00:00:01 1
29-Jun-2021 00:00:02 00:00:02 0
29-Jun-2021 23:59:56 23:59:57 1
29-Jun-2021 23:59:58 23:59:59 1
29-Jun-2021 23:59:59 00:00:00 -86399
30-Jun-2021 00:00:00 00:00:01 1
30-Jun-2021 00:00:02 00:00:03 1
02-Jul-2021 23:59:57 23:59:57 0
02-Jul-2021 23:59:58 23:59:59 1
02-Jul-2021 23:59:59 00:00:00 -86399
03-Jul-2021 00:00:01 00:00:01 0
03-Jul-2021 00:00:02 00:00:03 1
>>
The issue w/ these is that your data recording doesn't have the facility to include the date change when the day rolls over at midnight so the Close time of the next morning is less than that of the midnight hour the night before.
If this turns out to be a 59:59 to 00:00 one-second interval, then you get the exact match of -86399, one second less than the number of seconds in a day. That's a pretty easy one to find and fix.
(*) NB: the above code will output stuff in a non-sequentially order manner; I rearranged by the timestamp chronologically so don't be surprised if you run the exact code when result isn't the same initially. Output the individual locations instead of the vectorized form to get them in order directly.
>> tData(find(tData.Span==-39549)+[-2:2],:)
ans =
5×4 table
Date Open Close Span
___________ ________ ________ ______
25-Jun-2021 20:33:10 20:33:11 1
25-Jun-2021 20:33:12 20:33:13 1
25-Jun-2021 20:33:13 09:34:04 -39549
26-Jun-2021 09:34:04 09:34:05 1
26-Jun-2021 09:34:05 09:34:06 1
>>
Here and the places similar that are negative Span but smaller are those that also have missing values but are not just the one-second rollover at midnight as above.
>> tMiss=(tData.Date(ix)+1+tData.Close(ix)) - (tData.Date(ix)+tData.Open(ix))
tMiss =
duration
13:00:51
>>
is the missing interval.
To interpolate into these missing values, with the data you have to date you're pretty limited in choices. I'd probably just use the observed frequency of
>> u=u(iswithin(u(:,1),0,15),:) % keep only those within reasonable range
u =
0 40665
1 291387
2 353
3 3
14 1
>> p=u(:,2)/sum(u(:,2))*100 % empirical distribution of span
p =
12.2334
87.6592
0.1062
0.0009
0.0003
>>
You would probably want to either fit an empirical distribution or arbitrarily fill in some guesstimated values for those spans between the 3-second observation and whatever you think is the maximum reasonable value.
As noted, above, however, it looks to me as though you need more precision in the time stamp and the recording should include both the date and the time for both Open and Close events. You could cut down the size of the log by reducing the format of the date stamp to something less verbose.
You need to make sure the collection computer is on a UPS so power glitches won't shut it down and also make sure any screensavers, powersavers, etc., are inactive so it doesn't power itself down. Those steps should greatly reduce if not entirely eliminate the issues going forward.
  2 Comments
ennes mulla
ennes mulla on 5 Jul 2021
Thank you so much for your answer. I truly appreciate it and appreciate your effort.
I have a small question, the last two line of code you wrote didn't work for me
%when I wrote this line I get the following
tMiss=(tData.Date(ix)+1+tData.Close(ix)) - (tData.Date(ix)+tData.Open(ix))
Unrecognized function or variable 'ix'.
% Alos, with the last function you called
u=u(iswithin(u(:,1),0,15),:)
Unrecognized function or variable 'iswithin'.
Do you know why I get this?? I am using MatLab R2020b
dpb
dpb on 5 Jul 2021
Edited: dpb on 5 Jul 2021
ix=find(tData.Span==-39549);
tMiss=(tData.Date(ix)+1+tData.Close(ix)) - (tData.Date(ix)+tData.Open(ix))
Pick whatever index you want to look at.
iswithin
>> type iswithin
function flg=iswithin(x,lo,hi)
% returns T for values within range of input
% SYNTAX:
% [log] = iswithin(x,lo,hi)
% returns T for x between lo and hi values, inclusive
flg= (x>=lo) & (x<=hi);
>>
is my utility wrapper function to encapsulate the two logical tests into a simple function reference for simplicity in the user code. I've used it for so long I often forget it's not a builtin MATLAB function.

Sign in to comment.

More Answers (0)

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!