Can I remove the date from a big data set text file?
I have a huge text file that contains data points from a laser. Each line has the date (e.g. 04/12/2023), followed by the time (e.g. 15:43:42.225), and then the corresponding output value (e.g. 0.7756). The problem I am running into is that the date can be read as either the 12th of April or the 4th of December. When I run my code, it throws this error message:
The DATETIME data was created using format 'MM/dd/uuuu HH:mm:ss.SSS' but also matched 'dd/MM/uuuu HH:mm:ss.SSS'.
To avoid ambiguity, supply a datetime format using SETVAROPTS, e.g.
opts = setvaropts(opts,varname,'InputFormat','MM/dd/uuuu HH:mm:ss.SSS');
I don't know how to use setvaropts, so I looked it up. However, all the code I have tested since hasn't worked. I get a lot of unknown variable messages. The thing is, I really don't care about the date in my data set. So, is there a way to completely ignore it so my code will run without getting stuck on that part?
This is my original code in case it is useful:
data = readtable('time0.txt');
t0 = data{:,1};
y0 = data{:,2};
[pks,locs]=findpeaks(y0,"MinPeakProminence",2);
Average0 = mean(diff(locs));
Accepted Answer
dpb
on 17 Apr 2023
Edited: dpb
on 17 Apr 2023
Not easily, no. You can't just ignore the date, because the file is tab-delimited and the date and time are a single string; ignoring it also leaves you without the time. The help message showed you how to use setvaropts, so just follow the directions...
fn='https://www.mathworks.com/matlabcentral/answers/uploaded_files/1358713/time0.txt';
opts=detectImportOptions(fn) % create a default options object first; show what one looks like
We observe that it recognized the first of the two columns as a datetime. There are only two variables, so the file is tab-delimited and the date and time were written as one string. All we have to do is resolve which date format is the correct one. It presumed month/day/year was the more likely, so that is what it gave in the help message; if that's not correct, swap those two fields. So, follow the instructions...
opts=setvaropts(opts,opts.VariableNames(1),'InputFormat','MM/dd/uuuu HH:mm:ss.SSSSSS');
tData=readtable(fn,opts); % now read with the opts struct to tell it...
head(tData)
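To see why the importer complains, here is a minimal sketch that parses the sample timestamp from the question under both candidate formats. The variable names s, dMDY and dDMY are made up for illustration; only the format strings come from the error message.

```matlab
% Sample timestamp taken from the question text
s = '04/12/2023 15:43:42.225';

% Same string, two valid interpretations
dMDY = datetime(s,'InputFormat','MM/dd/uuuu HH:mm:ss.SSS');  % month first
dDMY = datetime(s,'InputFormat','dd/MM/uuuu HH:mm:ss.SSS');  % day first

% The two results differ only in which field is taken as the month
[month(dMDY) day(dMDY); month(dDMY) day(dDMY)]
```

Both calls succeed, which is exactly the ambiguity the message reports; only you know which one matches how the instrument wrote the file.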
And, if it's only a time series, as it looks like it may be, then you can convert the datetime to a duration and keep only the elapsed time, which is probably what you're interested in. You can give the variables more user-friendly names while you're at it:
tData.Properties.VariableNames={'Time','Response'};
tData.Time=tData.Time-tData.Time(1);
The resulting time-of-day display format isn't particularly useful for this purpose, so change it:
tData.Time.Format='mm:ss.SSSSSS';
head(tData)
The duration output formatting is pretty weak; the durations themselves are stored with full precision, but the options for displaying them are quite restrictive. I never could figure out why TMW did that. It might be more convenient to convert to microseconds. Or, if the data were sampled at a fixed A/D sample rate rather than free-running, then, as you say, just forget the first variable entirely and generate your time vector from the sampling rate and the number of samples.
The first column is easy enough to ignore if you want to go that direction; in that case, in the opts object, just say
opts.VariableNames(2)={'Response'}; % set the name here
opts.SelectedVariableNames=opts.VariableNames(2); % read it only
tData=readtable(fn,opts); % now read with the opts struct to tell it...
head(tData)
Now you don't have to tell it what the datetime format is; it's ignored so doesn't matter...
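For what it's worth, here is a rough sketch of the two numeric-time options mentioned above. The Time values are a synthetic stand-in for tData.Time, and the sample rate Fs is purely an assumption; substitute whatever your acquisition hardware actually used.

```matlab
% Synthetic stand-in for the duration column tData.Time (hypothetical values)
Time = seconds([0; 0.001; 0.002; 0.003]);

% Option 1: convert the durations to elapsed microseconds as plain doubles
t_us = seconds(Time)*1e6;

% Option 2: ignore the recorded times and build the axis from the sample rate
Fs = 1000;               % ASSUMED sample rate in Hz, not taken from the file
N  = numel(Time);        % number of samples
t  = (0:N-1).'/Fs;       % elapsed time in seconds, column vector
```

Option 2 is only valid if the samples really were taken at a fixed rate.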
And, that's all there is to using setvaropts! <VBG>
3 Comments
dpb
on 20 Apr 2023
Edited: dpb
on 20 Apr 2023
fn='https://www.mathworks.com/matlabcentral/answers/uploaded_files/1361888/time0.txt';
opts = detectImportOptions(fn);
opts = setvaropts(opts,opts.VariableNames(1),'InputFormat','MM/dd/uuuu HH:mm:ss.SSSSSS');
tData = readtable(fn,opts);
tData.Properties.VariableNames = {'Time','Response'};
tData.Time = tData.Time - tData.Time(1);
tData.Time.Format = 'mm:ss.SSSSSS';
subplot(2,1,1)
findpeaks(tData.Response);
xlim([0 250])
subplot(2,1,2)
findpeaks(tData.Response,"MinPeakProminence",2);
xlim([0 250])
There's nothing of that magnitude in the data; with "MinPeakProminence" set to 2 you've screened all of the peaks out.
Once you have the data in the table, use it; there's no reason to create more copies of it in some other form. You'll also note I reverted to the tData variable name as a reminder that it is a table.
Moral: ALWAYS PLOT YOUR DATA FIRST!!!!
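As a sketch of working straight from the table, the average peak spacing from the original question can be computed in seconds by passing the elapsed time to findpeaks as the location vector. The table below is synthetic (a 5 Hz sine sampled at an assumed 1 kHz) and the threshold minProm is a placeholder; choose the real value after looking at a plot of your own data.

```matlab
% Synthetic stand-in for tData (hypothetical signal, not the laser data)
Fs = 1000;                              % assumed sample rate in Hz
t  = (0:Fs-1).'/Fs;                     % one second of samples
tData = table(seconds(t), sin(2*pi*5*t), 'VariableNames', {'Time','Response'});

% ASSUMED prominence threshold; pick it from a plot of the real data
minProm = 0.5;

% Pass elapsed seconds as x so that locs comes back in seconds, not indices
[pks, locs] = findpeaks(tData.Response, seconds(tData.Time), ...
                        'MinPeakProminence', minProm);

avgSpacing = mean(diff(locs));          % mean peak-to-peak spacing in seconds
```

This needs the Signal Processing Toolbox, the same as the findpeaks calls above.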