Datastore ReadSize - unexpected behavior

Anders on 23 Jun 2023
Commented: Rik on 23 Jun 2023
I would expect the code below to read 40k lines from my datastore at each pass, but for reasons unknown to me the number of lines varies between passes.
ds = tabularTextDatastore(filename,'ReadSize',40000);
c = 0;
while hasdata(ds)
    c = c + 1;
    TT = read(ds);        % read the next block of rows
    T = height(TT);       % number of rows returned by this read
    if c==1
        t_total = T;
    else
        t_total = t_total + T;
    end
    disp("Done with " +t_total +" ticks.")
end
This produces the following output:
Done with 40000 ticks.
Done with 45096 ticks.
Done with 85096 ticks.
Done with 90190 ticks.
Done with 130190 ticks.
I would expect the increment to be 40k each time. The data is timestamped, and based on the timestamps the data in the CSV file "filename" does not seem to be corrupt in any way: there are no missing timestamps when reading the data. Is there anything I can do so that I get 40k lines at each pass (except the last pass, of course)?
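The cumulative counts above grow by 40000, 5096, 40000, 5094, 40000. That pattern is consistent with each call to read returning at most ReadSize rows and sometimes stopping early, presumably at an internal block boundary of the datastore (an assumption; the exact rule is not stated in this thread). If every pass must contain exactly 40k rows, one workaround is to buffer the rows yourself and split off fixed-size chunks. This is a minimal sketch, not a built-in datastore option: the temporary CSV, its single Tick column, and the names nRows, chunkSize, buffer and chunkHeights are hypothetical stand-ins for the real file and loop.

```matlab
% Hypothetical demo data: a temporary CSV with a single Tick column.
% Replace this with the real file name.
filename = [tempname '.csv'];
nRows = 100003;
writetable(table((1:nRows)', 'VariableNames', {'Tick'}), filename);

chunkSize = 40000;                        % rows wanted per pass
ds = tabularTextDatastore(filename, 'ReadSize', chunkSize);

buffer = [];                              % rows read but not yet processed
chunkHeights = [];                        % height of each emitted chunk
while hasdata(ds) || ~isempty(buffer)
    % Top up the buffer until it holds a full chunk or the data runs out.
    while (isempty(buffer) || height(buffer) < chunkSize) && hasdata(ds)
        newRows = read(ds);               % may return fewer than chunkSize rows
        if isempty(buffer)
            buffer = newRows;
        else
            buffer = [buffer; newRows]; %#ok<AGROW>
        end
    end
    n = min(chunkSize, height(buffer));
    chunk = buffer(1:n, :);               % exactly chunkSize rows, except possibly the last
    buffer(1:n, :) = [];                  % keep the remainder for the next pass
    % ... process chunk here ...
    chunkHeights(end+1) = height(chunk); %#ok<AGROW>
    disp("Done with " + sum(chunkHeights) + " ticks.")
end
delete(filename);
```

Only the final chunk can be shorter than chunkSize. The cost is that the buffer briefly holds a little more than one chunk of rows, plus the copying done when rows are appended and removed.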
Anders on 23 Jun 2023
Sorry, I should have been more careful with the code example. I have fixed it now. The actual data I'm using is proprietary, so I'm not allowed to share it. Would an example file with the same structure be helpful?
Rik on 23 Jun 2023
Anything that reproduces this problem is fine. You care about the actual data; we don't. For this problem, the only thing that matters is that the data produces the same results.


Answers (1)

Sanskar on 23 Jun 2023
Hi Anders!
What I understand from your question is that you want to read 40k lines from your datastore at each pass, but after the first iteration of the loop the number of rows returned varies.
The 'ReadSize' property makes each call to read return at most the given number of rows.
Checking 'hasdata' does not guarantee that the next read returns exactly 'ReadSize' rows.
Instead of 'hasdata', you can use 'isDone()' to check whether all the data has been read from the datastore.
Following is the modified code:
ds = tabularTextDatastore(filename, 'ReadSize', 40000);
c = 0;
while ~isDone(ds) % Use isDone instead of hasdata
    c = c + 1;
    data = read(ds); % Read the next block (up to 40,000 rows)
    T = height(data); % Rows returned by this read
    if c == 1
        t_total = T;
    else
        t_total = t_total + T;
    end
    disp("Done with " + t_total + " ticks.")
end
Following is the link to the documentation for isDone():
Anders on 23 Jun 2023
Edited: Anders on 23 Jun 2023
Hi Sanskar,
I get an "Unrecognized function or variable 'isDone'" error. Is isDone part of some toolbox? When I type which isDone, I get a 'not found' message.
If I understand the documentation correctly, isDone is used for System objects and cannot be used with datastores.
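For a datastore, the loop test is hasdata; isDone is a System object method, not a datastore one. The sketch below assumes a hypothetical temporary CSV in place of the real file (filename, nRows, readSize and readHeights are illustrative names) and records how many rows each call to read returns. The individual sizes may differ from ReadSize, but each read should return at most ReadSize rows and the heights should add up to the number of data rows. In other words, short reads by themselves do not mean rows were lost.

```matlab
% Hypothetical demo data: a temporary CSV with a single Tick column.
filename = [tempname '.csv'];
nRows = 100003;
writetable(table((1:nRows)', 'VariableNames', {'Tick'}), filename);

readSize = 40000;
ds = tabularTextDatastore(filename, 'ReadSize', readSize);

readHeights = [];                         % rows returned by each call to read
while hasdata(ds)                         % datastores report remaining data via hasdata
    TT = read(ds);
    readHeights(end+1) = height(TT); %#ok<AGROW>
end
fprintf('%d reads, %d rows in total\n', numel(readHeights), sum(readHeights));
delete(filename);
```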


Categories: Data Import and Analysis

Release: R2022b
