How can I read just part of a binary file, up to a given end position or a given number of bytes?

Hi. I have searched a lot for an answer, but without success.
I want to read data records out of a binary file. Each record is 8 bytes: {'uint16' 'uint16' 'uint16' 'uint8' 'uint8'}.
The files contain millions of records at 1-minute time steps, starting at a known start date.
So far, I can set the start position with fseek by skipping the unwanted time span (1 record of 8 bytes = 1 min).
My problem is that I cannot find a way to specify the end position or the number of records for fread.
One solution would be a loop that reads one record at a time, advancing the position with fseek by one record length on each iteration, and stops after the last wanted record. But I guess this would be grossly inefficient and would likely take even longer than reading the whole file and picking the wanted part out of the resulting matrix.
I hope it is clear what I am asking.
I need something like fread(fileID, start_position, end_position or number_of_records).
Thanks in advance.
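Since 1 record = 1 min = 8 bytes, a wanted time window converts directly into an fseek offset and an fread record count. A minimal sketch, assuming the first record in the file corresponds exactly to the known start date and that the file has no gaps; t0, tStart and tEnd are hypothetical names, and datenum is used so the sketch also runs in older releases:

```matlab
% Hypothetical example: the file starts at t0 and holds one 8-byte record per minute.
recordbytes = 8;                                % 3 x uint16 + 2 x uint8
t0     = datenum(2018, 1, 1, 0, 0, 0);          % start date of the file (assumed)
tStart = datenum(2018, 1, 1, 6, 0, 0);          % first wanted minute
tEnd   = datenum(2018, 1, 1, 7, 59, 0);         % last wanted minute (inclusive)

% datenum counts days, so multiply by 1440 min/day; round() removes
% floating-point noise left by the subtraction.
firstrecord = round((tStart - t0) * 1440) + 1;  % 1-based index of first record
lastrecord  = round((tEnd   - t0) * 1440) + 1;  % 1-based index of last record
numrecords  = lastrecord - firstrecord + 1;     % count to pass to fread
offset      = (firstrecord - 1) * recordbytes;  % byte offset to pass to fseek
```

With these values, fseek(fid, offset, 'bof') positions the file at record 361, and fread(fid, [8, numrecords], '*uint8') would then read the 120 records covering 06:00 to 07:59.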
Sebastian on 12 Dec 2018
Edited: Sebastian on 12 Dec 2018
Hi Image Analyst,
I don't have evidence, but since accessing every element of a matrix in a loop takes longer than accessing the matrix directly, I concluded that a loop would also take more time for this purpose.
I just read the memmapfile documentation. As far as I understand, this function also only lets you define an offset to skip the first n bytes, not a number of wanted records or an end position. I can't understand why they did not add such an input argument when they implemented the offset argument...
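For reference, memmapfile also accepts a 'Repeat' name-value argument, which limits the mapping to a given number of repetitions of 'Format' after 'Offset', and 'Format' can describe the whole 8-byte record. A minimal sketch, assuming native byte order and the packed record layout from the question; the test file, its values, and the field names a to e are made up for illustration:

```matlab
% Build a hypothetical test file: 10 records, field a = 1..10, other fields 0.
filepath = [tempname, '.bin'];
fid = fopen(filepath, 'w');
for k = 1:10
    fwrite(fid, [k 0 0], 'uint16');   % three uint16 fields
    fwrite(fid, [0 0], 'uint8');      % two uint8 fields
end
fclose(fid);

recordstart = 4;   % first wanted record
numrecords  = 3;   % number of wanted records
recordformat = {'uint16', [1 1], 'a'; ...
                'uint16', [1 1], 'b'; ...
                'uint16', [1 1], 'c'; ...
                'uint8',  [1 1], 'd'; ...
                'uint8',  [1 1], 'e'};
m = memmapfile(filepath, ...
    'Offset', (recordstart - 1) * 8, ...  % skip the preceding 8-byte records
    'Format', recordformat, ...
    'Repeat', numrecords);                % map exactly numrecords records
n = numel(m.Data);   % one struct element per mapped record
a = [m.Data.a];      % first field of each mapped record, as a row vector

clear m;             % release the mapping before deleting the file
delete(filepath);
```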
I know that reading a binary file does not take ages; in my case it takes 15 to 20 seconds. But I just started a new project and will have to use this function many times over the next 3 years, so saving a few seconds each time will add up to a significant amount of time.
Image Analyst on 12 Dec 2018
I deal with 3-D CT images up to 20 GB in size, and I use fseek() and fread() to read slices out of the middle of the file. It's pretty quick, a second or two. I'm not aware of any other way, so you might call MathWorks and ask them. How big are your files?
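Reading a slice from the middle of a volume file works the same way as reading a block of records: seek past the preceding slices, then pass the slice dimensions as the size argument of fread. A minimal sketch, assuming a headerless uint16 volume stored in MATLAB's column-major order; the small test volume and the slice index k are made up for illustration:

```matlab
% Write a small hypothetical rows-by-cols-by-nslices uint16 volume.
rows = 4; cols = 3; nslices = 5;
vol = reshape(uint16(1:rows*cols*nslices), rows, cols, nslices);
filepath = [tempname, '.raw'];
fid = fopen(filepath, 'w');
fwrite(fid, vol, 'uint16');                     % column-major, no header
fclose(fid);

k = 3;                                          % wanted slice
bytesperslice = rows * cols * 2;                % 2 bytes per uint16 voxel
fid = fopen(filepath, 'r');
fseek(fid, (k - 1) * bytesperslice, 'bof');     % jump over slices 1..k-1
slice = fread(fid, [rows, cols], '*uint16');    % read exactly one slice
fclose(fid);
delete(filepath);
```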

Accepted Answer

Guillaume on 12 Dec 2018
I'm not entirely sure I understand completely, but maybe this is what you want:
recordstart = ???;  % some integer value: index of first desired record
numrecords = ???;   % how many records to get
filepath = ???;     % path of the file
recordtypes = {'uint16', 'uint16', 'uint16', 'uint8', 'uint8'};
recordsizes = [2, 2, 2, 1, 1];  % size of each type in bytes; must match recordtypes
fid = fopen(filepath, 'r');
assert(fid ~= -1, 'Could not open %s', filepath);
fseek(fid, (recordstart - 1) * sum(recordsizes), 'bof');  % skip the preceding records
data = fread(fid, [sum(recordsizes), numrecords], '*uint8');  % read numrecords as uint8, one record per column
fclose(fid);
% split the rows into one block per field; size(data, 2) also handles a short read at end of file
data = mat2cell(data, recordsizes, size(data, 2));
% reinterpret each block's bytes as its field type
data = cellfun(@(bytes, type) typecast(bytes(:), type), data', recordtypes, 'UniformOutput', false);
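As an alternative to the mat2cell/typecast step, the skip argument of fread can read the fields with their own types directly: with a precision of the form N*source=>output, fread reads N values and then skips the given number of bytes. A minimal sketch, assuming native byte order and the packed 8-byte layout from the question; the test file and its values are made up for illustration:

```matlab
% Build a hypothetical 10-record test file in the question's layout.
filepath = [tempname, '.bin'];
fid = fopen(filepath, 'w');
for k = 1:10
    fwrite(fid, [k, 10*k, 100*k], 'uint16');   % three uint16 fields
    fwrite(fid, [k, 2*k], 'uint8');            % two uint8 fields
end
fclose(fid);

recordstart = 4;                                % first wanted record
numrecords  = 3;                                % number of wanted records
offset = (recordstart - 1) * 8;                 % 8 bytes per record

fid = fopen(filepath, 'r');
% Read 3 uint16 values per record, skipping the 2 trailing uint8 bytes.
fseek(fid, offset, 'bof');
u16 = fread(fid, [3, numrecords], '3*uint16=>uint16', 2);
% Start 6 bytes into the first record; read 2 uint8 values, skip the 6 uint16 bytes.
fseek(fid, offset + 6, 'bof');
u8 = fread(fid, [2, numrecords], '2*uint8=>uint8', 6);
fclose(fid);
delete(filepath);
```

Each row of u16 and u8 holds one field, one column per record. This needs one extra fseek/fread pair but avoids the intermediate uint8 matrix.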
Sebastian on 13 Dec 2018
Edited: Sebastian on 13 Dec 2018
Thanks a lot!
That's what I wanted. When I started writing my function I came across the input argument sizeA, but I kept searching and forgot about it. I think I wrongly assumed that fread would still read the whole binary file and just rearrange the output...
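One way to see that the size argument limits how much fread consumes is to check the file position with ftell after the read: it advances only by the requested number of bytes. A minimal sketch on a hypothetical all-zero file of 1000 records of 8 bytes:

```matlab
% Hypothetical 1000-record file of 8-byte records (contents irrelevant here).
filepath = [tempname, '.bin'];
fid = fopen(filepath, 'w');
fwrite(fid, zeros(8, 1000), 'uint8');
fclose(fid);

fid = fopen(filepath, 'r');
fseek(fid, 100 * 8, 'bof');                  % skip the first 100 records
data = fread(fid, [8, 50], '*uint8');        % request 50 records
pos = ftell(fid);                            % byte position after the read
atend = feof(fid);                           % end-of-file indicator
fclose(fid);
delete(filepath);
```

Here pos is 800 + 50 * 8 = 1200 bytes, far short of the 8000-byte file, and the end-of-file indicator is not set.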

Release: R2018b