Large Text File Datastore
Christopher McNamara
on 8 Mar 2019
Answered: Cris LaPierre
on 9 Mar 2019
Hello,
I am trying to read rather large text files and post-process them for relevant data in MATLAB. The files are typically about 20-50 GB, and unfortunately I have no control over their formatting because they are the output of another program. Each file contains large amounts of whitespace, text, and other irrelevant data, but it has a consistent structure that I have been able to decode. For smaller files that fit into memory, I can already extract the relevant numerical information. Now I have to make it work on a larger scale.
I cannot share the format of the file as it is restricted, but I can describe it. The file can be split on the page-break (form feed) character, char(12), and each page then has a specific format depending on the information it contains. Essentially, my current approach does the following:
1) Read the text file in: A = fileread(File)
2) Split the file into its pages via P = regexp(A,char(12),'split')
3) Loop through each page found and use further splitting commands to extract needed numerical data and organize it
4) Output a data structure (MATLAB struct) of organized data from the function
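For reference, the four steps above could look roughly like the sketch below. The real per-page parsing is restricted, so parsePage is a hypothetical stand-in that simply pulls every numeric token out of a page, and the function name and struct field names are assumptions made for illustration only.

```matlab
function out = parsePagedFile(file)
% Sketch of the four steps described above (not the actual restricted parser).
A = fileread(file);                       % 1) read the whole file into memory
P = regexp(A, char(12), 'split');         % 2) split into pages on form feed
out = struct('page', {}, 'values', {});   % 4) struct array that is returned
for k = 1:numel(P)                        % 3) loop through each page
    vals = parsePage(P{k});
    if ~isempty(vals)
        out(end+1) = struct('page', k, 'values', vals); %#ok<AGROW>
    end
end
end

function vals = parsePage(pageText)
% Hypothetical stand-in for the real page parser: extract numeric tokens.
tokens = regexp(pageText, '[-+]?\d+\.?\d*(?:[eE][-+]?\d+)?', 'match');
vals = str2double(tokens);
end
```

Step 1 is the only part that needs the entire file in memory at once; steps 2 to 4 only ever look at one page at a time.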
This works well so far, but for larger files the first step fails with out-of-memory errors. After some searching, it seems I may be able to use a datastore or tall array to get past this, but I am unsure whether that will scale or whether I should try a different approach. Can someone make a suggestion? Is the current function scalable simply by converting it to use tall arrays?
As a side note to the use of datastore, the text file is NON-TABULAR data if that is relevant.
Thank you.
Accepted Answer
Cris LaPierre
on 9 Mar 2019
From the documentation, "Tall arrays provide a way to work with data backed by a datastore that can have millions or billions of rows."
Tall arrays are how you would store the data you pull from the text file. To load the file in the first place, it sounds like you would still need to create a datastore object.
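As a rough illustration of the datastore route for non-tabular text, the sketch below uses fileDatastore with a custom read function that returns one form-feed-delimited page per call, so the whole file never has to fit in memory. It assumes a MATLAB release in which fileDatastore supports 'ReadMode','partialfile' (R2020a or later, as far as I know), and it assumes single-byte text. The names readPagesWithDatastore and readNextPage and the 1 MiB chunk size are hypothetical choices, not part of any MathWorks API.

```matlab
function pages = readPagesWithDatastore(file)
% Collect pages one at a time. In real use, parse each page inside the loop
% and keep only the extracted numbers instead of storing every page.
fds = fileDatastore(file, 'ReadFcn', @readNextPage, 'ReadMode', 'partialfile');
pages = {};
while hasdata(fds)
    pages{end+1} = read(fds); %#ok<AGROW>
end
end

function [page, state, done] = readNextPage(filename, state)
% Custom read function for 'partialfile' mode: returns the next page.
% state is empty on the first call for a file and is passed back afterwards.
chunkBytes = 2^20;                 % bytes per fread call (assumed, tune as needed)
ff = char(12);                     % form feed page delimiter
if isempty(state)
    state = struct('fid', fopen(filename, 'r'), 'buffer', '');
end
idx = find(state.buffer == ff, 1);
while isempty(idx) && ~feof(state.fid)
    newText = fread(state.fid, [1 chunkBytes], 'uint8=>char');
    offset = numel(state.buffer);
    state.buffer = [state.buffer, newText];
    idx = offset + find(newText == ff, 1);   % stays empty if no delimiter found
end
if isempty(idx)
    % No delimiter left: the rest of the buffer is the final page.
    page = state.buffer;
    state.buffer = '';
    done = true;
    fclose(state.fid);
else
    page = state.buffer(1:idx-1);
    state.buffer = state.buffer(idx+1:end);
    done = false;
end
end
```

Calling pages = readPagesWithDatastore('output.txt') on a small test file should give the same pages as regexp(fileread('output.txt'), char(12), 'split'). Memory use is bounded by the largest page plus one chunk rather than by the file size.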