Reading data into MATLAB
Hi, I have a text file of space-separated numbers that I need to import into MATLAB for processing. I cannot use the "load" command to import the whole file because it is far too big (5 GB). The text file looks like this:
1.2 4.2 5.2 5.33 6.45 7.64 3.45 7.34 ........
2.34 5.23 .235 .2343 2.34 3.4 3.42........
and so on with
What I'd like to do is read in and store the first 10 values of each row into a column vector, then the next 10 values of each row, and so on,
to have something like:
X=[row1 (1 thru 10); row2 (1 thru 10);...]
or more generally,
y=[row1 (start position thru end position);.....]
Any help appreciated,
Thank you!
Accepted Answer
Walter Roberson
on 31 Oct 2011
I'm not so sure this will make you any happier, but...
To read in columns P through Q (inclusive) of file XYZ.TXT, ignoring H lines of headers:
fid = fopen('XYZ.TXT','rt');
Then for each combination of columns:
fseek(fid, 0, -1); %rewind
result = textscan(fid, [repmat('%*f',1,P-1) repmat('%f',1,Q-P+1) '%*[^\n]'], 'HeaderLines', H, 'CollectOutput', 1);
cols.(sprintf('C%d_%d',P,Q)) = result{1};
clear result
When you are done reading as much as you can hold or as you want to deal with:
fclose(fid);
Feel free to use something other than a structure to hold the values, keeping in mind that you have not specified that you will be using the same number of values each time, so a plain numeric array might not work.
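Putting the steps above together, the loop over column blocks might look like the sketch below. It is only an illustration under some assumptions: the file name xyz_demo.txt, the tiny 3-by-25 sample file, and the variables blockSize and nRead are made up for the demo and are not from the original question. The sample file deliberately has more columns than are read, so that there is always something left on each line for %*[^\n] to discard.

```matlab
% Hypothetical small sample file standing in for the real 5 GB file.
fname = fullfile(tempdir, 'xyz_demo.txt');
data = reshape(1:75, 25, 3)';            % 3 rows, 25 columns
fid = fopen(fname, 'wt');
fprintf(fid, [repmat('%g ', 1, 24) '%g\n'], data');  % one file line per data row
fclose(fid);

H = 0;            % number of header lines to skip (none in the demo file)
blockSize = 10;   % assumed number of columns per block
nRead = 20;       % assumed number of leading columns we want in total

fid = fopen(fname, 'rt');
cols = struct();
for P = 1:blockSize:nRead
    Q = min(P + blockSize - 1, nRead);
    frewind(fid);                        % same effect as fseek(fid, 0, -1)
    % Skip columns 1..P-1, keep columns P..Q, discard the rest of the line.
    fmt = [repmat('%*f', 1, P-1) repmat('%f', 1, Q-P+1) '%*[^\n]'];
    result = textscan(fid, fmt, 'HeaderLines', H, 'CollectOutput', 1);
    cols.(sprintf('C%d_%d', P, Q)) = result{1};
end
fclose(fid);
delete(fname);    % remove the demo file
```

With this sample file, cols.C1_10 and cols.C11_20 should each be a 3-by-10 numeric array. Note that every block rescans the file from the beginning, so for a very large file it may be worth choosing blocks as wide as memory allows.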
There is a more elegant way to skip leading columns, which I knew about 3 days ago, but I'm having a heck of a time digging it up at the moment.
3 Comments
Walter Roberson
on 31 Oct 2011
%*f format means to read a floating point number and discard it. We repeat this read-discard enough times to read through to the column before the first one we are interested in.
%f format means to read a floating point number and save it. We repeat this read-save enough times to read from columns P to Q inclusive, which is Q-P+1 times.
%*[^\n] format means to find a sequence of characters that can match any character (including space) _except_ for \n which means newline in this context -- i.e., read to end of line. The * part means to discard it. Overall this means that we read whatever is left over after column Q on the line and discard it.
CollectOutput means to put all of the %f values read (columns P through Q) into a single numeric array.
textscan() always wraps its output in a cell array even if only one item is output, so the result{1} extracts the numeric array.
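As a small illustration of the format pieces described above, the sketch below builds the format string for P = 3 and Q = 5 and applies it to a short piece of text instead of a file (textscan also accepts a character vector as its first input). The values of P and Q and the sample text are made up for the demo.

```matlab
P = 3; Q = 5;     % example values: keep columns 3 through 5
fmt = [repmat('%*f', 1, P-1) repmat('%f', 1, Q-P+1) '%*[^\n]'];
disp(fmt)         % shows the assembled format string

% Two lines of seven numbers each, standing in for lines of the file.
txt = sprintf('1 2 3 4 5 6 7\n8 9 10 11 12 13 14\n');
result = textscan(txt, fmt, 'CollectOutput', 1);
block = result{1};   % numeric array holding columns P..Q of each line
```

Here fmt is two read-and-discard fields, three read-and-save fields, and the discard-to-end-of-line field, and block should come out as a 2-by-3 array holding columns 3 through 5 of each line.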
sprintf('C%d_%d',P,Q) constructs strings like C7_15, intended to symbolize columns 7 through 15.
cols.( ) with the string above inside the parentheses is dynamic field name referencing of a structure. So the assignment would be to (e.g.)
cols.C7_15
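A minimal sketch of the dynamic field name step on its own, using placeholder data in place of the values read from the file:

```matlab
P = 7; Q = 15;
name = sprintf('C%d_%d', P, Q);   % builds the char vector 'C7_15'

cols = struct();
cols.(name) = magic(3);           % placeholder data; same as cols.C7_15 = magic(3)

hasField = isfield(cols, 'C7_15');   % true once the field has been assigned
block = cols.(name);                 % read the field back by its dynamic name
```

The field name must be a valid MATLAB identifier, which is why the string starts with a letter and uses an underscore between the two column numbers.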