Import text files with character and numeric data

Hello, I have the following text file (attached). I want to import it into MATLAB, and I need only the numeric data; the text is not required. I tried this using the import function in MATLAB. The problem is that the number of columns is not known in advance and keeps changing, so the generated code stops working when the number of columns changes. How can I import the data with any number of columns and rows? Moreover, the attached file is a smaller version; the original data file has over 3 million rows. How can I import a text file of this type as fast as possible?
Thank you.

 Accepted Answer

s=importdata('file.txt')
data=s.data
text=s.textdata
colheaders=s.colheaders

9 Comments

Karthik on 16 Jul 2015
Edited: Karthik on 16 Jul 2015
Thanks for the response. How can I extract the numbers associated with the result "text"?
I doubt that it can work this way. If you need to extract the array of numbers only, you can do it this way:
fId = fopen( 'Raw.txt', 'r' ) ;
data = textscan( fId, '%f %f %f', 'HeaderLines', 22 ) ;
fclose( fId ) ;
Then if you prefer to deal with a numeric array instead of a cell array of columns:
data = horzcat( data{:} ) ;
Now if you also need the numbers associated with the parameters from the header, one way to do it is to use a regular expression:
% - Similar to what we did above, but we get the file content in
% a string buffer.
content = fileread( 'Raw.txt' ) ;
data = textscan( content, '%f %f %f', 'HeaderLines', 22 ) ;
data = horzcat( data{:} ) ;
% - Now we process the buffer with REGEXP.
tokens = regexp( content, '(\w+)=(\S+)', 'tokens' ) ;
for tId = 1 : numel( tokens )
    parameters.(tokens{tId}{1}) = str2double( tokens{tId}{2} ) ;
end
With that you get:
>> data
data =
1.0e+04 *
0.0000 -0.8247 -0.9921
0.0000 -0.7204 -1.0678
0.0000 -0.8800 -1.2426
0.0000 -0.7581 -1.0489
0.0000 -0.7281 -1.1200
0.0001 -0.6932 -1.0733
0.0001 -0.6615 -0.9821
0.0001 -0.7036 -1.0141
0.0001 -0.6607 -1.1401
0.0001 -0.5457 -0.9972
0.0001 -0.6714 -0.9440
0.0001 -0.9144 -1.0676
>> parameters
parameters =
normal: 6.1000
dow: 1
Num: 209
ionconc: 1
Desnoise: 100
Time: 0.0080
hotmol: 0
dex: 1
elay: 11250
Des: 16
Max: 1500
Offset: 0
Mode: 1
Note that you can use IMPORTDATA, but you have to specify the delimiter (a tab in your case) and the number of header lines:
content = importdata( 'Raw.txt', '\t', 22 ) ;
>> content
content =
data: [12x3 double]
textdata: {22x3 cell}
colheaders: {'X' 'Wide' 'Resolution'}
Hope it helps!
Thanks for the response. The problem is that the number of header lines is not fixed; it keeps changing. How can I automate this?
Can you provide a few files with different headers?
If you always had the 'Resolution' column header though, you could do something like:
% - Read file content.
content = fileread( 'Raw.txt' ) ;
% - Split on 'Resolution' column header.
content = strsplit( content, 'Resolution' ) ;
% - Parse array.
data = textscan( content{2}, '%f %f %f' ) ;
data = horzcat( data{:} ) ;
% - Parse parameters.
tokens = regexp( content{1}, '(\w+)=(\S+)', 'tokens' ) ;
for tId = 1 : numel( tokens )
    parameters.(tokens{tId}{1}) = str2double( tokens{tId}{2} ) ;
end
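If neither the number of header lines nor the number of columns is known in advance, one possible approach is to look for the first line whose first token is a number, count the values on that line, and build the TEXTSCAN format from that count. This is only a sketch under assumptions: the sample content is made up, the names txtLines, firstRow, nCols, and fmt are illustrative, and it assumes that no header line starts with a number.

```matlab
% Hypothetical file content held in memory; a real script would use FILEREAD.
content = sprintf( 'Num=209\nTime=0.008\nX\tWide\tResolution\n1\t-8247\t-9921\n2\t-7204\t-10678\n' ) ;
% Split the buffer into lines.
txtLines = regexp( content, '\r?\n', 'split' ) ;
% A line is treated as data when its first token converts to a number
% (assumption: header lines never start with a number).
firstTokens = strtok( txtLines ) ;
firstRow = find( ~isnan( str2double( firstTokens ) ), 1 ) ;
% Count the values on the first data line and build the format from it.
nCols = numel( sscanf( txtLines{firstRow}, '%f' ) ) ;
fmt = repmat( '%f', 1, nCols ) ;
% Parse only the data lines; 'CollectOutput' returns a single numeric array.
block = strjoin( txtLines(firstRow:end), '\n' ) ;
data = textscan( block, fmt, 'CollectOutput', true ) ;
data = data{1} ;
```

With 'CollectOutput' there is no separate HORZCAT step, which may help a little on files with millions of rows.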
Karthik on 17 Jul 2015
Edited: Karthik on 17 Jul 2015
Hi, I cannot predict the number of lines in the file before the data starts. I attached a couple of files. I don't understand why the code mentioned by Azzi is not working for smaller files. The same code works well for files that have more than 30 data points. Thanks.
Ok, the code in my comment above (with the split) should work. I almost never use IMPORTDATA to be honest, because I don't know what it does internally (see note *) and I never know whether it will work later if my format evolves a little. So I always develop parsers specifically for what I need to do, and I implement some flexibility if/when needed.
Note *: you can see how IMPORTDATA was implemented by typing
open importdata
in the command window. You can reverse engineer this version to understand it a bit better, but it is difficult to know how it will evolve in the future.
Karthik on 17 Jul 2015
Edited: Karthik on 17 Jul 2015
Thanks, I got it. How can I use the numbers in parameters as input in the next line of my program?
The file number? You can build a string using SPRINTF, for example
for fileId = 1 : 10
    filename = sprintf( 'Raw%d.txt', fileId ) ;
    content = fileread( filename ) ;
    ...
end
But you can also use DIR to get e.g. all text files, whatever their name:
D = dir( '*.txt' ) ;
for fileId = 1 : length( D )
    filename = D(fileId).name ;
    content = fileread( filename ) ;
    ...
end
This would catch Raw.txt for example, which has no number.
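To keep the array from each file, the DIR loop can store one result per file in a cell array. The sketch below is only an illustration: it first writes two small made-up files into a temporary folder so that it runs on its own, and the names folder and results are hypothetical. It also assumes that every file has three columns and a 'Resolution' column header.

```matlab
% Create a scratch folder with two small hypothetical files.
folder = tempname ;
mkdir( folder ) ;
for k = 1 : 2
    fId = fopen( fullfile( folder, sprintf( 'Raw%d.txt', k ) ), 'w' ) ;
    fprintf( fId, 'Num=%d\nX\tWide\tResolution\n1\t2\t3\n4\t5\t6\n', k ) ;
    fclose( fId ) ;
end
% Parse every text file in the folder and keep one array per file.
D = dir( fullfile( folder, '*.txt' ) ) ;
results = cell( numel( D ), 1 ) ;
for fileId = 1 : numel( D )
    content = fileread( fullfile( folder, D(fileId).name ) ) ;
    % Everything after the 'Resolution' column header is numeric data.
    parts = strsplit( content, 'Resolution' ) ;
    data = textscan( parts{2}, '%f %f %f' ) ;
    results{fileId} = horzcat( data{:} ) ;
end
% Remove the scratch folder and its files.
rmdir( folder, 's' ) ;
```

After the loop, results{fileId} holds the numeric array of the file D(fileId).name.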
I just re-read your comment and realized that I misunderstood. The variable parameters is a struct, a variable with fields:
>> class( parameters )
ans =
struct
Its fields can be dot-indexed. If you want to address/index the field elay for example, you do it this way:
>> parameters.elay
ans =
11250
This is a numeric field of type/class double:
>> class( parameters.elay )
ans =
double
so you can compute with it:
>> parameters.elay / 10
ans =
1125
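One thing to watch for: STR2DOUBLE returns NaN when a value is not a number, so it can be worth checking a field before computing with it. A minimal sketch, assuming a made-up parameters struct (the NaN in Mode and the names badFields and nSamples are only for illustration):

```matlab
% Hypothetical parameters, as they could come out of the REGEXP loop.
parameters = struct( 'elay', 11250, 'Time', 0.008, 'Mode', NaN ) ;
% List the fields whose value could not be converted to a number.
names = fieldnames( parameters ) ;
isBad = cellfun( @(n) isnan( parameters.(n) ), names ) ;
badFields = names(isBad) ;
% Use a field only if it exists and was parsed properly.
if isfield( parameters, 'elay' ) && ~isnan( parameters.elay )
    nSamples = parameters.elay / 10 ;
else
    nSamples = 0 ;
end
```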

Asked:

on 16 Jul 2015

Edited:

on 17 Jul 2015
