Load multiple text files one after another

Hello,
I have 30 text files named "Data1.x_pre" and "Data1.x_post", with x numbered from 1 to 30. I want to load them and run each one through the same MATLAB script. Currently, I load all 30 text files separately, which is quite tedious. What I basically want to do is load the first text file ("Data1.1_pre"), run it through the script, and collect the output in a new matrix "Alldata". Then I'd like to close the first text file, continue with the second file ("Data1.1_post"), and insert its output into "Alldata". After that: close "Data1.1_post", load the next text file ("Data1.2_pre"), and so on.
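This is the section of my script that loads and reads these text files:
textFilename = 'Data1.1pre.txt';
fid = fopen(textFilename);
block = 1;
newLine = fgets(fid);    % skip the header line
while ~feof(fid)
    % read one block of four numeric columns
    EEGData{1,q}(1,block) = textscan(fid, '%f %f %f %f', 'CollectOutput', true);
    fgets(fid);          % skip the non-numeric line at which textscan stopped
    block = block + 1;
end
fclose(fid);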
Is there a way to solve these two problems (reading all the text files and collecting all the outputs into one common matrix) with only small changes to my current code?
Thank you in advance!

 Accepted Answer

per isakson on 18 Sep 2014
Edited: per isakson on 24 Sep 2014
After 17 comments it's time I added some working code. I still don't understand the intended structure of the cell array, EEGData. The following code is my solution to the OP's underlying problem as I understand it. My steps:
  • Downloaded Data1.01_pre.txt and created another five files by copying and renaming it.
  • Moved the six files to a separate folder, my_eeg_data. I think it helps to store experimental data in dedicated folders. However, the function, cssm, doesn't depend on a separate data folder.
  • Created the function, cssm (attached). I find it easier to develop and use functions than scripts. I understand that the OP wishes to use the file name as the key to reference the data. (The names of the files are not valid Matlab variable names.)
  • Demonstrated the function, cssm.
Demo:
>> eeg_data_pre = cssm( 'h:\m\cssm\my_eeg_data', 'Data*pre*.txt' )
>> eeg_data_post = cssm( 'h:\m\cssm\my_eeg_data', 'Data*post*.txt' )
>> eeg_data = cat( 1, eeg_data_pre, eeg_data_post );
>> keys(eeg_data)
ans =
Columns 1 through 3
'Data1.01_post.txt' 'Data1.01_pre.txt' 'Data1.02_post.txt'
Columns 4 through 6
'Data1.02_pre.txt' 'Data1.03_post.txt' 'Data1.03_pre.txt'
>> num = eeg_data( 'Data1.02_post.txt' );
>> whos num
Name Size Bytes Class Attributes
num 12x20 1920 double
eeg_data is an instance of containers.Map.
cssm (I use body text to comment the code.)
The function comprises two parts: creation of a "list" of file names, and a loop over all file names to read the data.
function eeg_data = cssm( folder, glob )
dir is a robust way to retrieve the names of the files. It avoids the problem of reconstructing the names.
sad = dir( fullfile( folder, glob ) );
Possibly the "list" requires some manipulation. Here I sort it with respect to the number between the dot and the underscore. However, with leading zeros, as in this case, the sort is not needed. And with the solution in this function it is not needed anyhow.
cac = regexp( {sad.name}, '(?<=\.)\d{1,2}(?=_)', 'match' );
num = str2double( [cac{:}] );
[~,ixs] = sort( num );
sad = sad( ixs );
len = length( sad );
The number of columns could have been retrieved from the text file.
nCol = 20;
Here I use containers.Map because it makes it possible to use the names of the files as key values. Had the names of the files been valid Matlab names I would have used a struct.
eeg_data = containers.Map( 'KeyType', 'char', 'ValueType', 'any' );
for jj = 1 : len
filespec = fullfile( folder, sad(jj).name );
[fid,msg] = fopen( filespec );
This test might not be justified, since the names are created by dir.
assert( not(fid==-1) ...
, 'MY:eeg_data:CannotOpenFile'...
, 'Failed to open "%s": "%s"' ...
, filespec, msg )
I read the data into a temporary variable. I use repmat to create the format string; it saves me from counting to twenty. The defaults take care of the list delimiter. I close the file before storing the data so that no file handles are left open.
buf = textscan( fid, repmat('%f', [1,nCol] )...
, 'CollectOutput' , true ...
, 'Headerlines' , 1 );
fclose( fid );
eeg_data( sad(jj).name ) = buf{:};
end
end
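A small sketch of why the sort step can matter, assuming hypothetical file names without leading zeros: a plain text sort compares character by character and puts 10 before 1 and 2, whereas sorting on the number extracted with the same regexp as in cssm gives numeric order.

```matlab
% Hypothetical file names without leading zeros
names = {'Data1.10_pre.txt', 'Data1.2_pre.txt', 'Data1.1_pre.txt'};

% Plain text sort: '0' sorts before '_', so 10 ends up first
lexical = sort(names);        % {'Data1.10_pre.txt', 'Data1.1_pre.txt', 'Data1.2_pre.txt'}

% Extract the number between the dot and the underscore, as cssm does, and sort on it
cac = regexp(names, '(?<=\.)\d{1,2}(?=_)', 'match');
num = str2double([cac{:}]);   % [10 2 1]
[~, ixs] = sort(num);
numeric = names(ixs);         % {'Data1.1_pre.txt', 'Data1.2_pre.txt', 'Data1.10_pre.txt'}
```

With zero-padded names such as Data1.01_pre.txt both orders coincide, which is why the step is optional here.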
For me this was a worthwhile exercise with containers.Map. This is the first time I have concatenated maps. Is that documented, or does it go without saying?
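A minimal sketch of concatenating two containers.Map objects, with hypothetical keys and arbitrary matrices as values. As far as I know, vertical concatenation with square brackets, [M1; M2], is the supported form; the keys of the result are the union of both maps, and keys() returns char keys in sorted order, as in the demo output above.

```matlab
% Two maps keyed by hypothetical file names; the values are arbitrary matrices
pre  = containers.Map({'Data1.01_pre.txt', 'Data1.02_pre.txt'}, {magic(3), eye(3)});
post = containers.Map({'Data1.01_post.txt'}, {ones(3)});

% Vertical concatenation merges the two maps into one
eeg_data = [pre; post];

k = keys(eeg_data);   % char keys come back sorted
```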

More Answers (2)

Pinga on 19 Sep 2014
Thank you for the link. Unfortunately, this was not quite the solution I was looking for, since the script contains "data-relative" functions.
Is there no function at all that I could put at the end of the script telling MATLAB to clear the workspace and start again from the beginning with the next .txt file? I don't want to copy the script 30 times below the existing one.
Any help would be appreciated.

6 Comments

per isakson on 20 Sep 2014
Edited: per isakson on 20 Sep 2014
  • Your "answer" should have been a comment to my answer, not a new answer.
  • Did you read the FAQ entry, How can I process a sequence of files?. Why not use that approach?
  • "data-relative" functions: what does that refer to?
  • I don't see the rationale behind the cell array EEGData. Why not use something simpler?
  • What is q in EEGData{1,q}(1,block)?
  • The while-loop looks unnecessarily complicated to me.
  • "function ... telling MATLAB to clear the workspace and to start again": in my words that means "clear, goto". AFAIK, there is no such function.
  • Sorry about that, I should have.
  • Yes, I have, but I haven't got far adapting it to my case:
for k = 01:30
% Create a text file name, and read the file.
textFileName = ['Data1.1pre.txt' num2str(k) '.txt'];
if exist(textFileName, 'Data1.1pre.txt')
fid = fopen(textFileName, 'rt');
textData = fread(fid);
fclose(fid);
else
fprintf('File %s does not exist.\n', textFileName);
end
end
I get an error ("Error using exist: The optional second input to exist must be 'var', 'builtin', 'class', 'dir' or 'file'."). Additionally, what is not clear to me is where and how I have to place "k". Is it meant to be placed this way?
textFileName = ['Data1.kpre.txt' num2str(k) '.txt'];
  • In the script, I am searching for specific values in a variable and let MATLAB build a matrix of their corresponding row numbers. With the help of this "row number" cell, further calculations are made.
  • I have 100 separate EEG recordings, each lasting 10 seconds. I don't want to mix them but keep them separate. That is why I want to put them in different blocks {1,1}{1,1:100}.
  • I started using/learning MATLAB this summer and am still struggling with it, so my scripts are probably not very efficient. How would you make this while-loop less complicated?
  • That's too bad. But if the approach you gave me the link to would work in my case, I'm happy to use it.
Thank you for any help!
The error message is about
exist( textFileName, 'Data1.1pre.txt' )
There is a problem with the syntax of the function, exist. I think it pays off to read the documentation carefully. If you find it difficult to understand, run the examples; there are many in the Matlab documentation.
Doc says that the function, exist, returns a whole number.
Doc says: if expression, statements, end evaluates an expression, and executes a group of statements when the expression is true.
Matlab allows all sorts of obscure constructs, e.g.
>> if 't', disp('true'), else, disp('false'), end
true
>> if 'false', disp('true'), else, disp('false'), end
true
>> if 17, disp('true'), else, disp('false'), end
true
IMO: Do not start using constructs like these and you will make fewer mistakes in the future.
If your data files are in the Matlab search path, the preferred syntax is
if exist( textFileName, 'file' ) == 2
if they are not, you must provide the full filespec
if exist( fullfile( folder_name, textFileName ), 'file' ) == 2
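A sketch of these checks, using a scratch file in tempdir with a hypothetical name. It also shows why comparing with 2 matters: exist(..., 'file') returns 7 for a folder, so a bare "if exist(...)" would accept a folder as well.

```matlab
folder_name  = tempdir;                          % scratch folder for the demo
textFileName = 'Data1.01_pre.txt';              % hypothetical data file name
filespec = fullfile(folder_name, textFileName);
fid = fopen(filespec, 'w');                      % create an empty file to find
fclose(fid);

isFile    = exist(filespec, 'file') == 2;            % true: a file exists at that path
isMissing = exist([tempname '.txt'], 'file') == 0;   % true: nothing exists at that path
folderVal = exist(pwd, 'file');                      % 7: the current folder is a folder

delete(filespec);                                % clean up the scratch file
```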
per isakson on 21 Sep 2014
Edited: per isakson on 21 Sep 2014
  • Do you have the data files in a special folder?
Thank you for your help. The exist error message has disappeared now. And no, the data files aren't in a special folder. The script and the data files are all in the same folder.
I've added a working function to my answer.


To format your filenames:
for filenumber = 1:30
for postpre = {'post', 'pre'}
filename = sprintf('Data1.%d_%s.txt', filenumber, postpre{1});
if exist(filename, 'file')
%open file and process
else
%report error
end
end
end
If the for postpre = ... loop looks too weird for you, you can replace it with
postpre = {'post', 'pre'};
for postpreindex = 1:2
filename = sprintf('Data1.%d_%s.txt', filenumber, postpre{postpreindex});

14 Comments

IMO: sprintf makes more readable code compared to concatenation. I make fewer mistakes when using sprintf.
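To illustrate with the file names from this thread (the loop index k is just an example value): the concatenation in the earlier attempt appends the number after an already complete file name, a mistake that is easy to miss, whereas sprintf keeps the whole name readable as one template.

```matlab
k = 1;   % example loop index
% Concatenation as in the earlier attempt: the number lands after a complete name
concatenated = ['Data1.1pre.txt' num2str(k) '.txt'];   % 'Data1.1pre.txt1.txt'
% sprintf: one template with placeholders, easier to read and check
formatted = sprintf('Data1.%d_%s.txt', k, 'pre');      % 'Data1.1_pre.txt'
```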
Thank you. To read in all these .txt-files into a cell array as before, I used the second part of my previous script:
for filenumber = 1:30
for postpre = {'post', 'pre'}
filename = sprintf('Data1.%d_%s.txt', filenumber, postpre{1});
if exist(filename, 'file') %open file and process
else %report error
end
end
end
for q = 1
fid = fopen(filename);
block = 1
newLine = fgets(fid);
while ~feof(fid);
EEGData{1,q}(1,block) = textscan(fid, '%f %f %f %f', 'CollectOutput', true);
fgets(fid);
block = block + 1;
end
end
I now get an error message for fgets:
newLine = fgets(fid);
which says "Invalid file identifier. Use fopen to generate a valid file identifier." As mentioned above, I have all the text files and the script in the same working directory.
fopen does not issue a warning or error when it fails; it returns -1, which later causes the error you see.
An experiment
>> fid = fopen('abcd.xyz');
>> fid
fid =
-1
where abcd.xyz does not exist.
I guess there is some problem with the value of filename and that the value of fid is -1.
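A sketch of the same experiment with the second output of fopen, which carries the reason for the failure. The file name is a hypothetical path built with tempname, so it does not exist.

```matlab
missingFile = [tempname '.txt'];      % a path that does not exist
[fid, msg] = fopen(missingFile);      % fid is -1 on failure; msg says why
if fid == -1
    fprintf('Could not open "%s": %s\n', missingFile, msg);
else
    fclose(fid);
end
```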
Thank you, yes, this is indeed the case. I've been debugging for quite a while now and haven't been able to identify the problem. I used
[fid,error] = fopen(filename)
and it returned "No such file or directory", which, IMO, is weird, since I assigned a file to filename in line 3.
How did you do that? Evidently the file just plain does not exist. Are you sure you used fullfile() and prepended the folder?
filename = fullfile(folder, baseFileName);
Check with exist
if ~exist(filename, 'file')
    errorMessage = sprintf('This file does not exist:\n%s', filename);
    uiwait(warndlg(errorMessage));
end
Professional programming requires that one asserts that fid>=3 or something.
filespec = fullfile( folder_spec, filename );
[fid,msg] = fopen( filespec );
assert( fid >= 3, ...., msg )
I usually approach the problem something like
sad = dir( fullfile( folder_spec, 'Data*.txt' ) ) ;
for jj = 1 : length( sad )
filespec = fullfile( folder_spec, sad(jj).name );
fid = fopen( filespec );
etc.
end
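A fuller, runnable version of that loop, under stated assumptions: folder_spec is whatever folder holds the data files (here a scratch folder filled with two tiny hypothetical two-column files), and sad is simply the struct array that dir returns, one element per matching file, with the file name in sad(jj).name. The real files have 20 columns, so the format string would need to change.

```matlab
% folder_spec: the folder holding the data files; here a scratch folder
folder_spec = tempname;
mkdir(folder_spec);
for k = 1:2   % two small hypothetical files so the sketch runs on its own
    fid = fopen(fullfile(folder_spec, sprintf('Data1.%02d_pre.txt', k)), 'w');
    fprintf(fid, 'h1 h2\n%d 1\n%d 2\n', k, k);
    fclose(fid);
end

% sad: the struct array returned by dir, one element per matching file
sad = dir(fullfile(folder_spec, 'Data*.txt'));
results = cell(1, numel(sad));
for jj = 1:numel(sad)
    filespec = fullfile(folder_spec, sad(jj).name);
    [fid, msg] = fopen(filespec);
    assert(fid >= 3, 'Failed to open "%s": %s', filespec, msg);
    buf = textscan(fid, '%f %f', 'HeaderLines', 1, 'CollectOutput', true);
    fclose(fid);
    results{jj} = buf{1};   % one 2x2 matrix per file
end

% Remove the scratch files and folder again
delete(fullfile(folder_spec, 'Data*.txt'));
rmdir(folder_spec);
```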
Thank you both for your replies; I really appreciate it! Unfortunately, I'm not quite sure how to apply your recommendations (sorry, I'm a MATLAB beginner and slightly overwhelmed...). Should I insert the exist code after the first for-loop? What do I need to replace "folder_spec" and "sad" with, and what do they stand for?
Maybe the attached code is useful. Attaching one of the 30 .txt files isn't possible since the file is too big. Basically, each .txt file consists of 20 columns and 25,000 rows.
I assume that the first three lines of the text files look something like
colh1, colh2, colh3, colh4
11,12,13,14
21,22,23,24
Now I see that there are 20 columns. Proposal: post the first 12 lines of one text file.
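One way to avoid hard-coding the number of columns (nCol = 20 in cssm) is to read the first data line and count its fields. This sketch assumes a one-line header and comma-separated values as in the sample above; the scratch file and its contents are made up.

```matlab
% Hypothetical scratch file: one header line, then comma-separated numbers
filespec = [tempname '.txt'];
fid = fopen(filespec, 'w');
fprintf(fid, 'colh1, colh2, colh3, colh4\n11,12,13,14\n21,22,23,24\n');
fclose(fid);

fid = fopen(filespec, 'r');
assert(fid >= 3, 'Could not open "%s"', filespec);
fgetl(fid);                                               % skip the header line
firstData = fgetl(fid);                                   % first numeric row as text
nCol = numel(sscanf(strrep(firstData, ',', ' '), '%f'));  % count its fields

% Rewind and read the whole file with a format built from nCol
frewind(fid);
buf = textscan(fid, repmat('%f', 1, nCol), 'Delimiter', ',', ...
    'HeaderLines', 1, 'CollectOutput', true);
fclose(fid);
delete(filespec);
data = buf{1};
```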
Pinga, I'm not sure why you accepted an answer that doesn't seem to answer your question.
Anyway, take time to understand the code I gave in my answer; don't just paste it. The two comments:
%open file and process
%report error
were intended for you to replace with whatever you intended to do.
As it is, your code does nothing useful. Your copy of my loop generates file names, tests whether they exist, and then does nothing but loop back. You then start your loop (which has only one iteration) and try to open the last file generated by my loop. That file may not exist, since you didn't do anything with the %report error line.
Here's what you should have done:
for filenumber = 1:30
for postpre = {'post', 'pre'}
filename = sprintf('P4.%d_%s.txt', filenumber, postpre{1});
if exist(filename, 'file')
%open file and process
fid = fopen(filename);
block = 1
newLine = fgets(fid); %did you mean fgetl(fid) ?
while ~feof(fid)
EEGData{filename}(block) = textscan(fid, '%f %f %f %f', 'CollectOutput', true);
fgets(fid);
block = block + 1;
end
fclose(fid);
else
%report error
fprintf('File %s does not exist.\n', filename);
end
end
end
Thank you once again! Attached are the first 12 rows of the .txt file "Data1.01_pre". As mentioned above, there are 30 .txt files in total: one "pre" and one "post" file for each of the 15 measured subjects. Each .txt file consists of 70 measurements, each lasting 10 seconds. So the first column of each .txt file shows the time point (starting at 0 s and ending at 10 s), and this is repeated 70 times. EEGData{1,1} should therefore contain a 15x70 cell.
Guillaume, your code doesn't generate a fid=-1 anymore but gives another error message: "Scalar cell array indices required in this assignment." for the line
EEGData{filename}(block) = textscan(fid, '%f %f %f %f', 'CollectOutput', true);
In addition, for the single-digit .txt files (e.g. "Data1.01_pre"), I put a leading zero in the name (01, 02, 03, ...). MATLAB tells me that "Data1.01_pre" to "Data1.09_pre" don't exist (the same with the post files).
As I want to keep the format EEGData{1,1}{1,1:70} and store the "pre" and "post" data in two different cells (EEGData{1,1} and EEGData{1,2}), I have made some small changes (see attached file).
The scalar cell array error has disappeared, but instead it shows another error message: "The right hand side of this assignment has too few values to satisfy the left hand side." for the line
EEGData{1,1}{filename,block} = textscan(fid, '%f %f %f %f', 'CollectOutput', true); % there are actually 20 '%f, I shortened this part
As I said, take time to understand what the code does. If necessary, use the debugger to step through it and look at the states of the variables.
The EEG line does not work because I made a mistake: filename was meant to be filenumber. It still wouldn't have worked properly, though, because I'd forgotten about the postpre loop. Use a counter instead, as you've done with block, to index into your EEG cell array, or use a 2-d array based on filenumber and a pre/post counter, e.g.:
postpre = {'pre_all', 'post_all'};
for filenumber = 01:30
for postpreindex = 1:2
filename = sprintf('Ddata1.%d%s.txt', filenumber, postpre{postpreindex});
if exist(filename, 'file') %open file and process
fid = fopen(filename);
block = 1
newLine = fgets(fid); %did you mean fgetl(fid) ?
while ~feof(fid)
EEGData{filenumber,postpreindex}{block} = textscan(fid, '%f %f %f %f', 'CollectOutput', true);
fgets(fid);
block = block + 1;
end
fclose(fid);
else
%report error
fprintf('File %s does not exist.\n', filename);
end
end
end
Having a loop of the form:
for postpre = {'pre_all'}
is pointless. You're just iterating over one element. It's the same as writing:
postpre = 'pre_all'
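A self-contained sketch of the block-reading pattern used throughout this thread: textscan stops at the first line it cannot convert, fgetl then skips that line, and the next textscan call resumes with the next block. The scratch file, its two-column layout, and the END separator line are assumptions for illustration; the real files have 20 columns.

```matlab
% Hypothetical scratch file: a header line, then two numeric blocks
% separated by a non-numeric line (END)
filespec = [tempname '.txt'];
fid = fopen(filespec, 'w');
fprintf(fid, 'time ch1\n0 1.5\n5 2.5\nEND\n0 3.5\n5 4.5\n');
fclose(fid);

fid = fopen(filespec, 'r');
assert(fid >= 3, 'Could not open "%s"', filespec);
fgetl(fid);                                   % skip the header line
blocks = {};
while ~feof(fid)
    % textscan reads rows until it meets a line it cannot convert
    c = textscan(fid, '%f %f', 'CollectOutput', true);
    blocks{end+1} = c{1};                     %#ok<AGROW> one matrix per block
    fgetl(fid);                               % skip the separator line
end
fclose(fid);
delete(filespec);
```

A cell array that grows by one matrix per block avoids having to know the number of blocks (70 per file here) in advance.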
Regarding "which says 'Invalid file identifier'": I guess the problem is the format string in
filename = sprintf('Data1.%d_%s.txt', filenumber, postpre{1});
The name of the file, which you uploaded is
Data1.01_pre.txt
To get the leading zero, i.e. .01_ rather than .1_, the format string should be
'Data1.%02d_%s.txt'
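A quick check of that format string, generating all 30 names for the layout described in this thread (15 subjects, one pre and one post file each). The subject count and the cell layout (rows for subjects, columns for pre/post) are assumptions taken from that description, not from the actual folder.

```matlab
nSubjects  = 15;                              % assumption: 15 subjects
conditions = {'pre', 'post'};
names = cell(nSubjects, numel(conditions));   % rows: subjects, columns: pre/post
for filenumber = 1:nSubjects
    for ci = 1:numel(conditions)
        % %02d pads single-digit numbers with a leading zero: 1 -> 01
        names{filenumber, ci} = sprintf('Data1.%02d_%s.txt', filenumber, conditions{ci});
    end
end
```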
Thank you both for your help; you've been very patient and helpful! There were still some issues, and I've spent the last two days debugging to get rid of them, but it seems that this part of the script works now.
Thank you again and have a great day!


Asked: 18 Sep 2014
Commented: 26 Sep 2014
