How to read data from a text file based on the title?

I have two text files. The first is "Sample01.txt", which looks like this:
The second is "Sample02.txt", which looks like this:
How can I read the data for T[K] and NC7H16_molef[-], and likewise for O2 and CO, from both text files by reading their titles (headers)?
I am trying to merge the corresponding data from the two text files into one matrix, so there will be three matrices: one each for NC7H16, O2 and CO.
For example, for NC7H16 the desired matrix looks like this:
N.B. The number of rows for NC7H16_molef[-], O2 and CO is not fixed.

4 Comments

Could you attach the 2 sample text files?
Please, find the text files as attached.
The "Sample01.txt" in your image looks like an [n x 2] matrix but the one you attached is [m x 4]. Could you clarify which one is correct? Also, your "Sample02.txt" attachment is png, not txt, so it's not readable.
Sorry, my bad. I reattached them.

 Accepted Answer

Instead of selectively reading a chosen subset of your text files (which I'm not even sure is possible), you can read in the entire text file and then pull out the columns of data you're interested in working with.
Your Sample02.txt file is easy to read using readtable(). The Sample01.txt file, however, contains multiple sub-tables which made it more difficult to work with. I used textscan() to read in the entire file and then split up the sub-tables.
Once both files are read and cleaned up, you can split and combine the data very easily to create new matrices. My example produces the matrix you described in your question. Note that because the data from one file has fewer rows than the other file, zeros were added as padding at the tail end of the shorter data.
A second version of this code uses header names and is in the comments below.
% full path to files (I prefer working with full paths but you could just use filenames)
s1 = fullfile('C:\Users\janedoe\Documents\MATLAB\', 'sample01.txt');
s2 = fullfile('C:\Users\janedoe\Documents\MATLAB\', 'sample02.txt');
% Read sample02 (this one's simple)
s2Table = readtable(s2);
% Read sample01; it will be read in as a cell array of strings
fid = fopen(s1);
s1str = textscan(fid, '%s %s', 'HeaderLines', 1);
fclose(fid);
s1str = [s1str{:}];
% Now separate each sub-table into its own table
key = 'T[K]'; %the start of each row that identifies a new sub-table
headerIdx = contains(s1str(:,1), key); %logical index identifying header rows
s1cell = splitapply(@(x){x}, s1str, cumsum(headerIdx));
s1table = cellfun(@(x)array2table(str2double(x(2:end, :))), s1cell, 'UniformOutput', false);
% before combining data, tables need to have the same number of rows.
% Here we pad the shorter table with trailing zeros.
padRowsNeeded = size(s2Table,1) - cellfun(@(x)size(x,1), s1table);
s1Table = cellfun(@(x,y)[x;array2table(zeros(y,size(x,2)))], s1table, num2cell(padRowsNeeded),'UniformOutput', false);
% Now you can join columns to create a new array
% Here is an example
newMatrix = [s1Table{1}.Var1, s1Table{1}.Var2, s2Table.Var1, s2Table.Var2];
% Convert back to a table if you want to
% Note that the variable names can be pulled directly from the source files but will require some cleaning
% since some of them are invalid variable names.
newTable = array2table(newMatrix, 'VariableNames', {'TK_samp01', 'NC7H16_samp1', 'TK_samp2', 'NC7H16_samp2'})
And the result
newTable =
21×4 table
TK_samp01 NC7H16_samp1 TK_samp2 NC7H16_samp2
_________ ____________ ________ ____________
582 0.000621 576 0.00067913
600 0.000437 598 0.00044327
619 0.000294 619 0.00040821
639 0.000245 635 0.00042462
658 0.000247 651 0.00046359
680 0.000289 669 0.00052614
699 0.000322 690 0.00061422
720 0.000266 711 0.0007039
739 0.000301 729 0.0007637
768 0.000373 751 0.00078045
797 0.000174 783 0.00054059
830 3.54e-05 821 0.0001833
860 6.3e-06 860 7.9079e-05
880 3.17e-07 894 4.7073e-05
909 8.94e-08 931 3.1658e-05
0 0 969 2.3408e-05
0 0 1008 1.739e-05
0 0 1036 1.3428e-05
0 0 1074 8.4148e-06
0 0 1112 4.605e-06
0 0 1150 2.3107e-06

9 Comments

Thank you so much. It works like a charm! :)
Thanks for your effort and time. :)
And instead of using Var1, Var2, Var3, ... is it possible to use a more versatile approach?
Because the number of columns in sample02.txt can vary.
As I mentioned in a comment toward the end of my solution code, your header variable names contain characters such as "[ ] - #" that conflict with Matlab variable names. So those headers need to be cleaned.
Below is a 2nd version of my solution. This one uses header names that are accepted by Matlab. There were lots of changes.
Problematic characters are listed in 'badChars'; you can add more to that list if you have future problems that aren't covered here.
The erase() function removes those problematic characters from your headers so "#T[K]" becomes "TK". The header "NC7H16_molef[-]" becomes "NC7H16_molef", etc.
% full path to files (I prefer working with full paths but you could just use filenames)
s1 = fullfile('C:\Users\janedoe\Documents\MATLAB\', 'sample01.txt');
s2 = fullfile('C:\Users\janedoe\Documents\MATLAB\', 'sample02.txt');
% Read sample02 (this one's simple)
s2Table = readtable(s2);
% Now get header names; note that this cannot be done in readtable() due to irregular characters.
fid2 = fopen(s2);
s2Headers = strsplit(fgetl(fid2));
s2Headers(end) = [];
fclose(fid2);
% Clean header strings to remove invalid variable name characters
badChars = {'[', ']', '-', '#'}; % list all problematic characters in your headers here
s2Headers = erase(s2Headers, badChars); % Remove problematic chars; req. matlab 2016b
s2Table.Properties.VariableNames = s2Headers; % add clean headers to your table
% Read sample01; it will be read in as a cell array of strings
fid = fopen(s1);
s1str = textscan(fid, '%s %s', 'HeaderLines', 1);
fclose(fid);
s1str = [s1str{:}];
% Now separate each sub-table into its own table
key = 'T[K]'; %the start of each row that identifies a new sub-table
headerIdx = contains(s1str(:,1), key); %logical index identifying header rows
s1cell = splitapply(@(x){x}, s1str, cumsum(headerIdx));
s1cellClean = cellfun(@(x)erase(x, badChars), s1cell, 'UniformOutput', false); % Clean headers (req. matlab 2016b)
s1table = cellfun(@(x)array2table(str2double(x(2:end, :)), 'VariableNames', x(1,:)), s1cellClean, 'UniformOutput', false);
% before combining data, tables need to have the same number of rows. Here we
% pad the shorter table with trailing zeros.
padRowsNeeded = size(s2Table,1) - cellfun(@(x)size(x,1), s1table);
s1Table = cellfun(@(x,y)[x;array2table(zeros(y,size(x,2)), 'VariableNames', x.Properties.VariableNames)], ...
s1table, num2cell(padRowsNeeded),'UniformOutput', false);
% Now you can join columns to create a new table
% Here is an example
newMatrix = [s1Table{1}.TK, s1Table{1}.NC7H16_molef, s2Table.TK, s2Table.NC7H16];
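The header-cleaning step in the listing above can be tried in isolation. This is only a minimal sketch using the two header strings mentioned in this thread. The second half uses matlab.lang.makeValidName as a possible alternative; that is my own suggestion, not something the listing relies on, and the names it generates will differ from the erase() results.

```matlab
% Headers as they appear in the question's files
rawHeaders = {'#T[K]', 'NC7H16_molef[-]'};

% Approach used in this answer: strip a known list of characters
badChars = {'[', ']', '-', '#'};
cleaned = erase(rawHeaders, badChars);   % requires R2016b or later

% Alternative (an assumption, not part of the listing above): let MATLAB
% turn each header into a valid identifier. Invalid characters are
% replaced rather than removed, so the names will not match 'cleaned'.
valid = matlab.lang.makeValidName(rawHeaders);

disp(cleaned)
disp(valid)
```

erase() keeps the names short and predictable but only handles the characters you list; makeValidName handles any character but gives you less control over the result.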
Again appreciate your effort.
But what I wanted to say is that in the new merged matrix I am getting only one matrix, for NC7H16, whereas there are still O2 and CO. And this is just an example with only 3 species; the real number won't be known. So is it possible to write a loop that generates the merged matrices for all the headers one by one?
It's possible, but it depends on how your data are organized. For example,
  • are there always 2 text files, sample01.txt and sample02.txt
  • if yes, are all additional species listed as a new column in sample02.txt? And are all additional species listed as a new table in sample01.txt?
  • Are the species names in sample01.txt the same as in sample02.txt except with "_molef[-]" tagged on the end?
01. Yes
02. New species are added as new columns in sample02.txt. In sample01.txt they will all be piled into two long columns.
03. The names are always the same.
Atta, you're going to love this. I added 1 line to my previous code and I added a loop at the end that stores all tables in a cell array "allTables". The one line that was added is marked on the right with an arrow "% <-----".
Here's how the loop works. It loops through the columns of sample02 starting at column 2 and identifies the matching sample01 table based on species names. It then forms a new table for each column in sample02. If a match cannot be found (or if more than one match is found), you'll get an error message.
% full path to files (I prefer working with full paths but you could just use filenames)
s1 = fullfile('C:\Users\janedoe\Documents\MATLAB\', 'sample01.txt');
s2 = fullfile('C:\Users\janedoe\Documents\MATLAB\', 'sample02.txt');
% Read sample02 (this one's simple)
s2Table = readtable(s2);
% Now get header names; note that this cannot be done in readtable() due to irregular characters.
fid2 = fopen(s2);
s2Headers = strsplit(fgetl(fid2));
s2Headers(end) = [];
fclose(fid2);
% Clean header strings to remove invalid variable name characters
badChars = {'[', ']', '-', '#'}; % list all problematic characters in your headers here
s2Headers = erase(s2Headers, badChars); % Remove problematic chars; req. matlab 2016b
s2Headers = strrep(s2Headers, 'TK', 'TK2'); % <----- replace TK with TK2 so table headers are unique
s2Table.Properties.VariableNames = s2Headers; % add clean headers to your table
% Read sample01; it will be read in as a cell array of strings
fid = fopen(s1);
s1str = textscan(fid, '%s %s', 'HeaderLines', 1);
fclose(fid);
s1str = [s1str{:}];
% Now separate each sub-table into its own table
key = 'T[K]'; %the start of each row that identifies a new sub-table
headerIdx = contains(s1str(:,1), key); %logical index identifying header rows
s1cell = splitapply(@(x){x}, s1str, cumsum(headerIdx));
s1cellClean = cellfun(@(x)erase(x, badChars), s1cell, 'UniformOutput', false); % Clean headers (req. matlab 2016b)
s1table = cellfun(@(x)array2table(str2double(x(2:end, :)), 'VariableNames', x(1,:)), s1cellClean, 'UniformOutput', false);
% before combining data, tables need to have the same number of rows. Here we
% pad the shorter table with trailing zeros.
padRowsNeeded = size(s2Table,1) - cellfun(@(x)size(x,1), s1table);
s1Table = cellfun(@(x,y)[x;array2table(zeros(y,size(x,2)), 'VariableNames', x.Properties.VariableNames)], ...
s1table, num2cell(padRowsNeeded),'UniformOutput', false);
% NEW SECTION BELOW
% Loop through columns of s2Table starting at col 2
nCols = size(s2Table,2); %number of columns
allTables = cell(nCols-1,1); %store all tables in cell array
s1Headers = cellfun(@(x)strjoin(x.Properties.VariableNames), s1table, 'UniformOutput', false);
for i = 2 : nCols
% match current column of s2Table with the correct table in s1
tblIdx = find(contains(s1Headers, s2Headers{i}));
if length(tblIdx) ~= 1
error('Tables could not be matched')
end
allTables{i-1} = [s1Table{tblIdx}(:,1), s1Table{tblIdx}(:,2), s2Table(:,1), s2Table(:,i)];
end
The cell array 'allTables' contains each table. So, allTables{3} is table 3.
K>> allTables
allTables =
3×1 cell array
{21×4 table}
{21×4 table}
{21×4 table}
K>>
K>> allTables{1}
ans =
21×4 table
TK NC7H16_molef TK2 NC7H16
___ ____________ ____ __________
582 0.000621 576 0.00067913
600 0.000437 598 0.00044327
619 0.000294 619 0.00040821
639 0.000245 635 0.00042462
658 0.000247 651 0.00046359
680 0.000289 669 0.00052614
699 0.000322 690 0.00061422
720 0.000266 711 0.0007039
739 0.000301 729 0.0007637
768 0.000373 751 0.00078045
797 0.000174 783 0.00054059
830 3.54e-05 821 0.0001833
860 6.3e-06 860 7.9079e-05
880 3.17e-07 894 4.7073e-05
909 8.94e-08 931 3.1658e-05
0 0 969 2.3408e-05
0 0 1008 1.739e-05
0 0 1036 1.3428e-05
0 0 1074 8.4148e-06
0 0 1112 4.605e-06
0 0 1150 2.3107e-06
Lastly, the for-loop makes the following assumptions that, if violated, will cause an error.
  • The TK column is always column 1 in both files.
  • The tables in sample01 will always have 2 columns and the 2nd column is the species data
  • All headers in both files are unique (no repeats within a file)
  • The species name in sample02 will always be part of a species name in column 2 of sample01.
  • All columns in sample02 after column 1 are species names (again, without repeats)
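One consequence of the substring-matching assumption is worth spelling out: contains() will also match a longer species name that happens to include the shorter one. The sketch below uses a hypothetical header list (CO2 does not appear in the sample files) and assumes the cleaned sample01 names always end in '_molef'. In the loop above, s1Headers holds joined strings such as 'TK CO_molef', so an exact comparison would need to be made against the second variable name of each table instead.

```matlab
% Hypothetical cleaned species headers, for illustration only
s1Species = {'NC7H16_molef', 'O2_molef', 'CO_molef', 'CO2_molef'};
s2Name    = 'CO';

% Substring matching, as in the loop above: 'CO' also hits 'CO2_molef',
% so more than one index comes back and the loop would raise its error
looseIdx = find(contains(s1Species, s2Name));

% Exact matching on the name with the assumed '_molef' suffix appended
exactIdx = find(strcmp(s1Species, [s2Name '_molef']));

fprintf('contains() matched %d headers, strcmp() matched %d\n', ...
    numel(looseIdx), numel(exactIdx));
```

If your species list can contain names like this, switching to an exact comparison is probably the safer choice.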
This is really amazing !!
But do I have to type allTables{1}, allTables{2}, allTables{3} to get those tables?
It does not automatically output a table for each species!
As the tables are created, they need to be stored somewhere. I chose to store them in a cell array.
There are ways to later pull them from the cell array. It depends on what you're doing with them.
If you explain how you plan to use the tables, I can give you advice.
I didn't understand your last sentence.
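As one possible way to pull the tables out of the cell array, here is a minimal sketch that copies each table into a struct field named after its species column, so the tables can be addressed by name rather than by index. The small allTables below is a stand-in with placeholder values, and 'bySpecies' is just a name chosen for this illustration; with the real allTables from the loop above you would skip the first statement.

```matlab
% Small stand-in for allTables (placeholder values, for illustration only)
allTables = { ...
    table([582; 600], [6.21e-4; 4.37e-4], 'VariableNames', {'TK', 'NC7H16_molef'}); ...
    table([582; 600], [0.10; 0.11],       'VariableNames', {'TK', 'O2_molef'})};

% Build a struct with one field per species, named after column 2
bySpecies = struct();
for k = 1:numel(allTables)
    name = allTables{k}.Properties.VariableNames{2};
    bySpecies.(name) = allTables{k};
end

% Access a table by species name, or convert it to a plain numeric matrix
T = bySpecies.O2_molef;
M = table2array(T);
disp(M)
```

This keeps everything in one variable, which tends to be easier to manage than creating a separate workspace variable per species when the number of species is not known in advance.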

Asked:

on 12 Feb 2019

Commented:

on 14 Feb 2019
