How to read data from a text file based on the title?

Question

0 votes

I have two text files. The first one is "Sample01.txt", which looks like this:

And the second one is "Sample02.txt", which looks like this:

How can I read the data for T[K] and NC7H16_molef[-], and likewise for O2 and CO, from both text files by reading their titles?

I am actually trying to merge the corresponding data from the two text files into one matrix, so there will be three matrices: one each for NC7H16, O2 and CO.

For example, for NC7H16 the desired matrix looks like this:

N.B. The number of rows for NC7H16_molef[-], O2 and CO is not fixed.

4 Comments

Adam Danz on 12 Feb 2019

The "Sample01.txt" in your image looks like an [n x 2] matrix but the one you attached is [m x 4]. Could you clarify which one is correct? Also, your "Sample02.txt" attachment is png, not txt, so it's not readable.

Mr. 206 on 12 Feb 2019

Sorry, my bad. I reattached them.

Answer 1

Adam Danz on 12 Feb 2019

Edited: Adam Danz on 12 Feb 2019

0 votes

Instead of selectively reading a chosen subset of your text files (which I'm not even sure is possible), you can read in the entire text file and then pull out the columns of data you're interested in working with.

Your Sample02.txt file is easy to read using readtable(). The Sample01.txt file, however, contains multiple sub-tables which made it more difficult to work with. I used textscan() to read in the entire file and then split up the sub-tables.
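
The sub-table split is probably the least obvious step, so here is a minimal sketch of the idea on a toy cell array. The values and the 'A[-]' / 'B[-]' headers are made up for illustration; only the 'T[K]' key comes from the files in the question. The sketch assumes the first row is a header row, because otherwise cumsum() would start at 0, which is not a valid group number for splitapply().

```matlab
% Toy stand-in for the two-column cell array of strings read from sample01
% (hypothetical values; the real file contents are not reproduced here).
raw = {'#T[K]', 'A[-]'; ...
       '500',   '0.1';  ...
       '600',   '0.2';  ...
       '#T[K]', 'B[-]'; ...
       '550',   '0.3'};

isHeader = contains(raw(:,1), 'T[K]');   % true on rows that start a sub-table
groupId  = cumsum(isHeader);             % [1;1;1;2;2], one ID per sub-table

% Collect the rows of each group into its own cell
blocks = splitapply(@(x){x}, raw, groupId);

% Convert the data rows (everything after the header row) to numbers
data = cellfun(@(b) str2double(b(2:end,:)), blocks, 'UniformOutput', false);
```

Each header row increments the running sum, so all rows up to the next header share one group number; that is what lets splitapply() cut the file into blocks without knowing their lengths in advance.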

Once both files are read and cleaned up, you can split and combine the data very easily to create new matrices. My example produces the matrix you described in your question. Note that because the data from one file has fewer rows than the data from the other file, zeros were added as padding at the tail end of the shorter data set.

A second version of this code uses header names and is in the comments below.

% full path to files (I prefer working with full paths but you could just use filenames)
s1 =  fullfile('C:\Users\janedoe\Documents\MATLAB\', 'sample01.txt'); 
s2 =  fullfile('C:\Users\janedoe\Documents\MATLAB\', 'sample02.txt'); 
% Read sample02 (this one's simple)
s2Table = readtable(s2);  
% Read sample01; it will be read in as a cell array of strings
fid = fopen(s1); 
s1str = textscan(fid, '%s %s', 'HeaderLines', 1);
fclose(fid);
s1str = [s1str{:}]; 
% Now separate each sub-table into its own table
key = 'T[K]';                           %the start of each row that identifies a new sub-table
headerIdx = contains(s1str(:,1), key);  %logical index identifying header rows 
s1cell = splitapply(@(x){x}, s1str, cumsum(headerIdx));
s1table = cellfun(@(x)array2table(str2double(x(2:end, :))), s1cell, 'UniformOutput', false); 
% before combining data, tables need to have the same number of rows.  
% Here we pad the shorter table with trailing zeros. 
padRowsNeeded = size(s2Table,1) - cellfun(@(x)size(x,1), s1table); 
s1Table = cellfun(@(x,y)[x;array2table(zeros(y,size(x,2)))], s1table, num2cell(padRowsNeeded),'UniformOutput', false);
% Now you can join columns to create a new array
% Here is an example
newMatrix = [s1Table{1}.Var1, s1Table{1}.Var2, s2Table.Var1, s2Table.Var2];
% Convert back to a table if you want to
% Note that the variable names can be pulled directly from the source files but will require some cleaning
% since some of them are invalid variable names. 
newTable = array2table(newMatrix, 'VariableNames', {'TK_samp01', 'NC7H16_samp1', 'TK_samp2', 'NC7H16_samp2'})

And the result

newTable =
  21×4 table
    TK_samp01    NC7H16_samp1    TK_samp2    NC7H16_samp2
    _________    ____________    ________    ____________
       582         0.000621         576       0.00067913 
       600         0.000437         598       0.00044327 
       619         0.000294         619       0.00040821 
       639         0.000245         635       0.00042462 
       658         0.000247         651       0.00046359 
       680         0.000289         669       0.00052614 
       699         0.000322         690       0.00061422 
       720         0.000266         711        0.0007039 
       739         0.000301         729        0.0007637 
       768         0.000373         751       0.00078045 
       797         0.000174         783       0.00054059 
       830         3.54e-05         821        0.0001833 
       860          6.3e-06         860       7.9079e-05 
       880         3.17e-07         894       4.7073e-05 
       909         8.94e-08         931       3.1658e-05 
         0                0         969       2.3408e-05 
         0                0        1008        1.739e-05 
         0                0        1036       1.3428e-05 
         0                0        1074       8.4148e-06 
         0                0        1112        4.605e-06 
         0                0        1150       2.3107e-06 
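
The trailing zeros in the last rows above come from the padding step. A minimal sketch of that step on two small numeric blocks follows; the values are hypothetical, and the NaN variant is an alternative that is not part of the answer's code. It assumes the second block has at least as many rows as the first.

```matlab
% Two numeric blocks with different row counts (hypothetical values)
a = [582 6.21e-4; 600 4.37e-4];               % 2 rows
b = [576 6.79e-4; 598 4.43e-4; 619 4.08e-4];  % 3 rows

nPad = size(b,1) - size(a,1);                 % rows missing from the shorter block

% Zero padding, as in the answer's code
aZero = [a; zeros(nPad, size(a,2))];

% NaN padding (an alternative, not what the answer does): padded rows stay
% distinguishable from measured zeros
aNaN = [a; nan(nPad, size(a,2))];

merged = [aNaN, b];                           % 3-by-4 matrix
```

Whether zeros or NaN is the better filler depends on what is done with the matrix afterwards; zeros will show up in sums and plots, while NaN values propagate through most arithmetic.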

9 Comments

Adam Danz on 12 Feb 2019

As I mentioned in a comment toward the end of my solution code, your header names contain characters such as "[ ] - #" that are not allowed in MATLAB variable names, so those headers need to be cleaned.

Below is a 2nd version of my solution. This one uses header names that are accepted by MATLAB. There were lots of changes.

Problematic characters are listed in 'badChars'; you can add more to that list if you have future problems that aren't covered here.

The erase() function removes those problematic characters from your headers so "#T[K]" becomes "TK". The header "NC7H16_molef[-]" becomes "NC7H16_molef", etc.

% full path to files (I prefer working with full paths but you could just use filenames)
s1 =  fullfile('C:\Users\janedoe\Documents\MATLAB\', 'sample01.txt'); 
s2 =  fullfile('C:\Users\janedoe\Documents\MATLAB\', 'sample02.txt'); 
% Read sample02 (this one's simple)
s2Table = readtable(s2);  
% Now get header names; note that this cannot be done in readtable() due to irregular characters.
fid2 = fopen(s2); 
s2Headers = strsplit(fgetl(fid2)); 
s2Headers(end) = []; 
fclose(fid2); 
% Clean header strings to remove invalid variable name characters
badChars = {'[', ']', '-', '#'};        % list all problematic characters in your headers here
s2Headers = erase(s2Headers, badChars); % Remove problematic chars; req. matlab 2016b
s2Table.Properties.VariableNames = s2Headers;  % add clean headers to your table
% Read sample01; it will be read in as a cell array of strings
fid = fopen(s1); 
s1str = textscan(fid, '%s %s', 'HeaderLines', 1);
fclose(fid);
s1str = [s1str{:}]; 
% Now separate each sub-table into its own table
key = 'T[K]';                           %the start of each row that identifies a new sub-table
headerIdx = contains(s1str(:,1), key);  %logical index identifying header rows 
s1cell = splitapply(@(x){x}, s1str, cumsum(headerIdx));
s1cellClean = cellfun(@(x)erase(x, badChars), s1cell, 'UniformOutput', false);  % Clean headers (req. matlab 2016b)
s1table = cellfun(@(x)array2table(str2double(x(2:end, :)), 'VariableNames', x(1,:)), s1cellClean, 'UniformOutput', false); 
% before combining data, tables need to have the same number of rows.  Here we
% pad the shorter table with trailing zeros. 
padRowsNeeded = size(s2Table,1) - cellfun(@(x)size(x,1), s1table); 
s1Table = cellfun(@(x,y)[x;array2table(zeros(y,size(x,2)), 'VariableNames', x.Properties.VariableNames)], ...
    s1table, num2cell(padRowsNeeded),'UniformOutput', false);
% Now you can join columns to create a new matrix
% Here is an example
newMatrix = [s1Table{1}.TK, s1Table{1}.NC7H16_molef, s2Table.TK, s2Table.NC7H16];
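
The header-cleaning step in the code above can be tried on its own. This short sketch uses the two header strings quoted in this comment and checks the result with isvarname(), which is not used in the answer's code and is added here only as a sanity check.

```matlab
rawHeaders = {'#T[K]', 'NC7H16_molef[-]'};     % header strings quoted above
badChars   = {'[', ']', '-', '#'};             % same list as in the code above

cleanHeaders = erase(rawHeaders, badChars);    % {'TK', 'NC7H16_molef'}

% isvarname() reports whether a name is a valid MATLAB identifier
wasValid = cellfun(@isvarname, rawHeaders);    % both false
isValid  = cellfun(@isvarname, cleanHeaders);  % both true
```

If a header still fails the isvarname() check after cleaning, the offending character can be added to badChars.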

Adam Danz on 13 Feb 2019

Edited: Adam Danz on 13 Feb 2019

Atta, you're going to love this. I added 1 line to my previous code and I added a loop at the end that stores all tables in a cell array "allTables". The one line that was added is marked on the right with an arrow "% <-----".

Here's how the loop works. It loops through the columns of sample02 starting at column 2 and identifies the matching sample01 table based on species names. It then forms a new table for each column in sample02. If a match could not be found (or if >1 match is found), you'll get an error message.

% full path to files (I prefer working with full paths but you could just use filenames)
s1 =  fullfile('C:\Users\janedoe\Documents\MATLAB\', 'sample01.txt'); 
s2 =  fullfile('C:\Users\janedoe\Documents\MATLAB\', 'sample02.txt');
% Read sample02 (this one's simple)
s2Table = readtable(s2);  
% Now get header names; note that this cannot be done in readtable() due to irregular characters.
fid2 = fopen(s2); 
s2Headers = strsplit(fgetl(fid2)); 
s2Headers(end) = []; 
fclose(fid2); 
% Clean header strings to remove invalid variable name characters
badChars = {'[', ']', '-', '#'};        % list all problematic characters in your headers here
s2Headers = erase(s2Headers, badChars); % Remove problematic chars; req. matlab 2016b
s2Headers = strrep(s2Headers, 'TK', 'TK2');  % <----- replace TK with TK2 so table headers are unique
s2Table.Properties.VariableNames = s2Headers;  % add clean headers to your table
% Read sample01; it will be read in as a cell array of strings
fid = fopen(s1); 
s1str = textscan(fid, '%s %s', 'HeaderLines', 1);
fclose(fid);
s1str = [s1str{:}]; 
% Now separate each sub-table into its own table
key = 'T[K]';                           %the start of each row that identifies a new sub-table
headerIdx = contains(s1str(:,1), key);  %logical index identifying header rows 
s1cell = splitapply(@(x){x}, s1str, cumsum(headerIdx));
s1cellClean = cellfun(@(x)erase(x, badChars), s1cell, 'UniformOutput', false);  % Clean headers (req. matlab 2016b)
s1table = cellfun(@(x)array2table(str2double(x(2:end, :)), 'VariableNames', x(1,:)), s1cellClean, 'UniformOutput', false); 
% before combining data, tables need to have the same number of rows.  Here we
% pad the shorter table with trailing zeros. 
padRowsNeeded = size(s2Table,1) - cellfun(@(x)size(x,1), s1table); 
s1Table = cellfun(@(x,y)[x;array2table(zeros(y,size(x,2)), 'VariableNames', x.Properties.VariableNames)], ...
    s1table, num2cell(padRowsNeeded),'UniformOutput', false);
% NEW SECTION BELOW
% Loop through columns of s2Table starting at col 2 
nCols = size(s2Table,2);                    %number of columns
allTables = cell(nCols-1,1);                %store all tables in cell array
s1Headers = cellfun(@(x)strjoin(x.Properties.VariableNames), s1table, 'UniformOutput', false); 
for i = 2 : nCols
    % match current column of s2Table with the correct table in s1
    tblIdx = find(contains(s1Headers, s2Headers{i})); 
    if length(tblIdx) ~= 1
        error('Tables could not be matched')
    end
    allTables{i-1} = [s1Table{tblIdx}(:,1), s1Table{tblIdx}(:,2), s2Table(:,1), s2Table(:,i)];
end

The cell array 'allTables' contains each table. So, allTables{3} is table 3.

K>> allTables
allTables =
  3×1 cell array
    {21×4 table}
    {21×4 table}
    {21×4 table}
K>> 
K>> allTables{1}
ans =
  21×4 table
    TK     NC7H16_molef    TK2       NC7H16  
    ___    ____________    ____    __________
    582      0.000621       576    0.00067913
    600      0.000437       598    0.00044327
    619      0.000294       619    0.00040821
    639      0.000245       635    0.00042462
    658      0.000247       651    0.00046359
    680      0.000289       669    0.00052614
    699      0.000322       690    0.00061422
    720      0.000266       711     0.0007039
    739      0.000301       729     0.0007637
    768      0.000373       751    0.00078045
    797      0.000174       783    0.00054059
    830      3.54e-05       821     0.0001833
    860       6.3e-06       860    7.9079e-05
    880      3.17e-07       894    4.7073e-05
    909      8.94e-08       931    3.1658e-05
      0             0       969    2.3408e-05
      0             0      1008     1.739e-05
      0             0      1036    1.3428e-05
      0             0      1074    8.4148e-06
      0             0      1112     4.605e-06
      0             0      1150    2.3107e-06

Lastly, the for-loop makes the following assumptions that, if violated, will cause an error.

The TK column is always column 1 in both files.
The tables in sample01 will always have 2 columns and the 2nd column is the species data.
All headers in both files are unique (no repeats within a file).
The species name in sample02 will always be part of a species name in column 2 of sample01.
All columns in sample02 after column 1 are species names (again, without repeats).
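
The matching assumption in the list above can be checked in isolation. The sketch below uses the same contains() test as the loop; the 'O2_molef', 'CO_molef' and 'CO2_molef' names are assumed for illustration and are not confirmed by the files in this thread.

```matlab
% Cleaned header strings, one per sample01 sub-table
% ('O2_molef' and 'CO_molef' are assumed names)
s1Headers = {'TK NC7H16_molef'; 'TK O2_molef'; 'TK CO_molef'};

% Number of sub-table headers that contain a given species name
matchCount = @(hdrs, name) nnz(contains(hdrs, name));

idxO2 = find(contains(s1Headers, 'O2'));   % index of the matching sub-table
nO2   = matchCount(s1Headers, 'O2');       % exactly one match: OK

% Substring matching becomes ambiguous once one species name is contained in
% another, e.g. if a hypothetical CO2 sub-table were added:
s1Headers{end+1} = 'TK CO2_molef';
nCO = matchCount(s1Headers, 'CO');         % two matches: the loop would error
```

With species lists where one name is a prefix of another, a stricter comparison than contains() would be needed; which one is appropriate depends on how the headers are named in the real files.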

Mr. 206 on 14 Feb 2019

This is really amazing !!

But do I have to type allTables{1}, allTables{2}, allTables{3} for those tables?

It is not automatically throwing table for those species !

Adam Danz on 14 Feb 2019

As the tables are created, they need to be stored somewhere. I chose to store them in a cell array.

There are ways to pull them from the cell array later. It depends on what you're doing with them.

If you explain how you plan to use the tables, I can give you advice.

I didn't understand your last sentence.
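
Two common ways of getting at the tables without typing each index are sketched below. The stand-in tables are tiny and hypothetical (including the 'O2_molef' name), and bySpecies is a name introduced here for illustration; neither option is taken from the answer's code.

```matlab
% Stand-in for the cell array built by the loop (tiny hypothetical tables)
allTables = { ...
    array2table([582 6.21e-4], 'VariableNames', {'TK', 'NC7H16_molef'}); ...
    array2table([582 0.2],     'VariableNames', {'TK', 'O2_molef'})};

% Option 1: loop over the cell array instead of typing each index
for k = 1:numel(allTables)
    disp(allTables{k})
end

% Option 2: store the tables in a struct keyed by species name, so each one
% can be reached as bySpecies.NC7H16_molef, bySpecies.O2_molef, ...
bySpecies = struct();
for k = 1:numel(allTables)
    name = allTables{k}.Properties.VariableNames{2};   % 2nd column = species
    bySpecies.(name) = allTables{k};
end
```

The struct variant relies on the second column name being unique across tables, which matches the assumptions listed for the loop.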
