
use of readtable to read unstructured data
    5 views (last 30 days)
  
       Show older comments
    
    Marzuki Marzuki
 on 21 Sep 2021
  
    
    
    
    
    Commented: Mathieu NOE
      
 on 22 Sep 2021
            I want to read the data in the attached file, using readtbale given below:
fname1='sample.txt';
opt = detectImportOptions(fname1, 'headerlines', 0);
data = readtable(fname1, opt);
However, the result is not correct because there is a shift of data structure (see below)

I plan get following structure:

Is there any best way to do it? thank you for help
0 Comments
Accepted Answer
  Mathieu NOE
      
 on 21 Sep 2021
        why not this ? simply using the import tool 
you can ignaore the second variable if it's not meaningfull
result : 

code : 
out = importfile('sample.txt', 1, 16);
function sample = importfile(filename, startRow, endRow)
%IMPORTFILE Import numeric data from a text file as a matrix.
%   SAMPLE = IMPORTFILE(FILENAME)
%   Reads data from text file FILENAME for the default selection.
%
%   SAMPLE = IMPORTFILE(FILENAME, STARTROW, ENDROW)
%   Reads data from rows STARTROW through ENDROW of text file FILENAME.
%
% Example:
%   sample = importfile('sample.txt', 1, 16);
%
%    See also TEXTSCAN.
% Auto-generated by MATLAB on 2021/09/21 18:32:46
%% Initialize variables.
if nargin<=2
    startRow = 1;
    endRow = inf;
end
%% Read columns of data as text:
% For more information, see the TEXTSCAN documentation.
formatSpec = '%8s%3s%5s%8s%5s%5s%5s%5s%s%[^\n\r]';
%% Open the text file.
fileID = fopen(filename,'r');
%% Read columns of data according to the format.
% This call is based on the structure of the file used to generate this code. If an error occurs for a different file, try regenerating the code from the Import Tool.
dataArray = textscan(fileID, formatSpec, endRow(1)-startRow(1)+1, 'Delimiter', '', 'WhiteSpace', '', 'TextType', 'string', 'HeaderLines', startRow(1)-1, 'ReturnOnError', false, 'EndOfLine', '\r\n');
for block=2:length(startRow)
    frewind(fileID);
    dataArrayBlock = textscan(fileID, formatSpec, endRow(block)-startRow(block)+1, 'Delimiter', '', 'WhiteSpace', '', 'TextType', 'string', 'HeaderLines', startRow(block)-1, 'ReturnOnError', false, 'EndOfLine', '\r\n');
    for col=1:length(dataArray)
        dataArray{col} = [dataArray{col};dataArrayBlock{col}];
    end
end
%% Close the text file.
fclose(fileID);
%% Convert the contents of columns containing numeric text to numbers.
% Replace non-numeric text with NaN.
raw = repmat({''},length(dataArray{1}),length(dataArray)-1);
for col=1:length(dataArray)-1
    raw(1:length(dataArray{col}),col) = mat2cell(dataArray{col}, ones(length(dataArray{col}), 1));
end
numericData = NaN(size(dataArray{1},1),size(dataArray,2));
for col=[3,4,5,6,7,8,9]
    % Converts text in the input cell array to numbers. Replaced non-numeric text with NaN.
    rawData = dataArray{col};
    for row=1:size(rawData, 1)
        % Create a regular expression to detect and remove non-numeric prefixes and suffixes.
        regexstr = '(?<prefix>.*?)(?<numbers>([-]*(\d+[\,]*)+[\.]{0,1}\d*[eEdD]{0,1}[-+]*\d*[i]{0,1})|([-]*(\d+[\,]*)*[\.]{1,1}\d+[eEdD]{0,1}[-+]*\d*[i]{0,1}))(?<suffix>.*)';
        try
            result = regexp(rawData(row), regexstr, 'names');
            numbers = result.numbers;
            % Detected commas in non-thousand locations.
            invalidThousandsSeparator = false;
            if numbers.contains(',')
                thousandsRegExp = '^[-/+]*\d+?(\,\d{3})*\.{0,1}\d*$';
                if isempty(regexp(numbers, thousandsRegExp, 'once'))
                    numbers = NaN;
                    invalidThousandsSeparator = true;
                end
            end
            % Convert numeric text to numbers.
            if ~invalidThousandsSeparator
                numbers = textscan(char(strrep(numbers, ',', '')), '%f');
                numericData(row, col) = numbers{1};
                raw{row, col} = numbers{1};
            end
        catch
            raw{row, col} = rawData{row};
        end
    end
end
% Convert the contents of columns with dates to MATLAB datetimes using the specified date format.
try
    dates{1} = datetime(dataArray{1}, 'Format', 'HH:mm:ss', 'InputFormat', 'HH:mm:ss');
catch
    try
        % Handle dates surrounded by quotes
        dataArray{1} = cellfun(@(x) x(2:end-1), dataArray{1}, 'UniformOutput', false);
        dates{1} = datetime(dataArray{1}, 'Format', 'HH:mm:ss', 'InputFormat', 'HH:mm:ss');
    catch
        dates{1} = repmat(datetime([NaN NaN NaN]), size(dataArray{1}));
    end
end
dates = dates(:,1);
%% Split data into numeric and string columns.
rawNumericColumns = raw(:, [3,4,5,6,7,8,9]);
rawStringColumns = string(raw(:, 2));
%% Make sure any text containing <undefined> is properly converted to an <undefined> categorical
idx = (rawStringColumns(:, 1) == "<undefined>");
rawStringColumns(idx, 1) = "";
%% Create output variable
sample = table;
sample.VarName1 = dates{:, 1};
sample.VarName2 = categorical(rawStringColumns(:, 1));
sample.VarName3 = cell2mat(rawNumericColumns(:, 1));
sample.VarName4 = cell2mat(rawNumericColumns(:, 2));
sample.VarName5 = cell2mat(rawNumericColumns(:, 3));
sample.VarName6 = cell2mat(rawNumericColumns(:, 4));
sample.VarName7 = cell2mat(rawNumericColumns(:, 5));
sample.VarName8 = cell2mat(rawNumericColumns(:, 6));
sample.VarName9 = cell2mat(rawNumericColumns(:, 7));
% For code requiring serial dates (datenum) instead of datetime, uncomment the following line(s) below to return the imported dates as datenum(s).
% sample.VarName1=datenum(sample.VarName1);
end
2 Comments
More Answers (1)
  Jeremy Hughes
    
 on 21 Sep 2021
        
      Edited: Jeremy Hughes
    
 on 21 Sep 2021
  
      Based on the file attached, I suggest treating this as a fixed width file.
opts = detectImportOptions("sample.txt","FileType","fixedwidth")
readtable("sample.txt",opts)
See Also
Categories
				Find more on Spreadsheets in Help Center and File Exchange
			
	Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!

