Import data with double and strings

20 views (last 30 days)
Hello everybody,
I have a file (*.txt as attached). This file contains data perfilated with an equipment, displayed in a first column of date/time and four columns of numeric values (depth and water quality). Each profile is preceeded by a 6 lines heading (text data) and a given file may contain many profiles (always preceeded by a 6 lines heading).
Example:
Sensor Data Columns
Sensor Statistics (sensor #, Description, Statistic)
1 Depth Mean
2 OBS-1 (NTU) Mean
4 Temperature (deg C) Mean
5 Conductivity (mS/cm) Mean
16:15:35.0 07/15/2019 9.92 2422.9 28.77 6.73
16:15:35.5 07/15/2019 9.92 2422.9 28.77 6.73
16:15:36.0 07/15/2019 9.95 2422.9 28.77 6.72
16:15:36.5 07/15/2019 9.98 2422.9 28.77 6.73
16:15:37.0 07/15/2019 10.01 2422.9 28.77 6.72
16:15:37.5 07/15/2019 10.03 2422.9 28.77 6.74
16:15:38.0 07/15/2019 10.02 2422.9 28.77 6.74
The problem is that there are 5 spaces separating date/time and the column showing the depths, but only four spaces for depths greater than 9.9m. When I use:
OBS = importdata(AM2_15_1a.txt,' ',6);
It creates a strcut separating text and numeric data, which is exactly what I want, except that for depths > 9.9m, the lines are read as 0.1 (instead of 10.1) and the remaining "1" is put together with the date column which bugs the analysis.
As I have many files with many lines, is there any solution for fixing this problem? I tried pasting data to excel and then back to the log file; changing to csv and other types in order to standardise the columns separators, but when I do that, "importdata" doesn't work (it doesn't create a struct, but a cell containing only the heading.
Any sugestion? Thanks in advance
Best regards!
  2 Comments
Bob Thompson
Bob Thompson on 24 Sep 2019
Edited: Bob Thompson on 24 Sep 2019
Have you tried readtable, dlmread, or textscan?
I'm not sure if csvread will work well with the mixed format data, as it's really intended to open numeric data files.
Diego Fonseca
Diego Fonseca on 24 Sep 2019
I tried those functions, but unfortunately they don't work, as the first column conatins a time string which I need to keep. But thank you anyway for the help! ;)

Sign in to comment.

Accepted Answer

Adam Danz
Adam Danz on 24 Sep 2019
Edited: Adam Danz on 24 Sep 2019
This solution uses fileread() to read in the entire file and then splits the text by line using strsplit().
It then identifies the start of each profile by looking for the string 'Sensor Data Columns' and then marks the start and end of each segment of data based on those line numbers and the 6 total header lines per profile.
It then loops through each profile and read in the datetime values and the 4 measurements and stores them in tables saved in the cell array "T".
See comments for alternative approaches.
% set parameters
file = 'AM2_15_1a.txt'; %file (better if it includes full path)
profileStartKey = 'Sensor Data Columns'; %text that indicates start of key
nHeaderRows = 6; %number of header rows for each profile
% Get start/end line number for each profile
text = strtrim(strsplit(fileread(file),'\n')'); %separate entire file by line
text(cellfun(@isempty,text)) = []; % remove empty lines
profileStartLineNum = find(~cellfun(@isempty,regexp(text,sprintf('^%s',profileStartKey)))); %lines where header starts
startLineNum = profileStartLineNum + nHeaderRows; %lines where data starts
endLineNum = [profileStartLineNum(2:end)-1;numel(text)]; %lines where data ends
% Loop through each profile and create a table for each profile, stored in "T"
T = cell(numel(startLineNum),1);
for i = 1:numel(startLineNum)
profile = text(startLineNum(i):endLineNum(i)); %block of profile data (cell of chars)
dataStr = cellfun(@strsplit,profile,'UniformOutput',false); %char array broken up into sub-cells
% Extract datetime and measurements for each line within the profile
[dtTemp,dataTemp] = cellfun(@(x)deal(datetime([x{1},' ',x{2}],'InputFormat','HH:mm:ss.s MM/dd/yyyy'),...
str2double(x(3:end))),dataStr,'UniformOutput',false);
T{i} = [cell2table(dtTemp,'VariableNames',{'Date'}),array2table(cell2mat(dataTemp),'VariableNames',{'m1','m2','m3','m4'})];
end
T{n} is the table from profile #n.
  5 Comments
Adam Danz
Adam Danz on 24 Sep 2019
Edited: Adam Danz on 24 Sep 2019
Glad I could help! In case you started implementing the answer before my final edit, I recommend using the answer in its current form where tables are produced and saved within a cell array. It's by far the cleanest and simplest approach.
Strtrim() is used to get rid of the carriage return at the end of each line which is added by strsplit().
Diego Fonseca
Diego Fonseca on 24 Sep 2019
Ok! I'll get this edited code.
Thanks a lot again!

Sign in to comment.

More Answers (0)

Categories

Find more on Text Data Preparation in Help Center and File Exchange

Products


Release

R2017b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!