How to read in multiple text files, each containing multiple lines/formats?

Thanks in advance for reading and for any support. The data comes from a Kaggle dataset. I am trying to read multiple text files in a folder, for which I have the following code:
files = dir(fullfile('archive.1/ChinaSet_AllFiles/','ChinaSet_AllFiles','ClinicalReadings','*.txt'));
N = length(files)
data = []
for i = 1:N
    t = files(i).name;
    formatspec = '%s %s%*[^\r\n]%*[\r\n]+%s';
    file = fopen(fullfile(files(i).folder,t),'r');
    A = textscan(file , formatspec, 'delimiter','\n');
    data = [data; A];
    fclose(file);
end
It loops through the files fine but the files themselves have some data inconsistencies such as the following:
Usual Files:
femal 32yrs
Other files:
male 40yrs
PTB in the right upper field
I need three columns for each file, for example: male, 40yrs, "PTB in the right upper field".
Can someone please help?

Answers (2)

dpb on 16 May 2021
Very difficult without example files to see the nuances, but the two records above I'd handle more like--
tData=[]; % empty table placeholder
for i = 1:numel(d) % iterate over dir struct (d is the dir() result, "files" in the question)
    fid=fopen(fullfile(d(i).folder,d(i).name),'r'); % open file in turn
    data=textscan(fid,'%s','delimiter','\n','whitespace',''); % read as cellstr() array by record
    fclose(fid); % done with the file
    data=data{1}; % textscan returns a 1x1 cell; unwrap to the cellstr of records
    tmp=split(data{1}); % split the first record to sex, age fields
    tData=[tData;table(tmp(1),tmp(2),data(2),'VariableNames',{'Gender','Age','Diagnosis'})]; % insert into table
end
The above assumes these are the only two record types and that they all follow the pattern of two fields on the first line and one long record on the second.
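If some files really do stop after the first line, as the "usual files" sample in the question suggests, the second record has to be treated as optional. The following is only a sketch under that assumption: the temporary folder and the two sample files are hypothetical stand-ins for the real data, and the variable names are illustrative. It also assumes the first line always holds at least two whitespace-separated fields.

```matlab
% Build two small sample files in a temporary folder (hypothetical
% stand-ins for the real ClinicalReadings files).
folder = tempname;
mkdir(folder);
fid = fopen(fullfile(folder,'one_line.txt'),'w');
fprintf(fid,'femal 32yrs\n');
fclose(fid);
fid = fopen(fullfile(folder,'two_lines.txt'),'w');
fprintf(fid,'male 40yrs\nPTB in the right upper field\n');
fclose(fid);

d = dir(fullfile(folder,'*.txt'));
n = numel(d);
Gender = strings(n,1);                 % preallocate one string column per field
Age = strings(n,1);
Diagnosis = strings(n,1);              % stays "" when a file has no second line
for i = 1:n
    fid = fopen(fullfile(d(i).folder,d(i).name),'r');
    rec = textscan(fid,'%s','Delimiter','\n','Whitespace','');
    fclose(fid);
    rec = strtrim(rec{1});             % unwrap to a cellstr, one cell per line
    rec = rec(~cellfun('isempty',rec)); % drop blank lines
    if isempty(rec), continue; end     % skip empty files
    tmp = strsplit(rec{1});            % split the first line on whitespace
    Gender(i) = tmp{1};
    Age(i) = tmp{2};
    if numel(rec) >= 2                 % the diagnosis line is optional
        Diagnosis(i) = strjoin(rec(2:end)',' ');
    end
end
tData = table(Gender,Age,Diagnosis);
rmdir(folder,'s');                     % remove the temporary sample folder
```

Preallocating the columns and building the table once after the loop avoids growing a table on every iteration, which matters little for a few hundred files but keeps the code simpler to reason about.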

Mathieu NOE on 16 May 2021
I have to admit that I am not an expert with textscan, so someone else will probably write better code than mine, but this is what I tried and tested as a workaround:
files = dir(fullfile('archive.1/ChinaSet_AllFiles/','ChinaSet_AllFiles','ClinicalReadings','*.txt'));
N = length(files)
data = []
% for i = 1:N
% t = files(i).name;
% formatspec = '%s %s%*[^\r\n]%*[\r\n]+%s';
% file = fopen(fullfile(files(i).folder,t),'r');
% A = textscan(file , formatspec, 'delimiter',' ');
% data = [data; A];
% fclose(file)
% end
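for i = 1:N
    t = files(i).name;
    rr = readlines(fullfile(files(i).folder,t)); % string array, one element per line
    temp = split(rr{1}); % split the first line at whitespace
    % remove empty cells
    empty = cellfun('isempty',temp);
    temp(empty) = [];
    % finally...
    A = [temp' {rr{2}}]; % second line wrapped in a cell for explicit cell concatenation
    data = [data; A];
    fclose('all'); % no-op safeguard; readlines does not leave files open
end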
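Since the question asks for three named columns, the N-by-3 cell array collected in data can be turned into a table afterwards. A minimal sketch, assuming data has the shape the loop produces; the two rows here are typed in by hand from the samples in the question rather than read from disk, and the column names are only a suggestion:

```matlab
% Sample rows in the shape the loop above is expected to produce
% (hand-typed from the question's examples, not read from files).
data = {'male',  '40yrs', 'PTB in the right upper field'; ...
        'femal', '32yrs', ''};

% Convert the cell array to a table with one named variable per column.
T = cell2table(data,'VariableNames',{'Gender','Age','Diagnosis'});
disp(T)
```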


