Reading an N-column table that sometimes has N+1 columns

Hi everyone,
I searched the forum extensively without finding a solution. Please pardon my English; I am French :)
My problem is the following:
I have a log file containing a lot of information. It can be 500 MB or sometimes even larger. The file is split into two main parts: a header part, from which information is easy and fast to retrieve line by line, and a data part.
The data part consists of several tasks, each containing a table of data with text before and after it.
The data table has the following structure:
1 25.870 1.000 lhc 0.000 0.000 -20.140 24.449 1.42061512
2 25.870 1.000 lhc 0.000 0.000 -20.520 24.519 1.35075912
3 25.870 1.000 lhc 0.000 0.000 -20.951 24.582 1.28833133
4 25.870 1.000 lhc 0.000 0.000 -21.434 24.638 1.23204173
5 25.870 1.000 lhc 0.000 0.000 -21.958 24.689 1.18086597
6 25.870 1.000 lhc 0.000 0.000 -22.503 24.735 1.13498198
7 25.870 1.000 lhc 0.000 0.000 -23.148 24.781 1.08854135
8 25.870 1.000 lhc 0.000 0.000 -23.741 24.824 1.04623596
9 25.870 1.000 lhc 0.000 0.000 -24.244 24.863 1.00744521
10 25.870 1.000 lhc 0.000 0.000 -24.626 24.898 0.97159033
11 25.870 1.000 lhc 0.000 0.000 -24.876 24.932 0.93839531
12 25.870 1.000 lhc 0.000 0.000 -25.010 24.962 0.90779039
13 25.870 1.000 lhc 0.000 0.000 -25.057 24.990 0.87971152
14 25.870 1.000 lhc 0.000 0.000 -25.063 25.016 0.85443812
15 25.870 1.000 lhc 0.000 0.000 -25.072 25.038 0.83238819
16 25.870 1.000 lhc 0.000 0.000 -25.115 25.056 0.81396378
17 25.870 1.000 lhc 0.000 0.000 -25.220 25.070 0.79981872
18 25.870 1.000 lhc 0.000 0.000 -25.406 25.079 0.79060410
19 25.870 1.000 lhc 0.000 0.000 -25.611 25.078 0.79173920
20 25.870 1.000 lhc 0.000 0.000 -25.936 25.068 0.80208976
21 25.870 1.000 lhc 0.000 0.000 -26.373 25.047 0.82291587
22 25.870 1.000 lhc 0.000 0.000 -26.891 25.014 0.85576164
23 25.870 1.000 lhc 0.000 0.000 -27.437 24.969 0.90124460
24 25.870 1.000 lhc 0.000 0.000 -27.928 24.910 0.96048807
25 25.870 1.000 lhc 0.000 0.000 -28.254 24.835 1.03468974
26 25.870 1.000 lhc 0.000 0.000 -28.317 24.746 1.12353854
27 25.870 1.000 lhc 0.000 0.000 -28.070 24.642 1.22847010
28 25.870 1.000 lhc 0.000 0.000 -27.662 24.552 1.31821801
29 25.870 1.000 lhc 0.000 0.000 -27.101 24.452 1.41784749
30 25.870 1.000 lhc 0.000 0.000 -26.466 24.343 1.52711338
31 25.870 1.000 lhc 0.000 0.000 -25.820 24.224 1.64568471 **
As you can see in row 31, ** appears at random as an extra (10th) column. This is only an excerpt; the table continues for thousands of lines.
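A minimal sketch of why these markers interrupt the read. The sample string below is hypothetical (three made-up rows copying the layout above, with `**` at the end of the second row). textscan stops as soon as it cannot match a field, so when `**` fails to match the `%f` at the start of the next row, fewer rows come back than the table contains.

```matlab
% Hypothetical three-row sample in the question's 9-column layout;
% the second row ends with the extra '**' field
sample = sprintf(['1 25.870 1.000 lhc 0.000 0.000 -20.140 24.449 1.42061512\n', ...
                  '2 25.870 1.000 lhc 0.000 0.000 -20.520 24.519 1.35075912 **\n', ...
                  '3 25.870 1.000 lhc 0.000 0.000 -20.951 24.582 1.28833133\n']);

% Same 9-field format as in the question's code
fmt = '%f %f %f %s %f %f %f %f %f';
C = textscan(sample, fmt);

% textscan stops where '**' cannot be read as the first %f of a row,
% so fewer than the 3 rows in the sample are returned
nRowsRead = numel(C{1});
fprintf('Rows read: %d of 3\n', nRowsRead);
```

In the question's code, each such early stop is what triggers another fgetl/textscan round in the while loop, which is why more `**` markers mean a slower read.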
I am using the following code to retrieve the data. It works, but it is too slow on big files. Do you have any suggestions to improve performance? The problem is the interruption caused by these **: the more of them there are, the slower it gets.
Here, fid is the identifier of the currently open file.
% Store the whole file in one variable to find the first and last lines of each task
% and to search it more quickly
outFile = textscan(fid, '%s', 'Delimiter', '\n');
frewind(fid);
% Lines marking the start and end of each task's data table
taskSummaryFlagOn='No. goal weight pol. rot. att. 1. comp. 2. comp. residue';
taskSummaryFlagOff='Maximum of 1. component:';
% Find the rows where tasks results are
needle=strfind(outFile{1}, taskSummaryFlagOn);
rowsStartTask= find(~cellfun('isempty', needle));
needle=strfind(outFile{1}, taskSummaryFlagOff);
rowsEndTask= find(~cellfun('isempty', needle));
nbStartLine=0;nbEndLine=2;
% Preallocate the variables for better performance
nbLineData=zeros(numel(rowsStartTask),1);% nbLineData is used to check that all the data are correctly retrieved
dataSimu=cell(numel(rowsStartTask),9);
% Loop
for i=1:numel(rowsStartTask)
    nbLineData(i)=rowsEndTask(i)-rowsStartTask(i)-nbStartLine-nbEndLine;
    dataSimu(i,:)=textscan(fid,'%f %f %f %s %f %f %f %f %f','headerlines', rowsStartTask(i));
    % textscan stops at a '**' that ends a line, so keep reading until all rows of the task are retrieved
    while size(dataSimu{i,1},1)~=nbLineData(i)
        fgetl(fid);% skip the trailing '**' (rest of the current line)
        buff=textscan(fid,'%f %f %f %s %f %f %f %f %f');
        for j=1:numel(buff)
            dataSimu{i,j}=[dataSimu{i,j};buff{j}];
        end
    end
    frewind(fid);
end
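Each pass through the loop above calls frewind and then skips rowsStartTask(i) lines with 'headerlines', so the beginning of the file is read again for every task. Since outFile already holds every line in memory, one possible alternative is to parse each task directly from that cell array. This is a sketch under stated assumptions, not tested on a real log: logLines is a hypothetical stand-in for outFile{1} with made-up contents, the offsets nbStartLine/nbEndLine are copied from the question, and the format reuses the `%*[^\n]` skip field from the accepted answer so the `**` markers no longer stop textscan. Whether it is actually faster on a 500 MB file would need to be measured.

```matlab
% Hypothetical stand-in for outFile{1}: one cell per line of the log.
% The real file's spacing and surrounding text will differ.
logLines = { ...
    'text before the task'; ...
    'No. goal weight pol. rot. att. 1. comp. 2. comp. residue'; ...
    '1 25.870 1.000 lhc 0.000 0.000 -20.140 24.449 1.42061512'; ...
    '2 25.870 1.000 lhc 0.000 0.000 -20.520 24.519 1.35075912 **'; ...
    '3 25.870 1.000 lhc 0.000 0.000 -20.951 24.582 1.28833133'; ...
    ''; ...
    'Maximum of 1. component:'};

taskSummaryFlagOn  = 'No. goal weight pol. rot. att. 1. comp. 2. comp. residue';
taskSummaryFlagOff = 'Maximum of 1. component:';
nbStartLine = 0; nbEndLine = 2;   % same offsets as in the question

% Locate the task boundaries in memory, as in the question
rowsStartTask = find(~cellfun('isempty', strfind(logLines, taskSummaryFlagOn)));
rowsEndTask   = find(~cellfun('isempty', strfind(logLines, taskSummaryFlagOff)));

nTasks     = numel(rowsStartTask);
nbLineData = rowsEndTask - rowsStartTask - nbStartLine - nbEndLine;
dataSimu   = cell(nTasks, 9);
fmt = '%f %f %f %s %f %f %f %f %f %*[^\n]';   % discard anything after the 9th field

for i = 1:nTasks
    % Data rows start right after the header line, like 'headerlines' in the question
    dataRows = rowsStartTask(i) + (1:nbLineData(i));
    % Join this task's lines into one char vector and parse it with a single call
    block = sprintf('%s\n', logLines{dataRows});
    dataSimu(i,:) = textscan(block, fmt);
end
```

The design choice here is to touch the file only once (the initial textscan into outFile) and then work on in-memory text, instead of rewinding and re-skipping header lines for each task.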
If you need more information to understand my problem, I can provide more details.
Thanks in advance for your time :)

Accepted Answer

Sindar on 7 May 2020
Assuming you don't need the '**' information, you could try this approach from the fscanf examples, which skips the remainder of each line after the fields you expect:
dataSimu(i,:)=textscan(fid,'%f %f %f %s %f %f %f %f %f %*[^\n]','headerlines', rowsStartTask(i));
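The effect of the extra `%*[^\n]` field can be checked on a small hypothetical sample (three made-up rows in the question's layout, the second one ending with `**`). The `*` tells textscan to read and discard whatever remains on the line, so no output cell is created for it and the `**` no longer stops the read.

```matlab
% Hypothetical sample in the question's layout; the second row ends with '**'
sample = sprintf(['1 25.870 1.000 lhc 0.000 0.000 -20.140 24.449 1.42061512\n', ...
                  '2 25.870 1.000 lhc 0.000 0.000 -20.520 24.519 1.35075912 **\n', ...
                  '3 25.870 1.000 lhc 0.000 0.000 -20.951 24.582 1.28833133\n']);

% Format from the answer: %*[^\n] reads and discards the rest of each line
fmt = '%f %f %f %s %f %f %f %f %f %*[^\n]';
C = textscan(sample, fmt);

% The skipped field produces no output, so C still has 9 cells,
% and all rows are read in a single call
fprintf('Columns: %d, rows read: %d\n', numel(C), numel(C{1}));
```

With this format, the while/fgetl recovery in the question's loop should no longer need to run, since one textscan call reads past the markers.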
DocWalo on 11 May 2020
Thanks Sindar! That change alone reduced my computation time by a factor of 10.
If you see other ways to speed up the computation, don't hesitate to add more tips :)

Categories

Large Files and Big Data

Release

R2018b
