Read and perform for loop on very large text files

I have to process very large text files with millions of lines. I have attached an example. The file has 9 lines of text that need to be excluded. In the example text there are two types of data (column 2): 3 and 4. I need to read the file one time step at a time and perform an operation on each of those steps, through all the time steps. The operation gives me an output at each step, and I save these outputs in a matrix. Below is my code. For now I am manually saving the type 3 and type 4 data in separate text files, which I read with S = dir('F.*'); and T = dir('Nareplicate.*');. But saving them as millions of small text files seems like it will be a very slow process. I would rather have MATLAB read the data directly from the primary text file (see attached file). One issue is that my dump files will be really large, with millions of lines. What is the fastest way to read and save during run time in MATLAB?
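clc
clear all
S = dir('F.*');             % one file per time step (type 3 and 4 data saved separately)
T = dir('Nareplicate.*');
N = numel(S);
Q1 = zeros(N,4);            % one row of counts per time step
for k = 1:N
    % skip the 9 header lines of each file
    Nadump = dlmread(T(k).name, ' ', 9, 0).';
    Fdump = dlmread(S(k).name, ' ', 9, 0);
    L1 = size(Nadump,2);
    L2 = size(Fdump,1);
    Y = zeros(L2,L1,3);
    for m = [3 4 5]
        % pairwise coordinate differences (columns 3 to 5)
        Y(:,:,m-2) = Fdump(:,m)-Nadump(m,:);
    end
    % smallest scaled distance from each Fdump row to any Nadump entry
    S1 = min(sqrt(sum(Y.^2,3))/10,[],2);
    N1 = nnz(S1 < 0.28);
    N2 = nnz(S1 < 0.55);
    N3 = nnz(S1 < 0.78);
    Q1(k,:) = [N1 N2-N1 N3-N2 L2-N3];
end
W1 = sum(Q1,1)/N/125;
bar(diag(W1),'stacked', 'BarWidth', 1)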

Answers (1)

Mathieu NOE on 2 Feb 2022
Hello,
Why not work line by line through the data file and do what your code is supposed to do at each step?
Here is an example:
(I have checked that I get all 5250 lines of valid data, concatenated in data_all; but you have to work on each line at each iteration step.)
fid = fopen('NaF.dump.txt');
ind = 1;                 % line counter (for info)
line = fgetl(fid);       % get first line
flag = 0;                % 1 while inside a block of valid data lines
data_all = [];
count = 0;
while 1
    line = fgetl(fid);
    if ~ischar(line), break, end   % read until end of file
    ind = ind + 1;
    if strcmp(line,'ITEM: ATOMS id type xs ys zs') % this means the next lines are valid data lines
        flag = 1;
        count = count + 1;   % valid data block counter (for info)
        continue             % the header line itself holds no data
    end
    if strcmp(line,'ITEM: TIMESTEP') % this means end of block of valid data lines
        flag = 0;
    end
    if flag ~= 0 % valid data lines are processed below
        % do my / your code on each line data HERE
        data = sscanf(line, '%f').';   % numeric row; avoids the eval inside str2num
        % concatenate data (if needed / for fun)
        data_all = [data_all; data];
    end
end
fclose(fid);
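If the operation needs a whole time step at once rather than one line at a time, the same loop can collect the lines of each block and hand the block to the distance calculation from the question. Below is a minimal sketch of that idea, not a tested solution for your real files. The function names dumpStepCounts and blockCounts are made up for illustration. It assumes every data line holds exactly five numbers (id type xs ys zs), that every header line starts with 'ITEM:', and that type 3 plays the role of the F.* files and type 4 the role of the Nareplicate.* files. That last mapping is a guess on my part, so swap the two if it is the other way round. It also does not reproduce any replication of the Na atoms that the separate files may contain.

```matlab
function Q = dumpStepCounts(filename)
% Hypothetical helper: returns one row of counts per time step.
fid = fopen(filename, 'r');
if fid == -1, error('Could not open %s', filename); end
closer = onCleanup(@() fclose(fid));   % close the file even if an error occurs

Q = zeros(0, 4);
rows = {};           % text lines of the block currently being read
inBlock = false;     % true while reading data lines of one time step
while true
    tline = fgetl(fid);
    atEnd = ~ischar(tline);
    if atEnd || strncmp(tline, 'ITEM:', 5)
        if inBlock && ~isempty(rows)
            % parse the whole block with a single sscanf call
            block = sscanf(strjoin(rows, ' '), '%f', [5 Inf]).';
            Q(end+1, :) = blockCounts(block); %#ok<AGROW>
        end
        if atEnd, break, end
        inBlock = strncmp(tline, 'ITEM: ATOMS', 11);
        rows = {};
    elseif inBlock
        rows{end+1} = tline; %#ok<AGROW>
    end
end
end

function q = blockCounts(block)
% Same binning as in the question, applied to one time step.
F  = block(block(:,2) == 3, 3:5);   % ASSUMPTION: type 3 = former F.* data
Na = block(block(:,2) == 4, 3:5);   % ASSUMPTION: type 4 = former Nareplicate.* data
D2 = zeros(size(F,1), size(Na,1));
for m = 1:3
    % squared pairwise differences (implicit expansion, R2016b or later)
    D2 = D2 + (F(:,m) - Na(:,m).').^2;
end
S1 = min(sqrt(D2)/10, [], 2);       % smallest scaled distance per F row
N1 = nnz(S1 < 0.28);
N2 = nnz(S1 < 0.55);
N3 = nnz(S1 < 0.78);
q = [N1, N2-N1, N3-N2, size(F,1)-N3];
end
```

Saved as dumpStepCounts.m, it would be called as Q1 = dumpStepCounts('NaF.dump.txt');, after which the W1 and bar lines from the question can be used with N = size(Q1,1). Only one block is held in memory at a time, so the file size should matter less than the size of a single time step.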
