Read and perform for loop on very large text files

I have to process very large text files with millions of lines. I have attached an example. Each time step in the file starts with 9 lines of header text, which need to be excluded. In the example there are two types of data (column 2): type 3 and type 4. I need to read the file one time step at a time and perform an operation on each step until all time steps are processed. The operation gives me one output per step, and I save the outputs in a matrix. Below is my code.
For now I am manually saving the type 3 and type 4 data in separate text files and reading those files with S = dir('F.*'); and T = dir('Nareplicate.*');. But saving the data in millions of small text files will be a very slow process. I would rather have MATLAB read it directly from the primary text file (see attached file). One issue is that my dump files will be really large, with millions of lines. What is the fastest way to read the file and save the results at runtime in MATLAB?
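clear all
S = dir('F.*');            % one file per time step, type 3 rows
T = dir('Nareplicate.*');  % one file per time step, type 4 rows
N = numel(S);
Q1 = zeros(N,4);           % one row of bin counts per time step
for k = 1:N
    % skip the 9 header lines of each file
    Nadump = dlmread(T(k).name, ' ', 9, 0).';
    Fdump = dlmread(S(k).name, ' ', 9, 0);
    L1 = size(Nadump,2);
    L2 = size(Fdump,1);
    Y = zeros(L2,L1,3);
    for m = [3 4 5]        % columns 3 to 5 hold xs, ys, zs
        Y(:,:,m-2) = Fdump(:,m)-Nadump(m,:);
    end
    % smallest distance from each row of Fdump to any column of Nadump
    S1 = min(sqrt(sum(Y.^2,3))/10,[],2);
    N1 = nnz(S1 < 0.28);
    N2 = nnz(S1 < 0.55);
    N3 = nnz(S1 < 0.78);
    Q1(k,:) = [N1 N2-N1 N3-N2 L2-N3];
end
W1 = sum(Q1,1)/N/125;
bar(diag(W1),'stacked', 'BarWidth', 1)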
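
For reference, the per-time-step operation in the code above can be written without the inner loop over columns. This is only a sketch with made-up coordinates: the matrices F and Na are hypothetical stand-ins for the xs, ys, zs columns of the type 3 and type 4 rows, and the division by 10 and the thresholds are copied unchanged from the question. It relies on implicit expansion, as the original code does.

```matlab
% Hypothetical coordinates, one row per atom (x y z).
F  = [0 0 0; 5 5 5];    % stand-in for the type 3 rows
Na = [1 0 0; 0 7 0];    % stand-in for the type 4 rows

% Pairwise differences: (nF x 1 x 3) minus (1 x nNa x 3) gives nF x nNa x 3.
dXYZ = permute(F, [1 3 2]) - permute(Na, [3 1 2]);

% Distance from each F atom to its nearest Na atom, scaled as in the question.
S1 = min(sqrt(sum(dXYZ.^2, 3)), [], 2) / 10;

% Cumulative counts below each threshold, then counts per bin.
edges = [0.28 0.55 0.78];
Ncum  = arrayfun(@(e) nnz(S1 < e), edges);
Q     = [Ncum(1), diff(Ncum), numel(S1) - Ncum(end)];
disp(Q)
```

Q has the same layout as one row of Q1, so a block like this could be applied to each time step as it is read.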

Answers (1)

Mathieu NOE on 2 Feb 2022
Why not work line by line through the data file and run your computation at each step?
Here is an example:
(I checked that it retrieves all 5250 lines of valid data, concatenated in data_all, but you still have to do your own work on each line at each iteration step.)
fid = fopen('NaF.dump.txt');
ind = 1;
line = fgetl(fid); % get first line
flag = 0;
data_all = [];
count = 0;
while 1
    line = fgetl(fid);
    if ~ischar(line), break, end % read until end of file
    if strcmp(line,'ITEM: ATOMS id type xs ys zs') % the next lines are valid data lines
        flag = 1;
        count = count + 1; % valid data block counter (for info)
    elseif strcmp(line,'ITEM: TIMESTEP') % end of the block of valid data lines
        flag = 0;
    elseif flag ~= 0 % valid data lines are processed below
        % do my / your code on each line of data HERE
        data = str2num(line);
        % concatenate data (if needed / for fun)
        data_all = [data_all; data];
        ind = ind + 1;
    end
end
fclose(fid);
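
For files with millions of lines, converting every line with str2num and growing data_all can become slow. A possible variation, sketched below, reads each whole block of atom lines with a single fscanf call. It assumes the 9 header lines include an 'ITEM: NUMBER OF ATOMS' line followed by the atom count, which is not shown in the thread and should be checked against the real file. The small dump written at the top is made-up data so that the sketch runs on its own, and the counts matrix is only a placeholder for the real per-step operation.

```matlab
% Write a tiny two-step dump with made-up values (assumed header layout).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
for step = [0 100]
    fprintf(fid, 'ITEM: TIMESTEP\n%d\n', step);
    fprintf(fid, 'ITEM: NUMBER OF ATOMS\n3\n');
    fprintf(fid, 'ITEM: BOX BOUNDS pp pp pp\n0 10\n0 10\n0 10\n');
    fprintf(fid, 'ITEM: ATOMS id type xs ys zs\n');
    fprintf(fid, '1 3 0.1 0.2 0.3\n2 4 0.4 0.5 0.6\n3 4 0.7 0.8 0.9\n');
end
fclose(fid);

% Read the dump one time step at a time.
fid = fopen(fname, 'r');
counts = zeros(0, 2);   % placeholder output: [nType3 nType4] per step
nAtoms = 0;
while true
    line = fgetl(fid);
    if ~ischar(line), break, end              % end of file
    if strcmp(strtrim(line), 'ITEM: NUMBER OF ATOMS')
        nAtoms = str2double(fgetl(fid));      % atom count for this step
    elseif strncmp(line, 'ITEM: ATOMS', 11)
        % Read the whole block in one call: 5 values per atom.
        block  = fscanf(fid, '%f', [5 nAtoms]).';
        Fdump  = block(block(:,2) == 3, :);   % type 3 rows
        Nadump = block(block(:,2) == 4, :);   % type 4 rows
        % The per-step operation would go here.
        counts(end+1, :) = [size(Fdump,1) size(Nadump,1)];
    end
end
fclose(fid);
delete(fname);
disp(counts)
```

Only one time step is held in memory at a time, so the size of the dump file mainly affects run time rather than memory use. If the number of time steps is known in advance, the output matrix can be preallocated instead of grown.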

