Fastest way to add string

Alan on 23 Sep 2014
I'm dealing with very large CSV files. Reading them with readtable is fast enough. However, I have found (and reported) a bug in readtable where a blank value in the first column (the line starts with the delimiter, e.g. ',') throws off all the data. Many of my files have blank values in the first column because of the way the equipment I'm using records the data.
So I have to preprocess the files and look for these blank first-column values in the CSV file. The most efficient method I've found is the following:
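fprintf('Reading File...\n');
% Read one chunk of the file as a 1-by-N row of characters.
ch = fread(YGID, [1,chunksize], 'int8=>char');
fprintf('Getting Number Of Lines...');
nol = sum(ch == sprintf('\n')); % number of lines (count of line feeds)
fprintf('%i\n',nol);
fprintf('Replacing final commas...\n');
% Drop a trailing comma at the end of each line.
cch = regexprep(ch,',(\r|\n)+','$1');
clear ch;
fprintf('Getting line locations...\n');
hlocs = regexp(cch,'\n'); % index of every line feed
fprintf('Writing Header File...\n');
% Write lines 3 through 10 to the file opened as HDID.
fwrite(HDID,cch(hlocs(2)+1:hlocs(10)));
fprintf('Replacing Initial Commas\n');
% Put a space in front of a comma that starts a line, so the first field is not empty.
ccch = regexprep(cch,'(\r|\n)+,','$1 ,');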
YGID is the file identifier returned by fopen. Note that I'm purposely creating new variables (not memory efficient): I have 16 GB of RAM available on my machine, and I find that creating a completely new variable is faster. However, once the file reaches a sufficient size (>20 MB; some of mine are over 200 MB), even this becomes very slow. The line it gets stuck on is "ccch = regexprep(cch,'(\r|\n)+,','$1 ,');". I suspect this is because memory for the variable is reallocated each time a space is added (there are hundreds of thousands of them). I tried to preallocate the new variable with "ccch = blanks(chunksize + nol);" before that line, but it didn't seem to make a difference.
Is there any more efficient way to do this task?
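For illustration, here is one way the preallocation idea described above could be made to work: find every line-leading comma, compute where each inserted space lands in the output, and fill a preallocated buffer with a single indexed assignment. This is only a sketch under assumptions, not a tested fix for the files in question. The sample text is made up, it assumes the line feed is the last character of each line ending, and it does not handle a comma at the very start of the data.

```matlab
% Hypothetical sample standing in for the file contents (cch in the question).
cch = sprintf('a,b,c\n,1,2\n,3,4\nx,5,6\n');

% Index of every comma that directly follows a line feed.
commaIdx = strfind(cch, sprintf('\n,')) + 1;

% Each earlier insertion shifts later text right by one character, so the
% k-th inserted space lands at commaIdx(k) + (k-1) in the output.
nOut = numel(cch) + numel(commaIdx);
spaceIdx = commaIdx + (0:numel(commaIdx)-1);

% Mask over the output: true where an original character goes,
% false where an inserted space goes.
keep = true(1, nOut);
keep(spaceIdx) = false;

% Allocate the result once (all blanks), then copy the original text into
% the kept positions; the inserted spaces are already in place.
ccch = blanks(nOut);
ccch(keep) = cch;
```

On this sample the result matches what the regexprep call in the question produces, but the output buffer is allocated exactly once.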

Accepted Answer

Alan on 23 Sep 2014
Found my own answer: strrep is surprisingly faster than regexprep. I had to add a conditional to check the OS, though:
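% Note: isunix also returns true on macOS, so the first branch is taken on
% Windows, Linux and macOS; the else branch covers CR-only line endings.
if ispc || isunix
    fpatt = sprintf('\n,');
    rpatt = sprintf('\n ,'); % space before the comma, as in the regexprep replacement '$1 ,'
else
    fpatt = sprintf('\r,');
    rpatt = sprintf('\r ,');
end
% Replace every line-leading comma in a single pass.
ccch = strrep(cch,fpatt,rpatt);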
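To check the speed difference on your own machine, something like the following could be used. It is a rough sketch: the data is synthetic (made-up lines that each start with a comma, LF line endings only), the variable names are hypothetical, and the replacement puts the space before the comma to match the regexprep call in the question. tic/toc gives only a coarse single-run timing.

```matlab
% Synthetic test data (hypothetical): a header line followed by many short
% lines that each start with a comma.
nLines = 2e5;
body = repmat(sprintf(',1.5,2.5\n'), 1, nLines);
cch = [sprintf('h1,h2,h3\n') body];

% Time the regexprep call from the question.
tic;
viaRegexp = regexprep(cch, '(\r|\n)+,', '$1 ,');
tRegexp = toc;

% Time the equivalent literal replacement with strrep.
tic;
viaStrrep = strrep(cch, sprintf('\n,'), sprintf('\n ,'));
tStrrep = toc;

% For LF-terminated input both approaches should give identical text.
assert(isequal(viaRegexp, viaStrrep));
fprintf('regexprep: %.3f s, strrep: %.3f s\n', tRegexp, tStrrep);
```

Because the search pattern is a fixed string, strrep does not need the regular-expression engine at all, which fits the speedup reported above.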
