How to replace a particular string in a text file

I have an efficiency problem. The code below replaces the tags <s> and </s> with '' and ' .'. It works properly for small text files, but when the file has approximately 40,0000+ lines it takes far too long to be usable. Can anyone suggest something that runs faster than this? Thanks in advance.
fid = fopen('input.txt','r');
f=fread(fid,'*char')';
fclose(fid);
f = regexprep(f,'<s> ','');
f = regexprep(f,' </s>',' .');
fid = fopen('output.txt','w');
fprintf(fid,'%s',f);
fclose(fid);

 Accepted Answer

strrep is faster than regexprep:
f = strrep(f,'<s> ','');
f = strrep(f,' </s>',' .');
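To check the speed difference on your own machine, a rough timing sketch like the following may help. The sample line, the repetition count, and the exact tag strings ('<s> ' and ' </s>') are assumptions for illustration and are not taken from the asker's file.

```matlab
% Build a synthetic test string (hypothetical data, not the asker's file).
oneLine = sprintf('<s> did_VBD new_JJ website_NN </s>\r\n');
f = repmat(oneLine, 1, 100000);

% Literal replacement with regexprep (the pattern is treated as a regex).
tic;
a = regexprep(f, '<s> ', '');
a = regexprep(a, ' </s>', ' .');
tRegexprep = toc;

% The same replacement with strrep (plain substring search).
tic;
b = strrep(f, '<s> ', '');
b = strrep(b, ' </s>', ' .');
tStrrep = toc;

% Both approaches must agree before the timings mean anything.
assert(isequal(a, b));
fprintf('regexprep: %.3f s, strrep: %.3f s\n', tRegexprep, tStrrep);
```

The timings depend on the machine and MATLAB version, so treat the numbers as indicative only. The two calls are interchangeable here only because both patterns are plain literal strings.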

17 Comments

Yes, it is much faster. Thanks a lot.
But elsewhere in my code I am using:
f = regexprep( f, '([^\n\r]+)', '<s> $1' );
f = regexprep(f,'\._.','</s>');
f = regexprep(f,' \w*_|\,_',' ');
How can I use strrep in this case?
strrep is much faster, but when it comes to complex parsing, regexprep is more powerful.
So I can't replace these lines with something else?
f = regexprep( f, '([^\n\r]+)', '<s> $1' );
f = regexprep(f,' \w*_|\,_',' ');
What is the purpose of the code? Note that if you wanted to wrap all lines in <s> and </s> tags, you could probably achieve that with
f = ['<s>', strrep(f, '\r\n', '</s>\r\n<s>'), '</s>'] ;
(I changed the order of new line and carriage return; change it back if it is inverted for any reason in your files.)
The purpose of my code is this: I have a text file which contains text like
did_VBD new_JJ on_IN posted_VBN recipe_NN see_VB the_DT website_NN ._.
did_VBD new_JJ on_IN posted_VBN recipe_NN see_VB the_DT website_NN ._.
did_VBD new_JJ on_IN posted_VBN recipe_NN see_VBP website_NN the_DT ._.
did_VBD new_JJ on_IN posted_VBN recipe_NN see_VBP website_NN the_DT ._.
First of all, I want to wrap all sentences like this:
<s> did_VBD new_JJ on_IN posted_VBN recipe_NN see_VB the_DT website_NN </s>
<s> did_VBD new_JJ on_IN posted_VBN recipe_NN see_VB the_DT website_NN </s>
<s> did_VBD new_JJ on_IN posted_VBN recipe_NN see_VBP website_NN the_DT </s>
<s> did_VBD new_JJ on_IN posted_VBN recipe_NN see_VBP website_NN the_DT </s>
For that I am using:
f = regexprep(f,'\._.','</s>');
f = regexprep( f, '([^\n\r]+)', '<s> $1' );
After that I want to extract the POS tags:
VBD JJ IN VBN NN VB DT NN
VBD JJ IN VBN NN VB DT NN
VBD JJ IN VBN NN VBP NN DT
VBD JJ IN VBN NN VBP NN DT
For this I am using:
f = regexprep(f,' \w*_|\,_',' ');
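The three regexprep steps described above can be traced on a single line. This is only a sketch: the sample line is copied from the post, the CR+LF line ending is an assumption, and the variable names `tagged` and `pos` are made up for the example.

```matlab
% One sample line from the post, ending in CR+LF (the line ending is an assumption).
f = sprintf('did_VBD new_JJ on_IN posted_VBN recipe_NN see_VB the_DT website_NN ._.\r\n');

% Replace the sentence-final '._.' token with a closing tag.
tagged = regexprep(f, '\._.', '</s>');
% Prefix every run of non-newline characters (every line) with '<s> '.
tagged = regexprep(tagged, '([^\n\r]+)', '<s> $1');
% Replace each ' word_' prefix (and any ',_') with a space, leaving only the POS tags.
pos = regexprep(tagged, ' \w*_|\,_', ' ');

disp(pos)   % <s> VBD JJ IN VBN NN VB DT NN </s>
```

Note that `\w` also matches the underscore, so `\w*_` first grabs the whole token and then backtracks to the last underscore in it.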
The code you suggested above,
f = ['<s>', strrep(f, '\r\n', '</s>\r\n<s>'), '</s>'] ;
gives this result:
did_VBD new_JJ on_IN posted_VBN recipe_NN see_VB the_DT website_NN ._.
did_VBD new_JJ on_IN posted_VBN recipe_NN see_VB the_DT website_NN ._.
did_VBD new_JJ on_IN posted_VBN recipe_NN see_VBP website_NN the_DT ._.
did_VBD new_JJ on_IN posted_VBN recipe_NN see_VBP website_NN the_DT ._.
Did you try with
f = ['<s>', strrep(f, '\n\r', '</s>\n\r<s>'), '</s>'] ;
as I suggested in my parenthetical note? If it works, you can modify it so that it removes '._.' as well, assuming that there is a white space between the last '.' and the newline character:
f = ['<s>', strrep(f, '._. \n\r', '</s>\n\r<s>'), '</s>'] ;
Yes, I noticed that and tried it as well:
f = ['<s>', strrep(f, '._. \r\n', '</s>\r\n<s>'), '</s>'] ;
f = ['<s>', strrep(f, '._. \n\r', '</s>\r\n<s>'), '</s>'] ;
f = ['<s>', strrep(f, '._.\n\r', '</s>\r\n<s>'), '</s>'] ;
f = ['<s>', strrep(f, '._.\n\r', '</s>\r\n<s>'), '</s>'] ;
I have also tried many other variants, but none of them work: '._.' is not replaced with </s>, and <s> is not added at the start of each line.
I think the whole file is being read at once and the newline character is not being recognized:
fid = fopen('input.txt','r');
f=fread(fid,'*char')';
fclose(fid);
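A likely reason none of these variants match (an inference from the code shown, not something confirmed by the file): in a single-quoted MATLAB character vector, '\r\n' is four literal characters (backslash, r, backslash, n). strrep does not interpret escape sequences, whereas regexprep interprets them in its pattern and sprintf interprets them in its format string. So the file is read correctly, but the search string never occurs in it. A small sketch with made-up sample data:

```matlab
% A two-line sample with real CR+LF line endings (hypothetical data).
f = sprintf('one ._. \r\ntwo ._. \r\n');

% '\r\n' in single quotes is 4 literal characters, so strrep finds nothing.
literal = '\r\n';
assert(numel(literal) == 4);
unchanged = strrep(f, literal, '|');
assert(isequal(unchanged, f));

% Build the real two-character CR+LF sequence instead.
crlf = char([13, 10]);                 % same as sprintf('\r\n')
assert(isequal(crlf, sprintf('\r\n')));
replaced = strrep(f, crlf, '|');
disp(replaced)                         % one ._. |two ._. |
```

The same applies to the replacement string: any '\r\n' meant as a line break has to be built with char or sprintf before it is passed to strrep.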
I thought that you were already matching '\r\n' and that it was working, just too slowly. Was it not working? Could you attach one of these files to your question so I can try?
It is not slow; I am trying this on 4 sentences (lines), using the code given above.
Cedric on 18 Oct 2013
Edited: Cedric on 18 Oct 2013
You wrote "the code is working properly", and later that you were using '\r' and '\n' in a regexp pattern. Was it just the first part which was working properly?
In any case, could you attach a file or a chunk of file to your question? It would be easier if I could experiment with your file, because then I can check directly what special chars you have in there and how to match them or use them in replacements. If you post a large enough file, I can also try to optimize. If you cannot attach the file to a public forum page, you can send it to me by email.
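One way to see which special characters a file contains before choosing a pattern is to list the control-character codes in the text that was read. This is a small sketch: the inline sample string stands in for the real file contents, and the variable names are made up.

```matlab
% Stand-in for: f = fileread('input.txt');  (sample data, not the real file)
f = sprintf('did_VBD new_JJ ._. \r\nsee_VB the_DT ._. \r\n');

% Character codes below 32 are control characters (9 = tab, 10 = LF, 13 = CR).
codes = unique(double(f(f < 32)));
disp(codes)

% True if the text contains Windows-style CR+LF line endings.
hasCRLF = ~isempty(strfind(f, char([13, 10])));
disp(hasCRLF)
```

If only code 10 shows up, the file uses LF line endings and a CR+LF search string will never match.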
I have attached two files, 'input.txt' and 'code.txt'. These are copies of the files I am currently using to get the expected output.
Ok, try the following:
content = fileread( 'inputtextfile.txt' ) ;
newContent = strrep( content, '._. ', '' ) ;
newContent = strrep( newContent, char([13,10]), sprintf('</s>\r\n<s>') ) ;
newContent = ['<s>', newContent, '</s>'] ;
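The strrep-only approach can be tried without a file by running it on an inline string. The sketch below assumes each line ends with '._. ' followed by CR+LF and that the goal is to wrap every line in <s> and </s>, as stated earlier in the thread. The sample text is shortened from the post, and the final cleanup step is an addition for files that end with a line break.

```matlab
% Two shortened sample lines, each ending in '._. ' plus CR+LF (assumed layout).
content = sprintf('did_VBD new_JJ ._. \r\nsee_VB the_DT ._. \r\n');
crlf = char([13, 10]);

% 1) Drop the sentence-final '._. ' token.
newContent = strrep(content, '._. ', '');
% 2) Close the current sentence and open the next one at every line break.
newContent = strrep(newContent, crlf, ['</s>', crlf, '<s>']);
% 3) Add the outermost tags.
newContent = ['<s>', newContent, '</s>'];
% 4) A trailing line break leaves an empty '<s></s>' pair at the end; remove it.
newContent = strrep(newContent, '<s></s>', '');

disp(newContent)
```

Every step is a literal substring replacement, so none of them needs regexprep.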
Yes, it is working:
content = fileread( 'inputtextfile.txt' ) ;
newContent = strrep( content, '._. ', '' ) ;
newContent = strrep( newContent, char([13,10]), sprintf('</s>\r\n<s> ') ) ;
newContent = ['<s> ', newContent,'</s>'] ;
newContent = strrep( newContent, '<s> </s>', '' ) ; % removes the extra <s> </s> from the end of the file
But I think strrep can't be used instead of regexprep for the last step, which produces the output file:
newContent = regexprep(newContent,' \w*_|\,_',' ');
Cedric on 19 Oct 2013
Edited: Cedric on 19 Oct 2013
So you want to remove (or replace with a white space) all prefixes like 'new_', 'on_', etc, as well as precisely the string ',_' ? If so, you can simplify the process by using STRREP for removing all ',_', which allows you to reduce the OR statement in the regexp pattern and keep only the first part ' \w*_'.
If it works, then you can profile REGEXP with other patterns which could apply as well to your case and be more efficient than '\w*', e.g. '\S*'.
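The suggestion above could be tried along these lines. It is a sketch on synthetic data: the sample line, the repetition count, and the variable names are made up, and tic/toc gives only rough timings.

```matlab
% Synthetic input (hypothetical): one tagged sentence repeated many times.
oneLine = sprintf('<s> did_VBD ,_, new_JJ website_NN </s>\r\n');
f = repmat(oneLine, 1, 50000);

% Original single pattern with alternation.
tic; a = regexprep(f, ' \w*_|\,_', ' '); tA = toc;

% Literal ',_' handled by strrep first, then the reduced pattern.
tic;
b = strrep(f, ',_', ' ');
b = regexprep(b, ' \w*_', ' ');
tB = toc;

% Candidate variant using \S* instead of \w*.
tic;
c = strrep(f, ',_', ' ');
c = regexprep(c, ' \S*_', ' ');
tC = toc;

% On this sample all three variants produce the same text.
assert(isequal(a, b) && isequal(b, c));
fprintf('alternation: %.3f s, strrep + \\w*: %.3f s, strrep + \\S*: %.3f s\n', tA, tB, tC);
```

The variants agree on this sample but are not equivalent in general. `\S*` also matches words containing characters such as hyphens or apostrophes, and running strrep first can create new ' word_' matches that the single-pass pattern would not see. It is worth comparing the outputs on real data before switching.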
Yes, now I am using
f = regexp(f,'\S*_','split')
to get the following output:
VBD JJ IN VBN NN VB DT NN
VBD JJ IN VBN NN VB DT NN
These statements are much better.
Thanks for your efforts and for your valuable suggestions.
You're welcome!
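One detail about the last snippet in the thread: regexp with the 'split' option returns a cell array of the pieces between matches, not a single character vector, so the pieces need to be concatenated before they are written to a file. A minimal sketch with a shortened sample line (the variable names are made up):

```matlab
% Shortened sample line (hypothetical data).
f = 'did_VBD new_JJ on_IN website_NN';

% 'split' returns the text between matches of '\S*_' as a cell array.
parts = regexp(f, '\S*_', 'split');   % {'', 'VBD ', 'JJ ', 'IN ', 'NN'}

% Concatenate the pieces back into one character vector.
pos = [parts{:}];
disp(pos)                             % VBD JJ IN NN

% Replacing the matches with '' via regexprep gives the same text directly.
assert(isequal(pos, regexprep(f, '\S*_', '')));
```

Which form is more convenient depends on whether the individual pieces are needed later.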

Asked: 18 Oct 2013
Commented: 19 Oct 2013