How to find a particular string in a text file
Hi, I have this text file:
Sample: SAM-3_Round-1_C-16_Ref_Spot-1 Psi= 43.309, Delta=105.412, No Soution
Sample: SAM-3_Round-1_C-16_Ref_Spot-2 Psi= 43.284, Delta=105.465, No Soution
Sample: SAM-3_Round-1_C-8_Spot-1 Psi= 43.266, Delta=107.861, No Soution
Sample: SAM-3_Round-1_C-8_Spot-2 Psi= 43.287, Delta=107.872, No Soution
Sample: SAM-3_Round-1_C-10_Spot-1 Psi= 43.269, Delta=106.890, No Soution
Sample: SAM-3_Round-1_C-10_Spot-2 Psi= 43.269, Delta=106.849, No Soution
Sample: SAM-3_Round-1_C-12_Spot-1 Psi= 43.267, Delta=106.872, No Soution
Sample: SAM-3_Round-1_C-12_Spot-2 Psi= 43.278, Delta=106.888, No Soution
I want to search for a string such as 'C-8' and then store the corresponding values of Psi and Delta (only the numbers, without the surrounding characters such as '=' and ','). I tried the regexp function for this, but it didn't work. I am a beginner in MATLAB, so any help would be much appreciated. Thank you.
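For the one-line layout shown above, a minimal regexp sketch might look like the following. It assumes every record sits on a single line in exactly this "Sample: name Psi= x, Delta=y, ..." form; the inline sample string is a stand-in for reading the real file (e.g. with fileread), and the variable names are illustrative only.

```matlab
% Stand-in for the file contents (one record per line, copied from above).
% With a real file this would be something like str = fileread('temp.txt');
str = sprintf(['Sample: SAM-3_Round-1_C-16_Ref_Spot-1 Psi= 43.309, Delta=105.412, No Soution\n', ...
               'Sample: SAM-3_Round-1_C-8_Spot-1 Psi= 43.266, Delta=107.861, No Soution\n', ...
               'Sample: SAM-3_Round-1_C-8_Spot-2 Psi= 43.287, Delta=107.872, No Soution\n']);

% Capture the sample name and the two numbers on each line as tokens.
tok = regexp(str, 'Sample:\s+(\S+)\s+Psi=\s*([-+0-9.]+),\s*Delta=\s*([-+0-9.]+)', 'tokens');
tok = vertcat(tok{:});          % N-by-3 cell array of char vectors

names = tok(:,1);
Psi   = str2double(tok(:,2));   % numeric column vectors
Delta = str2double(tok(:,3));

% Keep only the rows whose name contains the search string.
key = 'C-8';
idx = ~cellfun('isempty', strfind(names, key));
Psi_C8   = Psi(idx)
Delta_C8 = Delta(idx)
```

Note that a plain 'C-8' would also match a name containing, say, 'C-80'; using '_C-8_' as the key is a stricter choice for names built like these.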
Stephen23
on 8 Apr 2015
Should "Soution" really be "Solution"?
Soumya Bhattacharya
on 7 Apr 2015
Please use the comments for commenting on other answers or your own question. The Answers are supposed to be for actually answering the question. Note that the order of the answers can change, so knowing what this "comment" applies to could be difficult in future.
The uploaded file stores each group of data over two lines, like this:
Sample: SAM-3_Round-1_C-16_Ref_Spot-1
Psi= 43.309, Delta=105.412, No Solution
Sample: SAM-3_Round-1_C-16_Ref_Spot-2
Psi= 43.284, Delta=105.465, No Solution
...
It also has many empty lines, which means that a basic textscan operation will not work. Here are two alternatives: one without the empty lines (using textscan), and one with the empty lines (using regexp).
1: If the empty lines are removed, then this will read the values from the file:
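% Pass 1: read the name from every "Sample:" line, skipping the Psi/Delta line.
fid = fopen('temp.txt','rt');
C = textscan(fid,'%*s%s\n%*[^\n]');
fclose(fid);
% Pass 2: skip the first line, then read Psi and Delta from every value line, skipping the next "Sample:" line.
fid = fopen('temp.txt','rt');
D = textscan(fid,'%s%f%s%f%[^\n]\n%*[^\n]', 'Delimiter',',=', 'HeaderLines',1);
fclose(fid);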
Then, as before, find the names that contain the substring and extract the required values:
>> idx = ~cellfun('isempty',strfind(C{1},'C-8'));
>> Psi = D{2}(idx)
Psi =
43.2660
43.2870
43.2880
43.3160
43.3060
43.3230
>> Delta = D{4}(idx)
Delta =
107.8610
107.8720
107.8860
107.9160
107.8950
107.9300
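If you would rather strip the empty lines in MATLAB than edit the file by hand, one possible sketch is below. It swaps textscan for strsplit plus sscanf, and it assumes that, once blank lines are removed, the records strictly alternate between a "Sample:" line and a "Psi=..., Delta=..." line. The inline string is a hypothetical stand-in for the file contents.

```matlab
% Hypothetical file contents: two-line records separated by empty and
% whitespace-only lines. With a real file: str = fileread('temp.txt');
str = sprintf(['Sample: SAM-3_Round-1_C-16_Ref_Spot-1\n', ...
               'Psi= 43.309, Delta=105.412, No Solution\n', ...
               '\n', ...
               'Sample: SAM-3_Round-1_C-8_Spot-1\n', ...
               'Psi= 43.266, Delta=107.861, No Solution\n', ...
               '   \n', ...
               'Sample: SAM-3_Round-1_C-8_Spot-2\n', ...
               'Psi= 43.287, Delta=107.872, No Solution\n']);

% Split into lines and drop the empty or whitespace-only ones
% (strtrim also removes a trailing \r from Windows line endings).
txtLines = strsplit(str, sprintf('\n'));
txtLines = txtLines(~cellfun('isempty', strtrim(txtLines)));

% After cleaning, odd lines hold the names and even lines hold the values.
names = strtrim(strrep(txtLines(1:2:end), 'Sample:', ''));
vals  = cellfun(@(s) sscanf(s, 'Psi=%f, Delta=%f'), txtLines(2:2:end), ...
                'UniformOutput', false);
vals  = [vals{:}];          % 2-by-N: row 1 is Psi, row 2 is Delta
Psi   = vals(1,:).';
Delta = vals(2,:).';

% Same substring matching as above.
idx = ~cellfun('isempty', strfind(names, 'C-8'));
Psi(idx)
```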
2: If the data file really must contain those (almost) empty lines, then this regexp will extract the data groups for further parsing:
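str = fileread('temp.txt');     % read the whole file as one char vector
xpr = '(.+?)=(.+?),';           % a "name=value," pair, captured as two tokens
tkn = regexp(str,['\s*\S+:\s+(\S+)\s+',xpr,xpr,'([^\n]+)'],'tokens');
tkn = vertcat(tkn{:});          % one row per sample, six token columns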
and the useful outputs are:
names = tkn(:,1);                               % sample names
Psi = cellfun(@(s)sscanf(s,'%f'),tkn(:,3));     % Psi values as a numeric column
Delta = cellfun(@(s)sscanf(s,'%f'),tkn(:,5));   % Delta values as a numeric column
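To see what each column of tkn holds, here is a small sketch that runs the same pattern on a hypothetical two-record string containing extra blank lines. The \s* and \s+ parts also match newlines, which is why the blank lines do not matter. As an alternative to cellfun plus sscanf, str2double converts a whole cell column at once and ignores surrounding spaces.

```matlab
% Small hypothetical input: two-line records with extra blank lines.
str = sprintf(['Sample: SAM-3_Round-1_C-16_Ref_Spot-1\n', ...
               'Psi= 43.309, Delta=105.412, No Solution\n\n\n', ...
               'Sample: SAM-3_Round-1_C-8_Spot-1\n', ...
               'Psi= 43.266, Delta=107.861, No Solution\n']);
xpr = '(.+?)=(.+?),';
tkn = regexp(str, ['\s*\S+:\s+(\S+)\s+', xpr, xpr, '([^\n]+)'], 'tokens');
tkn = vertcat(tkn{:});
% Columns of tkn:
%   1 sample name, 2 'Psi', 3 Psi value, 4 ' Delta', 5 Delta value, 6 trailing text
Psi   = str2double(tkn(:,3));   % vectorized alternative to cellfun + sscanf
Delta = str2double(tkn(:,5));
```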
Soumya Bhattacharya
on 8 Apr 2015
For the second time: Please use the comments for commenting on other answers or your own question. The Answers are supposed to be for actually answering the question. Note that the order of the answer can change, so knowing what this "comment" applies to could be difficult in future.
Stephen23
on 9 Apr 2015
The most robust and likely fastest way to access this data is to read all of the Psi and Delta values into numeric arrays once, and then access whichever elements you need later by indexing. This will be much faster when you need to match and access multiple different groups, and also much more robust to future code changes and new data operations.
>> Psi(idx)
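As a sketch of that read-once, index-later pattern: the arrays below are hypothetical stand-ins for the parsed results (values taken from the question), and the per-group mean is just an example operation. Each group only costs a logical mask over names; the file is never re-read.

```matlab
% Hypothetical parsed results (values from the question).
names = {'SAM-3_Round-1_C-16_Ref_Spot-1'; 'SAM-3_Round-1_C-8_Spot-1'; ...
         'SAM-3_Round-1_C-8_Spot-2'; 'SAM-3_Round-1_C-10_Spot-1'; ...
         'SAM-3_Round-1_C-10_Spot-2'};
Psi   = [43.309; 43.266; 43.287; 43.269; 43.269];
Delta = [105.412; 107.861; 107.872; 106.890; 106.849];

% Groups to look up; the underscores stop 'C-8' from also matching e.g. 'C-80'.
groups    = {'_C-8_', '_C-10_'};
meanPsi   = zeros(size(groups));
meanDelta = zeros(size(groups));
for k = 1:numel(groups)
    idx = ~cellfun('isempty', strfind(names, groups{k}));   % logical mask for this group
    meanPsi(k)   = mean(Psi(idx));
    meanDelta(k) = mean(Delta(idx));
end
```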
Soumya Bhattacharya
on 9 Apr 2015
Soumya Bhattacharya
on 15 Apr 2015
The string
xpr = '(.+?)=(.+?),'
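is used twice in the regular expression passed to regexp. It matches a name and a value separated by an equals sign and terminated by a comma, such as 'Psi= 43.309,'. The two pairs of parentheses capture the text before and after the '=' as separate tokens, and the lazy quantifier .+? keeps each token as short as possible.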
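A quick way to see why the lazy .+? matters is to compare it with a greedy .+ on a single value line. This is only an illustrative sketch; the variable names are arbitrary.

```matlab
txt = 'Psi= 43.309, Delta=105.412, No Solution';
lazy   = regexp(txt, '(.+?)=(.+?),', 'tokens');   % lazy: shortest matches
greedy = regexp(txt, '(.+)=(.+),',   'tokens');   % greedy: longest matches, for comparison
% lazy   -> {{'Psi', ' 43.309'}, {' Delta', '105.412'}}
% greedy -> {{'Psi= 43.309, Delta', '105.412'}}
```

The greedy version swallows the first pair into one token, so the values can no longer be picked out by column.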
This will help you to understand cellfun:
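A minimal sketch of how cellfun applies a function to every cell, using hypothetical token strings like those in tkn. When every call returns a scalar, the results are collected into a numeric array; otherwise 'UniformOutput', false is required and the results come back as a cell array.

```matlab
c = {' 43.309'; '105.412'; '106.890'};

% Uniform output: each call returns a scalar, so the result is a 3-by-1 double.
x = cellfun(@(s) sscanf(s, '%f'), c);
% Legacy string form of the function name, also uniform output.
n = cellfun('length', c);

% Non-scalar results need 'UniformOutput', false and come back as a cell array.
parts = cellfun(@(s) strsplit(strtrim(s), '.'), c, 'UniformOutput', false);
```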