How to find a particular string in a text file
Hi, I have this text file ( Sample: SAM-3_Round-1_C-16_Ref_Spot-1 Psi= 43.309, Delta=105.412, No Soution
Sample: SAM-3_Round-1_C-16_Ref_Spot-2 Psi= 43.284, Delta=105.465, No Soution
Sample: SAM-3_Round-1_C-8_Spot-1 Psi= 43.266, Delta=107.861, No Soution
Sample: SAM-3_Round-1_C-8_Spot-2 Psi= 43.287, Delta=107.872, No Soution
Sample: SAM-3_Round-1_C-10_Spot-1 Psi= 43.269, Delta=106.890, No Soution
Sample: SAM-3_Round-1_C-10_Spot-2 Psi= 43.269, Delta=106.849, No Soution
Sample: SAM-3_Round-1_C-12_Spot-1 Psi= 43.267, Delta=106.872, No Soution
Sample: SAM-3_Round-1_C-12_Spot-2 Psi= 43.278, Delta=106.888, No Soution)
I want to search for a word such as 'C-8' and then store the corresponding values of Psi and Delta (only the numeric values, without the special characters). I tried the "regexp" function for this but it didn't work. I am a beginner in MATLAB, so it would be very helpful if someone could help me with this. Thank you.
Accepted Answer
Stephen23 on 7 Apr 2015 (edited 7 Apr 2015)
It would likely be much faster and simpler to read in the data normally, and then perform the search and matching inside of MATLAB, rather than trying to perform this on the file (or some string) and convert it afterwards.
Try using textscan, which is intended for this kind of file reading. In this example I named the file 'temp.txt', and also attached it below:
>> fid = fopen('temp.txt','rt');
>> C = textscan(fid,'%*s%s%s%f%s%f%[^\n]', 'Delimiter',',= ', 'MultipleDelimsAsOne',true);
>> fclose(fid);
You will find all of the data in C:
>> C{1}
ans = 
  'SAM-3_Round-1_C-16_Ref_Spot-1'
  'SAM-3_Round-1_C-16_Ref_Spot-2'
  'SAM-3_Round-1_C-8_Spot-1'
  'SAM-3_Round-1_C-8_Spot-2'
  'SAM-3_Round-1_C-10_Spot-1'
  'SAM-3_Round-1_C-10_Spot-2'
  'SAM-3_Round-1_C-12_Spot-1'
  'SAM-3_Round-1_C-12_Spot-2'
This means you can quickly search for any substring (e.g. 'C-8') in this cell array of strings, and then obtain the corresponding values from the other arrays:
>> idx = ~cellfun('isempty',strfind(C{1},'C-8'))
idx =
   0
   0
   1
   1
   0
   0
   0
   0
We can then use this index to extract all of the corresponding values of Psi and Delta (i.e. those corresponding to 'C-8'):
>> Psi = C{3}(idx)
Psi =
     43.266
     43.287
>> Delta = C{5}(idx)
Delta =
     107.86
     107.87
Note that you can easily combine different index requirements too. Here we keep only the rows that are already selected by idx and that also contain 'Spot-1':
>> idy = idx & ~cellfun('isempty',strfind(C{1},'Spot-1'))
idy =
   0
   0
   1
   0
   0
   0
   0
   0
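One caveat with plain substring matching is that a short label can also match longer ones: 'C-1' is found inside 'C-16', 'C-10' and 'C-12'. A minimal sketch of a stricter match follows, assuming the label is always followed by an underscore or the end of the name. The variables names, psi and delta are hypothetical stand-ins for C{1}, C{3} and C{5}, filled with a few rows of the sample data.

```matlab
% Hypothetical in-memory stand-ins for C{1}, C{3} and C{5} from textscan.
names = {'SAM-3_Round-1_C-16_Ref_Spot-1'; ...
         'SAM-3_Round-1_C-8_Spot-1'; ...
         'SAM-3_Round-1_C-8_Spot-2'; ...
         'SAM-3_Round-1_C-10_Spot-1'};
psi   = [43.309; 43.266; 43.287; 43.269];
delta = [105.412; 107.861; 107.872; 106.890];

% Plain substring search: 'C-1' also hits 'C-16' and 'C-10'.
loose = ~cellfun('isempty', strfind(names, 'C-1'));

% Stricter search (assumption: the label ends at '_' or at the end of the name).
exact = ~cellfun('isempty', regexp(names, 'C-8(_|$)', 'once'));

% Logical indexing picks the matching rows from the numeric columns.
psiC8   = psi(exact);
deltaC8 = delta(exact);
```

For the label 'C-8' in this data set both approaches select the same rows, so the stricter pattern only matters when one label is a prefix of another.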
More Answers (2)
  Soumya Bhattacharya
 on 7 Apr 2015
        2 Comments
Stephen23 on 8 Apr 2015 (edited 9 Apr 2015)
Please use the comments for commenting on other answers or your own question. The Answers are supposed to be for actually answering the question. Note that the order of the answers can change, so knowing what this "comment" applies to could be difficult in future.
Stephen23 on 8 Apr 2015 (edited 8 Apr 2015)
			The file uploaded has one group of data over two lines like this:
Sample: SAM-3_Round-1_C-16_Ref_Spot-1
Psi= 43.309, Delta=105.412, No Solution
Sample: SAM-3_Round-1_C-16_Ref_Spot-2
Psi= 43.284, Delta=105.465, No Solution
...
and also has many empty lines, which means that a basic textscan operation will not work. Here are two alternatives: one for a file without the empty lines (using textscan), and one for a file with the empty lines (using regexp).
1: If the empty lines are removed, then this will read the values from the file:
fid = fopen('temp.txt','rt');
C = textscan(fid,'%*s%s\n%*[^\n]');
fclose(fid);
fid = fopen('temp.txt','rt');
D = textscan(fid,'%s%f%s%f%[^\n]\n%*[^\n]', 'Delimiter',',=', 'HeaderLines',1);
fclose(fid);
And again we match any substring and get the required values:
>> idx = ~cellfun('isempty',strfind(C{1},'C-8'));
>> Psi = D{2}(idx)
Psi =
   43.2660
   43.2870
   43.2880
   43.3160
   43.3060
   43.3230
>> Delta = D{4}(idx)
Delta =
  107.8610
  107.8720
  107.8860
  107.9160
  107.8950
  107.9300
2: If the data file really must contain those (almost) empty lines, then this regexp will pick out the fields for further parsing:
str = fileread('temp.txt');
xpr = '(.+?)=(.+?),';
tkn = regexp(str,['\s*\S+:\s+(\S+)\s+',xpr,xpr,'([^\n]+)'],'tokens');
tkn = vertcat(tkn{:});
and the useful outputs are:
names = tkn(:,1);
Psi = cellfun(@(s)sscanf(s,'%f'),tkn(:,3));
Delta = cellfun(@(s)sscanf(s,'%f'),tkn(:,5));
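To see what the regexp approach returns without needing the file, here is a small sketch that runs the same pattern on an inline string. The string str is a hypothetical two-record excerpt built with sprintf in place of fileread('temp.txt'), with blank lines between the rows as described above.

```matlab
% Hypothetical excerpt of the file contents, including the blank lines.
str = sprintf(['Sample: SAM-3_Round-1_C-8_Spot-1\n\n', ...
               'Psi= 43.266, Delta=107.861, No Solution\n\n', ...
               'Sample: SAM-3_Round-1_C-8_Spot-2\n\n', ...
               'Psi= 43.287, Delta=107.872, No Solution\n']);

% Same pattern as above: sample name, two "key=value," pairs, rest of line.
xpr = '(.+?)=(.+?),';
tkn = regexp(str, ['\s*\S+:\s+(\S+)\s+', xpr, xpr, '([^\n]+)'], 'tokens');
tkn = vertcat(tkn{:});            % one row per record, six tokens per row

names = tkn(:,1);                                   % sample names
Psi   = cellfun(@(s) sscanf(s,'%f'), tkn(:,3));     % numeric Psi values
Delta = cellfun(@(s) sscanf(s,'%f'), tkn(:,5));     % numeric Delta values

% Select the rows whose name contains 'C-8', as in the textscan version.
idx = ~cellfun('isempty', strfind(names, 'C-8'));
PsiC8 = Psi(idx);
```

Columns 2 and 4 of tkn hold the key text ('Psi' and ' Delta'), which is why the values are taken from columns 3 and 5.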
  Soumya Bhattacharya
 on 8 Apr 2015
        5 Comments
Stephen23 on 18 Apr 2015 (edited 20 Apr 2015)
			The string
xpr = '(.+?)=(.+?),'
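is used twice in the regular expression that is passed to the function regexp. It matches one key/value pair: the first token lazily captures the characters before an equals sign, and the second token lazily captures the characters after it, up to the next comma.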
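As a small illustration of that pattern on its own, the sketch below applies it to a single line. The variable ln is a hypothetical example string taken from the sample data; the names keys and vals are likewise only for illustration.

```matlab
% Hypothetical single data line (note the trailing comma after each value).
ln  = 'Psi= 43.309, Delta=105.412,';
xpr = '(.+?)=(.+?),';

% Each match yields two tokens: the text before '=' and the text up to ','.
tok = regexp(ln, xpr, 'tokens');

% Trim the keys and convert the values to numbers.
keys = cellfun(@(t) strtrim(t{1}), tok, 'UniformOutput', false);
vals = cellfun(@(t) str2double(t{2}), tok);
```

Here tok contains two matches, so keys holds 'Psi' and 'Delta' and vals holds the two corresponding numbers.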
This will help you to understand cellfun: