Searching the contents of a file with items in a second file

I have two files, File1.txt and File2.txt. I'm searching the first column of File1.txt for the items listed in File2.txt.
File1.txt (tab-delimited):
Ex_efxb 0.0023
MSeef 2.3000
F_ecjc 0.3338
MWEEI -0.111
DDAIij 17.777
File2.txt
MSeef
F_ecjc
Required output:
The following were found in File 1:
MSeef 2.3000
F_ecjc 0.3338
I have the following script, which is not giving the right output; instead it says the items were not found.
clear all; clc;
fid = fopen('File1.txt')
if fid ==-1
disp('Reaction_Flux file open not successful')
else
% Read characters and numbers into separate elements
% in a cell array
rxn_flux = textscan(fid,'%s %f');
A = rxn_flux{1};
B = rxn_flux{2};
len1 = size(A);
closeresult1 = fclose(fid);
if closeresult1 == 0
disp('Reaction_Flux file close successful')
else
disp('Reaction_Flux file close not successful')
end
end
fid = fopen('File2.txt')
if fid ==-1
disp('Flux file open not successful')
else
% Read characters and numbers into separate elements
% in a cell array
[rxn] = textread('File2.txt','%s');
%rxn = textscan(fid,'%s');
C = rxn;
len2 = size(C);
closeresult1 = fclose(fid);
if closeresult1 == 0
disp('Reaction_Flux file close successful')
else
disp('Reaction_Flux file close not successful')
end
end
found = 0;
for i = 1:len1
RXNFLUX = strcmpi(A(i),C);
if RXNFLUX
found = 1;
break
end
end
if found
data = [];
for k = 1:len1
for m = 1:len2
data = [data; A(k),C(m)]
%print to file
fprintf(data,'%s\t %d\n','Out.txt');
end
end
else
disp('rxn not found')
end
Can anyone help? Thanks

 Accepted Answer

The explanation: strcmpi does a case-insensitive comparison (if you want a case-sensitive one, change it to strcmp) and returns a logical array with value 1 for every element of A that matches C{i} and 0 for every element that does not. The vertical bar (|) is the element-wise logical "or", so if an element of RXNFLUX is already 1, it stays 1; if it is 0 and strcmpi finds a match at that position, it is set to 1.
For example, initially RXNFLUX is all zeros. After the first iteration, it is [0; 1; 0; 0; 0] because MSeef is the second element of A. In the second iteration, the output of the search is [0; 0; 1; 0; 0] because F_ecjc is the third element of A. Combining this with RXNFLUX using "or" gives [0; 1; 1; 0; 0].
The line
iLines = find(RXNFLUX);
finds the indices of all the elements of RXNFLUX that are equal to 1.
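To watch the accumulation happen, here is a small sketch that hard-codes the sample data from the question instead of reading the two files. The literal cell arrays and the temporary variable hit are stand-ins for illustration, not part of the original script.

```matlab
% Sample data from the question, hard-coded instead of read from files
A = {'Ex_efxb'; 'MSeef'; 'F_ecjc'; 'MWEEI'; 'DDAIij'};
C = {'MSeef'; 'F_ecjc'};

RXNFLUX = false(size(A));        % start with "nothing found"
for i = 1:numel(C)
    hit = strcmpi(C{i}, A);      % logical column, true where A matches C{i}
    RXNFLUX = RXNFLUX | hit;     % keep earlier matches, add the new ones
    disp(RXNFLUX');              % show the mask after this iteration
end

iLines = find(RXNFLUX);          % row numbers of the matching entries
disp(iLines');
```

With this data the mask is [0; 1; 0; 0; 0] after the first pass and [0; 1; 1; 0; 0] after the second, so iLines ends up as [2; 3].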

More Answers (4)

You've got one mistake:
len2 = length(B);
should be
len2 = length(C);
That must have crept in when you were changing size to length.
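The "Index exceeds matrix dimensions" error follows directly from that loop bound. A minimal sketch, assuming the sample data from the question (hard-coded here; the variable failedAt is only for illustration):

```matlab
% Sample data from the question, hard-coded for illustration
B = [0.0023; 2.3000; 0.3338; -0.111; 17.777];   % 5 values from File1.txt
C = {'MSeef'; 'F_ecjc'};                         % 2 names from File2.txt

len2 = length(B);      % wrong bound: 5, although C has only 2 elements
failedAt = 0;
try
    for i = 1:len2
        name = C{i};   % raises an indexing error once i exceeds numel(C)
    end
catch
    failedAt = i;      % iteration at which the indexing went out of range
end
fprintf('Indexing C failed at i = %d; C has %d elements\n', failedAt, length(C));
```

The loop fails at i = 3, the first index past the end of C. With len2 = length(C) the loop stops after the second name and no error occurs.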
The file reading is fine (except that it would be better to use
len1 = length(A)
len2 = length(C)
so that len1 and len2 are scalars). However, there are a lot of problems with the processing, including
  1. accessing the cell array A using A(i) instead of A{i},
  2. testing for string matches ( found=1 ) before you are finished searching, and
  3. not opening the file Out.txt.
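The first two points can be checked on a small example. This is only a sketch with hard-coded stand-ins for the data read from the files; the names asCell, asChar and mask are made up for illustration.

```matlab
A = {'Ex_efxb'; 'MSeef'; 'F_ecjc'};   % stand-in for the names from File1.txt
C = {'MSeef'; 'F_ecjc'};              % stand-in for the names from File2.txt

asCell = A(2);     % parentheses: a 1x1 cell array that contains the string
asChar = A{2};     % braces: the string itself
disp(class(asCell));   % cell
disp(class(asChar));   % char

% Comparing one name against all of C gives one result per element of C
mask = strcmpi(A{2}, C);
disp(mask');

% "if mask" is true only when every element of mask is true,
% so a partial match needs any() to be detected
if any(mask)
    disp('at least one element of C matches');
end
```

Here mask is [1; 0]: MSeef matches the first element of C but not the second. A plain "if mask" would therefore be false, which is probably why the original script never set found and printed "rxn not found".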
Note also that if you search A for elements of C instead of the reverse, you get the indices you need for the next part.
Here is code that will do the analysis:
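% len2 must be length(C), the number of search terms read from File2.txt
fid = fopen('Out.txt','w');
if fid == -1
    error('Could not open Out.txt for writing');
end
fprintf(fid,'The following were found in File 1:\n');
RXNFLUX = false(size(A));                 % one flag per line of File1.txt
for i = 1:len2
    RXNFLUX = RXNFLUX | strcmpi(C{i},A);  % flag every entry of A that matches C{i}
end
if any(RXNFLUX)
    iLines = find(RXNFLUX);               % indices of the matching lines
    for i = 1:length(iLines)
        % %.4f prints the value as in the required output; %d would switch
        % to exponential notation for non-integer values
        fprintf(fid,'%s\t%.4f\n',A{iLines(i)}, B(iLines(i)));
    end
else
    disp('rxn not found')
end
fclose(fid);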
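As an aside, the search loop could also be written without an explicit loop. The sketch below swaps in ismember, which is not what the answer above uses, and lower-cases both lists to mimic the case-insensitive behaviour of strcmpi. The data is hard-coded from the question and the result is printed to the screen rather than to Out.txt.

```matlab
% Sample data from the question, hard-coded instead of read from files
A = {'Ex_efxb'; 'MSeef'; 'F_ecjc'; 'MWEEI'; 'DDAIij'};
B = [0.0023; 2.3000; 0.3338; -0.111; 17.777];
C = {'MSeef'; 'F_ecjc'};

% lower() makes the comparison case-insensitive, like strcmpi
RXNFLUX = ismember(lower(A), lower(C));

iLines = find(RXNFLUX);
for k = 1:numel(iLines)
    fprintf('%s\t%.4f\n', A{iLines(k)}, B(iLines(k)));
end
```

For the sample data this selects rows 2 and 3, the same lines the loop version finds.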
Hi Andrew,
I apologise unreservedly for my very late reply. I was not able to log in to the site until a few minutes ago. Thank you for the solution. I have connected your code section to my file-reading code, and I got the following error:
??? Index exceeds matrix dimensions.
Error in ==> sollutionRxnFluxFiles at 47
RXNFLUX = RXNFLUX | strcmpi(C{i},A);
Here's the latest code:
fid = fopen('File1.txt')
if fid ==-1
disp('Reaction_Flux file open not successful')
else
% Read characters and numbers into separate elements
% in a cell array
rxn_flux = textscan(fid,'%s %f');
A = rxn_flux{1};
B = rxn_flux{2};
len1 = length(A);
closeresult1 = fclose(fid);
if closeresult1 == 0
disp('Reaction_Flux file close successful')
else
disp('Reaction_Flux file close not successful')
end
end
fid = fopen('File2.txt')
if fid ==-1
disp('Flux file open not successful')
else
% Read characters and numbers into separate elements
% in a cell array
[rxn] = textread('File2.txt','%s');
%rxn = textscan(fid,'%s');
C = rxn;
len2 = length(B);
closeresult1 = fclose(fid);
if closeresult1 == 0
disp('Reaction_Flux file close successful')
else
disp('Reaction_Flux file close not successful')
end
end
fid = fopen('Out.txt','w');
fprintf(fid,'The following were found in File 1:\n')
RXNFLUX = false(size(A));
for i = 1:len2
RXNFLUX = RXNFLUX | strcmpi(C{i},A);
end
if any(RXNFLUX)
iLines = find(RXNFLUX);
for i=1:length(iLines)
fprintf(fid,'%s\t %d\n',A{iLines(i)}, B(iLines(i)));
end
else
disp('rxn not found')
end
fclose(fid);
Thanks!
Now, it works beautifully! Many thanks, Andrew.
Do you mind explaining this bit of code?
for i = 1:len2
RXNFLUX = RXNFLUX | strcmpi(C{i},A);
end
