Clear Filters
Clear Filters

Info

This question is closed. Reopen it to edit or answer.

how to read from a text file and make a cell array from its data

1 view (last 30 days)
Hi all,
I have a text file that contains genes information like is a relations and part of relations between genes.
this text file contains paragraphs for each GOTerm (the GO term is a node which contains certain code number like: GO:0030436) which has: Go term ID ( the first line of each paragraph) and isa (if any)(starts with isa and end with end of isa) and partof Go Terms (if any)(starts with partof: and end with end of partof) a small sample from this text file is:
GO:0030436
isa:
GO:0034297
GO:0043936
GO:0048315
end of isa
partof:
GO:0042243
end of partof
genes:
end of genes
GO:0034297
isa:
end of isa
partof:
end of partof
genes:
end of genes
GO:0043936
isa:
GO:0001410
GO:0034300
GO:0034301
GO:0034302
GO:0034303
GO:0034304
end of isa
partof:
end of partof
genes:
end of genes
I need to read this text file and take the three data from it and make a cell matrix which has 3 columns as follows: map=
ID GoTerms is_a partof
GO:0030436 GO:0034297 GO:0042243
GO:0030436 GO:0043936 0
GO:0030436 GO:0048315 0
GO:0034297 0 0
GO:0043936 GO:0001410 0
GO:0043936 GO:0034300 0
GO:0043936 GO:0034301 0
GO:0043936 GO:0034302 0
GO:0043936 GO:0034303 0
GO:0043936 GO:0034304 0
note that if each Go term contains more than one is a or part of terms, I should repeat the Go term ID in order to make the cell matrix fit and well-organized.
any idea about how to make this code?
I tried to make a code but it doesn't work because I don't know how to take more than 1 isa and part of terms:
s={};
fid = fopen('Opt.pad'); % read from the certain text file
tline = fgetl(fid);
while ischar(tline)
s=[s;tline];
tline = fgetl(fid);
end
% find start and end positions of every [Term] marker in s
terms = [find(~cellfun('isempty', regexp(s, '\GO:\w*'))); numel(s)+1];
% for every [Term] section, run the previously implemented regexps
% and save the results into a map - a cell array with 3 columns
map = cell(0,3);
for term=1:numel(terms)-1
% extract single [Term] data
s_term = s(terms(term):terms(term+1)-1);
% match regexps
%To generate the GO_Terms vector from the text file
tok = regexp(s_term, '^(GO:\w*)', 'tokens');
idx = ~cellfun('isempty', tok);
GO_Terms=cellfun(@(x)x{1}, (tok(idx)));
%To generate the is_a relations vector from the text file
tok = regexp(s_term, '^isa: (GO:\w*)', 'tokens');
idx = ~cellfun('isempty', tok);
is_a_relations =cellfun(@(x)x{1}, (tok(idx)));
%To generate the part_of relaions vector from the text file
tok = regexp(s_term, '^partof: (GO:\w*)', 'tokens');
idx = ~cellfun('isempty', tok);
part_of_relations =cellfun(@(x)x{1}, (tok(idx)));
% map. note the end+1 - here we create a new map row. Only once!
map{end+1,1} = GO_Terms;
map{end, 2} = is_a_relations;
map{end, 3} = part_of_relations;
end
map( cellfun(@isempty, map) ) = {0};

Answers (0)

This question is closed.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!