How to replace characters with digits

I have to replace each character of a sequence over the alphabet s = ACGT with the following digits:
'A' with 11
'C' with 00
'G' with 01
'T' with 10
  1 Comment
Jan on 27 Feb 2013
You write 11 without surrounding quotes. This isn't an accident, correct? What do you want as output? How long is the input?


Accepted Answer

Azzi Abdelmalek on 27 Feb 2013
Edited: Azzi Abdelmalek on 27 Feb 2013
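clear
s = 'ACGT';                   % characters to replace
e = ['11';'00';'01';'10'];    % replacement codes, row k belongs to s(k)
in = 'AGCTAG';                % Your initial data
out = in;
for k = 1:numel(s)
    out = regexprep(out, s(k), e(k,:));   % replace every s(k) with its code
end
disp(out)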
  3 Comments
Cedric Wannaz on 28 Feb 2013
STRREP is probably the best solution. It is a little slower than my solution, but more memory friendly.
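For reference, a minimal sketch of what the STRREP variant of the accepted answer could look like. The variable name seq and the example input are chosen for illustration only; the mapping is the one from the question.

```matlab
s   = 'ACGT';                      % characters to replace
e   = ['11'; '00'; '01'; '10'];    % replacement codes, row k belongs to s(k)
seq = 'AGCTAG';                    % example input (illustrative)

out = seq;
for k = 1:numel(s)
    % strrep does plain substring replacement, no regular-expression parsing
    out = strrep(out, s(k), e(k, :));
end
disp(out)   % 110100101101
```

Replacing one letter at a time is safe here because the codes contain only digits, so a later pass can never match text produced by an earlier one.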


More Answers (3)

Jan on 28 Feb 2013
And a lookup table:
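seqIn = 'ACGTTGCA';
lut = repmat('0', 2, 255);    % every character starts as '00'; named lut to avoid shadowing MATLAB's table
lut(1, 'AT') = '1';           % first digit is 1 for A (11) and T (10)
lut(2, 'AG') = '1';           % second digit is 1 for A (11) and G (01)
result = reshape(lut(:, seqIn), 1, []);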
Does this work? I do not have access to Matlab currently.
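One way to answer the "does this work?" question is to compare the lookup-table result with a hand-computed string. The sketch below assumes the mapping from the question (A=11, C=00, G=01, T=10); the names lut and expected are illustrative.

```matlab
seqIn = 'ACGTTGCA';

% Build a 2-by-255 character table: column c holds the two digits for char(c)
lut = repmat('0', 2, 255);
lut(1, 'AT') = '1';    % first digit is 1 for A and T
lut(2, 'AG') = '1';    % second digit is 1 for A and G

% Index the table with the sequence itself, then flatten column by column
result = reshape(lut(:, seqIn), 1, []);

% Hand-computed: A C G T T G C A -> 11 00 01 10 10 01 00 11
expected = '1100011010010011';
assert(isequal(result, expected));
disp(result)
```

Indexing with a character vector works because MATLAB converts each character to its numeric code before using it as a column index.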
  3 Comments
Jan on 28 Feb 2013
@Azzi: Is this a typo?! Your function needs 14 secs with REGEXPREP and 0.007 secs with STRREP? Then my minor suggestion caused a speedup by a factor of 1900? Wow, this would be the most efficient suggestion I ever gave. And it would be a strong reason to warn about the low efficiency of regexprep in this forum.
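A rough way to reproduce such a comparison is sketched below. The sequence length of 1e5 is an arbitrary assumption, and the measured times depend heavily on the machine and MATLAB release, so no particular speedup is implied.

```matlab
s   = 'ACGT';
e   = ['11'; '00'; '01'; '10'];
seq = s(randi(4, 1, 1e5));   % random test sequence; the length is an arbitrary choice

% Time the regexprep loop
tic
outRegexp = seq;
for k = 1:numel(s)
    outRegexp = regexprep(outRegexp, s(k), e(k, :));
end
tRegexp = toc;

% Time the strrep loop
tic
outStrrep = seq;
for k = 1:numel(s)
    outStrrep = strrep(outStrrep, s(k), e(k, :));
end
tStrrep = toc;

fprintf('regexprep: %.4f s, strrep: %.4f s\n', tRegexp, tStrrep);
assert(isequal(outRegexp, outStrrep));   % both loops must give the same string
```

A single tic/toc measurement is noisy; repeating the runs a few times gives a more trustworthy picture.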



Cedric Wannaz on 27 Feb 2013
Edited: Cedric Wannaz on 27 Feb 2013
If you need to process long sequences, you might want to optimize for efficiency. A MEX-based solution would probably be the most efficient, but here is one way to do it using basic MATLAB only.
If you want to replace character 'A' with the numeric pair [1 1] and so on, you can do:
aa = 'ACGT' ;
seq = 'AAGCTCAGGTTCA' ;
rep = zeros(2, max(aa), 'uint8') ;
rep(:,aa) = [1 0 0 1; 1 0 1 0] ;
result = reshape(rep(:,seq), 1, []) ;
This outputs the numeric array:
result =
1 1 1 1 0 1 0 0 1 0 0 0 1 1 0 1 0 1 1 0 1 0 0 0 1 1
If you want to replace character 'A' with characters '11' and so on, you can do:
aa = 'ACGT' ;
seq = 'AAGCTCAGGTTCA' ;
rep = zeros(2, max(aa), 'uint8') ;
rep(:,aa) = ['11'; '00'; '01'; '10'].' ;
result = char(reshape(rep(:,seq), 1, [])) ;
This outputs the string '11110100100011010110100011'.
EDIT: note that there are slightly different ways of doing this that are a little slower but more memory-friendly.
Cheers,
Cedric
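One caveat with this kind of lookup table, sketched below under the assumption that the input should contain only A, C, G and T: in the numeric version a character such as 'N' still indexes an all-zero column and becomes 0 0, which is indistinguishable from 'C', while a character whose code exceeds max(aa), such as a lowercase 'a', causes an indexing error. The guard is an addition for illustration and is not part of the answer above.

```matlab
aa  = 'ACGT';
seq = 'AAGCTCAGGTTCA';

% Guard (illustrative addition): reject anything outside the alphabet
if ~all(ismember(seq, aa))
    error('Sequence contains characters other than A, C, G, T.');
end

% Same lookup as in the answer: column c holds the code for char(c)
rep = zeros(2, max(aa), 'uint8');
rep(:, aa) = ['11'; '00'; '01'; '10'].';
result = char(reshape(rep(:, seq), 1, []));
disp(result)   % 11110100100011010110100011
```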

Jos (10584) on 28 Feb 2013
Here is a simpler approach than looping over REGEXPREP or STRREP:
seqIn = 'ACGTTGCA' % input sequence
letters = 'ACGT' ;
symb = {'11','00','01','10'} ; % codes for A, C, G, T in that order, stored as a cell array of strings!
[tf,idx] = ismember(seqIn,letters) ;
seqOut = [symb{idx}]
