Fastest Possible Way to convert a table containing Only 2 strings to numbers
Hello all,
I have an NxM array of strings. Most of the cells are empty. However, those that are not empty contain only one of two values, 'Het' or 'Hom', for heterozygous vs. homozygous.
I want to:
1. Create an NxM matrix
2. Put a 1 into the matrix at position (i,j) for every instance of the string 'Het' at position (i,j) in the array
3. Put a 2 into the matrix at position (i,j) for every instance of the string 'Hom' at position (i,j) in the array
(the 1 and 2 should be numbers, not strings)
EXAMPLE:
Array = 'Het' '' '' 'Het' '' 'Hom'
'' 'Het' 'Hom' '' '' ''
would become
Matrix = [1 0 0 1 0 2; 0 1 2 0 0 0] (could be NaN instead of 0, that doesn't matter to me)
Now, I can think of a bunch of workarounds for this.
I could call strfind a ton of times. I could use uint8 and then divide that output by a set number, or something like that.
But all the workarounds I can think of are slow.
What is the fastest way to make this conversion on a very large array?
I do have the Parallel Computing Toolbox in principle, but I have never used it, so I would need clear instructions...
Thank you very much in advance!
Accepted Answer
More Answers (1)
Walter Roberson
on 6 Feb 2014
0 votes
No-one knows the fastest possible way. It is going to depend upon your exact computer details, including the processor, architecture, amount of primary cache, kind of connection to your secondary cache, secondary cache speed, third-level cache speed, amount of RAM, which version of MATLAB you are using, what else is running on your system, and other like details. It also depends upon doing a lot of analysis about the most efficient possible algorithm for the task, considering processor details such as pre-fetch, cache-line size, speculative execution, hyperthreading, out-of-order execution, pipelining....
The fastest possible method might involve sending the data over to an FPGA and having it do the calculations. Or perhaps you could do even better with custom ASICs.
You should also be considering writing a mex routine to do the analysis.
One thing that I would point out is that if the string is empty you know the output immediately (0), and if it is not empty then you only need to check the second character: if it is 'o' then the result is 2 and otherwise the result is 1. You do not need ismember() or to process other parts of the string.
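As a rough illustration of that second-character check, here is a minimal sketch. It assumes the data is a cell array of character vectors holding only '', 'Het' or 'Hom', as in the question's example; the names filled and secondChar are made up for the example, and it makes no claim to being the fastest approach on any particular machine.

```matlab
% Example input from the question, assumed to be a cell array of char vectors.
Array = {'Het', '',    '',    'Het', '', 'Hom'; ...
         '',    'Het', 'Hom', '',    '', ''};

% Start with zeros, so empty cells already hold the right answer.
Matrix = zeros(size(Array));

% Logical mask of the non-empty cells.
filled = ~cellfun('isempty', Array);

% Second character of each non-empty cell, collected into a char column.
secondChar = cellfun(@(s) s(2), Array(filled));

% 'o' is taken to mean 'Hom' (2); anything else is taken to be 'Het' (1).
Matrix(filled) = 1 + (secondChar == 'o');
```

Matrix comes out as an ordinary double array of the same size as Array. If NaN is preferred over 0 for the empty cells, start from nan(size(Array)) instead of zeros(size(Array)). Timing it with tic/toc on the real data is the only way to tell how it compares with other approaches.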
2 Comments
Sarutahiko
on 7 Feb 2014
Walter Roberson
on 7 Feb 2014
Figuring out "the fastest possible way" to do something is often much much more than 90% of the time involved in a problem. Consider, for example, the amount of human effort that has gone into finding "the fastest possible way" to factor large numbers.