Problem to obtain all possible combinations of up to 50 variables or more.
6 views (last 30 days)
Show older comments
I am trying to construct a cell array to store all possible combinations without repetitions of a given (n) number. For instance if I have n=3, all possible combinations without repetitions are:
1
1 2
1 3
2 3
1 2 3
So, I use every possible combination to run a wrapper method (brutal force) for feature selection. For example, if I have 3 features (F1,F2,F3 as in the example above) and want to know what combination produces the highest accuracy using a SVM classifier. Then I use the combinations as indices to select my features and test them in the classifier. I have used this method without problems for up to 28 features, which produces 268,435,455 (2^n - 1) combinations.
Now, the problem is that using this method for more than 28 features (in my PC), Matlab produces an error:
Out of memory. Type HELP MEMORY for your options.
I have tried to optimize my code by reducing variables, using single instead of doubles but it still produces the same error. I initially need help to produce all possible combinations for up to 50 features (1.12 x 10^15 possible combinations) since I have tested the classifier with more than 1 thousand features at once and have not problems, but If you guys give me ideas of how to tackle this problem in a different way I will be very grateful as well.
If my question (or the definition of my problem) is not clear, I will try to rephrase it.
In advance, than you very much.
1 Comment
Answers (2)
Jan
on 19 May 2017
Edited: Jan
on 19 May 2017
You think you have to create an array containing 10e15 combinations. Assuming, that storing "50 features" requires 50 Bytes, you need 500'000 TerraByte of RAM. Well, I can assume that this gives an Out Of Memory error. But why do you think that the forum can solve this problem? The fact that the problem can be solved with 28 features does not allow to extrapolate the validity of the method for 50 elements, when combinations come into play.
As far as I understand, the problem you want to solve, is simply too large to match into the RAM. But of course you can determine the combinations one after the other. If you now want to do anything with the results, this will take a while: If the processing of any of the 1^15 vectors takes 1/1000 second, the total time will be about 31'000 years.
It is time to think about the problem again.
0 Comments
John D'Errico
on 19 May 2017
You want to generate all possible combinations of a set of numbers. In this case, N = 50 or so. There are 2^N possible combinations in that set. Actually, 2^N-1, since the empty set is not included in your list.
2^50 - 1
ans =
1.1259e+15
So roughly 1e15 combinations. Only a quadrillion of them.
It will take memory to store them all.
(2^50 - 1)*50*8
ans =
4.5036e+17
Perhaps as much as 450 quadrillion bytes of memory, and usually at least twice as much if you intend to use that array in any useful way. Do you have 900000 terabytes of memory installed in your computer? Lets see, nearly an exabyte? Those new RAM chips really must be dense then. I need to get a few to replace the ones in my computer.
Seriously, unless you work for the NSA and you routinely do your work on one of the largest computers in the world, I seriously doubt that this is an option. Even then, if you asked to use a major super computer to do something like this, they might laugh you out of the building. If you did work for the NSA, you should already know why this task is a bad idea.
In fact, brute force as you seem to be doing here is always a bad idea when carried to extremes. And if you did find a way to solve this for N=50, tomorrow you would want to do it with N=100.
It looks to me as if you want to do some sort of brute force optimization. So then use a binary integer nonlinear programming algorithm (an optimization tool), to pick the one node that optimizes whatever objective you want to optimize. That is just a guess. It is almost never a good idea to do optimization by brute force inspection.
The point is, DON'T generate all possible combinations. You don't really need them anyway. We have been told nothing about WHY you want to do this silly task, so there is no way I can offer a truly rational alternative. Regardless, you need to reformulate what you are doing.
0 Comments
See Also
Categories
Find more on Portfolio Optimization and Asset Allocation in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!