MATLAB Answers

Subsets of uncorrelated features

2 views (last 30 days)
Kais on 10 Jul 2021
Commented: Kais on 15 Jul 2021
Given a N by N correlation matrix of N features, how to find ALL subsets of pariwise uncorrelated features if we assume two features are uncorrelated if their correlation score is less than a threshold Alpha. There is no restriction on the number of features making the subsets. All features making a subset need to be pairwise uncorrelated.

Accepted Answer

Jeff Miller
Jeff Miller on 12 Jul 2021
Edited: Jeff Miller on 12 Jul 2021
N = 5;
R = rand(N); % We will ignore the lower triangular part of this array
rCutoff = 0.4;
% Make a cell array that holds all possible combinations of 2, 3, 4, ... features
combos = cell(0,0);
for i=2:N
iCombos = nchoosek(1:N,i);
for j=1:size(iCombos,1)
combos{end+1} = iCombos(j,:);
ncells = numel(combos);
% Check each cell to make sure that all of the pairwise correlations are
% less than the cutoff
qualifies = true(1,ncells);
for icell=1:ncells
features = combos{icell};
nfeatures = numel(features);
for ifeature=1:nfeatures-1
for jfeature=ifeature+1:nfeatures
iifeature = features(ifeature);
jjfeature = features(jfeature);
if abs(R(iifeature,jjfeature)) > rCutoff
qualifies(icell) = false;
Kais on 14 Jul 2021
I am looking for a pairwise uncorrleation between ALL features iofn the "feature" variable". The line:
sum(nonzeros(triu(abs(R(features,features)),1)) > rCutoff)
will result into a matrix of logical values showing which feature pairs are correlated and which are not (triu is only there to reduce the symmetric correlation matrix). If any of the values of the matrix is true (equivalently, sum of values of the matrix is different from zero), The subset unqualifies as an uncorrelated subset.

Sign in to comment.

More Answers (2)

Ive J
Ive J on 11 Jul 2021
Edited: Ive J on 12 Jul 2021
Let R be the pairwise correlation matrix:
N = 10;
R = rand(N);
R(logical(eye(N))) = 1;
for i = 1:size(R, 1) - 1
for j = i+1:size(R, 1)
R(j, i) = R(i, j);
cutoff = 0.4; % independent features
idx = R < cutoff;
idx = triu(idx); % R(i, j) == R(j, i) in pairwise correlation matrix
features = "feature" + (1:N); % feature names
% there may be a simpler way to do this
indepFeatures = [];
for i = 1:N
indepFeatures = [indepFeatures, arrayfun(@(x)[x, features(i)], features(idx(i, :)), 'uni', false)];
indepFeatures = vertcat(indepFeatures{:});
% find all cliques of this set
nodes = zeros(size(indepFeatures, 1), 1);
[~, nodes(:, 1)] = ismember(indepFeatures(:, 1), features);
[~, nodes(:, 2)] = ismember(indepFeatures(:, 2), features);
G = graph(nodes(:, 1), nodes(:, 2));
M = maximalCliques(adjacency(G));
indepSets = cell(size(M, 2), 1);
for i = 1:numel(indepSets)
indepSets{i} = features(M(:, i) ~= 0);
indepSets(cellfun(@numel, indepSets) < 2) = []; % this can be further unified with indepFeatures
You can find maximalCliques in FEX.
Kais on 15 Jul 2021
Never mind the last question. I figured the algorithm finds max cliques so it won't count subgraphs within larger subgraphs ([6, 7,9] won't show up because it's part of the larger subset [4, 6,7,9]), which significantly reduces the number of subsets.

Sign in to comment.

Image Analyst
Image Analyst on 11 Jul 2021
Would stepwise regression be of any help?
Otherwise, just make an N by N table of correlation coefficients by corelating every feature with every other feature.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!