Cascading sort order that restarts count at each subsequent column

I'm trying to sort a table by multiple columns (easy) and obtain a specialized sort order (not-so-easy).
Rather than a sortIndex vector I'm trying to get a sortIndex matrix where each column's count restarts from 1 as a preceding column changes. This is kind of like a set of cascading counters that restart back at 1 as any of the preceding counters tick forward.
Here's some data:
DAT = array2table([
"A" "a" "1"
"A" "a" "2"
"B" "a" "3"
"C" "a" "3"
"C" "b" "5"
"C" "a" "4"
"C" "b" "6"
"C" "b" "7"
], 'Var', ["ColA" "ColB" "ColC"])
And here's a clunky nested loop that achieves what I'm looking for in the specific case where my data has 3 columns:
Omat = zeros(height(DAT),3);
% Column 1: number the ColA values in order of first appearance
[~,~,grpA] = unique(DAT.ColA,'stable');
Omat(:,1) = grpA;
for noA = 1:max(grpA)
    maskA = grpA==noA;
    subIndsA = find(maskA);
    % Column 2: restart the ColB count within each ColA group
    [~,~,subGrpB] = unique(DAT.ColB(maskA),'stable');
    Omat(subIndsA,2) = subGrpB;
    for noB = 1:max(subGrpB)
        subMaskB = subGrpB==noB;
        % Column 3: restart the ColC count within each (ColA, ColB) group
        [~,~,subSubGrpC] = unique(DAT.ColC(subIndsA(subMaskB)),'stable');
        Omat(subIndsA(subMaskB),3) = subSubGrpC;
    end
end
And here's the output next to the original data:
Omat =
1 1 1 % "A" "a" "1"
1 1 2 % "A" "a" "2"
2 1 1 % "B" "a" "3"
3 1 1 % "C" "a" "3"
3 2 1 % "C" "b" "5"
3 1 2 % "C" "a" "4"
3 2 2 % "C" "b" "6"
3 2 3 % "C" "b" "7"
Can anyone think of a cleaner implementation that can generalize to N columns?

Answers (1)

KSSV on 29 Jun 2022
DAT = array2table([
"A" "a" "1"
"A" "a" "2"
"B" "a" "3"
"C" "a" "3"
"C" "b" "5"
"C" "a" "4"
"C" "b" "6"
"C" "b" "7"
], 'Var', ["ColA" "ColB" "ColC"]) ;
% grp2idx requires the Statistics and Machine Learning Toolbox
for i = 1:width(DAT)
    DAT.(i) = grp2idx(DAT.(i)) ;
end
DAT
DAT = 8×3 table
    ColA    ColB    ColC
    ____    ____    ____
     1       1       1
     1       1       2
     2       1       3
     3       1       3
     3       2       4
     3       1       5
     3       2       6
     3       2       7
  3 Comments
KSSV on 29 Jun 2022
Also:
DAT = array2table([
"A" "a" "1"
"A" "a" "2"
"B" "a" "3"
"C" "a" "3"
"C" "b" "5"
"C" "a" "4"
"C" "b" "6"
"C" "b" "7"
], 'Var', ["ColA" "ColB" "ColC"]) ;
for i = 1:3
    [~,~,DAT.(i)] = unique(DAT.(i)) ;
end
DAT
DAT = 8×3 table
    ColA    ColB    ColC
    ____    ____    ____
     1       1       1
     1       1       2
     2       1       3
     3       1       3
     3       2       5
     3       1       4
     3       2       6
     3       2       7
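The two listings in this answer differ only in ColC (rows 5 and 6). A minimal sketch of why, using only unique: by default it numbers values in sorted order, while the 'stable' option numbers them in order of first appearance, which is the ordering the grp2idx listing shows for this string data. The variable names sortedIdx and stableIdx are illustrative, not from the thread.

```matlab
% ColC values copied from the question's table
colC = ["1"; "2"; "3"; "3"; "5"; "4"; "6"; "7"];

% Default: indices refer to the sorted list of unique values
[~, ~, sortedIdx] = unique(colC);

% 'stable': indices refer to the unique values in order of first appearance
[~, ~, stableIdx] = unique(colC, 'stable');

% Side by side: the columns differ only in rows 5 and 6 ("5" appears before "4")
disp([sortedIdx stableIdx])
```

Neither variant restarts the count inside a group; both number each column independently of the others.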
Sven on 29 Jun 2022
Yep, grp2idx looks like a replica of the third output of unique (which is what I've always used for this type of operation); grp2idx() just doesn't offer the same options for choosing sorted or stable ordering.
I still can't produce the cascading/resetting counter behavior without writing an explicit set of nested loops, one per column... tricky one, I think.
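One possible way around the per-column nesting, offered as a sketch rather than a definitive answer: process the columns left to right and keep a running "parent group" label for each row, built from the counters assigned so far. Each column's count then restarts inside every parent group. The variable names parent, rows, and sub are assumptions introduced for illustration. An inner loop over groups remains, but the nesting depth no longer depends on the number of columns.

```matlab
DAT = array2table([
    "A" "a" "1"
    "A" "a" "2"
    "B" "a" "3"
    "C" "a" "3"
    "C" "b" "5"
    "C" "a" "4"
    "C" "b" "6"
    "C" "b" "7"
    ], 'VariableNames', ["ColA" "ColB" "ColC"]);

nCols  = width(DAT);
Omat   = zeros(height(DAT), nCols);
parent = ones(height(DAT), 1);      % every row starts in a single parent group

for k = 1:nCols
    col = DAT.(k);
    for g = 1:max(parent)
        rows = find(parent == g);   % rows that share the same preceding counters
        % Restart the count at 1 within this parent group
        [~, ~, sub] = unique(col(rows), 'stable');
        Omat(rows, k) = sub;
    end
    % Relabel parent groups: one label per distinct prefix Omat(:,1:k)
    [~, ~, parent] = unique(Omat(:, 1:k), 'rows', 'stable');
end

disp(Omat)
```

For the sample table this should reproduce the Omat shown in the question. The sketch assumes every table variable is a single column of a type that unique accepts (strings, numbers, categoricals).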


Release

R2022a