Group data in specific potition (data redistribution)

I have a file with 19 columns. The 2, 8, 14 th column has letters. The 2nd has suffix *N, the 8th column has suffix *E, and the 14 column has suffix *Z.
after each column follow (horizontically) in turn four numbers belonging to each column. I want to make a code that takes first the elements of the column ending in N, then the elements of the column ending in E and finally the elements of the column ending in Z.
Ι am uploading the input file I have , and the output file I would like to create in order to understand what I want to do
Could you help me please?
Thank you in advance

Answers (2)

Hello Ivan
see below
I added a 3rd line in your input file (just copied the first line) so that the output I generate can be compared to your template (with 3 data lines)
So far I believe it works !
all the best
nb_of_elements = 5;
data1 = readcell('Input.xlsx');
data1_str = string(data1);
[m,n] = size(data1);
ind_N = endsWith(data1_str,'N'); % search for strings ending with N
out_N = find_my_data(data1,ind_N,nb_of_elements)
ind_E = endsWith(data1_str,'E'); % search for strings ending with E
out_E = find_my_data(data1,ind_E,nb_of_elements)
ind_Z = endsWith(data1_str,'Z'); % search for strings ending with Z
out_Z = find_my_data(data1,ind_Z,nb_of_elements)
% put the 3 together + 1st column and 1st row
out_NEZ = [data1(1,2:n); out_N out_E out_Z];
out_NEZ = [data1(:,1) out_NEZ];
writecell(out_NEZ, 'file_out.xlsx');
%%%%%%%%%%%%%%%%
function out = find_my_data(array,ind,nb_of_elements)
[m,n] = find(ind>0);
% sort in ascending order for m
[ms,inds] = sort(m);
ns = n(inds);
% create output cell array
out = [];
for ci = 1:length(ms)
mm = ms(ci);
nn = ns(ci);
out = [out ; array(mm,nn:nn+nb_of_elements)];
end
end

4 Comments

Thank but I have one more problem. Some of this files have filled only one cell of 4 cells. (I am uploading one of them in order to understand what I mean _ see 3rd and 4th row).
Command window shows me:
Error using horzcat
Dimensions of arrays being concatenated are not consistent.
Error in testcode (line 20)
out_NEZ = [data1(1,2:n); out_N out_E out_Z];
Could you help me to fix it please?
hello Ivan
I assume the output must respect this format ? (attached)
see my answer in the "answer" section

Sign in to comment.

hello back
so I modified the code to respect empty sections -
hope it helps !
% data1 = readcell('Input.xlsx');
data1 = readcell('Input_2.xlsx');
data1_str = string(data1);
[m,n] = size(data1);
ind_N = endsWith(data1_str(2:m,:),'N'); % search for strings ending with N
[mN,nN] = find(ind_N>0);
ind_E = endsWith(data1_str(2:m,:),'E'); % search for strings ending with E
[mE,nE] = find(ind_E>0);
ind_Z = endsWith(data1_str(2:m,:),'Z'); % search for strings ending with Z
[mZ,nZ] = find(ind_Z>0);
% nb of elements are defined by length of "zeros" after the last "1" in one
% of the ind_N / ind_E / ind_Z arrays
NN = max([max(nN) max(nE) max(nZ)]);
nb_of_elements = n - NN;
out_N = find_my_data(data1(2:m,:),ind_N,nb_of_elements);
out_E = find_my_data(data1(2:m,:),ind_E,nb_of_elements);
out_Z = find_my_data(data1(2:m,:),ind_Z,nb_of_elements);
% put the 3 together + 1st column and 1st row
out_NEZ = [data1(1,2:n); out_N out_E out_Z];
out_NEZ = [data1(:,1) out_NEZ];
writecell(out_NEZ, 'file_out11.xlsx');
%%%%%%%%%%%%%%%%
function array_out = find_my_data(array,ind,nb_of_elements)
[m,n] = find(ind>0);
% sort in ascending order for m
[ms,inds] = sort(m);
ns = n(inds);
% create output cell array
array_out = cell(size(array));
for ci = 1:length(ms)
mm = ms(ci);
nn = ns(ci);
array_out(mm,2:2+nb_of_elements) = array(mm,nn:nn+nb_of_elements);
end
array_out = array_out(:,2:2+nb_of_elements); % remove the unnecessary columns
end

7 Comments

@Mathieu NOEExcuse me , but I am dealing with a problem with the code. I run the cde and I have this input (I am attaching it ). Command window shows me the following
Error using vertcat
Dimensions of arrays being concatenated are not consistent.
Error in group_empty (line 24)
out_NEZ = [data1(1,2:n); out_N out_E out_Z];
Could you help me please?
Could you help me please?
hello Ivan
sorry , I was out for a couple days
this is the fix :
data1 = readcell('test_v2.xlsx');
data1_str = string(data1);
[m,n] = size(data1);
ind_N = endsWith(data1_str(2:m,:),'N'); % search for strings ending with N
[mN,nN] = find(ind_N>0);
ind_E = endsWith(data1_str(2:m,:),'E'); % search for strings ending with E
[mE,nE] = find(ind_E>0);
ind_Z = endsWith(data1_str(2:m,:),'Z'); % search for strings ending with Z
[mZ,nZ] = find(ind_Z>0);
% nb of elements are defined by length of "zeros" after the last "1" in one
% of the ind_N / ind_E / ind_Z arrays
NN = max([max(nN) max(nE) max(nZ)]);
nb_of_elements = n - NN;
out_N = find_my_data(data1(2:m,:),ind_N,nb_of_elements);
out_E = find_my_data(data1(2:m,:),ind_E,nb_of_elements);
out_Z = find_my_data(data1(2:m,:),ind_Z,nb_of_elements);
% put the 3 together + 1st column and 1st row
out_NEZ = [data1(2:m,1) out_N out_E out_Z]; % updated code
out_NEZ = [data1(1,1:size(out_NEZ,2)); out_NEZ]; % updated code
writecell(out_NEZ, 'file_out_v2.xlsx');
%%%%%%%%%%%%%%%%
function array_out = find_my_data(array,ind,nb_of_elements)
[m,n] = find(ind>0);
% sort in ascending order for m
[ms,inds] = sort(m);
ns = n(inds);
% create output cell array
array_out = cell(size(array));
for ci = 1:length(ms)
mm = ms(ci);
nn = ns(ci);
array_out(mm,2:2+nb_of_elements) = array(mm,nn:nn+nb_of_elements);
end
array_out = array_out(:,2:2+nb_of_elements); % remove the unnecessary columns
end
Thank you, but Importing an another file (similar with the previous) I am delaing with this problem as I run your last version: command window shows me
Error using writecell (line 119)
Unsupported cell element of type 'missing'. Convert the element to numeric, logical string,
datetime, duration or categorical before writing.
Error in test_grouping (line 25)
writecell(out_NEZ, 'file_out_v2.xlsx');
What could I do in order to solve it?
hello Ivan
can you share that data file ?
yet another code modification
the issue is related to the fact that your new excel file would generate "missing" values when read with readcell, so I prefer to use readtable instead.
so this is a fix for that issue, but the code is not optimal IMO.
also , your file structure is always evolving ... making the coding a bit difficult.
data1 = readtable('group_test.xlsx')
data1C = table2cell(data1);
data1_str = string(data1C);
[m,n] = size(data1);
ind_N = endsWith(data1_str(2:m,:),'N'); % search for strings ending with N
[mN,nN] = find(ind_N>0);
ind_E = endsWith(data1_str(2:m,:),'E'); % search for strings ending with E
[mE,nE] = find(ind_E>0);
ind_Z = endsWith(data1_str(2:m,:),'Z'); % search for strings ending with Z
[mZ,nZ] = find(ind_Z>0);
% nb of elements are defined by length of "zeros" after the last "1" in one
% of the ind_N / ind_E / ind_Z arrays
NN = max([max(nN) max(nE) max(nZ)]);
nb_of_elements = n - NN;
out_N = find_my_data(data1C(2:m,:),ind_N,nb_of_elements);
out_E = find_my_data(data1C(2:m,:),ind_E,nb_of_elements);
out_Z = find_my_data(data1C(2:m,:),ind_Z,nb_of_elements);
% put the 3 together + 1st column and 1st row
out_NEZ = [data1C(2:m,1) out_N out_E out_Z]; % updated code
% out_NEZ = [data1C(1,1:size(out_NEZ,2)); out_NEZ]; % updated code
writecell(out_NEZ, 'out.xlsx');
%%%%%%%%%%%%%%%%
function array_out = find_my_data(array,ind,nb_of_elements)
[m,n] = find(ind>0);
% sort in ascending order for m
[ms,inds] = sort(m);
ns = n(inds);
% create output cell array
array_out = cell(size(array));
for ci = 1:length(ms)
mm = ms(ci);
nn = ns(ci);
array_out(mm,2:2+nb_of_elements) = array(mm,nn:nn+nb_of_elements);
end
array_out = array_out(:,2:2+nb_of_elements); % remove the unnecessary columns
end

Sign in to comment.

Asked:

on 8 Jan 2021

Commented:

on 26 Feb 2021

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!