Create new cell array based on entries of other cell arrays

Hello, I have three cell arrays:
arr1 = {'apple','cat1','cat2','cat3','berry','dog'};
arr2 = {'apple','cat','berry1','berry2','dog','elephant'};
arr3 = {'apple','elephant1','elephant2'};
and I'd like to create a fourth cell array which would be the following:
arr4 =
1×12 cell array
'apple' 'cat' 'cat1' 'cat2' 'cat3' 'berry' 'berry1' 'berry2' 'dog' 'elephant' 'elephant1' 'elephant2'
So I'd like to keep each matching element once, and where the lists differ slightly, keep every distinct element. Specifically, if both the non-numbered and the numbered versions of an element exist, I'd like the non-numbered version to appear before the numbered ones, with everything staying in the same general order as in the original arrays.
The three arrays will be similar, but sometimes certain elements will be included more than once and thus will be numbered. Additionally, as in the case of 'elephant', some elements will be included in one array but not in another. Finally, as in the case of arr3, some elements are not included at all. These differences can cause the arrays to be of different sizes. The arrays will always follow the same order of categories: apples, then cats, then berries, then dogs, then elephants, which is unfortunately not always alphabetical. We can be certain that each category will be included in arr4. I'd like to avoid creating a category cell array if possible, but if it would simplify the problem, I could create the following:
categories = {'apple','cat','berry','dog','elephant'};
I'd like to avoid that, however, since my actual category array would contain several hundred elements and I don't know how I'd generate that array procedurally.
I've been exploring some of the other cell comparison methods that have been asked about and answered in other threads, but I'm pretty lost as to how they might fit together for this need. The worst-case scenario is that I just type out arr4 manually, since arr1, arr2, and arr3 are known. I'd like to avoid this, though: the categories alone would be several hundred elements, so arr4 would end up having around 1000 elements, and I think that would not be a good use of time.
Thanks in advance for any help.
edit: If this is not feasible, I would accept help with creating arr4 as
arr4 =
1×12 cell array
'apple' 'cat1' 'cat2' 'cat3' 'berry' 'dog' 'cat' 'berry1' 'berry2' 'elephant' 'elephant1' 'elephant2'
which doesn't keep true to the order of categories, rather just takes all of the elements from arr1, then adds all of the dissimilar elements from arr2, then adds all of the dissimilar elements from arr3. I'm sure this would be easier to do but would make my task more difficult later on when I'm processing arr4 in other areas of my project.
edit 2: The numbers may be followed by another suffix as well. For example, there may be another cell entry representing the count of apples, so some arrays may contain:
{'apple','applec',etc};
{'apple1','apple1c','apple2','apple2c',etc}
  2 Comments
Emerson Butler on 26 Mar 2019
James Tursa's answer works perfectly to create arr4 as described above. But as a follow-up question, let's say one of my surveys returns as a type of arr3, but also includes data:
test_grid = {'apple','elephant1','elephant2';...
'0', '1', '1';...
'1', '1', '2'};
I'd then like to add this data to the larger array, arr4, and set all of the non-included values to 0, such that the final array would be:
final_grid =
3×12 cell array
'apple' 'berry' 'berry1' 'berry2' 'cat' 'cat1' 'cat2' 'cat3' 'dog' 'elephant' 'elephant1' 'elephant2'
'0' '0' '0' '0' '0' '0' '0' '0' '0' '0' '1' '1'
'1' '0' '0' '0' '0' '0' '0' '0' '0' '0' '1' '2'
This is what I have so far, which I'm certain is about as inefficient as it gets:
% With the above arrays already declared, as well as test_grid and arr4:
[TOTAL_SAMPLES,~] = size(test_grid);
TOTAL_SAMPLES = TOTAL_SAMPLES - 1; % TOTAL_SAMPLES is the number of received data points
% It is also used elsewhere, which is why I'm leaving it as is and then adding 1 in the next line.
% Create final_grid as a cell array with titles and zeros
final_grid = cell(TOTAL_SAMPLES+1,length(arr4));
final_grid(1,:) = arr4;
[S,T] = size(final_grid);
for i = 2:S
    for j = 1:T
        final_grid(i,j) = {'0'}; % char '0', the same type as the entries in test_grid
    end
end
% Fill in known columns with known values
for i = 1:size(test_grid,2) % loop over columns; length() returns the largest dimension
    col = find(strcmp(arr4,test_grid{1,i})); % column of arr4 with the matching title
    final_grid(2:end,col) = test_grid(2:end,i);
end
I'm wondering if anyone knows of a more efficient way to achieve the overall task described. I find myself constantly writing for loops to handle my arrays. They work, but in the back of my mind I feel like MATLAB probably has other functions that could achieve the same thing much more efficiently.
James Tursa on 26 Mar 2019
Edited: James Tursa on 26 Mar 2019
Suppose arr3 is the one with the extra data. Then
arr = [arr3,[arr1,arr2;repmat({'0'},size(arr3,1)-1,numel(arr1)+numel(arr2))]];
[~,ia,~] = unique(arr(1,:));
arr4 = arr(:,ia);
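If arr4 already exists as the header row, an alternative is to look up each test_grid title with ismember and copy the data columns in one indexed assignment. This is only a sketch under the assumptions of the example above (titles in row 1, data stored as char '0'/'1'/...); the variable names found and col are introduced here for illustration.

```matlab
% Header row as produced by unique([arr1,arr2,arr3]) in this thread's example
arr4 = {'apple','berry','berry1','berry2','cat','cat1','cat2','cat3', ...
        'dog','elephant','elephant1','elephant2'};
test_grid = {'apple','elephant1','elephant2'; ...
             '0',    '1',        '1'; ...
             '1',    '1',        '2'};

% Titles on top, every data cell pre-filled with '0'
final_grid = [arr4; repmat({'0'}, size(test_grid,1)-1, numel(arr4))];

% found(k) is true if the k-th test_grid title exists in arr4;
% col(k) is its column index in arr4 (0 if not found)
[found, col] = ismember(test_grid(1,:), arr4);

% Copy the data rows of the matching columns in a single assignment
final_grid(2:end, col(found)) = test_grid(2:end, found);
```

Titles in test_grid that do not occur in arr4 are skipped silently here; whether that is the right behavior depends on the data.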

Accepted Answer

James Tursa on 21 Mar 2019
Edited: James Tursa on 21 Mar 2019
Not sure if this will suffice for your needs, but a simple way to get the unique strings in alphabetical order:
arr4 = unique([arr1,arr2,arr3]);
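If alphabetical order is not wanted, the 'stable' option of unique keeps entries in order of first appearance instead of sorting them. For the example arrays this should give the fallback ordering described in the question's first edit (all of arr1, then the new entries of arr2, then those of arr3). A minimal sketch, using the variable name arr4_stable only for illustration:

```matlab
arr1 = {'apple','cat1','cat2','cat3','berry','dog'};
arr2 = {'apple','cat','berry1','berry2','dog','elephant'};
arr3 = {'apple','elephant1','elephant2'};

% 'stable' returns unique entries in order of first appearance (no sorting)
arr4_stable = unique([arr1,arr2,arr3], 'stable');
```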
  1 Comment
Emerson Butler on 21 Mar 2019
Edited: Emerson Butler on 21 Mar 2019
Not perfect since I wanted them to stay in their non-alphabetical order, but this is so simple that I'll just deal with it. Having my data in alphabetical order may make future endeavors easier anyway. Thanks!
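For anyone who does need the original category order, one possible approach is to derive a category from each name by stripping trailing digits, rank the categories by first appearance, and then sort by category rank and number. This is a sketch, not a general solution. It assumes that the numbering is always a run of trailing digits (the 'c'-style suffixes from edit 2 are not handled) and that the order of first appearance across [arr1,arr2,arr3] reflects the true category order. The names base, catIdx, and num are introduced here for illustration.

```matlab
arr1 = {'apple','cat1','cat2','cat3','berry','dog'};
arr2 = {'apple','cat','berry1','berry2','dog','elephant'};
arr3 = {'apple','elephant1','elephant2'};

% Unique names in order of first appearance
names = unique([arr1,arr2,arr3], 'stable');

% Category = name with trailing digits removed (assumption: numbers are a trailing suffix)
base = regexprep(names, '\d+$', '');

% Rank each category by where it first appears
[~, ~, catIdx] = unique(base, 'stable');

% Trailing number of each name; names without a number get 0 so they sort first
num = str2double(regexp(names, '\d+$', 'match', 'once'));
num(isnan(num)) = 0;

% Sort by category rank first, then by number
[~, order] = sortrows([catIdx(:), num(:)]);
arr4 = names(order);
```

Sorting on the numeric value rather than on the text means 'cat10' would come after 'cat2', which plain alphabetical sorting would not give.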

Release

R2017a
