How to construct a 3D matrix from a 2D matrix

Hi all, I am struggling to find a solution to this problem. I am not very skilled with 3D matrices, and my MATLAB coding skills are worse than I thought. The problem is as follows. I have a 5-column matrix made up of patent number, application year, category (ranging from 1 to 6), subcategory (ranging from 1 to 36) and assignee number, so for each assignee number I have many patents, identified by their patent numbers. Now I want to construct a 3D matrix out of this matrix, with the years on the rows (first dimension), the subcategories on the columns (second dimension) and the assignee numbers on the pages (third dimension). In each cell I would like to have the number of patents a company has; in other words, the count of rows for each combination of assignee number, year and subcategory.
I am stuck at the very beginning and don't know how to set up the loop.
Many thanks.
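One way to build the array described above without an explicit loop, sketched under assumptions: C is a numeric matrix in the column order given (patent, year, category, subcategory, assignee), and the sample rows below are made up. unique maps each distinct year, subcategory and assignee to an index, and accumarray counts how many rows land in each (year, subcategory, assignee) cell.

```matlab
% Made-up sample rows: [patent, year, category, subcategory, assignee]
C = [1001 1974 1  5 101
     1002 1974 1  7 101
     1003 1975 1  7 102
     1004 1972 6 31 102
     1005 1974 6 31 102];

% Map each distinct year, subcategory and assignee to an index 1..n
[years,   ~, iy] = unique(C(:,2));
[subcats, ~, is] = unique(C(:,4));
[assgns,  ~, ia] = unique(C(:,5));

% counts(i,j,k) = number of rows (patents) with year years(i),
% subcategory subcats(j) and assignee assgns(k)
counts = accumarray([iy is ia], 1, [numel(years) numel(subcats) numel(assgns)]);

% Year-by-subcategory slice for the first assignee
counts(:,:,1)
```

Exact duplicate rows would be counted twice; applying unique(C,'rows') first removes them. With millions of rows and many assignees the dense array can get large, since most of its cells are zero.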

5 Comments

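I managed to build a cell array with one cell per company, each cell holding that company's rows of C, using this code:
[u,~,ib] = unique(C(:,end));     % u: distinct assignee numbers, ib: index into u for each row
Cassignee = cell(numel(u),1);    % preallocate one cell per assignee
for i=1:numel(u)
    Cassignee{i} = C(ib==i,:);   % all rows of C belonging to assignee u(i)
end
where C is the name of the 2D matrix.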
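If the per-company split is only a step towards counting, the ib index returned by unique can feed accumarray directly, so the loop is not needed. A minimal sketch on made-up rows, assuming the same column layout as in the question:

```matlab
% Made-up sample rows: [patent, year, category, subcategory, assignee]
C = [1001 1974 1  5 101
     1002 1974 1  7 101
     1003 1975 1  7 102
     1004 1972 6 31 102
     1005 1974 6 31 102];

[u, ~, ib] = unique(C(:,end));   % u: distinct assignees, ib: group index per row
nPatents = accumarray(ib, 1);    % rows (patents) per assignee, no loop needed
summary = [u nPatents]           % [assignee number, patent count]
```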
It's always simpler to work with data than to imagine things; how about attaching a .mat file with at least a representative sample of the data?
I'd also ask specifically: why do you want this 3D array? What's the end goal you're after? I'm guessing there's probably a more effective way to get there.
Thanks for your reply, dpb. I attached a sample, since the original matrices have millions of rows; it contains the three matrices I'm working with. Matrix A contains the company names in the first column and a company code in the second. Matrix B contains the assignee number in the first column and four columns of company codes, which have changed over time. Matrix C contains, in order, the patent number, application year, category, subcategory and assignee number. So I have three matrices linked pairwise by a common column: A and B through the company code, and B and C through the assignee number. My aim is to have, for each company and each year, a vector made up of the number of patents the company has developed in each subcategory. That's why I thought of a 3D matrix with time on the rows, subcategories on the columns and companies on the pages. Eventually, I need to link this 3D matrix to a database of acquirers and target companies that have merged in past years, and match the two so as to keep only the companies they have in common.
I hope I've made it clear.
Paolo
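The pairwise links described above (A to B by company code, B to C by assignee number) map naturally onto table joins. A minimal sketch with invented miniature tables (the company names and codes are hypothetical), assuming a single company-code column in B; the real B has several code columns, which would first need to be combined, for example by stacking them into one column.

```matlab
% Invented miniature versions of A, B and C (one code column in B for simplicity)
A = table({'Acme'; 'Globex'}, [11; 22], 'VariableNames', {'Company', 'CompCode'});
B = table([101; 102], [11; 22], 'VariableNames', {'Assignee', 'CompCode'});
C = table([1001; 1003; 1004], [1974; 1975; 1972], [5; 7; 31], [101; 102; 102], ...
          'VariableNames', {'Patent', 'Year', 'Subcategory', 'Assignee'});

% Link C to B through the assignee number, then to A through the company code;
% innerjoin keeps only rows whose key appears in both tables
CB  = innerjoin(C, B, 'Keys', 'Assignee');
CBA = innerjoin(CB, A, 'Keys', 'CompCode')
```

Because innerjoin drops keys missing from either side, the same kind of join against a table of acquirers and targets would keep only the companies the two sources have in common.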
Paolo on 21 May 2018
Edited: Paolo on 21 May 2018
In the file you provided, Bsample has six columns rather than five. Is the second column also used for the assignee number, or are there five columns of company codes?
OK, got it... First, I'd convert the strings to categorical(*) variables; lookups and comparisons will be much easier that way, and they are less memory-intensive if the tables are as large as you indicate. Before going too far: what's with the minus signs at the beginning of the assignee numbers in the first column of B? Do they have some significance?
I've not had time to do more than look a little, sorry, but I think using tables and grouping variables will probably let you do the lookups you're after without building this separate array. (Essentially it is the same array, but in table form, with grouping variables doing the construction behind the scenes.)
The first step along those lines would be something like:
A=table(categorical(Asample(:,1)),categorical(Asample(:,2)),'VariableNames',{'Company','CompCode'});
B=table(categorical(Bsample(:,1)),categorical(Bsample(:,2:end)),'VariableNames',{'Assignee','CompCode'});  % all code columns in one multi-column variable
% Csample columns: patent, year, category, subcategory, assignee
C=table(categorical(Csample(:,1)),categorical(Csample(:,2)),categorical(Csample(:,3)), ...
        categorical(Csample(:,4)),categorical(Csample(:,5)), ...
        'VariableNames',{'Patent','Year','Category','Subcategory','Assignee'});
Example to see where we start from (first four variables):
>> C(1:10,1:4)
ans =
  10×4 table
    Patent     Year    Category    Subcategory
    _______    ____    ________    ___________
    3930271    1974       6            63
    3930272    1974       6            65
    3930273    1975       6            65
    3930273    1975       6            65
    3930276    1972       6            69
    3930277    1974       6            69
    3930280    1974       6            69
    3930280    1974       6            69
    3930281    1974       6            69
    3930282    1974       6            61
>>
The above raises another question: why are there duplicated rows in C? 3930273 appears twice with identical data in every field shown, and so does 3930280.
(*) Or, if the codes are, and always will be, all numeric, you could use an integer type instead.
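Along the lines of the tables-plus-grouping-variables idea, a minimal sketch with an invented table: drop exact duplicate rows, then count patents per (Assignee, Year, Subcategory) group with findgroups and splitapply (both available since R2015b, so in R2018a). The result has one row per combination that actually occurs, which is the nonzero part of the 3D array in long form.

```matlab
% Invented table in the shape suggested above (all variables categorical)
T = table(categorical({'1001'; '1003'; '1003'; '1004'; '1005'; '1006'}), ...
          categorical({'1974'; '1975'; '1975'; '1972'; '1974'; '1974'}), ...
          categorical({'5'; '7'; '7'; '31'; '31'; '31'}), ...
          categorical({'101'; '102'; '102'; '102'; '102'; '102'}), ...
          'VariableNames', {'Patent', 'Year', 'Subcategory', 'Assignee'});

% Remove exact duplicate rows (like the repeated patents noted above)
T = unique(T);

% One group per (Assignee, Year, Subcategory) combination that occurs
[g, asg, yr, sub] = findgroups(T.Assignee, T.Year, T.Subcategory);
nPatents = splitapply(@numel, T.Patent, g);   % patents per group

counts = table(asg, yr, sub, nPatents, ...
               'VariableNames', {'Assignee', 'Year', 'Subcategory', 'NumPatents'})
```

Keeping the counts in this long form avoids storing the many zero cells of a dense year × subcategory × assignee array.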

Release: R2018a

Asked on 20 May 2018

Edited by dpb on 21 May 2018
