Choice of data structure - cellarray or dataset
Show older comments
I have data of the structure 3 columns by 1.5 million rows. Each column is a variable namely cocode, date, price. I would like to extract /group data from the above data structure based on code and date and perform operations on it or ideally create a time series object for each cocode for a specific time period. Which data type would be the best to use for such data ? "cellArray" or "dataset" or any other ?(I am familiar with reading from excel and dumping the data into either of these) Also what is the ideal method to query the data set based on cocode and date without having to use nested loops ? Is it ok to use a for loop to gather all data or use "find" and get the indices of cocode equal to distinct value of cocode in my data(run in a loop) and use the same indices to access other data corresponding to it? Also I would want to query again based on dates from the above resultset and then perform my analysis on it. If I can group values using "accumarray" how will I pass the indices and can I return input "cellarray" to it ? So bottomline I want to group by cocode and then by dates and have the data , may be in a time series object.
1 Comment
Oleg Komarov
on 7 Mar 2011
Make a bullet point list of the questions.
Accepted Answer
More Answers (1)
Matt Tearle
on 7 Mar 2011
1 vote
In addition to what Oleg suggested, does "code" store a small number of possible values, or are they all distinct (for 1.5 million rows)? In the former case, you might want to use a nominal array.
If you have only three columns, why not store them as separate variables? But if you really want to keep them together, either a 1-by-3 cell array or a dataset array would be the most natural, and least memory-intensive (other than numeric, of course).
5 Comments
Ms. Mat
on 7 Mar 2011
Matt Tearle
on 7 Mar 2011
I'm a bit confused. In 1 you said "the code and the date combination is distinct", but what you said in 2 & 3 sounds like you want to extract each code and each date, then do something. But wouldn't that result in a single value for price?
It sounds quite like using grpstats would do what you want, but I'm not sure. Are you saying that you have a set of codes and a set of dates, and for each unique code-date pair, you want to extract the associated prices and perform some analysis on it? If so, what analysis? Is it something that should return a single value (for each code-date pair)? Or is the date extraction something more complex (like: for each code, divide up the prices by week, based on the date)?
Ms. Mat
on 8 Mar 2011
Oleg Komarov
on 8 Mar 2011
Give a look at my edited answer.
Matt Tearle
on 8 Mar 2011
OK, I'd recommend a nominal array for the company code, then combine everything together in a cell or dataset array. If you go with cells, store the three arrays in a 1-by-3 cell array. I'd also recommend what Oleg suggested about converting the dates to date numbers. You'll most likely need that anyway, plus you can store a double array instead of a cell array of strings.
To do the analysis, see if you can write a function that will return what you need for a single company/code. Then you can use grpstats to do all companies, by specifying code as your grouping variable and your function as the stat to calculate.
Inside your function, use logical indexing to extract the dates you want. You may have to do the analysis in a loop. Look at the various time/date functions in MATLAB to help here. In particular, using date vectors (then datenum to convert to serial date numbers) is a good way to deal with months. For weeks, you can always just add multiples of 7 to a date number.
Logical indexing works something like this: suppose you have dates stored as date numbers in an array "date" and you know the start and end dates, t1and t2. Then
x = price(date>=t1 & date<=t2);
extracts the prices for the given range of dates.
Categories
Find more on Time Series Collections in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!