I will have two sets of field data: one taken over six weeks last year, and another taken over two months this year. Each dataset has variables collected from 4-8 different sources, over up to 50 days, at up to 3 different sites. In their entirety, both datasets span roughly 200-300 columns and 8,000-15,000 rows.
Within that, I'm trying to figure out how to set up my code for analysing both sets of data. I want to do several different things:
- Analyse the data from each source separately to check for errors
- Filter out a large quantity (up to 25%) of data which is poor quality
- Check all the filtered data from ONE dataset for trends between days (rows) and variables (columns)
- Compare filtered data in one dataset between the three sites (e.g. data collected at the same times, on the same days)
- Compare different (filtered) variables within a single dataset over time, and
- Perform analysis on the changes between both (filtered) datasets.
I have no idea how to structure and maintain my code to support all of these analyses. I know some of the tests I want to run, but others I haven't thought of yet. At the moment I have about 10 different programs that each load and structure my raw data files in a different way (one builds an array of structs, another subsets the data into separate variables, etc.), but this is incredibly confusing and has led to a lot of errors and enormous amounts of repetition. Deeply nested structs became impossible to work with last year.
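To give a concrete (simplified) sketch of the duplication I mean, here are the two loading styles I described. The file names, variable names, and field names are made up for illustration; my real files have far more columns:

```matlab
% Loader style 1: array of structs, one element per day
% (file and field names are hypothetical)
files = dir('raw/site1_*.csv');
for k = 1:numel(files)
    T = readtable(fullfile(files(k).folder, files(k).name));
    days(k).date     = T.Date(1);
    days(k).temp     = T.Temperature;  % one field per variable...
    days(k).humidity = T.Humidity;     % ...repeated for every source
end

% Loader style 2: data subset into separate workspace variables
T = readtable('raw/site1_all.csv');
temp_site1     = T.Temperature;
humidity_site1 = T.Humidity;
% ...and so on, duplicated again for sites 2 and 3
```

Every analysis script then assumes one of these layouts, so the same filtering and error-checking logic ends up rewritten for each one.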
I will also have a set of images, taken on the same days, that I want to analyse alongside the field data, so I need to take that into account too.
MATLAB is so powerful, and there are so many ways of managing data. Does anyone have ideas on how to organise such a large dataset so that I can analyse so many different parts of it?