Large datasets: Any way to perform regression analyses on select variables within a large table based on row name?

I have a large dataset of soil profiles, and I am trying to calculate regressions of organic carbon against profile depth. The dataset is a CSV file with columns for 'profile_name', 'top_depth', 'bottom_depth' and 'organic_carbon'. There are also columns of spatial data that are not relevant here.
The data are organized so that each profile spans multiple rows: the 'profile_name' value is the same for anywhere from 2 to 10 rows, while 'top_depth' and 'bottom_depth' change to reflect the sample interval within the soil profile, and 'organic_carbon' gives how much carbon is in the soil for that interval.
What I want to do is write a script that will run linear and/or logarithmic regressions of 'organic_carbon' against 'bottom_depth' within each distinct 'profile_name'. I might want to go further with some calculations, but I think that would be the best start. The hurdle for me is binning the profile data by 'profile_name'. Any clues would be greatly appreciated!

Answers (2)

Tom Lane on 5 Jan 2013
If you have the Statistics Toolbox, you might find it handy to use "dataset" to read in the csv file and create a dataset array from it. Then I recommend that you convert the profile_name variable to a nominal variable. The following illustrates how you can operate on different subsets based on values of a nominal variable:
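load carsmall                            % sample data shipped with the Statistics Toolbox
d = dataset(Origin,Displacement,Weight); % you would read this from a file
d.Origin = nominal(d.Origin);            % convert text to nominal
org = unique(d.Origin);                  % distinct group labels
for j=1:length(org)
    t = d.Origin==org(j);                % logical index of rows in group j
    p = polyfit(d.Weight(t),d.Displacement(t),1); % straight-line fit within the group
    fprintf('%s: %s\n',char(org(j)),num2str(p))
end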
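
The same grouping idea might be carried over to the soil data roughly as follows. This is only a sketch under stated assumptions: the small table is made-up stand-in data, not values from the real CSV, and the variable names `linCoef` and `logCoef` are hypothetical. It uses `table` and `unique` in place of `dataset` and `nominal`, fits a straight line of organic carbon against bottom depth for each profile, and then fits against `log(bottom_depth)` as one possible reading of "logarithmic regression" (this assumes all depths are positive).

```matlab
% Made-up stand-in data; with a real file you might instead use
% T = readtable('soil_profiles.csv');   % file name is an assumption
profile_name   = {'P1';'P1';'P1';'P2';'P2';'P2';'P2'};
bottom_depth   = [10; 30; 60; 15; 40; 80; 120];          % e.g. cm
organic_carbon = [4.0; 3.0; 1.5; 5.0; 3.5; 2.5; 1.0];    % e.g. percent
T = table(profile_name, bottom_depth, organic_carbon);

% idx(i) is the position in 'names' of the profile that row i belongs to
[names, ~, idx] = unique(T.profile_name);
nProfiles = numel(names);
linCoef = zeros(nProfiles, 2);   % [slope intercept] for OC vs depth
logCoef = zeros(nProfiles, 2);   % [slope intercept] for OC vs log(depth)

for k = 1:nProfiles
    rows = (idx == k);                       % rows of profile k
    z  = T.bottom_depth(rows);
    oc = T.organic_carbon(rows);
    linCoef(k,:) = polyfit(z, oc, 1);        % linear fit
    logCoef(k,:) = polyfit(log(z), oc, 1);   % requires z > 0
    fprintf('%s: linear [%s], log [%s]\n', names{k}, ...
        num2str(linCoef(k,:)), num2str(logCoef(k,:)));
end
```

Storing the coefficients in one row per profile keeps them available for later calculations, for example comparing slopes across profiles. Profiles with only two rows will be fit exactly, so their coefficients should be interpreted with care.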

Sam on 4 Apr 2013
Edited: Sam on 5 Apr 2013
I've compiled a similar soils dataset and find grpstats() to be very useful for sorting and indexing my data, especially when the data have multiple z values from horizon-based or depth-based profile sampling.
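
The answer above does not show how grpstats() was called, so the following is only an illustrative sketch, assuming the Statistics Toolbox is available and using made-up stand-in data rather than real soil values. It summarizes each profile with the group count, mean, and maximum of the depth and carbon columns.

```matlab
% Made-up stand-in data for illustration only
profile_name   = {'P1';'P1';'P1';'P2';'P2'};
bottom_depth   = [10; 30; 60; 15; 40];
organic_carbon = [4.0; 3.0; 1.5; 5.0; 3.0];
T = table(profile_name, bottom_depth, organic_carbon);

% One output row per distinct profile_name; output variables are named
% GroupCount, mean_<var>, and max_<var> for each requested data variable
S = grpstats(T, 'profile_name', {'mean','max'}, ...
    'DataVars', {'bottom_depth','organic_carbon'});
disp(S)
```

Note that grpstats() computes per-group summary statistics rather than regressions, so it complements a per-profile fitting loop rather than replacing it.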

