Large datasets: Any way to perform regression analyses on select variables within a large table based on row name?

Question

Josiah on 4 Jan 2013

I have a large dataset of soil profiles. I am trying to calculate regressions of organic carbon against profile depth. The dataset is a CSV file with columns for 'profile_name', 'top_depth', 'bottom_depth' and 'organic_carbon'. There are also columns of spatial data that are not relevant here.

The data are organized so that each profile spans multiple rows: the 'profile_name' value is the same for anywhere from 2 to 10 rows, while 'top_depth' and 'bottom_depth' change to reflect the sample interval within the soil profile, and 'organic_carbon' gives how much carbon is in the soil.

What I want to do is write a script that runs linear and/or logarithmic regressions of 'organic_carbon' against 'bottom_depth' within each distinct 'profile_name'. I might want to go further with some calculations, but I think that would be the best start. The hurdle for me is binning the profile data by 'profile_name'. Any clues would be greatly appreciated!

Answer 1

Tom Lane on 5 Jan 2013

If you have the Statistics Toolbox, you might find it handy to use "dataset" to read in the csv file and create a dataset array from it. Then I recommend that you convert the profile_name variable to a nominal variable. The following illustrates how you can operate on different subsets based on values of a nominal variable:

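load carsmall                              % assumption: Origin, Displacement, Weight come from the carsmall sample data
d = dataset(Origin,Displacement,Weight);   % you would read this from a file
d.Origin = nominal(d.Origin);              % convert text to nominal
org = unique(d.Origin);                    % distinct group labels
for j = 1:length(org)
    t = d.Origin==org(j);                  % logical mask selecting the rows of group j
    p = polyfit(d.Weight(t),d.Displacement(t),1);   % p = [slope intercept]
    fprintf('%s:  %s\n',char(org(j)),num2str(p))
end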
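
For the soil-profile case in the question, the same grouping idea might look like the sketch below. It uses only base MATLAB (unique and polyfit), so it does not assume the Statistics Toolbox. The variable names mirror the column names given in the question, the values are made up purely for illustration, and linCoef and logCoef are hypothetical names introduced here. The "logarithmic" fit is assumed to mean organic_carbon regressed on log(bottom_depth), which requires positive depths.

```matlab
% Hypothetical stand-in data; in practice these columns would be read from the csv.
profile_name   = {'P1';'P1';'P1';'P2';'P2';'P2'};
bottom_depth   = [10; 30; 60; 15; 40; 80];         % made-up depths
organic_carbon = [5.0; 3.0; 1.5; 4.0; 2.5; 1.0];   % made-up carbon values

% idx(k) is the position in names of the profile that row k belongs to.
[names, ~, idx] = unique(profile_name);

linCoef = zeros(numel(names), 2);   % [slope intercept], carbon vs depth
logCoef = zeros(numel(names), 2);   % [slope intercept], carbon vs log(depth)
for j = 1:numel(names)
    rows = (idx == j);              % logical mask for profile j
    x = bottom_depth(rows);
    y = organic_carbon(rows);
    linCoef(j,:) = polyfit(x, y, 1);
    logCoef(j,:) = polyfit(log(x), y, 1);   % assumes all depths are > 0
    fprintf('%s: linear %s | log %s\n', names{j}, ...
        num2str(linCoef(j,:)), num2str(logCoef(j,:)));
end
```

Storing the coefficients in one row per profile keeps them available for later calculations. A profile with only two rows gives an exact line through both points, so the fit carries no information about scatter in that case.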

Answer 2

Sam on 4 Apr 2013

Edited: Sam on 5 Apr 2013

I've compiled a similar soils dataset and find grpstats() very useful for grouping and summarizing my data, especially when the data have multiple z values from horizon-based or depth-based profile sampling.
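
As a rough illustration of that suggestion, the sketch below computes per-profile summaries with grpstats. It assumes the Statistics Toolbox and a MATLAB release that has the table type (a dataset array can be passed in much the same way). The column names follow the question, the values are made up, and T and s are hypothetical names. Note that grpstats summarizes groups rather than fitting regressions, so it would complement a per-profile fit, not replace it.

```matlab
% Hypothetical stand-in data; real data would come from the csv file.
profile_name   = {'P1';'P1';'P1';'P2';'P2';'P2'};
bottom_depth   = [10; 30; 60; 15; 40; 80];         % made-up depths
organic_carbon = [5.0; 3.0; 1.5; 4.0; 2.5; 1.0];   % made-up carbon values
T = table(profile_name, bottom_depth, organic_carbon);

% One output row per profile, with the mean and max of each data variable.
s = grpstats(T, 'profile_name', {'mean','max'}, ...
    'DataVars', {'bottom_depth','organic_carbon'});
disp(s)
```

The result has a GroupCount column giving the number of rows per profile, plus columns such as mean_organic_carbon and max_bottom_depth, which is a quick way to check how many samples each profile has before fitting.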
