How to find a correlation coefficient of each variable from a multivariable dataset?
Puru Kathuria on 20 Jul 2020
Correlation can be measured between every pair of features, and it tells you how strongly those variables are related.
The correlation between two variables can be positive (both variables increase together), negative (one variable increases as the other decreases), or absent altogether. There are multiple ways to observe correlation in the data.
First, you can plot each pair of variables and analyse the plots visually; this can be done using pair plotting [Link1, Link2, Link3]. With M variables this technique produces C(M,2) = M(M-1)/2 distinct pairwise plots, so it is best suited to a limited number of variables. In your case, M = 3 gives only 3 plots, which is easy to analyse visually.
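As a rough illustration of the pair count, the sketch below enumerates every pair of columns with nchoosek and draws one scatter plot per pair. The matrix X is synthetic stand-in data, not the data from the question, and the variable names are made up for the example.

```matlab
% Synthetic stand-in data (assumption): 100 observations of M = 3 variables.
rng(0);                          % fix the seed so the figure is reproducible
x = randn(100, 1);
X = [x, 2*x + 0.5*randn(100, 1), randn(100, 1)];
M = size(X, 2);

% Every unordered pair of column indices; there are C(M,2) of them.
pairs = nchoosek(1:M, 2);
numPairs = size(pairs, 1);       % equals M*(M-1)/2

% One scatter plot per pair of variables.
figure;
for k = 1:numPairs
    subplot(1, numPairs, k);
    scatter(X(:, pairs(k, 1)), X(:, pairs(k, 2)), 10, 'filled');
    xlabel(sprintf('Variable %d', pairs(k, 1)));
    ylabel(sprintf('Variable %d', pairs(k, 2)));
end
```

For M = 3 this gives three panels; the count grows quadratically with M, which is why visual inspection stops being practical for many variables.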
Second, you can compute a linear correlation coefficient; Pearson's coefficient [Link4] is the most commonly used. It serves as a metric for the strength of the linear relationship between two variables. The coefficient ranges from -1 to 1, where 1 implies a perfect positive linear correlation, -1 implies a perfect negative linear correlation, and 0 implies no linear correlation. [Link5]
You can also refer to this link for other correlation coefficients.
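If you only need the numbers, a minimal sketch using corrcoef from base MATLAB could look like the following. It assumes the observations are in rows and the variables in columns; the matrix X is synthetic data made up for the illustration.

```matlab
% Synthetic stand-in data (assumption): rows are observations, columns are variables.
rng(0);
x = randn(200, 1);
X = [x, 3*x + 0.1*randn(200, 1), -x + 0.1*randn(200, 1)];

% R(i,j) is Pearson's coefficient between columns i and j.
% P(i,j) is the p-value for testing the hypothesis of no correlation.
[R, P] = corrcoef(X);

disp(R);         % symmetric matrix with ones on the diagonal

% Coefficient for one specific pair, here variables 1 and 3.
r13 = R(1, 3);
```

Here columns 1 and 2 are built to move together and column 3 to move against them, so R(1,2) comes out close to 1 and R(1,3) close to -1.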
%%%%%%% Example: %%%%%%%
data = readmatrix("sampleData.xlsx"); % read the data from the file
% plotmatrix :
% Creates a plot in which every pair of variables is plotted against each other. If the
% number of variables is N, the plot has N^2 subplots, and the diagonal subplots
% show a histogram of each variable.
plotmatrix(data);
% corrplot (requires Econometrics Toolbox) :
% Creates a matrix of plots showing correlations
% among pairs of variables in the input data.
% Also computes the correlation coefficients.
figure; % keep the plotmatrix figure and draw corrplot in a new one
R = corrplot(data);