Calculate confidence interval of data set

76 views (last 30 days)
Katara So
Katara So on 17 Dec 2020
Answered: Liam Walsh on 13 Nov 2025 at 16:03
I want to calculate a 95% confidence interval for the difference between the mean of each variable for the species versicolor and virginica in the "fisheriris" data set, but I don't quite understand how to write the code. This is my code so far:
load fisheriris
%Versicolor
ver1 = meas(51:100,1);
ver2 = meas(51:100,2);
ver3 = meas(51:100,3);
ver4 = meas(51:100,4);
ver1_mean = mean(ver1);
ver2_mean = mean(ver2);
ver3_mean = mean(ver3);
ver4_mean = mean(ver4);
%Virginica
vir1 = meas(101:150,1);
vir2 = meas(101:150,2);
vir3 = meas(101:150,3);
vir4 = meas(101:150,4);
vir1_mean = mean(vir1);
vir2_mean = mean(vir2);
vir3_mean = mean(vir3);
vir4_mean = mean(vir4);
q1 = quantile(ver1_mean - vir1_mean, [0.95])
q2 = quantile(ver2_mean - vir2_mean, [0.95])
q3 = quantile(ver3_mean - vir3_mean, [0.95])
q4 = quantile(ver4_mean - vir4_mean, [0.95])
But I assume that I should have some sort of interval instead of just 0.95 in my quantile command, however whatever I put for ex, [0.05 0.95], I get the same value in for both ends of the interval and if I just have 0.95 then I don't get an interval. Could someone explain where I have gone wrong, all help is appreciated!

Answers (2)

Ive J
Ive J on 18 Dec 2020
Edited: Ive J on 18 Dec 2020
I'm not sure if you want to calculate CI or something like reference range. Regardless, as you can see on quantile help page, it says Quantiles of a data set. So, of course when your inputs are scalar the output is same as the input.
data = randn(100, 1);
quantile(data, [0.05 0.95]) % this is not 95% CI
% 95% CI
CI95 = [mean(A) - 1.96*(std(A)./sqrt(numel(A))), mean(A) + 1.96*(std(A)./sqrt(numel(A)))]
  2 Comments
Katara So
Katara So on 18 Dec 2020
I want to calculate CI, however I am having trouble getting the mean values of the variables. In another problem where I had to use bootstrp to calculate CI, I did like this:
%Versicolor
ver1 = meas(51:100,1);
%Virginica
vir1 = meas(101:150,1);
M = 2000;
alpha = 0.05/4;
vers_boot1=bootstrp(M,@mean,ver1);
vir_boot1=bootstrp(M,@mean,vir1);
ci1 = quantile(vers_boot1-vir_boot1, [alpha 1-alpha])
which gave me an that ci1 = [-0.9170 -0.3960].
So now I want to do the same but without using bootstrp and for 95% CI. What I don't understand is how to get a vector of the mean values for the two different variables. With the code in my original question I get that the mean is only one value while in the problem posted above I get it to be a vector.
Ive J
Ive J on 20 Dec 2020
Edited: Ive J on 20 Dec 2020
In bootsrapt you simply do random sampling form ver1 and vir1; so, in above example you pick K random subsamples with replication from ver1 and vir1 and calculate the mean of each subsample. So, you have a vector of mean values, for which you can easily calculate CI.
I guess you are looking for something like this:
load fisheriris
%Versicolor
ver = meas(51:100,1);
%Virginica
vir = meas(101:150,1);
% 95% CI using bootstrapping
M = 2000;
alpha = 0.05/2;
ver_boot = bootstrp(M, @mean, ver);
vir_boot = bootstrp(M, @mean, vir);
ci_bootstrap = quantile(ver_boot - vir_boot, [alpha 1-alpha]);
% using Z-value (assuming normal distribution)
A = ver - vir;
ci_z = [mean(A) - 1.96*(std(A)./sqrt(numel(A))), ...
mean(A) + 1.96*(std(A)./sqrt(numel(A)))];
% using t-distribution
[~, ~, ci_t] = ttest2(ver, vir);
% compare them
ci_bootstrap
ci_z
ci_t'
ci_bootstrap =
-0.8880 -0.4230
ci_z =
-0.8942 -0.4098
ci_t =
-0.8819 -0.4221
% visualize esitmated mean from bootstrapping
histogram(ver_boot - vir_boot)
line([mean(A), mean(A)], get(gca, 'YLim'), 'color', 'r', 'LineWidth', 1.5)

Sign in to comment.


Liam Walsh
Liam Walsh on 13 Nov 2025 at 16:03
In a similar vein to Ive J's answer, you can either make use of the bootci function or ttest2.
bootci will generate confidence intervals by creating subsamples of the datasets and computing the mean differences from those subsamples. From the set of these subsampled mean differences, we can construct intervals, which are returned by bootci. It is a non-parametric approach, meaning that it doesn't assume any specific distribution about the data or about quantities estimated from the data.
ttest2 will generate confidence intervals based on properties of the estimate of the mean difference. This test is parametric, unlike bootci, and it assumes the data passed to it are approximately normally distributed. However, with enough observations (>= 30 is the general rule of thumb), the confidence intervals are fairly robust to departures from this assumption.
% Separate out the data
load fisheriris
setData = meas(1:50,:);
versData = meas(51:100,:);
virgData = meas(101:end,:);
% Approach 1: bootci
meandiff = @(x,y) mean(x-y);
nsamples = 1000;
setVersDiff = bootci(nsamples, meandiff, setData, versData);
setVirgDiff = bootci(nsamples, meandiff, setData, virgData);
versVirgDiff = bootci(nsamples, meandiff, versData, virgData);
% Approach 2 - ttest2
setVersDiffT = zeros(2, 4);
setVirgDiffT = zeros(2, 4);
versVirgDiffT = zeros(2, 4);
for i = 1:4 % Four columns in the data
% Set VarType="unequal" to indicate that each of the columns have
% different variances. This can be set to "equal" if they have equal
% variances, but "unequal" will be more robust if that assumption is
% incorrect
[~,~,setVersDiffT(:, i)] = ttest2(setData(:,i), versData(:,i), VarType="unequal");
[~,~,setVirgDiffT(:, i)] = ttest2(setData(:,i), virgData(:,i), VarType="unequal");
[~,~,versVirgDiffT(:, i)] = ttest2(versData(:,i), virgData(:,i), VarType="unequal");
end
% Compare each of the estimates. They show good alignment
disp(setVersDiff)
-1.0975 0.5080 -2.9400 -1.1440 -0.7600 0.7954 -2.6540 -1.0140
disp(setVersDiffT)
-1.1057 0.5198 -2.9396 -1.1431 -0.7543 0.7962 -2.6564 -1.0169
disp(setVirgDiff)
-1.7821 0.3280 -4.2700 -1.8600 -1.3900 0.5880 -3.9440 -1.7040
disp(setVirgDiffT)
-1.7868 0.3143 -4.2537 -1.8631 -1.3772 0.5937 -3.9263 -1.6969
disp(versVirgDiff)
-0.9120 -0.3399 -1.4989 -0.7890 -0.4069 -0.0849 -1.0910 -0.6040
disp(versVirgDiffT)
-0.8820 -0.3303 -1.4955 -0.7951 -0.4220 -0.0777 -1.0885 -0.6049

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!