Calculate confidence interval of data set

Question

Katara So on 17 Dec 2020

0
Link

Direct link to this question

https://se.mathworks.com/matlabcentral/answers/696780-calculate-confidence-interval-of-data-set

Edited: Liam Walsh on 24 Nov 2025 at 15:40

I want to calculate a 95% confidence interval for the difference between the mean of each variable for the species versicolor and virginica in the "fisheriris" data set, but I don't quite understand how to write the code. This is my code so far:

load fisheriris
%Versicolor
ver1 = meas(51:100,1);
ver2 = meas(51:100,2);
ver3 = meas(51:100,3);
ver4 = meas(51:100,4);
ver1_mean = mean(ver1);
ver2_mean = mean(ver2);
ver3_mean = mean(ver3);
ver4_mean = mean(ver4);
%Virginica
vir1 = meas(101:150,1);
vir2 = meas(101:150,2);
vir3 = meas(101:150,3);
vir4 = meas(101:150,4);
vir1_mean = mean(vir1);
vir2_mean = mean(vir2);
vir3_mean = mean(vir3);
vir4_mean = mean(vir4);
q1 = quantile(ver1_mean - vir1_mean, [0.95])
q2 = quantile(ver2_mean - vir2_mean, [0.95])
q3 = quantile(ver3_mean - vir3_mean, [0.95])
q4 = quantile(ver4_mean - vir4_mean, [0.95])

But I assume that I should have some sort of interval instead of just 0.95 in my quantile command, however whatever I put for ex, [0.05 0.95], I get the same value in for both ends of the interval and if I just have 0.95 then I don't get an interval. Could someone explain where I have gone wrong, all help is appreciated!

0 Comments
Show -2 older commentsHide -2 older comments

Sign in to comment.

Sign in to answer this question.

Answer 1

Liam Walsh on 13 Nov 2025

0
Link

Direct link to this answer

https://se.mathworks.com/matlabcentral/answers/696780-calculate-confidence-interval-of-data-set#answer_1571927

Edited: Liam Walsh on 24 Nov 2025 at 15:40

Open in MATLAB Online

In a similar vein to Ive J's answer, you can either make use of the bootci function or ttest2.

bootci will generate confidence intervals by creating subsamples of the datasets and computing the mean differences from those subsamples. From the set of these subsampled mean differences, we can construct intervals, which are returned by bootci. It is a non-parametric approach, meaning that it doesn't assume any specific distribution about the data or about quantities estimated from the data.

ttest2 will generate confidence intervals based on properties of the estimate of the mean difference. This test is parametric, unlike bootci, and it assumes the data passed to it are approximately normally distributed. However, with enough observations (>= 30 is the general rule of thumb), the confidence intervals are fairly robust to departures from this assumption.

% Separate out the data
load fisheriris
versData = meas(51:100,:);
virgData = meas(101:end,:);
% Approach 1: bootci
meandiff = @(x,y) mean(x-y);
nsamples = 1000;
versVirgDiff = bootci(nsamples, meandiff, versData, virgData);
% Approach 2 - ttest2
versVirgDiffT = zeros(2, 4);
for i = 1:4 % Four columns in the data
    % Set VarType="unequal" to indicate that each of the columns have
    % different variances. This can be set to "equal" if they have equal
    % variances, but "unequal" will be more robust if that assumption is
    % incorrect
    [~,~,versVirgDiffT(:, i)] = ttest2(versData(:,i), virgData(:,i), VarType="unequal");
end
% Compare each of the estimates. They show good alignment
disp(versVirgDiff)
   -0.9065   -0.3440   -1.5080   -0.7860
   -0.4360   -0.0820   -1.1174   -0.6050
disp(versVirgDiffT)
   -0.8820   -0.3303   -1.4955   -0.7951
   -0.4220   -0.0777   -1.0885   -0.6049

0 Comments
Show -2 older commentsHide -2 older comments

Sign in to comment.

Answer 2

Ive J on 18 Dec 2020

0
Link

Direct link to this answer

https://se.mathworks.com/matlabcentral/answers/696780-calculate-confidence-interval-of-data-set#answer_579000

Edited: Ive J on 18 Dec 2020

Open in MATLAB Online

I'm not sure if you want to calculate CI or something like reference range. Regardless, as you can see on quantile help page, it says Quantiles of a data set. So, of course when your inputs are scalar the output is same as the input.

data = randn(100, 1);
quantile(data, [0.05 0.95]) % this is not 95% CI
% 95% CI
CI95 = [mean(A) - 1.96*(std(A)./sqrt(numel(A))), mean(A) + 1.96*(std(A)./sqrt(numel(A)))]

2 Comments
Show NoneHide None

Katara So on 18 Dec 2020

Open in MATLAB Online

I want to calculate CI, however I am having trouble getting the mean values of the variables. In another problem where I had to use bootstrp to calculate CI, I did like this:

%Versicolor
ver1 = meas(51:100,1);
%Virginica
vir1 = meas(101:150,1);
M = 2000;
alpha = 0.05/4;
vers_boot1=bootstrp(M,@mean,ver1);
vir_boot1=bootstrp(M,@mean,vir1);
ci1 = quantile(vers_boot1-vir_boot1, [alpha 1-alpha])

which gave me an that ci1 = [-0.9170 -0.3960].

So now I want to do the same but without using bootstrp and for 95% CI. What I don't understand is how to get a vector of the mean values for the two different variables. With the code in my original question I get that the mean is only one value while in the problem posted above I get it to be a vector.

Ive J on 20 Dec 2020

Edited: Ive J on 20 Dec 2020

Open in MATLAB Online

In bootsrapt you simply do random sampling form ver1 and vir1; so, in above example you pick K random subsamples with replication from ver1 and vir1 and calculate the mean of each subsample. So, you have a vector of mean values, for which you can easily calculate CI.

I guess you are looking for something like this:

load fisheriris
%Versicolor
ver = meas(51:100,1);
%Virginica
vir = meas(101:150,1);
% 95% CI using bootstrapping
M = 2000;
alpha = 0.05/2;
ver_boot = bootstrp(M, @mean, ver);
vir_boot = bootstrp(M, @mean, vir);
ci_bootstrap = quantile(ver_boot - vir_boot, [alpha 1-alpha]);
% using Z-value (assuming normal distribution)
A = ver - vir;
ci_z = [mean(A) - 1.96*(std(A)./sqrt(numel(A))), ...
    mean(A) + 1.96*(std(A)./sqrt(numel(A)))];
% using t-distribution 
[~, ~, ci_t] = ttest2(ver, vir);
% compare them
ci_bootstrap
ci_z
ci_t'
ci_bootstrap =
   -0.8880   -0.4230
ci_z =
   -0.8942   -0.4098
ci_t =
   -0.8819   -0.4221
% visualize esitmated mean from bootstrapping
histogram(ver_boot - vir_boot)
line([mean(A), mean(A)], get(gca, 'YLim'), 'color', 'r', 'LineWidth', 1.5)

Sign in to comment.

Calculate confidence interval of data set

0 Comments
Show -2 older commentsHide -2 older comments

Accepted Answer

0 Comments
Show -2 older commentsHide -2 older comments

More Answers (1)

2 Comments
Show NoneHide None

See Also

Categories

Tags

Community Treasure Hunt

Calculate confidence interval of data set

0 Comments Show -2 older commentsHide -2 older comments

Accepted Answer

0 Comments Show -2 older commentsHide -2 older comments

More Answers (1)

2 Comments Show NoneHide None

See Also

Categories

Tags

Community Treasure Hunt

0 Comments
Show -2 older commentsHide -2 older comments

0 Comments
Show -2 older commentsHide -2 older comments

2 Comments
Show NoneHide None