Which ANOVA test should I use, and how do I use it?

Good afternoon everyone,
I would like to use an ANOVA test, but unfortunately I do not know which one to use.
I have attached an Excel file with the data.
For instance,
I would like to know the relevance when thickness and orientation are involved. These are the data of 9 individuals with 5 repetitions each.
The correct/not column records whether the participant found the correct answer or not: correct = 1 and not = 0.
  28 Comments
Franck paulin Ludovig pehn Mayo
@Adam Danz Okay, I understand. I only came across these statistical methods about a week ago, and I am trying to understand what is what and which one suits my issue.
The little information I have found seems to lead me towards confidence intervals.
Adam Danz on 20 Jul 2022
Edited: Adam Danz on 20 Jul 2022
I came across these statistical methods 15 years ago and am still trying to understand which ones suit different sets of data and questions. It wasn't until about 5 years ago that I realized my long-term confusion wasn't a problem with my understanding; it's a problem in the field of statistics in general. So many peer-reviewed articles apply statistics incorrectly or do not show that the data are fit for the selected statistics. Worse yet, some people keep applying different statistics until they get the results they want, which is p-hacking. Three years ago hundreds of scientists and statisticians around the globe supported a movement to change how we think about and practice statistics (see list of articles at the bottom of this answer). What's nice about bootstrapped CIs is that they can be used to visualize how closely related two distributions are, rather than just providing a number such as p<0.005.
I'm not swaying you away from using an ANOVA method, but I am arguing that the movement mentioned is a big step forward in statistics.


Answers (1)

Adam Danz on 13 Jul 2022
I recommend using bootstrapped confidence intervals. The idea is to resample your accuracy data with replacement and compute the mean of the resample for each condition. If you repeat this many times (1000, for example), you'll have a distribution of means, which can be used to compute the middle 95% interval. Fortunately, MATLAB has a function that does most of the work: bootci, which is demonstrated in this comment. After you have the CIs for each condition, you can plot them using errorbar. If the CIs of two conditions do not overlap, it is likely that the data from those conditions come from different distributions.
Here's a demo that computes bootstrapped CIs for a single condition in your data. I would set up the loop to compute CIs for all conditions, but I still do not understand which conditions to compare since the data do not appear to be nested. Perhaps if the 'thickness' values were corrected in some way, it would be clearer. But first, give it a shot yourself.
T = readtable('https://www.mathworks.com/matlabcentral/answers/uploaded_files/1062725/Anovan_Ptestdata.xlsx','VariableNamingRule','preserve');
thickIdx = T.thicknesss == 0.04;
orientIdx = strcmp(T.orientation, 'vertical');
CI = bootci(1000, {@mean, T.("correct/not")(thickIdx & orientIdx)}, 'Type', 'per')
CI = 2×1
0.7667 0.9111
mu = mean(T.("correct/not")(thickIdx & orientIdx));
bar(mu)
hold on
errorbar(1, mu, mu-CI(1), CI(2)-mu, 'k-','LineWidth',1) % lengths below and above mu
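To make the resampling idea concrete, here is a minimal hand-rolled sketch of a percentile bootstrap for one condition. The vector x is made-up 0/1 data, not the poster's spreadsheet, and the rank-based percentile lookup is a simplification, so the bounds may differ slightly from what bootci reports.

```matlab
% Hypothetical 0/1 accuracy data for one condition (not the poster's data)
x = [1 1 0 1 1 1 0 1 1 1 1 0 1 1 1]';
nBoot = 1000;                        % number of bootstrap resamples
rng(0);                              % fix the seed so the sketch is repeatable
n = numel(x);
bootMeans = zeros(nBoot, 1);
for k = 1:nBoot
    idx = randi(n, n, 1);            % draw n indices with replacement
    bootMeans(k) = mean(x(idx));     % statistic computed on the resample
end
sortedMeans = sort(bootMeans);
% Middle 95%: pick the 2.5th and 97.5th percentiles by rank
lowerBound = sortedMeans(max(1, round(0.025 * nBoot)));
upperBound = sortedMeans(round(0.975 * nBoot));
CI = [lowerBound; upperBound];
```

The loop is only there to show the mechanics; in practice bootci does the same job in one call and offers other interval types as well.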
  14 Comments
Adam Danz on 20 Jul 2022
  1. This is more of an art form than a science. There is a lot of advice out there on how to know when enough is enough. It's been obvious to me when I don't have enough data but less obvious when I've collected enough. I have used cross-validation to help make that decision. The main idea is, if I remove something like 10-20% of my data and get approximately the same results, then I have enough data.
  2. It wouldn't be surprising if the CIs differ by a very small amount between runs. bootci uses a random selection of your data, so the results can differ by a very small amount. If you're getting noticeably different results between runs, something is wrong. Either you're not running enough bootstraps (1000 should be enough, but you could try more) or you're not providing exactly the same input data between runs. This is definitely something you want to investigate.
  3. I still don't understand your dataset well enough to imagine this comparison. If any given data point has a thickness property and an orientation property and you want to know whether thickness or orientation has a stronger effect, then I don't think you can do that with this bootstrapping method, which makes me fear that this entire multiple-day thread has nothing to do with your actual goals. The main lesson, if this is the case, is that the data and the goals must be crystal clear to you and to the readers before a useful answer can be written.
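Points 1 and 2 can be checked with a few lines. The following is only a sketch on made-up 0/1 data: the sample size, the 15% removal fraction, and the seeds are all assumptions chosen for illustration.

```matlab
% Hypothetical accuracy data (0/1) standing in for one condition
rng(1);                                   % seed so the made-up data are repeatable
data = double(rand(200, 1) < 0.8);

% Point 2: with the same seed and the same input, bootci repeats its result
rng(42); ciA = bootci(1000, {@mean, data}, 'Type', 'per');
rng(42); ciB = bootci(1000, {@mean, data}, 'Type', 'per');
sameCI = isequal(ciA, ciB);

% Point 1: drop about 15% of the data at random and compare the CIs
keep = randperm(numel(data), round(0.85 * numel(data)));
ciSub = bootci(1000, {@mean, data(keep)}, 'Type', 'per');
shift = abs(ciSub - ciA);                 % how far each bound moved
```

If shift is small relative to the width of the interval, the reduced data set tells roughly the same story as the full one. What counts as "small" is a judgment call.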
I realize you previously asked about NaNs in your bootci results, but I forgot to address that question. By default, mean does not ignore NaNs, so if there is a NaN in the data, the mean will be NaN. You want to omit NaNs using
___ = bootci(nBoot, {@(x)mean(x,'omitnan'),data}, 'Type', 'per')
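A quick way to see the difference, using a small made-up vector (the values are assumptions for illustration only):

```matlab
x = [1 0 1 NaN 1];                      % hypothetical data with one missing value
m1 = mean(x);                           % NaN: the default includes the NaN
m2 = mean(x, 'omitnan');                % 0.75: mean of the four valid values
m3 = mean([NaN NaN], 'omitnan');        % NaN: nothing is left to average
```

The last line is worth keeping in mind: 'omitnan' cannot help if a selection contains no valid values at all.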
That's all the time I have for this thread @Franck paulin Ludovig pehn Mayo. I hope these ideas will be helpful to you even if you don't end up needing them.
Franck paulin Ludovig pehn Mayo
@Adam Danz Thank you very much, I have grasped the concept. I have an idea of how I will go from here.
The last thing I would like to know is how to fix the NaN. I have implemented your suggestion, but unfortunately I am still getting the same error:
BOOTFUN returns a NaN or Inf.
T = readtable('Newfile.xlsx');
%Var2 = thickness
thickIdx1 = T.Var2 == 0;
thickIdx2 = T.Var2 == 0.02;
thickIdx3 = T.Var2 == 0.03;
thickIdx4 = T.Var2 == 0.04;
%Var4= orientation
orientIdx = strcmp(T.Var4, 'vertical');
%var5= correct/not
data1 = T.("Var5")(thickIdx1 & orientIdx);
data2 = T.("Var5")(thickIdx2 & orientIdx);
data3 = T.("Var5")(thickIdx3 & orientIdx);
data4 = T.("Var5")(thickIdx4 & orientIdx);
% number of bootstraps
nBoot = 1000;
CI1 =bootci(nBoot, {@(x)mean(x,'omitnan'),data1}, 'Type', 'per')
CI2 = bootci(nBoot, {@mean,data2}, 'Type', 'per')
CI3 = bootci(nBoot, {@mean,data3}, 'Type', 'per')
CI4 = bootci(nBoot, {@mean,data4}, 'Type', 'per')
mu1 = mean(data1);
mu2 = mean(data2);
mu3 = mean(data3);
mu4 = mean(data4);
bar(mu1)
bar(mu2)
bar(mu3)
bar(mu4)
hold on
bar([1 2 3 4])
hold on
errorbar([1 2 3 4], 1:4, rand(1,4), rand(1,4),'k-','LineStyle','none','LineWidth',1)
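One possible way to restructure the code above is sketched below. It is not a confirmed fix, because the cause of the error in Newfile.xlsx cannot be determined from the thread. The sketch assumes the error comes from NaNs reaching mean (only CI1 uses 'omitnan') or from an exact == comparison on thickness selecting no rows. The table is synthetic; its column names Var2, Var4 and Var5 simply mirror the post.

```matlab
% Hypothetical stand-in for the spreadsheet; column names mirror the post
rng(0);
thickVals = [0 0.02 0.03 0.04];
nPer = 45;                                      % e.g. 9 participants x 5 repetitions
Var2 = repelem(thickVals(:), nPer);             % thickness
Var4 = repmat({'vertical'}, numel(Var2), 1);    % orientation
Var5 = double(rand(numel(Var2), 1) < 0.8);      % correct (1) / not (0)
Var5(3) = NaN;                                  % one missing response
T = table(Var2, Var4, Var5);

orientIdx = strcmp(T.Var4, 'vertical');
nBoot = 1000;                                   % number of bootstraps
nCond = numel(thickVals);
mu = nan(1, nCond);
CI = nan(2, nCond);
for k = 1:nCond
    % Compare with a tolerance instead of == to avoid floating-point misses
    thickIdx = abs(T.Var2 - thickVals(k)) < 1e-9;
    d = T.Var5(thickIdx & orientIdx);
    d = d(~isnan(d));                           % drop NaNs before resampling
    if isempty(d), continue; end                % nothing selected: leave NaN
    mu(k) = mean(d);
    CI(:, k) = bootci(nBoot, {@mean, d}, 'Type', 'per');
end

bar(1:nCond, mu)
hold on
% errorbar expects the lengths below and above each bar height
errorbar(1:nCond, mu, mu - CI(1, :), CI(2, :) - mu, 'k', 'LineStyle', 'none', 'LineWidth', 1)
hold off
```

Looping over the thickness values keeps the four conditions consistent with each other and plots the actual means and CIs rather than placeholder values. Dropping NaNs before calling bootci is a design choice here; passing @(x)mean(x,'omitnan') for every condition should work similarly as long as each selection contains valid values.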
