Kruskalwallis test for ECG signal

Question

Elzbieta on 5 Jan 2025

https://se.mathworks.com/matlabcentral/answers/2172674-kruskalwallis-test-for-ecg-signal

Commented: William Rose on 6 Jan 2025

Hello,

I windowed a measured ECG signal using windows of length 5 s, 10 s, and 20 s. Then, for each windowed signal, I computed the features of every frame. For each feature and for each windowed signal separately, I calculated the mean and standard deviation across all windows, and checked whether the differences between the values for different windows are statistically significant.

For the nonparametric statistical analysis I used the Kruskal-Wallis test and its p-values. However, for a given windowed signal I obtain the same p-value for every frame. How should I fix this?

The code is as follows:

% Define files and parameters
files = {'features_arrythmia_window_5_non_overlapping_3D_var.mat', ...
    'features_arrythmia_window_10_non_overlapping_3D_var.mat', ...
    'features_arrythmia_window_20_non_overlapping_3D_var.mat'};
nFiles = numel(files);
mean_values_all = cell(nFiles, 1);
std_values_all = cell(nFiles, 1);
p_values_windows_all = cell(nFiles, 1);
% Step 1: Analyze statistical differences across windows
for fileIndex = 1:nFiles
    % Load the features file
    load(files{fileIndex}, 'feature');
    [nWindows, nMeasures, nArrhythmias] = size(feature);
    % Initialize matrices
    mean_values = zeros(nMeasures, nArrhythmias);
    std_values = zeros(nMeasures, nArrhythmias);
    p_values_windows = zeros(nMeasures, nArrhythmias);
    for iMeasure = 1:nMeasures
        for iArrhythmia = 1:nArrhythmias
            % Calculate mean and standard deviation
            mean_values(iMeasure, iArrhythmia) = mean(feature(:, iMeasure, iArrhythmia), 'omitnan');
            std_values(iMeasure, iArrhythmia) = std(feature(:, iMeasure, iArrhythmia), 'omitnan');
            % Statistical test across windows for the same measure and arrhythmia
            data = squeeze(feature(:, iMeasure, iArrhythmia));
            group = (1:nWindows)'; % Grouping vector for Kruskal-Wallis test
            p_values_windows(iMeasure, iArrhythmia) = kruskalwallis(data, group, 'off');
        end
    end
    % Store results
    mean_values_all{fileIndex} = mean_values;
    std_values_all{fileIndex} = std_values;
    p_values_windows_all{fileIndex} = p_values_windows;
end

Regards

Elzbieta

1 Comment

William Rose on 5 Jan 2025

@Elzbieta,

Please provide data files that illustrate the problem (or simulated data, if privacy is an issue), so others can run your code.

Answer 1

William Rose on 6 Jan 2025

https://se.mathworks.com/matlabcentral/answers/2172674-kruskalwallis-test-for-ecg-signal#answer_1556967

ElzECGanalysis.m

@Elzbieta,

First, I needed to understand what the problem is. Since I needed files to run your script, I made 3 files of random data, as shown below. Each file contains a 3D array, 'feature', with dimensions (12, 6, or 3) by 4 by 5. The dimensions, per your code, are nWindows, nMeasures, nArrhythmias. I made nWindows=12, 6, 3 for window length=5, 10, 20, respectively, with the idea that there would be more windows if the window length is shorter.

feature=rand(12,4,5);
save('features_arrythmia_window_5_non_overlapping_3D_var.mat','feature')
feature=rand(6,4,5);
save('features_arrythmia_window_10_non_overlapping_3D_var.mat','feature')
feature=rand(3,4,5);
save('features_arrythmia_window_20_non_overlapping_3D_var.mat','feature')

Then I ran your script.

ElzECGanalysis

The script runs without error and produces no console or graphical output. The script produces 3 cell arrays: mean_values_all, std_values_all, and p_values_windows_all. Each cell of the three cell arrays is 4x5, which indicates that the mean, std, and p-values are computed along the nWindows dimension. For example:

disp(mean_values_all{1})
    0.3787    0.5370    0.3881    0.6363    0.5376
    0.5153    0.3675    0.5708    0.5496    0.4971
    0.4683    0.4381    0.4770    0.5489    0.4686
    0.5182    0.5350    0.5111    0.4322    0.4407
disp(mean_values_all{2})
    0.4298    0.4309    0.4985    0.5383    0.5160
    0.5582    0.5258    0.4151    0.5666    0.3922
    0.5587    0.5258    0.4543    0.5967    0.6757
    0.3933    0.3584    0.3207    0.6492    0.5395
disp(std_values_all{3})
    0.2501    0.3767    0.3331    0.3592    0.2206
    0.1106    0.1230    0.3346    0.1108    0.3330
    0.3011    0.3716    0.3380    0.0698    0.3609
    0.3659    0.0670    0.3938    0.1490    0.1372
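As a side note, the nested loops for the mean and standard deviation can be replaced by reductions along dimension 1, the nWindows dimension, which is what the 4x5 shape of each cell reflects. A minimal sketch, assuming `feature` has the nWindows-by-nMeasures-by-nArrhythmias layout used in your script; the random data and the names `mean_loop`, `mean_vec`, `std_loop`, `std_vec` are only for illustration:

```matlab
% Simulated data with the same layout as the files above:
% nWindows x nMeasures x nArrhythmias
feature = rand(12, 4, 5);
[~, nMeasures, nArrhythmias] = size(feature);

% Loop version, as in the original script
mean_loop = zeros(nMeasures, nArrhythmias);
std_loop  = zeros(nMeasures, nArrhythmias);
for iMeasure = 1:nMeasures
    for iArrhythmia = 1:nArrhythmias
        mean_loop(iMeasure, iArrhythmia) = mean(feature(:, iMeasure, iArrhythmia), 'omitnan');
        std_loop(iMeasure, iArrhythmia)  = std(feature(:, iMeasure, iArrhythmia), 'omitnan');
    end
end

% Vectorized version: reduce along dimension 1 (the window dimension),
% then reshape the 1 x nMeasures x nArrhythmias result to nMeasures x nArrhythmias
mean_vec = reshape(mean(feature, 1, 'omitnan'), nMeasures, nArrhythmias);
std_vec  = reshape(std(feature, 0, 1, 'omitnan'), nMeasures, nArrhythmias);

% Largest differences between the two approaches (floating-point rounding only)
disp(max(abs(mean_loop(:) - mean_vec(:))))
disp(max(abs(std_loop(:) - std_vec(:))))
```

Using `reshape` rather than `squeeze` keeps the result nMeasures-by-nArrhythmias even when nArrhythmias is 1, where `squeeze` would leave a 1-by-nMeasures row vector.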

The problem comes with the p-values:

disp(p_values_windows_all{1})
    0.4433    0.4433    0.4433    0.4433    0.4433
    0.4433    0.4433    0.4433    0.4433    0.4433
    0.4433    0.4433    0.4433    0.4433    0.4433
    0.4433    0.4433    0.4433    0.4433    0.4433
disp(p_values_windows_all{2})
    0.4159    0.4159    0.4159    0.4159    0.4159
    0.4159    0.4159    0.4159    0.4159    0.4159
    0.4159    0.4159    0.4159    0.4159    0.4159
    0.4159    0.4159    0.4159    0.4159    0.4159
disp(p_values_windows_all{3})
    0.3679    0.3679    0.3679    0.3679    0.3679
    0.3679    0.3679    0.3679    0.3679    0.3679
    0.3679    0.3679    0.3679    0.3679    0.3679
    0.3679    0.3679    0.3679    0.3679    0.3679

The problem is that the p-values are the same for every element of each cell. The p-values are calculated with the lines

data = squeeze(feature(:, iMeasure, iArrhythmia));

group = (1:nWindows)'; % Grouping vector for Kruskal-Wallis test

p_values_windows(iMeasure, iArrhythmia) = kruskalwallis(data, group, 'off');

I'm not sure what you are trying to do, but this is not it. The commands above take the vector

feature(:,iMeasure, iArrhythmia) (which has length 12 or 6 or 3, depending on the file being processed)

and pairs that vector with a group vector 1:12 or 1:6 or 1:3 and does a K-W test. This has the effect of creating 12 or 6 or 3 groups, with only 1 member in each group. Since there is only 1 member in each group, the K-W test can't give meaningful results. In fact, the p-value in these cases depends only on the length of the vector, and is unaffected by the actual random values in the vector:

disp(kruskalwallis(rand(1,12), 1:12, 'off'))
    0.4433
disp(kruskalwallis(rand(1,6), 1:6, 'off'))
    0.4159
disp(kruskalwallis(rand(1,3), 1:3, 'off'))
    0.3679

Note that the p-values above match the p-values for the data in the files, even though the random vectors are new. Do it again with new random values, and the p-values are the same:

disp(kruskalwallis(rand(1,12), 1:12, 'off'))
    0.4433
disp(kruskalwallis(rand(1,6), 1:6, 'off'))
    0.4159
disp(kruskalwallis(rand(1,3), 1:3, 'off'))
    0.3679
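The reason is visible in the Kruskal-Wallis statistic itself. With N groups of one observation each, and no ties (which holds for continuous random data), every n_i = 1 and each group's rank sum R_i is just one of the ranks 1, ..., N, so the statistic collapses to a constant. A short derivation, assuming the standard form of H without tie correction:

$$
H = \frac{12}{N(N+1)} \sum_{i=1}^{N} \frac{R_i^2}{n_i} - 3(N+1)
  = \frac{12}{N(N+1)} \sum_{r=1}^{N} r^2 - 3(N+1)
  = \frac{12}{N(N+1)} \cdot \frac{N(N+1)(2N+1)}{6} - 3(N+1)
  = 2(2N+1) - 3(N+1) = N - 1
$$

H is referred to a chi-square distribution with N - 1 degrees of freedom, so the p-value is P(chi-square with N-1 df >= N-1), i.e. `1 - chi2cdf(N-1, N-1)` in MATLAB, which depends only on N. For N = 3 this is exp(-1), approximately 0.3679, matching the output above.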

Please specify clearly and specifically the hypothesis you want to test. I suspect you want to test the hypothesis that the distribution of feature(:,j,k) (where j and k are nMeasures and nArrhythmias) is the same for window lengths 5, 10, and 20. If that is correct, then you can't do it the way you're doing it now.

We can do the K-W test across the three window lengths for each value of j and k, but if we do, there will be a significant multiple comparisons problem. That is, the probability of making at least one type I error across the whole family of comparisons is higher than the significance level used for each individual comparison. A Bonferroni correction is one way to address this issue, but it has its own issues.
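For illustration, here is a minimal sketch of a Bonferroni correction applied to an nMeasures-by-nArrhythmias matrix of K-W p-values. The p-values below are hypothetical numbers chosen for the example, and alpha = 0.05 is an assumed family-wise significance level. With m comparisons, each raw p-value is compared with alpha/m; equivalently, each p-value multiplied by m (capped at 1) is compared with alpha.

```matlab
% Hypothetical K-W p-values, one per (measure, arrhythmia) pair (4 x 2)
p_values = [0.004 0.30
            0.020 0.75
            0.500 0.01
            0.900 0.12];
alpha = 0.05;              % assumed family-wise significance level
m = numel(p_values);       % number of comparisons in the family (here 8)

% Bonferroni: compare each raw p-value with the stricter threshold alpha/m
significant = p_values < alpha / m;

% Equivalent form: Bonferroni-adjusted p-values, capped at 1
p_adjusted = min(p_values * m, 1);

disp(p_adjusted)
disp(significant)
```

Only the (1,1) entry survives here: 0.02 and 0.01 would be below 0.05 on their own, but not after multiplying by m = 8. This loss of power as the number of comparisons grows is a commonly cited drawback of the Bonferroni correction.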

Please explain the meaning and role of dimensions nMeasures and nArrhythmias in your data set. Please provide sample data files. I am asking for this info in order to provide more useful advice, including on the potential usefulness of a Bonferroni correction.

1 Comment

William Rose on 6 Jan 2025

@Elzbieta,

Here is an example of how you can test the hypothesis H0: the distribution of feature(:,j,k) is the same for window lengths 5, 10, and 20 (where j=1:nMeasures and k=1:nArrhythmias). For this example, I assume nWindows=5 for all three files, nMeasures=4 (for example, the features could include RR interval, QRS duration, and QT interval), and nArrhythmias=2. (If nWindows is different for the different files, we will need to adjust the code. If nMeasures or nArrhythmias is different for different files, then please explain the data sets and your goals as clearly as possible.) In this example, random numbers are used. The random numbers are all ~U(0,1), except for feature(:,1,1) when window length=5. In that case only, the random numbers are ~U(2,3). Therefore I expect a significant p-value (p<0.05) when j,k=1,1, and no significant difference for all other j,k.

% Make simulated data files
feature=rand(5,4,2); feature(:,1,1)=feature(:,1,1)+2;
save('features_5s.mat','feature');
feature=rand(5,4,2);
save('features_10s.mat','feature');
feature=rand(5,4,2);
save('features_20s.mat','feature');
% Load the data from the files
files={'features_5s.mat','features_10s.mat','features_20s.mat'};
nFile=numel(files);
for fileIndex=1:nFile
    load(files{fileIndex},'feature');
    feature4d(:,:,:,fileIndex)=feature;
end
[nWindows,nMeasures,nArrhythmias,nF] = size(feature4d);
fprintf('Size(feature4d)=%d,%d,%d,%d.\n',size(feature4d))
Size(feature4d)=5,4,2,3.
p_values=zeros(nMeasures,nArrhythmias);  % allocate array
for j=1:nMeasures
    for k=1:nArrhythmias
        p_values(j,k)=kruskalwallis(squeeze(feature4d(:,j,k,:)),[],'off');
    end
end
disp(p_values)
    0.0092    0.8781
    0.4025    0.4724
    0.7634    0.6907
    0.9900    0.2299

Note that p_values(1,1) is <0.05, and all the other p_values are >0.05. This is what we expected, based on the simulated data (see above). p_values(j,k) is the probability, assuming the null hypothesis is true, of obtaining a Kruskal-Wallis statistic at least as large as the one observed; it is not the probability that the null hypothesis is true. The null hypothesis is H0: feature(:,j,k) has the same distribution for files 1, 2, and 3.
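If nWindows differs between the files, they can no longer be stacked into the 4-D array `feature4d`, so the matrix form of `kruskalwallis` does not apply. One way to adjust, sketched below under the assumption that nMeasures and nArrhythmias are the same in every file, is to stack all windows for a given (j,k) into one vector and pass a matching group vector of window lengths. The simulated 12/6/3 windows mirror the test files in the answer above; `winLen`, `features`, and `nPerFile` are illustrative names, not part of the original script.

```matlab
% Simulated data: 12, 6 and 3 windows for window lengths 5, 10 and 20 s.
% nMeasures = 4 and nArrhythmias = 2 are assumed equal in every file.
winLen   = [5 10 20];                            % window length of each file (s)
features = {rand(12,4,2), rand(6,4,2), rand(3,4,2)};
features{1}(:,1,1) = features{1}(:,1,1) + 2;     % shift (1,1) for 5 s windows only
% With real files, each cell could instead be filled by
%   S = load(files{f}, 'feature');  features{f} = S.feature;

[~, nMeasures, nArrhythmias] = size(features{1});
nPerFile = cellfun(@(F) size(F, 1), features(:));  % windows per file: [12; 6; 3]
group    = repelem(winLen(:), nPerFile);           % one window-length label per value

p_values = zeros(nMeasures, nArrhythmias);
for j = 1:nMeasures
    for k = 1:nArrhythmias
        % Stack feature(:,j,k) from every file into one column vector
        data = cell2mat(cellfun(@(F) F(:, j, k), features(:), 'UniformOutput', false));
        p_values(j, k) = kruskalwallis(data, group, 'off');
    end
end
disp(p_values)
```

Because the group vector carries the labels, the groups may have different sizes. As in the 4-D version, p_values(1,1) should come out small; the other entries will usually be above 0.05, although with several tests an occasional chance rejection is possible.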
