When applying the Simple Filter Approach (t-test) for feature selection, if all features have p-values of 0, does it mean that all features have strong discrimination power?

Question

Hussein on 20 Apr 2024

Direct link to this question

https://se.mathworks.com/matlabcentral/answers/2109536-when-applying-the-simple-filter-approach-t-test-for-feature-selection-if-all-features-have-p-valu

Commented: Hussein on 9 May 2024

Hello all,

I have a frequency response function (FRF) dataset related to pipeline SHM, stored in a 6500x4000 matrix (6500 samples (signals) with 4000 features each). The dataset corresponds to 11 groups or class labels (pipeline conditions): 1500 samples labeled as 'Fault-free', and 500 samples each labeled as 'BL_C1', 'BL_C2', 'BL_C3', 'BL_C4', 'SD_C1', 'SD_C2', 'SD_C3', 'SC_C1', 'SC_C2', and 'SC_C3'.

I used this code for feature selection using Simple Filter Approach (t-test):

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

% applying t.test for feature selection
% Define the class labels and sample counts
class_labels = {'Fault-free', 'BL_C1', 'BL_C2', 'BL_C3', 'BL_C4', 'SD_C1', 'SD_C2', 'SD_C3', 'SC_C1', 'SC_C2', 'SC_C3'};
sample_counts = [1500, 500, 500, 500, 500, 500, 500, 500, 500, 500, 500];
%sample_counts = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]; % using the average signal for each scenario
% Construct the 'groups' variable vector based on the class labels and sample counts
num_samples = sum(sample_counts);
groups = zeros(num_samples, 1);
start_idx = 1;
for i = 1:length(class_labels)
    end_idx = start_idx + sample_counts(i) - 1;
    groups(start_idx:end_idx) = i;
    start_idx = end_idx + 1;
end
% Applying the Simple Filter Approach (t-test)
t_scores = zeros(1, size(data, 2));
p_values = zeros(1, size(data, 2));
alpha = 0.05;
for feature = 1:size(data, 2)
    [h, p, ci, stats] = ttest2(data(:, feature), groups, 'Vartype', 'unequal');
    t_scores(feature) = stats.tstat;
    p_values(feature) = p;
end
% Select features based on p-values below the significance level
selected_features = find(p_values < alpha);
ecdf(p);
xlabel('P value');
ylabel('CDF value')

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

The code returned a p-value of 0 for all features. I also tried using the average signal for each scenario, reducing the dataset from 6500x4000 to 11x4000 (11 samples (signals) representing the 11 conditions, again with 4000 features each), but p-values of 0 were still returned.

Is this acceptable?

Does it mean that all features have strong discrimination power? I doubt it, to be honest!

Can anyone clear up the doubt, correct the code if I'm wrong somewhere, or help me with code for a better technique that works well with my dataset?

Thank you very much in advance!

Answer 1

Ayush Aniket on 7 May 2024

Direct link to this answer

https://se.mathworks.com/matlabcentral/answers/2109536-when-applying-the-simple-filter-approach-t-test-for-feature-selection-if-all-features-have-p-valu#answer_1453676

Hi Hussein,

The t-test is traditionally used to compare the means between two groups. Your dataset involves 11 groups, which suggests that a one-way ANOVA (Analysis of Variance) might be more appropriate for comparing means across multiple groups.

Additionally, the 'ttest2' function is designed for comparing the means of two independent samples. In your code, you're comparing 'data(:, feature)' against 'groups', which is conceptually incorrect because 'groups' is not a dataset but a vector of class labels. For feature selection in a multi-class scenario, you would typically compare features across pairs of groups or use techniques designed for multi-class discrimination.
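To make this concrete, here is a small synthetic sketch (the data and sizes are made up for illustration and are not the FRF dataset). A feature that is pure noise is first tested against the label vector, as in the question, and then between two classes. The first call mostly measures how far the feature values lie from the integers 1 to 11, so it tends to report a p-value near 0 no matter how uninformative the feature is.

```matlab
% Synthetic illustration (assumed data, not the FRF dataset)
rng(0);                                    % reproducible random numbers
groups  = repelem((1:11)', 100);           % 11 classes, 100 samples each
feature = 0.01 * randn(numel(groups), 1);  % pure noise: carries no class information

% What the question's code does: feature values vs. the label integers 1..11
[~, p_wrong] = ttest2(feature, groups, 'Vartype', 'unequal');

% A genuine two-class comparison: class 1 values vs. class 2 values
[~, p_right] = ttest2(feature(groups == 1), feature(groups == 2), 'Vartype', 'unequal');

fprintf('feature vs. labels:  p = %g\n', p_wrong);
fprintf('class 1 vs. class 2: p = %g\n', p_right);
```

The tiny value of p_wrong says nothing about discrimination power; it only reflects that the feature and the labels are on different scales.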

The correct approach for comparing two different groups is as follows:

% Define the two groups based on your binary class labels
group1_idx = groups == 1; % Indices for class 1
group2_idx = groups == 2; % Indices for class 2
% Preallocate arrays for t-scores and p-values
t_scores = zeros(1, size(data, 2));
p_values = zeros(1, size(data, 2));
% Loop through each feature to perform t-test
for feature = 1:size(data, 2)
    [h, p, ci, stats] = ttest2(data(group1_idx, feature), data(group2_idx, feature), 'Vartype', 'unequal');
    t_scores(feature) = stats.tstat;
    p_values(feature) = p;
end
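
If you want to stay with t-tests for all 11 classes, one possible extension is to repeat the comparison for every pair of classes. The sketch below uses small synthetic stand-in data; the sizes, the shifted feature, and the Bonferroni-style threshold are assumptions chosen for illustration, not a recommendation specific to your dataset. It relies on 'ttest2' testing each column separately when given matrices.

```matlab
% Sketch: pairwise Welch t-tests over all class pairs (synthetic stand-in data)
rng(1);
num_classes  = 4;
per_class    = 50;
num_features = 6;
groups = repelem((1:num_classes)', per_class);
data   = randn(numel(groups), num_features);
data(groups == 3, 2) = data(groups == 3, 2) + 5;  % feature 2 separates class 3

pairs    = nchoosek(1:num_classes, 2);            % every unordered class pair
p_values = zeros(size(pairs, 1), num_features);
for k = 1:size(pairs, 1)
    x = data(groups == pairs(k, 1), :);
    y = data(groups == pairs(k, 2), :);
    % With matrix inputs, ttest2 runs one test per column (feature)
    [~, p_values(k, :)] = ttest2(x, y, 'Vartype', 'unequal');
end

% Bonferroni-style threshold: one simple, conservative way to account
% for running many tests at once
alpha    = 0.05 / numel(p_values);
min_p    = min(p_values, [], 1);                  % smallest p over all pairs, per feature
selected = find(min_p < alpha);
```

Here a feature is kept if it separates at least one pair of classes; other rules (for example, requiring separation for several pairs) are equally possible.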

You may refer to the MATLAB documentation to read more about the arguments of the 'ttest2' function and about one-way ANOVA, which should be more suitable for your analysis.
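
As a rough sketch of the one-way ANOVA route, the loop below calls 'anova1' (Statistics and Machine Learning Toolbox) once per feature with the display turned off and ranks the features by p-value. The data here are synthetic and the sizes are assumptions; with your dataset you would use your own 'data' and 'groups' variables instead.

```matlab
% Sketch: one-way ANOVA per feature (synthetic stand-in data)
rng(2);
num_classes  = 11;
per_class    = 30;
num_features = 5;
groups = repelem((1:num_classes)', per_class);
data   = randn(numel(groups), num_features);
data(:, 1) = data(:, 1) + groups;        % feature 1: mean depends on the class

p_values = zeros(1, num_features);
for feature = 1:num_features
    % 'off' suppresses the ANOVA table and box plot figures
    p_values(feature) = anova1(data(:, feature), groups, 'off');
end

% Smaller p-value = stronger evidence that class means differ
[~, ranking] = sort(p_values);
disp(ranking);
```

Note that ANOVA assumes roughly normal data with similar variances across groups, so it is worth checking whether that is reasonable for FRF features before relying on the ranking.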

Hope it helps.

Hussein on 9 May 2024

@Ayush Aniket

It really helps. Thank you very much for your clarification and for introducing the ANOVA technique. Greatly appreciated.
