ECG Segmentation and Classification using PQRST

Hello MATLAB community,
I'm working on ECG classification using MATLAB, and my data is stored in CSV format with 13 columns (time + 12 leads). I'm interested in extracting PQRST segments using the segmentation method described in this MATLAB code: ECG Segmentation and Filtering.
I have a few questions regarding preprocessing:
  • Should I apply low-pass filters to all recordings to enhance signal quality?
  • I intend to retain various types of noise in the data to ensure model robustness. What are your recommendations on managing noise while preserving the integrity of PQRST segments?
Additionally, I'm curious about the segmentation approach:
  • Should I perform segmentation separately for each of the 12 leads, or should I merge them before segmentation? What would be the pros and cons of each approach?
Here is an example of the ECG data:
Any insights or suggestions on adapting the segmentation method and handling data preprocessing would be greatly appreciated. Thank you!

 Accepted Answer

Should I apply low-pass filters to all recordings to enhance signal quality?
ECG frequencies fall in the range of 0.5 to 150 Hz. If your sampling rate is above 300 Hz, you should first apply a lowpass filter with a cutoff frequency of 150 Hz. However, you should always look at the spectrum of your signal to check whether it contains unwanted artifacts. If you decide to filter your signals, you must filter them all with the same filter to maintain uniformity in your results.
What are your recommendations on managing noise while preserving the integrity of PQRST segments?
It all comes down to obtaining a clean spectrum. Take a look at the following code.
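As a rough sketch of what "the same filter for all signals" can look like, the snippet below designs one lowpass filter and applies it to every column of a matrix. The 12-column synthetic matrix, the 500 Hz sample rate, and the 8th-order Butterworth design are assumptions made for illustration; they are not taken from the original data.

```matlab
% Synthetic stand-in for a 12-lead recording (assumption: one column per lead).
fs = 500;                              % assumed sample rate in Hz
t = (0:5*fs-1)'/fs;                    % 5 seconds of samples
ref = sin(2*pi*1.2*t);                 % slow in-band component
X = ref * ones(1, 12);                 % same component copied into 12 leads
X = X + 0.2*sin(2*pi*200*t);           % out-of-band 200 Hz interference

% Design one lowpass filter and reuse it for every lead.
d = designfilt('lowpassiir', 'FilterOrder', 8, ...
    'HalfPowerFrequency', 150, 'SampleRate', fs);
Xf = filtfilt(d, X);                   % zero-phase filtering, column by column

% Largest deviation from the in-band component, ignoring 1 s at each edge.
residual = max(abs(Xf(fs:end-fs, 1) - ref(fs:end-fs)));
```

Using one filter object for all leads keeps the passband and phase behaviour identical across recordings, which is the uniformity point made above.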
ecg = ecg_signal; % One lead only
fs = 125; % Sample rate
t = (0:numel(ecg)-1)/fs;
figure
subplot(2,1,1)
plot(t,ecg)
title('Raw ECG')
xlabel('Time (s)')
ylabel('Amplitude')
xlim([0 5])
subplot(2,1,2)
pspectrum(ecg,fs,'Leakage',1)
title('Spectrum')
This ECG seems to have line noise, based on the spike at 60 Hz. The other spikes come from the fundamental frequency (FF) of the ECG, which typically ranges from 1 to 1.67 Hz (60 to 100 bpm). The frequency at which the first spike appears is the FF, and it can be used to obtain the heart rate (HR = 60*FF). The remaining spikes are harmonics of the FF: copies of the first spike spaced across the spectrum at uniform intervals equal to the FF.
In other words, the FF in the figure above is 1.5 Hz, so the next spikes appear at 3, 4.5, 6 Hz and so on. Their amplitude should decrease as the frequency increases. These harmonics are part of the ECG itself, so you must not filter them out.
For the 60 Hz line noise seen here, it is recommended to apply either a lowpass filter with a cutoff frequency below 60 Hz or a bandstop filter that rejects only this unwanted frequency.
To compare both options:
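The relation HR = 60*FF can be checked with a small sketch. The signal below is a crude synthetic pulse train, not a real ECG, and the 0.5 to 1.9 Hz search band is an assumption chosen so that only the fundamental (and not its first harmonic) falls inside it.

```matlab
fs = 125;                              % sample rate used in the example above
t = (0:60*fs-1)'/fs;                   % 60 s of data
ff_true = 1.5;                         % assumed beat rate in Hz (90 bpm)

% Crude synthetic "ECG": one narrow Gaussian pulse per beat.
phase = mod(t*ff_true, 1) - 0.5;       % position inside each beat
ecg = exp(-(phase/0.03).^2);

% Power spectrum of the mean-removed signal.
[pxx, f] = periodogram(ecg - mean(ecg), [], [], fs);

% Strongest peak inside the assumed search band.
band = f >= 0.5 & f <= 1.9;
fb = f(band);
[~, k] = max(pxx(band));
ff = fb(k);                            % estimated fundamental frequency in Hz
hr = 60*ff;                            % heart rate in beats per minute
fprintf('FF = %.2f Hz, HR = %.0f bpm\n', ff, hr);
```

On real recordings the heart rate varies over time, so a spectral estimate like this only gives an average over the analysed window.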
ecg_filt = lowpass(ecg,50,fs);
figure
subplot(2,1,1)
plot(t,ecg_filt)
title('Filtered ECG (Lowpass fc = 50 Hz)')
xlabel('Time (s)')
ylabel('Amplitude')
xlim([0 5])
subplot(2,1,2)
pspectrum(ecg_filt,fs,'Leakage',1)
title('Spectrum')
ecg_filt2 = bandstop(ecg,[59 61],fs);
figure
subplot(2,1,1)
plot(t,ecg_filt2)
title('Filtered ECG (Bandstop fp = 60 Hz)')
xlabel('Time (s)')
ylabel('Amplitude')
xlim([0 5])
subplot(2,1,2)
pspectrum(ecg_filt2,fs,'Leakage',1)
title('Spectrum')
As you can see, after filtering with either type of filter, the quality of the signal improves a lot, and the PQRST waves appear properly in both cases (this is a healthy subject). It's up to you to determine whether your signals need extra filtering or not.
Should I perform segmentation separately for each of the 12 leads, or should I merge them before segmentation? What would be the pros and cons of each approach?
You should apply the segmentation individually to each of the leads you are interested in. Each lead corresponds to a different electrode placement when the ECG is captured, and these changes in position translate into different information from each lead. There are no pros, only cons, to merging the 12 leads into one signal: you lose relevant information. In certain pathologies, some waves have abnormal amplitudes, and these changes are best appreciated only in certain leads. This is why the ECG is taken with many leads in the first place.
Apart from that, you should keep the segmentation procedure uniform so that your segmentations come out consistent.
Regards,
Diego.

9 Comments

Thanks a lot @Diego Caro and @Umar
In the present dataset, there are four types of noise (electrode problems, burst noise, baseline drift, and static noise). Each lead contains some of these (e.g., Lead I contains burst noise, baseline drift, or both). Should I remove this noise? Does it affect the signal quality? Is it recommended to leave it in so that the model is trained on data of realistic quality, resulting in a more robust model? What are your recommendations?
Thanks a lot,
Hi Rawaa,
Biomedical signal processing requires careful consideration when it comes to handling noise. The accuracy and reliability of the analysis depend on effectively managing different types of noises that can affect the signal quality.
When it comes to noises, there are several types that need to be addressed. Electrode problems can lead to artifacts in the signal, burst noise can cause sudden spikes or disturbances, baseline drift can obscure the actual signal dynamics, and static noise may interfere with the signal of interest.
Removing noises from the dataset can significantly enhance the signal quality by reducing interference and improving subsequent analysis. However, keeping some noises in the dataset can help the model adapt to real-world scenarios, potentially making it more robust during deployment.
It's important to consider applying noise reduction techniques tailored to each type of noise before training the model. For example, using median filtering for burst noise removal or high-pass filtering for baseline drift can be effective strategies.
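A minimal sketch of those two strategies is shown below. The signal is a synthetic stand-in (a sine plus drift plus a short spike), and the window length of 9 samples, the 0.5 Hz cutoff, and the 4th-order Butterworth design are assumptions that would need tuning on real data. In particular, a median window that is too long can clip narrow QRS peaks.

```matlab
fs = 500;                                  % assumed sample rate in Hz
t = (0:10*fs-1)'/fs;
clean = sin(2*pi*8*t);                     % stand-in for ECG content (not a real ECG)
drift = 0.8*sin(2*pi*0.1*t);               % slow baseline drift
x = clean + drift;
x(1000:1002) = x(1000:1002) + 5;           % short burst artifact

% Median filter: suppresses spikes shorter than about half the window.
x_med = medfilt1(x, 9);

% Zero-phase highpass at 0.5 Hz: removes drift below the ECG band.
d = designfilt('highpassiir', 'FilterOrder', 4, ...
    'HalfPowerFrequency', 0.5, 'SampleRate', fs);
x_hp = filtfilt(d, x_med);

% Worst-case error against the clean stand-in, ignoring 1 s at each edge.
err_before = max(abs(x(fs:end-fs) - clean(fs:end-fs)));
err_after = max(abs(x_hp(fs:end-fs) - clean(fs:end-fs)));
```

Comparing err_before and err_after on a signal with a known clean reference is one way to check that a chosen window length and cutoff are not distorting the content you want to keep.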
If retaining noises for robustness is preferred, augmenting the dataset by introducing synthetic noise variations during training can simulate real-world conditions. It's also crucial to evaluate model performance with and without noise removal to assess its impact on accuracy, robustness, and generalization.
Ultimately, a hybrid strategy that involves removing some noises while retaining others may be necessary to strike a balance between signal fidelity and model resilience. Experimentation with different approaches and thorough evaluation will help determine the optimal strategy for enhancing signal quality and model robustness based on specific requirements and desired characteristics of the model.
@Umar Thanks a lot,
So, the most recommended approach is a hybrid one where you retain some noise and remove the rest? For example, removing 20% of the noise from the dataset? I'm working with the PTB-XL dataset (PTB-XL 1.0.3).
Several leads have more than one type of noise. How should I handle this?
Do you have any ideas on how to visualize and detect static noise? It's still unclear to me as a noise type; how do I remove it? Is it advisable to apply a low-pass filter to the entire dataset? I haven't found any preprocessing details for this dataset in the literature. I've tried several architectures with the noisy data and reached 72% accuracy, and honestly, this is the first time I'm working with this type of data.
Hi Rawaa,
I was not aware of your project and dataset. However, when dealing with noise in the PTB-XL dataset, a hybrid approach can be beneficial. Retaining some noise while removing the rest, such as eliminating 20% of the noise, can help maintain data integrity.
To handle leads with multiple noise types, consider separating and treating each type individually for better noise reduction. Static noise can be visualized and detected with signal processing techniques such as spectrograms or wavelet transforms. Applying a low-pass filter to the dataset can help with noise suppression, but it's essential to analyze its impact on the data's characteristics.
I can relate to what it is like to deal with something new for the first time. After reading your thoughts, it sounds like you are on the right path, so I wish you good luck in your future endeavors. You also sound confident that you can accomplish your goals.
Let us know if we can assist you further.
@Umar Thank you for your kind words and support, as well as your advice and guidance. So, to visualize it, should I apply wavelet transforms? Are there any recommendations for this? I want to see the noise in the signal (for example, circles indicating where the noise is located).
Hi Rawaa,
Wavelet transforms are particularly useful for analyzing signals with non-stationary characteristics, making them suitable for detecting and visualizing noise in signals. For more information, please refer to https://www.mathworks.com/discovery/wavelet-transforms.html
To begin with, you can use the Continuous Wavelet Transform (CWT) or the Discrete Wavelet Transform (DWT) in MATLAB to analyze your signal. The CWT provides a time-frequency representation of the signal, which can help in identifying noise components that vary in time. On the other hand, the DWT decomposes the signal into different frequency bands, allowing you to analyze the signal at different scales.
So, as an example, generate a sample signal with noise. Then apply the DWT using the wavedec function with a specified wavelet and decomposition level.
By visualizing the DWT coefficients, you can identify significant coefficients that correspond to noise in the signal. Adjusting the threshold allows you to detect noise locations, which can be highlighted by plotting circles or markers on the signal plot.
Remember to fine-tune the wavelet type, decomposition level, and threshold based on the characteristics of your signal and the noise you are trying to detect. Experimenting with different wavelets and parameters can help optimize the visualization and detection of static noise in your signals.
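The steps above can be sketched roughly as follows. The test signal, the sym4 wavelet, the decomposition level of 3, and the threshold factor of 6 are all assumptions for illustration. The level-1 detail is rebuilt at full length with wrcoef so that flagged coefficients line up with sample indices.

```matlab
fs = 500;                                   % assumed sample rate in Hz
t = (0:4*fs-1)'/fs;
rng(0);                                     % reproducible noise
x = cos(2*pi*1.5*t) + 0.02*randn(size(t));  % smooth stand-in plus weak background noise
burst = 801:900;                            % samples that receive a noise burst
x(burst) = x(burst) + 0.5*randn(numel(burst), 1);

% Multilevel DWT, then reconstruct the level-1 detail at full signal length.
wname = 'sym4';
[c, l] = wavedec(x, 3, wname);
d1 = wrcoef('d', c, l, wname, 1);

% Robust noise scale from the median absolute deviation (assumed factor of 6).
sigma = median(abs(d1)) / 0.6745;
idx = find(abs(d1) > 6*sigma);              % indices of flagged samples

figure
plot(t, x)
hold on
plot(t(idx), x(idx), 'ro')                  % circles mark flagged samples
hold off
xlabel('Time (s)')
ylabel('Amplitude')
title('Samples flagged by level-1 detail threshold')
```

A level-1 detail mostly reacts to high-frequency noise. Slow artifacts such as baseline drift would show up in the coarser levels or the approximation instead, so the level to inspect depends on the noise type.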
Hello again @Umar
I use this method:
for m = 1:size(leads, 2)
denoised_leads(:, m) = wdenoise(leads(:, m), 'Wavelet', 'bior3.1', 'DenoisingMethod', 'SURE', 'ThresholdRule', 'Soft', 'NoiseEstimate', 'LevelDependent');
end
Here are some examples. Is this correct?
I also tried the level-independent method using sym4. Here is the result:
Thanks a lot
Hi Rawaa,
You asked whether this method is correct:
for m = 1:size(leads, 2)
denoised_leads(:, m) = wdenoise(leads(:, m), 'Wavelet', 'bior3.1', 'DenoisingMethod', 'SURE', 'ThresholdRule', 'Soft', 'NoiseEstimate', 'LevelDependent');
end
The code snippet you provided is correct for performing wavelet denoising in MATLAB using the 'wdenoise' function. It applies the 'bior3.1' wavelet, uses the 'SURE' denoising method, employs a 'Soft' threshold rule, and estimates noise in a level-dependent manner. The loop iterates over each column of the 'leads' matrix, denoising them individually. If your intention is to denoise each column of 'leads' using the specified wavelet and denoising parameters, then the code is appropriate.
Thanks for your help.
Is the result good?
I used both bior3.1 and sym4, and I want to choose the better of the two (there are examples for both).
Which one do you recommend?


More Answers (3)

Hi Rawaa,
For ECG data preprocessing, try to apply low-pass filters which will enhance signal quality by removing high-frequency noise. However, retaining some noise types can improve model robustness. To manage noise while preserving PQRST integrity, consider using adaptive filters or wavelet denoising techniques.
Regarding segmentation, you can choose to segment each of the 12 leads separately or merge them before segmentation. Segmentation per lead allows for lead-specific analysis but may require more computational resources. Merging leads simplifies processing but may overlook lead-specific characteristics. Experiment with both approaches to determine the best fit for your dataset and analysis goals.
Remember to validate the segmentation method's effectiveness by comparing results with ground truth annotations.
Hope this will help resolve your problem.
Hi Rawaa,
I have already read the technical articles for both of them, so let me delve into the specifics of the bior3.1 and sym4 wavelet families to help you make an informed decision.
bior3.1 Wavelet Family
The bior3.1 wavelet belongs to the biorthogonal wavelet family. Biorthogonal wavelets have the advantage of providing a good compromise between time and frequency localization. In MATLAB's biorNr.Nd naming, bior3.1 has three vanishing moments for the reconstruction (synthesis) wavelet and one for the decomposition (analysis) wavelet. Its short filters make it suitable for applications where preserving sharp transitions in the signal is essential, and it is known for capturing abrupt changes in the signal efficiently.
sym4 Wavelet Family
On the other hand, the sym4 wavelet is part of the Symlet wavelet family. Symlet wavelets are designed to be nearly symmetric (least asymmetric) with compact support, making them suitable for denoising applications and signal compression. The sym4 wavelet, in particular, has four vanishing moments in the wavelet function, allowing it to represent polynomial trends up to degree three accurately. It is known for its effectiveness in denoising applications where preserving signal smoothness is crucial.
Choosing the Best Wavelet Family
To determine the best wavelet family between bior3.1 and sym4, you need to consider the specific characteristics of your signal and the requirements of your signal processing task. If your signal contains sharp transitions or discontinuities that need to be preserved, the bior3.1 wavelet may be more suitable due to its ability to capture abrupt changes effectively. On the other hand, if your signal is smooth and you are focusing on denoising or compression tasks, the sym4 wavelet from the Symlet family might be a better choice.
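One hedged way to compare the two on your own data is to denoise a signal whose clean version is known and compute the SNR of each result. The sketch below uses a synthetic pulse train as the clean reference and white noise with an assumed standard deviation of 0.05; real ECG recordings have no clean reference, so this only works on simulated or carefully curated signals. The wdenoise options are the ones from the code posted earlier in this thread.

```matlab
fs = 500;                                   % assumed sample rate in Hz
t = (0:8*fs-1)'/fs;
rng(1);                                     % reproducible noise

% Stand-in signal: narrow pulses on a slow wave (not a real ECG).
phase = mod(1.2*t, 1) - 0.5;
clean = exp(-(phase/0.02).^2) + 0.2*sin(2*pi*1.2*t);
noisy = clean + 0.05*randn(size(clean));

wnames = {'bior3.1', 'sym4'};
snr_db = zeros(size(wnames));
for k = 1:numel(wnames)
    den = wdenoise(noisy, 'Wavelet', wnames{k}, ...
        'DenoisingMethod', 'SURE', 'ThresholdRule', 'Soft', ...
        'NoiseEstimate', 'LevelDependent');
    % SNR of the denoised output relative to the known clean reference.
    snr_db(k) = 10*log10(sum(clean.^2) / sum((clean - den).^2));
    fprintf('%-8s SNR = %.2f dB\n', wnames{k}, snr_db(k));
end

% SNR of the noisy input, for comparison with the denoised results.
snr_in = 10*log10(sum(clean.^2) / sum((clean - noisy).^2));
```

A higher SNR on a synthetic signal does not guarantee better preservation of clinically relevant wave shapes, so it is worth also inspecting the denoised PQRST morphology by eye.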
Again, the final decision needs to be made by you. I wish you good luck with this project, and I hope you get an A plus.

1 Comment

Hello @Umar
I tested different wavelets with different thresholds. I think bior3.9 with adaptive thresholding is the best one?


Hi @rawaa mejri,
What makes you think bior3.9 with adaptive thresholding is the best one? Please share your analysis in a brief answer.

4 Comments

Hello @Umar
As shown in the table, bior3.9 with adaptive thresholding has the highest SNR (39.73 dB), indicating excellent denoising performance.
Preservation of Signal Characteristics: A high SNR means that the preprocessing is effective in eliminating noise while preserving the essential characteristics of the ECG signal.
Comparison with Other Methods:
Other methods like bior2.8, db8, db4, and sym6 also show good results, with SNRs that are high but slightly lower than that of bior3.9.
The sym6 and sym8 methods with adaptive thresholding yield very close results (39.60 dB and 39.57 dB, respectively), which confirms that bior3.9 is slightly superior in this particular case.
Thanks for your response!
@rawaa mejri, this is what I was expecting from you. You did it. At MathWorks, my goal is to make sure that students who run into problems become capable of solving them on their own using clues from us. The more you research, experiment, and make mistakes, the more you will learn. However, if you still have any questions for me, please let me know; I will be more than happy to help.
Thank you very much. With your valuable assistance and advice, I was able to solve this problem.
As you confirmed the preprocessing, now I can proceed to the next step.
Thanks a lot
No problem @rawaa mejri, glad to help out. If you need further assistance with the next step, please don’t hesitate to ask for help.


Asked:

on 29 Jun 2024

Commented:

adi
on 16 Mar 2025
