MATLAB Based Algorithm Wins the 2017 PhysioNet/CinC Challenge to Automatically Detect Atrial Fibrillation

“I don’t think MATLAB has any strong competitors for signal processing and wavelet analysis. When you add in its statistics and machine learning capabilities, it’s easy to see why nonprogrammers enjoy using MATLAB, particularly for projects that require combining all these methods.”

Challenge

Design an algorithm that uses machine learning to detect atrial fibrillation and other abnormal heart rhythms in noisy, single-lead ECG recordings

Solution

Use MATLAB to analyze ECG data, extract features using signal processing and wavelet techniques, and evaluate different machine learning algorithms to train and implement a best-in-class classifier to detect AF

Results

  • First place in PhysioNet/CinC Challenge achieved
  • ECG data visualized in multiple domains
  • Feature extraction accelerated with parallel processing
Block diagram for Black Swan’s atrial fibrillation detection algorithm.

Every year, participants in the PhysioNet/Computing in Cardiology (CinC) Challenge compete to develop algorithms for patient monitoring and diagnostic applications that analyze physiological signals such as ECG recordings. In the 2017 challenge, participants needed to develop an algorithm that classifies single-lead ECG recordings into one of four categories: normal sinus rhythm, atrial fibrillation (AF), an alternative rhythm, or too noisy to be classified. An algorithm capable of consistently and accurately classifying single-lead recordings could improve clinical outcomes by making it easier to detect AF via handheld devices that could be used daily, rather than via tests conducted in a clinical setting. Early detection of AF is vital: according to PhysioNet, an estimated 12 million or more North Americans and Europeans have AF, which is associated with significant mortality and morbidity.

Placing first in the 2017 PhysioNet/CinC Challenge was “Black Swan,” an international team of researchers led by Morteza Zabihi, a biomedical engineering postgraduate at Tampere University of Technology, and Ali Bahrami Rad, a postdoctoral researcher at the University of Tampere (currently at Aalto University). The classification algorithms that Zabihi, Bahrami Rad, and colleagues developed in MATLAB® combine signal processing using wavelets, statistical analysis, and machine learning.

“One of the strengths of our solution is that we consider signal processing techniques as important as machine learning, and so MATLAB was a natural choice for us,” says Bahrami Rad. “MATLAB helped us implement our ideas as fast as possible even though our background is signal processing and electrical engineering, not programming.”

Challenge

Zabihi, Bahrami Rad, and their teammates needed to analyze ECG waveforms to detect AF. Clinical ECG recording systems capture waveforms via multiple leads placed close to the heart. Handheld ECG monitoring devices typically have a single lead and capture waveforms at or near the fingers, producing significantly noisier signal recordings. As a result, existing AF detection algorithms cannot be applied effectively to data collected with handheld devices. Further complicating the challenge, the ECG recordings provided by PhysioNet/CinC were significantly shorter than recordings normally used in AF detection.

The Black Swan team wanted to create a machine learning algorithm that used information about the P waves and QRS waveforms that characterize normal heartbeats in an ECG. The algorithm would incorporate waveform features gleaned from the dataset provided by PhysioNet/CinC to identify normal, AF, and other rhythms in ECG waveforms.

Solution

Zabihi, Bahrami Rad, and their team used MATLAB to visualize ECG data, extract and select features, evaluate machine learning algorithms, and implement a random forest classifier.

The team imported the challenge dataset (8528 single-lead ECG recordings with a sampling frequency of 300 Hz) into MATLAB and then used Signal Processing Toolbox™ and Wavelet Toolbox™ to analyze and process signals in time and frequency and to generate histograms, scatter plots, and spectrograms for studying the characteristics of the signals. They used kernel density estimation to examine probability densities, and searched for patterns in the detail coefficients at various levels of wavelet decomposition. They applied several filters to reduce high-frequency, low-frequency, and nonlinear noise.
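To give a flavor of the preprocessing step, the sketch below applies a zero-phase band-pass filter to a synthetic single-lead signal, suppressing baseline wander and high-frequency interference. It is a Python/SciPy analogue written for illustration only; the team worked in MATLAB, and the 0.5–40 Hz cutoffs and filter order here are assumptions, not the team's actual design.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass_ecg(x, fs=300.0, lo=0.5, hi=40.0, order=4):
    """Zero-phase Butterworth band-pass: attenuates baseline wander
    (below `lo` Hz) and high-frequency noise (above `hi` Hz).
    Cutoffs are illustrative, not the team's published settings."""
    nyq = fs / 2.0
    b, a = butter(order, [lo / nyq, hi / nyq], btype="band")
    return filtfilt(b, a, x)  # filtfilt -> no phase distortion

# Synthetic stand-in for an ECG: 1 Hz rhythm plus 60 Hz noise and slow drift
fs = 300.0                           # challenge sampling frequency
t = np.arange(0, 10, 1 / fs)
clean = np.sin(2 * np.pi * 1.0 * t)
noisy = clean + 0.5 * np.sin(2 * np.pi * 60.0 * t) \
              + 2.0 * np.sin(2 * np.pi * 0.05 * t)
filtered = bandpass_ecg(noisy, fs)
```

After filtering, the 1 Hz component passes nearly untouched while the drift and mains-frequency noise are strongly attenuated, which is the qualitative behavior one wants before feature extraction.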

Next, the team extracted a set of almost 500 features from the data, including the average intervals between R peaks, variance of P wave amplitudes, and slopes and angles of P, QRS, and T waves. Additional features were based on power spectral densities of each beat and on Shannon, Tsallis, and Rényi entropies of wavelet coefficients.
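The entropy-based features mentioned above can be illustrated with a short NumPy sketch. The normalization of wavelet-coefficient energies into a probability distribution and the entropy orders chosen here are assumptions for demonstration; only the Shannon, Tsallis, and Rényi definitions themselves come from the article.

```python
import numpy as np

def coeff_probabilities(c):
    """Normalize squared coefficient magnitudes into a probability
    distribution (an assumed, common normalization)."""
    p = np.abs(np.asarray(c, dtype=float)) ** 2
    return p / p.sum()

def shannon_entropy(p):
    p = p[p > 0]                       # avoid log(0)
    return -np.sum(p * np.log(p))

def tsallis_entropy(p, q=2.0):
    return (1.0 - np.sum(p ** q)) / (q - 1.0)

def renyi_entropy(p, alpha=2.0):
    return np.log(np.sum(p ** alpha)) / (1.0 - alpha)

# Example: entropies of a toy set of wavelet coefficients
p = coeff_probabilities([1.0, -2.0, 3.0, 0.5])
features = [shannon_entropy(p), tsallis_entropy(p), renyi_entropy(p)]
```

For a flat distribution all three entropies reach their maxima (Shannon and Rényi both equal log n), so lower values flag more structured, less noise-like coefficient patterns.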

The team augmented these base-level features with meta-level features (e.g., class posterior probabilities) generated by linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and random forest classifiers from Statistics and Machine Learning Toolbox™.
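The meta-feature idea — appending class posterior probabilities from LDA, QDA, and random forest models to the base features — can be sketched as follows. This is a scikit-learn illustration on synthetic data, not the team's MATLAB implementation; the use of out-of-fold predictions is an assumed safeguard against label leakage.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict

# Synthetic stand-in: 300 recordings, 20 base features, 4 rhythm classes
X, y = make_classification(n_samples=300, n_features=20, n_informative=8,
                           n_redundant=0, n_classes=4, random_state=0)

base_models = [LinearDiscriminantAnalysis(),
               QuadraticDiscriminantAnalysis(),
               RandomForestClassifier(n_estimators=50, random_state=0)]

# Out-of-fold posterior probabilities (one column per class per model)
meta = np.hstack([cross_val_predict(m, X, y, cv=5, method="predict_proba")
                  for m in base_models])

X_aug = np.hstack([X, meta])  # base-level + meta-level feature matrix
```

With four classes and three base models this adds twelve posterior-probability columns, giving the downstream classifier access to each base model's confidence as well as the raw features.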

To reduce processing times during feature extraction, the team used Parallel Computing Toolbox™ to perform the computations concurrently on a workstation with 48 processor cores.

Next, the team used a second random forest classifier to rank the extracted features, based on out-of-bag error analysis and the cross-entropy impurity measure at each node. From the ranked list, they selected a subset of 150 features for classifier training.
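A minimal sketch of this ranking step, using scikit-learn on synthetic data: a forest is trained with out-of-bag scoring enabled, features are ranked by impurity-based importance, and only the top of the ranking is kept. The dataset sizes and the cut at 15 features are illustrative stand-ins for the team's ~500 features and selected 150.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in: 50 candidate features, only 10 of them informative
X, y = make_classification(n_samples=400, n_features=50, n_informative=10,
                           n_redundant=0, random_state=0)

# bootstrap sampling enables out-of-bag (OOB) error estimation;
# impurity-based importances come for free with the fitted forest
rf = RandomForestClassifier(n_estimators=200, oob_score=True,
                            bootstrap=True, random_state=0).fit(X, y)

ranking = np.argsort(rf.feature_importances_)[::-1]   # best feature first
top_k = ranking[:15]                                  # keep top 15 here
X_selected = X[:, top_k]
```

The OOB score gives an unbiased sanity check on the forest itself, while the importance ranking drives the actual pruning of the feature set.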

They used Statistics and Machine Learning Toolbox and Deep Learning Toolbox™ to evaluate the performance of several machine learning algorithms, including LDA, logistic regression, support vector machine (SVM), neural networks, and a random forest classifier. The trained random forest classifier yielded the best results for the features they had extracted, enabling them to achieve an overall score of 81.9% (based on an F1 score) on the training dataset with 10-fold cross-validation.
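The evaluation protocol — 10-fold cross-validation with an F1-based score — can be sketched in a few lines. This scikit-learn illustration uses synthetic data and macro-averaged F1 as an assumed stand-in for the challenge's official scoring formula.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic 4-class stand-in for the normal / AF / other / noisy labels
X, y = make_classification(n_samples=500, n_features=30, n_informative=10,
                           n_redundant=0, n_classes=4, random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0)

# Macro-averaged F1 over 10 stratified folds, one score per fold
scores = cross_val_score(rf, X, y, cv=10, scoring="f1_macro")
mean_f1 = scores.mean()
```

Averaging the per-fold F1 scores gives a single cross-validated figure of merit, comparable in spirit to the 81.9% the team reported on the training data.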

Results

  • First place in PhysioNet/CinC Challenge achieved. The team’s classifier achieved an overall score of 82.6% on previously unseen testing data. This score earned them a first-place tie with three other teams.
  • ECG data visualized in multiple domains. “MATLAB made it very easy to visualize the data we were working with by plotting the signals in different domains,” Zabihi says. “Those visualizations provided us with insights that we needed in planning feature extraction and our next steps.”
  • Feature extraction accelerated with parallel processing. “Performing feature extraction in parallel on a multicore workstation with the parfor function saved us a lot of time,” says Zabihi. “That was important to us because we were working on several other projects at the same time and I was finalizing my Ph.D. thesis, so we had to make the most of our available time for the competition.”

Acknowledgements

The team would like to thank Professor Moncef Gabbouj, Professor Serkan Kiranyaz, and Professor Aggelos K. Katsaggelos for their support and encouragement during the competition.