
mspeaks

Convert raw peak data to peak list (centroided data)

Syntax

Peaklist = mspeaks(X, Intensities)
[Peaklist, PFWHH] = mspeaks(X, Intensities)
[Peaklist, PFWHH, PExt] = mspeaks(X, Intensities)
mspeaks(X, Intensities, ...'Base', BaseValue, ...)
mspeaks(X, Intensities, ...'Levels', LevelsValue, ...)
mspeaks(X, Intensities, ...'NoiseEstimator', NoiseEstimatorValue, ...)
mspeaks(X, Intensities, ...'Multiplier', MultiplierValue, ...)
mspeaks(X, Intensities, ...'Denoising', DenoisingValue, ...)
mspeaks(X, Intensities, ...'PeakLocation', PeakLocationValue, ...)
mspeaks(X, Intensities, ...'FWHHFilter', FWHHFilterValue, ...)
mspeaks(X, Intensities, ...'OverSegmentationFilter', OverSegmentationFilterValue, ...)
mspeaks(X, Intensities, ...'HeightFilter', HeightFilterValue, ...)
mspeaks(X, Intensities, ...'ShowPlot', ShowPlotValue, ...)
mspeaks(X, Intensities, ...'Style', StyleValue, ...)

Description

Peaklist = mspeaks(X, Intensities) finds relevant peaks in raw, noisy signal data and creates Peaklist, a two-column matrix containing the separation-axis value and intensity of each peak. X is a vector of separation-unit values for a set of signals with peaks. Intensities is a matrix of intensity values for one or more signals that share the same separation-unit range. When Intensities includes multiple signals, Peaklist is a cell array of matrices.

[Peaklist, PFWHH] = mspeaks(X, Intensities) returns PFWHH, a two-column matrix indicating the left and right locations of the full width at half height (FWHH) markers for each peak. For any peak not resolved at FWHH, mspeaks returns the peak shape extents instead. When Intensities includes multiple signals, PFWHH is a cell array of matrices.

[Peaklist, PFWHH, PExt] = mspeaks(X, Intensities) returns PExt, a two-column matrix indicating the left and right locations of the peak shape extents determined after wavelet denoising. When Intensities includes multiple signals, PExt is a cell array of matrices.

mspeaks(X, Intensities, ...'PropertyName', PropertyValue, ...) calls mspeaks with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Enclose each PropertyName in single quotation marks. Each PropertyName is case insensitive. These property name/property value pairs are as follows:

mspeaks(X, Intensities, ...'Base', BaseValue, ...) specifies the wavelet base.

mspeaks(X, Intensities, ...'Levels', LevelsValue, ...) specifies the number of levels for the wavelet decomposition.

mspeaks(X, Intensities, ...'NoiseEstimator', NoiseEstimatorValue, ...) specifies the method to estimate the threshold, T, to filter out noisy components in the first high-band decomposition (y_h).

mspeaks(X, Intensities, ...'Multiplier', MultiplierValue, ...) specifies the threshold multiplier constant.

mspeaks(X, Intensities, ...'Denoising', DenoisingValue, ...) controls the use of wavelet denoising to smooth the signal. Choices are true (default) or false.

mspeaks(X, Intensities, ...'PeakLocation', PeakLocationValue, ...) specifies the proportion of the peak height to use to select the points used to compute the centroid separation-axis value of the respective peak. PeakLocationValue must be a value ≥ 0 and ≤ 1. Default is 1.0.

mspeaks(X, Intensities, ...'FWHHFilter', FWHHFilterValue, ...) specifies the minimum full width at half height (FWHH), in separation units, for reported peaks. Peaks with FWHH below this value are excluded from the output list Peaklist.

mspeaks(X, Intensities, ...'OverSegmentationFilter', OverSegmentationFilterValue, ...) specifies the minimum distance, in separation units, between neighboring peaks. When the signal is not smoothed appropriately, multiple maxima can appear to represent the same peak. Increase this filter value to join oversegmented peaks into a single peak.

mspeaks(X, Intensities, ...'HeightFilter', HeightFilterValue, ...) specifies the minimum height for reported peaks. Peaks with heights below this value are excluded from the output list Peaklist.

mspeaks(X, Intensities, ...'ShowPlot', ShowPlotValue, ...) controls the display of a plot of the original and the smoothed signal, with the peaks included in the output matrix Peaklist marked.

mspeaks(X, Intensities, ...'Style', StyleValue, ...) specifies the style for marking the peaks in the plot.

mspeaks finds peaks in data from any separation technique that produces signal data, such as spectroscopy, nuclear magnetic resonance (NMR), electrophoresis, chromatography, or mass spectrometry.

Input Arguments

X

Vector of separation-unit values for a set of signals with peaks. The number of elements in the vector equals the number of rows in the matrix Intensities. The separation unit can quantify wavelength, frequency, distance, time, or m/z depending on the instrument that generates the signal data.

Intensities

Matrix of intensity values for a set of signals that share the same separation-unit range. Each row corresponds to a separation-unit value, and each column corresponds to one signal, for example one spectrum or the spectrum recorded at one retention time. The number of rows equals the number of elements in vector X.

BaseValue

Integer from 2 to 20 that specifies the wavelet base.

Default: 4

LevelsValue

Integer from 1 to 12 that specifies the number of levels for the wavelet decomposition.

Default: 10

NoiseEstimatorValue

Character vector, string, or scalar that specifies the method to estimate the threshold, T, to filter out noisy components in the first high-band decomposition (y_h). Choices are:

  • mad — Default. Median absolute deviation, which calculates T = sqrt(2*log(n))*mad(y_h)/0.6745, where n is the number of rows in the Intensities matrix.

  • std — Standard deviation, which calculates T = std(y_h).

  • A positive real value, which mspeaks uses directly as the threshold T.
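As an illustration of the choices above, the following Python sketch computes T from a list of first-level detail coefficients y_h. It is not the mspeaks implementation. The helper name noise_threshold is hypothetical. The mad branch follows the prose above literally and computes the median absolute deviation. MATLAB's own mad function defaults to the mean absolute deviation. The std branch uses the sample standard deviation (N-1 normalization), matching MATLAB's std default. Scaling the result by MultiplierValue is an assumption about how the multiplier is applied.

```python
import math
import statistics


def noise_threshold(y_h, n, estimator="mad", multiplier=1.0):
    """Hypothetical helper: threshold T for first-level detail coefficients y_h.

    estimator is "mad", "std", or a positive number used directly as T.
    n is the signal length (the number of rows in Intensities).
    """
    if estimator == "mad":
        center = statistics.median(y_h)
        # Median absolute deviation of the detail coefficients.
        mad = statistics.median([abs(v - center) for v in y_h])
        t = math.sqrt(2 * math.log(n)) * mad / 0.6745
    elif estimator == "std":
        # Sample standard deviation (N-1 normalization), like MATLAB's std.
        t = statistics.stdev(y_h)
    elif isinstance(estimator, (int, float)) and estimator > 0:
        t = float(estimator)
    else:
        raise ValueError("estimator must be 'mad', 'std', or a positive number")
    # Assumption: MultiplierValue scales the threshold.
    return multiplier * t
```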

MultiplierValue

Positive real value that specifies the threshold multiplier constant.

Default: 1.0

DenoisingValue

Controls the use of wavelet denoising to smooth the signal. Choices are true (default) or false.

Tip

If your data was previously smoothed, for example, with the mslowess or mssgolay function, you do not need to use wavelet denoising. Set this property to false.

PeakLocationValue

Value that specifies the proportion of the peak height to use to select the points to compute the centroid separation-axis value of the respective peak. The value must be ≥ 0 and ≤ 1.

Note

When PeakLocationValue = 1.0, the peak location is at the maximum of the peak. When PeakLocationValue = 0, mspeaks computes the peak location with all the points from the closest minimum to the left of the peak to the closest minimum to the right of the peak.

Default: 1.0
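The Python sketch below shows one way to read PeakLocationValue. It starts at the peak maximum and collects the contiguous points whose intensity is at least PeakLocationValue times the peak height, without passing the peak extents. It then returns the intensity-weighted centroid of those points. The helper name centroid_location, the contiguity rule, and the intensity weighting are all assumptions. They were chosen so that the two limits in the note above hold: 1.0 gives the location of the maximum, and 0 uses every point between the neighboring minima when intensities are nonnegative. mspeaks itself may compute the centroid differently.

```python
def centroid_location(x, y, apex, left, right, peak_location=1.0):
    """Hypothetical helper: centroid separation-axis value of one peak.

    apex is the index of the peak maximum; left and right are the indices
    of the closest minima (the peak extents). peak_location is in [0, 1].
    """
    if not 0.0 <= peak_location <= 1.0:
        raise ValueError("peak_location must be in [0, 1]")
    level = peak_location * y[apex]
    # Walk outward from the apex while points stay at or above the level,
    # never passing the extents.
    lo = apex
    while lo > left and y[lo - 1] >= level:
        lo -= 1
    hi = apex
    while hi < right and y[hi + 1] >= level:
        hi += 1
    weights = y[lo:hi + 1]
    total = sum(weights)
    if total <= 0:
        return x[apex]  # degenerate case: fall back to the maximum
    # Intensity-weighted mean of the selected separation-axis values.
    return sum(xi * wi for xi, wi in zip(x[lo:hi + 1], weights)) / total
```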

FWHHFilterValue

Nonnegative real value that specifies the minimum full width at half height (FWHH), in separation units, for reported peaks. Peaks with FWHH below this value are excluded from the output list Peaklist.

Default: 0

OverSegmentationFilterValue

Nonnegative real value that specifies the minimum distance, in separation units, between neighboring peaks. When the signal is not smoothed appropriately, multiple maxima can appear to represent the same peak. Increase this filter value to join oversegmented peaks into a single peak.

Default: 0
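Here is a minimal Python sketch of the idea behind OverSegmentationFilterValue, assuming peaks are given as (location, height) pairs. Any peak closer than the minimum distance to the previously kept peak is treated as part of that peak, and only the taller maximum survives. The helper merge_oversegmented and the keep-the-taller rule are hypothetical. The documentation states only that such peaks are joined into a single peak.

```python
def merge_oversegmented(peaks, min_distance):
    """Hypothetical helper: peaks is a list of (location, height) pairs."""
    merged = []
    for loc, height in sorted(peaks):
        if merged and loc - merged[-1][0] < min_distance:
            # Too close to the previous peak: treat both as one peak and
            # keep the taller maximum (an assumed merging rule).
            if height > merged[-1][1]:
                merged[-1] = (loc, height)
        else:
            merged.append((loc, height))
    return merged
```

With the default of 0, no peaks are merged, because the distance between sorted neighbors is never negative.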

HeightFilterValue

Nonnegative real value that specifies the minimum height for reported peaks. Peaks with heights below this value are excluded from the output list Peaklist.

Default: 0
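The height and FWHH filters are simple exclusion rules. In the Python sketch below, each peak is a hypothetical (location, height, fwhh_left, fwhh_right) tuple. A peak is dropped when its height, or its FWHH (right marker minus left marker), is below the corresponding filter value. The helper name filter_peaks is illustrative only.

```python
def filter_peaks(peaks, height_filter=0.0, fwhh_filter=0.0):
    """Hypothetical helper.

    peaks is a list of (location, height, fwhh_left, fwhh_right) tuples.
    Returns (location, height) pairs for peaks that pass both filters.
    """
    kept = []
    for loc, height, left, right in peaks:
        if height < height_filter:
            continue  # excluded by the height filter
        if right - left < fwhh_filter:
            continue  # excluded by the FWHH filter
        kept.append((loc, height))
    return kept
```

With both filters at their default of 0, every peak with a nonnegative height and width is kept.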

ShowPlotValue

Controls the display of a plot of the original signal and the smoothed signal, with the peaks included in the output matrix Peaklist marked. Choices are true, false, or I, an integer specifying the index of a spectrum in Intensities. If set to true, the first spectrum in Intensities is plotted. Default is:

  • false — When you specify return values.

  • true — When you do not specify return values.

StyleValue

Character vector or string specifying the style for marking the peaks in the plot. Choices are:

  • 'peak' (default) — Places a marker at the peak crest.

  • 'exttriangle' — Draws a triangle using the peak crest and the extents.

  • 'fwhhtriangle' — Draws a triangle using the peak crest and the FWHH points.

  • 'extline' — Places a marker at the peak crest and vertical lines at the extents.

  • 'fwhhline' — Places a marker at the peak crest and a horizontal line at FWHH.

Output Arguments

Peaklist

Two-column matrix where each row corresponds to a peak. The first column contains separation-unit values (indicating the location of peaks along the separation axis). The second column contains intensity values. When Intensities includes multiple signals, then Peaklist is a cell array of matrices, each containing a peak list.

PFWHH

Two-column matrix indicating the left and right locations of the full width at half height (FWHH) markers for each peak. For any peak not resolved at FWHH, mspeaks returns the peak shape extents instead. When Intensities includes multiple signals, PFWHH is a cell array of matrices.
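The Python sketch below makes the FWHH markers concrete. It walks outward from a peak maximum until the signal drops to half the peak height, then linearly interpolates the crossing on each side. If either side reaches a peak extent without dropping to half height, it returns the extents instead, following the fallback described above. Several choices here are assumptions: the helper name fwhh_markers, the linear interpolation, and applying the fallback to both sides together. The sketch also assumes a positive peak height and extents that are already known.

```python
def fwhh_markers(x, y, apex, left, right):
    """Hypothetical helper: separation-axis positions of the FWHH markers.

    apex is the index of the peak maximum; left and right are the indices
    of the peak extents. Assumes y[apex] > 0.
    """
    half = y[apex] / 2.0

    def crossing(i_inside, i_outside):
        # Linear interpolation between the last point above half height
        # and the first point at or below it.
        x0, y0 = x[i_inside], y[i_inside]
        x1, y1 = x[i_outside], y[i_outside]
        return x0 + (half - y0) * (x1 - x0) / (y1 - y0)

    # Walk left while the neighbor is still above half height.
    i = apex
    while i > left and y[i - 1] > half:
        i -= 1
    # Walk right the same way.
    j = apex
    while j < right and y[j + 1] > half:
        j += 1

    if i == left or j == right:
        # Not resolved at half height: report the extents instead.
        return x[left], x[right]
    return crossing(i, i - 1), crossing(j, j + 1)
```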

PExt

Two-column matrix indicating the left and right locations of the peak shape extents determined after wavelet denoising. When Intensities includes multiple signals, PExt is a cell array of matrices.
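The peak shape extents can be thought of as the closest minima on either side of a peak, as in the note under PeakLocationValue. The minimal Python sketch below assumes a list of already smoothed intensities and the index of a peak maximum. It walks downhill in each direction until the signal starts rising again. The helper name peak_extents is hypothetical, and plateaus are simply walked across.

```python
def peak_extents(y, apex):
    """Hypothetical helper: indices of the closest minima around y[apex]."""
    left = apex
    # Keep moving left while the signal does not rise.
    while left > 0 and y[left - 1] <= y[left]:
        left -= 1
    right = apex
    # Keep moving right while the signal does not rise.
    while right < len(y) - 1 and y[right + 1] <= y[right]:
        right += 1
    return left, right
```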

Examples

  1. Load a MAT-file, included with the Bioinformatics Toolbox™ software, that contains two mass spectrometry data variables, MZ_lo_res and Y_lo_res. MZ_lo_res is a vector of m/z values for a set of spectra. Y_lo_res is a matrix of intensity values for a set of mass spectra that share the same m/z range.

    load sample_lo_res
  2. Adjust the baseline of the eight spectra stored in Y_lo_res.

    YB = msbackadj(MZ_lo_res,Y_lo_res);
  3. Convert the raw mass spectrometry data to a peak list by finding the relevant peaks in each spectrum.

    P = mspeaks(MZ_lo_res,YB);
  4. Plot the third spectrum in YB, the matrix of baseline-corrected intensity values, with the detected peaks marked.

    P = mspeaks(MZ_lo_res,YB,'SHOWPLOT',3);

  5. Smooth the signal using the mslowess function. Then convert the smoothed data to a peak list by finding relevant peaks and plot the third spectrum.

    YS = mslowess(MZ_lo_res,YB,'SHOWPLOT',3);

    P = mspeaks(MZ_lo_res,YS,'DENOISING',false,'SHOWPLOT',3);

  6. Use the cellfun function to remove all peaks with m/z values less than or equal to 2000 from each of the eight peak lists in the output P. Then plot the peaks of the third spectrum (red markers) over its smoothed signal (blue line).

    Q = cellfun(@(p) p(p(:,1)>2000,:),P,'UniformOutput',false);
    figure
    plot(MZ_lo_res,YS(:,3),'b',Q{3}(:,1),Q{3}(:,2),'rx')
    xlabel('Mass/Charge (M/Z)')
    ylabel('Relative Intensity')
    axis([0 20000 -5 95])

Algorithms

mspeaks converts raw peak data to a peak list (centroided data) by:

  1. Smoothing the signal using an undecimated wavelet transform with Daubechies coefficients

  2. Assigning peak locations

  3. Estimating noise

  4. Eliminating peaks that do not satisfy specified criteria
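The Python sketch below illustrates the smoothing step with the simplest undecimated wavelet transform, a single-level Haar transform. This is a swapped-in simplification and not what mspeaks runs. mspeaks uses Daubechies wavelets with a base of 2 to 20 (BaseValue) and several decomposition levels (LevelsValue), while Haar corresponds to the one-vanishing-moment case. Hard thresholding of the detail (high-band) coefficients is also an assumption. Because the transform is undecimated, every sample is covered by two overlapping Haar pairs, so the sketch averages the two reconstructions. Boundaries are treated as circular.

```python
def haar_udwt_denoise(x, threshold):
    """Single-level undecimated Haar transform with hard thresholding."""
    n = len(x)
    # Approximation (low band) and detail (high band) for every shift.
    approx = [(x[i] + x[(i + 1) % n]) / 2.0 for i in range(n)]
    detail = [(x[i] - x[(i + 1) % n]) / 2.0 for i in range(n)]
    # Hard thresholding: zero out small detail coefficients.
    detail = [d if abs(d) >= threshold else 0.0 for d in detail]
    # Sample i is reconstructed from pair (i, i+1) as approx + detail and
    # from pair (i-1, i) as approx - detail; average both estimates.
    return [
        ((approx[i] + detail[i]) + (approx[i - 1] - detail[i - 1])) / 2.0
        for i in range(n)
    ]
```

With a threshold of 0, the input is reconstructed exactly. With a threshold larger than every detail coefficient, each sample becomes the weighted average (x[i-1] + 2*x[i] + x[i+1]) / 4.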

Version History

Introduced in R2007a