Determine Threshold in Graph

Hi all, I want to split my data file into two data files: one with the data of the peaks (above the threshold) and one with the remaining data (below the threshold). Like this:
I have hundreds of data files, so I want to split them in a for loop. My problem is that every dataset has a different threshold value:
All data files have the same shape, so I wonder whether it is possible to let MATLAB determine the threshold value. I can't find a good way to do that. Can someone help me out with this?
Thanks, Daan

 Accepted Answer

To threshold a signal where the "baseline" varies, I think you should take the histogram, then compute the PDF and the CDF. You can get the CDF by applying cumsum() to the counts and normalizing. If you know that the peaks always occupy, say, 3% of the signal, then you can read the corresponding signal level off the CDF: it is the value where the CDF passes 0.97.
The histogram functions have changed recently. What version of MATLAB do you have? If you have R2014b or later, use histogram() or histcounts(). If earlier than R2014b, use hist() or histc().
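[counts, binCenters] = hist(data, 100); % 100-bin histogram of the signal values.
cdf = cumsum(counts); % Running total of the counts.
cdf = cdf / cdf(end); % Normalize so the CDF ends at 1.
thresholdIndex = find(cdf > 0.97, 1, 'first'); % First bin where the CDF passes 97%.
thresholdLevel = binCenters(thresholdIndex); % A single scalar threshold.
logicalIndexes = data > thresholdLevel; % Indexes of peaks.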
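For R2014b or later, the same idea can be sketched with histcounts(), followed by the split into peak data and remaining data that the question asks for. This is only a sketch: the synthetic signal, the 3% peak fraction, and the variable names (peakFraction, isPeak, peakData, baselineData) are assumptions for illustration, not part of the original answer.

```matlab
% Synthetic stand-in for one data file (hypothetical data): a baseline
% near -1 with narrow spikes on 2% of the samples.
t = (1:10000)';
data = -1 + 0.1 * sin(0.37 * t) + 0.05 * cos(1.3 * t);
isSpike = mod(t, 50) == 0;
data(isSpike) = data(isSpike) + 5;

peakFraction = 0.03;   % Assumed share of samples to treat as peaks.

% histcounts() returns bin edges rather than bin centers.
[counts, edges] = histcounts(data, 100);
cdf = cumsum(counts) / sum(counts);

% First bin whose cumulative fraction exceeds 1 - peakFraction.
thresholdIndex = find(cdf > 1 - peakFraction, 1, 'first');
thresholdLevel = edges(thresholdIndex + 1);   % Upper edge of that bin.

% Split the signal into the two parts.
isPeak = data > thresholdLevel;
peakData = data(isPeak);        % Samples above the threshold.
baselineData = data(~isPeak);   % Everything else.
```

Because the threshold is recomputed from each signal's own histogram, the same lines can sit inside a for loop over the data files without a hand-picked threshold per file.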

6 Comments

Thanks for the quick answer. Could you explain why you would use a histogram to determine a threshold value? I work with MATLAB R2011a.
It gives you the distribution of signal values. How can you find out that the majority of the signal values are around -1 on the top plot, or -3.5 on the bottom plot? The histogram, that's how. If you don't know where the baseline is, how can you determine where to threshold it? You could just take the mean of the whole thing, which might be reasonable if the spikes do not take up too great a percentage of the time. Or you could examine the histogram, find the baseline, and go up a bit from there. With the histogram method, the threshold does not depend on how wide the spikes are (within reason). Take the histogram of your signal and plot it with a bar chart or line plot. Then plot a vertical line at the mean plus a standard deviation or two and see whether it lands in the right place in the plot, such as the "corner" of a log-normal type of distribution. Attach your data if you want help.
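The check described above might look like the following sketch. The synthetic signal, the multiplier numStd = 2, and the variable names are assumptions for illustration; hist() is used so that it also runs on releases before R2014b.

```matlab
% Synthetic stand-in for one data file (hypothetical data): a baseline
% near -1 with narrow spikes on 2% of the samples.
t = (1:10000)';
data = -1 + 0.1 * sin(0.37 * t) + 0.05 * cos(1.3 * t);
isSpike = mod(t, 50) == 0;
data(isSpike) = data(isSpike) + 5;

[counts, binCenters] = hist(data, 100);

% Baseline estimate: center of the most populated histogram bin.
[~, modeIndex] = max(counts);
baselineLevel = binCenters(modeIndex);

% Candidate threshold: mean plus a couple of standard deviations.
numStd = 2;   % Assumed multiplier; adjust after looking at the plot.
thresholdLevel = mean(data) + numStd * std(data);

% Plot the histogram and mark the candidate threshold with a red line.
bar(binCenters, counts);
hold on;
line([thresholdLevel, thresholdLevel], ylim, 'Color', 'r', 'LineWidth', 2);
hold off;
xlabel('Signal value');
ylabel('Count');
```

If the red line does not fall between the baseline hump and the spike values, changing numStd or measuring upward from baselineLevel instead of from the mean are the obvious things to try.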
Andrew Park on 27 Sep 2020
Edited: Andrew Park on 27 Sep 2020
Hello @Image Analyst, this post is 5 years old but I have a question about your CDF approach above.
What is the difference between taking the CDF and then selecting the indices in the top 3% (your approach above) vs. sorting the whole data set in descending order and then taking the first 3% of the data? Thanks!
They should be close. The sorting approach will be more accurate (closer to the actual 3% value) but will take more time than the histogram approach, because you are sorting thousands of elements versus just sticking them in histogram bins. Sorting takes longer than simply incrementing a count (incrementing an array value). With a histogram you can only get as close as the bin width allows.
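A small side-by-side sketch of the two approaches, under the same assumption that peaks make up about 3% of the samples. The synthetic signal and the variable names are hypothetical.

```matlab
% Synthetic stand-in for one data file (hypothetical data): a baseline
% near -1 with narrow spikes on 2% of the samples.
t = (1:10000)';
data = -1 + 0.1 * sin(0.37 * t) + 0.05 * cos(1.3 * t);
isSpike = mod(t, 50) == 0;
data(isSpike) = data(isSpike) + 5;

peakFraction = 0.03;   % Assumed share of samples to treat as peaks.

% Sorting approach: the value of the n-th largest sample.
sortedData = sort(data, 'descend');
numPeakSamples = round(peakFraction * numel(data));
sortThreshold = sortedData(numPeakSamples);

% Histogram approach: bin center where the CDF first passes 97%.
[counts, binCenters] = hist(data, 100);
cdf = cumsum(counts) / sum(counts);
histThreshold = binCenters(find(cdf > 1 - peakFraction, 1, 'first'));

% The two thresholds should agree to within about one bin width.
binWidth = binCenters(2) - binCenters(1);
fprintf('Sorting: %g, histogram: %g, bin width: %g\n', ...
    sortThreshold, histThreshold, binWidth);
```

Increasing the number of bins narrows the gap between the two values at the cost of a larger histogram.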
Andrew Park on 28 Sep 2020
Edited: Andrew Park on 28 Sep 2020
Thank you for the answer. I'm not sure if I should make this a separate post, so I'll ask you here first.
If I don't know what percentage of the total signal the peaks occupy, what would be the best method to set the percentage? For instance, I have graphs with a varying number of peaks, so I don't know how many there are until the algorithm scans the signal once to count them, which will take additional time. As of now, I'm just assuming that the 100th largest peak height will suffice as the threshold value.
(If you think I should post this separately, I'll do that.)
Image Analyst on 28 Sep 2020
Edited: Image Analyst on 28 Sep 2020
Yes, post separately after reading this link so you'll know to do certain things, like including any scripts and data, and a screenshot.


Asked: 5 Oct 2015
Edited: 28 Sep 2020
