Zone identification on curve

Jagnoux Léo on 1 Dec 2020
Commented: Jagnoux Léo on 1 Dec 2020
How can I automatically identify the start and end of a staircase function?
As in the following figure, I would like to automatically identify the beginning and the end of each staircase (the whole part in red). Note that there are several such zones to identify in the same vector.
I have tried the functions "find", "min", and "max", but the result is not reliable when the data changes.
Is there a function that can identify a staircase function?

KSSV on 1 Dec 2020
It would be better if you shared the data.
Jagnoux Léo on 1 Dec 2020
Here is the data; this is an example.

Star Strider on 1 Dec 2020
Try this:
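% D is assumed to be the struct holding the attached data (the load step is not shown in the post)
var = D.var;   % note: this variable name shadows the built-in var function
x = 0:numel(var)-1;
varfilt = sgolayfilt(var, 5, 271);   % Savitzky-Golay smoothing: order 5, frame length 271
Lmx = find(islocalmax(varfilt, 'MinSeparation',900));   % indices of peaks in the smoothed curve
Lmn = find(islocalmin(varfilt, 'MinSeparation',900));   % indices of valleys in the smoothed curve
figure
plot(x, var)
hold on
% plot(x, varfilt)
for k = 1:min(numel(Lmn), numel(Lmx))   % guard against unequal numbers of minima and maxima
    idxrng = Lmn(k):Lmx(k);   % one zone: from a minimum to the maximum paired with it
    plot(x(idxrng), var(idxrng), '-r', 'LineWidth',2)
end
% plot(x(Lmx), var(Lmx), '^r')
% plot(x(Lmn), var(Lmn), 'vr')
hold off
grid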
producing this plot:
That is likely as good as it is possible to get. I left other graphics calls in the code (commented-out) so if you un-comment them, you can see how the code works in more detail.
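The loop above pairs the k-th minimum with the k-th maximum, which only works if the first extremum found is a minimum and the two lists have the same length. A minimal sketch of a more defensive pairing follows, assuming each zone runs from a minimum to the first maximum after it. The index values are made up for illustration and do not come from the attached data.

```matlab
% Hypothetical extremum indices; in practice these would come from
% find(islocalmin(...)) and find(islocalmax(...)).
Lmn = [120 1500 2900];
Lmx = [60 900 2300 3600];    % note the leading maximum at 60 with no minimum before it

pairs = zeros(0, 2);         % each row will be [zoneStart zoneEnd]
for k = 1:numel(Lmn)
    nxt = Lmx(find(Lmx > Lmn(k), 1));   % first maximum located after this minimum
    if ~isempty(nxt)
        pairs(end+1, :) = [Lmn(k) nxt]; %#ok<AGROW>
    end
end
disp(pairs)
```

With these made-up indices the leading maximum at 60 is skipped and three zones remain. If two minima fall before the same maximum, this sketch returns that maximum twice, so it may still need a de-duplication step on real data.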

John D'Errico on 1 Dec 2020
Should a function automatically identify a staircase function? No. That can be quite difficult, especially if your staircase is, let me call it, atypical. And everything always seems to be atypical.
Locating regions where your function is constant might be as simple as looking for sub-sequences of points where the function deviates from a constant by no more than some maximum tolerance. But even then, it can be challenging to find where the stairs end and where the transitions are, especially if there is a gradual trend mixed in. For every simple algorithm to find such a set of stairs, I can probably pose a problem case that makes it fail.
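As a rough illustration of the tolerance idea, here is a minimal greedy sketch. It is not the attached stairfit code; the variable names (tol, minLen, runs) and the test data are assumptions made up for this example. It grows a run for as long as the spread (max minus min) of the run stays within the tolerance.

```matlab
% Hypothetical data: three flat stretches separated by jumps.
seq = [0 0.1 -0.1 0 5 10 10.2 9.9 10.1 20 20.1 19.8];
tol = 0.5;       % largest allowed max - min inside one run (assumed value)
minLen = 3;      % shortest run that counts as a stair (assumed value)

runs = zeros(0, 2);          % each row will be [startIndex stopIndex]
n = numel(seq);
i = 1;
while i <= n
    lo = seq(i);
    hi = seq(i);
    j = i;
    % extend the run while adding the next sample keeps the spread within tol
    while j < n
        newLo = min(lo, seq(j+1));
        newHi = max(hi, seq(j+1));
        if newHi - newLo > tol
            break
        end
        lo = newLo;
        hi = newHi;
        j = j + 1;
    end
    if j - i + 1 >= minLen
        runs(end+1, :) = [i j]; %#ok<AGROW>
    end
    i = j + 1;               % continue after the run just examined
end
disp(runs)
```

On this clean data the sketch finds the runs 1 to 4, 6 to 9, and 10 to 12, and drops the isolated sample at index 5. A slow trend would make it cut runs at arbitrary places, which is exactly the kind of problem case mentioned above.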
Thinking... (let me see what I can play around with.)

John D'Errico on 1 Dec 2020
Sigh. Looks like I was too slow. Oh well. I've attached the code I wrote. It seems to work pretty well, looking for segments in a sequence where the function is constant to within a tolerance.
In the example below, I add Gaussian noise (sigma = 1, from randn) to the steps, so I used a tolerance of 6. The tolerance is the largest allowed distance between the max and min within any step it finds, and +/- 3*sigma should be adequate.
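The arithmetic behind that tolerance, and the shape of the test sequence without its noise, can be checked with a short sketch. The variable names here (sigma, levels, vals, counts) are chosen for illustration only.

```matlab
sigma = 1;                       % randn draws have unit standard deviation
tol = 2 * 3 * sigma;             % a +/- 3*sigma band, measured as max - min

levels = floor((0:100)/10)*10;   % the test sequence with the noise term left out
[vals, ~, idx] = unique(levels); % distinct stair heights
counts = accumarray(idx(:), 1).';% number of samples at each height
disp(tol)
disp([vals; counts])
```

This gives a tolerance of 6 and shows ten levels of ten samples each (0, 10, ..., 90), followed by a single sample at level 100.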
seq = floor((0:100)/10)*10 + randn(1,101);
[regions,startstop] = stairfit(seq,6,3)
regions =
  Columns 1 through 21
     1  1  1  1  1  1  1  1  1  1  2  2  2  2  2  2  2  2  2  2  3
  Columns 22 through 42
     3  3  3  3  3  3  3  3  3  4  4  4  4  4  4  4  4  4  4  5  5
  Columns 43 through 63
     5  5  5  5  5  5  5  5  6  6  6  6  6  6  6  6  6  6  7  7  7
  Columns 64 through 84
     7  7  7  7  7  7  7  8  8  8  8  8  8  8  8  8  8  9  9  9  9
  Columns 85 through 101
     9  9  9  9  9  9 10 10 10 10 10 10 10 10 10 10  0
startstop =
     1   10
    11   20
    21   30
    31   40
    41   50
    51   60
    61   70
    71   80
    81   90
    91  100
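As you can see, it found exactly 10 steps in the sequence, exactly where they should be. The final element (index 101) is labeled 0 because it begins an eleventh level that is only one sample long, shorter than the treadmin of 3.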
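The two outputs carry the same information in different forms. As a minimal sketch of how they relate, a start/stop table can be rebuilt from a label vector in the same format as regions. The short label vector below is made up for illustration, and this is not how the attached code computes its result.

```matlab
% Hypothetical label vector: 0 between stairs, n on the nth stair (row vector).
regions = [1 1 1 0 2 2 2 2 0 0 3 3 3];

onStair = regions > 0;
jump = diff(regions) ~= 0;              % true where the label changes
starts = find([true, jump] & onStair);  % first sample of each stair
stops  = find([jump, true] & onStair);  % last sample of each stair
startstop = [starts(:), stops(:)];
disp(startstop)
```

For this vector the result is the rows [1 3], [5 8], and [11 13]. Because the test looks at label changes rather than at zeros, it also handles stairs that touch each other with no gap, as in the output above.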
help stairfit
  stairfit - locate regions in a sequence where a curve is piecewise constant, within a given tolerance
  usage: [regions,startstop] = stairfit(xseq,tol,treadmin)
  arguments: (input)
    xseq - vector containing data
    tol - maximum amount the curve can deviate from a constant for a point
        to be considered to lie on any given stair.
    treadmin - (optional) the minimum length of any sub-sequence to
        be considered a step. If provided, treadmin must be a positive integer.
        Default: 3
  arguments: (output)
    regions - vector of the same size as xseq. regions will be 0 between
        stairs, and in the region of a stair, will contain an index (n) to
        denote the nth stair.
    startstop - an Rx2 array, where the first column indicates the starting
        index of a step, and the second column indicates the final index of
        that step.
Such is life. :)
Jagnoux Léo on 1 Dec 2020
Thank you for your time and response! :)