Moving window with cumulative sum less than a ref value and excluding certain windows

2 views (last 30 days)
I have an array of 36000 cells, i want to break those arrays into windows of cumulative sum less than a reference value. and exclude few windows based on other parameters respective to this array based on index. For example A= [ 0.5 0.6 .45 .56 .056 .16 .18 .10 .15 .45 .36]; similarly other arrays of same length. Now window 1 should start with 1st value and end at sum less than 2 ( window{1} = [0.5 0.6 .45], sum = 1.55) second window has to start from 2nd value ( window{2} = [0.6 .45 .56 .056 .16], sum = 1.826). Similarly i need all possible windows for 36000 cells, while excluding those where value/sum of other arrays/windows of respective index. Index of these windows is also needed. I here wanted a moving window. Eg Index of my windows would be like [ 1 3] , [2 6], [3 9], [4 10], [5 11], [6 11], [ 7 11], [ 8 11], [9 11], [10, 11].
  • If i don't wont to consider values in index 3,4, 8, i want to exclude windows which include these indexes.
  • If i want to exclude windows whose corresponding array average is less than certain value. Ex: I've another B whose size is same as A. after breaking array A into windows, average of array B windows is less than a value (20) has to be excluded.
  3 Comments
Image Analyst
Image Analyst on 16 Jan 2017
RANJITH, please attach some data. I'd like to know if your data is really in cells like you said, or if it's a regular double array. Cell arrays are very slow and inefficient memory hogs - the drawbacks to their being really flexible as to what kind of data they can hold. If you're using cells then I encourage you to see if you can use a standard numerical array, which will speed things up. I can't tell if you're using cells (like MATLAB definition, not Excel's definition) or standard arrays because you're using both braces (like cell arrays use) and brackets (like standard arrays use). But maybe you're using the term cell like Excel uses it - this would be "element" in MATLAB's terminology - I don't know which. Plus, attaching data would help out the people who are trying to solve your question.
RANJITH REDDY KALLURI
RANJITH REDDY KALLURI on 17 Jan 2017
I here attach the data that needs to be breakdown into windows of sum less than 39.12 or a reference value.

Sign in to comment.

Accepted Answer

John BG
John BG on 23 Jan 2017
Edited: John BG on 23 Jan 2017
Ranji
This is John BG <mailto:jgb2012@sky.com jgb2012@sky.com> , all together in one block
load Array.mat;
tic
A2=cumsum(A);
numA=numel(A);
n1=1;n2=1;
B=[0 0];
t0=38.19;
while n1<numA || n2<numA
while A(n1)==0
n1=n1+1;
n2=n2+1;
end
while A2(n2)-A2(n1)<t0 && n2<numA
n2=n2+1;
while A(n2)==0
n2=n2+1;
end
end
B=[B;n1 n2-1];
n1=n1+1;
n2=n1;
end
B(1,:)=[]
% masking
mask=[3 4 8];
d=0;
for n=mask
for k=1:1:size(B,1)
if intersect([B(k,1):1:B(k,2)],n)
d=[d k];
end
end
end
d(1)=[];
d=unique(d);d=sort(d);
B(d,:)=[];
reasons why I ask you to consider accepting my answer:
1. my code works, no errors like Jan Simon's Out(end,:)=[36800 36799]
2. my excluding mask section doesn't crash, like Simon's.
3. from 1.7 seconds to 0.5 seconds .. if it's really about cutting down time, C should be the language to use.
4. Ranji, If it's really about speed you may want to compile it, which will require the coder to get it all through C before building the .exe and then functions cumsum and find are going to add many more lines and end up taking more time than the few lines I have supplied.
5. the warning error that Jan Simon claims does not show up in MATLAB version R2016, it may happen in his MATLAB version R2009 but not in up to date MATLAB version.
6. his recommendation ' .. This is always a bad programming practize .. ' is the typical pointless remark that does not contribute to help solve the question.
  1 Comment
Jan
Jan on 25 Jan 2017
Edited: Jan on 26 Jan 2017
Shouldn't the last line be [36800, 36800]? [36799, 36799] ignores the last element.
As said before: Your code crashes, if the input contains trailing zeros:
while A(n2)==0
n2=n2+1;
end
From 1.7 to 0.5 seconds only by using pre-allocation and omit the zeros? This is a nice speed up. Feel free to insert the pre-allocation lines in your code to increase its quality.
The editor of R2016b shows the message in the lines "B=[B;n1 n2-1];" and "d=[d k];" as the orange marks in ths MLint-warning column on the rigt side:
The variable 'B' appears to change size on every loop iteration.
Consider pre-allocation for speed.
When the code for the masking is included also, your code needs 8.46 sec, and my suggestion 0.28 sec. The severe drawback of a missing pre-allocation is the exponential growing of the runtime: If the input array A has the double size, your code needs 55.3 sec already compared to 0.6 of mine. As you can see, pre-allocation is fundamental in programming and it does matter the solution here also. For more details search for "Schlemiel the Painter's algorithm", e.g. https://en.wikipedia.org/wiki/Joel_Spolsky:
In software development, a Shlemiel the painter's algorithm
(sometimes, Shlemiel the painter algorithm) is a method that is
inefficient because the programmer has overlooked some
fundamental issues at the very lowest levels of software design.
Therefore your statement "typical pointless remark" means, that you did not get the point.

Sign in to comment.

More Answers (5)

Andrei Bobrov
Andrei Bobrov on 16 Jan 2017
Edited: Andrei Bobrov on 19 Jan 2017
I'm corrected, thank you Jan Simon.
A= [ 0.5 0.6 .45 .56 .056 .16 .18 .10 .15 .45 .36]' ;
out = zeros(numel(A),2);
n = numel(A);
t = 2;
for ii = 1:n
z = cumsum(A(ii:n)) < t;
out(ii,:) = [ii, find(z,1,'last') + ii - 1] ;
end
  5 Comments
RANJITH REDDY KALLURI
RANJITH REDDY KALLURI on 18 Jan 2017
@ Jan Simon , @Andrei , above code helps me to generate windows. I now need to exclude windows whose average value is less than 0.1473. and the windows has to rearrange.
Andrei Bobrov
Andrei Bobrov on 19 Jan 2017
A= [ 0.5 0.6 .45 .56 .056 .16 .18 .10 .15 .45 .36]' ;
t = 2;
tav = 0.1473;
n = numel(A);
wind = cell(n,1);
out = zeros(n,2);
lo = true(n,1);
for ii = 1:n
jj = ii:n;
tt = cumsum(A(ii:n));
z = tt < t;
iend = find(z,1,'last');
lo(ii) = tt(iend)./jj(iend) >= tav;
out(ii,:) = [ii,iend+ii-1];
wind{ii} = A([ii:iend+ii-1]);
end
wind = wind(lo);
out = out(lo,:);

Sign in to comment.


John BG
John BG on 12 Jan 2017
Edited: John BG on 19 Jan 2017
Let me split the question into 2:
1. generating window indices
A= [ 0.5 0.6 .45 .56 .056 .16 .18 .10 .15 .45 .36]
B={0}
for k=1:1:numel(A)
s=0;n1=k ;n2=k
while s<2
s=s+sum(A([n1:n2]));
if n2<numel(A)
n2=n2+1;
elseif n2>=numel(A)
n2=numel(A);
end
end
if n2<numel(A)
B=[B [n1 n2-1 ]];
elseif n2>=numel(A)
B=[B [n1 n2]];
end
end
L=zeros(size(B,2)-2,2);
for k=2:1:size(B,2)-1
L(k-1,:)=B{k}(:);
end
L contains the sought indices
L =
1 3
2 4
3 5
4 7
5 11
6 11
7 11
8 11
9 11
10 11
2.
Adding selectivity.
At this point I'm not sure whether you want to exclude any window that has in example index=3 then L would have rows 1 and 3 removed, or you also would like to have any window that implicitly has index 3, which then would mean that L row 2 would also be excluded
case 1: only excluding indices in given index, example 3:
[i j]=find(L==3)
L(i,:)=[]
L =
2 4
4 7
5 11
6 11
7 11
8 11
9 11
10 11
case 2: excluding any window directly or implicitly referring a given index:
L=zeros(size(B,2)-2,2) % restore L to initial result
for k=2:1:size(B,2)-1
L(k-1,:)=B{k}(:)
end
d=0
for k=1:1:size(L,1)
if intersect([L(k,1):1:L(k,2)],3)
d=[d k];
end
end
d(1)=[];
L(d,:)=[]
L =
4 7
5 11
6 11
7 11
8 11
9 11
10 11
case 1, using a mask:
mask=[3 4 8]
L=zeros(size(B,2)-2,2) % restore L to initial result
for k=2:1:size(B,2)-1
L(k-1,:)=B{k}(:)
end
mask=[3 4 8]
q=0
for k=1:1:length(mask)
[i j]=find(L==mask(k))
q=[q i(:)']
end
q(1)=[]
q=sort(q)
L(q,:)=[]
L =
5 11
6 11
7 11
9 11
10 11
case 2, using a mask:
L=zeros(size(B,2)-2,2) % restore L
for k=2:1:size(B,2)-1
L(k-1,:)=B{k}(:)
end
mask=[3 4 8]
d=0
for n=mask
for k=1:1:size(L,1)
if intersect([L(k,1):1:L(k,2)],n)
d=[d k];
end
end
end
d(1)=[];
d=unique(d);d=sort(d)
L(d,:)=[]
L =
9 11
10 11
if you find these lines useful would you please mark my answer as Accepted Answer?
To any other reader, if you find this answer of any help please click on the thumbs-up vote link,
thanks in advance for time and attention
John BG
  5 Comments
RANJITH REDDY KALLURI
RANJITH REDDY KALLURI on 16 Jan 2017
Thank you John, the code worked in less than 3 minutes, but when i sum the cells which i've got from array 'L'. but the sum i get is not at all close to 38.19. Can you once check it.
John BG
John BG on 19 Jan 2017
Edited: John BG on 21 Jan 2017
Ranji
This is John BG jgb2012@sky.com
I have corrected the sum and managed to reduce run time below 2 seconds, please have a look and let me know if now my answer qualifies as Accepted Answer:
load Array.mat;
tic
A2=cumsum(A);
numA=numel(A);
n1=1;n2=1;
B=[0 0];
t0=38.19;
while n1<numA || n2<numA
while A(n1)==0
n1=n1+1;
n2=n2+1;
end
while A2(n2)-A2(n1)<t0 && n2<numA
n2=n2+1;
while A(n2)==0
n2=n2+1;
end
end
B=[B;n1 n2-1];
n1=n1+1;
n2=n1;
end
B(1,:)=[]
toc
Elapsed time is 1.740256 seconds.
B
=
421 4623
422 4623
423 4623
424 4623
425 4623
426 4623
427 4623
428 4623
429 4623
430 4623
..
36789 36799
36790 36799
36791 36799
36792 36799
36793 36799
36794 36799
36795 36799
36796 36799
36797 36799
36798 36799
36799 36799
please find attach the resulting variable (without masking [3 4 8] that you ask in 2nd part of your question) in attached file result1.mat variable name B, type double, size 21593x2.
Adding selectivity
1. only excluding indices contained in mask
mask=[3 4 8]
q=0
for k=1:1:length(mask)
[i j]=find(B==mask(k));
q=[q i(:)'];
end
q(1)=[];
q=sort(q);
B(q,:)=[];
B
2. excluding any window directly or implicitly referring any given index in mask
mask=[3 4 8]
d=0
for n=mask
for k=1:1:size(B,1)
if intersect([B(k,1):1:B(k,2)],n)
d=[d k];
end
end
end
d(1)=[];
d=unique(d);d=sort(d)
B(d,:)=[]
Ranji
Because you probably want a longer mask I have only attached the initial selection in attached file result1.mat
Since my answer solves your question much faster than any other provided here, would you please be so kind to consider marking my answer as Accepted Answer?
thanks for time and attention, awaiting answer
John BG

Sign in to comment.


John Chilleri
John Chilleri on 12 Jan 2017
Hello,
I tossed this together real quick so there might be unforeseen errors, but this should work:
% Given A
window = cell(36000,1);
window_index = cell(36000,1);
for i = 1:36000
count = i;
while (sum(A(i:count) < 2))
if (count == 36000)
count = count+1;
break;
end
count = count + 1;
end
window{i} = A(i:count-1);
window_index{i} = i:count-1;
end
% Deleting windows containing undesired indices
Undesired = [];
for i = 1:36000
for j = 1:length(Undesired)
if (sum(window_index{i} == Undesired(j)) == 0)
window{i} = [];
window_index{i} = [];
break;
end
end
end
Note that you need to give A and fill in Undesired. Otherwise it will make the windows empty that have undesirable indices, so if you access them after "removal" they'll be empty.
Hope this helps!
  1 Comment
RANJITH REDDY KALLURI
RANJITH REDDY KALLURI on 13 Jan 2017
Edited: RANJITH REDDY KALLURI on 13 Jan 2017
Thanks John, Your code is taking more than 20 minutes to create windows itself. I here attach the array i was trying to use. And the cumulative sum has to be less than 38.19.

Sign in to comment.


John BG
John BG on 23 Jan 2017
Edited: John BG on 23 Jan 2017
Ranji
This is John BG jgb2012@sky.com
I have corrected the sum and managed to reduce run time below 2 seconds, please have a look and let me know if now my answer qualifies as Accepted Answer:
load Array.mat;
tic
A2=cumsum(A);
numA=numel(A);
n1=1;n2=1;
B=[0 0];
t0=38.19;
while n1<numA || n2<numA
while A(n1)==0
n1=n1+1;
n2=n2+1;
end
while A2(n2)-A2(n1)<t0 && n2<numA
n2=n2+1;
while A(n2)==0
n2=n2+1;
end
end
B=[B;n1 n2-1];
n1=n1+1;
n2=n1;
end
B(1,:)=[]
toc
Elapsed time is 1.740256 seconds.
B
=
421 4623
422 4623
423 4623
424 4623
425 4623
426 4623
427 4623
428 4623
429 4623
430 4623
..
36789 36799
36790 36799
36791 36799
36792 36799
36793 36799
36794 36799
36795 36799
36796 36799
36797 36799
36798 36799
36799 36799
.
please find attach the resulting variable (without masking [3 4 8] that you ask in 2nd part of your question) in attached file result1.mat variable name B, type double, size 21593x2.
Adding selectivity
1. only excluding indices contained in mask
mask=[3 4 8]
q=0
for k=1:1:length(mask)
[i j]=find(B==mask(k));
q=[q i(:)'];
end
q(1)=[];
q=sort(q);
B(q,:)=[];
B
2. excluding any window directly or implicitly referring any given index in mask
mask=[3 4 8]
d=0
for n=mask
for k=1:1:size(B,1)
if intersect([B(k,1):1:B(k,2)],n)
d=[d k];
end
end
end
d(1)=[];
d=unique(d);d=sort(d)
B(d,:)=[]
.
Ranji
Because you probably want a longer mask I have only attached the initial selection in attached file result1.mat
Since my answer solves your question much faster than any other provided here, would you please be so kind to consider marking my answer as Accepted Answer?
thanks for time and attention, awaiting answer
John BG
  1 Comment
Jan
Jan on 23 Jan 2017
Edited: Jan on 23 Jan 2017
@John BG: Matlab's editor shows an important message for your code: In "B=[B;n1 n2-1]" the array grows iteratively. This is always a bad programming practize. Use a pre-allocation instead:
% Before the loop:
B = zeros(numA, 2);
iB = 0;
% Instead of B=[B;n1 n2-1]:
iB = iB + 1;
B(iB, :) = [n1, n2-1];
% After the loop:
B = B(1:iB, :);
This reduces the runtime on my machine from 2.86 to 1.34 seconds.
The "while A(n2)==0" will fail if A contains trailing zeros.
At least under the old Matlab R2009a I can use currently, creating "A" dynamically by "load Array.mat" increases the runtime by a factor of 50. This allows the JIT acceleration to operate efficiently:
FileData = load('C:\Local\Array.mat');
A = FileData.A;
But this will most likely not matter the code of the OP, because the data are provided as file for the discussion in the forum only.
When the data contain a lot of zeros, removing them will accelerate the processing:
Index = find(A);
A = A(Index);
... as above
B = Index(B); % Get original indices back
This gives an additional boost of 40% on my machine. See my answer.

Sign in to comment.


Jan
Jan on 23 Jan 2017
Edited: Jan on 25 Jan 2017
The part for finding the intervals:
FileData = load('C:\Local\Array.mat');
A = FileData.A;
Index = find(A);
AA = A(Index); % Ignore zeros in cumulative sum
A2 = cumsum(AA);
numAA = numel(AA);
Out = zeros(numAA, 2); % Pre-allocation
t0 = 38.19;
for n1 = 1:numAA
n2 = n1;
while A2(n2) - A2(n1) < t0 && n2 < numAA
n2 = n2 + 1;
end
Out(n1, 1) = Index(n1);
Out(n1, 2) = Index(n2) - 1;
end
Out(numAA, 2) = numel(A); % Fix last element
This needs 0.57 sec on my machine compared to 2.86 sec of John BG's suggestion.
[EDITED] [EDITED2: Bugfix: "B"->"Out"]
exclude = [3, 4, 8];
mask = false(size(Out, 1), 1);
for k = 1:numel(exclude)
mask = mask | (Out(:, 1) >= exclude(k) & exclude(k) <= Out(:, 2));
end
Out(mask, :) = [];
  4 Comments
John BG
John BG on 23 Jan 2017
the last line of the answer of Jan Simon is not valid.
Out([end-10:end],:)
ans =
36790 36799
36791 36799
36792 36799
36793 36799
36794 36799
36795 36799
36796 36799
36797 36799
36798 36799
36799 36799
36800 36799
it should be
B([end-10:end],:)
ans =
36789 36799
36790 36799
36791 36799
36792 36799
36793 36799
36794 36799
36795 36799
36796 36799
36797 36799
36798 36799
36799 36799
and the Jan Simon's [EDITED] section to apply the mask crashes:
exclude = [3, 4, 8];
mask = false(1, size(Out, 1));
for k = 1:numel(exclude)
mask = mask | (B(:, 1) >= exclude(k) & exclude(k) <= B(:, 2));
end
B(mask, :) = [];
Error using |
Matrix dimensions must agree.
Jan
Jan on 25 Jan 2017
Edited: Jan on 25 Jan 2017
@John BG: [36799, 36799] seems not be correct also, because [36800, 36800] is the last element.
Thanks for finding this bug: While the output is called "Out" in my code, I used "B" here by accident. This is fixed now.

Sign in to comment.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!