Moving window with cumulative sum less than a ref value and excluding certain windows
2 views (last 30 days)
Show older comments
I have an array of 36000 cells, i want to break those arrays into windows of cumulative sum less than a reference value. and exclude few windows based on other parameters respective to this array based on index. For example A= [ 0.5 0.6 .45 .56 .056 .16 .18 .10 .15 .45 .36]; similarly other arrays of same length. Now window 1 should start with 1st value and end at sum less than 2 ( window{1} = [0.5 0.6 .45], sum = 1.55) second window has to start from 2nd value ( window{2} = [0.6 .45 .56 .056 .16], sum = 1.826). Similarly i need all possible windows for 36000 cells, while excluding those where value/sum of other arrays/windows of respective index. Index of these windows is also needed. I here wanted a moving window. Eg Index of my windows would be like [ 1 3] , [2 6], [3 9], [4 10], [5 11], [6 11], [ 7 11], [ 8 11], [9 11], [10, 11].
- If i don't wont to consider values in index 3,4, 8, i want to exclude windows which include these indexes.
- If i want to exclude windows whose corresponding array average is less than certain value. Ex: I've another B whose size is same as A. after breaking array A into windows, average of array B windows is less than a value (20) has to be excluded.
3 Comments
Image Analyst
on 16 Jan 2017
RANJITH, please attach some data. I'd like to know if your data is really in cells like you said, or if it's a regular double array. Cell arrays are very slow and inefficient memory hogs - the drawbacks to their being really flexible as to what kind of data they can hold. If you're using cells then I encourage you to see if you can use a standard numerical array, which will speed things up. I can't tell if you're using cells (like MATLAB definition, not Excel's definition) or standard arrays because you're using both braces (like cell arrays use) and brackets (like standard arrays use). But maybe you're using the term cell like Excel uses it - this would be "element" in MATLAB's terminology - I don't know which. Plus, attaching data would help out the people who are trying to solve your question.
Accepted Answer
John BG
on 23 Jan 2017
Edited: John BG
on 23 Jan 2017
Ranji
load Array.mat;
tic
A2=cumsum(A);
numA=numel(A);
n1=1;n2=1;
B=[0 0];
t0=38.19;
while n1<numA || n2<numA
while A(n1)==0
n1=n1+1;
n2=n2+1;
end
while A2(n2)-A2(n1)<t0 && n2<numA
n2=n2+1;
while A(n2)==0
n2=n2+1;
end
end
B=[B;n1 n2-1];
n1=n1+1;
n2=n1;
end
B(1,:)=[]
% masking
mask=[3 4 8];
d=0;
for n=mask
for k=1:1:size(B,1)
if intersect([B(k,1):1:B(k,2)],n)
d=[d k];
end
end
end
d(1)=[];
d=unique(d);d=sort(d);
B(d,:)=[];
reasons why I ask you to consider accepting my answer:
1. my code works, no errors like Jan Simon's Out(end,:)=[36800 36799]
2. my excluding mask section doesn't crash, like Simon's.
3. from 1.7 seconds to 0.5 seconds .. if it's really about cutting down time, C should be the language to use.
4. Ranji, If it's really about speed you may want to compile it, which will require the coder to get it all through C before building the .exe and then functions cumsum and find are going to add many more lines and end up taking more time than the few lines I have supplied.
5. the warning error that Jan Simon claims does not show up in MATLAB version R2016, it may happen in his MATLAB version R2009 but not in up to date MATLAB version.
6. his recommendation ' .. This is always a bad programming practize .. ' is the typical pointless remark that does not contribute to help solve the question.
1 Comment
Jan
on 25 Jan 2017
Edited: Jan
on 26 Jan 2017
Shouldn't the last line be [36800, 36800]? [36799, 36799] ignores the last element.
As said before: Your code crashes, if the input contains trailing zeros:
while A(n2)==0
n2=n2+1;
end
From 1.7 to 0.5 seconds only by using pre-allocation and omit the zeros? This is a nice speed up. Feel free to insert the pre-allocation lines in your code to increase its quality.
The editor of R2016b shows the message in the lines "B=[B;n1 n2-1];" and "d=[d k];" as the orange marks in ths MLint-warning column on the rigt side:
The variable 'B' appears to change size on every loop iteration.
Consider pre-allocation for speed.
When the code for the masking is included also, your code needs 8.46 sec, and my suggestion 0.28 sec. The severe drawback of a missing pre-allocation is the exponential growing of the runtime: If the input array A has the double size, your code needs 55.3 sec already compared to 0.6 of mine. As you can see, pre-allocation is fundamental in programming and it does matter the solution here also. For more details search for "Schlemiel the Painter's algorithm", e.g. https://en.wikipedia.org/wiki/Joel_Spolsky:
In software development, a Shlemiel the painter's algorithm
(sometimes, Shlemiel the painter algorithm) is a method that is
inefficient because the programmer has overlooked some
fundamental issues at the very lowest levels of software design.
Therefore your statement "typical pointless remark" means, that you did not get the point.
More Answers (5)
Andrei Bobrov
on 16 Jan 2017
Edited: Andrei Bobrov
on 19 Jan 2017
I'm corrected, thank you Jan Simon.
A= [ 0.5 0.6 .45 .56 .056 .16 .18 .10 .15 .45 .36]' ;
out = zeros(numel(A),2);
n = numel(A);
t = 2;
for ii = 1:n
z = cumsum(A(ii:n)) < t;
out(ii,:) = [ii, find(z,1,'last') + ii - 1] ;
end
5 Comments
Andrei Bobrov
on 19 Jan 2017
A= [ 0.5 0.6 .45 .56 .056 .16 .18 .10 .15 .45 .36]' ;
t = 2;
tav = 0.1473;
n = numel(A);
wind = cell(n,1);
out = zeros(n,2);
lo = true(n,1);
for ii = 1:n
jj = ii:n;
tt = cumsum(A(ii:n));
z = tt < t;
iend = find(z,1,'last');
lo(ii) = tt(iend)./jj(iend) >= tav;
out(ii,:) = [ii,iend+ii-1];
wind{ii} = A([ii:iend+ii-1]);
end
wind = wind(lo);
out = out(lo,:);
John BG
on 12 Jan 2017
Edited: John BG
on 19 Jan 2017
Let me split the question into 2:
1. generating window indices
A= [ 0.5 0.6 .45 .56 .056 .16 .18 .10 .15 .45 .36]
B={0}
for k=1:1:numel(A)
s=0;n1=k ;n2=k
while s<2
s=s+sum(A([n1:n2]));
if n2<numel(A)
n2=n2+1;
elseif n2>=numel(A)
n2=numel(A);
end
end
if n2<numel(A)
B=[B [n1 n2-1 ]];
elseif n2>=numel(A)
B=[B [n1 n2]];
end
end
L=zeros(size(B,2)-2,2);
for k=2:1:size(B,2)-1
L(k-1,:)=B{k}(:);
end
L contains the sought indices
L =
1 3
2 4
3 5
4 7
5 11
6 11
7 11
8 11
9 11
10 11
2.
Adding selectivity.
At this point I'm not sure whether you want to exclude any window that has in example index=3 then L would have rows 1 and 3 removed, or you also would like to have any window that implicitly has index 3, which then would mean that L row 2 would also be excluded
case 1: only excluding indices in given index, example 3:
[i j]=find(L==3)
L(i,:)=[]
L =
2 4
4 7
5 11
6 11
7 11
8 11
9 11
10 11
case 2: excluding any window directly or implicitly referring a given index:
L=zeros(size(B,2)-2,2) % restore L to initial result
for k=2:1:size(B,2)-1
L(k-1,:)=B{k}(:)
end
d=0
for k=1:1:size(L,1)
if intersect([L(k,1):1:L(k,2)],3)
d=[d k];
end
end
d(1)=[];
L(d,:)=[]
L =
4 7
5 11
6 11
7 11
8 11
9 11
10 11
case 1, using a mask:
mask=[3 4 8]
L=zeros(size(B,2)-2,2) % restore L to initial result
for k=2:1:size(B,2)-1
L(k-1,:)=B{k}(:)
end
mask=[3 4 8]
q=0
for k=1:1:length(mask)
[i j]=find(L==mask(k))
q=[q i(:)']
end
q(1)=[]
q=sort(q)
L(q,:)=[]
L =
5 11
6 11
7 11
9 11
10 11
case 2, using a mask:
L=zeros(size(B,2)-2,2) % restore L
for k=2:1:size(B,2)-1
L(k-1,:)=B{k}(:)
end
mask=[3 4 8]
d=0
for n=mask
for k=1:1:size(L,1)
if intersect([L(k,1):1:L(k,2)],n)
d=[d k];
end
end
end
d(1)=[];
d=unique(d);d=sort(d)
L(d,:)=[]
L =
9 11
10 11
if you find these lines useful would you please mark my answer as Accepted Answer?
To any other reader, if you find this answer of any help please click on the thumbs-up vote link,
thanks in advance for time and attention
John BG
5 Comments
John BG
on 19 Jan 2017
Edited: John BG
on 21 Jan 2017
Ranji
I have corrected the sum and managed to reduce run time below 2 seconds, please have a look and let me know if now my answer qualifies as Accepted Answer:
load Array.mat;
tic
A2=cumsum(A);
numA=numel(A);
n1=1;n2=1;
B=[0 0];
t0=38.19;
while n1<numA || n2<numA
while A(n1)==0
n1=n1+1;
n2=n2+1;
end
while A2(n2)-A2(n1)<t0 && n2<numA
n2=n2+1;
while A(n2)==0
n2=n2+1;
end
end
B=[B;n1 n2-1];
n1=n1+1;
n2=n1;
end
B(1,:)=[]
toc
Elapsed time is 1.740256 seconds.
B
=
421 4623
422 4623
423 4623
424 4623
425 4623
426 4623
427 4623
428 4623
429 4623
430 4623
..
36789 36799
36790 36799
36791 36799
36792 36799
36793 36799
36794 36799
36795 36799
36796 36799
36797 36799
36798 36799
36799 36799
please find attach the resulting variable (without masking [3 4 8] that you ask in 2nd part of your question) in attached file result1.mat variable name B, type double, size 21593x2.
Adding selectivity
1. only excluding indices contained in mask
mask=[3 4 8]
q=0
for k=1:1:length(mask)
[i j]=find(B==mask(k));
q=[q i(:)'];
end
q(1)=[];
q=sort(q);
B(q,:)=[];
B
2. excluding any window directly or implicitly referring any given index in mask
mask=[3 4 8]
d=0
for n=mask
for k=1:1:size(B,1)
if intersect([B(k,1):1:B(k,2)],n)
d=[d k];
end
end
end
d(1)=[];
d=unique(d);d=sort(d)
B(d,:)=[]
Ranji
Because you probably want a longer mask I have only attached the initial selection in attached file result1.mat
Since my answer solves your question much faster than any other provided here, would you please be so kind to consider marking my answer as Accepted Answer?
thanks for time and attention, awaiting answer
John BG
John Chilleri
on 12 Jan 2017
Hello,
I tossed this together real quick so there might be unforeseen errors, but this should work:
% Given A
window = cell(36000,1);
window_index = cell(36000,1);
for i = 1:36000
count = i;
while (sum(A(i:count) < 2))
if (count == 36000)
count = count+1;
break;
end
count = count + 1;
end
window{i} = A(i:count-1);
window_index{i} = i:count-1;
end
% Deleting windows containing undesired indices
Undesired = [];
for i = 1:36000
for j = 1:length(Undesired)
if (sum(window_index{i} == Undesired(j)) == 0)
window{i} = [];
window_index{i} = [];
break;
end
end
end
Note that you need to give A and fill in Undesired. Otherwise it will make the windows empty that have undesirable indices, so if you access them after "removal" they'll be empty.
Hope this helps!
1 Comment
John BG
on 23 Jan 2017
Edited: John BG
on 23 Jan 2017
Ranji
I have corrected the sum and managed to reduce run time below 2 seconds, please have a look and let me know if now my answer qualifies as Accepted Answer:
load Array.mat;
tic
A2=cumsum(A);
numA=numel(A);
n1=1;n2=1;
B=[0 0];
t0=38.19;
while n1<numA || n2<numA
while A(n1)==0
n1=n1+1;
n2=n2+1;
end
while A2(n2)-A2(n1)<t0 && n2<numA
n2=n2+1;
while A(n2)==0
n2=n2+1;
end
end
B=[B;n1 n2-1];
n1=n1+1;
n2=n1;
end
B(1,:)=[]
toc
Elapsed time is 1.740256 seconds.
B
=
421 4623
422 4623
423 4623
424 4623
425 4623
426 4623
427 4623
428 4623
429 4623
430 4623
..
36789 36799
36790 36799
36791 36799
36792 36799
36793 36799
36794 36799
36795 36799
36796 36799
36797 36799
36798 36799
36799 36799
.
please find attach the resulting variable (without masking [3 4 8] that you ask in 2nd part of your question) in attached file result1.mat variable name B, type double, size 21593x2.
Adding selectivity
1. only excluding indices contained in mask
mask=[3 4 8]
q=0
for k=1:1:length(mask)
[i j]=find(B==mask(k));
q=[q i(:)'];
end
q(1)=[];
q=sort(q);
B(q,:)=[];
B
2. excluding any window directly or implicitly referring any given index in mask
mask=[3 4 8]
d=0
for n=mask
for k=1:1:size(B,1)
if intersect([B(k,1):1:B(k,2)],n)
d=[d k];
end
end
end
d(1)=[];
d=unique(d);d=sort(d)
B(d,:)=[]
.
Ranji
Because you probably want a longer mask I have only attached the initial selection in attached file result1.mat
Since my answer solves your question much faster than any other provided here, would you please be so kind to consider marking my answer as Accepted Answer?
thanks for time and attention, awaiting answer
John BG
1 Comment
Jan
on 23 Jan 2017
Edited: Jan
on 23 Jan 2017
@John BG: Matlab's editor shows an important message for your code: In "B=[B;n1 n2-1]" the array grows iteratively. This is always a bad programming practize. Use a pre-allocation instead:
% Before the loop:
B = zeros(numA, 2);
iB = 0;
% Instead of B=[B;n1 n2-1]:
iB = iB + 1;
B(iB, :) = [n1, n2-1];
% After the loop:
B = B(1:iB, :);
This reduces the runtime on my machine from 2.86 to 1.34 seconds.
The "while A(n2)==0" will fail if A contains trailing zeros.
At least under the old Matlab R2009a I can use currently, creating "A" dynamically by "load Array.mat" increases the runtime by a factor of 50. This allows the JIT acceleration to operate efficiently:
FileData = load('C:\Local\Array.mat');
A = FileData.A;
But this will most likely not matter the code of the OP, because the data are provided as file for the discussion in the forum only.
When the data contain a lot of zeros, removing them will accelerate the processing:
Index = find(A);
A = A(Index);
... as above
B = Index(B); % Get original indices back
This gives an additional boost of 40% on my machine. See my answer.
Jan
on 23 Jan 2017
Edited: Jan
on 25 Jan 2017
The part for finding the intervals:
FileData = load('C:\Local\Array.mat');
A = FileData.A;
Index = find(A);
AA = A(Index); % Ignore zeros in cumulative sum
A2 = cumsum(AA);
numAA = numel(AA);
Out = zeros(numAA, 2); % Pre-allocation
t0 = 38.19;
for n1 = 1:numAA
n2 = n1;
while A2(n2) - A2(n1) < t0 && n2 < numAA
n2 = n2 + 1;
end
Out(n1, 1) = Index(n1);
Out(n1, 2) = Index(n2) - 1;
end
Out(numAA, 2) = numel(A); % Fix last element
[EDITED] [EDITED2: Bugfix: "B"->"Out"]
exclude = [3, 4, 8];
mask = false(size(Out, 1), 1);
for k = 1:numel(exclude)
mask = mask | (Out(:, 1) >= exclude(k) & exclude(k) <= Out(:, 2));
end
Out(mask, :) = [];
4 Comments
John BG
on 23 Jan 2017
the last line of the answer of Jan Simon is not valid.
Out([end-10:end],:)
ans =
36790 36799
36791 36799
36792 36799
36793 36799
36794 36799
36795 36799
36796 36799
36797 36799
36798 36799
36799 36799
36800 36799
it should be
B([end-10:end],:)
ans =
36789 36799
36790 36799
36791 36799
36792 36799
36793 36799
36794 36799
36795 36799
36796 36799
36797 36799
36798 36799
36799 36799
and the Jan Simon's [EDITED] section to apply the mask crashes:
exclude = [3, 4, 8];
mask = false(1, size(Out, 1));
for k = 1:numel(exclude)
mask = mask | (B(:, 1) >= exclude(k) & exclude(k) <= B(:, 2));
end
B(mask, :) = [];
Error using |
Matrix dimensions must agree.
See Also
Categories
Find more on Logical in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!