splitapply doesn't split well into bins

For Your salvation I hope, O Lord.
Hi guys,
I want the splitapply command to split my data into 90 different bins, but for some reason it returns only 50.
Here is what I did:
First, I loaded 'cell1areas' (size 18800x1), a vector of areas.
Then I created bins (groups) from 0 to 90000 with a spacing of 1000 in the 'edges' variable.
After that, I applied the discretize function to the area vector. The maximum value of the result, dis, is 62 (max(dis)).
Next, I used isfinite to create 'valid', which checks whether each area is a finite number rather than NaN.
Finally, I called splitapply with @sum to sum all values in each group.
The problem is that spltsum has only 50 elements ('bins') instead of the desired 90 (the number of 1000-wide bins between 0 and 90000), or at least 62, since max(dis) is 62.
Thanks in advance, this community is great and really helpful!
The code:
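edges=[0 0:1000:90000 90000];          % bin edges; note that 0 and 90000 appear twice
dis=discretize(cell1areas, edges);     % bin index of each area
valid=isfinite(cell1areas);            % true where the area is not NaN or Inf
spltsum=splitapply(@sum, cell1areas(valid), findgroups(dis(valid)));   % sum of areas per group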
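A likely explanation, sketched with a small made-up vector standing in for dis(valid): findgroups does not keep the bin numbers produced by discretize. It renumbers only the distinct values that actually occur as 1, 2, ..., K, so splitapply returns one sum per occupied bin. If only 50 of the bin indices up to 62 are occupied, spltsum has 50 elements. The second output of findgroups records which bin each sum belongs to, which lets you scatter the sums back into a full-length vector.

```matlab
% Hypothetical bin indices and areas, standing in for dis(valid) and cell1areas(valid)
dis   = [2; 5; 5; 9];
areas = [10; 20; 30; 40];

% findgroups renumbers the distinct values 2, 5, 9 as groups 1, 2, 3
[G, binID] = findgroups(dis);        % G = [1; 2; 2; 3], binID = [2; 5; 9]

% One sum per occupied bin only; empty bins 1, 3, 4, 6, 7, 8 are absent
sums = splitapply(@sum, areas, G);   % sums = [10; 50; 40]

% Scatter the sums back into one slot per bin (10 bins assumed for this example)
fullSums = zeros(10, 1);
fullSums(binID) = sums;              % [0; 10; 0; 0; 50; 0; 0; 0; 40; 0]
```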

Accepted Answer

Matt J on 11 Oct 2021
Edited: Matt J on 13 Oct 2021
You can use accumarray instead.
spltsum=accumarray(dis(valid), cell1areas(valid) , [90,1]);
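A small illustration of why this works, again with made-up values: accumarray uses the bin indices themselves as subscripts, so each sum lands at its own bin number and bins with no data are filled with 0. The third argument only fixes the length of the output; it does not change where the bin boundaries are.

```matlab
% Hypothetical bin indices and areas, standing in for dis(valid) and cell1areas(valid)
subs  = [2; 5; 5; 9];
areas = [10; 20; 30; 40];

% Sum areas by bin index; output length forced to 10, empty bins are 0
binSums = accumarray(subs, areas, [10, 1]);
% binSums = [0; 10; 0; 0; 50; 0; 0; 0; 40; 0]
```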
Amit Ifrach on 13 Oct 2021
For Your salvation I hope, O Lord.
Thanks!
One more (last) question: I want the data to be split into bins defined by:
edges=[0 0:1000:90000 90000];
But as far as I understand, accumarray arbitrarily divides the data into 90 bins without paying attention to the required bin widths (because of the last argument, [90,1]). Is that true?
spltsum=accumarray(dis(valid), cell1areas(valid) , [90,1]);
If so, I need a way for the data to be split by the edges vector alone.
To put it another way:
I assume accumarray only sums the values in cell1areas that share the same bin index (an integer).
The binning itself is done by the discretize function (the dis variable in this example), so accumarray just adds up all values of cell1areas that have the same value in dis.
If so, why do I need to pass the [90,1] vector to accumarray? It should know that I want 90 bins, separated by 1000 up to 90000, not arbitrary divisions that MATLAB thinks suit the data I give it.
Thanks!
Matt J on 13 Oct 2021
Not all 90 bins contain counts. If you don't tell accumarray how many bins you have, it will assume you only have max(dis(valid)) bins.
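A sketch of tying the output size to edges instead of hard-coding 90, using made-up areas: discretize defines numel(edges) - 1 bins, so passing that count gives one output element per bin, including empty bins after the last occupied one. Note that 0:1000:90000 alone already has 91 edges (90 bins); the posted edges = [0 0:1000:90000 90000] repeats 0 and 90000, giving 93 edges (92 bins), and since the first bin [0, 0) can hold no value, every occupied bin index is shifted up by one. Here valid is computed from dis rather than cell1areas, which is an assumption rather than the thread's code; it also excludes areas outside the edges, which discretize maps to NaN.

```matlab
% Hypothetical areas standing in for cell1areas (NaN marks a missing value)
cell1areas = [500; 1500; 1700; NaN; 61200];

edges = 0:1000:90000;          % 91 edges define 90 bins of width 1000
nBins = numel(edges) - 1;      % 90

dis   = discretize(cell1areas, edges);   % bin index per area, NaN if missing
valid = isfinite(dis);                   % also drops areas outside the edges

% Without a size argument, the output stops at the highest occupied bin
shortSums = accumarray(dis(valid), cell1areas(valid));              % 62 elements
% With the size taken from edges, every bin gets an element (0 if empty)
binSums   = accumarray(dis(valid), cell1areas(valid), [nBins, 1]);  % 90 elements
```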
