# Random sample: I want 5% of the data for each hour

Rachele Franceschini on 4 Jun 2021
Edited: Scott MacKenzie on 4 Jun 2021
I have a database with 19 columns. One column contains the date (day, month, year) and the hour. For each hour, I would like to get a random 5% of the data. Naturally, I would like to keep all the other columns along with the time column.
Can you help me?
I saw the resample command, but at the moment I am stuck.
Rachele Franceschini on 4 Jun 2021
Thank you!
I included only 8 columns (to simplify). I tried retime, randsample, and splitting the data on the basis of time, but nothing worked.
Thank you!!!!

Scott MacKenzie on 4 Jun 2021
Edited: Scott MacKenzie on 4 Jun 2021
There might be a way to simplify this, but I believe the script below achieves what you are after...
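% T is assumed to already hold all the data as a table (e.g. read with readtable)
% extract the hour of each row (the date/time is in column 3)
dt = datetime(T{:,3});
hr = hour(dt);
% z is nonzero wherever the hour changes between consecutive rows
z = diff(hr);
% build a vector of the indices where the time changes
idx = find(z); % last row of each hour block except the final one
idx = [0; idx; height(T)]; % append the last row so the final hour is included
% build a vector of new indices, selecting at random 5% of the rows for each hour
idxNew = [];
for i = 2:length(idx)
    nRows = idx(i) - idx(i-1); % number of rows in this hour block
    n = round(0.05 * nRows);
    % randperm draws without replacement, so no row is selected twice
    idxNew = [idxNew, idx(i-1) + randperm(nRows, n)];
end
% create new table with 5% of the rows for each hour
Tnew = T(idxNew,:);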
With this script, your data set is now much smaller. See below. That's the general idea, right?
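As for the "way to simplify this" mentioned above, one possible sketch uses grouping functions instead of an explicit loop. It is only an illustration under stated assumptions: the table `T`, its column names `Time` and `Value`, and the generated data are hypothetical stand-ins for the real file, and rows are grouped by calendar hour (date and hour together) rather than by detecting hour changes between consecutive rows.

```matlab
% Hypothetical example data: 5 hours with one row per minute (300 rows)
rng(0);                                              % reproducible random numbers
Time  = datetime(2021,6,4,0,0,0) + minutes(0:299)';  % assumed datetime column
Value = rand(numel(Time),1);                         % assumed data column
T = table(Time, Value);

% Label each row with the start of its calendar hour, then number the groups
hourStart = dateshift(T.Time, 'start', 'hour');
g = findgroups(hourStart);

% For each group, draw 5% of its row indices without replacement
rows = (1:height(T))';
pick = splitapply(@(r) {r(randperm(numel(r), round(0.05*numel(r))))}, rows, g);
idxNew = sort(vertcat(pick{:}));   % sorted so the result stays in time order

% New table with 5% of the rows for each hour, all columns kept
Tnew = T(idxNew,:);
```

Because `randperm` draws without replacement, no row appears twice in `Tnew`. With 60 rows per hour, each hour contributes `round(0.05*60) = 3` rows.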
Scott MacKenzie on 4 Jun 2021
@Rachele Franceschini You're welcome.
BTW, I just fixed a small bug in the answer script. The second index in each range included the first row of the following hour. It's fixed now. Good luck.

### More Answers (1)

KSSV on 4 Jun 2021
Let A be your data matrix.
[m,n] = size(A);        % m rows, n columns
p = round(5/100*m);     % number of rows in a 5% sample
idx = randsample(m,p);  % p row indices drawn without replacement
iwant = A(idx,:)
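Note that this approach draws 5% of the rows from the whole matrix rather than 5% within each hour. Also, `randsample` belongs to the Statistics and Machine Learning Toolbox. If that toolbox is not available, a minimal sketch of the same idea with the base function `randperm` could look like the following, where the matrix `A` is hypothetical example data:

```matlab
rng(1);                  % reproducible random numbers
A = rand(200, 8);        % hypothetical data matrix: 200 rows, 8 columns
m = size(A, 1);          % number of rows
p = round(5/100 * m);    % number of rows in a 5% sample
idx = randperm(m, p);    % p distinct row indices
iwant = A(idx, :);       % sampled rows, all columns kept
```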

R2021a
