Randomly divide a vector into 2 parts

Asked by RUCHI CHOUDHARY on 15 Sep 2019 at 15:05
Latest activity: edited by Bruno Luong on 15 Sep 2019 at 17:42
Accepted Answer by Rik

I have a 100000-by-1 vector. I want to randomly divide its data into 80% and 20% and store them in 2 vectors. The division must be random.
"non_zero_entry" is the total number of entries in the vector. I tried the code below, but it gives me the error "Index exceeds the number of array elements (80000)".
training_data=.8;
tf = false(length(non_zero_entry),1);
tf(1:round(training_data*non_zero_entry)) = true;
tf = tf(randperm(non_zero_entry)); % the error occurs on this line: "Index exceeds the number of array elements (80000)".
dataTraining = index_rating(tf,:);
dataTesting = index_rating(~tf,:);
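To see where the 80000 in the error message comes from, here is a minimal sketch that reproduces it. It assumes, as in the first assumption discussed in the answers, that non_zero_entry is the scalar 100000. Because length of a scalar is 1, tf starts with a single element and only grows to 80000 elements when the true values are assigned, so indexing it with randperm(100000) goes out of range.

```matlab
% Assumption: non_zero_entry is the scalar 100000 (the number of entries).
non_zero_entry = 100000;
training_data = .8;

tf = false(length(non_zero_entry),1);   % length of a scalar is 1, so tf is 1x1
fprintf('after allocation: %d element(s)\n', numel(tf));

tf(1:round(training_data*non_zero_entry)) = true;   % assignment grows tf to 80000 elements
fprintf('after assignment: %d element(s)\n', numel(tf));

caught = false;
try
    tf = tf(randperm(non_zero_entry));   % indices go up to 100000, but tf has only 80000 elements
catch err
    caught = true;
    disp(err.message)                    % exact wording depends on the MATLAB version
end
```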

Answer by Rik on 15 Sep 2019 at 15:17

I don't fully understand every step of your code, so I rewrote it:
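index_rating=rand(100000,5);                    % generate example data
training_data=.8;                               % fraction of rows used for training
s=rand(size(index_rating,1),1);                 % one random key per row
sorted=sort(s);
cutoff=sorted(round(training_data*numel(s)));   % key at the 80% position
tf=s<=cutoff;                                   % true for the 80% of rows with the smallest keys
dataTraining = index_rating(tf,:);
dataTesting = index_rating(~tf,:);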
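A quick sanity check of the sorted-cutoff idea, sketched on the same kind of random example data. Because the cutoff is the key at position round(0.8*n), the split should contain exactly 80000 and 20000 rows (ties between continuous rand values are practically impossible). For comparison, simply thresholding fresh random numbers at 0.8 gives a count that is only close to 80000 and changes from run to run.

```matlab
% Example data, as in the answer above.
index_rating = rand(100000,5);
training_data = .8;

% Sorted-cutoff split.
s = rand(size(index_rating,1),1);
sorted = sort(s);
cutoff = sorted(round(training_data*numel(s)));
tf = s <= cutoff;

nTrain = nnz(tf);    % rows marked for training
nTest = nnz(~tf);    % rows marked for testing
fprintf('sorted cutoff: %d training rows, %d testing rows\n', nTrain, nTest);

% Plain thresholding: only approximately 80% of the rows.
nApprox = nnz(rand(size(index_rating,1),1) < training_data);
fprintf('plain threshold: %d training rows\n', nApprox);
```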

Answer by Bruno Luong on 15 Sep 2019 at 17:33
Edited by Bruno Luong on 15 Sep 2019 at 17:42

For the record, here is how I would do it:
n = size(index_rating,1);                      % number of rows
i = randperm(n,round(training_data*n));        % 80% of the row indices, drawn without repetition
dataTraining = index_rating(i,:);
dataTesting = index_rating(setdiff(1:n,i),:);  % all remaining rows
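A closely related variant, sketched here on hypothetical example data of the same shape: draw one full permutation of the row indices and cut it at k = round(0.8*n). This avoids the setdiff call. One difference worth knowing: with setdiff, the testing rows come out in their original order, whereas here both parts are in random order.

```matlab
% Hypothetical example data with the same shape as in the thread.
index_rating = rand(100000,5);
training_data = .8;

n = size(index_rating,1);
k = round(training_data*n);   % number of training rows
p = randperm(n);              % random ordering of all row indices

dataTraining = index_rating(p(1:k),:);     % first k shuffled indices
dataTesting = index_rating(p(k+1:end),:);  % the rest
```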

Answer by Bruno Luong on 15 Sep 2019 at 15:25
Edited by Bruno Luong on 15 Sep 2019 at 15:43

Always post complete code that can run; don't let us guess the size/class of the inputs.
First assumption: if non_zero_entry is an integer scalar, change the allocation to
tf = false(non_zero_entry,1);
Second assumption: if non_zero_entry is a vector, change
tf = tf(randperm(non_zero_entry));
to
tf = tf(randperm(end))
Similarly, the same correction must be applied to the earlier instruction:
tf(1:round(training_data*end)) = true
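Putting the first assumption together, the question's code with only the allocation changed might look like the sketch below. The data here is a random 100000-by-1 example, and non_zero_entry is assumed to be a scalar holding the number of rows; with tf allocated at full length, randperm(non_zero_entry) stays within bounds.

```matlab
% Assumptions: index_rating is example data; non_zero_entry is a scalar row count.
index_rating = rand(100000,1);
non_zero_entry = size(index_rating,1);
training_data = .8;

tf = false(non_zero_entry,1);                       % one flag per row (corrected allocation)
tf(1:round(training_data*non_zero_entry)) = true;   % flag the first 80% of positions
tf = tf(randperm(non_zero_entry));                  % shuffle the flags randomly

dataTraining = index_rating(tf,:);
dataTesting = index_rating(~tf,:);
```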