How to form the training set?
chaaru datta
on 14 May 2022
Commented: chaaru datta
on 20 Jun 2022 at 6:57
Hello all, I am new to machine learning and want to use MATLAB for it. I am trying to form a training set in MATLAB on the basis of the following expression:

[equation not preserved]

where S denotes the training set, M = 10, m = 1 to M, x is the training feature such that [expression not preserved], and y denotes the training label such that [expression not preserved].

My query is what the dimension of my training set should be. I think it should be [expression not preserved].

Any help in this regard will be highly appreciated.
Accepted Answer
the cyclist
on 14 May 2022
If I understand all of your notation correctly, I think your training set needs to be an Mx3 matrix.
If your definition of x means that each observation of x has two components (epsilon minus and epsilon plus), then for each observation in the training set you need two values to represent x and one to represent y. So
M = [0.2 0.3 -1;
-0.3 0.4 1;
...
0.6 0.5 -1];
would be the representation in which
- 1st column is x (epsilon minus)
- 2nd column is x (epsilon plus)
- 3rd column is y
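As a minimal sketch of that layout, the snippet below builds the three-column matrix from separate feature and label vectors and then splits it back apart. The numeric values and the names `epsMinus`, `epsPlus`, `y`, and `trainSet` are hypothetical and chosen only for illustration.

```matlab
% Hypothetical example values, not taken from the asker's data.
epsMinus = [0.2; -0.3; 0.6];    % 1st feature, one value per observation
epsPlus  = [0.3;  0.4; 0.5];    % 2nd feature, one value per observation
y        = [-1;   1;  -1];      % known label of each observation

% Concatenate column-wise: one row per observation, M-by-3 in general.
trainSet = [epsMinus, epsPlus, y];

% Recover features and labels when a function expects them separately.
X = trainSet(:, 1:2);
Y = trainSet(:, 3);

disp(size(trainSet))            % number of observations by 3
```

Keeping the label in the same row as its features makes it harder to misalign the two by accident.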
16 Comments
chaaru datta
on 14 May 2022
Thank you so much for your answer, sir.
But I have a query about how to assign a label to each observation. This doubt arises because the first column of the training set relates to epsilon minus and the second column relates to epsilon plus, so how should I decide whether the label of that observation is minus or plus?
the cyclist
on 14 May 2022
Is this a supervised learning task? If so, then you should know all the input features (x) and the label y.
You have to know the features and the labels, in order to train the model.
If you don't know the values of the features and the label, you might have an unsupervised learning task.
Maybe you could explain more about your problem, and post your data?
chaaru datta
on 14 May 2022
Yes sir, it's a supervised learning task and I know all the input features (x).
Also, I know that the label is either -1 or 1.
But my doubt is: if we consider the first row, what label should I put in the 3rd column? Plus 1 or minus 1?
the cyclist
on 14 May 2022
I'm still not sure I understand your question. Do you want separate arrays for input and label?
X = [0.2 0.3;
-0.3 0.4;
...
0.6 0.5];
Y = [-1;
1;
...
-1];
chaaru datta
on 14 May 2022
No sir, I don't want separate arrays for input and label.
Basically, I want the same array as the earlier one, i.e. M×3.
But my query is how one decides whether the label in the first row, third column is minus 1 or plus 1.
the cyclist
on 14 May 2022
I'm confused.
You wrote "Also, I know that the label is either -1 or 1."
So, use the information you know. If you know the value is -1, put -1. If you know the value is +1, use 1.
chaaru datta
on 14 May 2022
Ok sir...Thanks a lot once again...will implement it in MATLAB now...
chaaru datta
on 15 May 2022
Hello Sir, I implemented this training set (Mx3) for SVM. However, I am getting an accuracy of around 50%, whereas I was expecting it to be around 98%.
the cyclist
on 15 May 2022
Can you upload your data and code? (You can use the paperclip icon in the INSERT section of the toolbar.)
Without seeing your data/code, it's impossible to know whether you have implemented something incorrectly, or if you just are expecting too much accuracy.
chaaru datta
on 16 May 2022
Hi sir,
I have shared my code and training set.
the cyclist
on 16 May 2022
I'm confused again, because the code you uploaded ...
- doesn't load the data
- seems to just generate random data (maybe for testing the code?)
- doesn't fit a statistical model
When you say you got low accuracy, I don't see where you have calculated that.
Also, I did fit a logistic regression model to the data in that file (and also looked at some scatter plots and correlation coefficients), and it doesn't look like En_minus or En_plus have much explanatory power at all for Target:
data = readtable("https://www.mathworks.com/matlabcentral/answers/uploaded_files/999375/Dataset_PIDpaper_7_pls15dB_prac18.xlsx");
data.Target = (data.Target+1)/2;
modelspec = 'Target ~ En_plus + En_minus';
mdl = fitglm(data,modelspec,'Distribution','binomial')
mdl =
Generalized linear regression model:
    logit(Target) ~ 1 + En_minus + En_plus
    Distribution = Binomial

Estimated Coefficients:
                    Estimate          SE          tStat       pValue
                   ___________    __________    ________    ________
    (Intercept)     -0.0073256     0.0094742    -0.77322     0.43939
    En_minus       -0.00045798    0.00088569    -0.51709     0.60509
    En_plus          0.0018022    0.00089727      2.0086    0.044583

100000 observations, 99997 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 4.04, p-value = 0.133
chaaru datta
on 16 May 2022
Sir, I would like to answer your queries one by one.
1) I am using SVM to classify wireless signals.
2) Data is not loaded: I used MATLAB to generate the data (training set) of dimension Mx3, where M = 10^5.
3) Seems to just generate random data: the generated data has random values because it relates to wireless channels, which are random in nature.
4) I don't see where you have calculated the accuracy: using this training set, I calculated the accuracy in Python.
I am also sharing the research paper which I am trying to implement.
the cyclist
on 16 May 2022
I have to admit I can't spend the time to fully understand your code or that paper. But, here is my impression.
In your code, it looks like Train_label_final is not just random, but random with no relationship to Train_set_features. In other words, this is the case where the signal-to-noise ratio is tiny. [SNR(dB) very negative.] In the paper, notice that when SNR(dB) = -15, they also get an accuracy of about 50%. I think you are seeing exactly the same thing.
But I don't see anywhere in your code where you coded an example in which SNR is large, so you have never simulated a case where the accuracy would be high.
chaaru datta
on 17 May 2022
Sir, in the code the large SNR of +15 dB is set on line 21, and its effect is included in line 44 and line 57.
the cyclist
on 17 May 2022
I see that the signal is used in the calculation of the features, but it doesn't affect the label, right?
The label you generated is completely random, not affected by the features. Here is the code to generate the labels, with all other code removed:
M_train = 1*10^5; % for training iteration, given in paper as 10^5
M_train_detail = int32(randi([0, 1], [1, M_train])); % generating random tag symbols
Train_label_final = [];
for kk = 1:(M_train)
    if M_train_detail(kk)== 0
        lab = -1;
    else
        lab = 1;
    end
    Train_label = [lab];
    Train_label_final = [ Train_label_final; Train_label];
end
This is random, with no reference to signal or the features. Therefore, it is no surprise that you cannot predict these labels from the features.
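One way to keep labels and features tied together is to draw the labels once and then derive the features from that same label vector. The sketch below is only an illustration under stated assumptions: the feature model (a label-dependent mean plus unit-variance Gaussian noise) is a toy stand-in and is not the signal model of the paper, and `snr_dB` and `amp` are hypothetical names. The variable names `Train_label_final`, `En_minus`, `En_plus`, and `Train_set` follow the thread.

```matlab
rng(0);                                   % reproducible random numbers
M_train = 1e5;                            % number of training observations

% Draw the labels once, vectorized: each entry is -1 or +1.
Train_label_final = 2*randi([0 1], M_train, 1) - 1;

% Toy feature model (ASSUMPTION, not the paper's equations):
% the feature matching the label gets a mean offset set by the SNR.
snr_dB = 15;
amp    = 10^(snr_dB/20);
En_minus = amp*(Train_label_final == -1) + randn(M_train, 1);
En_plus  = amp*(Train_label_final == +1) + randn(M_train, 1);

% Store the SAME labels that drove the feature generation.
Train_set = [En_minus, En_plus, Train_label_final];

% Sanity check: the features should now carry information about the label.
R = corrcoef(Train_set(:,2) - Train_set(:,1), Train_set(:,3));
c = R(1,2);
fprintf('Correlation between (En_plus - En_minus) and label: %.3f\n', c);
```

If the correlation printed by a check like this is near zero for your own data, a classifier trained on it cannot be expected to do better than chance.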
chaaru datta
on 17 May 2022
Yes sir...you are right. I am generating the labels but they are not affected by the features.
Also, I would like to describe the system model given in paper in brief.
1) The system model contains a radio frequency source, a tag and a reader.
2) The tag reflects (backscatters) two types of signal, viz. -1 and +1.
3) When the reflected signal from the tag is -1, the epsilon minus feature is obtained at the reader; otherwise epsilon plus is obtained at the reader.
4) Thus my training set consists of epsilon minus, epsilon plus and labels for each reflected signal from the tag.
More Answers (1)
the cyclist
on 17 May 2022
I spent a little bit more time with the paper.
It seems to me that in the paper, the labels y are supposed to be used when generating s (Eq. 5 & 6) and then epsilon (Eq. 7 & 8).
But you don't use your labels as part of the calculation of the features.
7 Comments
chaaru datta
on 18 May 2022
Yes sir, you are right, but I had also generated the features according to the labels.
For example, in the code lines 44 to 54 are for label -1 and lines 57 to 67 are for label +1.
the cyclist
on 18 May 2022
But the labels used to generate the feature are not what you use in the variable Train_label (which is the 3rd column of Train_set). Shouldn't they be the same labels? Instead, Train_label is just random noise.
Can you also post the Python code with the model, so I can see how you are using the output of the MATLAB program?
chaaru datta
on 18 May 2022
Sir, I am sharing the Python code.
Sir, in this paper we have two features based on the energy of the signals, en_min and en_pls, as computed in the MATLAB code on lines 54 and 67. So how should I assign the label to these features?
chaaru datta
on 18 May 2022
Hello sir, I would like to clarify a few of my doubts one by one.
1) As per our earlier discussion, the training set has to be M x 3. So if we assume M = 10, the training set will be 10 x 3, in which the first column belongs to en_min, the second column belongs to en_pls and the last column is the label. Is this correct, sir?
2) If the mth bit from tag to reader is -1, then only en_min will be obtained. Then my query is: what should the value of en_pls be?
3) If the mth bit from tag to reader is 1, then only en_pls will be obtained. Then my query is: what should the value of en_min be?
4) If we consider that for the mth bit both en_min and en_pls are available, then what should we write in the corresponding label, i.e. in the third column?
Sir, I would like to tell you a few of my observations.
1) Sir, I also tried the following: if the mth bit is -1, then en_min has a value, en_pls is zero and the label is -1. And if the mth bit is 1, then en_pls has a value, en_min is zero and the label is 1. However, I get 100% accuracy, which is definitely not correct.
2) I also formed the training set by comparing en_min and en_pls and assigning the label 1 if en_pls is greater than en_min, and vice versa. But here also I got 99.92% accuracy even when the SNR is -15 dB, which is again not correct.
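The 100% figure in the first observation is what one would expect whenever the label is written directly into the features: if the unused feature is set to exactly zero, the position of the zero reveals the label regardless of noise or SNR. The sketch below illustrates this with made-up energies; the values and the names `e` and `yhat` are hypothetical, and the one-line rule stands in for a trained classifier.

```matlab
rng(2);                               % reproducible random numbers
M = 1000;
y = 2*randi([0 1], M, 1) - 1;         % labels, -1 or +1

% Made-up, strictly positive "energies" (ASSUMPTION, for illustration only).
e = rand(M, 1) + 0.1;

% Construction from observation 1: the feature not matching the label is zero.
en_min = e .* (y == -1);
en_pls = e .* (y == +1);

% A rule that looks only at which feature is nonzero recovers every label.
yhat = 2*(en_pls > 0) - 1;
acc  = mean(yhat == y);
fprintf('Accuracy of the zero-pattern rule: %.2f\n', acc);
```

The second observation has the same character: a label defined as the outcome of comparing en_pls with en_min is a deterministic function of the features, so high accuracy there says nothing about how well the transmitted bit can be detected.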
the cyclist
on 18 May 2022
I'm not sure I can spend enough time reviewing the paper, and your code, to be able to answer these for you.
But what is very clear to me is that in your current code, the labels you are using are completely unrelated to the energy level features, so they will be unpredictable.
It seems possible to me that in the training set the labels are supposed to be almost perfectly predictable, but in the testing set (with different labels) they will not be as predictable. That is normally what happens in machine learning problems.
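To compare training and testing accuracy in MATLAB itself, one option is a holdout split followed by an SVM fit. This is a rough sketch, assuming the Statistics and Machine Learning Toolbox is available (the same toolbox that provides fitglm); the synthetic features below are a toy stand-in, not the paper's energy features, and the 70/30 split is an arbitrary choice.

```matlab
rng(1);                                        % reproducible random numbers
n = 2000;
y = 2*randi([0 1], n, 1) - 1;                  % labels, -1 or +1

% Toy features (ASSUMPTION): the column matching the label has mean 2.
X = 2*[(y == -1), (y == +1)] + randn(n, 2);

% Stratified holdout split: 70% training, 30% testing.
cv       = cvpartition(y, 'HoldOut', 0.3);
idxTrain = training(cv);
idxTest  = test(cv);

% Fit a linear SVM on the training rows only.
mdl = fitcsvm(X(idxTrain,:), y(idxTrain), 'KernelFunction', 'linear');

% Accuracy on data the model has seen and on data it has not.
accTrain = mean(predict(mdl, X(idxTrain,:)) == y(idxTrain));
accTest  = mean(predict(mdl, X(idxTest,:))  == y(idxTest));
fprintf('Train accuracy: %.3f, test accuracy: %.3f\n', accTrain, accTest);
```

Reporting both numbers side by side makes it easier to tell a data-generation problem (both near 50%) from overfitting (training much higher than testing).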
I can try to take another look, but probably not for a few days.
chaaru datta
on 18 May 2022
That's OK, sir. Thank you so much for your wholehearted support. I will keep trying to implement this paper. Sir, please let me know once you are free so that I can discuss it with you further.
Also, it would be helpful if you could suggest some links for solving such machine learning problems.
chaaru datta
on 20 Jun 2022 at 6:57
Hello Sir, can you please share your insights on forming the training set as done in this paper?