How to split a column's elements into two vectors based on labels?
phdcomputer Eng
on 26 Dec 2018
Commented: Image Analyst
on 29 Dec 2018
I attached part of a lung dataset (32x57). Its last column holds the labels (1 or 2). I want to split each column into two vectors based on the labels:
F(i).normal, a vector holding the column's elements with label 1,
F(i).tumor, a vector holding the column's elements with label 2.
I attached my MATLAB code.
My code for collecting each column's elements into these vectors does not seem to be correct. I'd be very grateful for your opinion.
close all;
clc
load lung.mat
F=lung;
[n,m]=size(F);
for i=1: m
s1=0; s2=0;
for j=1: n
if (F(j,m)==1)
for z=1:s1
F(i).normal(z)=F(j,i);
s1=s1+1;
end
else
for x=1:s2
F(i).tumor(x)=F(j,i);
s2=s2+1;
end
end
end
end
Accepted Answer
Image Analyst
on 27 Dec 2018
You didn't attach lung.mat. But is this what you want:
% Create sample data.
data = randi(9, 32, 57); % Random integers in the range 1-9.
data(:, end) = randi(2, 32, 1); % Last column is 1 or 2 ONLY.
% Find out what rows are labeled 1 and 2
% by looking in the last column.
rowsLabeled1 = data(:, end) == 1;
rowsLabeled2 = data(:, end) == 2;
% Extract rows labeled 1 and 2 into their own matrices.
data1 = data(rowsLabeled1, :);
data2 = data(rowsLabeled2, :);
% You can get vectors from each column by extracting it into a new variable
% e.g. to get 2 vectors for column 5, do
col51 = data1(:, 5); % Get col 5 with label 1.
col52 = data2(:, 5); % Get col 5 with label 2.
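If you want the F(i).normal / F(i).tumor layout from the question, the same logical indexing can fill a struct array in one loop. This is only a sketch under assumptions: random sample data stands in for lung.mat, and the struct array is called S (a name chosen here) so it does not collide with the numeric matrix, which is likely one reason the original loop fails.

```matlab
% Sketch: one struct element per feature column, split by label.
rng(0);                                  % Reproducible sample data (stand-in for lung.mat).
data = randi(9, 32, 57);
data(:, end) = randi(2, 32, 1);          % Last column holds the labels (1 or 2).

labels = data(:, end);
numFeatures = size(data, 2) - 1;         % Exclude the label column itself.

% Preallocate the struct array. S is a hypothetical name, kept separate from data.
S = repmat(struct('normal', [], 'tumor', []), 1, numFeatures);
for i = 1 : numFeatures
    S(i).normal = data(labels == 1, i);  % Elements of column i with label 1.
    S(i).tumor  = data(labels == 2, i);  % Elements of column i with label 2.
end
```

No inner loop over rows or manual counters are needed, because labels == 1 already selects every matching row at once.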
14 Comments
Image Analyst
on 29 Dec 2018
You already know how to use pdist2, and you can plot all those distances, and even get a histogram of them. If you want to split into two zones, you can use graythresh(), imbinarize() or kmeans(), though like before I think that makes little to no sense. You still haven't explained why. Anyway, you should use a fixed threshold for consistency. Using an automatic threshold that varies depending on how many points are class 1 or class 2 is not good for comparing data sets. What if the distances were normally distributed? What does that mean? The numbers are uniformly distributed??? What if the distances had two clusters? What does that mean? That the measurements were in two tight clusters? It seems that by having the data for that measurement already labeled that someone has already somehow thresholded something, and it's probably the values themselves rather than the distance between them. But go ahead and do it and show us the values and the histograms, and the distance values and the distance value histogram and we can see if the distance histogram gives any additional insight.
It would be easy for you to make up data sets that range from clustered to uniformly distributed and compute the distances in each case. For example, in my K Nearest Neighbor demo, I create two classes, each with a spread, and a separation between the two classes. Though it's in 2-D for 2 variables. You could actually just make two classes in 1-D simply by using rand() and randn() and setting the mean and spread for each class.
Image Analyst
on 29 Dec 2018
OK, I programmed up a simple Monte Carlo simulation for you with uniform, non-overlapping distributions for two classes. It is attached. You can see the measurement values, the distance values, and the histogram of the distance values. I think you can do a lot of your experimentation and discovery of insights just by trying different distributions in a Monte Carlo fashion. For example, maybe the distribution of distances is the convolution of the two class measurement distributions. What do you think?
% Program to do a Monte Carlo simulation of measurements between two classes of patients.
clc; % Clear the command window.
close all; % Close all figures (except those of imtool.)
imtool close all; % Close all imtool figures if you have the Image Processing Toolbox.
clear; % Erase all existing variables. Or clearvars if you want.
workspace; % Make sure the workspace panel is showing.
format long g;
format compact;
fontSize = 16;
% Specify parameters.
numClass1 = 120; % Number of measurements in class 1.
numClass2 = 80; % Number of measurements in class 2.
meanClass1 = 25;
meanClass2 = 75;
spread1 = 25;
spread2 = 25;
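% Generate measurements, uniform over [mean - spread/2, mean + spread/2],
% so each class is actually centered on its stated mean.
class1Values = meanClass1 + spread1 * (rand(numClass1, 1) - 0.5);
class2Values = meanClass2 + spread2 * (rand(numClass2, 1) - 0.5);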
% Plot measurements
subplot(2, 2, 1);
plot(class1Values, 'b*', 'MarkerSize', 10, 'LineWidth', 2);
hold on;
plot(class2Values, 'r*', 'MarkerSize', 10, 'LineWidth', 2);
xlabel('Measurement Number', 'FontSize', fontSize);
ylabel('Measurement Value', 'FontSize', fontSize);
title('Measurement Value for Every Patient', 'FontSize', fontSize);
grid on;
legend1 = sprintf('%d in Class 1', numClass1);
legend2 = sprintf('%d in Class 2', numClass2);
legend(legend1, legend2, 'location', 'east');
% Enlarge figure to full screen.
set(gcf, 'Units', 'Normalized', 'OuterPosition', [0, 0.04, 1, 0.96]);
drawnow;
% Compute the distance from every class 1 point to every class 2 point.
set1 = [zeros(length(class1Values), 1), class1Values];
set2 = [zeros(length(class2Values), 1), class2Values];
distances = pdist2(set1, set2);
subplot(2, 2, 2);
bar(distances(:)); % Flatten the matrix so there is one bar per pair.
grid on;
title('Distances between Class 1 Points and Class 2 Points', 'FontSize', fontSize);
xlabel('Pair Number', 'FontSize', fontSize);
ylabel('Distance between pair', 'FontSize', fontSize);
% Show histogram of distances.
subplot(2, 2, 3:4);
histogram(distances);
grid on;
caption = sprintf('Histogram of %d Distances between Class 1 Points and Class 2 Points', numel(distances));
title(caption, 'FontSize', fontSize);
xlabel('Distance', 'FontSize', fontSize);
ylabel('Count', 'FontSize', fontSize);
More Answers (1)
Cris LaPierre
on 27 Dec 2018
Your data is not attached, so there is nothing to test, but have you looked into using a table and the functions findgroups and splitapply? See some examples here.
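As a rough illustration of that idea (a sketch on made-up data, since lung.mat is not available; the variable names are chosen here for the example), findgroups turns the label column into group numbers and splitapply applies a function to each group:

```matlab
% Sketch: split one column by label using findgroups and splitapply.
rng(0);                                  % Reproducible sample data (stand-in for lung.mat).
data = randi(9, 32, 57);
data(:, end) = randi(2, 32, 1);          % Last column holds the labels (1 or 2).

G = findgroups(data(:, end));            % Group number for each row (1 for label 1, 2 for label 2).

% Wrap each group's values in a cell, because the groups can differ in length.
col5ByLabel = splitapply(@(x) {x}, data(:, 5), G);

% A scalar summary per group needs no cell wrapper.
col5Means = splitapply(@mean, data(:, 5), G);
```

Here col5ByLabel{1} holds the column 5 values with label 1 and col5ByLabel{2} those with label 2.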