How can I encode strings in an array as numbers?
I have a matrix of strings in which some columns hold text such as city names, and I would like to convert these city names into numbers. Here is what I tried. Because the matrix is made of strings, I first find how many unique strings exist in a column, for example the 9th column.
c=unique(matrix(:,9));
This gives, for example, 3400 elements, and I assign a number to each one (the first column of c holds the city names and the second column holds the numbers 1 to 3400).
c(:,2)=1:3400;
I then have 1 million rows, and I try to match each of them with its number to create a new matrix that contains only numbers.
for i=1:1000000
    for j=1:3400
        if matrix(i,9)==c(j,1)
            numbermatrix(i,9)=str2double(c(j,2));
        end
    end
end
This code works, but comparing every element against all the possibilities takes a long time.
Is there an easier and faster way to do this?
1 Comment
OK, maybe you can make that a little clearer.
>"I have different strings in a matrix and some of the coloums of this matrix is strings"
So you have a string matrix. That is unlikely, so you have either a string array or, more likely, a cell array. You could have an actual char matrix if every entry has the same number of characters, with the shorter city names padded with spaces, but hopefully not. Or maybe you have city codes of the same length. Either way it does not matter much, since it seems you are only interested in column 9 and unique works on all of the above (for a char matrix you would need the 'rows' option). So maybe:
matcol9 = {'New York', 'London', 'Paris', 'San Francisco', 'Toronto', 'Sydney', 'Singapore', 'London', 'Paris'}'; % example cell; ["London", "Paris" etc] in case of string array
[c,ia,ic]=unique(matcol9); %If A is a vector, then C = A(ia) and A = C(ic).
So unique already has a built-in way to assign an index to each original element (the output ic), and that may already be what you need, assuming the 1 million rows are in the same "matrix" (why are they rows now, though?). Otherwise, ismember might be more helpful than comparing each element with every other one.
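As a rough sketch of both suggestions, assuming the column is a string array: the variable names and sample data below (cities, knownCities, codes) are made up for illustration and are not from the original post.

```matlab
% Hypothetical sample data standing in for column 9 of the original array.
cities = ["Paris"; "London"; "Paris"; "Toronto"; "London"];

% unique returns the sorted distinct values (c) and, in ic, the position of
% each original element within c, so ic is already one numeric code per row.
[c, ~, ic] = unique(cities);

% Round trip: indexing c with the codes recovers the original text.
roundTripOk = isequal(c(ic), cities);

% If the list of codes is fixed in advance (for example, built from another
% data set), ismember looks up every row in one vectorized call.
knownCities = ["London"; "Paris"; "Toronto"];   % assumed reference list
[found, codes] = ismember(cities, knownCities);
```

Here ic and codes both come out as [2; 1; 2; 3; 1], with no explicit loop over the rows. Entries that are missing from knownCities would get found equal to false and a code of 0.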
Accepted Answer
More Answers (1)
If you have data where some of the columns are text and some are numbers, consider storing your data in a table array rather than trying to use numbers as proxies for the text data. In a table array, each variable has to have the same type of data but different variables can contain different types of data.
names = ["Walter Roberson"; "Image Analyst"; "Star Strider"];
reputation = [135067; 77260; 65504];
contributors = table(names, reputation)
Here the names variable is text while the reputation variable is numeric.
starStriderReputation = contributors{3, 'reputation'}
Alternately, instead of treating the names as data I could treat them as row names and use them to index into the table.
contributors2 = table(reputation, RowNames=names)
walterRobersonReputation = contributors2{"Walter Roberson", 1}
If your data is time-based, a timetable may be more useful than a table because it provides capabilities to change the time basis of your data or perform time-based analysis (for example, computing daily averages from data collected more frequently).
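A minimal sketch of the daily-average case, using made-up readings every 6 hours; the names t, value, tt and daily are assumptions for illustration only.

```matlab
% Hypothetical readings taken every 6 hours over two days.
t = datetime(2024, 1, 1, 0, 0, 0) + hours(0:6:42)';
value = (1:8)';

% The first input to timetable supplies the row times.
tt = timetable(t, value);

% Aggregate to one row per calendar day, averaging the readings in each day.
daily = retime(tt, 'daily', 'mean');
```

With this data, daily has two rows, holding the means of the first four and the last four readings.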