How to recognize gender by name
Alexander Engman
on 11 Jul 2018
Commented: Image Analyst
on 13 Jul 2018
Hi!
I have a list (1 column, 601 rows) of the most popular male and female surnames and they are marked in another column as either M for male or F for female. I have another list of surnames of people from a statistical survey (which does not have the same dimensions as the list of names). I want to compare the names from the survey with the names in my list and mark them as either M or F if they are recognized. If they are not found in my list, I want to leave them blank. Does anyone know how I can do this?
Many thanks in advance.
2 Comments
Jan
on 11 Jul 2018
What exactly is "a list"? Please post a small piece of MATLAB code that creates a representative data set. Suggesting code is then much easier.
Accepted Answer
Guillaume
on 11 Jul 2018
Edited: Guillaume
on 11 Jul 2018
Very easy to do:
%inputs:
%genderlist = Mx2 cell array, 1st column name, 2nd column gender
%namelist = Nx1 cell array, list of names that need gender
%output
%namelistwithgender = Nx2 cell array, 1st column from namelist, 2nd column corresponding gender if found in genderlist, empty otherwise
[isfound, where] = ismember(namelist, genderlist(:, 1));
namelistwithgender = namelist;
namelistwithgender(isfound, 2) = genderlist(where(isfound), 2);
Note that the search is case-sensitive. To ignore case, pass lower-cased copies of both lists to the ismember call.
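As a minimal sketch of the case-insensitive variant, the snippet below runs the same ismember approach on a small made-up data set. The names and genders are illustrative assumptions, not the asker's data, and the explicit blank default is an addition so that unmatched rows hold an empty char array.

```matlab
% Hypothetical reference list: name in column 1, gender in column 2.
genderlist = {'Anna', 'F'; 'Erik', 'M'; 'Maria', 'F'; 'Lars', 'M'};
% Hypothetical survey names; 'Zed' is deliberately absent from the reference list.
namelist = {'erik'; 'MARIA'; 'Zed'; 'Anna'};

% Compare lower-cased copies so the original spelling is kept in the output.
[isfound, where] = ismember(lower(namelist), lower(genderlist(:, 1)));

namelistwithgender = namelist;
namelistwithgender(:, 2) = {''};   % default: blank gender
namelistwithgender(isfound, 2) = genderlist(where(isfound), 2);

disp(namelistwithgender)
```

Names that are not in the reference list keep an empty second column, which matches the "leave them blank" requirement in the question.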
6 Comments
Image Analyst
on 13 Jul 2018
Just use lower() and strrep():
namelist = lower(namelist); % Everything is lower case after this.
for n = 1 : numel(namelist)
    theName = strrep(namelist{n}, '-', ' '); % Replace dashes with spaces.
    ca = strsplit(theName); % Cell array of the words in this name.
    for k = 1 : numel(ca)
        thisName = ca{k}; % Extract the k-th word.
        % Check if thisName is in each gender namelist.
        % etc.
    end
end
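One way the loop sketched above might be completed is shown here. It assumes that a compound name should take the gender of its first word that appears in the reference list; that rule, the variable name gender, and the sample data are assumptions for illustration only.

```matlab
% Hypothetical reference list (already lower case) and survey names.
genderlist = {'anna', 'F'; 'erik', 'M'; 'maria', 'F'; 'karl', 'M'};
namelist = {'Anna-Maria'; 'Karl Erik'; 'Zed'};

gender = repmat({''}, numel(namelist), 1);   % blank unless a word is recognized
for n = 1 : numel(namelist)
    % Lower-case, replace dashes with spaces, then split into words.
    words = strsplit(strrep(lower(namelist{n}), '-', ' '));
    for k = 1 : numel(words)
        [isfound, where] = ismember(words{k}, genderlist(:, 1));
        if isfound
            gender{n} = genderlist{where, 2};
            break   % stop at the first recognized word
        end
    end
end
disp([namelist, gender])
```

Stopping at the first recognized word is a design choice; a name whose parts map to different genders would need a different rule.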
More Answers (1)
Image Analyst
on 11 Jul 2018
I'd get a distribution and then use k nearest neighbors. After all, there are several names with varying numbers of people in either gender, like chris, robin, ariel, sam, pat, etc.
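As a rough illustration of the "distribution" idea, the sketch below uses a plain majority-share threshold rather than k nearest neighbors, so it is a simpler stand-in and not the method named above. The counts, the threshold value, and all variable names are made up for the example.

```matlab
% Hypothetical counts of people per name and gender (made-up numbers).
names       = {'chris'; 'robin'; 'anna'; 'erik'};
femaleCount = [ 30;  55; 980;   2];
maleCount   = [ 70;  45;  20; 998];

% Share of female bearers for each name.
femaleShare = femaleCount ./ (femaleCount + maleCount);

threshold = 0.9;                          % assumed cut-off; adjust as needed
gender = repmat({''}, numel(names), 1);   % ambiguous names stay blank
gender(femaleShare >= threshold) = {'F'};
gender(femaleShare <= 1 - threshold) = {'M'};
disp([names, gender])
```

With these numbers, the ambiguous names chris and robin stay blank, while anna and erik are labelled.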
2 Comments
Image Analyst
on 11 Jul 2018
Then just use xlsread() to read in your reference name lists, and your "test/validation" set of names and use ismember(), something like (untested):
[numbers, names, raw] = xlsread(filename);
femaleNames = names(:, 1); % Female names in column 1.
maleNames = names(:, 2); % Male names in column 2.
testNames = names(:, 3); % Test names in column 3.
inFemaleList = false(length(testNames), 1); % Preallocate the results.
inMaleList = false(length(testNames), 1);
for k = 1 : length(testNames)
    inFemaleList(k) = ismember(testNames{k}, femaleNames);
    inMaleList(k) = ismember(testNames{k}, maleNames);
end
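Since ismember accepts a whole cell array of names, the loop can also be dropped. The sketch below uses inline sample data in place of the spreadsheet columns (an assumption, so that it runs without a file) and turns the two logical vectors into M/F labels, leaving names found in both lists or in neither blank.

```matlab
% Hypothetical stand-ins for the three spreadsheet columns.
femaleNames = {'anna'; 'maria'; 'robin'};
maleNames   = {'erik'; 'lars'; 'robin'};
testNames   = {'maria'; 'robin'; 'lars'; 'zed'};

% One call per list instead of a loop over the test names.
inFemaleList = ismember(testNames, femaleNames);
inMaleList   = ismember(testNames, maleNames);

gender = repmat({''}, numel(testNames), 1);
gender(inFemaleList & ~inMaleList) = {'F'};   % only in the female list
gender(inMaleList & ~inFemaleList) = {'M'};   % only in the male list
disp([testNames, gender])
```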