finding e-mail address which begins with 2 character and different domain names as array

19 views (last 30 days)
hello every one; i have 3 arrays and they are declared as the following:
dom_lists={'000' 'hotmal.com';'001' 'gmail.com';'010' 'yahoo.com'; '011' 'mail.com';'100' 'live.com';'101' 'myspace.com';'110' 'msn.com';'111' 'mynet.com'};
sub_g1={'aj';'ih';'vn';'hu';'eg';'is';'rd';'nt';'me';'ah';'zb';'en';'mm'};
sub_g2={'001';'101'; '101';'101'; '000'; '110'; '001'; '111'; '101'; '000'; '110'; '000'; '000'};
list_emialadre={'aakm@hotmail.com';'abomcn@hotmail.com';...............}; 5408 emails which are based on domain names, e.g. hotmail.com have 676 email address and also gmail.com have 676....
what i want is; generating the sub_g2's meaning or equivalent strings from domain_lists vector. after that, sub_g1's data is also used for finding from the email address which begins with the sub_g1's dual characters and ends sub_2 domain_lis. so help me for solving this problem.

Accepted Answer

Stephen23
Stephen23 on 1 May 2015
Edited: Stephen23 on 2 May 2015
If you split the problem into parts then it is much easier to solve. Here are the data call arrays, which I altered e.g. by adding one sample email address that actually matches the second pair+domain data (otherwise the two email addresses given do not match any, so we would not have a positive result):
dom_lists = {'001' 'gmail.com'; '000' 'hotmal.com';'010' 'yahoo.com'; '011' 'mail.com';'100' 'live.com';'101' 'myspace.com';'110' 'msn.com';'111' 'mynet.com'};
sub_g1 = {'aj'; 'ih'; 'vn'; 'hu'; 'eg'; 'is'; 'rd'; 'nt'; 'me'; 'ah'; 'zb'; 'en'; 'mm'};
sub_g2 = {'001';'101'; '101';'101'; '000'; '110'; '001'; '111'; '101'; '000'; '110'; '000'; '000'};
email_list = {'aakm@hotmail.com'; 'abomcn@hotmail.com'; 'ihzzz@myspace.com'};
First convert the binary strings into numeric values with bin2dec, as this makes them easier to work with:
sub_N = bin2dec(cell2mat(sub_g2));
dom_N = bin2dec(cell2mat(dom_lists(:,1)));
Then simply match these numeric values using bsxfun, and extract the correct domain strings:
[row,~] = find(bsxfun(@eq,dom_N,sub_N.'));
dom_g2 = dom_lists(row,2);
Then create regexp regular expressions based on the character-pair and domain strings, and use these to locate the matching email addresses:
rgx = strcat('^',sub_g1,'.*@',strrep(dom_g2,'.','\.'),'$');
mtc = cellfun(@(s)regexp(email_list,s),rgx, 'UniformOutput',false);
out = ~cellfun('isempty',[mtc{:}]);
where out can be shown in the command window:
>> out
out =
0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0
out is a logical array where each row corresponds to one of the given email addresses (only three in the sample data!) and each column corresponds to the pair+domain data of sub_g1 and sub_g2 (thus thirteen columns). From this array we can see that the third email address matches the data of the second pair+domain data, which is what was stated at the beginning, so the algorithm has successfully detected this positive test case.
  7 Comments
Stephen23
Stephen23 on 3 May 2015
@abdulkarim hassan: I'm glad to help! On this forum it is also considered polite to accept answers that resolve your questions.

Sign in to comment.

More Answers (0)

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!