Renaming categories with accents

Question

Luisa Liboni on 29 Nov 2018

https://se.mathworks.com/matlabcentral/answers/432879-renaming-categories-with-accents

Answered: Guillaume on 30 Nov 2018

I have a categorical array t, and some of its categories contain diacritics (accents), such as circumflexes. I want to standardize all categories so that none of them contains diacritics.

I tried this code:

str = {'Á', 'É', 'Í', 'Ó', 'Ú','Ã','Ç','Â','Ê','Ô'};
strreplace = {'A', 'E', 'I', 'O', 'U','A','C','A','E','O'};
t = categorical({'VÉRDE','VERDE','AZUL','AMARELO','VERMELHO','VERMÊLHO'})';
cat = categories(t);
newcat = cat;
for i = 1:numel(str)
    newcat = regexprep(newcat, str{i}, strreplace{i});
end
     
B = renamecats(t,cat,newcat)

However, after removing the accents, some categories become identical, for example VERMELHO and VERMÊLHO.

So I receive the following error:

Error using categorical/renamecats (line 39)
NEWNAMES contains duplicated values.
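Before calling renamecats, one way to see which new names collide is to group them with unique and count the members of each group with accumarray. A minimal sketch, assuming the category names are held in a cell array of char vectors; the variable names oldcats, newcats and collisions are illustrative only:

```matlab
str = {'Á','É','Í','Ó','Ú','Ã','Ç','Â','Ê','Ô'};
strreplace = {'A','E','I','O','U','A','C','A','E','O'};

% Illustrative category list that contains two collisions
oldcats = {'AMARELO';'AZUL';'VERDE';'VERMELHO';'VERMÊLHO';'VÉRDE'};
newcats = replace(oldcats, str, strreplace);

% groupIdx(k) is the position in uniqueNames of the new name for oldcats{k}
[uniqueNames, ~, groupIdx] = unique(newcats);
counts = accumarray(groupIdx, 1);

% New names shared by more than one old category (these trigger the renamecats error)
collisions = uniqueNames(counts > 1)
```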

Is there any way around this?

This is just an example. I need very efficient code, since my categorical array t comes from a very long table with approximately 500 categories.

Thanks,

2 Comments

Jan on 30 Nov 2018

500 does not sound like big data.

You did not mention what you want to happen if two category names become equal, so it is hard to suggest a solution.

By the way, strrep is much faster than regexprep.
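Following this suggestion, the loop from the question can use strrep, which does literal substitution rather than regular-expression matching and accepts a cell array of character vectors directly. A sketch using the question's lookup lists and a shortened, illustrative list of category names (no timing is claimed here):

```matlab
str = {'Á','É','Í','Ó','Ú','Ã','Ç','Â','Ê','Ô'};
strreplace = {'A','E','I','O','U','A','C','A','E','O'};

cats = {'VÉRDE';'VERDE';'VERMÊLHO'};
newcats = cats;
for k = 1:numel(str)
    % strrep applies the literal replacement to every element of the cell array
    newcats = strrep(newcats, str{k}, strreplace{k});
end
disp(newcats)
```

Note that this only computes the new names; passing them to renamecats would still fail on duplicates.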

Luisa Liboni on 30 Nov 2018

Jan,

My table has 27,900 rows with 12 categorical columns. One of these columns has 500 unique categories, and I need to remove the accents in all 12.
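To cover all categorical columns of a table, one option is to loop over the table variables and rebuild each categorical column from its text. A minimal sketch, assuming MATLAB R2016b or later (for string and replace); the small table T and its column names Cor, Cidade and Valor are hypothetical stand-ins for the real 27,900-row table:

```matlab
accented = {'Á','É','Í','Ó','Ú','Ã','Ç','Â','Ê','Ô'};
plain    = {'A','E','I','O','U','A','C','A','E','O'};

% Hypothetical stand-in table: two categorical columns and one numeric column
T = table(categorical({'VÉRDE';'';'VERMÊLHO'}), ...
          categorical({'SÃO PAULO';'RIO';'SAO PAULO'}), ...
          [1;2;3], 'VariableNames', {'Cor','Cidade','Valor'});

varNames = T.Properties.VariableNames;
for v = 1:numel(varNames)
    col = T.(varNames{v});
    if ~iscategorical(col)
        continue   % leave non-categorical columns untouched
    end
    % string() turns <undefined> into <missing>, replace keeps <missing>,
    % and categorical() turns <missing> back into <undefined>.
    % Names that become identical end up in a single category.
    T.(varNames{v}) = categorical(replace(string(col), accented, plain));
end
```

This replaces text in every element rather than only in the category names, and it drops unused categories and any ordinal setting. At this table size that trade-off is probably acceptable, but it has not been measured here.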

VERMELHO and VERMÊLHO should be treated as the same category. So yes, after removing the accents, some categories become identical and should be merged. However, renamecats gives me the error above.
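For a single known pair, mergecats performs the merge directly by folding the listed old categories into one category name. A sketch for the VERMELHO/VERMÊLHO case from the question; on its own it does not detect the colliding pairs automatically, so it does not replace a general approach for hundreds of categories:

```matlab
t = categorical({'VÉRDE';'VERDE';'AZUL';'AMARELO';'VERMELHO';'VERMÊLHO'});

% Merge both spellings into the unaccented category name
B = mergecats(t, {'VERMELHO','VERMÊLHO'}, 'VERMELHO');
```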


Accepted Answer

Guillaume on 30 Nov 2018

https://se.mathworks.com/matlabcentral/answers/432879-renaming-categories-with-accents#answer_349800


str = {'Á', 'É', 'Í', 'Ó', 'Ú','Ã','Ç','Â','Ê','Ô'};
strreplace = {'A', 'E', 'I', 'O', 'U','A','C','A','E','O'};
t = categorical({'VÉRDE','VERDE','AZUL','AMARELO','VERMELHO','VERMÊLHO'})';
cat = categories(t);
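% Compute the new category names; replace accepts cell arrays of patterns and replacements, so no loop is needed
newcat = replace(cat, str, strreplace);
% double(t) gives each element's index into cat, so newcat(double(t)) maps every element to its new name.
% categorical then builds a new array in which names that became identical share one category.
newt = categorical(newcat(double(t)))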
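One caveat with newcat(double(t)): double returns NaN for <undefined> elements, and NaN is not a valid index, so that line errors if t contains missing values. A sketch that maps only the defined elements and leaves the others undefined, relying on the fact that categorical treats empty character vectors as <undefined>; the sample data is illustrative:

```matlab
str = {'Á','É','Í','Ó','Ú','Ã','Ç','Â','Ê','Ô'};
strreplace = {'A','E','I','O','U','A','C','A','E','O'};

t = categorical({'VÉRDE';'VERDE';'';'VERMÊLHO'});   % third element is <undefined>
oldcats = categories(t);
newcat = replace(oldcats, str, strreplace);

names = repmat({''}, size(t));                  % default '' becomes <undefined>
defined = ~isundefined(t);
names(defined) = newcat(double(t(defined)));    % index only with valid category numbers
newt = categorical(names)
```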

0 Comments