Mass replace text, then split the replaced text into multiple columns - from multiple csv files

Question

0 votes

I have multiple csv files (Azure.csv, Blue.csv,...until 50th file) which has roughly this kind of data inside it:

ID,     Duration,   Team A,   TeamAComposition,   Team B,   TeamBComposition
120404  3442        Azure     Ae Bee Cee Dee Ee   Blue      Ef Gee Heij Ai Jei
302033  233         Azure     Ae Bee Cee Dee Zet  Cyan      El Em En Ou Pee
...and so on

I would like to replace the team composition data into integers, so for example "Ae" will be replaced into "1", "Bee" will be "2", "Cee" into "3" and so on, and if possible convert those into 5 different new columns. So the new table is roughly like this

ID,     Duration,   Team A,   AComp1, AComp2,   AComp3,   AComp4,  AComp5, Team B,   BComp1,  BComp2, BComp3, BComp4, BComp5,
120404  3442        Azure     1       2         3         4        5       Blue      6        7       8       9       10

How can I do this? Thanks in advance.

6 Comments
Show 4 older comments Hide 4 older comments

Guillaume on 2 Mar 2018

A comma or any other separator not present in the names would remove any ambiguity indeed. It's possible to do the splitting without that separator but if there's a chance of the problem above, then an explicit separator is safer.

NeverPerfecT on 3 Mar 2018

Edited: NeverPerfecT on 3 Mar 2018

How can I achieve the multiple column conversion above? (assuming I've given comma after each hero names) Also I'm using Matlab R2015b

Sign in to comment.

Sign in to answer this question.

Follow Question

Answer 1

Guillaume on 5 Mar 2018

Open in MATLAB Online

0 votes

Solution for when heroes are just separated by a space:

In its own m file:

function varargout = split_heros(heroes, varargin)
    %split multiple scalar strings (varargin) into heroes id
    varargout = cell(1, numel(varargin)*5);
    for col = 1:numel(varargin)
        ishero = ~cellfun(@isempty, regexp(varargin{col}, heroes.regex));
        varargout(5*col-4 : 5*col) = num2cell(heroes.hero_id(ishero));
    end
end

To create the table:

heronames = readtable('hero_names.csv');
heronames.regex = strcat(heronames.localized_name, '( |$)');  %regular expression to help splitting list of names
%to wrap in a loop for each of the 50 files:
intable = readtable('DC vs EG.csv');
outnames = [sprintfc('AComp%d', 1:5), sprintfc('BComp%d', 1:5)];  %using undocumented sprintfc. in R2016b or later replace with compose
outtable = [intable(:, [1:6, 9, 11]), ...
            rowfun(@(varargin) split_heros(heronames, varargin{:}), intable, 'InputVariables', {'TeamAHeroes', 'TeamBHeroes'}, 'NumOutputs', 10, 'OutputVariableNames', outnames)]

Solution for when heroes are separated by a comma:

In its own m file:

function varargout = split_heros(heroes, varargin)
    [~, row] = ismember(strsplit(strjoin([varargin{:}], ','), ','), heroes.localized_name);
    varargout = num2cell(heroes.hero_id(row));
end

To create the table:

heronames = readtable('hero_names.csv');
%to wrap in a loop for each of the 50 files,  same code as before
intable = readtable('DC vs EG.csv');
outnames = [sprintfc('AComp%d', 1:5), sprintfc('BComp%d', 1:5)];  %using undocumented sprintfc. in R2016b or later replace with compose
outtable = [intable(:, [1:6, 9, 11]), ...
            rowfun(@(varargin) split_heros(heronames, varargin{:}), intable, 'InputVariables', {'TeamAHeroes', 'TeamBHeroes'}, 'NumOutputs', 10, 'OutputVariableNames', outnames)]

6 Comments
Show 4 older comments Hide 4 older comments

NeverPerfecT on 7 Mar 2018

Open in MATLAB Online

CDEC vs VG.csv

Sorry to bother you again Guillame, I've come to another dead end xD. When I tried to convert another file this wall of error text came up.. (csv attached)

Error using table/rowfun>dfltErrHandler (line 309)
Applying the function '@(varargin)split_heros(heronames,varargin{:})' to the 38th row of A generated the
following error:
In an assignment  A(:) = B, the number of elements in A and B must be the same.
Error in table/rowfun>@(s,varargin)dfltErrHandler(grouped,funName,s,varargin{:}) (line 196)
    errHandler = @(s,varargin) dfltErrHandler(grouped,funName,s,varargin{:});
Error in table/rowfun (line 214)
                [b_data{i,:}] = errHandler(struct('identifier',ME.identifier, 'message',ME.message,
                'index',i),inArgs{:});
Error in replacenames (line 7)
            rowfun(@(varargin) split_heros(heronames, varargin{:}), intable, 'InputVariables', {'TeamAHeroes',
            'TeamBHeroes'}, 'NumOutputs', 10, 'OutputVariableNames', outnames)];

Is this due to no commas present? As always, thanks again!

Guillaume on 7 Mar 2018

Open in MATLAB Online

If you're getting an "In an assignment A(:) = B, ..." error most likely it's because for some reason the code didn't identify exactly 5 heros in a team. It either found more or less. As said, I didn't put any error checking. You could add the line

assert(sum(ishero) == 5, '%d heros found in team %d', sum(ishero), col);

after the ishero = ... line in split_heros.m to make the error message clearer.

When I test your 'CDEC vs VG.csv' file, I get a problem on the first row of the table because Barathrum is not in your 'hero_name.csv'. I suspect a similar problem happens for row 38 (where you're getting the error).

Note that the check for hero names that I've implemented for the 1st case is fairly crude. It also assumes that the hero names do not contain any of the following characters: ()[]{}|^$.+*\?, that is any of the characters that have special meaning in regular expressions.

If you want to check which heros are identified for a particular team of a particular row, e.g. team B of row 38:

heroes = intable{38, 'TeamBHeroes'}
ishero = ~cellfun(@isempty, regexp(heroes, heronames.regex));
foundheros = heronames(ishero, :)

which in my case shows that Monkey King is not found (since not in heronames)

Sign in to comment.

Mass replace text, then split the replaced text into multiple columns - from multiple csv files

6 Comments
Show 4 older comments Hide 4 older comments

Accepted Answer

6 Comments
Show 4 older comments Hide 4 older comments

More Answers (0)

Categories

Tags

Community Treasure Hunt

Mass replace text, then split the replaced text into multiple columns - from multiple csv files

6 Comments Show 4 older comments Hide 4 older comments

Accepted Answer

6 Comments Show 4 older comments Hide 4 older comments

More Answers (0)

Categories

Tags

See Also

Community Treasure Hunt

6 Comments
Show 4 older comments Hide 4 older comments

6 Comments
Show 4 older comments Hide 4 older comments