how to extract a list of unique words from a set of one row strings

Question

Harrison on 14 Nov 2024


Direct link to this question

https://se.mathworks.com/matlabcentral/answers/2166149-how-to-extract-a-list-of-unique-words-from-a-set-of-one-row-strings

Commented: Harrison on 15 Nov 2024

Basically I have a set of 11 strings of words, and each string has no repeating words, but I need a list of every unique word in all 11 strings.

I've found that this works for one string at a time, but I can't get a list for all 11 strings this way.

A{1} = updatedDocuments(1,1)

B{1} = strjoin(unique(strtrim(strsplit(A{1}, ',')))', '')

Is it possible to index A{1} as updatedDocuments(1:11,1) or do something similar?


Answer 1

Madheswaran on 14 Nov 2024


Direct link to this answer

https://se.mathworks.com/matlabcentral/answers/2166149-how-to-extract-a-list-of-unique-words-from-a-set-of-one-row-strings#answer_1545194

Edited: Madheswaran on 15 Nov 2024


Hi @Harrison,

I am assuming the following:

'updatedDocuments' is an array of 'tokenizedDocument'
Each document contains text that is comma separated and doesn't end with a comma

To get the unique words from the entire set of strings, you can follow the below approach:

% remove commas from the documents if you don't want the comma to be
% included in 'uniqueWords'
updatedDocuments = removeWords(updatedDocuments, ","); 
uniqueWords = updatedDocuments.Vocabulary;

If 'updatedDocuments' is a cell array of character vectors, you can follow the below approach:

% Join all 11 documents into one comma-separated char vector. Joining with ','
% (rather than appending a comma to each cell first) avoids a trailing empty entry.
allWords = strjoin(updatedDocuments(1:11,1), ',');
allWords = strtrim(strsplit(allWords, ',')); % Split with comma as delimiter and trim
uniqueWords = unique(allWords); % unique words (1 x n cell where n is the number of unique words)
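
As a small worked example of the cell-array case, here is a minimal sketch that splits each document separately and then concatenates the word lists. The sample data is hypothetical and only stands in for the 11 real documents; it assumes 'updatedDocuments' is a column cell array of comma-separated character vectors.

```matlab
% Hypothetical sample data (an assumption, not the asker's real documents)
updatedDocuments = {'cherry, apple, banana'; 'banana, date'; 'elderberry, apple'};

% Split each document on commas and trim whitespace around every word.
% Each cell of 'parts' holds a 1-by-k cell array of words.
parts = cellfun(@(s) strtrim(strsplit(s, ',')), updatedDocuments, 'UniformOutput', false);

% Concatenate the per-document word lists into a single row
allWords = [parts{:}];

% Sorted list of unique words across all documents
uniqueWords = unique(allWords);

% Same list, but in order of first appearance instead of sorted
firstSeenWords = unique(allWords, 'stable');
```

The 'stable' option may be useful if the original word order matters; without it, unique returns the words in sorted order.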


Hope this helps!

3 Comments

Madheswaran on 15 Nov 2024

That is because I assumed 'updatedDocuments' to be a cell array of character vectors. If 'updatedDocuments' is an array of 'tokenizedDocument', resolving this issue is straightforward. I have updated the answer to include a solution for the 'tokenizedDocument' case, in addition to the existing explanation.

Let me know if that helps!

Harrison on 15 Nov 2024

That's exactly right! Thank you!!


Answer 2

Paul on 14 Nov 2024


Direct link to this answer

https://se.mathworks.com/matlabcentral/answers/2166149-how-to-extract-a-list-of-unique-words-from-a-set-of-one-row-strings#answer_1544974


If UpdatedDocuments is a 1D cell array of chars ...

UpdatedDocuments{1} = 'one,two,three,one';
UpdatedDocuments{2} = 'one,two,three,two';
UpdatedDocuments{3} = 'one,two,three,three';
result = cellfun(@(S) strjoin(unique(strtrim(strsplit(S, ','))),','),UpdatedDocuments,'Uni',false)
result = 1x3 cell array
    {'one,three,two'}    {'one,three,two'}    {'one,three,two'}
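
Note that the call above removes duplicates within each cell separately. If a single combined list across all cells is wanted instead, one possible sketch is to convert the cell array to a string array and use join, split and strip. The sample data here is a hypothetical variation of the example above, chosen so that the cells contain different words.

```matlab
% Hypothetical sample data (assumption): a 1D cell array of char vectors
UpdatedDocuments = {'one,two,three,one', 'one,two,four,two', 'five, two,three,three'};

docs   = string(UpdatedDocuments);       % 1x3 string array
joined = join(docs, ",");                % one comma-separated string scalar
words  = strip(split(joined, ","));      % N-by-1 string array, whitespace trimmed
uniqueWords = unique(words);             % sorted unique words across all cells
```

Here uniqueWords is a column string array; cellstr(uniqueWords) converts it back to a cell array of character vectors if that form is needed.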

1 Comment

Paul on 15 Nov 2024


The Vocabulary property of tokenizedDocument returns the unique words in the array

documents = tokenizedDocument([
    "an example of a short sentence  an example of a short sentence " 
    "a second short sentence a second short sentence"]);
documents
documents = 
  2x1 tokenizedDocument:

    12 tokens: an example of a short sentence an example of a short sentence
     8 tokens: a second short sentence a second short sentence
documents.Vocabulary
ans = 1x7 string array
    "an"    "example"    "of"    "a"    "short"    "sentence"    "second"
