Create Word Cloud from String Arrays

This example shows how to create a word cloud from plain text by reading it into a string array, preprocessing it, and passing it to the wordcloud function. If you have Text Analytics Toolbox™ installed, then you can create word clouds directly from string arrays. For more information, see wordcloud (Text Analytics Toolbox).

Read the text from Shakespeare's Sonnets with the fileread function.

sonnets = fileread('sonnets.txt');
sonnets(1:135)
ans = 
    'THE SONNETS
     
     by William Shakespeare
     
     
     
     
       I
     
       From fairest creatures we desire increase,
       That thereby beauty's rose might never die,'

Convert the text to a string using the string function. Then, split it on newline characters using the splitlines function.

sonnets = string(sonnets);
sonnets = splitlines(sonnets);
sonnets(10:14)
ans = 5x1 string
    "  From fairest creatures we desire increase,"
    "  That thereby beauty's rose might never die,"
    "  But as the riper should by time decease,"
    "  His tender heir might bear his memory:"
    "  But thou, contracted to thine own bright eyes,"

Replace common punctuation characters with spaces. Apostrophes are not in the list, so words such as "beauty's" stay intact.

p = ["." "?" "!" "," ";" ":"];
sonnets = replace(sonnets,p," ");
sonnets(10:14)
ans = 5x1 string
    "  From fairest creatures we desire increase "
    "  That thereby beauty's rose might never die "
    "  But as the riper should by time decease "
    "  His tender heir might bear his memory "
    "  But thou  contracted to thine own bright eyes "

Split sonnets into a string array whose elements contain individual words. To do this, join all the string elements into a 1-by-1 string and then split it at whitespace characters, which is the default behavior of split when you do not specify a delimiter.

sonnets = join(sonnets);
sonnets = split(sonnets);
sonnets(7:12)
ans = 6x1 string
    "From"
    "fairest"
    "creatures"
    "we"
    "desire"
    "increase"

Remove words with fewer than five characters.

sonnets(strlength(sonnets)<5) = [];
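
The deletion above relies on logical indexing. The following is a minimal sketch on a small hypothetical word list (the variable names w and tooShort are illustrative and are not part of the original example):

```matlab
% Hypothetical word list, used only to illustrate the indexing
w = ["thou"; "beauty"; "eyes"; "sweet"; "love"];

% Logical mask: true where a word has fewer than five characters
tooShort = strlength(w) < 5;

% Assigning [] to the masked elements deletes them from the array
w(tooShort) = [];
disp(w)
```

Only "beauty" and "sweet" remain, because every other word in this list has four characters.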

Convert sonnets to a categorical array and then plot using wordcloud. The function plots the unique elements of C with sizes corresponding to their frequency counts.

C = categorical(sonnets);
figure
wordcloud(C);
title("Sonnets Word Cloud")
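
To inspect the frequency counts that determine the word sizes, you can tabulate the categories yourself. The following is a minimal sketch and is not part of the original example. It uses a small hypothetical word list in place of sonnets so that it runs on its own, and the names words, names, counts, and T are illustrative.

```matlab
% Hypothetical word list standing in for the preprocessed sonnets array
words = ["beauty"; "sweet"; "beauty"; "should"; "sweet"; "beauty"];
C = categorical(words);

% categories returns the unique words; countcats returns how often each occurs
names = categories(C);
counts = countcats(C);

% Sort by descending frequency to list the words that would appear largest
T = table(string(names), counts, 'VariableNames', {'Word', 'Count'});
T = sortrows(T, 'Count', 'descend');
disp(T)
```

Applied to the real data, the same steps should list the most frequent words first. Note that categorical is case sensitive, so "From" and "from" are counted as different words unless you normalize the case beforehand.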
