context
Search documents for word or n-gram occurrences in context
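Syntax
T = context(documents,word)
T = context(documents,ngram)
T = context(___,contextLength)
T = context(___,Name,Value)
Description
T = context(documents,word) searches documents for occurrences of word and returns a table of the contexts.
T = context(documents,ngram) searches documents for occurrences of the n-gram ngram and returns a table of the contexts.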
T = context(___,contextLength) specifies the length of the context to return using any of the previous syntaxes.
T = context(___,Name,Value) specifies additional options using one or more name-value pair arguments using any of the previous syntaxes.
Examples
Search Documents for Word Occurrences
Load the example data. The file sonnetsPreprocessed.txt contains preprocessed versions of Shakespeare's sonnets. The file contains one sonnet per line, with words separated by a space. Extract the text from sonnetsPreprocessed.txt, split the text into documents at newline characters, and then tokenize the documents.
filename = "sonnetsPreprocessed.txt";
str = extractFileText(filename);
textData = split(str,newline);
documents = tokenizedDocument(textData);
Search for the word "life".
tbl = context(documents,"life");
head(tbl)
Context Document Word
________________________________________________________ ________ ____
"consumst thy self single life ah thou issueless shalt " 9 10
"ainted counterfeit lines life life repair times pencil" 16 35
"d counterfeit lines life life repair times pencil pupi" 16 36
" heaven knows tomb hides life shows half parts write b" 17 14
"he eyes long lives gives life thee " 18 69
"tender embassy love thee life made four two alone sink" 45 23
"ves beauty though lovers life beauty shall black lines" 63 50
"s shorn away live second life second head ere beautys " 68 27
View the occurrences in a string array.
tbl.Context
ans = 23x1 string
"consumst thy self single life ah thou issueless shalt "
"ainted counterfeit lines life life repair times pencil"
"d counterfeit lines life life repair times pencil pupi"
" heaven knows tomb hides life shows half parts write b"
"he eyes long lives gives life thee "
"tender embassy love thee life made four two alone sink"
"ves beauty though lovers life beauty shall black lines"
"s shorn away live second life second head ere beautys "
"e rehearse let love even life decay lest wise world lo"
"st bail shall carry away life hath line interest memor"
"art thou hast lost dregs life prey worms body dead cow"
" thoughts food life sweetseasond showers gro"
"tten name hence immortal life shall though once gone w"
" beauty mute others give life bring tomb lives life fa"
"ve life bring tomb lives life fair eyes poets praise d"
" steal thyself away term life thou art assured mine li"
"fe thou art assured mine life longer thy love stay dep"
" fear worst wrongs least life hath end better state be"
"anst vex inconstant mind life thy revolt doth lie o ha"
" fame faster time wastes life thou preventst scythe cr"
"ess harmful deeds better life provide public means pub"
"ate hate away threw savd life saying "
" many nymphs vowd chaste life keep came tripping maide"
Search Documents for N-Gram Occurrences
Load the example data. The file sonnetsPreprocessed.txt contains preprocessed versions of Shakespeare's sonnets. The file contains one sonnet per line, with words separated by a space. Extract the text from sonnetsPreprocessed.txt, split the text into documents at newline characters, and then tokenize the documents.
filename = "sonnetsPreprocessed.txt";
str = extractFileText(filename);
textData = split(str,newline);
documents = tokenizedDocument(textData);
Search for the bigram "dost thou".
ngram = ["dost" "thou"];
tbl = context(documents,ngram);
head(tbl)
Context Document Word
_____________________________________________________________ ________ ________
"unthrifty loveliness why dost thou spend upon thy self thy " 4 4 5
"ee beauteous niggard why dost thou abuse bounteous largess " 4 25 26
"ve profitless usurer why dost thou great sum sums yet canst" 4 35 36
"eavy eyelids weary night dost thou desire slumbers broken s" 61 10 11
" sweet lovely dost thou make shame like canker f" 95 3 4
"hy budding name o sweets dost thou thy sins enclose tongue " 95 19 20
"ruth beauty love depends dost thou therein dignified make a" 101 16 17
" thou blind fool love dost thou mine eyes behold know be" 137 5 6
View the occurrences in a string array.
tbl.Context
ans = 10x1 string
"unthrifty loveliness why dost thou spend upon thy self thy "
"ee beauteous niggard why dost thou abuse bounteous largess "
"ve profitless usurer why dost thou great sum sums yet canst"
"eavy eyelids weary night dost thou desire slumbers broken s"
" sweet lovely dost thou make shame like canker f"
"hy budding name o sweets dost thou thy sins enclose tongue "
"ruth beauty love depends dost thou therein dignified make a"
" thou blind fool love dost thou mine eyes behold know be"
"h rebel powers array why dost thou pine suffer dearth paint"
"y large cost short lease dost thou upon thy fading mansion "
Specify Context Length
Load the example data. The file sonnetsPreprocessed.txt contains preprocessed versions of Shakespeare's sonnets. The file contains one sonnet per line, with words separated by a space. Extract the text from sonnetsPreprocessed.txt, split the text into documents at newline characters, and then tokenize the documents.
filename = "sonnetsPreprocessed.txt";
str = extractFileText(filename);
textData = split(str,newline);
documents = tokenizedDocument(textData);
Search for the word "life" and return each occurrence with a 15-character context before and after.
tbl = context(documents,"life",15);
head(tbl)
Context Document Word
____________________________________ ________ ____
"hy self single life ah thou issuel" 9 10
"nterfeit lines life life repair ti" 16 35
"eit lines life life repair times p" 16 36
"ows tomb hides life shows half par" 17 14
"ng lives gives life thee " 18 69
"assy love thee life made four two " 45 23
" though lovers life beauty shall b" 63 50
"ay live second life second head er" 68 27
View the occurrences in a string array.
tbl.Context
ans = 23x1 string
"hy self single life ah thou issuel"
"nterfeit lines life life repair ti"
"eit lines life life repair times p"
"ows tomb hides life shows half par"
"ng lives gives life thee "
"assy love thee life made four two "
" though lovers life beauty shall b"
"ay live second life second head er"
" let love even life decay lest wis"
"all carry away life hath line inte"
"ast lost dregs life prey worms bod"
" thoughts food life sweetseasond s"
"hence immortal life shall though o"
"te others give life bring tomb liv"
"ing tomb lives life fair eyes poet"
"self away term life thou art assur"
"t assured mine life longer thy lov"
"t wrongs least life hath end bette"
"nconstant mind life thy revolt dot"
"er time wastes life thou preventst"
"l deeds better life provide public"
"way threw savd life saying "
"hs vowd chaste life keep came trip"
Specify Source Text
Specify source text to display context.
Load the sonnets.txt data and split it into separate documents.
txt = extractFileText("sonnets.txt");
paragraphs = split(txt,[newline newline]);
Extract the sonnets from paragraphs. The first sonnet is the fifth element of paragraphs, and the remaining sonnets appear in every second element afterward.
sonnets = paragraphs(5:2:end);
documents = tokenizedDocument(sonnets);
Normalize the text, then search for the word "life".
documentsNormalized = normalizeWords(documents);
T = context(documentsNormalized,"life")
T=23×3 table
Context Document Word
________________________________________________________ ________ ____
"sum'st thy self in singl life ? ah ! if thou issueless" 9 18
" : so should the line of life that life repair , which" 16 73
"ld the line of life that life repair , which thi , tim" 16 75
"s a tomb which hide your life , and show not half your" 17 34
" live thi , and thi give life to thee . " 18 128
"ssi of love to thee , my life , be made of four , with" 45 53
"eauti , though my lover' life : hi beauti shall in the" 63 100
" awai , to live a second life on second head ; er beau" 68 59
"t your love even with my life decai ; lest the wise wo" 71 118
"shall carri me awai , my life hath in thi line some in" 74 18
"ast but lost the dreg of life , the prei of worm , my " 74 83
"to my thought as food to life , or as sweet-season'd s" 75 10
"ur name from henc immort life shall have , though i , " 81 42
" , when other would give life , and bring a tomb . the" 83 108
"a tomb . there live more life in on of your fair ey th" 83 118
"yself awai , for term of life thou art assur mine ; an" 92 13
⋮
Because the words are normalized, the contexts may not be easy to read. To view the contexts using the original text data, specify the source text using the 'Source' option.
T = context(documentsNormalized,"life",'Source',sonnets)
T=23×3 table
Context Document Word
________________________________________________________ ________ ____
"um'st thy self in single life? Ah! if thou issueless s" 9 18
": So should the lines of life that life repair, Which " 16 73
"d the lines of life that life repair, Which this, Time" 16 75
" a tomb Which hides your life, and shows not half your" 17 34
"ves this, and this gives life to thee. " 18 128
"assy of love to thee, My life, being made of four, wit" 45 53
"eauty, though my lover's life: His beauty shall in the" 63 100
"n away, To live a second life on second head; Ere beau" 68 59
"t your love even with my life decay; Lest the wise wor" 71 118
" shall carry me away, My life hath in this line some i" 74 18
"st but lost the dregs of life, The prey of worms, my b" 74 83
"o my thoughts as food to life, Or as sweet-season'd sh" 75 10
"name from hence immortal life shall have, Though I, on" 81 42
", When others would give life, and bring a tomb. There" 83 108
"a tomb. There lives more life in one of your fair eyes" 83 118
"hyself away, For term of life thou art assured mine; A" 92 13
⋮
Input Arguments
documents — Input documents
tokenizedDocument array
Input documents, specified as a tokenizedDocument array.
word — Word to find
string scalar | character vector | scalar cell array
Word to find in context, specified as a string scalar, character vector, or scalar cell array containing a character vector.
Data Types: char | string | cell
ngram — N-gram to find
string array | cell array of character vectors
N-gram to find in context, specified as a string array or cell array of character vectors.
ngram has size 1-by-N, where N is the number of words in the n-gram. The value of ngram(j) is the jth word of the n-gram.
The function ignores trailing empty strings in ngram.
Data Types: string | cell
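The handling of trailing empty strings can be illustrated with a minimal sketch. The one-line corpus below is made up for illustration and is not part of the shipped example data. Based on the description above, the padded and unpadded n-grams are expected to locate the same occurrences.

```matlab
% Hypothetical single-document corpus (illustrative only).
documents = tokenizedDocument("why dost thou pine and why dost thou weep");

% A 1-by-2 bigram and the same bigram padded with a trailing empty string.
ngram = ["dost" "thou"];
ngramPadded = ["dost" "thou" ""];

T1 = context(documents,ngram);
T2 = context(documents,ngramPadded);

% Compare the word indices found by the two searches.
tf = isequal(T1.Word,T2.Word);
disp(tf)
```

For an n-gram search, each row of the Word column holds one index per word of the n-gram, as in the bigram example earlier on this page.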
contextLength — Context length
25 (default) | positive integer
Context length, specified as a positive integer.
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: 'IgnoreCase',true specifies to ignore case when searching for the word or n-gram.
Source — Source text
string array | cell array of character vectors
Source text, specified as the comma-separated pair consisting of 'Source' and a string array or a cell array of character vectors. If the input documents are preprocessed, and you have the source text, then you can use this option to make the output more readable.
The source text must be the same size as documents.
IgnoreCase — Option to ignore case
false (default) | true
Option to ignore case, specified as the comma-separated pair consisting of 'IgnoreCase' and one of the following:
false – search for occurrences that match the word or n-gram exactly.
true – search for occurrences that match the word or n-gram ignoring case.
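The effect of the 'IgnoreCase' option can be sketched on a small made-up corpus. The three strings below are hypothetical and not taken from the sonnets files. The sketch assumes that tokenizedDocument preserves letter case when tokenizing, so the three documents contain the distinct tokens "Life", "life", and "LIFE".

```matlab
% Hypothetical three-document corpus with mixed letter case.
str = ["Life is short"; "a long life"; "LIFE goes on"];
documents = tokenizedDocument(str);

% Default behavior ('IgnoreCase',false): match the token "life" exactly.
Texact = context(documents,"life");

% Case-insensitive search: match "Life", "life", and "LIFE".
Tignore = context(documents,"life",'IgnoreCase',true);

% Show which documents contain a match in each case.
disp(Texact.Document)
disp(Tignore.Document)
```

The quoted, comma-separated form of the name-value pair is used here so that the sketch also applies to releases before R2021a.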
Output Arguments
T — Table of contexts
table
Table of contexts with these columns:
Context | String containing the queried word or n-gram in context
Document | Numeric index of the document containing the word or n-gram
Word | Numeric indices of the word or n-gram in the document
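As a rough sketch of how the output columns can be used together, the following counts occurrences per document from the Document column and looks up a matched token through the Document and Word indices. The corpus and the variable names (counts, firstHit, matched) are hypothetical and chosen for illustration only.

```matlab
% Hypothetical corpus (illustrative only).
str = ["to be or not to be"; "be kind"; "no match here"];
documents = tokenizedDocument(str);

T = context(documents,"be");

% Count occurrences per document using the Document column.
% Documents without a match get a count of zero.
counts = accumarray(double(T.Document),1,[numel(documents) 1]);

% Recover the matched token for the first row of T by indexing
% into the words of the corresponding document.
firstHit = T(1,:);
words = string(documents(firstHit.Document));
matched = words(firstHit.Word);

disp(counts)
disp(matched)
```

Indexing with a single Word value works here because the query is a single word; for an n-gram query each row of Word contains several indices.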
Version History
Introduced in R2017b