erasePunctuation
Erase punctuation from text and documents
Syntax
Description
newDocuments = erasePunctuation(documents) erases punctuation and symbols from documents. If a word is
empty after removing punctuation and symbol characters, then the function removes
it. For tokenized document input, the function erases punctuation only from tokens with
type 'punctuation' and 'other'. For example,
the function does not erase punctuation and symbol characters from URLs and email
addresses, because those tokens have the types 'web-address' and 'email-address'.
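The following is a minimal sketch of the basic syntax for both string and tokenized document input. It assumes that Text Analytics Toolbox is installed; the sample text is made up for illustration.

```matlab
% String input: punctuation and symbol characters are removed everywhere.
str = "it's one and/or two.";
newStr = erasePunctuation(str);
disp(newStr)

% Tokenized input: only tokens of type 'punctuation' and 'other' are affected,
% and tokens that become empty are removed from the document.
documents = tokenizedDocument([ ...
    "An example of a short sentence."
    "A second short sentence."]);
newDocuments = erasePunctuation(documents);
disp(newDocuments)
```

Comparing the token counts of documents and newDocuments should show that the sentence-ending periods are no longer present as tokens.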
newDocuments = erasePunctuation(documents,'TokenTypes',types) erases punctuation and symbols from only the specified token types.
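As a hedged illustration of the 'TokenTypes' option, the sketch below restricts erasure to tokens of type 'punctuation'. The sample sentence and email address are invented, and tokenDetails is used only to inspect the token types that the tokenizer assigned.

```matlab
% Tokenize text that mixes ordinary punctuation with an email address and a URL.
documents = tokenizedDocument( ...
    "Email a.b@example.com (or visit https://www.mathworks.com)!");

% Inspect the detected token types before erasing anything.
tdetails = tokenDetails(documents);
disp(tdetails(:, ["Token" "Type"]))

% Erase punctuation only from tokens whose type is 'punctuation'.
newDocuments = erasePunctuation(documents, 'TokenTypes', 'punctuation');
disp(newDocuments)
```

Passing a narrower list of types than the default is one way to keep symbol characters in tokens of type 'other' while still dropping ordinary punctuation.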
Examples
Input Arguments
Output Arguments
More About
Tips
For string input, erasePunctuation removes punctuation characters from URLs and HTML tags. This behavior can prevent the functions eraseTags, eraseURLs, and decodeHTMLEntities from working as expected. If you want to use these functions to preprocess your text, then use them before using erasePunctuation.
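The ordering described in this tip can be sketched as follows. This is one possible preprocessing sequence under the assumption that the input is a string containing HTML markup; the sample text is made up.

```matlab
% Raw text containing an HTML tag, a URL, and an HTML entity.
str = "<p>Visit https://www.mathworks.com &amp; read more!</p>";

% Run the markup-aware functions first, while the punctuation they rely on
% (angle brackets, slashes, ampersands, semicolons) is still intact.
str = eraseTags(str);
str = eraseURLs(str);
str = decodeHTMLEntities(str);

% Erase the remaining punctuation and symbols last.
str = erasePunctuation(str);
disp(str)
```

If erasePunctuation ran first, the tag and URL delimiters would already be gone, so the other three functions would have nothing to match.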