Text Analytics Toolbox

Analyze and model text data

Text Analytics Toolbox provides algorithms and visualizations for preprocessing, analyzing, and modeling text data. Models created with the toolbox can be used in applications such as sentiment analysis, predictive maintenance, and topic modeling.

Text Analytics Toolbox includes tools for processing raw text from sources such as equipment logs, news feeds, surveys, operator reports, and social media. You can extract text from popular file formats, preprocess raw text, extract individual words, convert text into numerical representations, and build statistical models.

Using machine learning techniques such as latent semantic analysis (LSA), latent Dirichlet allocation (LDA), and word embeddings, you can find clusters and create features from high-dimensional text data sets. Features created with Text Analytics Toolbox can be combined with features from other data sources to build machine learning models that take advantage of textual, numeric, and other types of data.

MATLAB code that extracts text data from Microsoft Word documents into a datastore.

Import and Visualize Text

Import text data into MATLAB from single files or large collections of files, including PDF, HTML, and Microsoft® Word files. Visually explore text data sets using word clouds and text scatter plots.
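As a minimal sketch of the import step, the following MATLAB code writes a small sample text file, reads it back with extractFileText, and plots a word cloud. The file name and the sample sentences are made up for illustration. In practice you would point extractFileText at your own PDF, HTML, Word, or text file.

```matlab
% Create a small sample file so the example is self-contained.
% The file name and its contents are hypothetical.
filename = fullfile(tempdir, "sampleReport.txt");
fid = fopen(filename, "w");
fprintf(fid, "%s\n", "Pump 3 is leaking coolant near the intake valve.");
fprintf(fid, "%s\n", "Operator reports loud rattling from the mixer motor.");
fclose(fid);

% Read the file contents into a string.
str = extractFileText(filename);

% Tokenize the text and display word frequencies as a word cloud.
documents = tokenizedDocument(str);
figure
wordcloud(documents);
```

For a large collection of files, one option is fileDatastore(location, "ReadFcn", @extractFileText), where location stands for the folder that holds your files.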

Extract Text Data from PDF, HTML, Microsoft Word, Microsoft Excel, and CSV Files

Screenshot of the Preprocess Text Data Live Editor task with results displayed as a word cloud.

Clean and Preprocess Text

Apply high-level filtering functions to remove extraneous content, such as URLs, HTML tags, and punctuation. Correct spelling, filter stop words, and normalize words to root form.
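The cleanup steps described above can be sketched as a short pipeline. The sample sentences are invented, and the order of the steps is one reasonable choice rather than the only one. Part-of-speech details are added before lemmatization because they help normalizeWords choose the right root form.

```matlab
% Hypothetical raw text containing a URL and HTML tags.
str = [
    "Visit https://example.com for the <b>latest</b> pump manuals!"
    "The mixers were rattling loudly, and the motors overheated."];

% Remove extraneous content from the raw strings.
str = eraseURLs(str);   % remove web addresses
str = eraseTags(str);   % remove HTML and XML tags

% Split the text into tokens.
documents = tokenizedDocument(str);

% Optional: correctSpelling(documents) could be applied at this point.

% Filter stop words and normalize the remaining words to root form.
documents = addPartOfSpeechDetails(documents);
documents = removeStopWords(documents);
documents = normalizeWords(documents, "Style", "lemma");

% Remove punctuation and convert to lowercase.
documents = erasePunctuation(documents);
documents = lower(documents);
```

Displaying documents in the Command Window shows the remaining tokens for each input string.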

Clean and Preprocess Text Data in Live Editor

MATLAB code for creating a scatter plot and the created word embedding t-SNE plot.

Convert Text to Structured Format

Extract linguistic features by tokenizing text, calculate word frequency statistics to represent text data numerically, and train word2vec-style word embedding models using the skip-gram or continuous bag-of-words (CBOW) architecture.
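To illustrate how tokenized text becomes a numeric representation, here is a small sketch that builds a bag-of-words model and a tf-idf matrix. The three short documents are invented for the example.

```matlab
% Hypothetical, already-cleaned documents.
documents = tokenizedDocument([
    "pump leaking coolant"
    "mixer motor rattling"
    "pump motor overheating"]);

% Count how often each vocabulary word occurs in each document.
bag = bagOfWords(documents);
counts = full(bag.Counts);    % rows are documents, columns are words
vocabulary = bag.Vocabulary;  % words corresponding to the columns

% Weight the counts by term frequency-inverse document frequency.
M = full(tfidf(bag));
```

Either counts or M can serve as a feature matrix for a machine learning model, optionally concatenated with numeric features from other sources.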

Explore and Visualize Word Embeddings

Workflow for performing transfer learning with FinBERT transformer model on text data to identify positive and negative attitudes.

Apply AI to Text Analytics

Fit a machine learning or deep learning model, such as LSA, LDA, or LSTM, to text data. Leverage transformer models, such as BERT, FinBERT, and GPT-2, to perform transfer learning with text data.
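As one example of fitting a model to text, the sketch below fits an LDA topic model to a handful of invented documents. The corpus and the choice of two topics are assumptions for illustration. A real data set would be far larger, and the number of topics would need tuning.

```matlab
% Hypothetical, already-cleaned documents.
documents = tokenizedDocument([
    "pump leaking coolant valve"
    "coolant leak near pump intake"
    "mixer motor rattling noise"
    "motor noise from mixer bearing"
    "pump valve leaking again"
    "mixer bearing replaced motor quiet"]);
bag = bagOfWords(documents);

% Fit an LDA model with an assumed number of topics.
numTopics = 2;
mdl = fitlda(bag, numTopics, "Verbose", 0);

% List the three highest-probability words of the first topic.
topWords = topkwords(mdl, 3, 1);

% Estimate the topic mixture of each document (rows sum to 1).
topicMixtures = transform(mdl, documents);
```

The rows of topicMixtures can be used as low-dimensional features or to group documents by dominant topic.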

Train BERT Document Classifier

Large Language Models

Connect MATLAB to the OpenAI™ Chat Completions API. Leverage the natural language processing capabilities of GPT models within your MATLAB environment, for tasks such as text summarization and chatting.

Large Language Models (LLMs) with MATLAB

Illustration of cleaning text data for natural language processing. On the left: word cloud of raw data. On the right: word cloud of cleaned data.

Text Analytics for Engineers

Develop predictive maintenance schedules based on sensor and text log data. Automate requirement formalization and compliance checking.

Information Retrieval with Work Orders Data

Use text analytics to summarize multiple documents into one document.

Document Analysis

Analyze text with topic modeling to discover and visualize underlying patterns, trends, and complex relationships. Summarize documents, extract keywords, and evaluate document importance and similarity.
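The summarization and similarity tasks mentioned above can be sketched with extractSummary and cosineSimilarity. The four report sentences are invented, and the summary size of two documents is an arbitrary choice for the example.

```matlab
% Hypothetical maintenance reports, one document per report.
documents = tokenizedDocument([
    "The pump is leaking coolant near the intake valve."
    "Coolant is leaking from the pump again."
    "The mixer motor makes a loud rattling noise."
    "A scheduled inspection found no issues with the conveyor."]);

% Select the two documents judged most representative of the collection.
% The second output holds an importance score for every document.
[summary, scores] = extractSummary(documents, "SummarySize", 2);

% Compute pairwise document similarity based on tf-idf vectors.
similarities = full(cosineSimilarity(documents));
```

Entry (i,j) of similarities compares document i with document j, so large off-diagonal values point to reports that describe related issues.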

Classify Text Data Using Convolutional Neural Network

Word clouds separated into positive and negative words.

Sentiment Analysis

Identify the attitudes and opinions expressed in text data to categorize statements as being positive, neutral, or negative. Build models that can predict sentiment in real time.
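A rule-based starting point for sentiment scoring is the VADER algorithm, sketched below with vaderSentimentScores. The sample statements are invented, and the cutoffs of 0.05 and -0.05 for labeling are a commonly used convention assumed here, not something this page specifies. Punctuation and capitalization are left in place because VADER uses them as intensity cues.

```matlab
% Hypothetical statements to score.
str = [
    "The new pump works great and runs quietly."
    "The mixer broke again. Terrible reliability."
    "The technician arrived at noon."];
documents = tokenizedDocument(str);

% Compound sentiment score per document, in the range [-1, 1].
compoundScores = vaderSentimentScores(documents);

% Map scores to labels using assumed thresholds.
labels = repmat("neutral", size(compoundScores));
labels(compoundScores >= 0.05) = "positive";
labels(compoundScores <= -0.05) = "negative";
```

For domain-specific text, a model trained on labeled examples may categorize statements more reliably than a general-purpose lexicon.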

Sentiment Analysis in MATLAB

Word cloud of generated text from the novel Pride and Prejudice.

Text Generation and Classification

Use deep learning to generate new text based on observed text, and use word embeddings to classify text descriptions into categories.

Generate Text Using Autoencoders

Text Analytics Toolbox FAQs

Text Analytics Toolbox provides algorithms and visualizations for preprocessing, analyzing, and modeling text data in MATLAB.

You can extract text from PDF, HTML, Microsoft Word, Microsoft Excel, and CSV files.

The toolbox offers high-level filtering to remove URLs, HTML tags, and punctuation, as well as spelling correction, stop word filtering, and word normalization to root form.

The toolbox includes LSA, LDA, and word2vec-style word embeddings trained with the skip-gram or CBOW architecture. It also supports deep learning models such as LSTM and transformer models such as BERT, FinBERT, and GPT-2.

Features created with Text Analytics Toolbox can be combined with features from numeric, audio, and other data sources to build comprehensive machine learning models.

Common applications include sentiment analysis, predictive maintenance, topic modeling, document classification, text summarization, and keyword extraction.

The toolbox provides word clouds to display word frequency and text scatter plots to explore relationships between words and visualize word embeddings.

You can connect MATLAB to the OpenAI Chat Completions API to leverage GPT models for tasks such as text summarization and chatting.