Vector estimate of importance of each word in a text
What is a simple, widely used, practical way to estimate the 'importance' of each word in a book-length text? I need to produce a vector of estimated importance with one entry per word occurrence, so the output has the same length as the number of words in the text, counting each occurrence separately.
For example, I would like to be able to do something like this:
Imp = wordImportance(longString);
where Imp is a two-column table listing every word in longString in order, with one row per occurrence (repeated words and stopwords included), alongside its importance in context within the text, likely based on n-grams.
I have access to the Text Analytics Toolbox, but I don't know enough about how to use it.
There appears to be an entire sub-field of text analysis devoted to this problem. I am aware that there are many approaches and that selecting one depends on the details of what you are trying to achieve. That said, I need a basic importance estimate for each word to get started.
The examples for rakeKeywords and textrankKeywords seem relevant, but they don't ultimately produce a vector of the estimated importance of each word in a document.
Thank you for your help.
Constantino Carlos Reyes-Aldasoro on 2 Sep 2022
Why don't you start with a simpler problem, such as counting the occurrences of certain words? That is easy to validate manually and will help you get started. You can then increase the complexity step by step towards importance: for example, define importance as similarity to a certain word, or as membership in a certain group (colours, fruits, etc.), so that a word gets a value when it is a member of the group.
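As a rough sketch of that first step, the snippet below counts word occurrences using base MATLAB only and then maps the counts back onto every occurrence, which gives a table with one row per word in reading order. The sample string, the crude `\w+` tokenizer, and the negative-log-frequency score are all assumptions made for illustration. The score is only a placeholder (rarer words score higher) and not an established importance measure, so it would be swapped for whatever definition suits the task.

```matlab
% Hypothetical sample text; replace with the book-length string.
longString = "The cat sat on the mat. The dog sat on the log.";

% Crude tokenizer (assumption): lowercase, then take runs of word characters.
% This splits contractions such as "don't" into two tokens.
words = string(regexp(lower(char(longString)), '\w+', 'match'))';  % N-by-1, one entry per occurrence

% Count how often each distinct word occurs.
[vocab, ~, idx] = unique(words);   % idx maps each occurrence to its vocabulary entry
counts = accumarray(idx, 1);       % counts(k) is the number of occurrences of vocab(k)

% Placeholder importance (assumption): negative log of relative frequency.
importance = -log(counts(idx) / numel(words));

% One row per occurrence, in the original word order.
Imp = table(words, importance, 'VariableNames', {'Word', 'Importance'});
```

Because the counts are easy to check by hand on a short string, this is a convenient baseline to validate before replacing the score with something context-aware.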