Why do stop words still appear in the topics generated by LDA topic modeling, although I removed them with the stop word removal function?


Jack on 25 Sep 2021


Direct link to this question

https://se.mathworks.com/matlabcentral/answers/1460434-why-in-topic-modeling-by-lda-stop-words-still-exist-within-the-generated-topics-although-i-remove

Edited: Jack on 25 Sep 2021

Hello and good day to you.

I am doing topic modeling with Latent Dirichlet Allocation (LDA), which requires preprocessing (cleaning) the data first. I therefore applied the following preprocessing steps, in order:

1- Tokenize the text using tokenizedDocument.

2- Add part-of-speech details using addPartOfSpeechDetails.

3- Lemmatize the words using normalizeWords.

4- Erase punctuation using erasePunctuation.

5- Remove a list of stop words (such as "and", "of", and "the") using removeStopWords.

6- Remove words with 2 or fewer characters using removeShortWords.

7- Remove words with 15 or more characters using removeLongWords.
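The seven steps above can be sketched in MATLAB roughly as follows. This is a minimal sketch, assuming the Text Analytics Toolbox is installed. The sample strings in textData are made up for illustration, and the 'Style','lemma' option is an assumption about how the lemmatization in step 3 was requested.

```matlab
% Hypothetical sample text; replace with the real documents.
textData = [
    "The cats are sitting on the mat and watching the birds."
    "A dog was chasing the cats around the garden of the house."];

documents = tokenizedDocument(textData);               % 1- tokenize
documents = addPartOfSpeechDetails(documents);         % 2- add part-of-speech details
documents = normalizeWords(documents,'Style','lemma'); % 3- lemmatize
documents = erasePunctuation(documents);               % 4- erase punctuation
documents = removeStopWords(documents);                % 5- remove stop words
documents = removeShortWords(documents,2);             % 6- drop words with 2 or fewer characters
documents = removeLongWords(documents,15);             % 7- drop words with 15 or more characters

% Any word from the default stop word list that survived step 5 would be listed here.
leftover = intersect(lower(documents.Vocabulary), stopWords);
disp(leftover)
```

Inspecting documents.Vocabulary at this point, before fitting the model, is one way to see which words actually reach LDA.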

However, when the LDA model generates topics (a topic in LDA being a collection of probably related words), one topic contains stop words, even though they were removed from the data in step 5 and so should not be present in the data modeled by LDA. Why are these stop words still there, appearing as one of the resulting topics, although they do not even exist in the Vocabulary of the model?

Please help!
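One way to narrow this down might be to compare the top words of each fitted topic, and the model's Vocabulary, against the list returned by stopWords. The sketch below is only an illustration under stated assumptions: the sample text, the number of topics, and the shortened preprocessing (tokenizing plus removeStopWords only) are hypothetical stand-ins for the real data and pipeline. Note that removeStopWords only removes words on its stop word list, so a word that is common in the corpus but not on that list would not be removed.

```matlab
% Hypothetical sample corpus; replace with the real preprocessed documents.
textData = [
    "The engine failed because the coolant pump was leaking."
    "Operators replaced the pump and restarted the engine."
    "The software update fixed the sensor calibration error."
    "A calibration error in the sensor caused the alarm."];

documents = tokenizedDocument(textData);
documents = erasePunctuation(documents);
documents = removeStopWords(documents);

bag = bagOfWords(documents);
bag = removeEmptyDocuments(bag);

numTopics = 2;                              % assumed value for illustration
mdl = fitlda(bag,numTopics,'Verbose',0);

sw = stopWords;                             % default stop word list
k = min(10, numel(mdl.Vocabulary));         % do not ask for more words than exist

for t = 1:numTopics
    tbl = topkwords(mdl,k,t);               % table with variables Word and Score
    hits = tbl.Word(ismember(lower(tbl.Word), sw));
    fprintf("Topic %d: %d stop word(s) among the top %d words\n", t, numel(hits), k);
    disp(hits)
end

% Stop words that are present in the model vocabulary, if any.
inVocab = intersect(lower(mdl.Vocabulary), sw);
disp(inVocab)
```

If inVocab is empty but a topic still looks like a list of stop words, the words in question are probably not on the default list, and in that case passing them to removeWords before building the bag-of-words model may be worth trying.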


Answers (0)
