
Neural Networks Algorithm for predicting the fourth word in a sentence

I have an assignment where we're required to analyze a learning algorithm for predicting the fourth word in a sentence. It is a 4-gram model, with the first three words given. The parameters we can change are d = the number of dimensions used to represent a word, and numHid = the number of hidden units in the hidden layer (here we're using a single hidden layer). So I trained the algorithm with a different d and a different numHid each time; the algorithm stops automatically when the validation error starts increasing. My questions are: What does the number of epochs represent? Is it better for the number of epochs to be minimal, provided that the learning rate is kept constant throughout training? Should I use the parameters that give me the minimum cross-entropy error?
Thanks

Answers (1)

Greg Heath on 29 Jan 2012
>I have an assignment where we're required to analyze a learning algorithm for predicting the fourth word in a sentence. It is a 4-gram model, with the first three words given. The parameters we can change are d = the number of dimensions used to represent a word,
How, exactly, are words represented? How many 4-word combinations do you have? Are there multiple combinations that have the same 4th word?
>and numHid = the number of hidden units in the hidden layer (here we're using a single hidden layer). So I trained the algorithm
What kind of algorithm? What is its name? Are you using the NN Toolbox?
> with a different d and a different numHid each time; the algorithm stops automatically when the validation error starts increasing. My question is: What does the number of epochs represent?
The interval between successive weight-update stages is an epoch; in batch training, that is one complete pass through the training set.
> Is it better for the number of epochs to be minimal, provided that the learning rate is kept constant throughout training?
Regardless of learning rate, the ultimate goal is to minimize the performance error on nondesign data. Speed is of secondary importance.
>Should I use the parameters that give me the minimum cross-entropy error?
Use the parameters that optimize YOUR measure of performance. From my point of view you have a classification problem and should try to minimize the rate of failure to choose the correct 4th word. However, classification error rate is not continuous. Therefore, it is much better to use a continuous objective function like mean-square-error or cross-entropy.
In the words of Confucius: "Try both, choose best."
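For instance, here is a minimal MATLAB sketch (my own illustration, not from the assignment code) of the difference between the two measures: classification error only changes when the argmax changes, while cross-entropy responds to every change in the predicted probabilities.
% Illustrative only: compare per-case cross-entropy with classification
% error rate for a batch of softmax outputs.
nCases = 5; nWords = 250;
probs = rand(nCases, nWords);                    % stand-in predictions
probs = bsxfun(@rdivide, probs, sum(probs, 2));  % rows sum to 1
targets = randi(nWords, nCases, 1);              % correct 4th-word indices
idx = sub2ind(size(probs), (1:nCases)', targets);
crossEntropyPerCase = -mean(log(probs(idx)))     % continuous objective
[~, predicted] = max(probs, [], 2);
errorRate = mean(predicted ~= targets)           % discontinuous measure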
Hope this helps.
Greg
  2 Comments
Mohamed Temraz on 29 Jan 2012
Thanks Greg. Here is a detailed explanation of the assignment.
In this assignment, you will run code that trains a simple neural language model on a dataset of sentences that were culled from a large corpus of newspaper articles so that the culled sentences would have a highly restricted vocabulary of only 250 words.
The model you will train on these data produces a distribution over the next word given the previous three words as input. Since the neural network will be trained on 4-grams extracted from isolated sentences, this means that it will never be asked to predict any of the first three words in a sentence. The neural network learns a d-dimensional embedding for the words in the vocabulary and has a hidden layer with numHid hidden units fully connected to the single output softmax unit. If we are so inclined, we can view the embedding as another (earlier) hidden layer with weights shared across the three words.
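To make that architecture concrete, here is a rough sketch of the forward pass for one 4-gram (variable names and the logistic hidden activation are my assumptions; the actual train script may differ):
% Forward pass for one case: three context words -> softmax over vocab.
d = 8; numHid = 64; vocabSize = 250;
wordReps = 0.1*randn(vocabSize, d);      % one d-dimensional row per word
repToHid = 0.1*randn(3*d, numHid);
hidToOut = 0.1*randn(numHid, vocabSize);
hidBias  = zeros(1, numHid);
outBias  = zeros(1, vocabSize);
w1 = 17; w2 = 42; w3 = 5;                % indices of the three context words
embed  = [wordReps(w1,:), wordReps(w2,:), wordReps(w3,:)];  % shared embedding
hid    = 1 ./ (1 + exp(-(embed*repToHid + hidBias)));       % logistic (assumed)
logits = hid*hidToOut + outBias;
p      = exp(logits - max(logits));      % softmax over the 250 words
p      = p / sum(p);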
After you have loaded the data, you can set d and numHid like so:
>> d = 8; numHid = 64;
and then run the main training script,
>> train;
which will train the neural network using the embedding dimensionality and number of hidden units specified.
The training script monitors the cross entropy error on the validation data and uses that information to decide when to stop training. Training stops as soon as the validation error increases, and the final weights are set to be the weights from immediately before this increase. This procedure is a form of "early stopping" and is a common method for avoiding overfitting in neural network training.
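The early-stopping rule just described amounts to something like the following sketch (the script's internals aren't shown in the thread, so initWeights, trainOneEpoch, and validationCE are hypothetical stand-ins):
% Stop as soon as validation CE rises; keep the weights from just before.
weights = initWeights();                    % hypothetical initializer
bestValidCE = Inf;
epoch = 0;
while true
    weights = trainOneEpoch(weights, trainData);  % one pass over training data
    validCE = validationCE(weights, validData);
    if validCE > bestValidCE
        break;                              % validation error went up: stop
    end
    bestValidCE = validCE;
    bestWeights = weights;                  % weights from before the uptick
    epoch = epoch + 1;
end
epochsBeforeVErrUptick = epoch;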
Here is a list of the variables of interest that the train script puts in the workspace:
wordRepsFinal - the learned word embedding
repToHidFinal - the learned embedding-to-hidden unit weights
hidToOutFinal - the learned hidden-to-output unit weights
hidBiasFinal - the learned hidden biases
outBiasFinal - the learned output biases
epochsBeforeVErrUptick - the number of epochs of training before validation error increased; in other words, the number of training epochs used to produce the final weights above
finalTrainCEPerCase - the per-case cross entropy error on the training set of the weights with the best validation error
finalValidCEPerCase - the per-case cross entropy error on the validation set of the final weights (this will always be the best validation error the training script has seen)
finalTestCEPerCase - the per-case cross entropy error on the test set of the final weights
You must train the model four times, trying all possible combinations of d=8, d=32 and numHid=64, numHid=256. You must record the final cross entropy error on the training, validation, and test sets (stored in the appropriate variables mentioned above) for each of these runs. You must also record the number of epochs before a validation error increase (stored in epochsBeforeVErrUptick) for each of the runs.
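One way to keep the bookkeeping straight is a small driver loop over the four configurations; this assumes, as described above, that train reads d and numHid from the workspace and leaves the final* variables behind:
% Run all four (d, numHid) combinations and tabulate the results.
results = [];
for dVal = [8 32]
    for numHidVal = [64 256]
        d = dVal; numHid = numHidVal;
        train;
        results = [results; d, numHid, epochsBeforeVErrUptick, ...
                   finalTrainCEPerCase, finalValidCEPerCase, finalTestCEPerCase];
    end
end
disp(results)   % columns: d, numHid, epochs, train CE, valid CE, test CE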
Select the best configuration that you ran. The function wordDistance has been provided for you so that you can compute the distance between the learned representations of two words. The wordDistance function takes two strings, the wordReps matrix that you learned (use wordRepsFinal unless you have a good reason not to), and the vocabulary. For example, if you wanted to compute the distance between the words "and" and "but" you would do the following (after training the model, of course):
>> wordDistance('and', 'but', wordRepsFinal, vocab)
The wordDistance function simply takes the feature vector corresponding to each word and computes the L2 norm of the difference vector. Because of this, you can only meaningfully compare the relative distances between two pairs of words and discover things like "the word 'and' is closer to the word 'but' than it is to the word 'or' in the learned embedding." If you are especially enterprising, you can compare the distance between two words to the average distance to each of those words. Remember that if you want to enter a string that contains the single quote character in MATLAB you must escape it with another single quote. So the string apostrophe-s, which is in the vocabulary, would have to be entered as
>> '''s'
in MATLAB.
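Based on that description, wordDistance presumably does something like this sketch (named wordDistanceSketch here to make clear it is a guess at the provided function; it assumes vocab is a cell array of strings and wordReps stores one word per row):
function dist = wordDistanceSketch(word1, word2, wordReps, vocab)
% Look up each word's learned feature vector and return the L2 norm
% of the difference vector, as the assignment describes.
i1 = find(strcmp(word1, vocab));
i2 = find(strcmp(word2, vocab));
dist = norm(wordReps(i1,:) - wordReps(i2,:));   % Euclidean (L2) distance
end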
Compute the distances between a few words and look for patterns. See if you can discover a few interesting things about the learned word embedding by looking at the distances between various pairs of words. What words would you expect to be close together? Are they? Think about what factors contribute to words being given nearby feature vectors. You can access the vocabulary of words from the 'vocab' variable and the raw sentences from the file rawSentences.txt.gz.
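For example, one might probe the embedding along these lines (the word choices are guesses at the 250-word vocabulary, and this again assumes wordRepsFinal stores one row per word):
>> wordDistance('and', 'but', wordRepsFinal, vocab)  % similar function words: close?
>> wordDistance('and', 'or', wordRepsFinal, vocab)
>> % average distance from 'and' to every word, for context:
>> andVec = wordRepsFinal(strcmp('and', vocab), :);
>> diffs = bsxfun(@minus, wordRepsFinal, andVec);
>> avgDistFromAnd = mean(sqrt(sum(diffs.^2, 2)))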
Greg Heath on 20 Feb 2015
I don't remember this 3-year-old post. However, it sure would have been useful to see a few inputs and the corresponding targets.

