The following link has a sample code for classifying a sequence of images. The networks is built to classify a sequence of 28 by 28 grayscale images. It was not clear fot me the shape of the input data. How does the network "understand" that the sequence in this case consists of one image? What is the shape of the matrix or object holding all images? is it [28 28 1 1 1000] for [height, width, channels, time, number of sequences]?