As far as I know, batch normalization is used to speed up model training and should not change the network's results very much. However, training fails after I add batchNormalizationLayer layers, and strangely it works fine when I remove them. Both results are shown below:
With batchNormalizationLayer:

Without batchNormalizationLayer:
Why does adding batchNormalizationLayer cause training to fail?
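For reference, here is a minimal sketch of the placement I mean, using the standard conv → batch norm → ReLU ordering. The input size, filter counts, class count, and training options are illustrative placeholders, not my actual network:

```matlab
% Illustrative network: conv -> batchNormalizationLayer -> ReLU ordering.
layers = [
    imageInputLayer([28 28 1])                    % placeholder input size
    convolution2dLayer(3, 16, 'Padding', 'same')
    batchNormalizationLayer                       % the layer in question
    reluLayer
    fullyConnectedLayer(10)                       % placeholder class count
    softmaxLayer
    classificationLayer];

options = trainingOptions('adam', ...
    'MaxEpochs', 10, ...
    'MiniBatchSize', 128, ...    % BN statistics are computed per mini-batch
    'Plots', 'training-progress');

% net = trainNetwork(XTrain, YTrain, layers, options);
```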