I will collect the most useful comments into an answer that can help solve this problem of NaN accuracy:
Because I have experienced some issues with PNG images, I highly recommend using the JPG/JPEG format. Sometimes, because of the extra layers a PNG image can contain (possibly a transparency channel or a color palette), only the last layer is read and the whole image takes on that layer's color, i.e., the entire image becomes black, red, or some other single color. When you send such images to the network, it sees only a single-color image that is unrelated to the rest of the images, so it cannot learn the features. Also be careful with the size of your filters. Johannes's answer might also be a solution in some cases.
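One way to guard against this is to normalize every image to a plain uint8 RGB array before training, and optionally re-save it as JPEG. The sketch below assumes the problem comes from indexed (palette) or greyscale PNGs; it does not handle transparency. The file names `demo_indexed.png` and `demo_converted.jpg` are hypothetical, and the tiny demo image is written first only so the snippet runs on its own. It uses standard MATLAB image functions (`imread`, `ind2rgb`, `im2uint8`, `repmat`, `imwrite`).

```matlab
% Demo input (hypothetical): a 2x2 indexed PNG with a black/red palette,
% written here only so this sketch is self-contained.
idx = uint8([0 1; 1 0]);            % palette indices (0-based for uint8)
pal = [0 0 0; 1 0 0];               % colormap rows: black, red
imwrite(idx, pal, 'demo_indexed.png');

% Read the image. For an indexed PNG, imread returns indices plus a colormap.
[img, map] = imread('demo_indexed.png');
if ~isempty(map)
    % Expand palette indices into true color; otherwise the network would
    % receive raw index values instead of the actual colors
    img = im2uint8(ind2rgb(img, map));
end
if size(img, 3) == 1
    % Greyscale: replicate into 3 channels to match an RGB input layer
    img = repmat(img, [1 1 3]);
end
img = im2uint8(img);                % e.g. 16-bit PNGs are rescaled to uint8

% Optionally save as JPEG, following the recommendation above
imwrite(img, 'demo_converted.jpg');
```

In practice you would run the same read-and-normalize steps over every file in your dataset folder before building the training set.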
Be careful with the size of your input image. When it is really big, as in Alexander's case, using only one convolution layer makes learning very difficult for the network, because it has only one set of weights to capture a very large number of features. I would recommend using at least 2 or 3 convolution layers for an input of that size, even for 128x128 images, and using pooling layers to reduce the size of the data that enters the fully connected layer, which helps it classify the extracted features.
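As a rough illustration of that advice, here is a sketch of a network for 128x128 RGB images with three convolution blocks, each followed by 2x2 max pooling. The filter sizes, filter counts and `numClasses` are hypothetical example values, not tuned settings. A convolution with filter size F, padding P and stride S maps an N-pixel side to (N - F + 2P)/S + 1 pixels, so with F = 5 and P = 2 (or F = 3 and P = 1) the size is preserved, and each pooling layer halves it: 128 to 64 to 32 to 16. The fully connected layer then receives 16x16x64 = 16,384 values instead of 128x128x3 = 49,152.

```matlab
numClasses = 10;   % hypothetical number of classes in your dataset

layers = [
    imageInputLayer([128 128 3])
    convolution2dLayer(5, 16, 'Padding', 2)   % 128x128x16
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)         % 64x64x16
    convolution2dLayer(5, 32, 'Padding', 2)   % 64x64x32
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)         % 32x32x32
    convolution2dLayer(3, 64, 'Padding', 1)   % 32x32x64
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)         % 16x16x64
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];
```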
To initialize the weights, you need to define the convolution layer before the layer array:
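F = 5; D = 32;   % example values only; choose your own
conv1 = convolution2dLayer(F,D,'Padding',0);   % add any further name-value options here
% Weights are F x F x 3 x D (3 = RGB input channels); gpuArray requires Parallel Computing Toolbox
conv1.Weights = gpuArray(single(randn([F F 3 D])*0.0001));
conv1.Bias = gpuArray(single(randn([1 1 D])*0.00001+1));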
You can initialize the weights and the bias as needed. Remember, D is the number of filters to use and F is the filter size. Then, use your variable in the layer array:
layers = [ ...
    imageInputLayer([128 128 3])
    conv1   % pre-initialized convolution layer from above; add your remaining layers after it
    ];
And that is all.
Hope it helps,