This should not be that difficult. One trick might be to use ndgrid, and then a sort. For example, consider the function adunroll below.
[R,C] = ndgrid(1:nr,1:nc);
M = [reshape(R+C,,1),R(:)];
I won't test the time required, but the above code should be eminently reasonable in terms of time required, mainly becaue it needs only a few very simple function calls. The nice thing is, the code is what I would call elegant, clean and simple to follow. It also has no need to worry about the general shape of A. Square or rectangular in any form will still work. All of that makes the code good, since it will be easy to debug and maintain.
Look carefully at how I created the array M, as that is the key to making this work for any size of array. For example, look at the matrix (R+C). Does that give you a hint at what it was done?
The trick of building on ndgrid is a nice one to remember.
I was going to compare the time required, but I see that your code wants to take an nxmx3 array, but your question was specific to a nxm array. The trick here would be simple enough. Only compute the indices ONCE, then apply them to each panel of the 3 dimensional array.