Is anyone able to run GPU-based conv2 on a GTX 1080 or other Pascal hardware?

I repeatedly get unspecified CUDA launch errors, despite the function calculating correctly on older Maxwell hardware using the same datasets. Can anyone else reproduce this?
Thanks, Nick

12 Comments

For example:
A = gpuArray.ones(100, 'single');
B = gpuArray.ones(50, 'single');
A(1:10,1:10) = conv2(A(1:10,1:10), B, 'same');
Error using gpuArray/gather
An unexpected error occurred during CUDA execution. The CUDA error was:
CUDA_ERROR_LAUNCH_FAILED
Error in dispInternal>iTransferPortionDense (line 36)
data = gather( subsref( x, s ) );
Error in parallel.internal.shared.buildDisplayHelper>iFirstNNumericDisplayHelper (line 72)
maybeTruncatedValue = transferDenseFcn( x, rangeStruct );
Error in parallel.internal.shared.buildDisplayHelper>iBuildDisplayHelper (line 33)
dh = iFirstNNumericDisplayHelper( ...
Error in parallel.internal.shared.buildDisplayHelper (line 24)
dh = iBuildDisplayHelper( x, transferDenseFcn, transferSparseFcn, xClassName, xName, N );
Error in dispInternal (line 13)
dh = parallel.internal.shared.buildDisplayHelper( ...
Error in gpuArray/display (line 21)
dh = dispInternal( obj, thisClassName, objName );
This works for me on the GTX 1080.
Your error appears to be in display, not in the convolution. Did you definitely enter the command with a semicolon at the end to suppress display? What happens when you call
Ac = gather(A);
? And what version of MATLAB are you using?
Hi Joss, thanks for your response.
I see this behaviour on R2015b, R2016a, and R2016b.
The error is not always reproducible. After a fresh MATLAB restart, it's often possible to run a few convolutions like the above without error. However, once I start running my program, it crashes.
I'm not sure it's the display that's causing it, per se, but rather that the kernel is only invoked once the data it calculates is requested. I can also generate the error by using the kernel result in another calculation.
I also want to point out that this is the same error I see in MATLAB when invoking a self-written kernel that has an indexing error. But if that were the case, I can't see why the same program would run fine on Maxwell and error on Pascal.
Anyway, I'll keep trying to find a code snippet that guarantees an error and post it here. In the meantime I've rewritten the conv2 as an FFT and worked around the issue :)
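For reference, the workaround is roughly the following (a sketch, not my exact code; conv2fft is just an illustrative name). It zero-pads both arrays to the full linear-convolution size, multiplies in the frequency domain, and crops the central 'same'-sized region; fft2/ifft2 accept gpuArray inputs directly, so the whole thing stays on the GPU:

```matlab
% Sketch of an FFT-based replacement for conv2(A, B, 'same').
% Works for real-valued A and B; fft2/ifft2 run on the GPU for gpuArray inputs.
function C = conv2fft(A, B)
    [ma, na] = size(A);
    [mb, nb] = size(B);
    m = ma + mb - 1;                  % full linear-convolution size;
    n = na + nb - 1;                  % padding to this size avoids circular wrap-around
    Cfull = real(ifft2(fft2(A, m, n) .* fft2(B, m, n)));
    r0 = floor(mb/2);                 % offset of the central 'same' region,
    c0 = floor(nb/2);                 % matching conv2's cropping convention
    C = Cfull(r0+1 : r0+ma, c0+1 : c0+na);
end
```

For single-precision inputs the result agrees with conv2 to within floating-point round-off.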
Are you using your GPU for display? Are these kernels long-running? It's possible Windows is timing-out your display. You might need to disable Timeout Detection and Recovery: https://msdn.microsoft.com/en-us/library/windows/hardware/ff570087(v=vs.85).aspx
I don't know what you mean by kernels only being invoked when the data is requested. This kind of lazy evaluation doesn't happen with conv. However, not all runtime errors can be picked up by MATLAB and may be reported as launch failures on the next line of GPU code.
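For what it's worth, TDR can be relaxed rather than disabled outright. A minimal sketch, assuming the standard WDDM registry location documented in that link (the value is in seconds, 0x3c = 60, and a reboot is required):

```
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers]
; TdrDelay: seconds a GPU kernel may run before Windows resets the driver
"TdrDelay"=dword:0000003c
```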
Apologies, you may be right about the execution for conv, I was thinking about how my other parallel.gpu.CUDAKernels behave. However, it's definitely not the WDDM timeout. This is set to more than 20s on my machine, and the Maxwell card takes << 1s to calculate these.
Here, this code snippet produces "CUDA_ERROR_ILLEGAL_ADDRESS" during gather every time, with a GTX 1080 selected as the current GPU. Configuration is:
R2016b (Student), Intel 6950, 32 GB RAM, 2x GTX 1080, 1x (Maxwell) TITAN X, Windows 10
A = gpuArray.ones(100, 'single');
B = gpuArray.ones(10, 'single');
for c = 1:100
    C = conv2(A, B, 'same')
    C(1)
end
I ran this a number of times on my GTX 1080 with no errors. It could be a Windows 10 display issue. A couple of questions:
  1. Are you definitely running on a compute card that isn't driving the display? You seem to be saying you're running on your Titan X, which, being lower-performing than the GTX 1080, I'm guessing is attached to the display? What is the output of gpuDevice?
  2. Does line 5 have to have no semicolon at the end? What if you put a semicolon at the end to suppress display? If it doesn't error any more, what happens if you put C = gather(C); after the convolution?
1) Yes. One 1080 runs the display, but it fails on both. gpuDevice returns normally for all cards.
2) It fails with or without suppressing the output.
Still, it's great to know that in principle there are platforms that don't have this issue. I am going to reinstall my graphics drivers and CUDA toolkit and then try to minimize the number of background processes and services to see if that helps.
Update: Re-installed CUDA Toolkit and Driver, shut down all non-essential background programs and still having problems.
The CUDA toolkit only affects your own MEX functions; it has nothing to do with MATLAB's own kernels or kernels you run using the CUDAKernel class. The driver it comes with could be an issue, though - you may fare better installing the latest driver for your device straight from NVIDIA's driver downloads page. I, for instance, am running 367.44 on my GTX 1080s (but that's under Linux; the version number will be different for Windows).
What I mean by the output of gpuDevice is that it would help to see what MATLAB displays when you call gpuDevice.
Can you confirm again that the above code is the first thing you run after starting MATLAB, and that you haven't run any of your own CUDA kernels?
Name: 'GeForce GTX 1080'
Index: 1
ComputeCapability: '6.1'
SupportsDouble: 1
DriverVersion: 8
ToolkitVersion: 7.5000
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 8.5899e+09
AvailableMemory: 7.0585e+09
MultiprocessorCount: 20
ClockRateKHz: 1733500
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1
Yes. Above code is first thing run.
Well, there is no obvious problem there, except that KernelExecutionTimeout is 1. But that is also true on my machine, and I have no issues.
I'm going to get someone with Windows and a GTX 1080 to test your code, and then I may have to move you over to tech support. Meanwhile you should try some different legacy drivers listed at http://www.nvidia.com/Download/Find.aspx?lang=en-us.
Your issue reproduces on a GTX 1080 on Windows. Thanks for reporting. It will take a little while to investigate this.
No problem; like I said above, the ifft(fft*fft) approach works and the performance hit isn't that severe for my application. I'm mostly glad I can stop pulling my hair out trying to figure out what I've misconfigured. Thanks for being so responsive.


Answers (0)


Asked: 29 Sep 2016
Commented: 6 Oct 2016
