gpuDevice Crashing Matlab

I'm trying to get up and running with GPU computing through the Parallel Computing Toolbox, but I can't get the toolbox to work. When I run "gpuDevice", "gpuDeviceCount", or "gpuArray", MATLAB crashes immediately, leaving only a "6573 Floating point exception" error in my shell window (the number changes every time). The crash leaves behind a "matlab_crash_dump" file, but the file is empty. Has anyone had this problem before and been able to work out the cause?
I'm on a Linux machine with a Quadro 4000 GPU and NVIDIA's 295.20 drivers. I've had this problem since I got the toolbox a few months ago, but at the time I assumed it was because I was using an old, unsupported set of drivers. Those have now been updated, but I still get the same problem.
Thanks

Answers (2)

Jason Ross on 13 Apr 2012

What distro? What version of MATLAB? 64 or 32 bit?
If you run "nvidia-smi --query", do you get usable output? How does the device show up in the nvidia-settings application?
Is the Quadro being used for display and compute, or is it compute only?
FWIW when I've seen odd problems like this, the cause has come down to a defective card. Typical setup is to install the driver and start MATLAB, then it works.
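For anyone collecting the details asked for above, here is a minimal shell sketch that gathers them in one go. The helper name report is made up for illustration, and /etc/os-release is an assumption that may not hold on older distros (the script just notes the failure and moves on). The only NVIDIA command used is the nvidia-smi --query call already mentioned.

```shell
#!/bin/sh
# Collect basic system and GPU facts. Each step is skipped with a note
# if the tool is not installed, so the script is safe to run anywhere.
# "report" is a hypothetical helper name, not part of any toolkit.

report() {
    # $1 = label, remaining args = command to run
    label=$1
    shift
    if command -v "$1" >/dev/null 2>&1; then
        echo "== $label =="
        "$@" 2>&1 || echo "($1 exited with status $?)"
    else
        echo "== $label == ($1 not found)"
    fi
}

report "Kernel and architecture" uname -srm
# Assumption: the distro ships /etc/os-release; older ones may not.
report "Distribution" cat /etc/os-release
report "NVIDIA driver / GPU state" nvidia-smi --query
```

Redirecting the output to a file makes it easy to paste the whole thing into a reply.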

4 Comments

Greg on 13 Apr 2012
Running R2012a now, though I had the same issue before in R2011b (before I got the GPU driver updated). 64-bit Linux. The Quadro is being used for display and compute purposes, but I haven't attempted to run any other compute applications on it.
(I am running two monitors in "Twinview" which has caused Matlab issues before when trying to perform certain graphically-complex tasks on the secondary monitor. That shouldn't play into this, should it?)
The nvidia-smi --query returns:
==============NVSMI LOG==============
Timestamp                       : Fri Apr 13 11:49:32 2012
Driver Version                  : 295.20
Attached GPUs                   : 1
GPU 0000:03:00.0
    Product Name                : Quadro 4000
    Display Mode                : Enabled
    Persistence Mode            : Disabled
    Driver Model
        Current                 : N/A
        Pending                 : N/A
    Serial Number               : 0320211009948
    GPU UUID                    : GPU-66059132-1610-05c0-618b-ad9e5cd80320
    VBIOS Version               : 70.00.2F.00.12
    Inforom Version
        OEM Object              : 1.0
        ECC Object              : N/A
        Power Management Object : N/A
    PCI
        Bus                     : 0x03
        Device                  : 0x00
        Domain                  : 0x0000
        Device Id               : 0x06DD10DE
        Bus Id                  : 0000:03:00.0
        Sub System Id           : 0x078010DE
        GPU Link Info
            PCIe Generation
                Max             : 2
                Current         : 1
            Link Width
                Max             : 16x
                Current         : 16x
    Fan Speed                   : 36 %
    Performance State           : P12
    Memory Usage
        Total                   : 2047 MB
        Used                    : 189 MB
        Free                    : 1857 MB
    Compute Mode                : Default
    Utilization
        Gpu                     : 0 %
        Memory                  : 14 %
    Ecc Mode
        Current                 : N/A
        Pending                 : N/A
    ECC Errors
        Volatile
            Single Bit
                Device Memory   : N/A
                Register File   : N/A
                L1 Cache        : N/A
                L2 Cache        : N/A
                Total           : N/A
            Double Bit
                Device Memory   : N/A
                Register File   : N/A
                L1 Cache        : N/A
                L2 Cache        : N/A
                Total           : N/A
        Aggregate
            Single Bit
                Device Memory   : N/A
                Register File   : N/A
                L1 Cache        : N/A
                L2 Cache        : N/A
                Total           : N/A
            Double Bit
                Device Memory   : N/A
                Register File   : N/A
                L1 Cache        : N/A
                L2 Cache        : N/A
                Total           : N/A
    Temperature
        Gpu                     : 58 C
    Power Readings
        Power Management        : N/A
        Power Draw              : N/A
        Power Limit             : N/A
    Clocks
        Graphics                : 50 MHz
        SM                      : 101 MHz
        Memory                  : 135 MHz
    Max Clocks
        Graphics                : 475 MHz
        SM                      : 950 MHz
        Memory                  : 1404 MHz
    Compute Processes           : None
Could you try running it with a single monitor only?
Greg on 13 Apr 2012
Tried that and got the same crash.
Huh. I'm rapidly running short on ideas.
Do you by chance have the CUDA toolkit / SDK installed? There is an example in there called deviceQuery. I'm wondering if it would give you a response, or crash?
Also, do you have the capability to put this card in another machine and/or use it in Windows? It would be interesting to see if the crash would follow it.
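To tell "gave a response" from "crashed" without relying on what the terminal happens to print, a small wrapper along these lines may help. It is only a sketch: run_and_report is a made-up name, and the ./deviceQuery path in the comment is an assumption about where the SDK example was built. In bash and dash, a process killed by a signal shows up as an exit status of 128 plus the signal number, so a floating point exception (SIGFPE, signal 8 on Linux) appears as 136. The changing number printed in front of "Floating point exception" is most likely just the process ID of the program that died.

```shell
#!/bin/sh
# Run a command and say whether it exited normally or was (probably)
# killed by a signal. "run_and_report" is a hypothetical helper name.

run_and_report() {
    "$@"
    status=$?
    if [ "$status" -gt 128 ]; then
        # 128 + N conventionally means "killed by signal N", although a
        # program can also choose to exit with such a status itself.
        sig=$((status - 128))
        echo "$1: killed by signal $sig (SIG$(kill -l "$sig"))"
    elif [ "$status" -ne 0 ]; then
        echo "$1: exited with error status $status"
    else
        echo "$1: exited normally"
    fi
    return "$status"
}

# Example (the path to the SDK's deviceQuery binary is an assumption):
# run_and_report ./deviceQuery
```

If deviceQuery dies with the same signal as MATLAB does, that points at the driver, CUDA runtime, or card rather than at the toolbox.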


Yair Carmon on 13 Aug 2015

I had a similar issue on a remote server running Ubuntu 12.04, MATLAB R2015a, CUDA 7.0, and a GeForce GTX 960. During a routine run of my application, the nvidia-smi utility (which I had running under watch nvidia-smi to monitor GPU utilization) suddenly printed "Error" instead of values such as temperature and available memory. A complete system crash followed immediately, and the machine had to be power cycled before it started responding to ping again.
When the system came back online I had the problems reported above: any attempt to run nvidia-smi or gpuDevice/gpuArray would result in a crash. It was not a problem with the card: we swapped GPUs and the issue persisted. Uninstalling and reinstalling the CUDA toolkit using apt-get did not help either. The problem was finally resolved by reinstalling the entire OS, MATLAB, and CUDA 7.0, in that order. I suspect that using the CUDA 7.0 .run installer might have solved the problem without having to reinstall the OS. I hope to never have a chance to check that :).
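One drawback of watch nvidia-smi is that whatever it showed last is gone once the machine goes down. A rough sketch of an alternative is to append timestamped snapshots to a file instead. The function name log_snapshots, the log file name, and the interval in the example are all made up for illustration, and this is not something I ran on the machine described above.

```shell
#!/bin/sh
# Append timestamped snapshots of a command's output to a log file.
# "log_snapshots" is a hypothetical helper name.

log_snapshots() {
    # $1 = log file, $2 = number of snapshots, $3 = seconds between them,
    # remaining args = command to run
    logfile=$1
    count=$2
    interval=$3
    shift 3
    i=0
    while [ "$i" -lt "$count" ]; do
        {
            echo "--- $(date '+%Y-%m-%d %H:%M:%S') ---"
            "$@" 2>&1 || echo "(command exited with status $?)"
        } >>"$logfile"
        sync   # ask the OS to flush to disk before the next snapshot
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}

# Example (file name and timing are assumptions): one hour at 5 s steps
# log_snapshots gpu.log 720 5 nvidia-smi
```

After a hard crash, the tail of the log should show the last snapshot written before the machine stopped responding.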

Asked: 13 Apr 2012
Answered: 13 Aug 2015
