Run MEX Functions Containing CUDA Code

If running MATLAB® functions on the GPU does not sufficiently speed up your code, or if you need to use advanced GPU CUDA® features, you can write your own CUDA code and run it in MATLAB by generating an executable MEX file using mexcuda.

Write MEX File Containing CUDA Code

All MEX files, including those containing CUDA code, have a single entry point known as mexFunction. The MEX function contains the host-side code that interacts with gpuArray objects from MATLAB and launches the CUDA code. The CUDA code in the MEX file must conform to the CUDA runtime API.

You must call the mxInitGPU function at the start of your mexFunction, before using any mxGPUArray functions, to ensure that the GPU device is properly initialized and known to MATLAB.

To access gpuArray data, a MEX file uses the mxGPUArray API (functions such as mxGPUCreateFromMxArray and mxGPUGetDataReadOnly), which is different from the MEX interface for standard MATLAB arrays.

You can see an example of a MEX file containing CUDA code here:

matlabroot/toolbox/parallel/gpu/extern/src/mex/mexGPUExample.cu

The file contains this CUDA kernel (a __global__ function):

void __global__ TimesTwo(double const * const A,
                         double * const B,
                         int const N)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < N)
        B[i] = 2.0 * A[i];
}
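To see how the index calculation in TimesTwo maps threads to array elements, this C++ sketch runs the kernel body on the host, one simulated thread at a time. It is an illustration only, not MATLAB or CUDA API code: LaunchIndex, timesTwoThread, and emulateTimesTwo are hypothetical names, and LaunchIndex stands in for the CUDA built-ins blockDim.x, blockIdx.x, and threadIdx.x.

```cpp
#include <cassert>

// Hypothetical host-side stand-in for CUDA's built-in index variables.
// Only the x dimension is modeled, matching the 1-D launch of TimesTwo.
struct LaunchIndex {
    int blockDimX;   // plays the role of blockDim.x
    int blockIdxX;   // plays the role of blockIdx.x
    int threadIdxX;  // plays the role of threadIdx.x
};

// Same body as the TimesTwo kernel, but the index values are passed in
// explicitly instead of being read from CUDA built-ins.
void timesTwoThread(const LaunchIndex& idx,
                    double const * const A,
                    double * const B,
                    int const N)
{
    int i = idx.blockDimX * idx.blockIdxX + idx.threadIdxX;
    if (i < N)             // threads past the end of the array do nothing
        B[i] = 2.0 * A[i];
}

// Run every simulated thread of a 1-D grid one after another. On a GPU
// these run in parallel; the result is the same because each thread
// writes a different element of B.
void emulateTimesTwo(int blocksPerGrid, int threadsPerBlock,
                     double const * A, double * B, int N)
{
    assert(blocksPerGrid > 0 && threadsPerBlock > 0);
    for (int b = 0; b < blocksPerGrid; ++b)
        for (int t = 0; t < threadsPerBlock; ++t)
            timesTwoThread(LaunchIndex{threadsPerBlock, b, t}, A, B, N);
}
```

For example, 3 blocks of 4 threads over a 10-element array create 12 threads; the if (i < N) guard keeps the last two from writing past the end of B. With only 2 blocks of 4 threads, the last two elements of B are never written, which is why the grid size must cover every element.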

The file also contains these lines, which determine the array size and launch a grid of the proper size:

N = (int)(mxGPUGetNumberOfElements(A));
blocksPerGrid = (N + threadsPerBlock - 1) / threadsPerBlock;
TimesTwo<<<blocksPerGrid, threadsPerBlock>>>(d_A, d_B, N);
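The blocksPerGrid expression is integer ceiling division: it rounds N / threadsPerBlock up so that the grid contains at least N threads. This host-side C++ sketch isolates that calculation; blocksForElements and idleThreads are hypothetical helper names, and the block size of 256 used in the usage note is an assumed value chosen for illustration.

```cpp
#include <cassert>

// Smallest number of blocks such that blocks * threadsPerBlock >= N.
// Adding (threadsPerBlock - 1) before the integer division rounds the
// quotient up instead of truncating it. Assumes N + threadsPerBlock - 1
// does not overflow int.
int blocksForElements(int N, int threadsPerBlock)
{
    assert(N >= 0 && threadsPerBlock > 0);
    return (N + threadsPerBlock - 1) / threadsPerBlock;
}

// Threads in the launched grid that fail the kernel's (i < N) test.
int idleThreads(int N, int threadsPerBlock)
{
    return blocksForElements(N, threadsPerBlock) * threadsPerBlock - N;
}
```

With an assumed block size of 256, a 4-by-4 input (16 elements) needs one block, and 240 of its threads do nothing. A zero-element input yields zero blocks; CUDA rejects a launch with a zero-sized grid, so a MEX function may need to handle empty inputs before launching.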

Compile GPU MEX File

Use the mexcuda function in MATLAB to compile a MEX file containing CUDA code. By default, the mexcuda function compiles the CUDA code using the NVIDIA® CUDA compiler (nvcc) installed with MATLAB. The software forwards further compilation steps to a C++ host compiler installed on your system. To check which compilers mexcuda is using, call mexcuda with the -v flag for verbose output.

mexcuda mexGPUExample.cu

If you have installed the CUDA toolkit in a nondefault location, you can specify the location of nvcc on your system using the environment variable MW_NVCC_PATH. To set this variable, use the setenv command.

setenv("MW_NVCC_PATH","/usr/local/CUDA/bin")

Supported Host Compilers

To compile a MEX file using the mexcuda function, you must have a supported C++ host compiler installed. On Windows, mexcuda supports only a subset of Visual Studio® compilers. To determine whether your compiler is supported, follow these steps:

1. Determine which version of CUDA your version of MATLAB uses by consulting the table in Install CUDA Toolkit (Optional).
2. Consult the NVIDIA CUDA Toolkit Documentation corresponding to the CUDA version determined in step 1. The installation guide section of the documentation lists the supported compilers.

Run Resulting MEX Functions

The MEX function in this example multiplies every element in the input array by 2 to get the values in the output array. To test the function, start with a gpuArray matrix in which every element is 1:

x = ones(4,4,"gpuArray");
y = mexGPUExample(x)

y = 

    2    2    2    2
    2    2    2    2
    2    2    2    2
    2    2    2    2

The input and output arrays are gpuArray objects.

Compare to a CUDA Kernel

Parallel Computing Toolbox™ software also supports CUDAKernel objects, which you can use to integrate CUDA code with MATLAB. You can create CUDAKernel objects using CU and PTX files. Generally, using MEX files is more flexible than using CUDAKernel objects because:

MEX files can include calls to host-side libraries, including NVIDIA libraries such as the NVIDIA Performance Primitives (NPP) or cuFFT libraries. For more information, see Call Host-Side Libraries. MEX files can also contain calls from the host to functions in the CUDA runtime library.
MEX files can analyze the size of the input and allocate memory of a different size, or launch grids of a different size, from C or C++ code. In contrast, MATLAB code that calls CUDAKernel objects must preallocate output memory and determine the grid size.

Call Host-Side Libraries

To allow your MEX function to call a host-side library, such as NPP or cuFFT:

Link to the library using the -L and -l options when you compile the MEX function. Specify the folder containing the library with the -L option and the library name with the -l option. For example, this command compiles the source file mexFFTGPUExample.cu and links to the cuFFT library. The path to the library might be different on your machine.
```
mexcuda '-LC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\lib\x64' -lcufft mexFFTGPUExample.cu
```
At the top of your CUDA C++ source file, include the relevant header file. For example, this code includes the contents of cufft.h into the source file.
```
#include <cufft.h>
```

Access Complex Data

Complex data on a GPU device is stored in interleaved complex format. That is, for a complex gpuArray A, the real and imaginary parts of each element are stored in consecutive addresses. MATLAB uses CUDA built-in vector types to store complex data on the device (double2 for double-precision data and float2 for single-precision data). For more information, see the NVIDIA CUDA C++ Programming Guide.
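The interleaved layout can be illustrated on the host with C++ std::complex<double>, which the C++ standard guarantees to be laid out like an array of two doubles (real part, then imaginary part), the same order as the x and y members of double2. This sketch is host code only; interleavedPart is a hypothetical helper, not part of the mxGPUArray API.

```cpp
#include <cassert>
#include <complex>

// Returns the real part (part == 0) or imaginary part (part == 1) of
// complex element k by reading the array as plain doubles. Element k
// occupies raw[2k] (real) and raw[2k + 1] (imaginary).
double interleavedPart(const std::complex<double>* data, int k, int part)
{
    assert(k >= 0 && (part == 0 || part == 1));
    const double* raw = reinterpret_cast<const double*>(data);
    return raw[2 * k + part];
}
```

Reading the buffer as doubles therefore alternates real, imaginary, real, imaginary, which is the same sequence a device kernel sees through a double pointer.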

Depending on the needs of your kernel, you can cast the pointer to complex data as the real type or as the built-in vector type. For example, in MATLAB, suppose you create this matrix:

a = complex(ones(4,"gpuArray"),ones(4,"gpuArray"));

If you pass a gpuArray to a MEX function as the first argument prhs[0], then you can get a pointer to the complex data by using these calls:

mxGPUArray const * A = mxGPUCreateFromMxArray(prhs[0]);
mwSize numel_complex = mxGPUGetNumberOfElements(A);
double2 const * d_A = (double2 const *)(mxGPUGetDataReadOnly(A));

To treat the array as a real, double-precision array of twice the length, use these calls:

mxGPUArray const * A = mxGPUCreateFromMxArray(prhs[0]);
mwSize numel_real = 2*mxGPUGetNumberOfElements(A);
double const * d_A = (double const *)(mxGPUGetDataReadOnly(A));
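The choice between the two views depends on whether a kernel needs to know which value is the real part and which is the imaginary part. This host-side C++ sketch emulates one kernel of each kind, with std::complex<double> standing in for double2; all function names are hypothetical. Conjugation runs one simulated thread per complex element, while scaling by a real constant treats both parts identically and can run one thread per double, giving twice as many threads over a plain real array.

```cpp
#include <cassert>
#include <complex>

// Complex view: one simulated thread per complex element. Conjugation must
// know which value is the imaginary part, so it uses the paired view.
void conjugateElement(const std::complex<double>* in,
                      std::complex<double>* out, int i, int numel_complex)
{
    if (i < numel_complex)
        out[i] = std::conj(in[i]);
}

// Real view: one simulated thread per double. Scaling by a real constant
// treats real and imaginary parts identically.
void scaleValue(const double* in, double* out, int i, int numel_real,
                double s)
{
    if (i < numel_real)
        out[i] = s * in[i];
}

// Emulated launch over the complex view: numel_complex threads.
void conjugateAll(const std::complex<double>* in,
                  std::complex<double>* out, int numel_complex)
{
    assert(numel_complex >= 0);
    for (int i = 0; i < numel_complex; ++i)
        conjugateElement(in, out, i, numel_complex);
}

// Emulated launch over the real view of the same data: 2 * numel_complex
// threads, each touching one double.
void scaleAll(const std::complex<double>* in, std::complex<double>* out,
              int numel_complex, double s)
{
    assert(numel_complex >= 0);
    const double* inReal = reinterpret_cast<const double*>(in);
    double* outReal = reinterpret_cast<double*>(out);
    int numel_real = 2 * numel_complex;
    for (int i = 0; i < numel_real; ++i)
        scaleValue(inReal, outReal, i, numel_real, s);
}
```

Operations that need both parts of an element together (conjugation, magnitude, complex multiplication) fit the paired view; operations applied independently to every stored value fit the real view.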

You can convert data between complex and real formats on the GPU using these Parallel Computing Toolbox functions. These operations require a copy to interleave the data.

The mxGPUCreateComplexGPUArray function creates a complex mxGPUArray from two real mxGPUArray objects that specify the real and imaginary components.
The mxGPUCopyReal and mxGPUCopyImag functions copy the real or the imaginary elements, respectively, of an mxGPUArray to a single real mxGPUArray.
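The copy is needed because separate real and imaginary arrays and interleaved complex storage put the same values at different addresses. This host-side C++ sketch mimics that data movement using std::vector rather than mxGPUArray objects; interleave, copyReal, and copyImag are hypothetical names, not the toolbox functions themselves.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Combine separate real and imaginary arrays into interleaved complex
// storage. Every value is copied to a new address.
std::vector<std::complex<double>> interleave(const std::vector<double>& re,
                                             const std::vector<double>& im)
{
    assert(re.size() == im.size());
    std::vector<std::complex<double>> out(re.size());
    for (std::size_t k = 0; k < re.size(); ++k)
        out[k] = std::complex<double>(re[k], im[k]);
    return out;
}

// Copy the real parts (the doubles at even offsets) into a new real array.
std::vector<double> copyReal(const std::vector<std::complex<double>>& z)
{
    std::vector<double> out(z.size());
    for (std::size_t k = 0; k < z.size(); ++k)
        out[k] = z[k].real();
    return out;
}

// Copy the imaginary parts (the doubles at odd offsets) into a new array.
std::vector<double> copyImag(const std::vector<std::complex<double>>& z)
{
    std::vector<double> out(z.size());
    for (std::size_t k = 0; k < z.size(); ++k)
        out[k] = z[k].imag();
    return out;
}
```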

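The mxGetImagData function has no equivalent for mxGPUArray objects: because the real and imaginary parts are interleaved, there is no separate imaginary-part buffer to return. To obtain the imaginary parts as a real array, use mxGPUCopyImag.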

Install CUDA Toolkit (Optional)

The CUDA Toolkit installed with MATLAB does not contain all libraries that are available in the CUDA Toolkit. If you want to use a specific library that is not installed with MATLAB, install the CUDA Toolkit.

Note

You do not need the CUDA Toolkit to run MATLAB functions on a GPU or to generate CUDA-enabled MEX functions.

The CUDA Toolkit contains CUDA libraries and tools for compilation.

Download the appropriate CUDA Toolkit version for the version of MATLAB you are using. Check which version of the toolkit is compatible with your version of MATLAB using this table. As a best practice, use the latest release of your supported CUDA Toolkit version, including any updates and patches from NVIDIA.

MATLAB Release	CUDA Toolkit Version
R2026a	12.8
R2025b	12.2
R2025a	12.2
R2024b	12.2
R2024a	12.2
R2023b	11.8
R2023a	11.8
R2022b	11.2
R2022a	11.2
R2021b	11.0
R2021a	11.0
R2020b	10.2
R2020a	10.1
R2019b	10.1
R2019a	10.0
R2018b	9.1
R2018a	9.0
R2017b	8.0
R2017a	8.0
R2016b	7.5
R2016a	7.5
R2015b	7.0
R2015a	6.5
R2014b	6.0
R2014a	5.5
R2013b	5.0
R2013a	5.0
R2012b	4.2
R2012a	4.0
R2011b	4.0

For more information about the CUDA Toolkit and to download your supported version, see CUDA Toolkit Archive (NVIDIA).