# QR Decomposition on NVIDIA GPU Using cuSOLVER Libraries

This example shows how to create a standalone CUDA® executable that leverages the CUDA Solver library (cuSOLVER). The example uses a curve fitting application that mimics automatic lane tracking on a road to illustrate:

Fitting an arbitrary-order polynomial to noisy data by using matrix QR factorization.

Using the

`coder.LAPACKCallback`

class to provide the LAPACK library information for the code generator when generating standalone executables.

### Prerequisites

CUDA enabled NVIDIA® GPU.

NVIDIA CUDA toolkit and driver.

LAPACK library that is optimized for your execution environment. For more information, see LAPACK vendors implementations. This example uses the

`mwlapack`

libraries that MATLAB® provides in*matlabroot/extern/lib*.Environment variables for the compilers and libraries. For information on the supported versions of the compilers and libraries, see Third-Party Hardware. For setting up the environment variables, see Setting Up the Prerequisite Products.

### Verify GPU Environment

To verify that the compilers and libraries necessary for running this example are set up correctly, use the `coder.checkGpuInstall`

function.

```
envCfg = coder.gpuEnvConfig('host');
envCfg.BasicCodegen = 1;
envCfg.Quiet = 1;
coder.checkGpuInstall(envCfg);
```

### Solve a Linear System by Using Matrix Factorization

In curve fitting applications, the objective is to estimate the coefficients of a low-order polynomial. The polynomial is then used as a model for observed noisy data, which in this example represents the lane boundary of the road ahead of a vehicle. For example, when using a quadratic polynomial, there are three coefficients ($$a$$, $$b$$, and $$c$$) to estimate:

$$a{x}^{2}+bx+c$$

The polynomial that fits best is defined as the one that minimizes the sum of the squared errors between itself and the noisy data. To solve this least-squares problem, you get and solve an overdetermined linear system. An explicit matrix inverse is not required to solve the system.

In this example, the unknowns are the coefficients of each term in the polynomial. Because the polynomial you use as a model always starts from the current position on the road, the constant term in the polynomial is assumed to be zero. Estimate the coefficients for the linear and higher-order terms. Set up a matrix equation *Ax=y* such that:

$$y$$ contains the sensor outputs.

$$x$$ contains the polynomial coefficients that we need to obtain.

$$A$$ is a constant matrix related to the order of the polynomial and the locations of the sensors.

Solve the equation using the *QR* factorization of $$A$$:

$$Ax=QRx=y$$

and

$$x=pinv(A)*y={R}^{-1}{Q}^{T}*y$$

where *pinv()* represents pseudo-inverse. Given the matrix $$A$$, you can use the following code to implement a solution of this matrix equation. Factoring $$A$$ allows for an easier solution of the system.

[Q,R,P] = qr(A); z = Q' * y; x = R \ z; yhat = A * x;

Use the linsolveQR function to solve the equation using the *QR* factorization of $$A$$.

`type linsolveQR.m`

function [yhat,x] = linsolveQR(A,y) %#codegen % Copyright 2019 The MathWorks, Inc. [Q,R,P] = qr(A); z = Q' * y; x = R \ z; yhat = A * x; end

### Signal Model for the Road

To test the algorithm, a continuously curving road model is used, that is, a sinusoid that is contaminated with additive noise. By varying the frequency of the sinusoid in the model, you can stress the algorithm by different amounts. This code simulates noisy sensor outputs using our road model:

% Duration - Distance that we look ahead % N - Total number of sensors providing estimates of road boundary % Ts - Sample interval % FracPeriod - Fraction of period of sinusoid to match % y - Contains the simulated sensor outputs Duration = 2; N = 25; Ts = Duration / N; FracPeriod = 0.5; y = sin(2*pi* (0:N-1)' * (FracPeriod/N)) + sqrt(0.3) * randn(N,1);

Use this code to form the Vandermonde matrix $$A$$:

Npoly = 3; % Order of polynomial to use in fitting v = (0:Ts:((N-1)*Ts))'; A = zeros(length(v), Npoly); for i = Npoly : -1 : 1 A(:,i) = v.^i; end

The Vandermonde matrix $$A$$ and sensor outputs matrix $$y$$ are passed as input parameters to the `linsolveQR`

entry-point function. These inputs are written to comma-separated files and are read from the handwritten main qrmain.cu.

writematrix(reshape(A, 1, 75), 'inputA.csv'); writematrix(reshape(y, 1, 25), 'inputY.csv');

### Custom Callback Class for Standalone Code Generation

The `qr`

function is only partially supported in the `cuSOLVER`

library. In such cases, GPU Coder™ uses the `LAPACK`

library for certain linear algebra function calls. `LAPACK`

is an external software library for numeric linear algebra. For MEX targets, the code generator uses the `LAPACK`

library included in MATLAB.

For standalone targets, you must define a custom `coder.LAPACKCallback`

class that specifies the LAPACK libraries along with the header files to use for linear algebra calls in the generated code. In this example, the lapackCallback callback class specifies the paths to these libraries in `updateBuildInfo`

method. You must modify this file with the library names and paths for the custom LAPACK installation on your computer.

`type lapackCallback.m`

classdef lapackCallback < coder.LAPACKCallback % % Copyright 2019 The MathWorks, Inc. methods (Static) function hn = getHeaderFilename() hn = 'lapacke.h'; end function updateBuildInfo(buildInfo, buildctx) [~, libExt] = buildctx.getStdLibInfo(); % Specify path to LAPACK library if ispc lapackLocation = [matlabroot,'\extern']; libName = ['libmwlapack' libExt]; buildInfo.addIncludePaths([lapackLocation,'\include']); libPath = [lapackLocation,'\lib\win64\microsoft\']; else lapackLocation = [matlabroot]; libName = ['libmwlapack' libExt]; buildInfo.addIncludePaths([lapackLocation,'/extern/include']); libPath = [lapackLocation,'/bin/glnxa64']; end % Add include path and LAPACK library for linking buildInfo.addLinkObjects(libName, libPath, 1000, true, true); buildInfo.addDefines('HAVE_LAPACK_CONFIG_H'); buildInfo.addDefines('LAPACK_COMPLEX_STRUCTURE'); end end end

### Standalone Code Generation

Generate a standalone executable by the specifying `CustomLAPACKCallback`

property in the code configuration object and using a handwritten main `qrmain.cu`

.

cfg = coder.gpuConfig('exe'); cfg.GpuConfig.EnableCUSOLVER = 1; cfg.CustomLAPACKCallback = 'lapackCallback'; cfg.CustomSource = 'qrmain.cu'; cfg.CustomInclude = '.'; codegen -config cfg -args {A,y} linsolveQR -report

Code generation successful: View report

### Standalone Code Execution

When you execute the generated standalone executable, the outputs $$yhat$$ and $$x$$ are computed and written to comma-separated files. Read these outputs back in MATLAB and use the `plot`

function to visualize the sensor data and fitted curve.

if ispc system('linsolveQR.exe'); else system('./linsolveQR'); end yhat = reshape(readmatrix('outputYhat.csv'), 25, 1); x = reshape(readmatrix('outputX.csv'), 3, 1); figure plot(v, y, 'k.', v, yhat, 'r') axis([0 N*Ts -3 3]); grid; xlabel('Distance Ahead of the Vehicle'); legend('Sensor data','Curve fit'); title('Estimate the Lane Boundary Ahead of the Vehicle');

## See Also

### Functions

`codegen`

|`coder.gpu.kernel`

|`coder.gpu.kernelfun`

|`gpucoder.matrixMatrixKernel`

|`coder.gpu.constantMemory`

|`gpucoder.stencilKernel`

|`coder.checkGpuInstall`