# Improve Performance of Small Matrix Problems on the GPU Using `pagefun`

This example shows how to use `pagefun`

to improve the performance of independent operations applied to multiple matrices arranged in a multidimensional array.

Multidimensional arrays are an extension of 2-D matrices and use additional subscripts for indexing. A 3-D array, for example uses three subscripts. The first two dimensions represent a matrix and the third represents pages of elements (sometimes referred to as slices). For more information, see Multidimensional Arrays.

While GPUs can effectively apply small independent operations to large matrices, performance is suboptimal when these operations are applied in serial, for example when the operations are applied in a `for`

-loop. In order to avoid serial processing, the `arrayfun`

function applies a scalar operation to each element of an array in parallel on the GPU. Similarly, the `pagefun`

function applies a function to each page of a multidimensional GPU array.

The `pagefun`

function supports applying most element-wise functions and a number of matrix operations that support GPU array input. MATLAB® also provides a number of dedicated page-wise functions, including `pagemtimes`

, `pagemldivide`

, `pagemrdivide`

, `pagetranspose`

, `pagectranspose`

, `pageinv`

, `pagenorm`

, and `pagesvd`

. Depending on the task, these functions might simplify your code or provide better performance than using `pagefun`

.

In this example, a robot is navigating a known map containing a large number of features that the robot can identify using its sensors. The robot locates itself in the map by measuring the relative position and orientation of those features and comparing them to the map locations. Assuming the robot is not completely lost, it can use any difference between the two to correct its position, for instance by using a Kalman Filter. This example shows an efficient way to compute the feature positions relative to the robot.

### Set up the Map

Define the dimensions of a room containing a number of features.

roomDimensions = [50 50 5];

The supporting function `randomTransforms`

is provided at the end of this example and initializes `N`

transforms with random values, providing a structure as output. It represents positions and orientations using 3-by-1 vectors `T`

and 3-by-3 rotation matrices `R`

. The `N`

translations are packed into a 3-by-`N`

matrix and the rotations are packed into a 3-by-3-by-`N`

array.

Use the `randomTransforms`

function to set up a map of `1000`

features, and a start location for the robot.

numFeatures = 1000; Map = randomTransforms(numFeatures,roomDimensions); Robot = randomTransforms(1,roomDimensions);

The `plotRobot`

function is provided as a supporting file with this example and plots a top-down view of the room, and a close up view of the robot and nearby features. The robot is represented by a blue box with wheels and the features are represented by red circles with accompanying lines representing their orientation. To use this function, open the example as a livescript.

Call the `plotRobot`

function.

plotRobot(Robot,Map)

### Define the Equations

To correctly identify the features in the map, the robot needs to transform the map to put its sensors at the origin. Then it can find map features by comparing what it sees with what it expects to see.

For a map feature $$i$$ we can find its position relative to the robot $${T}_{rel}(i)$$ and orientation $${R}_{rel}(i)$$ by transforming its global map location:

$$\begin{array}{l}{R}_{rel}(i)\phantom{\rule{0.2777777777777778em}{0ex}}=\phantom{\rule{0.2777777777777778em}{0ex}}{R}_{bot}^{\top}{R}_{map}(i)\\ {T}_{rel}(i)\phantom{\rule{0.2777777777777778em}{0ex}}=\phantom{\rule{0.2777777777777778em}{0ex}}{R}_{bot}^{\top}({T}_{map}(i)-{T}_{bot})\end{array}$$

where $${T}_{bot}$$ and $${R}_{bot}$$ are the position and orientation of the robot, and $${T}_{map}(i)$$ and $${R}_{map}(i)$$ represent the map data. The equivalent MATLAB code looks like this:

Rrel(:,:,i) = Rbot' * Rmap(:,:,i) Trel(:,i) = Rbot' * (Tmap(:,i) - Tbot)

### Perform Matrix Transforms on the CPU using a `for`

-loop

The supporting function `loopingTransform`

is provided at the end of this example and loops over all the transforms in turn, transforming each feature to its location relative to the robot. Note the `like`

name-value argument for `zeros`

function which makes the function return an array of zeros of the same data type as a prototype array. For example, if the prototype array is a `gpuArray`

, then `zeros`

returns a `gpuArray`

. This allows you to use the same code on the GPU in the next section.

Time the calculations using the `timeit`

function. The `timeit`

function times the execution of `loopingTransform`

multiple times and returns the median of the measurements. Since `timeit`

requires a function with no arguments, use the `@()`

syntax to create an anonymous function of the right form.

cpuTime = timeit(@()loopingTransform(Robot,Map,numFeatures))

cpuTime = 0.0042

### Perform Matrix Transforms on the GPU using a `for`

-loop

To run the same code on the GPU, simply pass the input data to the function as a `gpuArray`

. A `gpuArray`

represents an array stored in GPU memory. Many functions in MATLAB and in other toolboxes support `gpuArray`

objects, allowing you to run your code on GPUs with minimal changes to the code. For more information, see Run MATLAB Functions on a GPU.

Ensure that your desired GPU is available and selected.

```
gpu = gpuDevice;
disp(gpu.Name + " GPU selected.")
```

NVIDIA RTX A5000 GPU selected.

Create GPU arrays containing the position and orientation of the robot and the features in the map.

gMap.R = gpuArray(Map.R); gMap.T = gpuArray(Map.T); gRobot.R = gpuArray(Robot.R); gRobot.T = gpuArray(Robot.T);

Time the calculations using the `gputimeit`

function. The `gputimeit`

function is the equivalent of `timeit`

for code that includes GPU computation. It makes sure all GPU operations have finished before recording the time.

gpuTime = gputimeit(@()loopingTransform(gRobot,gMap,numFeatures))

gpuTime = 0.1905

### Perform Matrix Transforms on the GPU using `pagefun`

The GPU version is very slow because, although all calculations were independent, they ran in series. Using `pagefun`

we can run all the computations in parallel.

The supporting function `pagefunTransform`

is provided at the end of this example and applies the same transforms as the `loopingTransform`

function using `pagefun`

instead of a `for`

-loop. The first computation is the calculation of the rotations. This involves a matrix multiply, which translates to the function `mtimes`

(`*`

). Pass this to `pagefun`

along with the two sets of rotations to be multiplied:

Rel.R = pagefun(@mtimes,Robot.R',Map.R);

`Robot.R'`

is a 3-by-3 matrix, and `Map.R`

is a 3-by-3-by-N array. The `pagefun`

function matches each independent matrix from the map to the same robot rotation, and gives us the required 3-by-3-by-N output.

The translation calculation also involves a matrix multiply, but the normal rules of matrix multiplication allow this to come outside the loop without any changes:

Rel.T = Robot.R' * (Map.T - Robot.T);

Time the calculations using the `gputimeit`

function.

gpuPagefunTime = gputimeit(@()pagefunTransform(gRobot,gMap))

gpuPagefunTime = 3.3066e-04

### Compare Results

Plot the timing results.

figure labels = categorical(["CPU Execution","GPU Execution","GPU Execution with \fontname{consolas}pagefun"]); bar(labels,[cpuTime,gpuTime,gpuPagefunTime]) ylabel("Execution Time (s)") set(gca,YScale="log")

Calculate how much faster the execution using `pagefun`

is than CPU and simple GPU execution.

fprintf("Executing the transforms on the GPU using pagefun is %3.2f times faster than on the CPU.\n", ... cpuTime/gpuPagefunTime);

Executing the transforms on the GPU using pagefun is 12.65 times faster than on the CPU.

fprintf("Executing the transforms on the GPU using pagefun is %3.2f times faster than using for-loops on the GPU.\n", ... gpuTime/gpuPagefunTime);

Executing the transforms on the GPU using pagefun is 576.17 times faster than using for-loops on the GPU.

### Locate a Lost Robot Using Multiple Possible Robot Positions

If the robot is in an unknown part of the map, it can use a global search algorithm to locate itself. The algorithm tests a number of possible locations by carrying out the above computation and looking for good correspondence between the features seen by the robot's sensors and what it would expect to see at that position.

Now there are multiple possible robot positions as well as multiple features. N features and M robots requires `N*M`

transforms. To distinguish 'robot space' from 'feature space', use the 4th dimension for rotations and the 3rd for translations. That means that the robot rotations will be 3-by-3-by-1-by-M, and the translations will be 3-by-1-by-M.

Initialize the search with ten random robot locations. A good search algorithm would use topological or other clues to seed the search more intelligently.

numRobots = 10; Robot = randomTransforms(numRobots,roomDimensions); Robot.R = reshape(Robot.R,3,3,1,[]); % Spread along the 4th dimension Robot.T = reshape(Robot.T,3,1,[]); % Spread along the 3rd dimension

A supporting function `loopingTransform2`

is defined at the end of this example and performs a looping transform using two nested loops, to loop over the robots as well as over the features.

Time the calculations using `timeit`

.

cpuTime = timeit(@()loopingTransform2(Robot,Map,numFeatures,numRobots))

cpuTime = 0.0759

Create GPU arrays containing the robot rotations and translations.

gRobot.R = gpuArray(Robot.R); gRobot.T = gpuArray(Robot.T);

Time the calculations on the GPU using `gputimeit`

.

gpuTime = gputimeit(@() loopingTransform2(gRobot,gMap,numFeatures,numRobots))

gpuTime = 2.1059

As before, the looping version runs much slower on the GPU because it is not doing calculations in parallel.

A supporting function `pagefunTransform2`

is provided at the end of this example and applies the same transforms as the `loopingTransform2`

function using two `pagefun`

calls instead of nested `for`

-loops. This function needs to incorporate the `transpose`

operator as well as `mtimes`

into a call to `pagefun`

. The function also applies the `squeeze`

function to the transposed robot orientations to put the spread over robots into the 3rd dimension, to match the translations. Despite this, the resulting code is considerably more compact.

The `pagefun`

function expands dimensions appropriately so where we multiply 3-by-3-by-1-by-M matrix `Rt`

with 3-by-3-by-N-by-1 matrix `Map.R`

, we get a 3-by-3-by-N-by-M matrix out.

Time the calculations on the GPU using `gputimeit`

.

gpuPagefunTime = gputimeit(@()pagefunTransform2(gRobot,gMap))

gpuPagefunTime = 0.0014

### Compare Results

Plot the timing results.

labels = categorical(["CPU Execution","GPU Execution","GPU Execution with \fontname{consolas}pagefun"]); bar(labels,[cpuTime,gpuTime,gpuPagefunTime]) ylabel("Execution Time (s)") set(gca,YScale="log")

fprintf("Executing the transforms on the GPU using pagefun is %3.2f times faster than on the CPU.\n", ... cpuTime/gpuPagefunTime);

Executing the transforms on the GPU using pagefun is 55.85 times faster than on the CPU.

fprintf("Executing the transforms on the GPU using pagefun is %3.2f times faster than using nested for-loops on the GPU.\n", ... gpuTime/gpuPagefunTime);

Executing the transforms on the GPU using pagefun is 1549.19 times faster than using nested for-loops on the GPU.

### Conclusion

The `pagefun`

function supports a number of 2-D operations, as well as most of the scalar operations supported by `arrayfun`

. Together, these functions allow you to vectorize a range of computations involving matrix algebra and array manipulation, eliminating the need for loops and making huge performance gains.

Wherever you are doing small calculations on GPU data in a loop, you should consider converting to a vectorized implementation in this way. This can also be an opportunity to make use of the GPU to improve performance where previously it gave no performance gains.

### Supporting Functions

#### Random Transform Function

The `randomTransforms`

function creates matrices defining `N`

random transforms in a room of specified dimensions. Each transform comprises a random translation `T`

and a random rotation `R`

. The function can be used to set up a map of features in a room and the starting position and orientation of a robot.

function Tform = randomTransforms(N,roomDimensions) % Preallocate matrices. Tform.T = zeros(3,N); Tform.R = zeros(3,3,N); for i = 1:N % Create random translation. Tform.T(:,i) = rand(3,1) .* roomDimensions'; % Create random rotation by extracting an orthonormal % basis from a random 3-by-3 matrix. Tform.R(:,:,i) = orth(rand(3,3)); end end

#### Looping Transform Function

The `loopingTransform`

function transforms every feature to its location relative to the robot by looping over the transforms in turn.

function Rel = loopingTransform(Robot,Map,numFeatures) % Preallocate matrices. Rel.R = zeros(size(Map.R),like=Map.R); Rel.T = zeros(size(Map.T),like=Map.T); for i = 1:numFeatures % Find orientation of map feature relative to the robot. Rel.R(:,:,i) = Robot.R' * Map.R(:,:,i); % Find position of map feature relative to the robot. Rel.T(:,i) = Robot.R' * (Map.T(:,i) - Robot.T); end end

`pagefun`

Transform Function

The `pagefunTransform`

function transforms every feature to its location relative to the robot by applying the transforms using the `pagefun`

function.

function Rel = pagefunTransform(Robot,Map) % Find orientation of map feature relative to the robot. Rel.R = pagefun(@mtimes,Robot.R', Map.R); % Apply translation. Rel.T = Robot.R' * (Map.T - Robot.T); end

#### Nested Looping Transform Function

The `loopingTransform2`

function performs a looping transform using two nested loops, to loop over the robots as well as over the features. The transforms map every feature to its location relative to every robot.

function Rel = loopingTransform2(Robot,Map,numFeatures,numRobots) % Preallocate matrices. Rel.R = zeros(3,3,numFeatures,numRobots,like=Map.R); Rel.T = zeros(3,numFeatures,numRobots,like=Map.T); for i = 1:numFeatures for j = 1:numRobots % Find orientation of map feature relative to the robot. Rel.R(:,:,i,j) = Robot.R(:,:,1,j)' * Map.R(:,:,i); % Find position of map feature relative to the robot. Rel.T(:,i,j) = ... Robot.R(:,:,1,j)' * (Map.T(:,i) - Robot.T(:,1,j)); end end end

#### Two-call `pagefun`

Transform Function

The `pagefunTransform2`

function performs transforms to map every feature to its location relative to every robot using two calls to the `pagefun`

function.

function Rel = pagefunTransform2(Robot,Map) % Find orientation of map feature relative to the robot. Rt = pagefun(@transpose,Robot.R); Rel.R = pagefun(@mtimes,Rt,Map.R); % Find position of map feature relative to the robot. Rel.T = pagefun(@mtimes,squeeze(Rt), ... (Map.T - Robot.T)); end

## See Also

`pagefun`

| `gpuArray`

| `arrayfun`

| `gputimeit`