Matlab Mex File with OpenMP compiling and showing active threads but not actually using them

Hi, I am trying to write a MEX file to speed up some of my computations. However, for there to be any speedup, the MEX file must be multithreaded. I can get the file to compile, and it shows that multiple threads are active, but it appears that all computation is still being done by thread 0. I am using MATLAB R2022a and Visual Studio 2019. Here is a stripped-down example. The MEX file runs fine and returns the same results as the built-in operation, albeit more slowly.
The MEX File
#include "mex.h"
#include "matrix.h"
#include <omp.h>
#include <stdio.h>
// Sample code to multiply two vectors
// Call: C = OMPTest_MatlabAnswers(A,B)
// A and B are 32 bit floats/singles
void mexFunction(int nlhs, mxArray* plhs[], int nrhs, mxArray* prhs[])
{
    mwSize n1 = mxGetNumberOfElements(prhs[0]);
    mwSize n2 = mxGetNumberOfElements(prhs[1]);
    mwSize length = (n1);
    mwSize dims[] = {length,1};
    float* A = mxGetSingles(prhs[0]);
    float* B = mxGetSingles(prhs[1]);
    mxArray* mC = mxCreateNumericArray(2, dims, mxSINGLE_CLASS, mxREAL);
    float* C = mxGetSingles(mC);
    // a[t] is set to 1 when thread t enters the parallel region
    int a[16] = { 0 };
    printf("max threads = %d\n", omp_get_max_threads());
    // Compute the element-wise product in parallel, change number of threads as necessary
    #pragma omp parallel num_threads(4)
    {
        a[omp_get_thread_num()] = 1;
        #pragma omp parallel for
        for (int i = 0; i < n1; i++)
        {
            C[i] = A[i] * B[i];
        }
    }
    printf(" %d, %d, %d, %d \n", a[0], a[1], a[2], a[3]);
    printf(" %d, %d, %d, %d \n", a[4], a[5], a[6], a[7]);
    printf(" %d, %d, %d, %d \n", a[8], a[9], a[10], a[11]);
    printf(" %d, %d, %d, %d \n", a[12], a[13], a[14], a[15]);
    plhs[0] = mC; // Return the product.
}
The Matlab test file looks like this.
A = single(rand(100000000,1));
B = single(rand(100000000,1));
tic
D = OMPTest_MatlabAnswers(A,B);
toc
tic
C = A .* B;
toc
I have tried several compile commands with not much luck. It appears that the threads are active but there is no processing going on in them.
Using
mex -R2018a OMPTest_MatlabAnswers.cpp COMPFLAGS="/openmp $COMPFLAGS"
Compiles fine and it appears that the threads (4) are active. However, changing the number of threads shows no time change outside of variance so it looks like only one thread is ever performing the calculations.
max threads = 8
1, 1, 1, 1
0, 0, 0, 0
0, 0, 0, 0
0, 0, 0, 0
Elapsed time is 0.181612 seconds.
Elapsed time is 0.071325 seconds.
Using (from other similar posts)
mex -R2018a CFLAGS = "$CFLAGS -fopenmp" LDFLAGS = "$LDFLAGS -fopenmp" OMPTest_MatlabAnswers.c
or
mex -R2018a CXXFLAGS = "\$CXXFLAGS -fopenmp" LDFLAGS = "\$LDFLAGS -fopenmp" OMPTest_MatlabAnswers.cpp
returns
<current directory>\CFLAGS not detected;
check that you are in the correct current folder, and check the spelling of <current directory>\CFLAGS
or
<current directory>\CXXFLAGS not detected;
check that you are in the correct current folder, and check the spelling of <current directory>\CXXFLAGS
Another suggestion from previous questions was to try something like this
mex -R2018a CFLAGS='$CFLAGS -fopenmp' LDFLAGS='$LDFLAGS -fopenmp' COPTIMFLAGS='$COPTIMFLAGS -fopenmp -O2' LDOPTIMFLAGS='$LDOPTIMFLAGS -fopenmp -O2' DEFINES='$DEFINES -fopenmp' VectorAddOMP_V2.cpp
This compiles fine and runs, but now it looks like only one thread is active.
max threads = 8
1, 0, 0, 0
0, 0, 0, 0
0, 0, 0, 0
0, 0, 0, 0
Elapsed time is 0.161888 seconds.
Elapsed time is 0.084831 seconds.
How do I get Matlab/OpenMP to fully utilize all threads? This looks like a compiler flag issue but the commands from other previous posts don't appear to work here.
Also, is the use of mx arrays appropriate or is there a better way to reference the Matlab data arrays?
Thanks!

Answers (1)

Bruno Luong on 4 Nov 2022
Edited: Bruno Luong on 4 Nov 2022
A somewhat late reply.
I believe the main problem is in your code, not in the compilation options: you parallelize at two levels.
#pragma omp parallel num_threads(4)
{
...
#pragma omp parallel for
for (i ...)
{
}
}
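The first level starts a team of 4 threads. Each thread that reaches the for loop then tries to parallelize that loop again with a new team of threads, but nested parallelism is typically disabled by default (or all the cores are already reserved), so the inner region gets no extra threads and the loop just runs in the current thread. The loop is therefore never divided among the 4 threads.
This is an OpenMP workflow issue.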
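The difference between the two patterns can be checked with a small counting experiment. This is only a minimal sketch, not code from the question: the names `LoopCount`, `count_nested`, and `count_worksharing` are hypothetical, and the `#ifdef _OPENMP` guards are there only so the file also builds when OpenMP is switched off. It counts how many times the loop body runs in total.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper type: outer team size and total loop-body executions.
struct LoopCount {
    int team_size;
    long long iterations;
};

// Pattern from the question: `parallel for` nested inside a `parallel` region.
// Every thread of the outer team starts its own copy of the whole loop.
LoopCount count_nested(int n) {
    int team = 1;
    long long total = 0;
    #pragma omp parallel num_threads(4)
    {
#ifdef _OPENMP
        #pragma omp single
        team = omp_get_num_threads();  // actual outer team size (may be below 4)
#endif
        #pragma omp parallel for
        for (int i = 0; i < n; i++) {
            #pragma omp atomic
            total++;
        }
    }
    LoopCount r = { team, total };
    return r;
}

// Worksharing pattern: `omp for` divides one loop among the existing team.
LoopCount count_worksharing(int n) {
    int team = 1;
    long long total = 0;
    #pragma omp parallel num_threads(4)
    {
#ifdef _OPENMP
        #pragma omp single
        team = omp_get_num_threads();
#endif
        #pragma omp for
        for (int i = 0; i < n; i++) {
            #pragma omp atomic
            total++;
        }
    }
    LoopCount r = { team, total };
    return r;
}
```

In the nested version the total is `team_size * n`, because each outer thread runs the full loop; in the worksharing version it is exactly `n`, because the iterations are split across the team.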
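One way to restructure the loop so that its iterations are actually shared is to use a single level of parallelism. A minimal sketch, assuming the same single-precision data as in the question; `multiply_elementwise` is a hypothetical helper name, and the element count is assumed to fit in an `int`:

```cpp
// Element-wise product C[i] = A[i] * B[i] using one level of parallelism.
// The combined `parallel for` creates the team and splits the iterations
// among its threads; there is no enclosing `parallel` region.
// Assumption: n fits in an int, as with the loop index in the question.
void multiply_elementwise(const float* A, const float* B, float* C, int n) {
    #pragma omp parallel for num_threads(4)
    for (int i = 0; i < n; i++) {
        C[i] = A[i] * B[i];
    }
}
```

Inside the MEX function this would be called with the pointers obtained from `mxGetSingles`, for example `multiply_elementwise(A, B, C, (int)n1);`. If the outer `#pragma omp parallel` block is kept, say to record thread numbers in `a`, the inner directive would become `#pragma omp for` rather than `#pragma omp parallel for`.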

Release

R2022a
