LDPC Link Simulation Using GPU Processing
This example shows how to use the ldpcDecode function and gpuArray to increase the speed of a communications system simulation. The performance improvement is illustrated by modeling part of the ETSI (European Telecommunications Standards Institute) EN 302 307 standard for Broadcasting, Interactive Services, News Gathering and other broadband satellite applications (DVBS-2) [1]. For further information on simulating the DVBS-2 system, see DVB-S.2 Link, Including LDPC Coding in Simulink. You must have a Parallel Computing Toolbox™ user license to use GPU processing for ldpcDecode.
Introduction
The LDPC decoding algorithm is computationally expensive and accounts for the vast majority of the time spent in a DVBS-2 simulation. Using the ldpcDecode function to execute the decoding algorithm on a GPU can substantially reduce simulation run time. The example runs the DVBS-2 simulation twice, once with the decoder on the CPU and once with it on the GPU, and measures the run time of each. It also records the bit error rate of both versions to show that decoding on the GPU causes no loss in decoding performance.
fprintf(...
    'DVBS-2 Digital Video Broadcast Standard Bit Error Rate Simulation\n\n');
DVBS-2 Digital Video Broadcast Standard Bit Error Rate Simulation
fprintf(...
    'Performance comparison of CPU- and GPU- accelerated decoders.\n');
Performance comparison of CPU- and GPU- accelerated decoders.
GPU Presence Detection
The example queries the GPU to check for a Parallel Computing Toolbox user license and a supported GPU. If either the GPU or the Parallel Computing Toolbox is unavailable, the example offers to run a CPU-only simulation instead.
try
    % Query the GPU
    dev = gpuDevice;

    % Print out information about the GPU that was found
    fprintf(...
        'GPU detected (%s, %d multiprocessors, Compute Capability %s)\n',...
        dev.Name,dev.MultiprocessorCount,dev.ComputeCapability);

    % Include a GPU-based simulation.
    doGPU = true;

catch %#ok<CTCH>
    % The GPU is not supported or not present, or the Parallel Computing
    % Toolbox was not present and licensed. Consider a CPU-only simulation.
    inp = input(['***NOTE: GPU not detected. ', ...
        'Continue with CPU-only simulation? [Y]/N '],'s');
    if strcmpi(inp, 'y') || isempty(inp)
        doGPU = false;
    else
        return;
    end
end
GPU detected (NVIDIA GeForce RTX 3090 Ti, 84 multiprocessors, Compute Capability 8.6)
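On MATLAB R2019b or later, the try/catch probe above could be replaced by the canUseGPU function, which returns true only when Parallel Computing Toolbox is installed and licensed and a supported GPU is available. The following is a minimal sketch assuming such a release; unlike the original code, it falls back to the CPU without prompting the user.

```matlab
% Alternative GPU check (sketch, assumes R2019b or later).
% canUseGPU returns a logical scalar: true only if Parallel Computing
% Toolbox is installed and licensed and a supported GPU is present.
doGPU = canUseGPU();
if doGPU
    dev = gpuDevice;  % select the default GPU and query its properties
    fprintf('GPU detected (%s, %d multiprocessors, Compute Capability %s)\n', ...
        dev.Name, dev.MultiprocessorCount, dev.ComputeCapability);
else
    % No prompt here: the sketch silently chooses the CPU-only path
    fprintf('No supported GPU found; running a CPU-only simulation.\n');
end
```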
Initialization
The getParamsDVBS2Demo.m function generates a structure, dvb, which holds the configuration information for the DVBS-2 system given the parameters below. The example then creates and configures System objects based on the dvb structure.
The createSimObjDVBS2Demo.m script constructs most of the System objects used in DVBS-2 and configures them based on the dvb structure.
Finally, the example creates LDPC encoder and decoder configuration objects from the DVBS-2 parity-check matrix. These objects are passed to the ldpcEncode and ldpcDecode functions, respectively.
% DVBS-2 System Parameters
subsystemType = 'QPSK 1/2';   % Constellation and LDPC code rate
EsNodB = 0.75;                % Energy per symbol to noise PSD ratio in dB
SNR = convertSNR(EsNodB,'esno',SamplesPerSymbol=1);   % Es/No to SNR
numFramesPerCall = 20;        % Number of frames per call to ldpcDecode
numCalls = 10;                % Total number of frames = numFramesPerCall * numCalls
maxNumLDPCIterations = 50;    % LDPC Decoder iterations

dvb = getParamsDVBS2Demo(subsystemType,EsNodB,maxNumLDPCIterations);

% Create and configure the BCH Encoder and Decoder, Modulator, Demodulator.
createSimObjDVBS2Demo;

% Construct an LDPC Encoder configuration object
encoderCfg = ldpcEncoderConfig(dvb.LDPCParityCheckMatrix);

% Construct an LDPC Decoder configuration object
decoderCfg = ldpcDecoderConfig(dvb.LDPCParityCheckMatrix);

% Create an ErrorRate object to analyze the differences in bit error rate
% between the CPU and GPU.
BER = comm.ErrorRate;
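The configuration objects are built once from a parity-check matrix and then reused on every call to ldpcEncode and ldpcDecode. As a small, self-contained illustration of that pattern, independent of the helper files, the sketch below round-trips random bits through a hypothetical 3-by-7 parity-check matrix chosen for brevity; it is not the DVBS-2 code. Its last three columns form an identity matrix, so they are invertible in GF(2), which ldpcEncoderConfig requires.

```matlab
% Toy LDPC round trip (sketch). H is a hypothetical 3-by-7 parity-check
% matrix, NOT the DVBS-2 matrix; its last 3 columns are an identity.
H = sparse(logical([1 1 0 1 1 0 0;
                    1 0 1 1 0 1 0;
                    0 1 1 1 0 0 1]));
toyEncCfg = ldpcEncoderConfig(H);
toyDecCfg = ldpcDecoderConfig(H);

numToyFrames = 5;   % each column holds one frame
msgBits  = randi([0 1], toyEncCfg.NumInformationBits, numToyFrames);
codeword = ldpcEncode(msgBits, toyEncCfg);   % 7-by-5 systematic codewords

% Noiseless LLRs: ldpcDecode treats a positive LLR as "bit is 0",
% so bit 0 maps to +1 and bit 1 maps to -1.
llr = 1 - 2*double(codeword);

% Defaults: hard decisions, information bits only, early termination
decoded = ldpcDecode(llr, toyDecCfg, 10);

disp(isequal(logical(decoded), logical(msgBits)))
```

Because the LLRs carry no noise, the decoder should return exactly the transmitted information bits; adding noise to llr turns this into a miniature version of the link simulated below.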
CPU and GPU Performance Comparison
This example simulates the DVBS-2 system using the CPU first, and then the GPU. For each LDPC decoder, the example obtains a system benchmark by running numCalls+1 calls, each of which passes numFramesPerCall frames of data through the whole system, and timing each call from end to end. The first call incurs a large simulation initialization overhead, so it is excluded from the benchmark calculations. The per-call and average system simulation times are printed to the Command Window. The bit error rate (BER) of the system is also printed to the Command Window to show that the CPU and the GPU achieve the same BER.
if doGPU
    architectures = 2;
else
    architectures = 1;
end

% Initialize run time results vectors (one column per call, including
% the initial warm-up call)
runtime = zeros(architectures,numCalls+1);
avgtime = zeros(1,architectures);

% Seed the random number generator used for the channel and message
% creation. This allows a fair BER comparison between CPU and GPU.
% Cache the original random stream to restore later.
original_rs = RandStream.getGlobalStream;
rs = RandStream.create('mrg32k3a','seed',25);
RandStream.setGlobalStream(rs);

% Loop for each processing unit - CPU and GPU
for ii = 1:architectures

    % Do some initial setup for the execution loop
    if (ii == 1)
        arch = 'CPU'; % Use CPU
    else
        arch = 'GPU'; % Use GPU
    end

    % Reset the Error Rate object
    reset(BER);

    % Reset the random stream
    reset(rs);

    % Notify the user that the DVBS-2 simulation is beginning.
    fprintf(['\nUsing ' arch '-based LDPC Decoder:\n']);
    dels = repmat('\b',1,fprintf('  Initializing ...'));

    % Main simulation loop. Run numCalls+1 times and ignore the first
    % call (which has initialization overhead) for the run time
    % calculation. All calls, including the first, contribute to the BER.
    for rr = 1:(numCalls+1)

        % Start timer
        ts = tic;

        % Create input messages
        msg = zeros(encbch.MessageLength, numFramesPerCall);
        msg(1:dvb.NumInfoBitsPerCodeword, :) = ...
            logical(randi([0 1],dvb.NumInfoBitsPerCodeword,numFramesPerCall));

        % Transmit
        bchencOut = encbch(msg(:));
        ldpcencOut = ldpcEncode(reshape(bchencOut,[],numFramesPerCall),encoderCfg);
        xlvrOut = intrlv(ldpcencOut,dvb.InterleaveOrder);
        modOut = pskmod(xlvrOut,dvb.ModulationOrder,dvb.PhaseOffset,'InputType','bit');

        % Corrupt with noise
        chanOut = awgn(modOut,SNR);

        % Receive
        demodOut = pskdemod(chanOut,dvb.ModulationOrder,dvb.PhaseOffset, ...
            'OutputType','approxllr','NoiseVariance',dvb.NoiseVar);
        if strcmpi(arch,'GPU')
            dexlvrOut = deintrlv(gpuArray(demodOut),dvb.InterleaveOrder);
            ldpcdecOut = gather(logical(ldpcDecode(dexlvrOut,decoderCfg, ...
                dvb.LDPCNumIterations,'DecisionType','hard', ...
                'Termination','max','OutputFormat','info')));
        else
            dexlvrOut = deintrlv(demodOut,dvb.InterleaveOrder);
            ldpcdecOut = logical(ldpcDecode(dexlvrOut,decoderCfg, ...
                dvb.LDPCNumIterations,'DecisionType','hard', ...
                'Termination','max','OutputFormat','info'));
        end
        bchdecOut = decbch(ldpcdecOut(:));

        % Compute BER at the output of the LDPC decoder, not the BCH decoder.
        ber = BER(logical(bchencOut),ldpcdecOut(:));

        % Stop timer
        runtime(ii, rr) = toc(ts);

        % Don't report the first call, which has the initialization overhead.
        if (rr > 1)
            fprintf(dels);
            newCharsToDelete = fprintf('  Decode call %d : %.2f sec', ...
                rr-1, runtime(ii,rr));
            dels = repmat('\b',1,newCharsToDelete);
        end
    end % end of running frames through the DVBS-2 system.

    % Report the run time results to the Command Window.
    fprintf(dels); % Delete the last line printed out.

    % Calculate the average run time per call. Don't include call 1
    % because it includes some System object initialization time.
    avgtime(ii) = mean(runtime(ii,2:end));
    fprintf('  %d frames decoded, %.2f sec/call\n', ...
        numCalls*numFramesPerCall,avgtime(ii));
    fprintf('  Bit error rate: %g \n',ber(1));
end % architecture loop
Using CPU-based LDPC Decoder:
Initializing ...
Decode call 1 : 3.44 sec Decode call 2 : 3.45 sec Decode call 3 : 3.43 sec Decode call 4 : 3.39 sec Decode call 5 : 3.37 sec Decode call 6 : 3.49 sec Decode call 7 : 3.46 sec Decode call 8 : 3.39 sec Decode call 9 : 3.52 sec Decode call 10 : 3.50 sec
200 frames decoded, 3.44 sec/call
Bit error rate: 0.0100918
Using GPU-based LDPC Decoder:
Initializing ...
Decode call 1 : 1.38 sec Decode call 2 : 1.24 sec Decode call 3 : 1.21 sec Decode call 4 : 1.30 sec Decode call 5 : 1.33 sec Decode call 6 : 1.21 sec Decode call 7 : 1.26 sec Decode call 8 : 1.36 sec Decode call 9 : 1.28 sec Decode call 10 : 1.32 sec
200 frames decoded, 1.29 sec/call
Bit error rate: 0.0100918
% Reset the random stream to the cached object
RandStream.setGlobalStream(original_rs);
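The benchmark above times each call with tic/toc and averages calls 2 through numCalls+1 to exclude warm-up overhead. A related approach, sketched below, uses MATLAB's timeit and Parallel Computing Toolbox's gputimeit, which run a function handle several times and return a median; gputimeit also waits for queued GPU work to finish before stopping the clock. The parity-check matrix and random LLRs are hypothetical toy inputs, so the resulting times are unlikely to reflect the DVBS-2 speedup (small workloads often gain little or nothing on a GPU).

```matlab
% Timing ldpcDecode with timeit/gputimeit (sketch with toy inputs).
H = sparse(logical([1 1 0 1 1 0 0;
                    1 0 1 1 0 1 0;
                    0 1 1 1 0 0 1]));   % hypothetical toy code, not DVBS-2
decCfg = ldpcDecoderConfig(H);
llr = randn(7, 1000);                   % hypothetical LLRs for 1000 frames

% timeit calls the handle repeatedly and returns the median time
cpuTime = timeit(@() ldpcDecode(llr, decCfg, 50));
fprintf('CPU: %.4f s per call\n', cpuTime);

if canUseGPU()
    llrGPU = gpuArray(llr);   % move the input to the GPU once, outside the timing
    % gputimeit synchronizes with the GPU before each measurement ends
    gpuTime = gputimeit(@() ldpcDecode(llrGPU, decCfg, 50));
    fprintf('GPU: %.4f s per call (ratio %.2f)\n', gpuTime, cpuTime/gpuTime);
end
```

Because these functions report a median rather than a mean, their figures are not directly comparable with avgtime.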
Using code similar to that shown above, a bit error rate measurement was made offline. As seen in this plot, the bit error rate performance of the GPU- and CPU-based LDPC decoders is identical.
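A small-scale version of that check is sketched below: identical LLRs are decoded on the CPU and, if available, on the GPU, and comm.ErrorRate reports the BER of each (its output is [BER; number of errors; number of bits compared]). The toy parity-check matrix, noise variance, and BPSK mapping are assumptions made to keep the sketch self-contained; they are not part of the DVBS-2 link.

```matlab
% Compare CPU and GPU decoding of the same LLRs (sketch, toy inputs).
H = sparse(logical([1 1 0 1 1 0 0;
                    1 0 1 1 0 1 0;
                    0 1 1 1 0 0 1]));   % hypothetical toy code, not DVBS-2
encCfg = ldpcEncoderConfig(H);
decCfg = ldpcDecoderConfig(H);

rng(25);                                 % fixed seed so the run is repeatable
msg = randi([0 1], encCfg.NumInformationBits, 2000);
cw  = ldpcEncode(msg, encCfg);
noiseVar = 0.5;                          % hypothetical noise variance
rx  = (1 - 2*double(cw)) + sqrt(noiseVar)*randn(size(cw));   % BPSK over AWGN
llr = 2*rx/noiseVar;                     % BPSK LLR; positive means bit 0

cpuBits  = ldpcDecode(llr, decCfg, 50);
errCPU   = comm.ErrorRate;
statsCPU = errCPU(msg(:), double(cpuBits(:)));   % [BER; errors; bits]
fprintf('CPU BER: %g (%d errors)\n', statsCPU(1), statsCPU(2));

if canUseGPU()
    gpuBits  = gather(ldpcDecode(gpuArray(llr), decCfg, 50));
    errGPU   = comm.ErrorRate;
    statsGPU = errGPU(msg(:), double(gpuBits(:)));
    fprintf('GPU BER: %g (%d errors)\n', statsGPU(1), statsGPU(2));
    fprintf('Mismatched decisions: %d\n', nnz(cpuBits ~= gpuBits));
end
```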
Summary
If a GPU was used, report the speedup as the ratio of the average run time of the DVBS-2 system on the CPU to its average run time on the GPU.
if ~doGPU
    fprintf('\n*** GPU not present ***\n\n');
else
    % Calculate system-wide speedup
    fprintf(['\nFull system simulation runs %.2f times faster using ' ...
        'ldpcDecode with a GPU.\n\n'],avgtime(1) / avgtime(2));
end
Full system simulation runs 2.67 times faster using ldpcDecode with a GPU.
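The printed figure is a ratio of mean per-call run times, where each mean skips the first (warm-up) call, matching avgtime(ii) = mean(runtime(ii,2:end)). With $N$ = numCalls $= 10$ and $t_{\mathrm{arch},r}$ = runtime(ii,r):

```latex
\bar{t}_{\mathrm{arch}} = \frac{1}{N}\sum_{r=2}^{N+1} t_{\mathrm{arch},r},
\qquad
S = \frac{\bar{t}_{\mathrm{CPU}}}{\bar{t}_{\mathrm{GPU}}}
  \approx \frac{3.44\ \mathrm{s}}{1.29\ \mathrm{s}} \approx 2.67
```

Only the deinterleaving and LDPC decoding move to the GPU; BCH coding, interleaving, modulation, and the channel run on the CPU in both cases. $S$ is therefore a full-system figure. Assuming those CPU-side stages take the same time $t_{\mathrm{other}}$ in both runs, the speedup of the decoding stage alone, $(\bar{t}_{\mathrm{CPU}} - t_{\mathrm{other}})/(\bar{t}_{\mathrm{GPU}} - t_{\mathrm{other}})$, would be larger than $S$.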
Appendix
This example uses the createSimObjDVBS2Demo.m script and getParamsDVBS2Demo.m helper function.
Selected Bibliography
[1] ETSI Standard EN 302 307 V1.1.1: Digital Video Broadcasting (DVB); Second generation framing structure, channel coding and modulation systems for Broadcasting, Interactive Services, News Gathering and other broadband satellite applications (DVB-S2), European Telecommunications Standards Institute, Valbonne, France, 2005-03.