combineByKey

Class: matlab.compiler.mlspark.RDD
Namespace: matlab.compiler.mlspark

Combine the elements for each key using a custom set of aggregation functions

Syntax

result = combineByKey(obj,createCombiner,mergeValue,mergeCombiners,numPartitions)

Description

result = combineByKey(obj,createCombiner,mergeValue,mergeCombiners,numPartitions) combines the elements for each key using a custom set of aggregation functions: createCombiner, mergeValue, and mergeCombiners. The input argument numPartitions specifies the number of partitions to create in the resulting RDD.
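
To make the roles of the three aggregation functions concrete, the following is a minimal local sketch that imitates the combine-by-key pattern in plain MATLAB, without Spark. It is not how the RDD class is implemented. The two-partition split and the variable names partitions, partial, and combined are assumptions chosen purely for illustration.

```matlab
% Local, Spark-free sketch of the combine-by-key pattern (illustrative only).
createCombiner = @(v) num2str(v);                    % V -> C
mergeValue     = @(acc, v) strcat(acc, num2str(v));  % (C, V) -> C
mergeCombiners = @(c1, c2) strcat(c1, c2);           % (C, C) -> C

% Hypothetical split of the {key, value} pairs into two partitions.
partitions = { {{'a',1}, {'b',2}}, {{'a',3}} };

% Step 1: within each partition, build one combiner per key.
partial = cell(size(partitions));
for p = 1:numel(partitions)
    m = containers.Map('KeyType','char','ValueType','any');
    for i = 1:numel(partitions{p})
        key   = partitions{p}{i}{1};
        value = partitions{p}{i}{2};
        if isKey(m, key)
            m(key) = mergeValue(m(key), value);   % key already seen here
        else
            m(key) = createCombiner(value);       % first value for this key
        end
    end
    partial{p} = m;
end

% Step 2: merge the per-partition combiners for each key.
combined = containers.Map('KeyType','char','ValueType','any');
for p = 1:numel(partial)
    m = partial{p};
    partKeys = keys(m);
    for i = 1:numel(partKeys)
        key = partKeys{i};
        if isKey(combined, key)
            combined(key) = mergeCombiners(combined(key), m(key));
        else
            combined(key) = m(key);
        end
    end
end
```

In this sketch, createCombiner runs once per key per partition, mergeValue handles further values for a key inside the same partition, and mergeCombiners joins the partial results for a key across partitions.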

Input Arguments

An input RDD to combine, specified as an RDD object.

Function that creates the initial combiner (C) from a given value (V), specified as a function handle.

Data Types: function_handle

Function that merges a given value (V) into an existing combiner (C), specified as a function handle.

Data Types: function_handle

Function that merges two combiners and returns a new combiner, specified as a function handle.

Data Types: function_handle

Number of partitions to create, specified as a scalar value.

Data Types: double

Output Arguments

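An output RDD containing the combined elements for each key, returned as an RDD object. Use the collect method to retrieve its elements as a MATLAB® cell array.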

Examples

%% Connect to Spark
sparkProp = containers.Map({'spark.executor.cores'}, {'1'});
conf = matlab.compiler.mlspark.SparkConf('AppName','myApp', ...
                        'Master','local[1]','SparkProperties',sparkProp);
sc = matlab.compiler.mlspark.SparkContext(conf);

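%% combineByKey
% Build a pair RDD of {key, value} elements.
inputRdd = sc.parallelize({{'a',1}, {'b',1}, {'a',1}});
% createCombiner: convert the first value seen for a key to a string.
% mergeValue: append a further value for that key to the accumulated string.
% mergeCombiners: concatenate two accumulated strings for the same key.
resRdd = inputRdd.combineByKey(@(value) num2str(value), ...
            @(acc,value) strcat(acc, num2str(value)), ...
            @(rdd1Value, rdd2Value) strcat(rdd1Value, rdd2Value));
viewRes = resRdd.collect()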

Version History

Introduced in R2016b