combineByKey

Class: matlab.compiler.mlspark.RDD
Namespace: matlab.compiler.mlspark

Combine the elements for each key using a custom set of aggregation functions

Syntax

result = combineByKey(obj,createCombiner,mergeValue,mergeCombiners,numPartitions)

Description

result = combineByKey(obj,createCombiner,mergeValue,mergeCombiners,numPartitions) combines the elements for each key using a custom set of aggregation functions: createCombiner, mergeValue, and mergeCombiners. The input argument numPartitions specifies the number of partitions to create in the resulting RDD.
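
The way the three functions cooperate can be sketched in plain MATLAB, without a Spark connection. The following is only an illustration of the semantics described above, not the library's implementation: the two-partition layout, the `[sum count]` combiner, and the variable names `partitions`, `localMap`, and `combined` are all assumptions made for this sketch.

```matlab
% Plain-MATLAB simulation of combineByKey semantics (no Spark required).
% The partition layout and all variable names here are hypothetical.

createCombiner = @(v) [v 1];                          % V -> C: start a [sum count] combiner
mergeValue     = @(acc, v) [acc(1) + v, acc(2) + 1];  % fold one more value V into a combiner C
mergeCombiners = @(c1, c2) c1 + c2;                   % merge two combiners C into one

% Two hypothetical partitions of {key, value} pairs
partitions = { {{'a', 1}, {'b', 4}, {'a', 3}}, ...
               {{'a', 5}, {'b', 2}} };

combined = containers.Map('KeyType', 'char', 'ValueType', 'any');
for p = 1:numel(partitions)
    % Step 1: build one combiner per key within this partition.
    localMap = containers.Map('KeyType', 'char', 'ValueType', 'any');
    pairs = partitions{p};
    for i = 1:numel(pairs)
        key = pairs{i}{1};
        value = pairs{i}{2};
        if isKey(localMap, key)
            localMap(key) = mergeValue(localMap(key), value);
        else
            localMap(key) = createCombiner(value);
        end
    end
    % Step 2: merge this partition's combiners into the overall result.
    localKeys = keys(localMap);
    for i = 1:numel(localKeys)
        key = localKeys{i};
        if isKey(combined, key)
            combined(key) = mergeCombiners(combined(key), localMap(key));
        else
            combined(key) = localMap(key);
        end
    end
end

% Report the per-key sum, count, and mean derived from each combiner.
resultKeys = keys(combined);
for i = 1:numel(resultKeys)
    c = combined(resultKeys{i});
    fprintf('%s: sum=%g, count=%g, mean=%g\n', resultKeys{i}, c(1), c(2), c(1)/c(2));
end
```

In this sketch the combiner type (a two-element vector) differs from the value type (a scalar), which is the situation combineByKey is designed for: createCombiner is called for the first value seen for a key, mergeValue for later values in the same partition, and mergeCombiners when results from different partitions are brought together.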

Input Arguments

`obj` — Input RDD to combine
`RDD` object

An input RDD to combine, specified as an RDD object.

`createCombiner` — Combiner function (C), given a value (V)
function handle

Function that creates an initial combiner (C) from a value (V), specified as a function handle.

Data Types: function_handle

`mergeValue` — Function representing a merging of the given value (V) with an existing combiner (C)
function handle

Function that merges a given value (V) into an existing combiner (C), specified as a function handle.

Data Types: function_handle

`mergeCombiners` — Function representing the merging of two combiners to return a new combiner
function handle

Function that merges two combiners (C) and returns a new combiner, specified as a function handle.

Data Types: function_handle

`numPartitions` — Number of partitions to create
scalar value

Number of partitions to create, specified as a scalar value.

Data Types: double

Output Arguments

`result` — Cell array containing the elements of an RDD
cell array

A MATLAB® cell array containing the elements of an RDD.

Examples

Combine the Elements for Each Key

%% Connect to Spark
sparkProp = containers.Map({'spark.executor.cores'}, {'1'});
conf = matlab.compiler.mlspark.SparkConf('AppName','myApp', ...
                        'Master','local[1]','SparkProperties',sparkProp);
sc = matlab.compiler.mlspark.SparkContext(conf);

%% combineByKey
inputRdd = sc.parallelize({{'a',1}, {'b',1},{'a',1}});
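% Each combiner is text: the first value for a key is converted with num2str,
% and later values and combiners are appended with strcat.
resRdd = inputRdd.combineByKey(@(value) num2str(value), ...        % createCombiner: V -> C
            @(acc,value) strcat(acc, num2str(value)), ...           % mergeValue: fold V into C
            @(rdd1Value, rdd2Value) strcat(rdd1Value, rdd2Value));  % mergeCombiners: C with C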
viewRes = resRdd.collect()
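
A common use of combineByKey is to keep a running sum and count per key so that an average can be derived afterward. The following sketch assumes `sc` is the SparkContext created above and requires a working Spark connection; it has not been verified against a cluster. The `[sum count]` combiner layout and the names `pairRdd`, `sumCountRdd`, and `viewSumCount` are assumptions made for illustration, and numPartitions is omitted in the same way as in the example above.

```matlab
%% combineByKey with a [sum count] combiner (illustrative sketch)
% Assumes sc is the SparkContext created in the example above.
createCombiner = @(value) [value 1];                         % V -> C: [sum count]
mergeValue     = @(acc,value) [acc(1)+value, acc(2)+1];      % fold V into C
mergeCombiners = @(acc1,acc2) acc1 + acc2;                   % merge two C's

pairRdd = sc.parallelize({{'a',1}, {'b',4}, {'a',3}, {'a',5}, {'b',2}});
sumCountRdd = pairRdd.combineByKey(createCombiner, mergeValue, mergeCombiners);
viewSumCount = sumCountRdd.collect()
```

Dividing the first element of each collected combiner by the second gives the per-key mean.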

Version History

Introduced in R2016b
