aggregateByKey

Class: matlab.compiler.mlspark.RDD
Namespace: matlab.compiler.mlspark

Aggregate the values of each key, using given combine functions and a neutral “zero value”

Syntax

result = aggregateByKey(obj,zeroValue,seqFunc,combFunc,numPartitions)

Description

result = aggregateByKey(obj,zeroValue,seqFunc,combFunc,numPartitions) aggregates the values of each key in obj. seqFunc folds the values of a key into an accumulator that starts at the neutral “zero value” zeroValue, and combFunc merges the accumulators produced by seqFunc. The input argument numPartitions is optional.
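To make the roles of zeroValue, seqFunc, and combFunc concrete, here is a minimal Spark-free sketch in plain MATLAB. It is not the mlspark implementation: the variables partitions, partial, and result are hypothetical, and the nested cell arrays only stand in for the partitions of an RDD of {key,value} pairs.

```matlab
% Local, Spark-free sketch of the aggregation pattern (illustrative only).
% Each inner cell array stands in for one partition of {key,value} pairs.
partitions = { {{'a',1},{'b',2},{'a',3}}, {{'a',4},{'b',5}} };
zeroValue = 0;
seqFunc  = @(acc,v) acc + v;   % folds one value into a per-partition accumulator
combFunc = @(a,b) a + b;       % merges accumulators from different partitions

result = containers.Map('KeyType','char','ValueType','any');
for p = 1:numel(partitions)
    % Per-partition pass: start each key from zeroValue, then apply seqFunc.
    partial = containers.Map('KeyType','char','ValueType','any');
    for i = 1:numel(partitions{p})
        key = partitions{p}{i}{1};
        val = partitions{p}{i}{2};
        if ~isKey(partial,key)
            partial(key) = zeroValue;
        end
        partial(key) = seqFunc(partial(key),val);
    end
    % Cross-partition pass: merge the partial results with combFunc.
    keysP = keys(partial);
    for k = 1:numel(keysP)
        key = keysP{k};
        if isKey(result,key)
            result(key) = combFunc(result(key),partial(key));
        else
            result(key) = partial(key);
        end
    end
end
```

In this sketch, result('a') is 8 and result('b') is 7. Because every key starts from zeroValue in each partition that contains it, a non-neutral zero value would be counted once per key per partition, which is why a neutral value such as 0 for addition is the usual choice.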

Input Arguments

obj: An input RDD, specified as an RDD object.

zeroValue: A neutral “zero value”, specified as a cell array of numbers.

Data Types: cell

seqFunc: Function that aggregates the values of each key, specified as a function handle.

Data Types: function_handle

combFunc: Function that aggregates the results of seqFunc, specified as a function handle.

Data Types: function_handle
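The two function handles take different inputs: seqFunc receives an accumulator and one value, while combFunc receives two accumulators. The plain-MATLAB sketch below illustrates that difference by computing a mean for a single key. It assumes the accumulator can be a [sum count] row vector, which this page does not confirm for mlspark, and part1, part2, acc1, and acc2 are hypothetical stand-ins for two partitions and their partial results.

```matlab
% Hypothetical handles: the accumulator is a [sum count] row vector.
zeroValue = [0 0];
seqFunc  = @(acc,v) acc + [v 1];   % fold one value into the accumulator
combFunc = @(a,b) a + b;           % merge two partial accumulators

% Values for a single key, split across two stand-in partitions.
part1 = [3 7];
part2 = 5;

% Fold each partition separately, starting from zeroValue.
acc1 = zeroValue;
for v = part1
    acc1 = seqFunc(acc1,v);
end
acc2 = zeroValue;
for v = part2
    acc2 = seqFunc(acc2,v);
end

% Merge the partial accumulators and derive the mean.
total = combFunc(acc1,acc2);       % [15 3]
meanValue = total(1)/total(2);     % 5
```

Keeping the count in the accumulator lets combFunc merge partial results correctly; averaging the per-partition means directly would give a wrong answer when partitions hold different numbers of values.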

numPartitions: Number of partitions to create, specified as a scalar value. This argument is optional.

Data Types: double

Output Arguments

result: An RDD containing elements aggregated by key, returned as an RDD object.

Examples

%% Connect to Spark
sparkProp = containers.Map({'spark.executor.cores'}, {'1'});
conf = matlab.compiler.mlspark.SparkConf('AppName','myApp', ...
                        'Master','local[1]','SparkProperties',sparkProp);
sc = matlab.compiler.mlspark.SparkContext(conf);

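%% aggregateByKey
x = sc.parallelize({'a','b','c','d'},4);   % four keys spread over 4 partitions
y = x.map(@(k)({k,1}));                    % build {key,1} pairs
% Zero value is 10 and both functions add, so each key yields 10 + 1 = 11.
z = y.aggregateByKey(10,@(acc,v)(acc+v),@(a,b)(a+b));
viewRes = z.collect()  % { {'d',11},{'a',11},{'b',11},{'c',11}}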

Version History

Introduced in R2016b