Apache DataFu™

Apache DataFu-Spark 2.0.0 Released

Eyal Allweil

I'd like to announce the release of Apache DataFu-Spark 2.0.0.

This is the first version to support Spark 3.x: it supports Spark versions 3.0.0 through 3.1.3.

The four classes in SparkUDAFs (MultiSet, MultiArraySet, MapMerge and CountDistinctUpTo) are deprecated. They are replaced by new versions built on Spark's Aggregator API. The deprecated versions will be removed in DataFu 2.1.0.
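For readers migrating their own aggregate functions, here is a rough sketch of the Aggregator API that the new DataFu versions use. Since Spark 3.0, UserDefinedAggregateFunction is itself deprecated in Spark, and a typed Aggregator is wrapped for DataFrame use with functions.udaf. The names CountValues and AggregatorSketch are hypothetical. This is a MultiSet-like illustration that assumes Spark 3.0 or later on the classpath. It is not DataFu's actual implementation.

```scala
import org.apache.spark.sql.{Encoder, SparkSession}
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
import org.apache.spark.sql.expressions.Aggregator
import org.apache.spark.sql.functions.{col, udaf}

// Hypothetical MultiSet-style aggregator (not DataFu's code): counts how often
// each non-null string value occurs within a group.
object CountValues extends Aggregator[String, Map[String, Int], Map[String, Int]] {
  // Empty buffer for a new group
  def zero: Map[String, Int] = Map.empty

  // Fold one input value into the buffer, skipping nulls
  def reduce(buf: Map[String, Int], value: String): Map[String, Int] =
    if (value == null) buf else buf.updated(value, buf.getOrElse(value, 0) + 1)

  // Combine partial buffers computed on different partitions
  def merge(a: Map[String, Int], b: Map[String, Int]): Map[String, Int] =
    b.foldLeft(a) { case (acc, (k, n)) => acc.updated(k, acc.getOrElse(k, 0) + n) }

  // The buffer already holds the final result
  def finish(buf: Map[String, Int]): Map[String, Int] = buf

  // Encoders so Spark can serialize the buffer and produce a MapType column
  def bufferEncoder: Encoder[Map[String, Int]] = ExpressionEncoder[Map[String, Int]]()
  def outputEncoder: Encoder[Map[String, Int]] = ExpressionEncoder[Map[String, Int]]()
}

object AggregatorSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("aggregator-sketch").getOrCreate()
    import spark.implicits._

    val df = Seq(("a", "x"), ("a", "y"), ("a", "x"), ("b", "z")).toDF("key", "value")

    // functions.udaf (Spark 3.0+) wraps the typed Aggregator for untyped DataFrame use
    val countValues = udaf(CountValues)
    df.groupBy("key").agg(countValues(col("value")).as("counts")).show(false)

    spark.stop()
  }
}
```

The Aggregator splits the work into zero, reduce, merge and finish, which lets Spark build partial results per partition and combine them. The older UserDefinedAggregateFunction interface instead works on untyped buffer rows.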


Improvements

  • Spark 3.0.0 - 3.1.3 supported (DATAFU-169)
  • New Aggregators replace deprecated UserDefinedAggregateFunction (DATAFU-173)

Breaking changes

  • Spark 2.x no longer supported


The source release can be obtained from:

http://www.apache.org/dyn/closer.cgi/datafu/apache-datafu-2.0.0/apache-datafu-sources-2.0.0.tgz

Artifacts for DataFu are published in Apache's Maven Repository:

https://repository.apache.org/content/groups/public/org/apache/datafu/
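The following is a minimal build.sbt sketch for depending on DataFu-Spark from a Maven repository. It assumes Scala 2.12, which the Spark 3.0.x and 3.1.x releases are built against. It also assumes the artifact follows the datafu-spark_<Scala binary version> naming, so that %% resolves to datafu-spark_2.12. Check the exact artifact name and version against the repository listing before relying on it.

```scala
// build.sbt (sketch): assumes Scala 2.12 and the datafu-spark_<scala version> artifact naming
scalaVersion := "2.12.15"

libraryDependencies ++= Seq(
  // Spark is usually "provided" by the cluster at runtime
  "org.apache.spark" %% "spark-sql" % "3.1.3" % "provided",
  // Assumed coordinates: org.apache.datafu:datafu-spark_2.12:2.0.0
  "org.apache.datafu" %% "datafu-spark" % "2.0.0"
)
```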

Please visit the Download page for instructions on building from source or retrieving the artifacts in your build system.