Apache DataFu-Spark 1.7.0 Released
Eyal Allweil
I'd like to announce the release of Apache DataFu-Spark 1.7.0.
Many thanks to new contributors Arpit Bhardwaj, Ben Rahamim and Shaked Aharon!
Additions
- Add collectLimitedList and dedupRandomN methods (DATAFU-165)
- Improve broadcastJoinSkewed function performance and allow all join types (DATAFU-170)
Improvements
- Upgrade Log4j version (DATAFU-162)
- Add count filtering option to broadcastJoinSkewed (PR #27)
Fixes
- Expose explodeArray method in Python (DATAFU-163)
Breaking changes
- Spark 2.1.x is no longer supported
The source release can be obtained from:
http://www.apache.org/dyn/closer.cgi/datafu/apache-datafu-1.7.0/apache-datafu-sources-1.7.0.tgz
Artifacts for DataFu are published in Apache's Maven Repository:
https://repository.apache.org/content/groups/public/org/apache/datafu/
Please visit the Download page for instructions on building from source or retrieving the artifacts with your build system.