Apache DataFu™

Apache DataFu-Spark 1.7.0 Released

Eyal Allweil

I'd like to announce the release of Apache DataFu-Spark 1.7.0.

Many thanks to new contributors Arpit Bhardwaj, Ben Rahamim and Shaked Aharon!


New features

  • Add collectLimitedList and dedupRandomN methods (DATAFU-165)
  • Improve broadcastJoinSkewed function performance and allow all join types (DATAFU-170)


Improvements

  • Upgrade Log4j version (DATAFU-162)
  • Add count filtering option to broadcastJoinSkewed (PR #27)


Bug fixes

  • explodeArray method not exposed in Python (DATAFU-163)

Breaking changes

  • Spark 2.1.x no longer supported

The source release can be obtained from:


Artifacts for DataFu are published in Apache's Maven Repository:


Please visit the Download page for instructions on building from source or retrieving the artifacts in your build system.