Apache DataFu™ (incubating)


Quick Start

Apache DataFu is available for download as a source release and as compiled artifacts stored in a Maven repository.

Downloading from the Maven Repository

The latest release can be found in Apache's Maven Repository for DataFu:

You can also use a dependency management system to download the DataFu artifacts and all their dependencies.

Gradle:

compile "org.apache.datafu:datafu-pig-incubating:1.3.1"
compile "org.apache.datafu:datafu-hourglass-incubating:1.3.1"

Ivy:

<dependency org="org.apache.datafu" name="datafu-pig-incubating" rev="1.3.1"/>
<dependency org="org.apache.datafu" name="datafu-hourglass-incubating" rev="1.3.1"/>

See the following guides for next steps:

Building from Source

You can also build DataFu from the source release. Download the source release from one of the mirrors:

Make sure you have Gradle installed. Extract the source, then bootstrap the gradlew wrapper script that is used for building:

tar xvf apache-datafu-incubating-sources-1.3.1.tgz
cd apache-datafu-incubating-sources-1.3.1
gradle -b bootstrap.gradle

To build the JARs from the source release, run:

./gradlew assemble

After building, the DataFu Pig artifacts can be found in datafu-pig/build/libs. This directory should contain:

  • datafu-pig-incubating-1.3.1.jar
  • datafu-pig-incubating-1.3.1-javadoc.jar
  • datafu-pig-incubating-1.3.1-sources.jar

The datafu-pig-incubating-1.3.1.jar file can now be used in Pig.
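As a rough illustration of what using the JAR in Pig can look like, the sketch below writes a small Pig script that registers the JAR and applies one UDF. The helper name write_median_script, the output file name, and the 'input' path are hypothetical. datafu.pig.stats.StreamingMedian is used only as an example UDF, so check the DataFu Pig guide for the classes available in your release.

```shell
# Hypothetical helper (not part of DataFu): write a small Pig script that
# registers the DataFu Pig JAR and applies one example UDF.
write_median_script() {
  jar="$1"   # path to datafu-pig-incubating-1.3.1.jar
  out="$2"   # where to write the Pig script
  cat > "$out" <<EOF
REGISTER $jar;
DEFINE Median datafu.pig.stats.StreamingMedian();
-- 'input' is a placeholder path holding one integer per line
data = LOAD 'input' AS (val:int);
grouped = GROUP data ALL;
medians = FOREACH grouped GENERATE Median(data.val);
DUMP medians;
EOF
}
```

From the root of the source tree, this could be called as write_median_script datafu-pig/build/libs/datafu-pig-incubating-1.3.1.jar median.pig, and the resulting script run with pig -x local median.pig, assuming Pig is installed and an 'input' file exists.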

The DataFu Hourglass artifacts can be found in datafu-hourglass/build/libs. This directory should contain:

  • datafu-hourglass-incubating-1.3.1.jar
  • datafu-hourglass-incubating-1.3.1-javadoc.jar
  • datafu-hourglass-incubating-1.3.1-sources.jar
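
One way to confirm that the build produced the files listed above is sketched below. check_jars is a hypothetical helper, not something shipped with DataFu. It assumes it is run from the root of the extracted source tree and that the JAR names follow the <module>-incubating-<version> pattern shown in the lists.

```shell
# Hypothetical helper (not part of DataFu): report whether the main, javadoc,
# and sources JARs exist for a module under <module>/build/libs.
check_jars() {
  module="$1"    # e.g. datafu-pig or datafu-hourglass
  version="$2"   # e.g. 1.3.1
  libs="$module/build/libs"
  missing=0
  for suffix in "" "-javadoc" "-sources"; do
    jar="$libs/$module-incubating-$version$suffix.jar"
    if [ -f "$jar" ]; then
      echo "found   $jar"
    else
      echo "missing $jar"
      missing=1
    fi
  done
  # Exit status is 0 only when all three JARs are present.
  return "$missing"
}
```

For example, check_jars datafu-pig 1.3.1 && check_jars datafu-hourglass 1.3.1 succeeds only if all six JARs are present.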

DataFu Hourglass depends on several external libraries, so the easiest way to get started with it is to install DataFu into your local Maven repository:

./gradlew install

Assuming your local Maven repository is at ~/.m2, you should see the DataFu Hourglass libraries under ~/.m2/repository/org/apache/datafu/datafu-hourglass-incubating/1.3.1.
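
The path above follows the standard Maven repository layout: the dots in the group ID become directory separators, followed by the artifact name and the version. A minimal sketch of that mapping follows. local_repo_path and the M2_REPO variable are hypothetical names introduced for illustration, with ~/.m2/repository as the assumed default.

```shell
# Hypothetical helper (not part of DataFu or Maven): print the directory in
# the local Maven repository where an artifact is expected to be installed.
local_repo_path() {
  group="$1"      # e.g. org.apache.datafu
  artifact="$2"   # e.g. datafu-hourglass-incubating
  version="$3"    # e.g. 1.3.1
  # M2_REPO is an assumed override; the default matches the ~/.m2 location.
  repo="${M2_REPO:-$HOME/.m2/repository}"
  # Dots in the group ID map to nested directories.
  echo "$repo/$(echo "$group" | tr . /)/$artifact/$version"
}
```

After running the install step, ls "$(local_repo_path org.apache.datafu datafu-hourglass-incubating 1.3.1)" should list the installed Hourglass files.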

You should now be able to declare a dependency on DataFu Hourglass as shown above.