Apache DataFu is available for download as a source release and as compiled artifacts stored in a Maven repository.
The latest release can be found in Apache's Maven Repository for DataFu:
You can also use a dependency management system to download the DataFu artifacts and all their dependencies.
Gradle:
compile "org.apache.datafu:datafu-pig-incubating:1.3.1"
compile "org.apache.datafu:datafu-hourglass-incubating:1.3.1"
Ivy:
<dependency org="org.apache.datafu" name="datafu-pig-incubating" rev="1.3.1"/>
<dependency org="org.apache.datafu" name="datafu-hourglass-incubating" rev="1.3.1"/>
Maven:
<dependency>
  <groupId>org.apache.datafu</groupId>
  <artifactId>datafu-pig-incubating</artifactId>
  <version>1.3.1</version>
</dependency>
<dependency>
  <groupId>org.apache.datafu</groupId>
  <artifactId>datafu-hourglass-incubating</artifactId>
  <version>1.3.1</version>
</dependency>
See the following guides for next steps:
You can also build DataFu from the source release. Download the source release from one of the mirrors:
Make sure you have Gradle installed. Extract the source and bootstrap the gradlew script that's used for building:
tar xvf apache-datafu-incubating-sources-1.3.1.tgz
cd apache-datafu-incubating-sources-1.3.1
gradle -b bootstrap.gradle
To build the JARs from the source release, run:
After building, the DataFu Pig artifacts can be found in datafu-pig/build/libs. This should contain:
The datafu-pig-incubating-1.3.1.jar file can now be used in Pig.
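As an illustration of using the built JAR, the sketch below writes a small Pig script that registers it and calls one DataFu UDF. It assumes the JAR sits at the build path given above, that the class datafu.pig.stats.StreamingMedian is available in this release, and that a file named input (hypothetical) holds one integer per line. The file name median.pig is also just an example.

```shell
#!/bin/sh
# Path of the JAR produced by the build (relative to the source root).
jar=datafu-pig/build/libs/datafu-pig-incubating-1.3.1.jar

# Write a small Pig script; $jar is expanded by the shell inside the heredoc.
cat > median.pig <<EOF
-- Make the DataFu UDFs available to this script.
REGISTER $jar;
-- Alias a DataFu UDF (class name is an assumption about this release).
DEFINE Median datafu.pig.stats.StreamingMedian();
-- 'input' is a hypothetical file with one integer per line.
data = LOAD 'input' USING PigStorage() AS (val:int);
medians = FOREACH (GROUP data ALL) GENERATE Median(data);
DUMP medians;
EOF

# Run in local mode only when Pig, the JAR, and the input file are all present.
if command -v pig >/dev/null 2>&1 && [ -f "$jar" ] && [ -f input ]; then
    pig -x local median.pig
fi
```

Guarding the pig invocation keeps the script harmless on a machine where Pig is not installed or the build has not been run yet; it then only writes median.pig.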
The DataFu Hourglass artifacts can be found in datafu-hourglass/build/libs. This should contain:
DataFu Hourglass requires several external libraries at runtime. The easiest way to get started is therefore to install DataFu to your local Maven repository:
Assuming your local Maven repository is at ~/.m2, you should see the DataFu Hourglass libraries under
You should now be able to declare a dependency on DataFu Hourglass as shown above.
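The location of an installed artifact follows from Maven's standard local repository layout: the group ID with dots turned into slashes, then the artifact ID, then the version. A minimal sketch for checking the install, assuming the default repository root ~/.m2/repository; maven_local_dir is a hypothetical helper name, not part of Maven or DataFu:

```shell
#!/bin/sh
# Map Maven coordinates to the directory used inside a local repository:
# <repo root>/<group with dots as slashes>/<artifact>/<version>
maven_local_dir() {
    repo_root=$1   # e.g. "$HOME/.m2/repository" (assumed default location)
    group=$2
    artifact=$3
    version=$4
    group_path=$(printf '%s' "$group" | tr '.' '/')
    printf '%s/%s/%s/%s\n' "$repo_root" "$group_path" "$artifact" "$version"
}

dir=$(maven_local_dir "$HOME/.m2/repository" \
    org.apache.datafu datafu-hourglass-incubating 1.3.1)
echo "$dir"

# List the installed files only if the directory actually exists.
if [ -d "$dir" ]; then
    ls "$dir"
fi
```

If the directory is missing, the install step has probably not been run, or the local repository has been configured to live somewhere other than the assumed default.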