mahout git commit: [WEBSITE] Move BuildingMahout.md
2017-11-29
refs/heads/master e59101243 -> fe77fc19f

[WEBSITE] Move BuildingMahout.md

Author: Trevor a.k.a @rawkintrevo
Authored: Wed Nov 29 13:25:14 2017 -0600
Committer: Trevor a.k.a @rawkintrevo <***@gmail.com>
Committed: Wed Nov 29 13:25:14 2017 -0600

diff --git a/website/oldsite/developers/buildingmahout.md b/website/oldsite/developers/buildingmahout.md
index 8e1e7f0..40b509b 100644
--- a/website/oldsite/developers/buildingmahout.md
+++ b/website/oldsite/developers/buildingmahout.md
@@ -1,16 +1,17 @@
layout: default
-title: BuildingMahout
- name: retro-mahout
+title: Building Mahout
+ name: mahout2

-# Building Mahout from source
+# Building Mahout from Source

## Prerequisites

* Java JDK 1.7
-* Apache Maven 3.3.3
+* Apache Maven 3.3.9

## Getting the source code
@@ -23,40 +24,170 @@ or

git clone https://github.com/apache/mahout.git

-##Hadoop version
-Mahout code depends on hadoop-client artifact, with the default version 2.4.1. To build Mahout against to a
-different hadoop version, hadoop.version property should be set accordingly and passed to the build command.
-Hadoop1 clients would additionally require hadoop1 profile to be activated.
+## Building From Source
+###### Prerequisites:
+Linux Environment (preferably Ubuntu 16.04.x). Note: Currently only the JVM-only build works on a Mac.
+gcc > 4.x
+NVIDIA Card (installed with OpenCL drivers alongside usual GPU drivers)
+###### Downloads
+Install Java 1.7+ in an easily accessible directory (for this example, ~/java/)
+Create a directory ~/apache/ .
+Download Apache Maven 3.3.9 and un-tar/gunzip to ~/apache/apache-maven-3.3.9/ .
+Download and un-tar/gunzip Hadoop 2.4.1 to ~/apache/hadoop-2.4.1/ .
+Download and un-tar/gunzip spark-1.6.3-bin-hadoop2.4 to ~/apache/ .
+Choose release: Spark-1.6.3 (Nov 07 2016)
+Choose package type: Pre-Built for Hadoop 2.4
+Install ViennaCL 1.7.0+
+If running Ubuntu 16.04+
+sudo apt-get install libviennacl-dev
+Otherwise, if your distribution’s package manager does not have a viennacl-dev package at version 1.7.0 or later, clone ViennaCL directly and copy its headers into a directory that is on the include path when Mahout is compiled:
+mkdir ~/tmp
+cd ~/tmp && git clone https://github.com/viennacl/viennacl-dev.git && cd viennacl-dev
+cp -r viennacl/ /usr/local/
+cp -r CL/ /usr/local/
+Ensure that the OpenCL 1.2+ drivers are installed (they ship with most consumer-grade NVIDIA drivers; whether higher-end cards include them has not been verified).
+Clone the Mahout repository into `~/apache`.
+git clone https://github.com/apache/mahout.git
+###### Configuration
+When building Mahout for a Spark backend, four system environment variables must be set:
+ export MAHOUT_HOME=/home/<user>/apache/mahout
+ export HADOOP_HOME=/home/<user>/apache/hadoop-2.4.1
+ export SPARK_HOME=/home/<user>/apache/spark-1.6.3-bin-hadoop2.4
+ export JAVA_HOME=/home/<user>/java/jdk-1.8.121
+Mahout on Spark regularly uses one more environment variable, MASTER, which holds the address of the Spark cluster’s master node (usually the node one is logged into).
+To use 4 local cores (Spark master need not be running)
+export MASTER=local[4]
+To use all available local cores (again, Spark master need not be running)
+export MASTER=local[*]
+To point to a cluster with Spark running:
+export MASTER=spark://master.ip.address:7077
+We then add these to the path:
+These should be added to your ~/.bashrc file.
+###### Building Mahout with Apache Maven
+From the $MAHOUT_HOME directory, issue one of the following commands to build each variant using Maven profiles.
+JVM only:
+mvn clean install -DskipTests
+JVM with native OpenMP level 2 and level 3 matrix/vector multiplication:
+mvn clean install -Pviennacl-omp -Phadoop2 -DskipTests
+JVM with native OpenMP and OpenCL for level 2 and level 3 matrix/vector multiplication (GPU errors fall back to OpenMP; currently only a single GPU per node is supported):
+mvn clean install -Pviennacl -Phadoop2 -DskipTests
+### Changing Scala Version
+The Scala version can be changed with a profile; however, SBT seems to have trouble resolving the resulting artifacts.
+mvn clean install -Pscala-2.11
+Maven resolves the resulting artifacts without trouble, and this approach also works if the goal is simply to use the Mahout shell. However, if the goal is to build with SBT, use the following tool:
+cd $MAHOUT_HOME/buildtools
+./change-scala-version.sh 2.11
+Now go back to `$MAHOUT_HOME` and execute
+mvn clean install -Pscala-2.11
+**NOTE:** You still need to pass the `-Pscala-2.11` profile, as this determines and propagates the minor Scala version (e.g. 2.11.8).
+### The Distribution Profile

-The build lifecycle is illustrated below.
+The distribution profile, among other things, will produce the same artifact for multiple Scala and Spark versions.

-## Compiling
+Specifically, it creates all of the following.

-Compile Mahout using standard maven commands
+Default Targets:
+- Spark 1.6 Bindings, Scala-2.10
+- Mahout-Math Scala-2.10
+- ViennaCL Scala-2.10*
+- ViennaCL-OMP Scala-2.10*
+- H2O Scala-2.10

- # With hadoop-2.4.1 dependency
- mvn clean compile
+It will also create:
+- Spark 2.0 Bindings, Scala-2.11
+- Spark 2.1 Bindings, Scala-2.11
+- Mahout-Math Scala-2.11
+- ViennaCL Scala-2.11*
+- ViennaCL-OMP Scala-2.11*
+- H2O Scala-2.11

- # With hadoop-1.2.1 dependency
- mvn -Phadoop1 -Dhadoop.version=1.2.1 clean compile
+Note: * ViennaCL artifacts are created only if the `viennacl` or `viennacl-omp` profiles are activated.

+By default, this phase will execute the `package` lifecycle goal on all built "extra" variants.

-Mahout has an extensive test suite which takes some time to run. If you just want to build Mahout, skip the tests like this
+E.g. if you were to run

- # With hadoop-2.4.1 dependency
- mvn -DskipTests=true clean package
+mvn clean install -Pdistribution

- # With hadoop-1.2.1 dependency
- mvn -Phadoop1 -Dhadoop.version=1.2.1 -DskipTests=true clean package
+You will `install` all of the "Default Targets" but only `package` the "Also created".

+If you wish to `install` all of the above, you can set the `lifecycle.target` switch as follows:

-In order to add mahout artifact to your local repository, run
+mvn clean install -Pdistribution -Dlifecycle.target=install

- # With hadoop-2.4.1 dependency
- mvn clean install

- # With hadoop-1.2.1 dependency
- mvn -Phadoop1 -Dhadoop.version=1.2.1 clean install

\ No newline at end of file
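
The Configuration section added in the diff above says the variables are added to the path but does not show how. The following is a minimal sketch of what the `~/.bashrc` additions might look like. It assumes the example directory layout used in the page (`~/apache/` and `~/java/`); the `bin` subdirectories appended to `PATH` are an assumption based on the usual layout of these distributions, not something the commit states.

```shell
# Sketch of ~/.bashrc additions. Paths follow the example layout in the page.
export MAHOUT_HOME="$HOME/apache/mahout"
export HADOOP_HOME="$HOME/apache/hadoop-2.4.1"
export SPARK_HOME="$HOME/apache/spark-1.6.3-bin-hadoop2.4"
export JAVA_HOME="$HOME/java/jdk-1.8.121"

# Use all available local cores; a running Spark master is not required.
export MASTER="local[*]"

# Assumption: each distribution keeps its executables in a bin/ subdirectory.
export PATH="$PATH:$MAHOUT_HOME/bin:$HADOOP_HOME/bin:$SPARK_HOME/bin:$JAVA_HOME/bin"
```

To target a cluster instead, `MASTER` would be set to the `spark://master.ip.address:7077` form shown in the page.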
