Goals

Install an RHadoop system to test R's ability to manage and analyze data in a Hadoop cluster.

Components

Installation Steps

  1. Install Ubuntu Server on Amazon EC2
  1. Set up the Ubuntu system
  1. Set up an Apache Hadoop 2.7.0 single-node cluster
  1. Set up R
  1. Set up RStudio Server
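  1. Install the RHadoop packages (rhdfs, rmr2, plyrmr) from source in R, replacing <path> with the directory holding the downloaded tarballs and * with the version in each file name:
install.packages("<path>/rhdfs*.tar.gz", repos=NULL, type="source")
install.packages("<path>/rmr2*.tar.gz", repos=NULL, type="source")
install.packages("<path>/plyrmr*.tar.gz", repos=NULL, type="source")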
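
For the Apache Hadoop 2.7.0 single-node step, the Apache single-node setup guide configures a pseudo-distributed cluster through four small XML files. The sketch below writes them from a shell function. configure_single_node and write_property_file are hypothetical helper names; the sketch assumes a fresh install (existing core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml are overwritten) and takes the configuration directory, typically $HADOOP_COMMON_HOME/etc/hadoop, as its only argument.

```shell
# Sketch: minimal pseudo-distributed (single-node) configuration for Hadoop 2.7.x,
# using the property values from the Apache single-node setup guide.
# configure_single_node and write_property_file are hypothetical helper names.

# Write an XML configuration file holding exactly one property.
# $1 = output file, $2 = property name, $3 = property value
write_property_file() {
  cat > "$1" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>$2</name>
    <value>$3</value>
  </property>
</configuration>
EOF
}

# $1 = Hadoop configuration directory, e.g. "$HADOOP_COMMON_HOME/etc/hadoop".
# Existing files with these names are overwritten (assumes a fresh install).
configure_single_node() {
  conf_dir="${1:-}"
  if [ -z "$conf_dir" ]; then
    echo "usage: configure_single_node <hadoop-conf-dir>" >&2
    return 1
  fi
  mkdir -p "$conf_dir" || return 1
  # Default file system: the NameNode on this machine
  write_property_file "$conf_dir/core-site.xml" fs.defaultFS hdfs://localhost:9000 || return 1
  # A single DataNode can hold only one replica of each block
  write_property_file "$conf_dir/hdfs-site.xml" dfs.replication 1 || return 1
  # Run MapReduce jobs on YARN instead of the local job runner
  write_property_file "$conf_dir/mapred-site.xml" mapreduce.framework.name yarn || return 1
  # Shuffle service that MapReduce on YARN needs
  write_property_file "$conf_dir/yarn-site.xml" yarn.nodemanager.aux-services mapreduce_shuffle
}
```

After configure_single_node "$HADOOP_COMMON_HOME/etc/hadoop", the guide formats HDFS once with bin/hdfs namenode -format and starts the daemons with sbin/start-dfs.sh and sbin/start-yarn.sh. It also expects passwordless ssh to localhost and JAVA_HOME set in etc/hadoop/hadoop-env.sh.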
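
The rhdfs and rmr2 packages locate Hadoop through two environment variables: HADOOP_CMD (the hadoop launcher, used by rhdfs) and HADOOP_STREAMING (the Hadoop Streaming jar, used by rmr2). A minimal sketch that appends both to an R environment file such as ~/.Renviron, which R reads at startup. write_rhadoop_renviron is a hypothetical helper, and it assumes the streaming jar sits under share/hadoop/tools/lib, as in the 2.7.0 binary tarball.

```shell
# Sketch: record where rhdfs (HADOOP_CMD) and rmr2 (HADOOP_STREAMING) find Hadoop.
# write_rhadoop_renviron is a hypothetical helper.
# $1 = Hadoop install directory (e.g. "$HADOOP_COMMON_HOME")
# $2 = R environment file to append to (e.g. ~/.Renviron)
write_rhadoop_renviron() {
  hadoop_home="${1:-}"
  renviron="${2:-}"
  if [ -z "$hadoop_home" ] || [ -z "$renviron" ]; then
    echo "usage: write_rhadoop_renviron <hadoop-home> <renviron-file>" >&2
    return 1
  fi
  # Pick the first Hadoop Streaming jar in the standard 2.x tarball location
  streaming_jar=""
  for jar in "$hadoop_home"/share/hadoop/tools/lib/hadoop-streaming-*.jar; do
    if [ -f "$jar" ]; then
      streaming_jar="$jar"
      break
    fi
  done
  if [ -z "$streaming_jar" ]; then
    echo "no hadoop-streaming jar under $hadoop_home/share/hadoop/tools/lib" >&2
    return 1
  fi
  # Renviron files use plain NAME=value lines; >> creates the file if needed
  {
    echo "HADOOP_CMD=$hadoop_home/bin/hadoop"
    echo "HADOOP_STREAMING=$streaming_jar"
  } >> "$renviron"
}
```

For example, write_rhadoop_renviron "$HADOOP_COMMON_HOME" ~/.Renviron, then restart the R session. Calling Sys.setenv() with the same two variables at the top of an R script, before library(rhdfs), is an alternative; rhdfs then needs hdfs.init() before its hdfs.* functions are used.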

Testing

  1. Test a Hadoop MapReduce job with the bundled pi example (10 map tasks, 100 samples per map)

$ cd $HADOOP_COMMON_HOME

$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar pi 10 100
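
The pi example takes the number of map tasks and the number of samples per map (10 and 100 above); more samples generally give a closer estimate at the cost of a longer job. A small wrapper can check that HADOOP_COMMON_HOME is set and that the examples jar exists before submitting the job. run_pi_example is a hypothetical name, and the sketch assumes the same tarball layout as the command above.

```shell
# Sketch: wrapper around the pi example from the Hadoop 2.x examples jar.
# run_pi_example is a hypothetical helper; assumes HADOOP_COMMON_HOME points at
# an install with the standard tarball layout.
# $1 = number of map tasks (default 10), $2 = samples per map (default 100)
run_pi_example() {
  maps="${1:-10}"
  samples="${2:-100}"
  if [ -z "${HADOOP_COMMON_HOME:-}" ]; then
    echo "HADOOP_COMMON_HOME is not set" >&2
    return 1
  fi
  # Locate the examples jar instead of hard-coding the version number
  jar=""
  for candidate in "$HADOOP_COMMON_HOME"/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar; do
    if [ -f "$candidate" ]; then
      jar="$candidate"
      break
    fi
  done
  if [ -z "$jar" ]; then
    echo "examples jar not found under $HADOOP_COMMON_HOME/share/hadoop/mapreduce" >&2
    return 1
  fi
  # Same command as above, with the job sizes as parameters
  "$HADOOP_COMMON_HOME/bin/hadoop" jar "$jar" pi "$maps" "$samples"
}
```

Called with no arguments it reproduces the 10 x 100 run; run_pi_example 20 1000 submits a larger job.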

  1. Test R and RStudio Server
  1. Run a demo with the RHadoop packages
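
Before running the RHadoop demo, it can help to confirm that R is on the PATH and that the variables rhdfs and rmr2 depend on point at real files. check_rhadoop_env is a hypothetical helper, not part of RHadoop, and it only inspects the current shell's environment.

```shell
# Sketch: preflight check before the RHadoop demo. check_rhadoop_env is a
# hypothetical helper, not part of RHadoop; it inspects only this shell's environment.
check_rhadoop_env() {
  status=0
  # R itself, plus Rscript for non-interactive runs
  for tool in R Rscript; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing on PATH: $tool" >&2
      status=1
    fi
  done
  # rhdfs needs the hadoop launcher
  if [ ! -x "${HADOOP_CMD:-}" ]; then
    echo "HADOOP_CMD is unset or not executable" >&2
    status=1
  fi
  # rmr2 needs the Hadoop Streaming jar
  if [ ! -f "${HADOOP_STREAMING:-}" ]; then
    echo "HADOOP_STREAMING is unset or not a file" >&2
    status=1
  fi
  [ "$status" -eq 0 ] && echo "RHadoop environment looks complete"
  return "$status"
}
```

If the variables live only in ~/.Renviron and that file contains plain NAME=value lines, set -a; . ~/.Renviron; set +a exports them into the shell before the check. RStudio Server listens on port 8787 by default, so on EC2 that port has to be open in the instance's security group before the browser test.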