Sunday, 6 December 2015

Storm Single-Node Cluster Installation


What is Storm?

Storm is an open-source stream-processing framework originally developed by Backtype and
subsequently at Twitter. While the underlying framework is in Clojure, a Storm application --
called a Topology -- can be written in any programming language, with Java as the
predominant language of choice. Users are free to stitch together a directed graph of
execution, with Spouts (data sources) and Bolts (operators). Architecturally, it consists of a
central job and node management entity dubbed the Nimbus node, and a set of per-node
managers called Supervisors. The Nimbus node is in charge of work distribution, job
orchestration, communication, fault-tolerance, and state management (for which it relies on
ZooKeeper). The parallelism of a Topology can be controlled at 3 different levels:
number of workers (cluster wide processes), executors (number of threads per worker), and
tasks (number of bolts/spouts executed per thread). Intra-worker communication in Storm is
enabled by LMAX Disruptor while ZeroMQ is employed for inter-worker
communication. Moreover, tuple distribution across tasks is decided by groupings; with
shuffle grouping, which does random distribution, being the default option.
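
To make the three parallelism levels concrete, here is a small shell sketch of the arithmetic, following the definitions in the paragraph above. The numbers are purely illustrative assumptions and do not come from any real topology.

```shell
# Illustrative numbers only (assumptions, not taken from a real topology).
workers=2                # cluster-wide worker processes
executors_per_worker=3   # threads per worker
tasks_per_executor=2     # spout/bolt instances run by each thread

total_executors=$((workers * executors_per_worker))
total_tasks=$((total_executors * tasks_per_executor))

echo "executors: $total_executors, tasks: $total_tasks"
# prints "executors: 6, tasks: 12"
```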

stormpicture.png


Prerequisites to install before installing Storm


JAVA INSTALLATION:
Java 6, 7, or 8 must be installed on the Linux (Ubuntu) machine. To set up Java, install the matching OpenJDK package (openjdk-6-jdk, openjdk-7-jdk, or openjdk-8-jdk). The Oracle JDK is the official JDK; however, Oracle no longer provides it as a default installation for Ubuntu.
Command to install Java on an Ubuntu machine:
sudo apt-get install openjdk-7-jdk

To find the path where Java is installed, use the following command:
sudo update-alternatives --config java

javapathcommand.png

Update the path in the ~/.bashrc profile (export the Java path):
javapath1.png

javapath2.png
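
The screenshots are not reproduced here, so the following is only a sketch of what the ~/.bashrc lines might look like. The helper name java_home_from_bin is hypothetical, and the example path /usr/lib/jvm/java-7-openjdk-amd64 is an assumption about where openjdk-7-jdk lands on a 64-bit Ubuntu machine; use the path that update-alternatives reports on your own system.

```shell
# Hypothetical helper: turn the resolved path of the java binary into a
# JAVA_HOME candidate by stripping the trailing /bin/java (and /jre).
java_home_from_bin() {
    jh_path="${1%/bin/java}"    # drop the trailing /bin/java
    jh_path="${jh_path%/jre}"   # drop a trailing /jre, if present
    printf '%s\n' "$jh_path"
}

# Example with the assumed path:
java_home_from_bin /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java
# prints "/usr/lib/jvm/java-7-openjdk-amd64"

# Lines for ~/.bashrc (only takes effect when java is actually on PATH):
if command -v java >/dev/null 2>&1; then
    JAVA_HOME="$(java_home_from_bin "$(readlink -f "$(command -v java)")")"
    export JAVA_HOME
    export PATH="$JAVA_HOME/bin:$PATH"
fi
```

After editing ~/.bashrc, run "source ~/.bashrc" or open a new terminal so the variables take effect.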

 Install the git, libtool, automake, uuid-dev, g++, and gcc-multilib packages using the following commands:
  1. sudo apt-get install git -y
  2. sudo apt-get install libtool -y
  3. sudo apt-get install automake -y
  4. sudo apt-get install uuid-dev -y
  5. sudo apt-get install g++ -y
  6. sudo apt-get install gcc-multilib -y
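
The six commands above can also be collapsed into a single apt-get call. The sketch below only prints the command unless RUN=1 is set; RUN is a hypothetical variable introduced here so the snippet is safe to paste and inspect first.

```shell
# Same packages as the list above, installed in one apt-get call.
packages="git libtool automake uuid-dev g++ gcc-multilib"
install_cmd="sudo apt-get install -y $packages"

# Dry run by default: print the command. Set RUN=1 to actually execute it.
if [ "${RUN:-0}" = "1" ]; then
    $install_cmd
else
    echo "$install_cmd"
fi
```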

ZooKeeper Installation: Once all the prerequisite packages are installed, install the ZooKeeper service.
 step 1: Download the ZooKeeper tarball from any of the mirrors into a local directory on your Linux machine using the wget command. (The mirror may change.)
                                                                       zookeeperdownload1.png
step 2: Untar the ZooKeeper tarball using “tar -xvf zookeeper-3.4.6.tar.gz”.
step 3: Change the ZooKeeper configuration properties: enter the extracted zookeeper-3.4.6 folder, go into the conf folder, make a copy of the zoo_sample.cfg file, and name the copy zoo.cfg.
    Configure the properties of ZooKeeper
  1. cd /usr/local/zookeeper (I moved the zookeeper-3.4.6 folder to /usr/local and renamed it zookeeper).
  2. cd conf
  3. cp zoo_sample.cfg zoo.cfg
  4. vi zoo.cfg
Change the following properties in the configuration file:
tickTime
This is the basic time unit in milliseconds used by ZooKeeper. It is used to do heartbeats and the minimum session timeout will be twice the tickTime.
 zootktym.png
dataDir
This is the location to store the in-memory database snapshots and, unless specified otherwise, the transaction log of updates to the database.
zoodatdir.png
Note: When running a multi-node cluster, the data path should be changed on the slave nodes.
clientPort
This is the port that listens for client connections.
zooclentport.png
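
Since the screenshots are not available, here is a sketch of a zoo.cfg containing the three properties described above. The values are the ones shipped in ZooKeeper's zoo_sample.cfg and are an assumption about what the original screenshots showed; ZK_CONF_DIR is a hypothetical variable that defaults to a scratch directory so the sketch never overwrites a real install.

```shell
# Directory to write into. ZK_CONF_DIR is a hypothetical variable; set it to
# /usr/local/zookeeper/conf to write the real file.
ZK_CONF_DIR="${ZK_CONF_DIR:-$(mktemp -d)}"

cat > "$ZK_CONF_DIR/zoo.cfg" <<'EOF'
# basic time unit in milliseconds (heartbeats, session timeouts)
tickTime=2000
# snapshot (and by default transaction log) location; assumed path
dataDir=/tmp/zookeeper
# port that listens for client connections
clientPort=2181
EOF

cat "$ZK_CONF_DIR/zoo.cfg"
```

For anything beyond a quick test, dataDir is usually pointed at a directory outside /tmp so the data survives a reboot.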

step 4: When all the required changes are made in zoo.cfg, start ZooKeeper. Change to the bin folder:
cd /usr/local/zookeeper/bin

step 5: This folder contains several shell scripts; use the “zkServer.sh” script to start ZooKeeper.
/usr/local/zookeeper/bin/zkServer.sh start
zoozkserverstart.png
zkServer.sh supports the following commands:
start, stop, status, start-foreground, restart, upgrade, print-cmd.

step 6: Update the path in the ~/.bashrc profile.
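
One way to update ~/.bashrc without duplicating lines on repeated runs is sketched below. The helper append_once and the variable names ZOOKEEPER_HOME and BASHRC are assumptions made for this example; BASHRC defaults to a scratch file so nothing is changed until you point it at ~/.bashrc.

```shell
# Hypothetical helper: append a line to a file only if that exact line is
# not already present.
append_once() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# BASHRC defaults to a scratch file here; use BASHRC=~/.bashrc for real use.
BASHRC="${BASHRC:-$(mktemp)}"

append_once "$BASHRC" 'export ZOOKEEPER_HOME=/usr/local/zookeeper'
append_once "$BASHRC" 'export PATH="$ZOOKEEPER_HOME/bin:$PATH"'
# Running it again does not duplicate the line.
append_once "$BASHRC" 'export ZOOKEEPER_HOME=/usr/local/zookeeper'

cat "$BASHRC"
```

The same helper can be reused later for the Storm path.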

 Installing ZeroMQ and JZMQ:

  • Create a directory named storm and download into it the tarballs needed for setting up the Storm cluster.
  1. mkdir storm
  2. cd storm
  3. tar -xzf zeromq-2.1.7.tar.gz
  4. cd zeromq-2.1.7
 zeromq.png
  • From the zeromq-2.1.7 directory, run the following commands:
  1. ./configure
zeromq2.png
  2. make
  3. sudo make install
  • Now add the Java bindings (JZMQ) for ZeroMQ.
    Install jzmq
  1. cd ../jzmq (assuming the jzmq sources were downloaded into the storm directory)
  2. ./autogen.sh (autogen.sh processes Makefile.am; Makefile.am itself is not executed)
  3. ./configure
  4. make
  5. sudo make install
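
If ./configure or make fails in the builds above, a missing build tool is a common cause. The sketch below lists which commands are not on PATH. The helper name missing_tools is hypothetical, and the command names checked are assumptions about what the packages installed earlier provide (for example, libtoolize from libtool and javac from the JDK).

```shell
# Hypothetical helper: print every command from the argument list that is
# not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumed command names for the packages installed earlier.
missing="$(missing_tools git libtoolize automake g++ make javac)"

if [ -z "$missing" ]; then
    echo "all build tools found"
else
    echo "missing build tools:"
    echo "$missing"
fi
```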

Finally, the installation of Storm:
Get a mirror from the Storm downloads page and download the Storm tarball.
Note: Before installing Storm, all the prerequisites should be installed without any errors.
Installation:
step 1: Download the Storm tarball. (The mirror may change.)

step 2: Untar the Storm tarball:
          tar -xvf apache-storm-0.9.0.tar.gz
step 3: Update the ~/.bashrc file with the Storm path.
       stormpath1.png

Setup configuration


step 4: After setting up the path for Storm, change to the conf directory of the Storm directory, which contains the storm.yaml file. (I renamed apache-storm-0.9.0 to storm.)
cd /usr/local/storm/conf
stormconfig.png

step 5: Open storm.yaml in an editor and add the settings for your cluster:
        sudo vi /usr/local/storm/conf/storm.yaml
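
The settings from the original screenshot are not available, so the following is only a sketch of a single-node storm.yaml. The keys are standard Storm 0.9.x configuration keys, but every value is an assumption: 127.0.0.1 for a one-machine setup, /var/storm as the local directory, two worker slots, and ui.port 8772 to match the UI address used at the end of this post. STORM_YAML_DIR is a hypothetical variable that defaults to a scratch directory.

```shell
# STORM_YAML_DIR is a hypothetical variable; set it to /usr/local/storm/conf
# to write the real file.
STORM_YAML_DIR="${STORM_YAML_DIR:-$(mktemp -d)}"

cat > "$STORM_YAML_DIR/storm.yaml" <<'EOF'
# ZooKeeper and Nimbus both run on this machine in a single-node setup
storm.zookeeper.servers:
    - "127.0.0.1"
nimbus.host: "127.0.0.1"
# local working directory for the Storm daemons (assumed path)
storm.local.dir: "/var/storm"
# one worker slot per port
supervisor.slots.ports:
    - 6700
    - 6701
# UI port, chosen to match the address used later in this post
ui.port: 8772
EOF

cat "$STORM_YAML_DIR/storm.yaml"
```

The directory named in storm.local.dir has to exist and be writable by the user that runs the Storm daemons.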


step 6: Start the Storm nimbus, supervisor, and ui services:
              cd /usr/local/storm/bin/
              To start nimbus      : “storm nimbus”
              To start supervisor  : “storm supervisor”
              To start storm ui    : “storm ui”
NOTE: Before starting the Storm services, make sure that the ZooKeeper service is running.
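
Besides “zkServer.sh status”, one quick way to check that ZooKeeper is answering is its four-letter “ruok” command, to which a healthy server replies “imok”. The sketch below assumes nc (netcat) is installed and that ZooKeeper listens on 127.0.0.1:2181; the helper names zk_reply_ok and zk_is_running are hypothetical.

```shell
# Hypothetical helper: interpret the reply to ZooKeeper's "ruok" command.
zk_reply_ok() {
    [ "$1" = "imok" ]
}

# Hypothetical helper: ask ZooKeeper on host ($1) and port ($2) whether it
# is healthy. Assumes nc is installed; -w 2 gives up after two seconds.
zk_is_running() {
    zk_reply="$(echo ruok | nc -w 2 "$1" "$2" 2>/dev/null)"
    zk_reply_ok "$zk_reply"
}

if zk_is_running 127.0.0.1 2181; then
    echo "ZooKeeper is up; safe to start nimbus, supervisor and ui"
else
    echo "ZooKeeper is not answering on 127.0.0.1:2181; start it first"
fi
```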

step 7: Start the ZooKeeper service and the supervisor service on the supervisor nodes.
  • When all the Storm services are started, the Storm UI shows the connected supervisors and the nimbus uptime.
  • Open a web browser and enter the Nimbus or UI address, e.g. 127.0.0.1:8772, which opens the Storm UI.

