DevOps(5)Spark Deployment on VM
1. Old Environment
1.1 JDK
java version "1.6.0_45"
Switch the Java version on the Ubuntu system.
>sudo update-alternatives --config java
Set up JAVA_HOME on Ubuntu
>vi ~/.profile
export JAVA_HOME="/usr/lib/jvm/java-6-oracle"
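Then reload the profile and verify the active version (a quick sanity check):
>source ~/.profile
>java -version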
Java Compile Version Problem
[warn] Error reading API from class file : java.lang.UnsupportedClassVersionError: com/digby/localpoint/auth/util/Base64$OutputStream : Unsupported major.minor version 51.0
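This error means the class was compiled for Java 7 (class file major version 51) while the runtime is still Java 6. If in doubt, the target version of a class file can be checked with javap; the classpath below is a placeholder and the class name is just the one from the error message:
>javap -verbose -classpath YOUR_CLASSES_OR_JAR com.digby.localpoint.auth.util.Base64 | grep major
which should report something like "major version: 51". Switching both java and javac to a matching version with update-alternatives fixes it: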
>sudo update-alternatives --config java
>sudo update-alternatives --config javac
1.2 Cassandra
Cassandra version 1.2.13.
> sudo mkdir -p /var/log/cassandra
> sudo chown -R carl /var/log/cassandra
carl is my username
> sudo mkdir -p /var/lib/cassandra
> sudo chown -R carl /var/lib/cassandra
Change the config if needed, then start Cassandra in single-node mode
> cassandra -f conf/cassandra.yaml
Test it from a client
> cassandra-cli -host ubuntu-dev1 -port 9160
Set up multiple nodes with these changes in conf/cassandra.yaml:
listen_address: ubuntu-dev1
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "ubuntu-dev1,ubuntu-dev2"
Make these changes on both ubuntu-dev1 and ubuntu-dev2, setting listen_address to each node's own hostname.
Start the two nodes in the background
> nohup cassandra -f conf/cassandra.yaml &
Verify that the cluster is working
> nodetool -h ubuntu-dev1 ring
Datacenter: datacenter1
==========
Address         Rack   Status  State   Load       Owns     Token
                                                           7068820527558753619
10.190.191.195  rack1  Up      Normal  132.34 KB  36.12%   -4714763636920163240
10.190.190.190  rack1  Up      Normal  65.18 KB   63.88%   7068820527558753619
1.3 Spark
I am choosing this old version.
spark-0.9.0-incubating-bin-hadoop1.tgz
Extract the tarball and place it in the right location; the config below assumes /opt/spark.
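A minimal sketch of that step, assuming the tgz is in the current directory and that the extracted directory follows the usual naming:
>tar zxf spark-0.9.0-incubating-bin-hadoop1.tgz
>sudo mv spark-0.9.0-incubating-bin-hadoop1 /opt/spark
>sudo chown -R carl /opt/spark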
Set up passwordless SSH access from the master to the slaves.
On Master
> ssh-keygen -t rsa
> cat ~/.ssh/id_rsa.pub
On slave
> mkdir ~/.ssh
> vi ~/.ssh/authorized_keys
Paste the public key from id_rsa.pub into authorized_keys.
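To verify the setup, logging in from the master should not prompt for a password, for example:
>ssh ubuntu-dev2 hostname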
Configure Spark in /opt/spark/conf/spark-env.sh
SCALA_HOME=/opt/scala/scala-2.10.3
SPARK_WORKER_MEMORY=512m
#SPARK_CLASSPATH='/opt/localpoint-profiles-spark/*jar'
#SPARK_JAVA_OPTS="-Dbuild.env=lmm.sdprod"
USER=carl
List the worker hosts in /opt/spark/conf/slaves
ubuntu-dev1
ubuntu-dev2
Command to start the Spark Server
>sbin/start-all.sh
Command to run the job in single (local) mode
>java -Dbuild.env=sillycat.dev -cp /opt/YOUR_PROJECT/lib/*.jar com.sillycat.YOUR_CLASS
Or pointing it at the standalone master
>java -Dbuild.env=sillycat.dev -Dsparkcontext.Master="spark://YOURSERVER:7070" -cp /opt/YOUR_PROJECT/lib/*.jar com.sillycat.YOUR_CLASS
Visit the Spark master web UI to verify that the workers have registered.
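Assuming the default standalone master web UI port (8080), the URL would be something like:
http://ubuntu-dev1:8080/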
3. Prepare MySQL
>sudo apt-get install software-properties-common
>sudo add-apt-repository ppa:ondrej/mysql-5.6
>sudo apt-get update
>sudo apt-get install mysql-server
Commands to set up the privileges and password (run in the mysql client)
>use mysql;
>grant all privileges on test.* to root@"%" identified by 'kaishi';
>flush privileges;
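The grant above only covers the test schema; for the application database referenced later in the Tomcat config (lmm), a similar sketch would apply, assuming the same password:
>create database lmm;
>grant all privileges on lmm.* to root@"%" identified by 'kaishi';
>flush privileges;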
On the client machine, you may only need to install the MySQL client
>sudo apt-get install mysql-client-core-5.6
Change the bind address in /etc/mysql/my.cnf (sudo vi /etc/mysql/my.cnf) so that remote clients can connect. The default is
bind-address = 127.0.0.1
Set it to the server's own address (or 0.0.0.0) instead.
>sudo service mysql stop
>sudo service mysql start
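To verify remote access from a client machine (assuming the MySQL server runs on the master host):
>mysql -h ubuntu-dev1 -u root -p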
4. Install Grails
Download it from here; I am using an old version.
>wget
5. Install Tomcat on Master
>wget
Configure the database connection in TOMCAT_HOME/conf/context.xml
<Resource name="jdbc/lmm" auth="Container" type="javax.sql.DataSource"
    maxIdle="30" maxWait="-1" maxActive="100"
    factory="org.apache.tomcat.jdbc.pool.DataSourceFactory"
    testOnBorrow="true"
    validationQuery="select 1"
    logAbandoned="true"
    username="root"
    password="kaishi"
    driverClassName="com.mysql.jdbc.Driver"
    url="jdbc:mysql://localhost:3306/lmm?autoReconnect=true&amp;useServerPrepStmts=false&amp;rewriteBatchedStatements=true"/>
Download the right MySQL driver and place it in the Tomcat lib directory
> ls -l lib | grep mysql
-rw-r--r-- 1 carl carl 786484 Dec 10 09:30 mysql-connector-java-5.1.16.jar
Change the config to avoid OutOfMemoryError
> vi bin/catalina.sh
JAVA_OPTS="$JAVA_OPTS -Xms2048m -Xmx2048m -XX:PermSize=256m -XX:MaxPermSize=512m"
6. Running Assembly Jar File
Build the assembly jar and place it in the lib directory, then create a shell script in the bin directory
> cat bin/startup.sh
#!/bin/bash
java -Xms512m -Xmx1024m -Dbuild.env=lmm.sparkvm -Dspray.can.server.request-timeout=300s -Dspray.can.server.idle-timeout=360s -cp /opt/YOUR_MODULE/lib/*.jar com.sillycat.YOUR_PACKAGE.YOUR_MAIN_CLASS
Set up the Bouncy Castle jar. Place the provider jar in the JRE ext directory, then register the provider in java.security:
>cd /usr/lib/jvm/java-6-oracle/jre/lib/ext
Copy the Bouncy Castle provider jar into this directory.
>cd /usr/lib/jvm/java-6-oracle/jre/lib/security
>sudo vi java.security
security.provider.9=org.bouncycastle.jce.provider.BouncyCastleProvider
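To confirm the provider jar is in place (the exact file name depends on the version downloaded; bcprov-jdk16-*.jar is just an example):
>ls /usr/lib/jvm/java-6-oracle/jre/lib/ext | grep -i bcprov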
7. JCE Problem
Download jce_policy-6.zip from http://www.oracle.com/technetwork/java/javase/downloads/jce-6-download-429243.html
Unzip the file and place the policy jars into the jre/lib/security directory.
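After unzipping, both unlimited-strength policy jars should be present there:
>ls /usr/lib/jvm/java-6-oracle/jre/lib/security | grep _policy
local_policy.jar
US_export_policy.jar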
8. Command to Check data in cqlsh
Connect to Cassandra
> cqlsh localhost 9160
Check the keyspaces
cqlsh> select * from system.schema_keyspaces;
Check the version
cqlsh> show version
[cqlsh 3.1.8 | Cassandra 1.2.13 | CQL spec 3.0.0 | Thrift protocol 19.36.2]
Use the keyspace, which is similar to a database
cqlsh> use device_lookup;
Check the table
cqlsh:device_lookup> select count(*) from profile_devices limit 300000;
During testing, if you need to clear the data
delete from profile_devices where deviceid = 'ios1009528' and brandcode = 'spark' and profileid = 5;
delete from profile_devices where brandcode = 'spark' and profileid = 5;
Deployment Option One
1. Put a Kryo serializer registration class there.
package com.sillycat.easyspark.profile

import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator

import com.sillycat.easyspark.model.Attributes
import com.sillycat.easyspark.model.Profile

class ProfileKryoRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    kryo.register(classOf[Attributes])
    kryo.register(classOf[Profile])
  }
}
Change the configuration and start the SparkContext as follows:
val config = ConfigFactory.load()
val conf = new SparkConf()
conf.setMaster(config.getString("sparkcontext.Master"))
conf.setAppName("Profile Device Update")
conf.setSparkHome(config.getString("sparkcontext.Home"))
if (config.hasPath("jobJar")) {
  conf.setJars(List(config.getString("jobJar")))
} else {
  conf.setJars(SparkContext.jarOfClass(this.getClass).toSeq)
}
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
conf.set("spark.kryo.registrator", "com.sillycat.easyspark.profile.ProfileKryoRegistrator")
val sc = new SparkContext(conf)
It works.
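The code above reads sparkcontext.Master, sparkcontext.Home, and jobJar through Typesafe Config (ConfigFactory.load()). A minimal sketch of the matching application.conf, assuming the master URL from the earlier command and a placeholder jar path:
sparkcontext.Master="spark://ubuntu-dev1:7070"
sparkcontext.Home="/opt/spark"
jobJar="/opt/YOUR_MODULE/lib/YOUR_ASSEMBLY.jar"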
Tips
1. Command to Unzip the jar file
>jar xf jar-file