Prediction(4)Logistic Regression - Local Cluster Set Up

sillycat


1. Try to Set Up Hadoop
Download the right version
> wget http://apache.spinellicreations.com/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz
Unpack it into the working directory and soft link it, then check the version
> hadoop version
Hadoop 2.7.1
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 15ecc87ccf4a0228f35af08fc56de536e6ce657a
Compiled by jenkins on 2015-06-29T06:04Z
Compiled with protoc 2.5.0
From source with checksum fc0a1a23fc1868e4d5ee7fa2b28a58a
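
The unpack and soft link step is not spelled out above, so here is a minimal sketch of it. The function name install_tarball and the directory arguments are my own assumptions, not something from the Hadoop distribution; it only relies on tar and ln.

```shell
# Unpack a release archive into a tools directory and point a stable
# symlink at it. install_tarball is a hypothetical helper name.
# Usage: install_tarball <archive.tar.gz> <tools_dir> <link_path>
install_tarball() {
  local archive="$1" tools_dir="$2" link_path="$3"
  local top_dir
  mkdir -p "$tools_dir" || return 1
  # Assumes the first entry of the archive is its top-level directory,
  # which is the usual layout of Apache release tarballs.
  top_dir="$(tar -tzf "$archive" | awk -F/ 'NR == 1 { print $1 }')" || return 1
  [ -n "$top_dir" ] || return 1
  tar -xzf "$archive" -C "$tools_dir" || return 1
  # -sfn replaces an existing link instead of creating a link inside it.
  ln -sfn "$tools_dir/$top_dir" "$link_path"
}
```

For this setup the call might look like install_tarball hadoop-2.7.1.tar.gz /home/carl/tool /opt/hadoop, assuming those directories are writable (the link under /opt may need sudo).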

Set up the Cluster
> mkdir /opt/hadoop/temp

Configure core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://ubuntu-master:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/opt/hadoop/temp</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>*</value>
</property>
</configuration>

> mkdir /opt/hadoop/dfs
> mkdir /opt/hadoop/dfs/name

> mkdir /opt/hadoop/dfs/data

Configure hdfs-site.xml
<configuration>
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>ubuntu-master:9001</value>
</property>
<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/opt/hadoop/dfs/name</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/opt/hadoop/dfs/data</value>
</property>
<property>
    <name>dfs.replication</name>
    <value>2</value>
</property>
<property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
</property>
</configuration>

> mv mapred-site.xml.template mapred-site.xml

Configure mapred-site.xml
<configuration>
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>ubuntu-master:10020</value>
</property>
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>ubuntu-master:19888</value>
</property>
</configuration>

Configure the yarn-site.xml
<configuration>
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
    <name>yarn.resourcemanager.address</name>
    <value>ubuntu-master:8032</value>
</property>
<property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>ubuntu-master:8030</value>
</property>
<property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>ubuntu-master:8031</value>
</property>
<property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>ubuntu-master:8033</value>
</property>
<property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>ubuntu-master:8088</value>
</property>
</configuration>
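
With four XML files to edit by hand, a typo in a property name or value is easy to miss. The following is a small sketch for reading a value back out of one of the files. The helper name get_hadoop_prop is hypothetical, and it assumes <name> and <value> each sit on their own line, as they do in the files above; it is not a real XML parser.

```shell
# Print the <value> of a named property in a Hadoop *-site.xml file.
# get_hadoop_prop is a hypothetical helper; it assumes one <name> and one
# <value> element per line, as in the configuration files above.
# Usage: get_hadoop_prop <file> <property_name>
get_hadoop_prop() {
  local file="$1" name="$2"
  awk -v want="$name" '
    /<name>/ {
      n = $0
      sub(/^.*<name>/, "", n); sub(/<\/name>.*$/, "", n)
      found = (n == want)
      next
    }
    found && /<value>/ {
      v = $0
      sub(/^.*<value>/, "", v); sub(/<\/value>.*$/, "", v)
      print v
      exit
    }
  ' "$file"
}
```

For example, get_hadoop_prop /opt/hadoop/etc/hadoop/core-site.xml fs.defaultFS should print the hdfs:// URL configured above. Once the hadoop binaries are on the PATH, hdfs getconf -confKey fs.defaultFS reports what Hadoop itself resolves.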

Configure slaves
ubuntu-dev1
ubuntu-dev2
ubuntu-dev3

Prepare the 3 slave machines if needed.
> mkdir ~/.ssh

> vi ~/.ssh/authorized_keys

Paste the master's public key there; the content comes from cat ~/.ssh/id_rsa.pub on the master.

Then scp all the files to every slave machine.
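
Doing the key copy and the scp by hand for three machines gets repetitive. Here is a sketch of a loop over the slaves file. The names sync_to_slaves and run and the DRY_RUN switch are my own assumptions; ssh-copy-id is used as a shortcut for editing authorized_keys by hand. By default it only prints the commands, so it is safe to try first.

```shell
# Print (or run) the commands that push the master's public key and the
# Hadoop directory to every host in the slaves file.
# sync_to_slaves, run and DRY_RUN are hypothetical names for this sketch.
# Usage: sync_to_slaves <slaves_file> <hadoop_dir>
sync_to_slaves() {
  local slaves_file="$1" hadoop_dir="$2" host
  # Skip comment lines and blank lines; host names contain no spaces.
  for host in $(grep -v -e '^#' -e '^$' "$slaves_file"); do
    run ssh-copy-id "$host"
    # Copy into the parent directory so the tree lands at the same path.
    run scp -r "$hadoop_dir" "$host:$(dirname "$hadoop_dir")/"
  done
}

# Echo the command when DRY_RUN is 1 (the default), execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

Run sync_to_slaves /opt/hadoop/etc/hadoop/slaves /opt/hadoop to preview the commands, then prefix the call with DRY_RUN=0 to execute them. This assumes the same user and the same directory layout on every machine.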

The same commands as in the earlier set ups start Hadoop HDFS and YARN
cd /opt/hadoop
sbin/start-dfs.sh
sbin/start-yarn.sh

Visit the pages
http://ubuntu-master:50070/dfshealth.html#tab-overview
http://ubuntu-master:8088/cluster

Error Message:
> sbin/start-dfs.sh
Starting namenodes on [ubuntu-master]
ubuntu-master: Error: JAVA_HOME is not set and could not be found.
ubuntu-dev1: Error: JAVA_HOME is not set and could not be found.
ubuntu-dev2: Error: JAVA_HOME is not set and could not be found.

Solution:
> vi hadoop-env.sh

export JAVA_HOME="/usr/lib/jvm/java-8-oracle"
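
The error shows up on every node, so the same line has to end up in hadoop-env.sh on the master and on each slave. Below is a sketch that makes the edit repeatable instead of using vi. The function name set_java_home is hypothetical; it replaces an existing uncommented export of JAVA_HOME or appends one if there is none.

```shell
# Set JAVA_HOME in a hadoop-env.sh style file: drop any existing
# uncommented export of JAVA_HOME, then append the new one.
# set_java_home is a hypothetical helper name.
# Usage: set_java_home <env_file> <java_home>
set_java_home() {
  local env_file="$1" java_home="$2" tmp
  tmp="$(mktemp)" || return 1
  # grep exits 1 when every line was filtered out; only >1 is a real error.
  grep -v '^[[:space:]]*export[[:space:]]\{1,\}JAVA_HOME=' "$env_file" > "$tmp" \
    || [ $? -eq 1 ] || return 1
  printf 'export JAVA_HOME="%s"\n' "$java_home" >> "$tmp"
  # Write back through cat so the original file keeps its permissions.
  cat "$tmp" > "$env_file" && rm -f "$tmp"
}
```

For this cluster the call would be set_java_home /opt/hadoop/etc/hadoop/hadoop-env.sh /usr/lib/jvm/java-8-oracle, followed by copying the file to the slaves. Running it twice leaves a single export line.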

Error Message:
2015-09-30 19:39:49,482 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /opt/hadoop/dfs/name/in_use.lock acquired by nodename 3017@ubuntu-master
2015-09-30 19:39:49,487 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception loading fsimage
java.io.IOException: NameNode is not formatted.
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:225)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:975)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:681)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:584)

Solution:
> hdfs namenode -format
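
Formatting wipes the NameNode metadata, so it should only happen on a fresh directory. Here is a small guard, as a sketch. The function name namenode_needs_format is hypothetical; it relies on the fact that a formatted name directory contains current/VERSION.

```shell
# Succeed when the NameNode metadata directory has not been formatted yet.
# A formatted directory contains current/VERSION.
# namenode_needs_format is a hypothetical helper name.
# Usage: namenode_needs_format <dfs.namenode.name.dir path>
namenode_needs_format() {
  local name_dir="$1"
  [ ! -f "$name_dir/current/VERSION" ]
}
```

With the directory from hdfs-site.xml above, the usage would be: if namenode_needs_format /opt/hadoop/dfs/name; then hdfs namenode -format; fi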

Cool, everything is up and running for the YARN cluster.

2. Try to Set Up Spark 1.5.0
Fetch the latest Spark
> wget http://apache.mirrors.ionfish.org/spark/spark-1.5.0/spark-1.5.0-bin-hadoop2.6.tgz

Unpack it and place it in the right working directory.

3. Try to Set Up Zeppelin
Fetch the source code first.
> git clone https://github.com/apache/incubator-zeppelin.git

> npm install -g grunt-cli

> grunt --version
grunt-cli v0.1.13

> mvn clean package -Pspark-1.5 -Dspark.version=1.5.0 -Dhadoop.version=2.7.0 -Phadoop-2.6 -Pyarn -DskipTests

Exception:
[ERROR] Failed to execute goal com.github.eirslett:frontend-maven-plugin:0.0.23:grunt (grunt build) on project zeppelin-web: Failed to run task: 'grunt --no-color' failed. (error code 3) -> [Help 1]

INFO [launcher]: Trying to start PhantomJS again (1/2).
ERROR [launcher]: Cannot start PhantomJS

INFO [launcher]: Trying to start PhantomJS again (2/2).
ERROR [launcher]: Cannot start PhantomJS

ERROR [launcher]: PhantomJS failed 2 times (cannot start). Giving up.
Warning: Task "karma:unit" failed. Use --force to continue.

Solution:
> cd /home/carl/install/incubator-zeppelin/zeppelin-web

> mvn clean install

Building only this module gives more detailed exceptions. They show that PhantomJS is not installed.
Install PhantomJS
http://sillycat.iteye.com/blog/1874971

Build your own PhantomJS from source
http://phantomjs.org/build.html

Or find an older version from here
https://code.google.com/p/phantomjs/downloads/list

Download the right version
> wget https://phantomjs.googlecode.com/files/phantomjs-1.9.2-linux-x86_64.tar.bz2

> bzip2 -d phantomjs-1.9.2-linux-x86_64.tar.bz2

> tar -xvf phantomjs-1.9.2-linux-x86_64.tar

Move it to the proper directory, add it to the PATH, and verify the installation.
Error Message:
> phantomjs --version
phantomjs: error while loading shared libraries: libfontconfig.so.1: cannot open shared object file: No such file or directory

Solution:
> sudo apt-get install libfontconfig

It works.
> phantomjs --version
1.9.2

Build Success.

4. Configure Spark and Zeppelin
Set Up Zeppelin
> cp zeppelin-env.sh.template zeppelin-env.sh
> cp zeppelin-site.xml.template zeppelin-site.xml

> vi zeppelin-env.sh
export MASTER="yarn-client"
export HADOOP_CONF_DIR="/opt/hadoop/etc/hadoop/"

export SPARK_HOME="/opt/spark"
. ${SPARK_HOME}/conf/spark-env.sh
export ZEPPELIN_CLASSPATH="${SPARK_CLASSPATH}"

Set Up Spark
> cp spark-env.sh.template spark-env.sh
> vi spark-env.sh
export HADOOP_CONF_DIR="/opt/hadoop/etc/hadoop"
export SPARK_WORKER_MEMORY=768m
export SPARK_JAVA_OPTS="-Dbuild.env=lmm.sparkvm"
export USER=carl

Rebuild and set up Zeppelin.
> mvn clean package -Pspark-1.5 -Dspark.version=1.5.0 -Dhadoop.version=2.7.0 -Phadoop-2.6 -Pyarn -DskipTests -P build-distr

The final gz file will be here:
/home/carl/install/incubator-zeppelin-0.6.0/zeppelin-distribution/target

> mv zeppelin-0.6.0-incubating-SNAPSHOT /home/carl/tool/zeppelin-0.6.0

> sudo ln -s /opt/zeppelin-0.6.0 /opt/zeppelin

Start the Server
> bin/zeppelin-daemon.sh start

Visit the Zeppelin UI
http://ubuntu-master:8080/#/

Exception:
Found both spark.driver.extraJavaOptions and SPARK_JAVA_OPTS. Use only the former.

Solution:
Zeppelin Configuration
export ZEPPELIN_JAVA_OPTS="-Dspark.akka.frameSize=100 -Dspark.jars=/home/hadoop/spark-seed-assembly-0.0.1.jar"

Spark Configuration
export SPARK_DAEMON_JAVA_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70"
export SPARK_LOCAL_DIRS=/opt/spark

export SPARK_LOG_DIR=/var/log/apps
export SPARK_CLASSPATH="/opt/spark/conf:/home/hadoop/conf:/opt/spark/classpath/emr/*:/opt/spark/classpath/emrfs/*:/home/hadoop/share/hadoop/common/lib/*:/home/hadoop/share/hadoop/common/lib/hadoop-lzo.jar"
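
The exception means SPARK_JAVA_OPTS is still set somewhere while spark.driver.extraJavaOptions is set as well. Here is a sketch to find the leftover assignment in the env files before restarting. The function name find_spark_java_opts is hypothetical; it is only a grep over the files passed in.

```shell
# List "file:line:text" for every uncommented SPARK_JAVA_OPTS assignment
# in the given env files; prints nothing when there is none.
# find_spark_java_opts is a hypothetical helper name.
# Usage: find_spark_java_opts <env_file>...
find_spark_java_opts() {
  grep -n -H '^[[:space:]]*\(export[[:space:]]\{1,\}\)\{0,1\}SPARK_JAVA_OPTS=' "$@" || true
}
```

Assuming the layout used above, the call would be find_spark_java_opts /opt/spark/conf/spark-env.sh /opt/zeppelin/conf/zeppelin-env.sh. Any line it prints is a candidate to remove or to replace with the settings shown in the solution.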

References:
http://spark.apache.org/docs/latest/mllib-linear-methods.html#logistic-regression

zeppelin
http://sillycat.iteye.com/blog/2216604
http://sillycat.iteye.com/blog/2223622

https://github.com/apache/incubator-zeppelin

hadoop
http://sillycat.iteye.com/blog/2242559
http://sillycat.iteye.com/blog/2193762
http://sillycat.iteye.com/blog/2103457
http://sillycat.iteye.com/blog/2084169
http://sillycat.iteye.com/blog/2090186


2015-10-02 07:03