Apache Zeppelin(2)Zeppelin and Spark Yarn Cluster
Recently I have been trying to debug some issues on Zeppelin. When an error happens, we need to go to the log files for more information.
Check the log files under the Zeppelin installation directory, /opt/zeppelin/logs:
zeppelin-carl-carl-mac.local.log
zeppelin-interpreter-spark-carl-carl-mac.local.log
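For example, to follow the interpreter log while re-running a failing paragraph (a small convenience command; the file names follow the host-name pattern shown above):
> tail -f /opt/zeppelin/logs/zeppelin-interpreter-spark-*.log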
Error Message:
java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.deploy.SparkHadoopUtil$
at org.apache.spark.util.Utils$.getSparkOrYarnConfig(Utils.scala:1959)
at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:104)
at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:179)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:310)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:163)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:269)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:272)
Build Zeppelin again:
> mvn clean package -Pspark-1.4 -Dhadoop.version=2.6.0 -Phadoop-2.6 -Pyarn -DskipTests
Error Message:
ERROR [2015-06-30 17:04:43,588] ({Thread-43} JobProgressPoller.java[run]:57) - Can not get or update progress
org.apache.zeppelin.interpreter.InterpreterException: java.lang.IllegalStateException: Pool not open
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getProgress(RemoteInterpreter.java:286)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getProgress(LazyOpenInterpreter.java:110)
at org.apache.zeppelin.notebook.Paragraph.progress(Paragraph.java:179)
at org.apache.zeppelin.scheduler.JobProgressPoller.run(JobProgressPoller.java:54)
Caused by: java.lang.IllegalStateException: Pool not open
at org.apache.commons.pool2.impl.BaseGenericObjectPool.assertOpen(BaseGenericObjectPool.java:662)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:412)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.getClient(RemoteInterpreterProcess.java:139)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getProgress(RemoteInterpreter.java:284)
Error Message:
ERROR [2015-06-30 17:18:05,297] ({sparkDriver-akka.actor.default-dispatcher-4} Logging.scala[logError]:75) - Lost executor 13 on ubuntu-dev1: remote Rpc client disassociated
INFO [2015-06-30 17:18:05,297] ({sparkDriver-akka.actor.default-dispatcher-4} Logging.scala[logInfo]:59) - Re-queueing tasks for 13 from TaskSet 3.0
WARN [2015-06-30 17:18:05,298] ({sparkDriver-akka.actor.default-dispatcher-4} Logging.scala[logWarning]:71) - Lost task 0.3 in stage 3.0 (TID 14, ubuntu-dev1): ExecutorLostFailure (executor 13 lost)
ERROR [2015-06-30 17:18:05,298] ({sparkDriver-akka.actor.default-dispatcher-4} Logging.scala[logError]:75) - Task 0 in stage 3.0 failed 4 times; aborting job
Solution:
After I fetched the latest Zeppelin from GitHub and built it myself again, everything works.
The configuration is as follows:
> less conf/zeppelin-env.sh
export MASTER="yarn-client"
export HADOOP_CONF_DIR="/opt/hadoop/etc/hadoop/"
export SPARK_HOME="/opt/spark"
. ${SPARK_HOME}/conf/spark-env.sh
export ZEPPELIN_CLASSPATH="${SPARK_CLASSPATH}"
Start the YARN cluster, then start Zeppelin with this command:
> bin/zeppelin-daemon.sh start
Check the YARN cluster:
http://ubuntu-master:8088/cluster/apps
Visit the Zeppelin UI:
http://ubuntu-master:8080/
Check the Interpreter settings to make sure we are using yarn-client mode and that the other settings look correct.
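A quick way to double-check the mode from the notebook itself (a minimal sketch, assuming the default Spark interpreter is bound to the note) is to run a paragraph like this:
// Should print yarn-client when the interpreter is configured as described above
println(sc.master)
println(sc.version)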
Then place this simple example in a notebook paragraph:
// Keep only the products that match the threshold and count them
val threshold = "book1"
val products = Seq("book1", "book2", "book3", "book4")
val rdd = sc.makeRDD(products, 2)
val result = rdd.filter { p =>
  p.equals(threshold)
}.count()
println("!!!!!!!!!!!!!!================result=" + result)
Run that simple example. Zeppelin will start something like a spark-shell context on the YARN cluster, and that application will keep running. After that, we can visit the Spark application UI at this URL:
http://ubuntu-master:4040/
We can see all the Spark jobs and executors there.
If you plan to try a more complex example like the one below, you need to open the interpreter settings and increase the memory.
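One common way to raise the memory in this generation of Zeppelin (a hedged sketch; the exact knobs exposed in the interpreter UI can differ between versions) is to pass standard Spark properties through conf/zeppelin-env.sh and then restart the daemon:
export ZEPPELIN_JAVA_OPTS="-Dspark.executor.memory=2g -Dspark.driver.memory=2g"
> bin/zeppelin-daemon.sh restart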
import org.apache.spark.SparkContext
import org.apache.spark.mllib.classification.{SVMModel, SVMWithSGD}
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.rdd.RDD
val data:RDD[LabeledPoint] = MLUtils.loadLibSVMFile(sc, "file:///opt/spark/data/mllib/sample_libsvm_data.txt")
// Split data into training (60%) and test (40%).
val splits:Array[RDD[LabeledPoint]] = data.randomSplit(Array(0.6, 0.4), seed = 11L)
val training:RDD[LabeledPoint] = splits(0).cache()
val test:RDD[LabeledPoint] = splits(1)
// Run training algorithm to build the model
val numIterations = 100
val model = SVMWithSGD.train(training, numIterations)
// Clear the default threshold.
model.clearThreshold()
// Compute raw scores on the test set.
val scoreAndLabels:RDD[(Double,Double)] = test.map { point =>
val score = model.predict(point.features)
(score, point.label)
}
scoreAndLabels.take(10).foreach { case (score, label) =>
println("Score = " + score + " Label = " + label);
}
// Get evaluation metrics.
val metrics = new BinaryClassificationMetrics(scoreAndLabels)
val auROC = metrics.areaUnderROC()
println("Area under ROC = " + auROC)
Reference:
http://sillycat.iteye.com/blog/2216604