Classification(1)Find Phrases from String
1. Find Important Phrases in All the Content
Start my Local Zeppelin
> bin/zeppelin-daemon.sh start
My local Zeppelin connects to the YARN cluster running on my VirtualBox VMs, so I need to start VirtualBox and the machines ubuntu-master, ubuntu-dev1 and ubuntu-dev2 first.
How to Load a Jar
z.load("org.scalaz:scalaz-core_2.10:7.2.0-M2")
How to Connect to S3
val rdd = sc.textFile("s3n://sillycat/jobs.csv")
How to Add a Custom Jar to Zeppelin
In the file zeppelin-env.sh:
export ZEPPELIN_JAVA_OPTS="-Dspark.jars=/home/spark-seed-assembly-0.0.1.jar,/home/classifier-assembly-1.0.jar"
A README.md in This Format Helps a Lot
# Classification System #
### What is this repository for? ###
* NLP and classification
### How do I get set up? (TODO) ###
* Summary of set up
Special Characters in HTML
http://www.degraeve.com/reference/specialcharacters.php
Really Nice Code to Filter Characters
IncludetextMunging.scala
IncludeTextMungingSpec.scala
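The two files above are not reproduced here. To give a rough idea of what a character filter can do before phrase counting, here is a minimal sketch; the TextCleanup name and the "keep only letters and digits" rule are my assumptions, not what IncludetextMunging.scala actually does. Normalizing first also matters for the phrase code below, because title.split(" ") turns a double space into an empty word.

```scala
// Hypothetical cleanup helper, NOT the contents of IncludetextMunging.scala.
object TextCleanup {
  /** Lowercases the text, replaces every character that is not a letter or digit
    * with a space, and collapses runs of whitespace into a single space. */
  def normalize(text: String): String =
    text.toLowerCase
      .map(c => if (c.isLetterOrDigit) c else ' ') // punctuation and symbols become spaces
      .trim                                         // drop leading/trailing spaces
      .split("\\s+")                                // split on runs of whitespace
      .mkString(" ")                                // rejoin with exactly one space
}
```

For example, TextCleanup.normalize("Foo,  Bar!") should give "foo bar".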
Get Phrases from One String
// Job is the project's own model class (not shown here); only its title field is used.
/**
 * Counts phrases using a sliding window.
 *
 * Example:
 * In: getPhrasesInTitle(Job("foo foo foo foo foo foo", ""), 2)
 * Out: Map(foo foo -> 5)
 *
 * In: getPhrasesInTitle(Job("foo foo foo foo foo foo bar foo", ""), 2)
 * Out: Map(foo foo -> 5, foo bar -> 1, bar foo -> 1)
 */
def getPhrasesInTitle(job: Job, numWordsInPhrase: Int): Map[String, Int] = {
  // Start from a map holding the placeholder key "", which is removed again at the end
  val phrases = job.title.split(" ").sliding(numWordsInPhrase).foldLeft(Map("" -> 0)) {
    (phraseCounts: Map[String, Int], phrase: Array[String]) =>
      // sliding returns one shorter window when the title has fewer words than numWordsInPhrase
      if (phrase.length == numWordsInPhrase) {
        val str = phrase.mkString(" ")
        val count = phraseCounts.getOrElse(str, 0) + 1
        phraseCounts + (str -> count)
      } else {
        phraseCounts
      }
  }
  phrases - ""
}
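For comparison, the same counting can be written with groupBy instead of a fold. This is a minimal stdlib-only sketch, not the code from my project: the two-field Job case class is a hypothetical stand-in for the real model, and splitting on runs of whitespace (so a double space in a title does not produce an empty word) is my own addition.

```scala
// Hypothetical stand-in for the project's Job model; only `title` is used here.
case class Job(title: String, description: String)

object PhraseCounter {
  /** Counts each run of n consecutive words in the job title (groupBy variant). */
  def phraseCounts(job: Job, n: Int): Map[String, Int] =
    job.title
      .split("\\s+")         // split on runs of whitespace, so "foo  foo" gives no empty word
      .filter(_.nonEmpty)    // a leading space would still produce one empty word; drop it
      .sliding(n)
      .filter(_.length == n) // a title shorter than n words yields one short window; skip it
      .map(_.mkString(" "))
      .toList
      .groupBy(identity)     // phrase -> list of all its occurrences
      .map { case (phrase, occurrences) => phrase -> occurrences.size }
}
```

The fold in getPhrasesInTitle builds the counts in one pass; groupBy first collects every occurrence, which is simpler to read but keeps all phrases in memory before counting.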
One Map Operation: Remove a Key
scala> val m1 = Map( ""->0, "s1" ->1)
scala> val m2 = m1 - ""
m2: scala.collection.immutable.Map[String,Int] = Map(s1 -> 1)
scala> val m3 = m2 - "s1"
m3: scala.collection.immutable.Map[String,Int] = Map()
Merge Map
http://stackoverflow.com/questions/20047080/scala-merge-map
http://www.nimrodstech.com/scala-map-merge/
Then merge the maps with map1 |+| map2 from scalaz:
https://github.com/scalaz/scalaz
How to Add scalaz-core to Your Classpath
https://keramida.wordpress.com/2013/12/02/using-sbt-to-experiment-with-new-scala-libraries/
Directly on the Command Line
> wget http://central.maven.org/maven2/org/scalaz/scalaz-core_2.10/7.1.3/scalaz-core_2.10-7.1.3.jar
> scala -cp scalaz-core_2.10-7.1.3.jar
scala> import scalaz.Scalaz._
scala> val k1 = Map( "key"->1, "key22"->3)
k1: scala.collection.immutable.Map[String,Int] = Map(key -> 1, key22 -> 3)
scala> val k2 = Map( "key1"->11, "key122"->13)
k2: scala.collection.immutable.Map[String,Int] = Map(key1 -> 11, key122 -> 13)
scala> val k3 = k1 |+| k2
k3: scala.collection.immutable.Map[String,Int] = Map(key1 -> 11, key122 -> 13, key -> 1, key22 -> 3)
Or put all the jars in one directory and add the whole directory (quote the wildcard so the shell does not expand it):
> scala -cp "lib/*"
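The REPL session above only merges maps whose keys differ. As I understand it, when both maps contain the same key, scalaz's Map semigroup combines the two values with the value type's semigroup, which for Int means adding them; that is exactly what phrase counting needs. If you would rather not pull in scalaz, a minimal stdlib sketch of the same merge for Map[String, Int] could look like this (MapMerge and mergeCounts are hypothetical names):

```scala
object MapMerge {
  /** Merges two count maps; keys present in both maps get the sum of their values. */
  def mergeCounts(a: Map[String, Int], b: Map[String, Int]): Map[String, Int] =
    b.foldLeft(a) { case (acc, (key, count)) =>
      // add b's count on top of whatever a already has for this key (0 if absent)
      acc + (key -> (acc.getOrElse(key, 0) + count))
    }
}
```

With disjoint keys this gives the same result as the k1 |+| k2 example; with a shared key such as "foo foo", the counts are added.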
The Whole Flow of Phrase Finding will be:
item = "foo foo foo foo" -> Map("foo foo" -> 3)
items.map(item => getPhrasesInTitle(item, 2)).reduce(_ |+| _)
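The flow above can be sketched end to end with the standard library only. This is a sketch, not my project code: it works on plain title strings instead of Job, and a hand-written merge stands in for scalaz's |+| (PhraseFlow, countPhrases, merge and phrasesInAll are hypothetical names).

```scala
object PhraseFlow {
  /** Counts each run of n consecutive words in one title. */
  def countPhrases(title: String, n: Int): Map[String, Int] =
    title.split(" ").sliding(n)
      .filter(_.length == n) // skip the short window of titles with fewer than n words
      .foldLeft(Map.empty[String, Int]) { (counts, words) =>
        val phrase = words.mkString(" ")
        counts + (phrase -> (counts.getOrElse(phrase, 0) + 1))
      }

  /** Sums the counts of keys found in both maps (stdlib stand-in for scalaz's |+|). */
  def merge(a: Map[String, Int], b: Map[String, Int]): Map[String, Int] =
    b.foldLeft(a) { case (acc, (k, v)) => acc + (k -> (acc.getOrElse(k, 0) + v)) }

  /** Maps every title to its phrase counts, then combines all the maps into one. */
  def phrasesInAll(titles: Seq[String], n: Int): Map[String, Int] =
    titles.map(countPhrases(_, n)).foldLeft(Map.empty[String, Int])(merge)
}
```

Folding from Map.empty instead of calling reduce means an empty list of titles gives an empty map; reduce throws an UnsupportedOperationException on an empty collection.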
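Scala Skill Tips
1. How to use _
var className: ClassName = _
is similar to
var className: ClassName = null
The _ initializer assigns the type's default value: null for reference types, 0 for numeric types and false for Boolean. It is only allowed for fields of a class, not for local variables.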
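A tiny sketch of what = _ actually assigns for different field types (the Defaults class is hypothetical, and this is Scala 2 syntax; it would not compile for a local var inside a method):

```scala
// Hypothetical class used only to show the default values assigned by `= _`.
class Defaults {
  var name: String = _     // reference type: null
  var count: Int = _       // numeric type: 0
  var enabled: Boolean = _ // Boolean: false
}
```

So new Defaults().count is 0, not null.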
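2. foldLeft (alias /:), foldRight (alias :\) and fold (the symbolic aliases are deprecated since Scala 2.13)
val numbers = List(5,1,3,3)
numbers.fold(0) { (z, i) =>
  z + i
}
fold starts with the initial value 0 and adds the first element of the list (0 + 5 = 5); each following element is then added to that running result (5 + 1 = 6, 6 + 3 = 9, 9 + 3 = 12), so the final result is 12.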
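To see the running result that fold passes from step to step, scanLeft keeps every intermediate accumulator. A minimal sketch (FoldSteps is a hypothetical name):

```scala
object FoldSteps {
  val numbers = List(5, 1, 3, 3)
  // scanLeft records the initial value plus the accumulator after each element
  val steps: List[Int] = numbers.scanLeft(0)(_ + _) // List(0, 5, 6, 9, 12)
  // fold only returns the last of those values
  val total: Int = numbers.fold(0)(_ + _)           // 12
}
```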
Another Use Case
class Foo(val name: String, val age: Int, val sex: Symbol)
object Foo {
  def apply(name: String, age: Int, sex: Symbol) = new Foo(name, age, sex)
}
// Symbol literals such as 'male compile in Scala 2 (deprecated since 2.13, removed in Scala 3)
val fooList = Foo("Carl", 33, 'male) :: Foo("Kiko", 23, 'female) :: Nil
// Start from an empty List[String] and append one formatted line per Foo, left to right
val stringList = fooList.foldLeft(List[String]()) { (z, f) =>
  val title = f.sex match {
    case 'male => "Mr."
    case 'female => "Ms."
  }
  z :+ s"$title ${f.name}, ${f.age}"
} // stringList(0) == "Mr. Carl, 33"
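foldLeft processes the elements from left to right and foldRight from right to left. fold does not guarantee any order (this lets parallel collections split the work), so its operation should be associative and its initial value should be a neutral element, like 0 for addition.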
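The direction only becomes visible with an operation that is not associative, such as subtraction, or when building a list. A small sketch (FoldDirection is a hypothetical name); subtraction is also exactly the kind of operation that should not be passed to fold:

```scala
object FoldDirection {
  val numbers = List(5, 1, 3, 3)
  // foldLeft groups from the left: (((0 - 5) - 1) - 3) - 3
  val left: Int = numbers.foldLeft(0)((acc, n) => acc - n)  // -12
  // foldRight groups from the right: 5 - (1 - (3 - (3 - 0)))
  val right: Int = numbers.foldRight(0)((n, acc) => n - acc) // 4
  // Prepending while folding left reverses the list; folding right keeps the order
  val reversed: List[Int] = numbers.foldLeft(List.empty[Int])((acc, n) => n :: acc)
  val copied: List[Int] = numbers.foldRight(List.empty[Int])((n, acc) => n :: acc)
}
```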
3. Iterator.sliding
sliding[B >: A](size: Int, step: Int = 1): size is the length of each window, step is how many elements the window moves forward each time (1 by default).
scala> (1 to 5).iterator.sliding(3).toList
res0: List[Seq[Int]] = List(List(1, 2, 3), List(2, 3, 4), List(3, 4, 5))
scala> (1 to 5).iterator.sliding(4, 3).toList
res1: List[Seq[Int]] = List(List(1, 2, 3, 4), List(4, 5))
scala> (1 to 5).iterator.sliding(4, 3).withPartial(false).toList
res2: List[Seq[Int]] = List(List(1, 2, 3, 4))
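This partial-window behaviour is also why getPhrasesInTitle checks the window size: sliding on a collection shorter than the window still returns one short window. A small sketch (SlidingDemo is a hypothetical name):

```scala
object SlidingDemo {
  val words = Array("foo", "bar")
  // Two words with a window of 3: one short window is still returned
  val short: List[List[String]] = words.sliding(3).map(_.toList).toList
  // On an Iterator, withPartial(false) drops that incomplete window
  val none: List[List[String]] = words.iterator.sliding(3).withPartial(false).map(_.toList).toList
}
```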
References:
scala underscore
http://stackoverflow.com/questions/8000903/what-are-all-the-uses-of-an-underscore-in-scala
foldLeft
http://hongjiang.info/foldleft-and-foldright/
http://www.iteblog.com/archives/1228
sliding
http://daily-scala.blogspot.com/2009/11/iteratorsliding.html
http://hongjiang.info/scala-counting-reduplicated-character/