TextExtract(2)NLP Basic
1. Basic Introduction
NLP - Natural Language Processing
Typical preprocessing steps: remove noise, strip the HTML tags, remove the stop words, and stem the remaining words.
OpenNLP
OpenNLP includes a sentence detector, a tokenizer, a parts-of-speech (POS) tagger (verbs, nouns, etc.), a treebank chunker, and a treebank parser.
Sentence Detector - splits the text into sentences
Tokenizer - usually one word is one token, but sometimes one word becomes two tokens. For example, "don't" becomes "do" and "n't".
POS Tagger - labels the tokens with part-of-speech tags (verb, adverb, personal pronoun, etc.)
Treebank Chunker - groups tokens into phrases, such as verb phrases and noun phrases
Treebank Parser - builds a full syntactic parse tree for each sentence
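The preprocessing steps mentioned above (strip HTML, drop stop words) can be sketched in plain Java. This is only an illustration of the idea, not OpenNLP API; the regex and the tiny stop-word list are placeholder assumptions (real pipelines use an HTML parser and a full stop-word list):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PreprocessSketch {
    // Placeholder stop-word list; a real pipeline loads a full list from a file.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("the", "a", "is", "in"));

    // Strip HTML tags with a simple regex (fine for a sketch; use a real parser in production).
    public static String removeHtml(String text) {
        return text.replaceAll("<[^>]+>", " ").replaceAll("\\s+", " ").trim();
    }

    // Drop stop words from a whitespace-tokenized, lowercased string.
    public static List<String> removeStopWords(String text) {
        List<String> kept = new ArrayList<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!STOP_WORDS.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String clean = removeHtml("<p>The parser is in the <b>house</b></p>");
        System.out.println(clean);                  // The parser is in the house
        System.out.println(removeStopWords(clean)); // [parser, house]
    }
}
```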
2. Basic Code Example
Download the file apache-opennlp-1.6.0-bin.tar.gz and place it in the working directory.
> opennlp
OpenNLP 1.6.0. Usage: opennlp TOOL
Pattern
>opennlp ToolName lang-model-name.bin
>opennlp ToolName lang-model-name.bin < input.txt > output.txt
General Pattern
Build the model from an xxx.bin file, build the tool on top of the model, execute the task on the tool, and get back an array of strings.
http://opennlp.apache.org/download.html
The place where we can download the models: http://opennlp.sourceforge.net/models-1.5/
The pattern is as follows, but I did not see any real examples there.
package com.sillycat.resumeparse;
import java.io.IOException;
import java.io.InputStream;
import opennlp.tools.cmdline.parser.ParserTool;
import opennlp.tools.parser.Parse;
import opennlp.tools.parser.Parser;
import opennlp.tools.parser.ParserFactory;
import opennlp.tools.parser.ParserModel;
public class OpenNLPMain {
    public static void main(String[] args) {
        InputStream modelIn = OpenNLPMain.class.getClassLoader()
                .getResourceAsStream("models/en-parser-chunking.bin");
        ParserModel model = null;
        try {
            model = new ParserModel(modelIn);
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (modelIn != null) {
                try {
                    modelIn.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
        Parser parser = ParserFactory.create(model);
        String sentence = "I am carl. I worked in US for about 3 years. Before that I was working in China for 8 years.";
        Parse[] topParses = ParserTool.parseLine(sentence, parser, 1);
        for (int i = 0; i < topParses.length; i++) {
            System.out.println(i + " " + topParses[i]);
        }
    }
}
The latest dependency is listed below, but I am using the version embedded in TIKA, so it is still 1.5.3.
https://opennlp.apache.org/maven-dependency.html
Tokenizer Example
package com.sillycat.resumeparse;
import java.io.IOException;
import java.io.InputStream;
import opennlp.tools.tokenize.Tokenizer;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;
public class OpenNLPTokenizerMain {
    static final String SAMPLE_STR = "I am Carl. I am a software engineer. Totally I worked 12 years. About 9 years in China, 3 years in US.";
    public static void main(String[] args) {
        // Load the model from this class's own classpath.
        InputStream modelIn = OpenNLPTokenizerMain.class.getClassLoader()
                .getResourceAsStream("models/en-token.bin");
        TokenizerModel model = null;
        try {
            model = new TokenizerModel(modelIn);
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (modelIn != null) {
                try {
                    modelIn.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
        Tokenizer tokenizer = new TokenizerME(model);
        String[] tokens = tokenizer.tokenize(SAMPLE_STR);
        for (int i = 0; i < tokens.length; i++) {
            System.out.println(i + " " + tokens[i]);
        }
    }
}
3. Some Useful NLP Tools and Models
Sentences
package com.sillycat.resumeparse;
import java.io.IOException;
import java.io.InputStream;
import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;
import opennlp.tools.util.Span;
public class OpenNLPSentenceMain {
    static final String SAMPLE_STR = "Carl is a Chinese. He worked in China for 9 years. Then he relocated to Austin, Texas, USA. And he spends 3 years there till now.";
    public static void main(String[] args) {
        InputStream modelIn = OpenNLPSentenceMain.class.getClassLoader()
                .getResourceAsStream("models/en-sent.bin");
        SentenceModel model = null;
        try {
            model = new SentenceModel(modelIn);
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (modelIn != null) {
                try {
                    modelIn.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
        SentenceDetectorME sentenceDetector = new SentenceDetectorME(model);
        // sentPosDetect returns the character spans of the sentences.
        Span[] spans = sentenceDetector.sentPosDetect(SAMPLE_STR);
        double[] sentenceProbabilities = sentenceDetector.getSentenceProbabilities();
        for (int i = 0; i < spans.length; i++) {
            int start = spans[i].getStart();
            int end = spans[i].getEnd();
            String value = SAMPLE_STR.substring(start, end);
            System.out.println(i + " probability: " + sentenceProbabilities[i] + " string: " + value);
        }
    }
}
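For contrast with the trained en-sent.bin model, a naive rule-based splitter might look like the sketch below (my own illustration, not part of OpenNLP). It simply cuts after sentence-final punctuation, so it mis-handles abbreviations like "U.S. citizens", which is exactly why a statistical detector is preferred:

```java
import java.util.ArrayList;
import java.util.List;

public class NaiveSentenceSplitter {
    // Split after sentence-final punctuation followed by whitespace.
    // This naive rule breaks on abbreviations such as "U.S. citizens";
    // the trained en-sent.bin model handles those cases better.
    public static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        for (String s : text.split("(?<=[.!?])\\s+")) {
            if (!s.isEmpty()) {
                sentences.add(s);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        List<String> result = split("Carl is a Chinese. He worked in China for 9 years.");
        System.out.println(result.size());  // 2
        System.out.println(result.get(0));  // Carl is a Chinese.
    }
}
```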
Tokenizer
package com.sillycat.resumeparse;
import java.io.IOException;
import java.io.InputStream;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;
import opennlp.tools.util.Span;
public class OpenNLPTokenizerMain {
    static final String SAMPLE_STR = "I am Carl. I am a software engineer. Totally I worked 12 years. About 9 years in China, 3 years in US.";
    public static void main(String[] args) {
        InputStream modelIn = OpenNLPTokenizerMain.class.getClassLoader()
                .getResourceAsStream("models/en-token.bin");
        TokenizerModel model = null;
        try {
            model = new TokenizerModel(modelIn);
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (modelIn != null) {
                try {
                    modelIn.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
        TokenizerME tokenizer = new TokenizerME(model);
        // tokenizePos returns character spans instead of the token strings.
        Span[] spans = tokenizer.tokenizePos(SAMPLE_STR);
        double[] tokenProbabilities = tokenizer.getTokenProbabilities();
        for (int i = 0; i < spans.length; i++) {
            int start = spans[i].getStart();
            int end = spans[i].getEnd();
            String value = SAMPLE_STR.substring(start, end);
            System.out.println(i + " probability: " + tokenProbabilities[i] + " string: " + value);
        }
    }
}
POS
Here is the list of links:
http://cs.nyu.edu/grishman/jet/guide/PennPOS.html
The brackets ( ) [ ] { }
become, in parsed files: -LRB- -RRB- -LSB- -RSB- -LCB- -RCB-
(The acronyms stand for (Left|Right) (Round|Square|Curly) Bracket.)
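Those substitutions can be applied token by token before handing text to a parser. A minimal sketch (the mapping is the standard Penn Treebank one; the class and method names are my own, not OpenNLP API):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BracketNormalizer {
    // Penn Treebank bracket tokens: (Left|Right)(Round|Square|Curly)Bracket.
    static final Map<String, String> BRACKETS = new LinkedHashMap<>();
    static {
        BRACKETS.put("(", "-LRB-");
        BRACKETS.put(")", "-RRB-");
        BRACKETS.put("[", "-LSB-");
        BRACKETS.put("]", "-RSB-");
        BRACKETS.put("{", "-LCB-");
        BRACKETS.put("}", "-RCB-");
    }

    // Replace a bracket token with its treebank form; leave other tokens unchanged.
    public static String normalize(String token) {
        return BRACKETS.getOrDefault(token, token);
    }

    public static void main(String[] args) {
        System.out.println(normalize("("));    // -LRB-
        System.out.println(normalize("]"));    // -RSB-
        System.out.println(normalize("word")); // word
    }
}
```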
Here is the code:
package com.sillycat.resumeparse;
import java.io.IOException;
import java.io.InputStream;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;
public class OpenNLPPOSMain {
    public static void main(String[] args) {
        String[] data = new String[] { "Carl", "engineer", "am", "a", "totally", "worked" };
        InputStream modelIn = OpenNLPPOSMain.class.getClassLoader()
                .getResourceAsStream("models/en-pos-maxent.bin");
        POSModel model = null;
        try {
            model = new POSModel(modelIn);
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (modelIn != null) {
                try {
                    modelIn.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
        POSTaggerME posTagger = new POSTaggerME(model);
        String[] tags = posTagger.tag(data);
        double[] probs = posTagger.probs();
        for (int i = 0; i < tags.length; i++) {
            System.out.println(data[i] + " " + probs[i] + " " + tags[i]);
        }
    }
}
Chunk
package com.sillycat.resumeparse;
import java.io.IOException;
import java.io.InputStream;
import opennlp.tools.chunker.ChunkerME;
import opennlp.tools.chunker.ChunkerModel;
import opennlp.tools.util.Span;
public class OpenNLPChunkMain {
    public static void main(String[] args) {
        InputStream modelIn = OpenNLPChunkMain.class.getClassLoader()
                .getResourceAsStream("models/en-chunker.bin");
        ChunkerModel model = null;
        try {
            model = new ChunkerModel(modelIn);
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (modelIn != null) {
                try {
                    modelIn.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
        // POS tagger output for the sample sentence (token, probability, tag):
        // I 0.9732879282256719 PRP
        // am 0.964606681960317 VBP
        // Carl 0.9816758912754017 NNP
        // . 0.3823051156140692 .
        // I 0.95524464076097 PRP
        // am 0.9801383116579873 VBP
        // a 0.9863774195781929 DT
        // software 0.9071380751356256 NN
        // engineer 0.9836540552245981 NN
        // . 0.985789375461335 .
        String[] data = new String[] { "I", "am", "Carl", ".",
                "I", "am", "a", "software", "engineer", "." };
        String[] tags = new String[] { "PRP", "VBP", "NNP", ".", "PRP", "VBP", "DT", "NN", "NN", "." };
        ChunkerME chunker = new ChunkerME(model);
        Span[] spans = chunker.chunkAsSpans(data, tags);
        double[] probs = chunker.probs();
        for (int i = 0; i < spans.length; i++) {
            int start = spans[i].getStart();
            int end = spans[i].getEnd();
            StringBuilder buffer = new StringBuilder();
            for (int j = start; j < end; j++) {
                buffer.append(data[j]);
                if (j != (end - 1)) {
                    buffer.append(' ');
                }
            }
            System.out.println(probs[i] + " " + buffer.toString());
        }
    }
}
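Under the hood, chunkAsSpans derives the spans from per-token chunk tags in BIO notation (B- begins a chunk, I- continues it, O is outside). A rough sketch of that conversion, assuming well-formed tags; this is illustrative, not the actual OpenNLP implementation:

```java
import java.util.ArrayList;
import java.util.List;

public class BioToSpans {
    // Each span is [start, end) over the token array, like opennlp.tools.util.Span.
    public static List<int[]> toSpans(String[] bioTags) {
        List<int[]> spans = new ArrayList<>();
        int start = -1;
        for (int i = 0; i < bioTags.length; i++) {
            boolean inside = bioTags[i].startsWith("I-");
            if (!inside && start >= 0) {       // close the currently open chunk
                spans.add(new int[] { start, i });
                start = -1;
            }
            if (bioTags[i].startsWith("B-")) { // open a new chunk
                start = i;
            }
        }
        if (start >= 0) {                      // close a chunk that runs to the end
            spans.add(new int[] { start, bioTags.length });
        }
        return spans;
    }

    public static void main(String[] args) {
        // "a software engineer" forms one noun-phrase chunk: B-NP I-NP I-NP.
        String[] tags = { "B-NP", "B-VP", "B-NP", "I-NP", "I-NP", "O" };
        for (int[] s : toSpans(tags)) {
            System.out.println(s[0] + ".." + s[1]); // prints 0..1, 1..2, 2..5
        }
    }
}
```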
The result is amazing:
0.9818474273481409 I
0.9839139471783958 am
0.9503687937291497 Carl
0.6471572589002946 I
0.6740306961591902 am
0.9328973760592183 a software engineer
References:
http://sillycat.iteye.com/blog/2231432
http://danielmclaren.com/node/49
http://blog.csdn.net/robinliu2010/article/details/7624863
https://remonstrate.wordpress.com/2011/08/27/opennlp-%E5%88%9D%E6%AD%A5/
http://fuhao-987.iteye.com/blog/891697
https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html
http://blog.dpdearing.com/2011/12/opennlp-part-of-speech-pos-tags-penn-english-treebank/
Chinese:
http://blog.csdn.net/robinliu2010/article/details/7627095