Perl Huge XML Solution(1)Split Files and Multiple Threads
1. Upgrade Perl
>sudo yum install cpan
>sudo cpan
cpan>install Bundle::CPAN
cpan>reload cpan
cpan>upgrade
The upgrade failed with this error message:
make NO isa perl
Solution:
>sudo yum install perl-Config*
The full upgrade of Perl still did not work, but I could install the modules I needed one by one:
cpan>install Time::Piece
cpan>install Path::Class
cpan>install autodie
cpan>install Thread::Queue
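After installing the modules one by one, it can help to confirm that Perl can actually load them before running the scripts. The following is a minimal sketch, not part of the original post; the helper name missing_modules is hypothetical, and the module list is simply the four modules installed above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the names of the modules that cannot be loaded.
sub missing_modules {
    my @names = @_;
    my @missing;
    for my $name (@names) {
        # Turn Foo::Bar into Foo/Bar.pm so that require gets a file
        # path and no string eval is needed.
        ( my $file = "$name.pm" ) =~ s{::}{/}g;
        push @missing, $name unless eval { require $file; 1 };
    }
    return @missing;
}

my @missing = missing_modules(qw(Time::Piece Path::Class autodie Thread::Queue));
print @missing ? "Missing: @missing\n" : "All modules are installed\n";
```

Any name reported as missing can then be installed from the cpan shell in the same way as the modules above.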
2. Split the File
split_hero.pl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Piece;
use Path::Class;
use autodie; # die if there is a problem reading or writing a file

my $OutputSize = 0;
my $OutputCount = 0;
my $MaxSize = 100_000_000; # close a chunk once it has grown past about 100 MB
# Input is data/728.xml; chunks go to data/728/, which must already exist.
my $HugeFileName = "data/728";

print localtime->strftime('%Y-%m-%d %X') . "\n";

my $out;
open(my $in, '<', $HugeFileName . '.xml');
while (<$in>) {
    if (!$out) {
        # Start a new chunk.
        $OutputCount++;
        $OutputSize = 0;
        open($out, '>', $HugeFileName . "/output$OutputCount.xml");
        # The first chunk already starts with the header of the original
        # file; every later chunk needs its own declaration and root tag.
        unless ($OutputCount == 1) {
            print $out qq{<?xml version='1.0' encoding='UTF-8'?>\n};
            print $out qq{<source>\n};
        }
    }
    print $out $_;
    $OutputSize += length($_);
    # Only cut after a complete <job> element, so no record is split.
    if (m|</job>|i) {
        if ($OutputSize > $MaxSize) {
            print $out "</source>\n";
            close($out);
            $out = undef;
        }
    }
}
close($in);
# The last chunk ends with the closing tag of the original file.
close($out) if $out;

# Write the list of chunk files for the next step. The list is opened in
# append mode, so remove an old file_list before running again.
my @files = glob($HugeFileName . "/*.xml");
my $dir = dir($HugeFileName);
my $list_file = $dir->file("file_list");
my $list_file_handle = $list_file->open('>>');
foreach my $file (@files) {
    $list_file_handle->print($file . "\n");
    print "$file\n";
}
$list_file_handle->close();

print localtime->strftime('%Y-%m-%d %X') . "\n";
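One way to sanity-check the split is to count the lines containing a closing </job> tag in the original file and in all the chunks together; since every input line is copied to exactly one chunk, the two numbers should match. This is only a sketch under that assumption: the helpers count_job_ends and count_job_ends_in_file are hypothetical names, and the paths are the ones used in split_hero.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count lines that contain a closing </job> tag, the same test
# split_hero.pl uses to decide where a chunk may end.
sub count_job_ends {
    my ($fh) = @_;
    my $count = 0;
    while ( my $line = <$fh> ) {
        $count++ if $line =~ m{</job>}i;
    }
    return $count;
}

# Hypothetical helper: open a file and count its records.
sub count_job_ends_in_file {
    my ($path) = @_;
    open( my $fh, '<', $path ) or die "cannot open $path: $!\n";
    my $count = count_job_ends($fh);
    close($fh);
    return $count;
}

# Compare the original with the chunks, but only when the script is run
# directly and the input file from the article is present.
if ( !caller && -e 'data/728.xml' ) {
    my $expected = count_job_ends_in_file('data/728.xml');
    my $actual   = 0;
    $actual += count_job_ends_in_file($_) for glob('data/728/output*.xml');
    print "original: $expected, chunks: $actual\n";
    print $expected == $actual ? "OK\n" : "MISMATCH\n";
}
```

The check reads the whole input a second time, so on a file of several gigabytes it adds noticeably to the run time.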
3. Multiple Threads in Perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $nthreads = 5;
my $process_q = Thread::Queue->new();
my $failed_q = Thread::Queue->new();

# This subroutine runs as a thread. When a thread starts, it gets a copy
# of the program state as it is at that moment, so the declarations above
# all apply. Later changes to variables are thread-local unless the
# variable is declared 'shared'.
# Behind the scenes, a Thread::Queue is a shared array.
sub worker {
    # dequeue() blocks until an item is available. Once the queue has been
    # closed with $process_q->end() and is empty, it returns undef, so the
    # loop finishes and the thread exits cleanly. If the queue is never
    # ended, the thread waits forever.
    while ( defined( my $server = $process_q->dequeue() ) ) {
        chomp($server);
        print threads->self()->tid() . ": pinging $server\n";
        my $result = `/sbin/ping -c 1 $server`;
        if ($?) { $failed_q->enqueue($server) }
        print $result;
    }
}

# Insert the tasks into the queue, one per line of server_list.
open( my $input_fh, "<", "server_list" ) or die $!;
$process_q->enqueue(<$input_fh>);
close($input_fh);

# End the queue: no more items may be inserted, and dequeue() returns
# undef once the queue is empty, which lets the worker loops exit.
$process_q->end();

# Start the worker threads.
for ( 1 .. $nthreads ) {
    threads->create( \&worker );
}

# Wait for all threads to finish.
foreach my $thr ( threads->list() ) {
    $thr->join();
}

# Collate the results (a 'synchronise' step).
while ( defined( my $server = $failed_q->dequeue_nb() ) ) {
    print "$server failed to ping\n";
}
I changed the worker a little so that it calls PHP instead of ping:
my $result = `php src/import.php 728 $server`;
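The backtick call hands the whole command line to a shell, so a value containing spaces or shell metacharacters would be misread. A possible alternative, sketched here as an assumption and not as what the original script does, is the list form of a pipe open, which passes each argument to the program unchanged. The helper name run_and_capture is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run a command without going through a shell and return its exit code
# together with everything it wrote to standard output.
sub run_and_capture {
    my @cmd = @_;
    open( my $fh, '-|', @cmd ) or die "cannot run $cmd[0]: $!\n";
    my $output = do { local $/; <$fh> };    # slurp all output
    close($fh);                             # sets $? to the child's wait status
    return ( $? >> 8, defined $output ? $output : '' );
}
```

Inside the worker, the call could then look like `my ( $code, $result ) = run_and_capture( 'php', 'src/import.php', '728', $server );`, with the item enqueued on $failed_q when $code is non-zero.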
4. Test Result
Splitting the huge XML file (4.5G) on a machine with a 2-core CPU and 4G of memory took 00:02:05
Start: 04:17:24
End: 04:19:29
Sending the data to Redis/SQS on a machine with a 2-core CPU and 4G of memory took 00:03:12
Start: 04:23:46
End: 04:26:58
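The durations follow directly from the two pairs of timestamps. The short sketch below redoes the arithmetic; the helper names are hypothetical, it assumes each pair of timestamps falls on the same day, and it assumes "4.5G" means 4500 MB when estimating the throughput of the split.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert an HH:MM:SS string to seconds since midnight.
sub to_seconds {
    my ( $h, $m, $s ) = split /:/, shift;
    return $h * 3600 + $m * 60 + $s;
}

# Seconds between two timestamps on the same day.
sub elapsed_seconds {
    my ( $start, $end ) = @_;
    return to_seconds($end) - to_seconds($start);
}

my $split = elapsed_seconds( '04:17:24', '04:19:29' );    # 125 s = 00:02:05
my $send  = elapsed_seconds( '04:23:46', '04:26:58' );    # 192 s = 00:03:12

printf "split: %d s (%.1f MB/s)\n", $split, 4500 / $split;
printf "send:  %d s\n", $send;
```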
References:
http://sillycat.iteye.com/blog/1017590 file handler
http://sillycat.iteye.com/blog/2193773
Perl 1, 2, 3, 4, 6
http://sillycat.iteye.com/blog/1012882
http://sillycat.iteye.com/blog/1012923
http://sillycat.iteye.com/blog/1012940
http://sillycat.iteye.com/blog/1016428
http://sillycat.iteye.com/blog/1017632 string
http://sillycat.iteye.com/blog/1021197 web
http://sillycat.iteye.com/blog/1027282 queue client
http://sillycat.iteye.com/blog/1073593 browser info
Split XML File
http://stackoverflow.com/questions/11313852/split-one-file-into-multiple-files-based-on-delimiter
http://stackoverflow.com/questions/15503980/split-file-by-xml-tag
http://www.experts-exchange.com/Programming/Languages/Scripting/Perl/Q_24760607.html
https://metacpan.org/pod/XML::Twig#xml_split---cut-a-big-XML-file-into-smaller-chunks
http://code.izzid.com/2008/01/21/How-to-move-back-a-line-with-reading-a-perl-filehandle.html
Perl threads
http://stackoverflow.com/questions/26296206/perl-daemonize-with-child-daemons/26297240#26297240
http://stackoverflow.com/questions/6556976/how-to-use-perl-to-run-the-same-php-script-parallel
Perl Zip the File
http://perldoc.perl.org/IO/Compress/Zip.html