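I'm trying to find the proper way to parse a data stream with Perl. I've read many examples, docs and questions, but I can't find out how to basically cut a "packet" out of a data stream and process it. The situation is as follows:

- a data stream goes from some IP to some IP and port
- the stream contains some gibberish, followed by content between a start marker and an end marker, in which the data is semicolon-separated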
My attempt so far is to have a socket listen on the port and process the `$data` variable:
```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::INET;

# auto-flush STDOUT so the progress messages appear immediately
$| = 1;

# creating a listening socket
my $socket = IO::Socket::INET->new(
    LocalHost => '127.0.0.1',
    LocalPort => '7070',
    Proto     => 'tcp',
    Listen    => 5,
    Reuse     => 1
);
die "cannot create socket $!\n" unless $socket;
print "server waiting for client connection on port 7070\n";

while (1)
{
    # waiting for a new client connection
    my $client_socket = $socket->accept();

    # get information about a newly connected client
    my $client_address = $client_socket->peerhost();
    my $client_port    = $client_socket->peerport();
    print "connection from $client_address:$client_port\n";

    # read up to 1024 bytes from the connected client
    my $data = "";
    $client_socket->recv($data, 1024);
    print "received data: $data\n";
    my @data_array = split(/;/, $data);
    foreach (@data_array) {
        print "$_\n";
    }

    # write response data to the connected client
    $data = "ok";
    $client_socket->send($data);

    # notify client that response has been sent
    shutdown($client_socket, 1);
}
$socket->close();
```
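This works, but as far as I can tell it takes the whole stream, up to the maximum size, and only then processes it.

My question: how do I pick out the part I need (from start to end), process it, and then move on to the next one?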
I've never understood why people use `recv` to read from a stream socket.

Normally, a read loop looks like this:
```perl
# check_for_full_message_and_extract_it_from_buf() and process_msg()
# are placeholders for your protocol-specific code.
my $buf = '';
while (1) {
    my $rv = sysread($socket, $buf, 64*1024, length($buf));
    if (!defined($rv)) {
        die("Can't read from socket: $!\n");
    }
    if (!$rv) {
        die("Can't read from socket: Premature EOF\n") if length($buf);
        last;
    }
    while (defined(my $msg = check_for_full_message_and_extract_it_from_buf($buf))) {
        process_msg($msg);
    }
}
```
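(Keep in mind that `sysread` returns as soon as any data is available, even if that is less than was requested.)

For example, the inner loop for sentinel-terminated data looks like this: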
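To see the short-read behaviour without a network peer, the sketch below uses a pipe as a stand-in for the socket (the pipe is only a demonstration device, not part of the protocol): it asks `sysread` for 64 KiB while only 11 bytes are waiting, and the call returns right away with those 11 bytes.

```perl
use strict;
use warnings;

# A pipe stands in for the socket so this runs without a network peer.
pipe(my $reader, my $writer) or die "pipe: $!\n";
defined(syswrite($writer, "hello;world")) or die "syswrite: $!\n";

my $buf = '';
# Request up to 64 KiB, appending to whatever is already in $buf.
my $rv = sysread($reader, $buf, 64*1024, length($buf));
die "sysread: $!\n" unless defined $rv;

# sysread returned immediately with only the bytes that were available.
print "requested 65536, got $rv bytes: $buf\n";
```

This is why the read loop must keep `$buf` around and check it for complete messages after every read, rather than assuming one read equals one message.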
```perl
while ($buf =~ s/^(.*)\n//) {
    process_msg("$1");
}
```
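Keeping `$buf` between reads matters because a message may arrive split across several `sysread` calls. The sketch below simulates that by appending hypothetical chunks by hand (no socket involved); `process_msg` here only collects messages into an array as a stand-in for real handling.

```perl
use strict;
use warnings;

my @messages;
# Stand-in handler: just record each complete message.
sub process_msg { push @messages, $_[0] }

# Chunks as they might come back from successive sysread calls;
# the boundaries deliberately fall in the middle of messages.
my @chunks = ("a;b\nc", ";d\ne;f", "\ng");

my $buf = '';
for my $chunk (@chunks) {
    $buf .= $chunk;    # what sysread(..., length($buf)) would do
    # Extract every complete newline-terminated message.
    while ($buf =~ s/^(.*)\n//) {
        process_msg("$1");
    }
}

print "$_\n" for @messages;     # a;b, c;d, e;f (one per line)
print "left over: '$buf'\n";    # the incomplete "g" waits for more data
```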
Likewise, the inner loop for length-prefixed chunks looks like this:

```perl
while (1) {
    last if length($buf) < 4;
    my $len = unpack('N', $buf);
    last if length($buf) < 4+$len;
    substr($buf, 0, 4, '');
    my $msg = substr($buf, 0, $len, '');
    process_msg($msg);
}
```
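One way to exercise the length-prefixed loop is to build frames with `pack('N/a*', ...)`, which writes a 4-byte big-endian length followed by the string, the same layout the loop reads back with `unpack('N', ...)`. The frames below, including the truncated third one, are made-up test data.

```perl
use strict;
use warnings;

my @messages;
sub process_msg { push @messages, $_[0] }    # stand-in handler

# Two complete frames plus only the first 3 bytes of a third one.
my $third = pack('N/a*', "x;y;z");
my $buf = pack('N/a*', "a;b") . pack('N/a*', "c;d;e") . substr($third, 0, 3);

while (1) {
    last if length($buf) < 4;              # length prefix not complete yet
    my $len = unpack('N', $buf);
    last if length($buf) < 4 + $len;       # body not complete yet
    substr($buf, 0, 4, '');                # drop the prefix
    my $msg = substr($buf, 0, $len, '');   # remove and return the body
    process_msg($msg);
}

print "$_\n" for @messages;    # a;b then c;d;e
print length($buf), " bytes of the next frame are still buffered\n";
```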
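In your special case, you would remove from the start of `$buf` any data you want to ignore until you find the part you're interested in, and then start extracting that part. This is vague, but I only have a vague description of the protocol you're working with.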
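Applied to the start/end framing from the question, "discard until the start marker, then extract up to the end marker" could look roughly like this sketch. It assumes the markers are the literal strings `<START>` and `<END>` (the question does not pin them down; adjust `$START`/`$END`), and `extract_packet` is a hypothetical helper playing the role of `check_for_full_message_and_extract_it_from_buf`. Anything after an end marker stays in `$buf`, so a read that also contains the beginning of the next packet is handled on the next pass.

```perl
use strict;
use warnings;

# ASSUMPTION: literal marker strings; adjust to the real protocol.
my $START = '<START>';
my $END   = '<END>';

# Hypothetical helper: removes one complete START...END packet from the
# buffer (aliased via $_[0]) and returns its body, or returns undef if
# no complete packet is buffered yet. Junk before START is discarded.
sub extract_packet {
    my $s = index($_[0], $START);
    if ($s < 0) {
        # No start marker yet: keep only a tail that could be the
        # beginning of a marker split across two reads.
        substr($_[0], 0, length($_[0]) - (length($START) - 1), '')
            if length($_[0]) >= length($START);
        return;
    }
    substr($_[0], 0, $s, '');                  # drop the junk before START
    my $e = index($_[0], $END, length($START));
    return if $e < 0;                          # END not received yet
    my $packet = substr($_[0], 0, $e + length($END), '');
    return substr($packet, length($START), -length($END));
}

my @packets;
my $buf = '';
# Simulated reads: junk, a marker split across reads, and a read holding
# the end of one packet plus the start of the next.
for my $chunk ("noise<STA", "RT>1;2;", "3<END>junk<START>4;", "5<END>") {
    $buf .= $chunk;
    while (defined(my $packet = extract_packet($buf))) {
        push @packets, [ split /;/, $packet ];
    }
}
print join(',', @$_), "\n" for @packets;    # 1,2,3 then 4,5
```

In the server, the inner `while (defined(...))` loop would sit inside the `sysread` loop in place of `check_for_full_message_and_extract_it_from_buf`.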
I solved it by taking my original code and adding:
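```perl
if ($data =~ /<START>>/) {
    print "\nFound start\n";
    $message .= $data;
    while ($message !~ /END/) {
        # $message_length is assumed to be set elsewhere in the script
        $client_socket->recv($data, $message_length);
        last unless length $data;    # peer closed the connection: stop instead of looping forever
        $message .= $data;
        print "\nStill reading\n";
    }
    print "\nFound end\n"; # but may contain (part of) next START
}
```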
I still need to handle the case where a chunk I've read contains part of the next message, but I'll figure that out. Thanks for the help!