Perl: getting the same output from a regex as from a hash



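I would like to get the same output from a regex as I get from the hash below. I know my regex is ugly, but I am trying to improve it.

So the expected output of the regex is: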

20191122181858|0292929|BVFEFZZ9C4|DIZJ4431573.ts|http://pdzr.rt.pl:8889
20191122181907|9836847|40EVFVRFB|DIZJ4432595.ts|http://pdzr.rt.pl:8889

Here is the code:

#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);
my (%hash); # initialization
if (<DATA>) { # if DATA exists (note: this read also consumes the header line)
    print "here the regex values: \n";
    while (<DATA>) { # read the remaining DATA lines
        chomp $_; # removes the newline at the end of the line
        my @tab = split(/,/, $_); # split the line on commas
        my ($http, $ts, $macin, $caid) = (@tab[2, 3, 4, 5]);
        my $timestamp = strftime '%Y%m%d%H%M%S', localtime($ts/1000); # from Unix epoch time (ms) to human-readable date
        my @value = split(/\//, $http); # split the URL on slashes
        my ($url, $filename) = ("http://$value[2]", $value[6]); # base URL and name of the file
        if (! $hash{$timestamp."|".$caid."|".$macin."|".$filename."|".$url}) { # hash keyed on the whole record, to avoid duplicates
            $hash{$timestamp."|".$caid."|".$macin."|".$filename."|".$url} = $timestamp."|".$caid."|".$macin."|".$filename."|".$url;
        }
        my $regex = $_; # trying to get the same output with a regex
        $regex =~ s/(?:[^\/]*\/)([^\*]*\/)([^.*]*)([^,*]*)(,)([^,*]*)(,)(.*)(.*)/http:\/$1|$2|$3|$4|$5|$6|$7/;
        print $regex, "\n";
    }
}
if (%hash) { # checking that the hash exists and contains values
    print "\nhere the hash values: \n";
    foreach (sort keys %hash) {
        print $_, "\n";
    }
}
__DATA__
"@timestamp",url,ts,macin,caid
"Nov 22, 2019 @ 17:19:07.571","http://pdzr.rt.pl:8889/qsdf/ZDF/vsLop/DIZJ4432595.ts/9836847.3018322401",1574443147021,40EVFVRFB,9836847
"Nov 22, 2019 @ 17:18:59.264","http://pdzr.rt.pl:8889/qsdf/ZDF/vsLop/DIZJ4431573.ts/0292929.5002731501",1574443138223,BVFEFZZ9C4,0292929

Here is the output:

here the regex values:
http://pdzr.rt.pl:8889/qsdf/ZDF/vsLop/DIZJ4432595.ts/|9836847|.3018322401"|,|1574443147021|,|40EVFVRFB,9836847
http://pdzr.rt.pl:8889/qsdf/ZDF/vsLop/DIZJ4431573.ts/|0292929|.5002731501"|,|1574443138223|,|BVFEFZZ9C4,0292929
here the hash values:
20191122181858|0292929|BVFEFZZ9C4|DIZJ4431573.ts|http://pdzr.rt.pl:8889
20191122181907|9836847|40EVFVRFB|DIZJ4432595.ts|http://pdzr.rt.pl:8889

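This regex matches what you want, and the replacement gives the expected result, except for the timestamp, which you have to convert as you do in the first part of your code: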

^.+?(http://[^/]+).+/([^/]+?)/[^/]+?,(.+?),(.+?),(.+)

Replacement: $3|$5|$4|$2|$1

Result:

1574443147021|9836847|40EVFVRFB|DIZJ4432595.ts|http://pdzr.rt.pl:8889
1574443138223|0292929|BVFEFZZ9C4|DIZJ4431573.ts|http://pdzr.rt.pl:8889

Regex demo and explanation

Here is the Perl code:

use strict;
use warnings;
use POSIX qw(strftime);
while (<DATA>) {
    chomp $_;
    s~                              # SUBSTITUTE
        .+?                         # 1 or more of any character but newline, not greedy
        (http://[^/]+)              # group 1, URL up to the first slash after the host
        .+/                         # 1 or more of any character but newline, up to a slash
        ([^/]+?)                    # group 2, 1 or more non-slash characters
        /[^/]+?,                    # a slash, 1 or more non-slash characters, a comma
        (.+?)                       # group 3, 1 or more of any character but newline, not greedy
        ,                           # a comma
        (.+?)                       # group 4, 1 or more of any character but newline, not greedy
        ,                           # a comma
        (.+)                        # group 5, 1 or more of any character but newline
    ~                               # WITH
        strftime('%Y%m%d%H%M%S',    # convert time
            localtime($3/1000))
        .                           # CONCAT WITH
        "|$5|$4|$2|$1"              # groups 5, 4, 2, 1 joined with pipes
    ~ex;                            # e: evaluate the replacement as code; x: allow whitespace and comments
    print $_, "\n";
}
__DATA__
"@timestamp",url,ts,macin,caid
"Nov 22, 2019 @ 17:19:07.571","http://pdzr.rt.pl:8889/qsdf/ZDF/vsLop/DIZJ4432595.ts/9836847.3018322401",1574443147021,40EVFVRFB,9836847
"Nov 22, 2019 @ 17:18:59.264","http://pdzr.rt.pl:8889/qsdf/ZDF/vsLop/DIZJ4431573.ts/0292929.5002731501",1574443138223,BVFEFZZ9C4,0292929

Output:

"@timestamp",url,ts,macin,caid
20191122181907|9836847|40EVFVRFB|DIZJ4432595.ts|http://pdzr.rt.pl:8889
20191122181858|0292929|BVFEFZZ9C4|DIZJ4431573.ts|http://pdzr.rt.pl:8889
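
Note that the header line is printed unchanged, because the substitution does not match it. If the header should be dropped and duplicates removed (as the hash in the question does), one option is to match rather than substitute and keep only the lines that match. The following is a minimal sketch under a few assumptions: the helper names reformat_line and unique_records are made up for illustration, named captures require Perl 5.10 or later, and gmtime is used instead of localtime so that the result does not depend on the time zone of the machine.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: returns the pipe-joined record, or undef when the
# line does not look like a data row (for example the CSV header).
sub reformat_line {
    my ($line) = @_;
    return undef unless $line =~ m{
        (?<url>http://[^/]+)    # scheme, host and port
        .+/                     # intermediate directories
        (?<file>[^/]+)          # file name
        /[^/,]+,                # last path segment, closing quote, comma
        (?<ts>\d+),             # epoch time in milliseconds
        (?<macin>[^,]+),
        (?<caid>[^,\s]+)
    }x;
    my %f = %+;                 # copy the named captures
    # Assumption: gmtime for reproducible output; the code above uses localtime.
    my $stamp = strftime('%Y%m%d%H%M%S', gmtime($f{ts} / 1000));
    return join '|', $stamp, @f{qw(caid macin file url)};
}

# Hypothetical helper: reformat all lines, drop non-matching lines and
# duplicates, and return the records sorted.
sub unique_records {
    my @lines = @_;
    my %seen;
    my @records;
    for my $line (@lines) {
        my $record = reformat_line($line);
        next unless defined $record;    # header or malformed line
        next if $seen{$record}++;       # already emitted
        push @records, $record;
    }
    return sort @records;
}

my @sample = (
    '"@timestamp",url,ts,macin,caid',
    '"Nov 22, 2019 @ 17:19:07.571","http://pdzr.rt.pl:8889/qsdf/ZDF/vsLop/DIZJ4432595.ts/9836847.3018322401",1574443147021,40EVFVRFB,9836847',
    '"Nov 22, 2019 @ 17:18:59.264","http://pdzr.rt.pl:8889/qsdf/ZDF/vsLop/DIZJ4431573.ts/0292929.5002731501",1574443138223,BVFEFZZ9C4,0292929',
);
print "$_\n" for unique_records(@sample);
```

Keying %seen on the finished record mirrors what the hash in the question does, without building the key string twice.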

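Well, there are many ways to reach the same result. Below is my extended version, which not only shuffles the fields around but also splits them into hashes and does some processing on one of them (the timestamp).

It is not clear from the original post whether the timestamp should be taken from the data or generated at run time. I take the timestamp from the data.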

use strict;
use warnings;
use feature qw(say);
use Data::Dumper;
my $debug = 0;
my %row;
my %url;
my @fields  = qw( timestamp url ts macin caid );
my @address = qw( proto dn port dir id );
while( <DATA> ) {
    next if /timestamp/;            # skip the header line
    print if $debug;
    chomp;
    s/,//;                          # drop the first comma (the one inside the date)
    s/"//g;                         # drop all double quotes
    @row{@fields} = split ',';
    print Dumper(\%row) if $debug;
    @url{@address} = ( $row{url} =~ m#(\w+)://(.+):(\d+)/(.+)/(.+)# );
    $url{id}    =~ s/\.\d+//;       # keep only the part before the dot
    $url{dir}   =~ /(\w+\.ts)/;     # file name at the end of the directory part
    $url{ts}    = $1;
    print Dumper(\%url) if $debug;
    say join('|', (
        timestamp($row{timestamp}),
        $url{id},
        $row{macin},
        $url{ts},
        "$url{proto}://$url{dn}:$url{port}"
    ));
}
sub timestamp {
    my $input = shift;
    my %data;
    my $result;
    my %months = ( Jan => 1, Feb => 2, Mar => 3, Apr => 4,
                   May => 5, Jun => 6, Jul => 7, Aug => 8,
                   Sep => 9, Oct => 10, Nov => 11, Dec => 12
    );
    my @fields = qw( month day year hour min sec msec );
    # parse e.g. 'Nov 22 2019 @ 17:19:07.571' (the comma was removed by the caller)
    @data{@fields} = $input =~ /(\w+)\s+(\d+)\s+(\d+)\s+\@\s+(\d+):(\d+):(\d+)\.(\d+)/;
    print Dumper(\%data) if $debug;
    $result = sprintf "%4d%02d%02d%02d%02d%02d",
        $data{year},
        $months{$data{month}},
        $data{day},
        $data{hour},
        $data{min},
        $data{sec};
    return $result;
}
__DATA__
"@timestamp",url,ts,macin,caid
"Nov 22, 2019 @ 17:19:07.571","http://pdzr.rt.pl:8889/qsdf/ZDF/vsLop/DIZJ4432595.ts/9836847.3018322401",1574443147021,40EVFVRFB,9836847
"Nov 22, 2019 @ 17:18:59.264","http://pdzr.rt.pl:8889/qsdf/ZDF/vsLop/DIZJ4431573.ts/0292929.5002731501",1574443138223,BVFEFZZ9C4,0292929

Resulting output:

20191122171907|9836847|40EVFVRFB|DIZJ4432595.ts|http://pdzr.rt.pl:8889
20191122171859|0292929|BVFEFZZ9C4|DIZJ4431573.ts|http://pdzr.rt.pl:8889
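
As an alternative to the hand-written month table in the timestamp sub, the core Time::Piece module can parse the "@timestamp" column directly. This is only a sketch: the helper name compact_timestamp is made up for illustration, and it assumes the input always has the form 'Nov 22, 2019 @ 17:19:07.571' with English month abbreviations. The fractional seconds are stripped first, since they are not needed in the output.

```perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical helper: convert the "@timestamp" text into YYYYmmddHHMMSS.
sub compact_timestamp {
    my ($text) = @_;
    (my $clean = $text) =~ s/\.\d+\s*$//;    # drop the fractional seconds
    # strptime dies if the text does not match the format
    my $t = Time::Piece->strptime($clean, '%b %d, %Y @ %H:%M:%S');
    return $t->strftime('%Y%m%d%H%M%S');
}

print compact_timestamp('Nov 22, 2019 @ 17:19:07.571'), "\n";
```

Because this version expects the comma after the day, it would be called on the timestamp field before the s/,// step removes that comma.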
