Perl newbie - parsing a file and replacing a pattern with dynamic values



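I'm new to Perl and am currently trying to convert a bash script to Perl.

My script converts nmon files (nmon is an AIX/Linux performance monitoring tool). It takes the nmon files present in a directory, greps a specific section and redirects it to a temp file, then greps the associated timestamps and redirects them to another file.

It then parses the data into a final csv file, which will be indexed by a third-party tool for exploitation.

A sample of the NMON data looks like this: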

TOP,%CPU Utilisation
TOP,+PID,Time,%CPU,%Usr,%Sys,Threads,Size,ResText,ResData,CharIO,%RAM,Paging,Command,WLMclass
TOP,5165226,T0002,10.93,9.98,0.95,1,54852,4232,51220,311014,0.755,1264,PatrolAgent,Unclassified
TOP,5365876,T0002,1.48,0.81,0.67,135,85032,132,84928,38165,1.159,0,db2sysc,Unclassified
TOP,5460056,T0002,0.32,0.27,0.05,1,5060,616,4704,1719,0.072,0,db2kmchan64.v9,Unclassified

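The "Time" field (shown as T0002 in the NMON data, and really called ZZZZ) is an NMON-specific timestamp identifier. Its real value appears later in the NMON file, in a dedicated section, and looks like this: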

ZZZZ,T0001,00:09:55,01-JAN-2014
ZZZZ,T0002,00:13:55,01-JAN-2014
ZZZZ,T0003,00:17:55,01-JAN-2014
ZZZZ,T0004,00:21:55,01-JAN-2014
ZZZZ,T0005,00:25:55,01-JAN-2014

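The NMON format is very specific and cannot be exploited directly without being parsed: each timestamp has to be associated with the corresponding values. (An NMON file is almost like a concatenation of many different csv files, each with its own format, its own fields, and so on.)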
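The association described above amounts to a table lookup keyed by the timestamp identifier. Here is a minimal Perl sketch of that idea, using only sample lines quoted in this question; the names `@lines` and `%time_for` are made up for illustration, and the "date time" output order is an assumption taken from the bash script.

```perl
use strict;
use warnings;

# Sample lines copied from the excerpts in this question.
my @lines = (
    'ZZZZ,T0001,00:09:55,01-JAN-2014',
    'ZZZZ,T0002,00:13:55,01-JAN-2014',
    'TOP,5165226,T0002,10.93,9.98,0.95,1,54852,4232,51220,311014,0.755,1264,PatrolAgent,Unclassified',
);

# Build a lookup table: timestamp id (e.g. T0002) => "date time".
my %time_for;
for my $line (@lines) {
    my ($tag, $id, $time, $date) = split /,/, $line;
    $time_for{$id} = "$date $time" if $tag eq 'ZZZZ';
}

# Resolve the Time field (third column) of the TOP row.
my @top = split /,/, $lines[2];
print "$top[2] => $time_for{$top[2]}\n";   # prints: T0002 => 01-JAN-2014 00:13:55
```

Because the table is built once per file, each TOP row costs a single hash lookup rather than a new search through the timestamps.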

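I wrote the following bash script to parse the section I'm interested in (the "TOP" section, which holds per-host statistics for the top processes: CPU, memory, I/O):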

#!/bin/bash
# set -x
################################################################
# INFORMATION
################################################################
# nmon2csv_TOP.sh
# Convert TOP section of nmon files to csv
# CAUTION: This script is expected to be launched by the main workflow
# $DST and DST_CONVERTED_TOP are being exported by it, if not this script will exit at launch time
################################################################
# VARS
################################################################
#  Location of NMON files
NMON_DIR=${DST}
# Location of generated files
OUTPUT_DIR=${DST_CONVERTED_TOP}
# Temp files
rawdatafile=/tmp/temp_rawdata.$$.temp
timestampfile=/tmp/temp_timestamp.$$.temp
# Main Output file
finalfile=${DST_CONVERTED_TOP}/NMON_TOP_processed_at_date_`date '+%F'`.csv
###########################
# BEGIN OF WORK
###########################
# Verify exported var are not null
if [ -z "${NMON_DIR}" ]; then
echo -e "\nERROR: Var NMON_DIR is null!\n" && exit 1
elif [ -z "${OUTPUT_DIR}" ]; then
echo -e "\nERROR: Var OUTPUT_DIR is null!\n" && exit 1
fi
# Remove temp and output files left over from a previous run
# (an if/elif chain would only remove the first one found)
rm -f "${rawdatafile}" "${timestampfile}" "${finalfile}"
# Get current location
PWD=`pwd`
# Go to NMON files location
cd ${NMON_DIR}
# For each NMON file present:
# To restrict to only PROD env: `ls *.nmon | grep -E -i 'sp|gp|ge'`
for NMON_FILE in `ls *.nmon | grep -E -i 'sp|gp|ge'`; do
# Set Hostname identification
serialnum=`grep 'AAA,SerialNumber,' ${NMON_FILE} | awk -F, '{print $3}' OFS=, | tr [:lower:] [:upper:]`
hostname=`grep 'AAA,host,' ${NMON_FILE} | awk -F, '{print $3}' OFS=, | tr [:lower:] [:upper:]`
# Grep and redirect TOP Section
grep 'TOP' ${NMON_FILE} | grep -v 'AAA,version,TOPAS-NMON' | grep -v 'TOP,%CPU Utilisation' > ${rawdatafile}
# Grep and redirect associated timestamps (ZZZZ)
grep 'ZZZZ' ${NMON_FILE}> ${timestampfile}
# Begin of work
while IFS=, read TOP PID Time Pct_CPU Pct_Usr Pct_Sys Threads Size ResText ResData CharIO Pct_RAM Paging Command WLMclass
do
timestamp=`grep ${Time} ${timestampfile} | awk -F, '{print $4 " "$3}' OFS=,`
echo ${serialnum},${hostname},${timestamp},${Time},${PID},${Pct_CPU},${Pct_Usr},${Pct_Sys},${Threads},${Size},${ResText},${ResData},${CharIO},${Pct_RAM},${Paging},${Command},${WLMclass} \
| grep -v '+PID,%CPU,%Usr,%Sys,Threads,Size,ResText,ResData,CharIO,%RAM,Paging,Command,WLMclass' >> ${finalfile}
done < ${rawdatafile}
echo -e "INFO: Done for Serialnum: ${serialnum} Hostname: ${hostname}"
done
# Go back to initial location
# (the earlier cd overwrote PWD, so use OLDPWD, which the shell sets on cd)
cd "${OLDPWD}"

###########################
# END OF WORK
###########################

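This works as intended and produces one main csv file (as you can see in the code, I deliberately don't keep a csv header in the file), which is the concatenation of all parsed hosts.

However, I have a very large number of hosts to process every day (about 3000). With the current code, in the worst case, generating the data for a single host can take several minutes; multiplied by the number of hosts, that easily becomes several hours...

So this code really isn't fast enough to handle that much data.

10 hosts represent about 200,000 lines, which ends up as a csv file of about 20 MB. That isn't much, but I think a shell script is probably not the best choice for managing such a process...

I suppose Perl would do much better at this task (even if the shell script could probably be improved), but my Perl knowledge is (currently) very poor, which is why I'm asking for your help... I think this should be quite simple to do in Perl, but I can't get it to work so far...

Someone once developed a Perl script that processes NMON files and converts them to sql files (to dump the data into a database). I adapted it to use its features and, with the help of some shell scripts, I post-process the sql files to get my final csv files.

But the TOP section is not handled by that Perl script, and it can't be used for that purpose without redevelopment.

The code in question: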

#!/usr/bin/perl
# Program name: nmon2mysql.pl
# Purpose - convert nmon.csv file(s) into mysql insert file
# Author - Bruce Spencer
# Disclaimer:  this provided "as is".  
# Date - March 2007
#
$nmon2mysql_ver="1.0. March 2007";
use Time::Local;

#################################################
##  Your Customizations Go Here            ##
#################################################
#  Source directory for nmon csv files
my $NMON_DIR=$ENV{DST_TMP};
my $OUTPUT_DIR=$ENV{DST_CONVERTED_CPU_ALL};

# End "Your Customizations Go Here".  
# You're on your own, if you change anything beyond this line :-)
####################################################################
#############       Main Program            ############
####################################################################
# Initialize common variables
&initialize;
# Process all "nmon" files located in the $NMON_DIR
# @nmon_files=`ls $NMON_DIR/*.nmon $NMON_DIR/*.csv`;
@nmon_files=`ls $NMON_DIR/*.nmon`;
if (@nmon_files == 0 ) { die ("No *.nmon or csv files found in $NMON_DIR\n"); }
@nmon_files=sort(@nmon_files);
chomp(@nmon_files);
foreach $FILENAME ( @nmon_files ) {
@cols= split(/\//,$FILENAME);
$BASEFILENAME= $cols[@cols-1];
unless (open(INSERT, ">$OUTPUT_DIR/$BASEFILENAME.sql")) { 
die("Can not open /$OUTPUT_DIR/$BASEFILENAME.sql\n"); 
}
print INSERT ("# nmon version: $NMONVER\n");
print INSERT ("# AIX version: $AIXVER\n");
print INSERT ("use nmon;\n");
$start=time();
@now=localtime($start);
$now=join(":",@now[2,1,0]);
print ("$now: Begin processing file = $FILENAME\n");
# Parse nmon file, skip if unsuccessful
if (( &get_nmon_data ) gt 0 ) { next; }
$now=time();
$now=$now-$start;
print ("\t$now: Finished get_nmon_data\n");

# Static variables (number of fields always the same)
#@static_vars=("LPAR","CPU_ALL","FILE","MEM","PAGE","MEMNEW","MEMUSE","PROC");
#@static_vars=("LPAR","CPU_ALL","FILE","MEM","PAGE","MEMNEW","MEMUSE");
@static_vars=("CPU_ALL");
foreach $key (@static_vars) {
&mk_mysql_insert_static($key);;
$now=time();
$now=$now-$start;
print ("\t$now: Finished $key\n");
} # end foreach

# Dynamic variables (variable number of fields)
#@dynamic_vars=("DISKBSIZE","DISKBUSY","DISKREAD","DISKWRITE","DISKXFER","ESSREAD","ESSWRITE","ESSXFER","IOADAPT","NETERROR","NET","NETPACKET");
@dynamic_vars=("");
foreach $key (@dynamic_vars) {
&mk_mysql_insert_variable($key);;
$now=time();
$now=$now-$start;
print ("\t$now: Finished $key\n");
}
close(INSERT);
#  system("gzip","$FILENAME");
}
exit(0);

############################################
#############  Subroutines  ############
############################################
##################################################################
## Extract CPU_ALL data for Static fields
##################################################################
sub mk_mysql_insert_static {
my($nmon_var)=@_; 
my $table=lc($nmon_var);
my @rawdata;
my $x;
my @cols;
my $comma;
my $TS;
my $n;

@rawdata=grep(/^$nmon_var,/, @nmon);
if (@rawdata < 1) { return(1); }
@rawdata=sort(@rawdata);
@cols=split(/,/,$rawdata[0]);
$x=join(",",@cols[2..@cols-1]);
$x=~ s/%/_PCT/g;
$x=~ s/\(MB\)/_MB/g;
$x=~ s/-/_/g;
$x=~ s/ /_/g;
$x=~ s/__/_/g;
$x=~ s/,_/,/g;
$x=~ s/_,/,/g;
$x=~ s/^_//;
$x=~ s/_$//;
print INSERT (qq|insert into $table (serialnum,hostname,mode,nmonver,time,ZZZZ,$x) values\n| );
$comma="";
$n=@cols;
$n=$n-1; # number of columns -1 
for($i=1;$i<@rawdata;$i++){ 
$TS=$UTC_START + $INTERVAL*($i);
@cols=split(/,/,$rawdata[$i]);
$x=join(",",@cols[2..$n]);
$x=~ s/,,/,-1,/g; # replace missing data ",," with a ",-1,"
print INSERT (qq|$comma("$SN","$HOSTNAME","$MODE","$NMONVER",$TS,"$DATETIME{$cols[1]}",$x)| );
$comma=",\n";
}
print INSERT (qq|;\n\n|);
} # end mk_mysql_insert
##################################################################
## Extract CPU_ALL data for variable fields
##################################################################
sub mk_mysql_insert_variable {
my($nmon_var)=@_; 
my $table=lc($nmon_var);
my @rawdata;
my $x;
my $j;
my @cols;
my $comma;
my $TS;
my $n;
my @devices;

@rawdata=grep(/^$nmon_var,/, @nmon);
if ( @rawdata < 1) { return; }
@rawdata=sort(@rawdata);
$rawdata[0]=~ s/%/_PCT/g;
$rawdata[0]=~ s/\(/_/g;
$rawdata[0]=~ s/\)/_/g;
$rawdata[0]=~ s/ /_/g;
$rawdata[0]=~ s/__/_/g;
$rawdata[0]=~ s/,_/,/g;
@devices=split(/,/,$rawdata[0]);
print INSERT (qq|insert into $table (serialnum,hostname,time,ZZZZ,device,value) values\n| );
$n=@rawdata;
$n--; 
for($i=1;$i<@rawdata;$i++){ 
$TS=$UTC_START + $INTERVAL*($i);
$rawdata[$i]=~ s/,$//;
@cols=split(/,/,$rawdata[$i]);
print INSERT (qq|\n("$SN","$HOSTNAME",$TS,"$DATETIME{$cols[1]}","$devices[2]",$cols[2])| );
for($j=3;$j<@cols;$j++){
print INSERT (qq|,\n("$SN","$HOSTNAME",$TS,"$DATETIME{$cols[1]}","$devices[$j]",$cols[$j])| );
}
if ($i < $n) { print INSERT (","); } 
}
print INSERT (qq|;\n\n|);
} # end mk_mysql_insert_variable
########################################################
### Get an nmon setting from csv file            ###
### finds first occurrence of $search            ###
### Return the selected column...$return_col     ###
### Syntax:                                      ###
###     get_setting($search,$col_to_return,$separator)##
########################################################
sub get_setting {
my $i;
my $value="-1";
my ($search,$col,$separator)= @_;    # search text, $col, $separator
for ($i=0; $i<@nmon; $i++){
if ($nmon[$i] =~ /$search/ ) {
$value=(split(/$separator/,$nmon[$i]))[$col];
$value =~ s/["']*//g;  # remove single and double quote characters
return($value);
} # end if
} # end for
return($value);
} # end get_setting
#####################
##  Clean up       ##
#####################
sub clean_up_line {
# remove characters not compatible with nmon variable
# Max rrdtool variable length is 19 chars
# Variable can not contain special characters (% - () )
my ($x)=@_; 
# print ("clean_up, before: $i\t$nmon[$i]\n");
$x =~ s/%/Pct/g;
# $x =~ s/\W*//g;
$x =~ s/\/s/ps/g;       # /s  - ps
$x =~ s/\//s/g;     # / - s
$x =~ s/\(/_/g;
$x =~ s/\)/_/g;
$x =~ s/ /_/g;
$x =~ s/-/_/g;
$x =~ s/_KBps//g;
$x =~ s/_tps//g;
$x =~ s/[:,]*\s*$//;
$retval=$x; 
} # end clean up

##########################################
##  Extract headings from nmon csv file ##
##########################################
sub initialize {
%MONTH2NUMBER =  ("jan", 1, "feb",2, "mar",3, "apr",4, "may",5, "jun",6, "jul",7, "aug",8, "sep",9, "oct",10, "nov",11, "dec",12 );
@MONTH2ALPHA =  (   "junk","jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec" );
} # end initialize
# Get data from nmon file, extract specific data fields (hostname, date, ...)
sub get_nmon_data {
my $key;
my $x;
my $category;
my %toc;
my @cols;
# Read nmon file
unless (open(FILE, $FILENAME)) { return(1); }
@nmon=<FILE>;  # input entire file
close(FILE);
chomp(@nmon);
# Cleanup nmon data: remove trailing commas and colons
for($i=0; $i<@nmon;$i++ ) {
$nmon[$i] =~ s/[:,]*\s*$//;
}
# Get nmon/server settings (search string, return column, delimiter)
$AIXVER     =&get_setting("AIX",2,",");
$DATE       =&get_setting("date",2,",");
$HOSTNAME   =&get_setting("host",2,",");
$INTERVAL   =&get_setting("interval",2,","); # nmon sampling interval
$MEMORY     =&get_setting(qq|lsconf,"Good Memory Size:|,1,":");
$MODEL      =&get_setting("modelname",3,'\s+');
$NMONVER    =&get_setting("version",2,",");
$SNAPSHOTS  =&get_setting("snapshots",2,",");  # number of readings
$STARTTIME  =&get_setting("AAA,time",2,",");
($HR, $MIN)=split(/:/,$STARTTIME);

if ($AIXVER eq "-1") {
$SN=$HOSTNAME;  # Probably a Linux host
} else {
$SN =&get_setting("systemid",4,",");
$SN     =(split(/\s+/,$SN))[0]; # "systemid IBM,SN ..."
}
$TYPE       =&get_setting("^BBBP.*Type",3,",");
if ( $TYPE =~ /Shared/ ) { $TYPE="SPLPAR"; } else { $TYPE="Dedicated"; }
$MODE       =&get_setting("^BBBP.*Mode",3,",");
$MODE       =(split(/: /, $MODE))[1];
# $MODE     =~s/"//g;

# Calculate UTC time (seconds since 1970)
# NMON V9  dd/mm/yy
# NMON V10+ dd-MMM-yyyy
if ( $DATE =~ /[a-zA-Z]/ ) {   # Alpha = assume dd-MMM-yyyy date format
($DAY, $MMM, $YR)=split(/-/,$DATE);
$MMM=lc($MMM);
$MON=$MONTH2NUMBER{$MMM};
} else {
($DAY, $MON, $YR)=split(/\//,$DATE);
$YR=$YR + 2000;
$MMM=$MONTH2ALPHA[$MON];
} # end if
## Calculate UTC time (seconds since 1970).  Required format for the rrdtool.
##  timelocal format
##    day=1-31
##    month=0-11
##    year = x -1900  (time since 1900) (seems to work with either 2006 or 106)
$m=$MON - 1;  # jan=0, feb=1, ...
$UTC_START=timelocal(0,$MIN,$HR,$DAY,$m,$YR); 
$UTC_END=$UTC_START + $INTERVAL * $SNAPSHOTS;
@ZZZZ=grep(/^ZZZZ,/,@nmon);
for ($i=0;$i<@ZZZZ;$i++){
@cols=split(/,/,$ZZZZ[$i]);
($DAY,$MON,$YR)=split(/-/,$cols[3]);
$MON=lc($MON);
$MON="00" . $MONTH2NUMBER{$MON};
$MON=substr($MON,-2,2);
$ZZZZ[$i]="$YR-$MON-$DAY $cols[2]";
$DATETIME{$cols[1]}="$YR-$MON-$DAY $cols[2]";

} # end ZZZZ
return(0);
} # end get_nmon_data

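It almost does the job (I say almost because recent NMON versions sometimes cause problems when there is no data), and it does it much, much faster than my shell script would if I used it for these sections.

That's why I think Perl should be a perfect solution.

Of course, I'm not asking anyone to convert my shell script into a final Perl version, but at least to point me in the right direction :-)

I would really appreciate any help!

Normally I'm strongly against questions like this, but our production system is down and, until it's fixed, I really don't have much to do...

Here is some code that might get you started. Please treat it as pseudocode: it is completely untested and may not even compile (I always forget a parenthesis or a semicolon somewhere and, as I said, the actual machine that could run the code is unreachable). I commented it heavily, though, so hopefully you will be able to adapt it to your actual needs and get it running.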

use strict;
use warnings;
open INFILE, "<", "path/to/file.nmon"       # Open the file.
    or die "Cannot open input file: $!";
my @topLines;                               # Initialize variables.
my %timestamps;
while (<INFILE>)                            # This will walk over all the lines of the infile.
{                                           # Storing the current line in $_.
    chomp $_;                               # Remove newline at the end.
    if ($_ =~ m/^TOP/)                      # If the line starts with TOP...
    {
        push @topLines, $_;                 # ...store it in the array for later use.
    }
    elsif ($_ =~ m/^ZZZZ/)                  # If it is in the ZZZZ section...
    {
        my @fields = split ',', $_;         # ...split the line at commas...
        my $timestamp = join ",", $fields[2], $fields[3];   # ...join the timestamp into a string as you wish...
        $timestamps{$fields[1]} = $timestamp;               # ...and store it in the hash with the Twhatever thing as key.
    }
    # This iteration could certainly be improved with more knowledge
    # of how the file looks. For example the search could be cancelled
    # after the ZZZZ section if the file is still long.
}
close INFILE;
open OUTFILE, ">", "path/to/output.csv"     # Open the file you want your output in.
    or die "Cannot open output file: $!";
foreach (@topLines)                         # Iterate through all elements of the array.
{                                           # Once again storing the current value in $_.
    my @fields = split ',', $_;             # Probably not necessary, depending on how output should be formatted.
    next unless @fields > 2 && exists $timestamps{$fields[2]};  # Skip header lines, which have no matching timestamp.
    my $outstring = join ',', $fields[0], $fields[1], $timestamps{$fields[2]};  # And whatever other fields you care for.
    print OUTFILE $outstring, "\n";         # Print.
}
close OUTFILE;
print "Done.\n";
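To get closer to the column layout produced by the bash script in the question, the same two-pass idea can be wrapped in a function. What follows is only a sketch under assumptions: `top_to_csv` is a hypothetical helper name, the `AAA,host` and `AAA,SerialNumber` values in the sample are made up, and the AAA prefixes and the "date time" timestamp order are taken from the bash script's greps and awk call rather than from any NMON specification.

```perl
use strict;
use warnings;

# Hypothetical helper: takes the lines of one nmon file and returns CSV rows
# laid out like the bash script's output:
# serialnum,hostname,timestamp,Time,PID,%CPU,...,WLMclass
sub top_to_csv {
    my @lines = @_;
    my ($serialnum, $hostname) = ('', '');
    my (%time_for, @top_rows);

    # First pass: collect host identity, timestamps and TOP data rows.
    for my $line (@lines) {
        my @f = split /,/, $line;
        next if @f < 3;    # too short to matter, e.g. "TOP,%CPU Utilisation"
        if    ($line =~ /^AAA,SerialNumber,/)      { $serialnum = uc $f[2] }
        elsif ($line =~ /^AAA,host,/)              { $hostname  = uc $f[2] }
        elsif ($f[0] eq 'ZZZZ')                    { $time_for{$f[1]} = "$f[3] $f[2]" }
        elsif ($f[0] eq 'TOP' and $f[1] ne '+PID') { push @top_rows, \@f }
    }

    # Second pass: every timestamp is known by now, so resolve each row.
    my @csv;
    for my $row (@top_rows) {
        my ($pid, $time_id, @rest) = @{$row}[1 .. $#{$row}];
        my $stamp = $time_for{$time_id} // '';   # empty if no ZZZZ line matched
        push @csv, join(',', $serialnum, $hostname, $stamp, $time_id, $pid, @rest);
    }
    return @csv;
}

my @sample = (
    'AAA,host,myhost',            # made-up value
    'AAA,SerialNumber,abc123',    # made-up value
    'TOP,%CPU Utilisation',
    'TOP,+PID,Time,%CPU,%Usr,%Sys,Threads,Size,ResText,ResData,CharIO,%RAM,Paging,Command,WLMclass',
    'TOP,5165226,T0002,10.93,9.98,0.95,1,54852,4232,51220,311014,0.755,1264,PatrolAgent,Unclassified',
    'ZZZZ,T0002,00:13:55,01-JAN-2014',
);
print "$_\n" for top_to_csv(@sample);
# prints: ABC123,MYHOST,01-JAN-2014 00:13:55,T0002,5165226,10.93,9.98,0.95,1,54852,4232,51220,311014,0.755,1264,PatrolAgent,Unclassified
```

Keeping the rows in memory until the whole file has been read means it does not matter whether the ZZZZ lines come before or after the TOP lines, and no external grep or awk process is started per row.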
