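I want to write a Perl script that:
- periodically monitors a directory for input CSV files
- when a file is detected, opens it, reads it, and merges the rows that share the same value in the second field/column
- writes the updated CSV file to a new directory, and finally,
- deletes the input file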
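The last two steps in the list above (write to a new directory, then delete the input) could be sketched roughly as follows. This is only an illustration using core modules: the helper names `output_path_for` and `write_then_delete` are hypothetical, and it assumes the merged CSV text has already been built as a string.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Spec;

# Hypothetical helper: map an input file to the same file name in $out_dir.
sub output_path_for {
    my ($in_path, $out_dir) = @_;
    return File::Spec->catfile($out_dir, basename($in_path));
}

# Hypothetical helper: write $content into the output directory and delete
# the input only after the output has been written and closed cleanly.
sub write_then_delete {
    my ($in_path, $out_dir, $content) = @_;
    my $out_path = output_path_for($in_path, $out_dir);
    open my $out, '>', $out_path or die "Could not open '$out_path': $!\n";
    print {$out} $content;
    close $out or die "Could not close '$out_path': $!\n";
    unlink $in_path or warn "Could not delete '$in_path': $!\n";
    return $out_path;
}
```

Deleting only after a successful close means a failed write leaves the input file in place for the next polling pass.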
For example, I have a CSV file containing information like this:
"101","5555555555","DOE, JOHN "," DOE, JOHN, your trip
tomorrow from, 123 Anywhere St Apt #A, to, 100 ELSEWHERE RD APT E, is
scheduled for pickup between, 1:00 PM, and 1:30 PM"
"102","5555555555","DOE, JOHN "," DOE, JOHN, your trip
tomorrow from, 100 ELSEWHERE RD APT E, to, 123 Anywhere St Apt #A, is
scheduled for pickup between, 9:00 PM, and 9:30 PM"
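I want the script to read, parse, and detect duplicate values in the second field ("5555555555"), then create a new CSV file that merges the records above into a single record, like this: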
"101","5555555555","DOE, JOHN "," DOE, JOHN, your trip
tomorrow from, 123 Anywhere St Apt #A, to, 100 ELSEWHERE RD APT E, is
scheduled for pickup between, 1:00 PM, and 1:30 PM AND your trip
tomorrow from, 100 ELSEWHERE RD APT E, to, 123 Anywhere St Apt #A, is
scheduled for pickup between, 9:00 PM, and 9:30 PM"
My current Perl code successfully detects, reads, and parses the file, but I don't know how to detect the duplicates and combine the rows.
#!
use strict;
use warnings;
use File::Find;
use Text::CSV;
$| = 1;
use constant {
    # Check for CSV files only
    SUFFIX_LIST  => qr/\.(csv)$/,
    DIR_TO_CHECK => "/Users/Me/Desktop/INBOUND/",
};
my @file_list;
while (1) {
    # Recursively search the input directory for CSV files
    find( sub {
        return unless -f;
        return unless $_ =~ SUFFIX_LIST;
        # Make sure all of the files in the file list array are unique.
        # Compare full paths: inside grep, $_ is each list element.
        my $path = $File::Find::name;
        if (!(grep { $_ eq $path } @file_list)) {
            push @file_list, $path;
        }
    }, DIR_TO_CHECK );
    # If .csv files are found...
    if (scalar(@file_list) > 0) {
        print "\nNew Item in Directory\n";
        parseFile($file_list[0]);
        # Delete input file
        unlink $file_list[0];
        print "Deleted File\n";
        # Remove the file from the file list
        shift @file_list;
    } else {
        print "No New Item\n";
    }
    sleep 5;
}
# Subroutine to parse and compare the csv file
sub parseFile {
    my $csv = Text::CSV->new({
        sep_char     => ',',
        always_quote => 1,
        quote_char   => '"',
        escape_char  => '"',
        binary       => 1,
        auto_diag    => 1,
    });
    # Get the file that was passed to the function
    my $file = $_[0] or die "CSV file not passed in subroutine\n";
    # Open file for reading
    open(my $data, '<', $file) or die "Could not open '$file' $!\n";
    while (my $line = <$data>) {
        print $line;
        if ($csv->parse($line)) {
            my @fields = $csv->fields();
        } else {
            #warn "Line could not be parsed: $line\n";
            $csv->error_input();
        }
    }
    close $data;
}
I think the approach I have is wrong, because I suspect I need to read the file into memory as a whole rather than line by line. Please help, thanks.
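To illustrate the concern about reading line by line: the sample records contain newlines inside a quoted field, so one CSV record spans several physical lines. A minimal sketch, assuming Text::CSV is installed, lets the parser read whole records with `getline` instead of splitting on newlines first. The record text here is a shortened stand-in for the real data, and it is read through an in-memory filehandle so no real path is needed.

```perl
use strict;
use warnings;
use Text::CSV;

# Two physical lines, but a single CSV record: the last quoted field
# contains a newline (shortened stand-in for the data in the question).
my $raw = qq{"101","5555555555","DOE, JOHN ","your trip\ntomorrow"\n};

my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });

# Open the string as an in-memory file.
open my $fh, '<', \$raw or die "Could not open in-memory data: $!\n";

# getline() keeps reading until the quoted field is closed, so it
# returns one array reference per record rather than per line.
my @rows;
while (my $row = $csv->getline($fh)) {
    push @rows, $row;
}
close $fh;

printf "%d record(s), %d field(s) in the first\n",
    scalar @rows, scalar @{ $rows[0] };
```

With this approach the file does not have to be slurped in one go; the parser simply has to own the reading so it can follow quoted fields across line breaks.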
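I'm not into Perl these days, but here is my answer. Create a hash keyed on the second field, like this:
$hashtbl{555555} = {
    id    => 102,                        # first field
    names => "doe, john",                # third field
    msg   => "DOE, JOHN, your trip...",  # last field
};
If the key already exists in the hash, append to its msg:
if (exists $hashtbl{$KEY}) {
    $hashtbl{$KEY}->{msg} .= " AND $last_field";
}
After reading the whole file, use this hash to create the new CSV file.
That should work.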
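The hash idea above can be sketched as a small runnable example. It assumes the rows have already been parsed into array references (id, phone, name, message), uses only core Perl, and keeps the first-seen order of keys. The function name `merge_by_phone` and the sample rows are made up for illustration.

```perl
use strict;
use warnings;

# Merge rows that share the same second field, keeping first-seen order.
sub merge_by_phone {
    my @rows = @_;
    my (%merged, @order);
    for my $row (@rows) {
        my ($id, $phone, $name, $msg) = @$row;
        if (exists $merged{$phone}) {
            # Key already seen: append this row's message to the first one.
            $merged{$phone}[3] .= " AND $msg";
        } else {
            # First row for this key: keep a copy and remember its position.
            $merged{$phone} = [ $id, $phone, $name, $msg ];
            push @order, $phone;
        }
    }
    return map { $merged{$_} } @order;
}

# Made-up sample rows; the third one has a different key and stays as-is.
my @merged = merge_by_phone(
    [ '101', '5555555555', 'DOE, JOHN ', 'trip A' ],
    [ '102', '5555555555', 'DOE, JOHN ', 'trip B' ],
    [ '103', '6666666666', 'ROE, JANE ', 'trip C' ],
);
print join('|', @$_), "\n" for @merged;
```

Tracking the order in a separate array matters because Perl hashes do not preserve insertion order.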
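It's not perfect, but it should give you a big push. For example, you will need to add some cruft to remove the extra name in the flattened description column.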
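my $data = parseFile($path);
# flatten_record returns the merged record, so store the result back
$_ = flatten_record($_) for @$data;
writeFile($newpath, $data);
# Column names, in file order
sub csv_cols { qw/ id phone name desc / }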
# Build a Text::CSV object with the settings shared by reader and writer
sub get_csv {
    my $csv = Text::CSV->new({
        sep_char     => ',',
        always_quote => 1,
        quote_char   => '"',
        escape_char  => '"',
        binary       => 1,
        auto_diag    => 1,
    });
    return $csv;
}
# Subroutine to parse the csv file and group its rows by phone number
sub parseFile {
    my ($file) = @_;
    die "CSV file not passed in subroutine\n"
        unless $file;
    my $csv = get_csv();
    # Open file for reading
    open(my $fh, '<', $file)
        or die "Could not open '$file' $!\n";
    $csv->column_names( csv_cols() );
    # Make a hash of arrays containing the rows for each phone number
    my %by_phone;
    for my $row ( @{$csv->getline_hr_all($fh)} ) {
        my $phone = $row->{phone};
        $by_phone{$phone} = [] unless $by_phone{$phone};
        push @{$by_phone{$phone}}, $row;
    }
    close $fh;
    return [ values %by_phone ];
}
# Collapse the rows for one phone number into a single record
sub flatten_record {
    my ($record) = @_;
    die "Empty record." if @$record == 0;
    if ( @$record == 1 ) {
        $record = $record->[0];
    } else {
        $record = {
            id    => $record->[0]{id},
            phone => $record->[0]{phone},
            name  => $record->[0]{name},
            # Join every description, not just the first two
            desc  => join( ' AND ', map { $_->{desc} } @$record ),
        };
    }
    return $record;
}
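sub writeFile {
    my ( $path, $data ) = @_;
    open my $fh, ">", $path
        or die "Error opening '$path' for writing- $!\n";
    my $csv = get_csv();
    # Terminate each written record with a newline
    $csv->eol("\n");
    for my $record ( @$data ) {
        my @row = @{$record}{ csv_cols() };
        # print() expects an array reference
        $csv->print( $fh, \@row );
    }
    close $fh
        or die "Error closing '$path'- $!\n";
}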
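The answer mentions that the extra name still has to be removed from the flattened description. One possible sketch, using only core Perl, is below. The helper name `join_descs` is hypothetical, and it assumes every description after the first starts with the trimmed name followed by a comma, as in the sample data.

```perl
use strict;
use warnings;

# Hypothetical helper: join descriptions with " AND ", dropping the
# leading "NAME, " prefix from every description after the first.
sub join_descs {
    my ($name, @descs) = @_;
    # Trim surrounding whitespace from the name ("DOE, JOHN " in the sample).
    (my $clean = $name) =~ s/^\s+|\s+$//g;
    my ($first, @rest) = @descs;
    # \Q...\E quotes the name so its characters are matched literally.
    s/^\s*\Q$clean\E,\s*// for @rest;
    return join ' AND ', $first, @rest;
}

print join_descs(
    'DOE, JOHN ',
    ' DOE, JOHN, your trip A',
    ' DOE, JOHN, your trip B',
), "\n";
```

A helper like this could replace the plain join of descriptions in flatten_record, with the name field passed as the first argument.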