How to use Perl Text::CSV to combine CSV rows based on a duplicate field

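I want to write a Perl script that:

1. Periodically monitors a directory for incoming CSV files
2. When a file is detected, opens it, reads it, and merges any rows whose second field/column has the same value
3. Writes the updated CSV file to a new directory, and finally
4. Deletes the input file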

For example, I have a CSV file containing the following:

"101","5555555555","DOE, JOHN "," DOE, JOHN, your trip
tomorrow from, 123 Anywhere St Apt #A, to, 100 ELSEWHERE RD APT E, is
scheduled for pickup between, 1:00 PM, and 1:30 PM"
"102","5555555555","DOE, JOHN "," DOE, JOHN, your trip
tomorrow from, 100 ELSEWHERE RD APT E, to, 123 Anywhere St Apt #A, is
scheduled for pickup between, 9:00 PM, and 9:30 PM"

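I want the script to read and parse the file, detect duplicate values in the second field ("5555555555"), and then create a new CSV file in which the records above are merged into a single record, like this: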

"101","5555555555","DOE, JOHN "," DOE, JOHN, your trip
tomorrow from, 123 Anywhere St Apt #A, to, 100 ELSEWHERE RD APT E, is
scheduled for pickup between, 1:00 PM, and 1:30 PM AND your trip
tomorrow from, 100 ELSEWHERE RD APT E, to, 123 Anywhere St Apt #A, is
scheduled for pickup between, 9:00 PM, and 9:30 PM"

My current Perl code successfully detects, reads, and parses the file, but I don't know how to detect the duplicates and combine the rows.

#!
use strict;
use warnings;
use File::Find;
use Text::CSV;
$| = 1;
use constant {
    #Check for CSV files only
    SUFFIX_LIST => qr/\.(csv)$/,
    DIR_TO_CHECK => "/Users/Me/Desktop/INBOUND/",
};
my @file_list;
while (1) {
    #Recursively search the input directory for CSV files
    find ( sub {
            return unless -f;
            return unless $_ =~ SUFFIX_LIST;
                #Make sure all of the files in the file list array are unique
                if(!(grep(/^$_$/, @file_list))) {
                    push @file_list, $File::Find::name;
                }
           }, DIR_TO_CHECK 
    );
#If .csv files are found...
if (scalar(@file_list) > 0) {
    print "\nNew Item in Directory\n";
    parseFile($file_list[0]);
    #Delete input file
    unlink $file_list[0];
    print "Deleted File\n";
    #Remove the file from the file list
    shift @file_list;
} else {
    print "No New Item\n";
}
sleep 5;
}
#Subroutine to parse and compare the csv file
sub parseFile {
my $csv = Text::CSV->new({ sep_char     => ',',
                       always_quote => 1,
                       quote_char   => '"',
                       escape_char  => '"',
                       binary       => 1,
                       auto_diag    => 1});
#Get the file that was passed to the function
my $file = $_[0] or die "CSV file not passed in subroutine\n";
#Open file for reading
open(my $data, '<', $file) or die "Could not open '$file' $!\n";
while (my $line = <$data>) {
    print $line;
    if ($csv->parse($line)) {
        my @fields = $csv->fields();
    } else {
        #warn "Line could not be parsed: $line\n";
        Text::CSV->error_input();
    }
}
close $data;
}

I think the function I have is wrong, because I suspect I need to read the file into memory as a whole rather than line by line. Please help, thanks.

I don't do much Perl these days, but here is my answer. Create a hash table keyed on the second field, like this:

$hashtbl{"5555555555"} = {
                    id => 101,                         # first field
                    names => "DOE, JOHN",              # third field
                    msg => "DOE, JOHN, your trip..."   # last field
                    };

If the key already exists in the hash table, append to its msg:

if (exists $hashtbl{$KEY}) {
    $hashtbl{$KEY}->{msg} .= " AND $last_field";
}
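
To put that check in context, here is a minimal sketch that runs it over rows that have already been parsed into array references. The sample rows and the `@order` array (used only to keep the first-seen order of the keys) are assumptions for illustration and are not part of the answer above.

```perl
use strict;
use warnings;

# Hypothetical sample rows, already parsed: [id, phone, name, msg]
my @rows = (
    [ 101, '5555555555', 'DOE, JOHN ', 'trip A' ],
    [ 102, '5555555555', 'DOE, JOHN ', 'trip B' ],
    [ 103, '4444444444', 'ROE, JANE ', 'trip C' ],
);

my %hashtbl;   # phone number => merged record
my @order;     # phone numbers in first-seen order (assumption: output order matters)

for my $row (@rows) {
    my ($id, $phone, $name, $msg) = @$row;
    if (exists $hashtbl{$phone}) {
        # Duplicate key: append this message to the existing record
        $hashtbl{$phone}{msg} .= " AND $msg";
    } else {
        # First occurrence: remember the whole record
        $hashtbl{$phone} = { id => $id, names => $name, msg => $msg };
        push @order, $phone;
    }
}

# Rebuild plain rows in first-seen order
my @merged = map {
    my $rec = $hashtbl{$_};
    [ $rec->{id}, $_, $rec->{names}, $rec->{msg} ]
} @order;

printf "%s | %s | %s\n", $_->[0], $_->[1], $_->[3] for @merged;
```

A plain hash does not remember insertion order, which is why the sketch tracks the keys separately; if the order of the output rows does not matter, `@order` can be dropped and `keys %hashtbl` used instead.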

After reading the whole file, use this hash table to create the new CSV file.

That should do it.
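
Since the sample records contain line breaks inside a quoted field, reading with `<$fh>` splits a record in the middle. The sketch below reads with Text::CSV's `getline` instead, merges on the second field, and writes the result with `print`. It is only an illustration: the input string is made up, and in-memory filehandles stand in for the real input and output files.

```perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical input with a line break inside the quoted fourth field
my $input = qq{"101","5555555555","DOE, JOHN ","first\ntrip"\n}
          . qq{"102","5555555555","DOE, JOHN ","second\ntrip"\n};

my $csv = Text::CSV->new({ binary => 1, always_quote => 1, auto_diag => 1 });

open my $in, '<', \$input or die "Cannot open in-memory input: $!";
my (%seen, @merged);
# getline reads one complete record, even when it spans several lines
while (my $row = $csv->getline($in)) {
    my $key = $row->[1];                    # second field
    if (my $first = $seen{$key}) {
        $first->[3] .= " AND $row->[3]";    # append to the first record seen
    } else {
        $seen{$key} = $row;
        push @merged, $row;                 # keeps first-seen order
    }
}
close $in;

my $output = '';
open my $out, '>', \$output or die "Cannot open in-memory output: $!";
$csv->eol("\n");                            # line ending for written records
$csv->print($out, $_) for @merged;
close $out;
print $output;
```

With real files, the two `open` calls would take the input and output paths instead of references to strings.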

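This isn't perfect, but it should give you a big push in the right direction. For example, you will still need to add some code to strip the extra name out of the flattened description column.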

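use Text::CSV;

# $path is the input file, $newpath the output file
my $data = parseFile($path);
my @flat = map { flatten_record($_) } @$data;   # keep the merged rows that flatten_record returns
writeFile($newpath, \@flat);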

sub csv_cols { return qw/ id phone name desc / }

# Build a Text::CSV object with the same settings for reading and writing
sub get_csv {
    return Text::CSV->new({
        sep_char     => ',',
        always_quote => 1,
        quote_char   => '"',
        escape_char  => '"',
        binary       => 1,
        auto_diag    => 1,
    });
}

# Subroutine to parse the CSV file and group its rows by phone number
sub parseFile {
    my ($file) = @_;
    die "CSV file not passed in subroutine\n"
         unless $file;
    my $csv = get_csv();
    # Open file for reading
    open(my $fh, '<', $file)
         or die "Could not open '$file' $!\n";
    $csv->column_names( csv_cols() );
    # Build a hash of arrays: phone number => list of rows with that number
    my %by_phone;
    for my $row ( @{$csv->getline_hr_all($fh)} ) {
        my $phone = $row->{phone};
        push @{ $by_phone{$phone} }, $row;   # autovivifies the array on first use
    }
    close $fh;
    # Note: values() returns the groups in no particular order
    return [ values %by_phone ];
}

# Merge a group of rows (same phone number) into a single row
sub flatten_record {
    my ($record) = @_;
    die "Empty record." if @$record == 0;
    return {
        id    => $record->[0]{id},
        phone => $record->[0]{phone},
        name  => $record->[0]{name},
        # Join every description in the group, not just the first two
        desc  => join( ' AND ', map { $_->{desc} } @$record ),
    };
}
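sub writeFile {
    my ( $path, $data ) = @_;
    open my $fh, ">", $path
        or die "Error opening '$path' for writing- $!\n";
    my $csv = get_csv();
    $csv->eol("\n");                        # end each written record with a newline
    for my $record ( @$data ) {             # iterate over the rows, not the array reference
        my @row = @{$record}{ csv_cols() };
        $csv->print( $fh, \@row );          # print expects an array reference
    }
    close $fh or die "Error closing '$path' - $!\n";
}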
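
The name clean-up mentioned at the top of this answer could be sketched roughly as follows. `strip_name` is a hypothetical helper, and it assumes that every description starts with the same name as the `name` column followed by a comma, as in the sample data; the shortened descriptions are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: remove a leading "NAME, " from a description.
sub strip_name {
    my ($name, $desc) = @_;
    $name =~ s/^\s+|\s+$//g;                 # trim the name column
    $desc =~ s/^\s*\Q$name\E\s*,\s*//;       # drop the name prefix if present
    return $desc;
}

# Hypothetical group of rows sharing one phone number
my @group = (
    { name => 'DOE, JOHN ', desc => ' DOE, JOHN, your trip at 1:00 PM' },
    { name => 'DOE, JOHN ', desc => ' DOE, JOHN, your trip at 9:00 PM' },
);

# Keep the first description whole; strip the name from the rest
my ($first, @rest) = @group;
my $desc = join ' AND ', $first->{desc},
                         map { strip_name($_->{name}, $_->{desc}) } @rest;
print "$desc\n";
```

The same `join`/`map` expression could replace the `desc` line in `flatten_record` if this assumption holds for the real data.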
