Modifying an input file with Perl



I wrote a Perl program that takes two text files as input.

The first file contains sequences and their probabilities in this format:

good morning 0.5

The second file contains every word and its probability, in this format:

good 0.5
morning 0.6

For each sequence, my script computes this formula:

log( prob(sequence) / ( (prob(word1) - prob(sequence)) * (prob(word2) - prob(sequence)) ) )

The problem is that in some cases prob(sequence), prob(word1), and prob(word2) are equal, so the denominator is zero and I get "Illegal division by zero".
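
A minimal reproduction of the failure, assuming only the sample values shown above (the `%vocab` hash and the `$denominator` variable are illustrative names, not part of the script):

```perl
use strict;
use warnings;

my $prob  = 0.5;                            # prob("good morning") from the first file
my %vocab = (good => 0.5, morning => 0.6);  # word probabilities from the second file

# Denominator of the formula: product of (prob(word) - prob(sequence)).
my $denominator = 1;
$denominator *= $vocab{$_} - $prob for qw(good morning);

print "denominator = $denominator\n";

# Dividing by it dies; eval is used here only to capture the message.
my $ratio = eval { $prob / $denominator };
print "error: $@" if $@;
```

Because prob(good) equals prob(sequence), one factor is exactly zero and the whole product collapses to zero, whatever the other factors are.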

In those cases, is there a way to change the values from the second file by adding a small number (smoothing)?

#!/usr/bin/perl
use strict; ## PLE
use warnings;
my $inFile = "file1.txt";
my $outFile ="TEST.txt";
my %hashFR = getVocab("file2.txt");
my @result;
my $bloc = 50000;
my $cmp = 0;
open fileIn, "<$inFile" or die $!;
while (<fileIn>) {
    chomp;
    my $flag = 0;
    my $ligne = $_;
    my @words = getWords($ligne);
    if (my $prob = pop @words) {
        $prob =~ s/\(//g;   # strip any opening parenthesis from the probability
        my $probWords = 1;
        foreach my $word (@words) {
            my $probWord;
            if (exists $hashFR{$word}) {
                $probWord = $hashFR{$word};
            }
            $probWords *= $probWord-$prob;
        }
        my $calc = $prob*log2($prob/($probWords));
        my $result10 = sprintf("%.10f", $calc);
        push @result, join(' ',@words) ." (".$result10.")\n";
    }
}
#if(scalar(@result) == $bloc)
{
    $cmp += $bloc;
    print "$cmp lignes traités\n";
    writeToResultFile($outFile,@result);
    @result = ();
}
sub getWords {
    my ($ligne) = @_;   # take the line from the argument list, not the global $_
    my @words = split(' ', $ligne);
    return @words;
}
sub getVocab {
    my ( $filename ) = @_;
    my %hash = ();
    open fileVocab, "<$filename" or die $!;
    while (<fileVocab>) {
        chomp;
        if (2 == (my($mot, $prob) = split( / / ))) {
            $hash{trim($mot)} = trim($prob);
        }
    }
    close fileVocab;
    return %hash;
}
sub writeToResultFile {
    my ($filename,@res) = @_;
    open(INFO, ">>$filename") or die "Cannot open $filename: $!";
    foreach ( @res) {
        print INFO $_;
    }
    close INFO;
}
sub log2 {
    my $n = shift;
    return log($n) / log(2);   # base-2 logarithm via change of base
}
sub trim {
    my $string = shift;
    $string =~ s/^\s+//;   # strip leading whitespace
    $string =~ s/\s+$//;   # strip trailing whitespace
    return $string;
}

You can use exception handling like this:

my $calc;
eval {
    $calc = $prob*log2($prob/($probWords));
};
if ($@) {
    $calc = 0;   # or whatever suits you
}

Or, more simply:

my $calc = eval { $prob*log2($prob/($probWords)) } // 'NaN';
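
If you would rather smooth than catch the exception, here is a minimal sketch of one way to do it. It does not rewrite file2.txt; instead it floors each difference prob(word) - prob(sequence) at a small value in memory. The helper name `smoothed_denominator` and the value of `$EPSILON` are assumptions for illustration, and this flooring is a simple workaround, not a standard smoothing method:

```perl
use strict;
use warnings;

# Assumed floor for each difference; tune it for your data.
my $EPSILON = 1e-6;

# Hypothetical helper: product of (prob(word) - prob(sequence)),
# with every zero or negative difference replaced by $epsilon.
sub smoothed_denominator {
    my ($prob, $epsilon, @word_probs) = @_;
    my $product = 1;
    for my $p (@word_probs) {
        my $diff = $p - $prob;
        $diff = $epsilon if $diff < $epsilon;   # floor the difference
        $product *= $diff;
    }
    return $product;
}

my $prob = 0.5;   # prob("good morning")
my $den  = smoothed_denominator($prob, $EPSILON, 0.5, 0.6);
my $calc = $prob * log($prob / $den) / log(2);
printf "%.10f\n", $calc;
```

Since every factor is strictly positive after flooring, the division and the logarithm are always defined. The result depends heavily on the chosen epsilon, so scores for smoothed sequences are not directly comparable with the others.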
