I wrote a Perl program that takes two text files as input.
The first file contains sequences and their probabilities, in this format:
good morning 0.5
The second file contains every word and its probability, in this format:
good 0.5
morning 0.6
For each sequence, my script computes this formula:
log( prob(sequence) / ( (prob(word1) - prob(sequence)) * (prob(word2) - prob(sequence)) ) )
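The problem is that in some cases prob(sequence) is equal to prob(word1) or prob(word2), so I get "Illegal division by zero".
In that case, is there a way to change the values from the second file by adding a small number (smoothing)?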
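The sample values above already trigger the error, because prob(good) equals prob(good morning). A minimal sketch that reproduces it in isolation (the variable names here are illustrative and are not taken from the script below):

```perl
use strict;
use warnings;

# Sample values from the two input files shown above.
my $prob_seq  = 0.5;                              # "good morning"
my %prob_word = (good => 0.5, morning => 0.6);

# Product of (prob(word) - prob(sequence)) over the words of the sequence.
my $denominator = 1;
$denominator *= $prob_word{$_} - $prob_seq for qw(good morning);

# prob(good) - prob(sequence) is 0, so the whole product is 0
# and the division dies.
my $ok = eval { my $ratio = $prob_seq / $denominator; 1 };

print "denominator = $denominator\n";
print "error: $@" unless $ok;
```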
#!/usr/bin/perl
use strict; ## PLE
use warnings;
my $inFile = "file1.txt";
my $outFile ="TEST.txt";
my %hashFR = getVocab("file2.txt");
my @result;
my $bloc = 50000;
my $cmp = 0;
open fileIn, "<$inFile" or die $!;
while (<fileIn>) {
chomp;
my $flag = 0;
my $ligne = $_;
my @words = getWords($ligne);
if (my $prob = pop @words) {
$prob =~ s/\(//g;
my $probWords = 1;
foreach my $word (@words) {
my $probWord;
if (exists $hashFR{$word}) {
$probWord = $hashFR{$word};
}
$probWords *= $probWord-$prob;
}
my $calc = $prob*log2($prob/($probWords));
my $result10 = sprintf("%.10f", $calc);
push @result, join(' ',@words) ." (".$result10.")\n";
}
}
#if(scalar(@result) == $bloc)
{
$cmp += $bloc;
print "$cmp lignes traités\n";
writeToResultFile($outFile,@result);
@result = ();
}
sub getWords {
my ($ligne) = @_;
my @words = split(' ', $ligne);
return @words;
}
sub getVocab {
my ( $filename ) = @_;
my %hash = ();
open fileVocab, "<$filename" or die $!;
while (<fileVocab>) {
chomp;
if (2 == (my($mot, $prob) = split( / / ))) {
$hash{trim($mot)} = trim($prob);
}
}
close fileVocab;
return %hash;
}
sub writeToResultFile {
my ($filename,@res) = @_;
open(INFO, ">>$filename") or die $!;
foreach ( @res) {
print INFO $_;
}
close INFO;
}
sub log2 {
my $n = shift;
return (log($n)/log(10))/(log(2)/log(10));
}
sub trim($) {
my $string = shift;
$string =~ s/^\s+//;
$string =~ s/\s+$//;
return $string;
}
You can use exception handling like this:
my $calc;
eval {
$calc = $prob*log2($prob/($probWords));
};
if ($@){
$calc = 0;#or whatever suits you
}
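As a self-contained sketch of the same idea, the hypothetical helper `safe_calc` below wraps the calculation in `eval` and returns a fallback when it dies. Besides "Illegal division by zero", `eval` also catches the "Can't take log of ..." error that Perl raises when the argument to `log` is zero or negative, which can happen here when the product of differences is negative. The helper name and the fallback values are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns prob * log2(prob / probWords), or $fallback
# if the calculation dies (division by zero, or log of a non-positive number).
sub safe_calc {
    my ($prob, $probWords, $fallback) = @_;
    my $calc;
    # The trailing 1 makes eval return true on success, so a failure is
    # detected without relying on the contents of $@.
    my $ok = eval {
        $calc = $prob * log($prob / $probWords) / log(2);
        1;
    };
    return $ok ? $calc : $fallback;
}

print safe_calc(0.5, 0.25, 'NaN'), "\n";   # normal case
print safe_calc(0.5, 0,    'NaN'), "\n";   # division by zero is caught
print safe_calc(0.5, -0.1, 'NaN'), "\n";   # log of a negative number is caught
```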
Or, more concisely (the `//` defined-or operator requires Perl 5.10 or later):
my $calc = eval { $prob*log2($prob/($probWords)) } // 'NaN';
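If you would rather smooth the values than catch the error, one option is to replace a zero difference with a small constant, which has the same effect as adding that constant to the word probability. The sketch below is only an illustration under stated assumptions: `smoothed_diff` is a hypothetical helper, the value of `$EPSILON` is arbitrary, and the adjustment is applied in memory rather than written back to the second file.

```perl
use strict;
use warnings;

# Assumed smoothing constant; the value is illustrative, not from the question.
my $EPSILON = 1e-6;

# Hypothetical helper: returns prob(word) - prob(sequence), substituting
# $EPSILON when the two probabilities are equal. This equals
# (prob(word) + $EPSILON) - prob(sequence) in that case.
sub smoothed_diff {
    my ($probWord, $prob) = @_;
    my $diff = $probWord - $prob;
    return $diff == 0 ? $EPSILON : $diff;
}

# Sample values from the question.
my %hashFR = (good => 0.5, morning => 0.6);
my $prob   = 0.5;

my $probWords = 1;
$probWords *= smoothed_diff($hashFR{$_}, $prob) for qw(good morning);

my $calc = $prob * log($prob / $probWords) / log(2);
printf "%.10f\n", $calc;
```

The resulting score depends strongly on the chosen constant, so pick it with your data in mind. This sketch only handles the equal case; a word probability lower than the sequence probability can still make the product negative and cause `log` to die.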