Comparing files with multiple columns

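I am doing a directory cleanup to check for files that are not being used in our testing environment. I have a text file listing all the file names, sorted alphabetically, and another file I want to compare it against.

Here is how the first file is set up: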

test1.pl
test2.pl
test3.pl

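It is a simple text file with one script name per line, listing every script in the directory I want to clean up based on the other file, described below.

The file I want to compare against is a tab-delimited file that lists the scripts each server runs as tests, and there are obviously many duplicates. I want to strip the test script names out of this file and write them to another file, using uniq and sort, so that I can diff it against the file above to see which test scripts are not being used.

The file is set up like this: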

server: : test1.pl test2.pl test3.pl test4.sh test5.sh

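Some lines have fewer entries and some have more. My first impulse was to write a Perl script that splits each line and pushes the values into a list if they are not already there, but that seems wholly inefficient. I am not very experienced with awk, but I figured there is more than one way to do it. Any other ideas for comparing these files?

A Perl solution that builds a %needed hash of the files used by the servers, then checks it against the file containing all the file names.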

#!/usr/bin/perl
use strict;
use warnings;
use Inline::Files;
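my %needed;
# Collect every script name that appears on a server line.
while (<SERVTEST>) {
    chomp;
    # Drop the leading server field; split on any whitespace so that
    # tab- or space-separated data both work.
    my (undef, @files) = split /\s+/;
    @needed{ @files } = (1) x @files;
}
# Report each known script that no server references.
while (<TESTFILES>) {
    chomp;
    if (not $needed{$_}) {
        print "Not needed: $_\n";
    }
}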
__TESTFILES__
test1.pl
test2.pl
test3.pl
test4.pl
test5.pl
__SERVTEST__
server1::   test1.pl    test3.pl
server2::   test2.pl    test3.pl
__END__
*** prints
C:Old_Dataperlp>perl t7.pl
Not needed: test4.pl
Not needed: test5.pl

Use awk to rearrange the file names in the second file so there is one per line, then diff that output against the first file.

diff file1 <(awk '{ for (i=3; i<=NF; i++) print $i }' file2 | sort -u)
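If only the unused names are wanted, without diff's markers, comm can be swapped in for diff. The following is a minimal sketch rather than part of the answer above: the sample data and the temporary directory are made up for illustration, and it assumes the script names start at the third whitespace-separated field, as in the question.

```shell
# Sketch only: sample data and temp paths are assumptions for illustration.
export LC_ALL=C                      # keep sort and comm collation consistent
workdir=$(mktemp -d)

# file1: every script in the directory, one per line.
printf '%s\n' test1.pl test2.pl test3.pl test4.pl test5.pl > "$workdir/file1"
# file2: tab-delimited server lines; script names start at field 3.
printf 'server1:\t:\ttest1.pl\ttest3.pl\nserver2:\t:\ttest2.pl\ttest3.pl\n' > "$workdir/file2"

# One script name per line, sorted and de-duplicated.
awk '{ for (i = 3; i <= NF; i++) print $i }' "$workdir/file2" | sort -u > "$workdir/used"
sort -u "$workdir/file1" > "$workdir/all"

# comm -23 keeps only the lines unique to the first input: the unused scripts.
unused=$(comm -23 "$workdir/all" "$workdir/used")
printf '%s\n' "$unused"

rm -rf "$workdir"
```

With this sample data the sketch prints test4.pl and test5.pl. comm expects both inputs to be sorted the same way, which is why both lists go through sort -u first.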

A quick and dirty script to do the job. If it looks good, switch to open with error checking to read the files.

use strict;
use warnings;
# Slurp both files via cat (quick and dirty, no error checking).
my @server_lines = `cat server_file`; chomp(@server_lines);
my @test_file_lines = `cat test_file_lines`; chomp(@test_file_lines);
foreach my $server_line (@server_lines) {
   # Strip the leading "server: : " prefix, leaving only the script names.
   $server_line =~ s!^\S+?:\s*:\s*!!;
   my @files_to_check = split(/\s+/, $server_line);
   foreach my $file_to_check (@files_to_check) {
      # Compare whole names; a regex match would treat "." as a wildcard.
      my @found = grep { $_ eq $file_to_check } @test_file_lines;
      if (scalar(@found) == 0) {
        print "$file_to_check is not found in $server_line\n";
      }
   }

}

If I understand your need correctly, you have a file with a list of tests (testfiles.txt):

test1.pl
test2.pl
test3.pl
test4.pl
test5.pl

and a file with a list of servers and the test files each one runs (serverlist.txt):

server1:        :       test1.pl        test3.pl
server2:        :       test2.pl        test3.pl

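(I have assumed that all the whitespace is tabs.)

If you convert the second file into a list of tested files, you can then compare it against the original file using diff.

cut -d: -f3 serverlist.txt | sed -e 's/^\t//g' | tr '\t' '\n' | sort -u > tested_files.txt

cut removes the server name and the ':', sed removes the leading tab left behind, tr then converts the remaining tabs into newlines, and finally sort -u sorts the result and removes duplicates. The output goes to tested_files.txt.

Then all you have to do is run diff testfiles.txt tested_files.txt.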
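As a worked example of this approach, here is a hedged sketch that builds the list of tested files and then prints only the names missing from it. It is a variation, not the exact pipeline above: it uses grep -vxFf in place of diff, and drops empty lines with sed '/^$/d' rather than matching a tab inside sed, since '\t' in a sed pattern is not portable. The temporary directory and sample contents are assumptions.

```shell
# Sketch only: sample data and temp paths are assumptions for illustration.
export LC_ALL=C
workdir=$(mktemp -d)

printf '%s\n' test1.pl test2.pl test3.pl test4.pl test5.pl > "$workdir/testfiles.txt"
printf 'server1:\t:\ttest1.pl\ttest3.pl\nserver2:\t:\ttest2.pl\ttest3.pl\n' > "$workdir/serverlist.txt"

# Keep everything after the second ':', put one name per line,
# drop the empty lines left by leading tabs, then sort and de-duplicate.
cut -d: -f3 "$workdir/serverlist.txt" | tr '\t' '\n' | sed '/^$/d' | sort -u > "$workdir/tested_files.txt"

# -F fixed strings, -x whole-line match, -f patterns from a file, -v invert:
# print the names in testfiles.txt that never appear in tested_files.txt.
untested=$(grep -vxFf "$workdir/tested_files.txt" "$workdir/testfiles.txt")
printf '%s\n' "$untested"

rm -rf "$workdir"
```

With the sample data this prints test4.pl and test5.pl, the same two names that diff would flag with a leading '<'.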

It's hard to say since you didn't post the expected output, but is this what you're looking for?

$ cat file1
test1.pl
test2.pl
test3.pl
$
$ cat file2
server: : test1.pl test2.pl test3.pl test4.sh test5.sh
$
$ gawk -v RS='[[:space:]]+' 'NR==FNR{f[$0]++;next} FNR>2 && !f[$0]' file1 file2
test4.sh
test5.sh
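A regular expression in RS is not guaranteed by POSIX awk, so here is a sketch of the same check that loops over fields instead. Like the gawk command, it reports names that appear in file2 but not in file1. The temporary directory is an assumption, and the trailing sort -u is an addition to collapse names repeated across several server lines.

```shell
# Sketch only: temp paths are assumptions; data mirrors file1/file2 above.
export LC_ALL=C
workdir=$(mktemp -d)

printf '%s\n' test1.pl test2.pl test3.pl > "$workdir/file1"
printf 'server: : test1.pl test2.pl test3.pl test4.sh test5.sh\n' > "$workdir/file2"

# First file: remember each known name. Second file: from field 3 onward,
# print every name that was not seen in the first file.
unknown=$(awk '
    NR == FNR { known[$0] = 1; next }
    { for (i = 3; i <= NF; i++) if (!($i in known)) print $i }
' "$workdir/file1" "$workdir/file2" | sort -u)
printf '%s\n' "$unknown"

rm -rf "$workdir"
```

Swapping the roles of the two files (remember the names from file2, then test each line of file1) would give the unused scripts instead.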
