Fastest way to find lines in file1 that contain any keyword from file2



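I have two files. The first (file1) has about 3,000 records like the sample shown below, and the second (file2) has about 100,000 records, also sampled below. I am basically grepping file1 for each entry in file2 and retrieving every matching line from file1. I am doing this with an ordinary for loop: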

for i in `cat file2.txt`; do cat file1 | grep -i -w $i; done > /var/tmp/file3.txt

Because the data is so large, this takes 8 hours to complete. I need your expertise so that I can do this efficiently, in under 2-3 hours.

Sample entries

file1

server1:user1:x:13621:22324:User One:/users/user1:/bin/ksh |  
server1:user2:x:14537:100:User two:/users/user2:/bin/bash |  
server1:user3:x:14598:24:User three:/users/user3:/bin/bash |  
server1:user4:x:14598:24:User Four:/users/user4:/bin/bash |  
server1:user5:x:14598:24:User Five:/users/user5:/bin/bash | 

file2

user1  
user2  
user3  

Give this a shot.

Test data:

%_Host@User> head file1.txt file2.txt
==> file1.txt <==
server1:user1:x:13621:22324:User One:/users/user1:/bin/ksh |
server1:user2:x:14537:100:User two:/users/user2:/bin/bash |
server1:user3:x:14598:24:User three:/users/user3:/bin/bash |
server1:user4:x:14598:24:User Four:/users/user4:/bin/bash |
server1:user5:x:14598:24:User Five:/users/user5:/bin/bash |
==> file2.txt <==
user1
user2
user3
#user4
%_Host@User>

Output:

    %_Host@User> ./2comp.pl file1.txt file2.txt   ; cat output_comp
    server1:user1:x:13621:22324:User One:/users/user1:/bin/ksh |
    server1:user3:x:14598:24:User three:/users/user3:/bin/bash |
    server1:user2:x:14537:100:User two:/users/user2:/bin/bash |
    %_Host@User>
    %_Host@User>

Script: please give it another try and double-check the argument order. file1 comes first, then file2: ./2comp.pl file1.txt file2.txt

%_Host@User> cat 2comp.pl
#!/usr/bin/perl
use strict ;
use warnings ;
# Arguments: records file (file1.txt) first, keyword file (file2.txt) second.
my ($records, $keywords, $output) = (@ARGV, "output_comp") ;
my (%hash, %seen) ;
die "Need 2 files!\n" unless @ARGV == 2 ;
# Load both files into %hash, keyed by file name and then by line.
for my $file (@ARGV) {
  open my $fh, '<', $file or die "Cannot open $file: $!\n" ;
  while (my $line = <$fh>) {
    # Blank out lines containing parentheses and strip trailing spaces.
    $line =~ s/^.+[()].+$| +?$//g ;
    chomp $line ;
    next unless length $line ;    # An empty keyword would match every record.
    $hash{$file}{$line} = $line ;
  }
  close $fh ;
}
open my $out, '>>', $output or die "Cannot open outfile: $!\n" ;
for my $key (keys %{$hash{$keywords}}) {
  for my $rec (keys %{$hash{$records}}) {
    # Case-insensitive match; needs at least one character on each side of the keyword.
    if ($rec =~ m/^.+?$key.+?$/i && !$seen{$rec}++) {
      print $out "$rec\n" ;
    }
  }
}
close $out ;
# End.
%_Host@User>
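
The script above still compares every keyword against every record. As a possible alternative (not part of the original script), the keyword file can be handed to a single grep process with -f, instead of starting one grep per keyword as in the loop from the question. The sketch below assumes the keywords are literal strings, one per line, and that the local grep supports -w (GNU and BSD grep do). The grep_demo directory and the keywords.txt file name are made up for the example; the sample data is copied from the test data above.

```shell
#!/bin/sh
# Sketch: one grep pass over file1 instead of one grep per keyword.
# Assumptions: keywords are literal strings; grep supports -F, -w, -i and -f.
set -eu

demo=grep_demo                 # hypothetical scratch directory
mkdir -p "$demo"

# Sample records, copied from the test data.
cat > "$demo/file1.txt" <<'EOF'
server1:user1:x:13621:22324:User One:/users/user1:/bin/ksh |
server1:user2:x:14537:100:User two:/users/user2:/bin/bash |
server1:user3:x:14598:24:User three:/users/user3:/bin/bash |
server1:user4:x:14598:24:User Four:/users/user4:/bin/bash |
server1:user5:x:14598:24:User Five:/users/user5:/bin/bash |
EOF

# Sample keywords, copied from the test data.
cat > "$demo/file2.txt" <<'EOF'
user1
user2
user3
#user4
EOF

# Strip trailing whitespace and drop blank lines first:
# an empty pattern would otherwise match every record.
sed -e 's/[[:space:]]*$//' -e '/^$/d' "$demo/file2.txt" > "$demo/keywords.txt"

# -F: fixed strings (no regex), -w: whole words only,
# -i: ignore case, -f: read all patterns from a file.
grep -F -w -i -f "$demo/keywords.txt" "$demo/file1.txt" > "$demo/file3.txt"

cat "$demo/file3.txt"
```

On the sample data this prints the user1, user2 and user3 records in the order they appear in file1.txt; the user4 record is skipped because the keyword file only contains #user4. Whether this meets the 2-3 hour target on the real files has not been measured here, so it is worth timing on a subset first.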

Thanks, and good luck.
