How to extract lines from two text files that are linked by an ID number in the first 10 characters

I have two files:

file1.txt:

0000001435 XYZ 与 ABC
0000001438warlaugh 世界

file2.txt:

0000001435 XYZ with abc
0000001436 DFC whatever
0000001437 FBFBBBF
0000001438 world of warlaugh

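The lines in the two files are linked by the number in the first 10 characters. The desired output is a tab-delimited file that contains each line present in file1.txt together with the corresponding line from file2.txt:

file3.txt: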

XYZ 与 ABC	XYZ with abc
warlaugh 世界	world of warlaugh

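How can I look up the corresponding lines and then write a tab-delimited file3.txt that covers only the lines present in file1.txt?

Note that only the first 10 characters form the ID. There are cases such as 0000001438warlaugh 世界 or even 0000001432231hahaha lol, where only 0000001438 and 0000001432 are the IDs.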
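To make the ID rule concrete, here is a small sketch of the split I have in mind. The names `ID_WIDTH` and `split_id` are made up for this illustration, and it assumes every line starts with a full 10-character ID.

```python
ID_WIDTH = 10  # assumption from the question: the ID is always the first 10 characters


def split_id(line):
    """Split a line into (ID, remaining text), trimming surrounding whitespace."""
    return line[:ID_WIDTH], line[ID_WIDTH:].strip()


# The ID may be followed by a space, or run straight into the text.
print(split_id("0000001438warlaugh 世界"))   # ('0000001438', 'warlaugh 世界')
print(split_id("0000001432231hahaha lol"))  # ('0000001432', '231hahaha lol')
```

Splitting by position rather than on whitespace is what keeps the second case from being read as the ID 0000001432231hahaha.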

I tried it with Python, getfile3.py:

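import io
# Map each 10-character ID to the rest of its line.
with io.open('file1.txt', 'r', encoding='utf8') as fin:
    f1 = {line[:10]: line[10:].strip() for line in fin}
with io.open('file2.txt', 'r', encoding='utf8') as fin:
    f2 = {line[:10]: line[10:].strip() for line in fin}
with io.open('file3.txt', 'w', encoding='utf8') as f3:
    for i in f1:
        if i in f2:  # skip IDs that have no match in file2.txt
            f3.write(u"{}\t{}\n".format(f1[i], f2[i]))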

But is there a bash/awk/grep/perl command-line way to get file3.txt?

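awk '
# Split every line into the 10-character ID and the remaining text.
{ key = substr($0, 1, 10); data = substr($0, 11); sub(/^[ \t]+/, "", data) }
# First file: remember the text for each ID.
NR == FNR { file1[key] = data; next }
# Second file: print a tab-separated pair for IDs seen in the first file.
key in file1 { print file1[key] "\t" data }
' file1.txt file2.txt > file3.txt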

If you prefer, you can use FIELDWIDTHS with GNU awk instead of substr().

A rather long Perl answer:

use warnings;
use strict;
# add files here as needed
my @input_files = qw(file1.txt file2.txt);
my $output_file = 'output.txt';
# don't touch anything below this line
my @output_lines = parse_files(@input_files);
open (my $output_fh, ">", $output_file) or die;
foreach (@output_lines) {
    print $output_fh "$_\n";                   #print to output file
    print "$_\n";                              #print to console
}
close $output_fh;
sub parse_files {
    my @input_files = @_;                       #list of text files to read.
    my %data;                                   #will store $data{$index} = datum1 datum2 datum3
    foreach my $file (@input_files) {           
        open (my $fh, "<", $file) or die;       
        while (<$fh>) { 
            chomp;                              
            if (/^(\d{10})\s?(.*)$/) {
                my $index = $1;
                my $datum = $2;
                if (exists $data{$index}) {
                    $data{$index} .= "\t$datum";
                } else {
                    $data{$index} = $datum;
                } #/else
            } #/if regex found
        } #/while reading current file
        close $fh;
    } #/foreach file
    # Create output array
    my @output_lines;
    foreach my $key (sort keys %data) {
        push (@output_lines, "$data{$key}");
    } #/foreach
    return @output_lines;
} #/sub parse_files
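
Note that the script above also prints IDs that occur in only one of the files (for example 0000001436), since every ID it sees ends up in %data. For comparison, here is a minimal Python sketch that keeps only IDs found in both inputs and preserves the order of the first one. The function name `join_by_id` and its `width` parameter are made up for this illustration, and it assumes each ID appears at most once per file.

```python
def join_by_id(lines1, lines2, width=10):
    """Yield tab-joined text for IDs present in both inputs, in lines1 order."""
    # Index the second input by its fixed-width ID prefix.
    second = {line[:width]: line[width:].strip() for line in lines2}
    for line in lines1:
        key = line[:width]
        if key in second:  # IDs missing from the second input are skipped
            yield line[width:].strip() + "\t" + second[key]


# Sample data copied from the question.
file1 = ["0000001435 XYZ 与 ABC\n", "0000001438warlaugh 世界\n"]
file2 = [
    "0000001435 XYZ with abc\n",
    "0000001436 DFC whatever\n",
    "0000001437 FBFBBBF\n",
    "0000001438 world of warlaugh\n",
]

for row in join_by_id(file1, file2):
    print(row)
```

Because the function accepts any iterable of lines, open file objects can be passed in directly instead of the lists used here.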
