There are two files, file1 and file2. Their contents are:
file1 (input)
Line1
Line2
Line3
Line4
file2 (input)
<head>
<intro> This is an introduction </intro>
<line> this is a line1 </line>
</head>
<head>
<intro> This is another intro </intro>
<line> this is a line2 </intro>
</head>
<head>
<intro> This is an introduction </intro>
<line> this is a line3 </line>
</head>
<head>
<intro> This is another intro </intro>
<line> this is a line4 </intro>
</head>
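I want to read file1 and replace the values of the <line> tags in file2 with Line1, Line2, Line3 and Line4 (see the output). What is the simplest way to do this (sed, awk, grep, Perl, Python, ...)?
Output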
<head>
<intro> This is an introduction </intro>
<line> Line1 </line>
</head>
<head>
<intro> This is another intro </intro>
<line> Line2 </intro>
</head>
<head>
<intro> This is an introduction </intro>
<line> Line3 </line>
</head>
<head>
<intro> This is another intro </intro>
<line> Line4 </intro>
</head>
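If you think this is a duplicate, please link to the duplicate. I searched through similar-looking questions but did not find a solution.
Edit: in case someone wants to append/concatenate instead of replace, the markline expression in @cdarke's Python 2 code below can easily be changed to the following: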
markline = re.sub(r'</line>$',''+subt+'</line>',markline)
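To make the difference between the two substitutions concrete, here is a minimal Python 3 sketch. The function names replace_value and append_value are made up for illustration, and the sample string is one line taken from file2 above.

```python
import re

def replace_value(markline, subt):
    """Replace whatever sits between <line> and </line> with subt."""
    # A function as replacement keeps any backslashes in subt literal.
    return re.sub(r'<line>.*</line>',
                  lambda m: '<line> ' + subt + ' </line>',
                  markline)

def append_value(markline, subt):
    """Keep the existing text and insert subt just before the closing </line>."""
    # '$' also matches just before a trailing newline, so lines read
    # from a file (which still end in '\n') are handled.
    return re.sub(r'</line>$', lambda m: subt + '</line>', markline)

line = '<line> this is a line1 </line>\n'
print(replace_value(line, 'Line1'), end='')
print(append_value(line, 'Line1'), end='')
```

The appended text lands directly before </line> with no extra space, exactly as the expression above does; add a space to the replacement if you want one.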
With GNU sed and bash process substitution:
sed -e '/<line>[^<]*<\/[^>]*>/{R '<(sed 's|.*| <line> & </line>|' file1) -e 'd;}' file2
Output:
<head>
<intro> This is an introduction </intro>
<line> Line1 </line>
</head>
<head>
<intro> This is another intro </intro>
<line> Line2 </line>
</head>
<head>
<intro> This is an introduction </intro>
<line> Line3 </line>
</head>
<head>
<intro> This is another intro </intro>
<line> Line4 </line>
</head>
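The simplest way is probably whichever one you are already familiar with. It is easy in Perl and Python (and Ruby and Lua) if you know those languages. "Simple" is subjective.
(Edited to add the spaces shown in the example.)
Here is a Python 2 version: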
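import re

# Replacement values, one per line of file1
lines = open('file1').readlines()

with open('file2') as fh:
    for markline in fh:
        if '<line>' in markline:
            # Take the next value, without its trailing newline
            subt = lines.pop(0).rstrip()
            markline = re.sub(r'<line>.*</line>', '<line> ' + subt + ' </line>',
                              markline)
        # Trailing comma: markline already ends with a newline
        print markline,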
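For anyone on Python 3, a rough port of the same idea could look like the sketch below. The function name substitute and the in-memory sample are my own additions for illustration; the regular expression is the one from the Python 2 code, so a line whose closing tag is not </line> is passed through unchanged (its value is still consumed).

```python
import re

def substitute(values, marked_lines):
    """Return marked_lines with each <line>...</line> value replaced in order."""
    values = [v.rstrip() for v in values]  # drop trailing newlines
    out = []
    for markline in marked_lines:
        if '<line>' in markline:
            subt = values.pop(0)
            # A function as replacement keeps backslashes in subt literal.
            markline = re.sub(r'<line>.*</line>',
                              lambda m: '<line> ' + subt + ' </line>',
                              markline)
        out.append(markline)
    return out

# Small in-memory sample standing in for file1 and file2.
sample = [
    '<head>\n',
    '<intro> This is an introduction </intro>\n',
    '<line> this is a line1 </line>\n',
    '</head>\n',
]
print(''.join(substitute(['Line1\n'], sample)), end='')
```

With the real files you would pass the two open file objects instead of the lists, since both are iterated line by line.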
Here is a Perl version:
use strict;
use warnings;

# Read the replacement values from file1, one per line
open(my $fh1, '<', 'file1') or die "Unable to open file1 for read: $!";
my @lines = <$fh1>;
chomp(@lines);
close($fh1);

open(my $fh2, '<', 'file2') or die "Unable to open file2 for read: $!";
while (<$fh2>) {
    # The e flag evaluates the replacement as Perl code, taking the next value
    s{<line>.*</line>}{'<line> ' . shift(@lines) . ' </line>'}e;
    print;
}
close($fh2);
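I have assumed that the mismatched closing tags in the input data (<line> ... </intro>) are typos.
The code I have shown works, but it is not flexible. All of these languages have several XML parsers, and really you should learn one of these languages together with an XML parser.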
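As a rough illustration of the parser route, here is a minimal Python 3 sketch using the standard library's xml.etree.ElementTree. It assumes the typos have been corrected so that the input is well-formed; the function name replace_line_text and the wrapping <root> element are my own inventions, needed because file2 has several top-level <head> elements and so is not a single XML document.

```python
import xml.etree.ElementTree as ET

def replace_line_text(fragment, values):
    """Set the text of each <line> element, in order, from values."""
    # Wrap the fragment in a made-up <root> so it parses as one document.
    root = ET.fromstring('<root>' + fragment + '</root>')
    for elem, value in zip(root.iter('line'), values):
        elem.text = ' ' + value + ' '
    # Serialize each <head> again, leaving the wrapper out.
    return ''.join(ET.tostring(head, encoding='unicode') for head in root)

fragment = (
    '<head>\n'
    '<intro> This is an introduction </intro>\n'
    '<line> this is a line1 </line>\n'
    '</head>\n'
    '<head>\n'
    '<intro> This is another intro </intro>\n'
    '<line> this is a line2 </line>\n'
    '</head>\n'
)
print(replace_line_text(fragment, ['Line1', 'Line2']), end='')
```

Unlike the regular-expression versions, this does not care whether a <line> element shares a physical line with other markup, but it will raise a parse error on the mismatched tags in the original input.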