A faster way to merge multiple files

I have a lot of small files (about 70,000) on Linux. I want to append a word to the end of every line of each file and then merge them all into a single file.

I am using this script:

for fn in *.sms.txt 
do 
    sed 's/$/'"$fn"'/' "$fn" >> sms.txt
    rm -f "$fn"
done

Is there a faster way to do this?

I tested with these files:

for ((i=1;i<70000;++i)); do printf -v fn 'file%.5d.sms.txt' "$i"; echo -e "HAHA\nLOL\nBye" > "$fn"; done

I tried your solution, and it took about 4 minutes (real) to process. The problem with your solution is that you are forking sed 70,000 times! Forking is rather slow.

#!/bin/bash
filename="sms.txt"
# Create file "$filename" or empty it if it already existed
> "$filename"
# Start editing with ed, the standard text editor
ed -s "$filename" < <(
   # Go into insert mode:
   echo i
   # Loop through files
   for fn in *.sms.txt; do
      # Loop through lines of file "$fn"
      while IFS= read -r l; do
         # Insert line "$l" with "$fn" appended to
         echo "$l$fn"
      done < "$fn"
   done
   # Tell ed to quit insert mode (.), to save (w) and quit (q)
   echo -e ".\nwq"
)

This solution took ca. 6 seconds.
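The gap between the two timings is consistent with the cost of starting one external process per file. A minimal sketch to get a feel for that cost on your own machine follows; the helper names `fork_loop` and `builtin_loop` and the iteration count are made up for illustration and are not part of either solution.

```shell
#!/bin/bash
# Hypothetical helpers: both loops do no useful work and differ only in
# whether each iteration starts an external process.
true_bin=$(type -P true)   # path of the external "true" program

fork_loop()    { local i; for ((i = 0; i < $1; ++i)); do "$true_bin"; done; }
builtin_loop() { local i; for ((i = 0; i < $1; ++i)); do :; done; }

# bash's "time" keyword reports real/user/sys on stderr
time fork_loop 1000
time builtin_loop 1000
```

The exact numbers depend on the machine, so none are quoted here.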

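Don't forget, ed is the standard text editor, so don't overlook it! If you enjoy ed, you will probably enjoy ex too!

Cheers!

Almost the same as gniourf_gniourf's solution, but without ed: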

for i in *.sms.txt 
do   
   while IFS= read -r line
   do
     echo "$line $i"
   done < "$i"
done >sms.txt

What, no love for awk?

awk '{print $0" "FILENAME}' *.sms.txt >sms.txt

With gawk, this took 1-2 seconds (according to time) on gniourf_gniourf's sample files on my machine.

mawk is about 0.2 seconds faster than gawk here.
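One caveat: with tens of thousands of files, the expanded `*.sms.txt` can run into the system's limit on command-line length, depending on name lengths and system settings. A possible workaround, sketched here under the assumption that bash and an xargs supporting `-0` (GNU or BSD) are available, is to let the `printf` builtin emit the names and have xargs feed them to awk in batches. The function name `merge_sms` is made up for this example.

```shell
#!/bin/bash
# merge_sms: hypothetical helper, not taken from the answers above.
merge_sms() {
    # printf is a shell builtin, so expanding the glob here does not
    # exec a program with a huge argument list. xargs -0 then runs awk
    # on as many names per batch as fit, in the glob's sorted order.
    printf '%s\0' *.sms.txt | xargs -0 awk '{ print $0 " " FILENAME }'
}
```

Called as `merge_sms > sms.txt` from the directory holding the files. awk may be started more than once, but still far fewer times than once per file.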

This Perl script appends the actual filename to the end of each line.

#!/usr/bin/perl
use strict;
while(<>){
    chomp;
    print $_, $ARGV, "\n";
}

Call it like this:

scriptname *.sms.txt > sms.txt

Since there is only one process and no regular expression processing is involved, it should be fast.
