How can I ignore newlines in a comparison script like the one below?


    #!/bin/bash
    function compare {
        for file1 in /dir1/*.csv
        do
            file2=/dir2/$(basename "$file1")
            if [[ -e "$file2" ]]    ### proceed only if a file2 with the same name as file1 exists ###
            then
                # print the lines of file2 that do not occur (as whole lines) in file1
                awk 'BEGIN {FS=","} NR == FNR{arr[$0];next} ! ($0 in arr)' "$file1" "$file2" > "/dirDiff/$(basename "$file1")_diff"
            fi
        done
    }
    function removeNULL {
        for i in /dirDiff/*_diff
        do
            if [[ ! -s "$i" ]]     ### if the diff file is empty (zero size) ###
            then
                rm -- "$i"
            fi
        done
    }
    compare
    removeNULL
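file1 and file2 are formatted files from two different sources. Source1 introduces arbitrary newlines that split one record into two, which causes the script to fail and produce incorrect diff output. I want my script to compare file1 and file2 while ignoring the newlines introduced by Source1. However, I am not sure how the script can tell the difference between an actual new record and an artificially introduced newline.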

    file1:-
    11447438218480362,6005560623,6005560623,11447438218480362,5,20160130103044,100,195031,,1,0,00,49256,0
    ,195031_5_00_6,0.1,6;
    11447691224860640,6997557634,6997557634,11447691224860640,601511,20160130103457,500,195035,,2,0,00,45394,0
    ,195035_601511_00_6,0.5,6;
    file2:-
    11447438218480362,6005560623,6005560623,11447438218480362,5,20160130103044,100,195031,,1,0,00,49256,0,195031_5_00_6,0.1,6;
    11447691224860640,6997557634,6997557634,11447691224860640,601511,20160130103457,500,195035,,2,0,00,45394,0,195035_601511_00_6,0.5,6;

Thanks for your support.

You can preprocess file1 so that every line that does not end with `;` is joined with the line that follows it:

    sed -r ':again; /;$/! { N; s/(.+)[\r\n]+(.+)/\1\2/g; b again; }' file1

After this step, file1 and file2 are comparable.
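As a rough sketch of the same preprocessing idea, the join step can also be written in awk and piped straight into the comparison, so no intermediate file is needed. This assumes that every complete record ends with `;` and that the stray line breaks never fall directly after a `;`. The function names `join_split_records` and `diff_records` are made up for illustration and are not part of the original script.

```shell
# Hypothetical helper: glue physical lines together until one ends with ';'.
# Assumes ';' terminates every complete record (as in the sample data).
join_split_records() {
    awk '{ sub(/\r$/, ""); buf = buf $0 }      # drop a trailing CR, append the line
         /;$/ { print buf; buf = "" }          # record is complete: emit it
         END { if (buf != "") print buf }      # emit a trailing unterminated record
    ' "$1"
}

# Hypothetical helper: print the records of $2 that are missing from the
# repaired version of $1 (same direction as the original awk comparison).
diff_records() {
    join_split_records "$1" |
        awk 'NR == FNR { seen[$0]; next } !($0 in seen)' - "$2"
}
```

Inside the `compare` loop, the awk line could then be replaced by something like `diff_records "$file1" "$file2" > "/dirDiff/$(basename "$file1")_diff"`. Note that if the first file is empty, `NR == FNR` also holds for the second file and nothing is printed.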
