Unix: find and replace consecutive commas with consecutive pipes



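I am converting a double-quoted CSV file to a pipe-delimited txt file on Unix. I used the sed command below to replace "," with | and then remove the leading and trailing double quotes.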

sed -e 's/","/|/g' -e 's/"//g' filenm.csv > filenm.txt

But the file also contains consecutive commas with no double quotes around them, and those are not replaced.

Col1|col2|col3|col4|col5|col6|col7|col8
Val1|val2|val3,,,,val7|val8

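Now I want to convert all of these consecutive commas to consecutive pipes, since they represent empty or null fields.

Other fields also contain commas inside their values, and those should not be changed.

I tried the command below, but it did not work.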

sed -e 's/,{1,}/|{1,}/g' filenm.csv > filenm.txt

Sample CSV file, as opened in Notepad:

"ID","Name","DOB","Age","Address","City","State","Country","Phone number"
"123","ABC","12/20/2020","15","No.38,3rd st, RRR NNN, TRT",,,,"9999999999"
"456","DEF","12/20/2020",,,,,"test-country","9999999999"
"465","XYZ",,,"No.38,3rd st, RRR NNN, TRT",,,,"9999999999"

I hope this helps to reproduce and solve the problem.

Thanks in advance.

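This might work for you (GNU sed):

sed -E ':a;s/^(("[^",]*",+)*"[^",]*),/\1\n/;ta;y/,\n/|,/' file

Iteratively replace the commas between double quotes with newlines, then translate the remaining commas to pipes and the newlines back to commas.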

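The one-liner above leaves the double quotes in the output, because y only transliterates single characters. As a minimal sketch, assuming GNU sed (for \n in the replacement and in y), appending s/"//g strips them as well. The function name csv_to_pipes_sed is hypothetical and used only for this illustration:

```shell
#!/bin/bash
# Sketch only: assumes GNU sed; csv_to_pipes_sed is an illustrative name.
csv_to_pipes_sed() {
  # 1. Loop: turn each comma inside a quoted field into a newline.
  # 2. y: remaining commas become pipes, newlines become commas again.
  # 3. s/"//g: drop the double quotes.
  sed -E ':a;s/^(("[^",]*",+)*"[^",]*),/\1\n/;ta;y/,\n/|,/;s/"//g' "$@"
}

printf '%s\n' '"465","XYZ",,,"No.38,3rd st, RRR NNN, TRT",,,,"9999999999"' | csv_to_pipes_sed
# 465|XYZ|||No.38,3rd st, RRR NNN, TRT||||9999999999
```
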
You can use perl:

perl -pe 's/"([^"]*)"|,/defined($1) ? $1 : "|"/ge' filenm.csv > filenm.txt

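Details

  • "([^"]*)"|, - a regex pattern that matches ", then captures any zero or more characters other than " into group 1, then matches "; or, in all other contexts, matches just ,
  • defined($1) ? $1 : "|" - the RHS (the replacement), which replaces the match with the group 1 value (if group 1 matched) or with | (if a , matched)
  • ge - g stands for global (replace all occurrences), and e makes Perl treat the RHS as a Perl expression

See the online demo: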

#!/bin/bash
s='"ID","Name","DOB","Age","Address","City","State","Country","Phone number"
"123","ABC","12/20/2020","0","No.38,3rd st, RRR NNN, TRT",,,,"9999999999"'
perl -pe 's/"([^"]*)"|,/defined($1) ? $1 : "|"/ge' <<< "$s"

Output:

ID|Name|DOB|Age|Address|City|State|Country|Phone number
123|ABC|12/20/2020|0|No.38,3rd st, RRR NNN, TRT||||9999999999

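Using awk:

awk -F \" '{ for(i=1;i<=NF;i++) { if ($i ~ /^[,]{2,}$/) { $i="," } } OFS="\"";gsub("\",\"","\"|\"",$0)}1' sample.csv

Explanation:

awk -F \" '{  # Set the field delimiter to a double quote
for(i=1;i<=NF;i++) {  # Loop through each field
if ($i ~ /^[,]{2,}$/) {  # If the field consists only of 2 or more commas
$i=","  # set that field to a single comma
} 
} 
OFS="\"";  # Rebuild the record with a double quote between fields
gsub("\",\"","\"|\"",$0)  # Substitute "|" for ","
}1' sample.csv

Note that, with GNU awk, this keeps the double quotes in the output and collapses each run of commas into a single separator, so the empty fields do not come out as consecutive pipes.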

I would use GNU AWK for this, as follows. Let the content of file.txt be

"ID","Name","DOB","Age","Address","City","State","Country","Phone number"
"123","ABC","12/20/2020","15","No.38,3rd st, RRR NNN, TRT",,,,"9999999999"
"456","DEF","12/20/2020",,,,,"test-country","9999999999"
"465","XYZ",,,"No.38,3rd st, RRR NNN, TRT",,,,"9999999999"

then

awk 'BEGIN{FS="\"";OFS=""}{for(i=1;i<=NF;i+=2){$i=gensub(/,/,"|","g",$i)};print $0}' file.txt

output

ID|Name|DOB|Age|Address|City|State|Country|Phone number
123|ABC|12/20/2020|15|No.38,3rd st, RRR NNN, TRT||||9999999999
456|DEF|12/20/2020|||||test-country|9999999999
465|XYZ|||No.38,3rd st, RRR NNN, TRT||||9999999999

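I assume that the first and last columns are never empty. I use " as the field separator; then, in each odd-numbered field (these contain only commas), I change every , to |. Finally, I print the line as modified.

(Tested in GNU Awk 5.0.1.)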
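
gensub is a GNU awk extension. For other awk implementations, a rough equivalent can be sketched with the standard gsub function, under the same assumption that the first and last columns are always quoted. The helper name csv_to_pipes_awk is hypothetical:

```shell
#!/bin/sh
# Sketch only: csv_to_pipes_awk is an illustrative name.
csv_to_pipes_awk() {
  awk 'BEGIN { FS = "\""; OFS = "" }
  {
    # Odd-numbered fields lie outside the quotes and hold only separators.
    for (i = 1; i <= NF; i += 2) gsub(/,/, "|", $i)
    $1 = $1   # force $0 to be rebuilt with the empty OFS even if nothing changed
    print
  }' "$@"
}

printf '%s\n' '"456","DEF","12/20/2020",,,,,"test-country","9999999999"' | csv_to_pipes_awk
# 456|DEF|12/20/2020|||||test-country|9999999999
```

The $1 = $1 assignment is there because gsub leaves a field untouched when nothing matches, in which case the record would otherwise be printed with its quotes intact.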
