How to search and replace between " " in Unix



Input:

20000000,"xxxxxxxxxxxxx,xxxxxxxxxxx",192.168.3.2
Exchange subsidary,Passed,00021423SNG,R-JAM-05-03,US (First Exchange),20000000,"JUDICIARY, STATE COURTS (STATE COURTS)",112.78.212.12/30,00052312SNG,R-JPODIU-023-07,US (First Exchange) ,20000000,"JUDICIARY, STATE COURTS (STATE COURTS)",112.78.224.213/30

Desired result:

20000000,"xxxxxxxxxxxxxxxxxxxxxxxx",192.168.3.2     
Exchange subsidary,Passed,00021423SNG,R-JAM-05-03,US (First Exchange),20000000,"JUDICIARY STATE COURTS (STATE COURTS)",112.78.212.12/30,00052312SNG,R-JPODIU-023-07,US (First Exchange) ,20000000,"JUDICIARY STATE COURTS (STATE COURTS)",112.78.224.213/30

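How can I remove the commas between the quotes? There are also lines that have no comma between the quotes.

I need to remove the comma inside "JUDICIARY, STATE COURTS (STATE COURTS)" (both occurrences appear on the same line).

Some lines have several fields with commas between the double quotes.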

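Here is a script that demonstrates how to do it; welcome to the world of goto in sed. It was written using BSD sed, which uses -E to enable extended regular expressions; GNU sed uses -r for the same job.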

sed -E -e 's/^/A: /p; s/^A: /B: /' \
       -e ':again' \
       -e 's/^(([^"]*|"[^",]*")*)("[^"]*),([^"]*")/\1\3\4/' \
       -e 't again' \
       data

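This assumes the data is in a file named data. The first -e simply echoes the original input prefixed with A: and then changes the prefix to B:. This is debugging material. The second -e creates the label again, which can be jumped to. The fourth -e jumps to the again label if the previous step made a substitution.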

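All the excitement is in the third -e. The pattern looks for the start of the line, followed by zero or more occurrences of either "a run of non-double-quotes" or "a double quote, zero or more characters that are neither a double quote nor a comma, and a double quote"; then a double quote, a run of non-double-quotes, a comma, more non-double-quotes, and a double quote. The match is replaced by the prefix, the part between the double quotes before the comma, and the part between the double quotes after the comma. Each pass therefore removes one comma from the first quoted field that still contains one, and the loop repeats until no quoted comma is left.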
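To make the loop easier to follow, here is a minimal Python sketch of the same idea, not part of the original answer. The function name strip_quoted_commas is hypothetical. The pattern keeps the group numbering of the sed expression, but the inner [^"]* is written as [^"], which matches the same strings while avoiding heavy backtracking in Python's regex engine.

```python
import re

# Mirrors the third -e expression of the sed script:
#   group 1: everything before the quoted field that still holds a comma
#            (single unquoted characters and complete comma-free quoted fields)
#   group 3: the opening quote plus the text before the comma
#   group 4: the text after the comma plus the closing quote
PATTERN = re.compile(r'^(([^"]|"[^",]*")*)("[^"]*),([^"]*")')


def strip_quoted_commas(line):
    """Remove one quoted comma per pass until a pass changes nothing.

    The while loop plays the role of ':again' and 't again' in sed.
    """
    while True:
        line, count = PATTERN.subn(r'\1\3\4', line, count=1)
        if count == 0:  # no substitution was made, so stop looping
            return line
```

Calling strip_quoted_commas on the line 201,"x,x",192.168.3.2,"y,y" takes two substituting passes, one per quoted comma, and a third pass that finds nothing and ends the loop.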

Given a data file:

2000,"xxxx,xxxx",192.168.3.2
2000,"xx,xx,xx",192.16.3.2
2000,"xxxxxxxx",192.168.3.2
20000000,"xxxxxxxxxxxx,xxxxxxxxxxxx",192.168.3.2,"yyyyy,yyyyy"
20000000,"xxxxxxxxxxxxx,xxxxxxxxxxx",192.168.3.2
20000000,"xxxxxxxxxxxxxxxxxxxxxxxx",192.168.3.2
201,"x,x",192.168.3.2,"y,y","aaaa,cccc,dddd",192,"zzzz",234
201,"x,x",192.168.3.2,"yyy"
201,"xx",192.168.3.2,"yyy",2211
201,"xxx",192.168.3.2,"y,y"
201,"xxx",192.168.3.2,"yyy"
201,"x,x",192.168.3.2,"y,y"
Exchange subsidary,Passed,00021423SNG,R-JAM-05-03,US (First Exchange),20000000,"JUDICIARY, STATE COURTS (STATE COURTS)",112.78.212.12/30,00052312SNG,R-JPODIU-023-07,US (First Exchange) ,20000000,"JUDICIARY, STATE COURTS (STATE COURTS)",112.78.224.213/30 

the script produces the output:

A: 2000,"xxxx,xxxx",192.168.3.2
B: 2000,"xxxxxxxx",192.168.3.2
A: 2000,"xx,xx,xx",192.16.3.2
B: 2000,"xxxxxx",192.16.3.2
A: 2000,"xxxxxxxx",192.168.3.2
B: 2000,"xxxxxxxx",192.168.3.2
A: 20000000,"xxxxxxxxxxxx,xxxxxxxxxxxx",192.168.3.2,"yyyyy,yyyyy"
B: 20000000,"xxxxxxxxxxxxxxxxxxxxxxxx",192.168.3.2,"yyyyyyyyyy"
A: 20000000,"xxxxxxxxxxxxx,xxxxxxxxxxx",192.168.3.2
B: 20000000,"xxxxxxxxxxxxxxxxxxxxxxxx",192.168.3.2
A: 20000000,"xxxxxxxxxxxxxxxxxxxxxxxx",192.168.3.2
B: 20000000,"xxxxxxxxxxxxxxxxxxxxxxxx",192.168.3.2
A: 201,"x,x",192.168.3.2,"y,y","aaaa,cccc,dddd",192,"zzzz",234
B: 201,"xx",192.168.3.2,"yy","aaaaccccdddd",192,"zzzz",234
A: 201,"x,x",192.168.3.2,"yyy"
B: 201,"xx",192.168.3.2,"yyy"
A: 201,"xx",192.168.3.2,"yyy",2211
B: 201,"xx",192.168.3.2,"yyy",2211
A: 201,"xxx",192.168.3.2,"y,y"
B: 201,"xxx",192.168.3.2,"yy"
A: 201,"xxx",192.168.3.2,"yyy"
B: 201,"xxx",192.168.3.2,"yyy"
A: 201,"x,x",192.168.3.2,"y,y"
B: 201,"xx",192.168.3.2,"yy"
A: Exchange subsidary,Passed,00021423SNG,R-JAM-05-03,US (First Exchange),20000000,"JUDICIARY, STATE COURTS (STATE COURTS)",112.78.212.12/30,00052312SNG,R-JPODIU-023-07,US (First Exchange) ,20000000,"JUDICIARY, STATE COURTS (STATE COURTS)",112.78.224.213/30 
B: Exchange subsidary,Passed,00021423SNG,R-JAM-05-03,US (First Exchange),20000000,"JUDICIARY STATE COURTS (STATE COURTS)",112.78.212.12/30,00052312SNG,R-JPODIU-023-07,US (First Exchange) ,20000000,"JUDICIARY STATE COURTS (STATE COURTS)",112.78.224.213/30 

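Note: this is hard. If you have a choice, use a tool that understands the CSV format. For example, Python ships with a csv module; Perl has Text::CSV (and the submodules Text::CSV_PP and Text::CSV_XS) that can handle this; and there are custom tools for manipulating CSV files.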
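As an illustration of the CSV-aware route, here is a minimal sketch using Python's standard csv module. The function name remove_commas_in_fields is hypothetical, and the sketch assumes the input is plain comma-separated text with double-quoted fields, as in the sample data.

```python
import csv
import io


def remove_commas_in_fields(text):
    """Parse CSV text, delete commas inside every field, and write it back."""
    out = io.StringIO()
    reader = csv.reader(io.StringIO(text, newline=""))
    writer = csv.writer(out, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    for row in reader:
        # The reader has already removed the quotes, so any comma that is
        # still inside a field was originally between double quotes.
        writer.writerow([field.replace(",", "") for field in row])
    return out.getvalue()
```

One difference from the sed output: once the commas are gone the fields no longer need quoting, so the writer emits them without quotes (2000,"xxxx,xxxx",192.168.3.2 becomes 2000,xxxxxxxx,192.168.3.2). The result is still valid CSV. If the quotes must be preserved exactly as they were, the sed loop is the closer fit.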

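Note also that the notation Microsoft supports differs slightly from RFC 4180, which is the Internet world's attempt to rationalize (approximately) the notation that Microsoft uses.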
