awk: ignore the pipe field delimiter inside double quotes



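I know this question has already been answered, but with a comma as the delimiter: "How to make awk ignore the field delimiter inside double quotes?"

My file, however, is pipe-delimited, and when I put the pipe in the regex it is interpreted as a regex metacharacter, so I do not get the correct output. I have not used awk extensively. My requirement is to put a backslash before each pipe character that is part of a value, that is, inside double quotes.

Since the file is almost 5 GB, I am thinking of selecting specific columns and escaping the pipes only there.

Input: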

"first | last | name" |" steve | white | black"| exp | 12
school |" home | school "| year | 2016
company |" private ltd "| joining | 2019

Expected output:

"first \| last \| name" |" steve \| white \| black "| exp | 12
school |" home | school "| year | 2016
company |" private ltd "| joining | 2019

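I tried using gsub in gawk, but without success. Is there another way to do this?

Also, if I have to check multiple columns, how would I do that?

Assumptions:

  • there can be multiple fields with embedded | characters (such fields will be wrapped in double quotes)
  • a single field may contain multiple embedded | characters
  • double quotes do not appear as embedded characters inside other double quotes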
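
The double-quote splitting used in the first idea below depends on the quotes being balanced on every line, so it may be worth checking that before processing a large file. A minimal sketch, assuming a POSIX awk; the helper name check_quotes is made up for illustration:

```shell
# check_quotes is a hypothetical helper name, not part of awk.
# Usage: check_quotes [FILE...]   (reads stdin when no file is given)
check_quotes() {
    awk -F'"' '
    # With the double quote as FS, NF is the number of quotes plus one,
    # so an even NF on a non-empty line means an odd number of quotes.
    NF > 0 && NF % 2 == 0 {
        printf "line %d: unbalanced double quotes\n", NR
        bad = 1
    }
    END { exit bad + 0 }   # non-zero exit status if any line was flagged
    ' "$@"
}
```

Running check_quotes pipe.dat should print nothing and exit with status 0 when every line has an even number of double quotes. Line numbers are cumulative (NR) if several files are passed.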

Setup:

$ cat pipe.dat
name |" steve | white "| exp | 12
school |" home | school "| year | 2016
company |" private ltd "| joining | 2019
food |"pipe | one"|"pipe | two and | three"| 2022        # multiple double-quoted fields, multiple pipes between double quotes
cars | camaro | chevy | 2033                             # no double quotes

NOTE: the comments are added here only to highlight the new cases

One awk idea:

awk '
BEGIN { FS=OFS="\"" }              # define field delimiters as double quote
      { for (i=2;i<=NF;i+=2)       # double quoted data resides in the even numbered fields
            gsub(/\|/,"\\|",$i)    # escape all pipe characters in field #i
        print
      }
' pipe.dat

This generates:

name |" steve \| white "| exp | 12
school |" home \| school "| year | 2016
company |" private ltd "| joining | 2019
food |"pipe \| one"|"pipe \| two and \| three"| 2022
cars | camaro | chevy | 2033

Assuming there is no whitespace between the | delimiters and the double quotes ...

One GNU awk idea (using the FPAT feature):

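awk -v FPAT='([^|]*)|("[^"]+")' '
BEGIN { OFS="|" }
      { for (i=1;i<=NF;i++)        # unquoted fields contain no pipes, so only quoted fields change
            gsub(/\|/,"\\|",$i)    # escape all pipe characters in field #i
        print
      }
' pipe.dat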

This also generates:

name |" steve \| white "| exp | 12
school |" home \| school "| year | 2016
company |" private ltd "| joining | 2019
food |"pipe \| one"|"pipe \| two and \| three"| 2022
cars | camaro | chevy | 2033
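
For the follow-up about restricting the escaping to particular columns, here is a sketch built on the same double-quote splitting idea. It assumes a POSIX awk and balanced double quotes. The function name escape_cols and the cols variable are made up for illustration. Columns are counted by the unquoted | delimiters seen so far on the line.

```shell
# escape_cols is a hypothetical helper name.
# Usage: escape_cols COLS [FILE...]   e.g. escape_cols 2,3 pipe.dat
escape_cols() {
    cols=$1
    shift
    awk -v cols="$cols" '
    BEGIN {
        FS = OFS = "\""                       # split each line on double quotes
        n = split(cols, c, ",")
        for (k = 1; k <= n; k++) want[c[k]] = 1
    }
    {
        col = 1                               # column number of the current segment
        for (i = 1; i <= NF; i++) {
            if (i % 2) {                      # odd segment: outside double quotes
                tmp = $i
                col += gsub(/[|]/, "", tmp)   # each unquoted pipe starts a new column
            } else if (col in want) {         # even segment: quoted data in a wanted column
                gsub(/[|]/, "\\|", $i)        # escape the embedded pipes
            }
        }
        print
    }
    ' "$@"
}
```

For example, escape_cols 3 pipe.dat would escape only the pipes inside a quoted third column and leave the other quoted columns alone. This has not been timed against a 5 GB file.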
