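I know this question has already been answered, but with a comma as the separator: How to make awk ignore the field separator inside double quotes?
My file is pipe-delimited, though, and when I put the pipe in a regex it is treated as a regex operator, so I don't get the right output. I haven't used awk extensively. My requirement is to put a backslash in front of a pipe character whenever it appears inside a value.
Since the file is almost 5 GB, I am thinking of selecting specific columns and escaping the pipes there.
Input: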
"first | last | name" |" steve | white | black"| exp | 12
school |" home | school "| year | 2016
company |" private ltd "| joining | 2019
Expected output:
"first \| last \| name" |" steve \| white \| black "| exp | 12
school |" home \| school "| year | 2016
company |" private ltd "| joining | 2019
I tried using gsub in gawk, but it didn't work. Is there another way?
Also, if I have to check multiple columns, how do I do that?
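Assumptions:
- there can be multiple fields with embedded | characters (such fields are wrapped in double quotes)
- a single field can contain multiple embedded | characters
- double quotes do not appear as embedded characters inside other double quotes
Setup: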
$ cat pipe.dat
name |" steve | white "| exp | 12
school |" home | school "| year | 2016
company |" private ltd "| joining | 2019
food |"pipe | one"|"pipe | two and | three"| 2022 # multiple double-quoted fields, multiple pipes between double quotes
cars | camaro | chevy | 2033 # no double quotes
NOTE: the comments are added here to highlight the new cases
One awk idea:
awk '
BEGIN { FS=OFS="\"" }                # define field delimiters as double quote
      { for (i=2;i<=NF;i+=2)         # double quoted data resides in the even numbered fields
            gsub(/\|/,"\\|",$i)      # escape all pipe characters in field #i
        print
      }
' pipe.dat
This generates:
name |" steve \| white "| exp | 12
school |" home \| school "| year | 2016
company |" private ltd "| joining | 2019
food |"pipe \| one"|"pipe \| two and \| three"| 2022
cars | camaro | chevy | 2033
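To see why the double-quoted data ends up in the even-numbered fields, it can help to print the fields awk produces when the line is split on the double quote. This is only an illustrative sketch; the helper name show_quote_fields is made up for this example and is not part of the answer above.

```shell
# Hypothetical helper: split each input line on double quotes
# and print every resulting field with its index.
show_quote_fields() {
    awk -F'"' '{ for (i = 1; i <= NF; i++) printf "%d: [%s]\n", i, $i }'
}

# Example: the "food" line from pipe.dat (without the trailing comment)
printf '%s\n' 'food |"pipe | one"|"pipe | two and | three"| 2022' | show_quote_fields
# prints:
# 1: [food |]
# 2: [pipe | one]
# 3: [|]
# 4: [pipe | two and | three]
# 5: [| 2022]
```

Fields 2 and 4 hold the text that was between double quotes, which is why the loop above starts at 2 and steps by 2.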
Assuming there are no spaces between the | delimiters and the double quotes ...
One GNU awk idea (using the FPAT feature):
awk -v FPAT='([^|]*)|("[^"]+")' '
BEGIN { OFS="|" }
      { for (i=1;i<=NF;i++)
            gsub(/\|/,"\\|",$i)
        print
      }
' pipe.dat
This also generates:
name |" steve \| white "| exp | 12
school |" home \| school "| year | 2016
company |" private ltd "| joining | 2019
food |"pipe \| one"|"pipe \| two and \| three"| 2022
cars | camaro | chevy | 2033
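The question also asks about checking only specific columns. Because FPAT numbers the pipe-delimited columns directly, one possible sketch is to pass a list of column numbers and escape only those. The function name escape_columns and the cols variable are hypothetical, the script assumes GNU awk is installed as gawk, and the same no-spaces assumption applies.

```shell
# Hypothetical helper: escape embedded pipes only in the listed columns.
# $1 is a comma-separated list of 1-based column numbers, e.g. "2,3".
# Requires GNU awk (FPAT is a gawk extension).
escape_columns() {
    gawk -v cols="$1" -v FPAT='([^|]*)|("[^"]+")' '
    BEGIN { OFS = "|"
            n = split(cols, list, ",")           # turn "2,3" into a lookup table
            for (j = 1; j <= n; j++) wanted[list[j]] = 1
          }
          { for (i = 1; i <= NF; i++)
                if (i in wanted)                 # only touch the requested columns
                    gsub(/\|/, "\\|", $i)
            print
          }
    '
}

# Usage: escape pipes in column 3 only
# escape_columns 3 < pipe.dat
```

Restricting the loop to a few columns is not likely to change the run time much on a 5 GB file, since every line is still read and split; its main use is to leave other quoted columns untouched.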