Add extra fields based on field count


I have data in a file in the following format:

"123","XYZ","M","N","P,Q"
"345",
"987","MNO","A,B,C"

I always want 5 entries per line, so if a line has 2 fields, 3 extra empty fields ("") need to be added. The expected output is:

"123","XYZ","M","N","P,Q" 
"345","","","",""  
"987","MNO","A,B,C","",""  

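I looked at the solution on this page:

Add extra strings based on count of fields - Sed/Awk

It has a very similar requirement, but it failed when I tried it because my fields also contain commas (,).

Thanks.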

With GNU awk, and with your shown samples, please try the following code.

awk -v s1="\"" -v FPAT='[^,]*|"[^"]+"' '
BEGIN{ OFS="," }
FNR==NR{
  nof=(NF>nof?NF:nof)
  next
}
NF<nof{
  val=""
  i=($0~/,$/?NF:NF+1)
  for(;i<=nof;i++){
    val=(val?val OFS:"")s1 s1
  }
  sub(/,$/,"")
  $0=$0 OFS val
}
1
'  Input_file  Input_file

Explanation: a detailed, line-by-line explanation of the code above.

awk -v s1="\"" -v FPAT='[^,]*|"[^"]+"' ' ##Start the awk program; FPAT describes a CSV field (quoted string or run of non-commas).
BEGIN{ OFS="," }                         ##BEGIN section: set the output field separator to a comma.
FNR==NR{                                 ##FNR==NR is true only while the file is being read the first time.
  nof=(NF>nof?NF:nof)                    ##Keep the highest NF seen so far in nof.
  next                                   ##Skip all further statements for this line.
}
NF<nof{                                  ##Second read: if this line has fewer than nof fields, do the following.
  val=""                                 ##Reset val.
  i=($0~/,$/?NF:NF+1)                    ##Start at NF if the line ends with a comma, else at NF+1.
  for(;i<=nof;i++){                      ##Loop until i reaches nof.
    val=(val?val OFS:"")s1 s1            ##Append one empty quoted field ("") to val.
  }
  sub(/,$/,"")                           ##Remove the trailing comma, if any.
  $0=$0 OFS val                          ##Concatenate val to the current line.
}
1                                        ##Print the current line.
'  Input_file  Input_file                ##The same Input_file is passed twice, once per read.


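EDIT: Adding this version, which takes a variable named nof giving the minimum number of fields every line should have. If any line has more fields than that minimum, the larger count is used when padding the shorter lines.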

awk -v s1="\"" -v nof="5" -v FPAT='[^,]*|"[^"]+"' '
BEGIN{ OFS="," }
FNR==NR{
  nof=(NF>nof?NF:nof)
  next
}
NF<nof{
  val=""
  i=($0~/,$/?NF:NF+1)
  for(;i<=nof;i++){
    val=(val?val OFS:"")s1 s1
  }
  sub(/,$/,"")
  $0=$0 OFS val
}
1
'  Input_file  Input_file
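
For readers who want to trace the two-pass idea outside awk, here is a minimal Python sketch of the same logic: one pass to find the widest line, a second pass to pad. The function name pad_to_widest and the min_fields parameter are made up for this illustration, and the regex is a simplified stand-in for FPAT that skips unquoted empty fields.

```python
import re

# Simplified stand-in for the FPAT pattern: a quoted string, or a run of
# non-comma characters. Unquoted empty fields are skipped (an assumption).
FIELD = re.compile(r'"[^"]*"|[^,]+')

def pad_to_widest(lines, min_fields=0):
    """Pad every line with "" fields up to max(min_fields, widest line)."""
    # Pass 1: split every line so the widest field count is known.
    rows = [FIELD.findall(line.rstrip("\n")) for line in lines]
    width = max([min_fields] + [len(row) for row in rows])
    # Pass 2: append as many empty quoted fields as each row is missing.
    return [",".join(row + ['""'] * (width - len(row))) for row in rows]

sample = ['"123","XYZ","M","N","P,Q"', '"345",', '"987","MNO","A,B,C"']
for line in pad_to_widest(sample, min_fields=5):
    print(line)
```

With min_fields=5 the result matches the awk version on the sample data; with the default of 0 the width comes purely from the widest line, as in the first awk command.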

As [you] always want to have 5 entries per line, here is a GNU awk solution using FPAT:

$ awk '
BEGIN {
    FPAT="([^,]*)|(\"[^\"]+\")"
    OFS=","
}
{
    NF=5                              # set NF to limit too long records
    for(i=1;i<=NF;i++)                # iterate to NF and set empties to ""
        if($i=="")
            $i="\"\""
}1' file

Output:

"123","XYZ","M","N","P,Q"
"345","","","",""
"987","MNO","A,B,C","",""

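The "force exactly 5 fields, then fill the empty ones" approach can also be sketched with Python's standard csv module, which understands commas inside quoted fields. This is only an illustration under assumptions: the function name normalize and its parameter n are hypothetical, and every output field is quoted regardless of how it was written in the input.

```python
import csv
import io

def normalize(lines, n=5):
    """Parse CSV lines, force exactly n fields per row, quote every field."""
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for row in csv.reader(lines):
        row = row[:n]                  # drop surplus fields, like NF=5 does
        row += [""] * (n - len(row))   # pad short rows with empty fields
        writer.writerow(row)           # QUOTE_ALL writes empty fields as ""
    return out.getvalue().splitlines()

sample = ['"123","XYZ","M","N","P,Q"', '"345",', '"987","MNO","A,B,C"']
for line in normalize(sample):
    print(line)
```

The trailing comma in "345", is parsed as an empty second field, so it does not need special handling before padding.
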
Here is an awk command that works with any version of awk:

awk -v n=5 -v ef=',""' -F '","' '
{
   sub(/,+$/, "")
   for (i=NF; i<n; ++i)
      $0 = $0 ef
} 1' file
"123","XYZ","M","N","P,Q"
"345","","","",""
"987","MNO","A,B,C","",""

With perl, assuming every field is double quoted:

$ perl -pe 's/,$//; s/$/q(,"") x (4 - s|","|$&|g)/e' ip.txt
"123","XYZ","M","N","P,Q"
"345","","","",""
"987","MNO","A,B,C","",""
# if the , at the end of line isn't present
$ perl -pe 's/$/q(,"") x (4 - s|","|$&|g)/e' ip.txt
"123","XYZ","M","N","P,Q"
"345","","","",""
"987","MNO","A,B,C","",""

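s|","|$&|g searches for "," and replaces each match with itself. Its return value is the number of substitutions, which is then used to work out how many fields must be appended.

The e flag allows you to use Perl code in the replacement section.

The q operator lets you pick a different delimiter for a single-quoted string.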
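
The counting trick is easier to see when written out long-hand. Below is a rough Python sketch of the same idea, assuming (as the perl answer does) that every field is double quoted; pad_by_separator_count and n are names invented for this example.

```python
def pad_by_separator_count(line, n=5):
    """Append ,"" once for every field missing from a fully quoted line."""
    line = line.rstrip("\n")
    if line.endswith(","):
        line = line[:-1]              # same effect as s/,$//
    separators = line.count('","')    # a line with k fields has k-1 of these
    missing = max(0, (n - 1) - separators)
    return line + ',""' * missing

for line in ['"123","XYZ","M","N","P,Q"', '"345",', '"987","MNO","A,B,C"']:
    print(pad_by_separator_count(line))
```

A comma inside a field such as "P,Q" is not surrounded by quotes on both sides, so it is not counted as a separator.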



Here is an alternative solution that builds an array and then adds empty fields as needed.

perl -lne '@f = /"[^"]+"|[^,]+/g; print join ",", @f, qw("") x (4 - $#f)'

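/"[^"]+"|[^,]+/g defines a field as either a double-quoted string (with no double quotes inside, so escaped quotes will not work with this solution) or a run of non-, characters (at least one, so the , at the end of a line is ignored).

4 - $#f determines how many extra fields to append. qw("") creates a list with a single element whose value is "", which is then repeated using the x operator.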
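
As a cross-check of the field regex, the same pattern can be tried with Python's re.findall, which behaves like a /g match in list context here. This is a sketch only; pad_fields and n are hypothetical names. Note that perl's $#f is the last index, so 4 - $#f equals 5 minus the number of fields.

```python
import re

def pad_fields(line, n=5):
    """Extract fields with the same regex, then pad the list with "" items."""
    # A quoted string without inner quotes, or one or more non-comma chars.
    fields = re.findall(r'"[^"]+"|[^,]+', line.rstrip("\n"))
    # Multiplying a list by a negative number gives an empty list, so lines
    # that already have n or more fields are returned unchanged.
    return ",".join(fields + ['""'] * (n - len(fields)))

for line in ['"123","XYZ","M","N","P,Q"', '"345",', '"987","MNO","A,B,C"']:
    print(pad_fields(line))
```

Because the second alternative needs at least one character, the empty text after a trailing comma never becomes a field.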

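Another perl approach, using -a for autosplit and -F to set the delimiter:

perl -lanF'/"*,*"/' -e 'print join ",", map "\"$_\"", @F[1..5]'
  • -F'/"*,*"/' - this uses an autosplit delimiter of a double quote, optionally preceded by quotes and commas
  • -a autosplits into @F using that delimiter
  • -l adds newlines to print; -n processes the input in stream mode and does not print unless explicitly told to
  • map "\"$_\"", @F[1..5] takes exactly 5 fields, even undefined ones, and wraps each in double quotes
  • print join ",", map ... takes the result of the map above, joins it into a single string with commas, and prints it

(Note: because every line starts with a field separator, I ignore the empty $F[0] element.)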

This might work for you (GNU sed):

sed ':a;s/"[^"]*"/&/5;t;s/$/,""/;ta' file

If there are 5 fields, bail out.

Otherwise, append an empty field and repeat.
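
The "check, append, repeat" loop can be written out as a small Python sketch for comparison. The names pad_loop and n are invented here, and stripping one trailing comma up front is an added assumption so that a line like "345", pads cleanly.

```python
import re

QUOTED = re.compile(r'"[^"]*"')   # one double-quoted field, possibly empty

def pad_loop(line, n=5):
    """Append ,"" until the line contains at least n quoted fields."""
    line = line.rstrip("\n")
    if line.endswith(","):
        line = line[:-1]          # assumption: tidy a trailing comma first
    while len(QUOTED.findall(line)) < n:   # is there an n-th quoted field yet?
        line += ',""'                      # no: append an empty field, retry
    return line

for line in ['"123","XYZ","M","N","P,Q"', '"345",', '"987","MNO","A,B,C"']:
    print(pad_loop(line))
```

Each pass adds exactly one more quoted field, so the loop ends once the count reaches n.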
