Appending delimiters for implied blank fields



I am looking for a simple way to make every line of a file (a CSV file) contain the same number of commas.

For example

Sample file:

1,1
A,B,C,D,E,F
2,2,
3,3,3,
4,4,4,4

Expected:

1,1,,,,
A,B,C,D,E,F
2,2,,,,
3,3,3,,,
4,4,4,4,,

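In this case, the line with the most commas has 5 commas (line 2). So I want to append extra commas to all the other lines so that every line has the same number (i.e. 5 commas).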
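The target count can be measured before any padding is done. A minimal sketch, assuming plain comma-separated data with no quoted commas inside fields; `max_commas` is a hypothetical helper name, not something from the original post:

```shell
# max_commas FILE: print the largest number of commas found on any line.
# Assumption: fields never contain quoted commas.
max_commas() {
    awk -F, '
        NF > max { max = NF }                   # remember the widest record
        END      { print (max ? max - 1 : 0) }  # commas = fields - 1
    ' "$1"
}
```

For the sample file above, `max_commas file` should print 5, since line 2 has 6 fields.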

Using awk:

$ awk 'BEGIN{FS=OFS=","} {$6=$6} 1' file
1,1,,,,
A,B,C,D,E,F
2,2,,,,
3,3,3,,,
4,4,4,4,,

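As you can see above, this approach requires hard-coding the maximum number of fields in the command. It works because assigning to a field beyond the current NF makes awk extend the record with empty fields and rebuild it using OFS.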
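If the field count is known but should not be baked into the program text, it can be passed in as a variable instead. A small sketch of that idea; the function name `pad_fields` and the awk variable `n` are illustrative assumptions:

```shell
# pad_fields N FILE: pad every line of FILE with trailing commas up to N fields.
# Lines that already have N or more fields are printed unchanged.
pad_fields() {
    awk -v n="$1" '
        BEGIN { FS = OFS = "," }   # split and rejoin on commas
        { $n = $n }                # touching field n extends short records
        1                          # print every record
    ' "$2"
}
```

With the sample file, `pad_fields 6 file` should produce the same output as the hard-coded `$6=$6` command.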

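Another way to give every line in the CSV file the same number of fields, without knowing that number in advance: compute the maximum field count while reading, then append a substring of the required commas to each record, e.g.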

awk -F, -v max=0 '{
    lines[n++] = $0             # store lines indexed by line number
    fields[lines[n-1]] = NF     # store number of fields indexed by $0
    if (NF > max)               # find max NF value
        max = NF
}
END {
    for(i=0;i<max;i++)          # build a string of max commas
        commastr=commastr","
    for(i=0;i<n;i++)            # print each line plus the commas it is missing
        printf "%s%s\n", lines[i], substr(commastr,1,max-fields[lines[i]])
}' file

Example Use/Output

Paste it at the command line and you will get:

$ awk -F, -v max=0 '{
>     lines[n++] = $0             # store lines indexed by line number
>     fields[lines[n-1]] = NF     # store number of fields indexed by $0
>     if (NF > max)               # find max NF value
>         max = NF
> }
> END {
>     for(i=0;i<max;i++)          # build a string of max commas
>         commastr=commastr","
>     for(i=0;i<n;i++)            # print each line plus the commas it is missing
>         printf "%s%s\n", lines[i], substr(commastr,1,max-fields[lines[i]])
> }' file
1,1,,,,
A,B,C,D,E,F
2,2,,,,
3,3,3,,,
4,4,4,4,,

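Could you please try the following, more generic approach. It works even when the lines of Input_file have different numbers of fields. The file is read twice: the first pass finds the maximum number of fields, and the second pass re-assigns the last field of each line. Because OFS is set to a comma, any line with fewer fields than nf gets the missing commas appended. This is an enhanced version of @oguz-ismail's answer.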

awk '
BEGIN{
FS=OFS=","
}
FNR==NR{
nf=nf>NF?nf:NF
next
}
{
$nf=$nf
}
1
'  Input_file  Input_file

Explanation: a detailed explanation of the code above.

awk '                ##Starting awk program from here.
BEGIN{               ##Starting BEGIN section of awk program from here.
FS=OFS=","          ##Setting FS and OFS as comma for all lines here.
}
FNR==NR{             ##Checking condition FNR==NR which will be TRUE only while Input_file is being read the first time.
nf=nf>NF?nf:NF      ##Keeping the running maximum in nf: if nf is greater than NF then keep nf as it is, else set it to NF.
next                ##next will skip all further statements from here.
}
{
$nf=$nf             ##Assigning $nf=$nf rebuilds the current line and adds comma(s) at the end of the line if NF is less than nf.
}
1                    ##1 will print edited/non-edited lines here.
' Input_file Input_file      ##Passing Input_file twice so that it is read two times.
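Whichever approach is used, the result can be checked by confirming that every line now has the same field count. A rough sketch, again assuming no quoted commas; `same_field_count` is a hypothetical helper name:

```shell
# same_field_count FILE: exit 0 only if every line of FILE has the same
# number of comma-separated fields, otherwise exit 1.
same_field_count() {
    awk -F, '
        NR == 1    { want = NF }         # field count of the first line
        NF != want { bad = 1 }           # any mismatch marks the file as ragged
        END        { exit (bad ? 1 : 0) }
    ' "$1"
}
```

For instance, after redirecting the padded output to a new file, `same_field_count padded.csv && echo ok` would print `ok` only when the padding succeeded (`padded.csv` is a placeholder name).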
