Awk regex substring in a column


I have a data file with comma-separated fields:

379565,COFFEE,297678,      ,21,21,I, 6,  10.00,               ,     ,                            ,01-DISPLAY REVENUE  ,17-HOUSE ACCOUNT    ,AD#05260540         ,YES               ,N,N,20210625,
380685,COMICS,297634,      ,21,21,I, 3,  21.00,MAIN NEWS      ,     ,BATHS                       ,01-DISPLAY REVENUE  ,17-HOUSE ACCOUNT    ,AD# IS 05240526     ,YES               ,N,N,20210625,
337708,COMICS,298047, 84558,21,21,I, 6,  21.00,               ,     ,SCHOOL PAGE                 ,01-DISPLAY REVENUE  ,17-HOUSE ACCOUNT    ,                    ,CMYK              ,N,N,20210625

When column 4 contains only spaces, I need to extract the 8-digit ad number from column 15.

This awk checks whether column 4 contains only spaces and, if so, copies column 15 into column 4:

awk -F, '{ if ($4 ~ /^[[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]]/) {OFS=",";{$4=$15} print} else print}'

How can I extract only the 8-digit ad number from column 15 (without the "AD#" or "AD# IS" part) and put it into column 4?

Expected result:

379565,COFFEE,297678,05260540,21,21,I, 6,  10.00,               ,     ,                            ,01-DISPLAY REVENUE  ,17-HOUSE ACCOUNT    ,AD#05260540         ,YES               ,N,N,20210625,
380685,COMICS,297634,05240526,21,21,I, 3,  21.00,MAIN NEWS      ,     ,BATHS                       ,01-DISPLAY REVENUE  ,17-HOUSE ACCOUNT    ,AD# IS 05240526     ,YES               ,N,N,20210625,
337708,COMICS,298047, 84558,21,21,I, 6,  21.00,               ,     ,SCHOOL PAGE                 ,01-DISPLAY REVENUE  ,17-HOUSE ACCOUNT    ,                    ,CMYK              ,N,N,20210625

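You can use this awk, which copies column 15 into column 4 when column 4 is blank and then deletes every non-digit character from it: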

awk 'BEGIN{FS=OFS=","} $4 ~ /^[[:blank:]]*$/ {$4 = $15; gsub(/[^[:digit:]]+/, "", $4)} 1' file
379565,COFFEE,297678,05260540,21,21,I, 6,  10.00,               ,     ,                            ,01-DISPLAY REVENUE  ,17-HOUSE ACCOUNT    ,AD#05260540         ,YES               ,N,N,20210625,
380685,COMICS,297634,05240526,21,21,I, 3,  21.00,MAIN NEWS      ,     ,BATHS                       ,01-DISPLAY REVENUE  ,17-HOUSE ACCOUNT    ,AD# IS 05240526     ,YES               ,N,N,20210625,
337708,COMICS,298047, 84558,21,21,I, 6,  21.00,               ,     ,SCHOOL PAGE                 ,01-DISPLAY REVENUE  ,17-HOUSE ACCOUNT    ,                    ,CMYK              ,N,N,20210625

The same command in expanded form:

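awk '
BEGIN { FS = OFS = "," }             # split and rejoin fields on commas
$4 ~ /^[[:blank:]]*$/ {              # column 4 is empty or only spaces/tabs
    $4 = $15                         # copy column 15 into column 4
    gsub(/[^[:digit:]]+/, "", $4)    # delete every non-digit character
}
1' file                              # the trailing 1 prints every record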
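
The gsub() approach keeps every digit found in column 15, so a value such as "AD#05260540 PG 2" would produce 9 digits, and a blank column 15 would replace the six spaces in column 4 with an empty string. If that matters for your data, a stricter variant might use match() to pick out the first run of 8 digits and leave the record untouched when there is none. This is only a sketch: the function name fill_ad_number is made up for illustration, the input is read from stdin rather than from a file, and [ \t] is used in place of [[:blank:]] purely for portability to older awks.

```shell
# fill_ad_number: hypothetical wrapper name, reads CSV on stdin, writes to stdout.
fill_ad_number() {
  awk '
  BEGIN { FS = OFS = "," }
  # Act only when column 4 is blank AND column 15 holds 8 digits in a row.
  $4 ~ /^[ \t]*$/ && match($15, /[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]/) {
      # match() sets RSTART and RLENGTH to the position and length of the match.
      $4 = substr($15, RSTART, RLENGTH)
  }
  1'   # print every record, changed or not
}

# Usage with a shortened, made-up record (16 fields):
printf '%s\n' '1,X,2,      ,5,6,7,8,9,10,11,12,13,14,AD# IS 05240526 PG 2,YES' | fill_ad_number
```

Note that match() returns the leftmost match, so a run of 9 or more digits would yield its first 8 digits. The eight repeated [0-9] classes stand in for [0-9]{8}, which some awk implementations do not accept.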
