Defining fields by whitespace, quotes, or parentheses in gawk

I have a text file in the following format:

RANDOM-WORD1 ==> "string with whitespaces" (string with whitespaces)
RANDOM-WORD2 ==> "another string" (and another)
RANDOM-WORD3 ==> "yet another string" (and another)

I want gawk to delimit fields by:

whitespace
quotes
parentheses

For example, for line 1:

$1: RANDOM-WORD1
$2: ==>
$3: "string with whitespaces"
$4: (string with whitespaces)

I have read the FPAT section of the gawk manual and wrote this:

FPAT = "([^[:blank:]]*)|(\"[^\"]+\")|(([^)]+))"

However, it does not work for the parentheses; I get:

$1: RANDOM-WORD1
$2: ==>
$3: "string with whitespaces"
$4: (string

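I tried escaping the parentheses in the third alternative, but that did not work either. Inside a ( ... ) pair, I want to match any character that is not a ). I know for a fact that there will never be nested parentheses.

Note: how can I keep the quotes/parentheses out of the field data? For example: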

$1: RANDOM-WORD1
$2: ==>
$3: string with whitespaces
$4: string with whitespaces

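As for the parentheses, you need to escape them twice: once for the awk string literal and once for the regex, so that "\\(" reaches the regex engine as \( and matches a literal parenthesis:

FPAT = "([^[:blank:]]*)|(\"[^\"]+\")|(\\([^\\)]+\\))"

To strip the parentheses and quotes, use substr: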

$3 = substr($3, 2, length($3) - 2);
$4 = substr($4, 2, length($4) - 2);

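This FPAT = "([^ ]+)|([(][^)]+[)])|(\"[^\"]+\")" works for me. It relies on the fact that ( and ) need no escaping inside [ ].

Regarding your second question about stripping the quotes or parentheses, I have no better idea than adding an action like this: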

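{ for( i=1; i<= NF; i++ ) {
    b = substr( $i, 1, 1 );            # first character of the field
    e = substr( $i, length( $i ), 1 ); # last character of the field
    # strip only a matching pair: "..." or (...)
    if( ( b == "\"" && e == "\"" ) || ( b == "(" && e == ")" ) ) {
      $i = substr( $i,2 , length( $i ) - 2 )
    }
  }
}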

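I would not use FPAT for this, because your fields have an order, not just a pattern. I would use the 3rd arg to match() instead, since it is simpler and more robust:

match($0,/(\S+)\s(\S+)\s"([^"]+)"\s\(([^)]+).*/,a)

For example:

$ awk 'match($0,/(\S+)\s(\S+)\s"([^"]+)"\s\(([^)]+).*/,a) { print; for (i=1; i in a; i++) printf "a[%d]: %s\n", i, a[i] }' file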
RANDOM-WORD1 ==> "string with whitespaces" (string with whitespaces)
a[1]: RANDOM-WORD1
a[2]: ==>
a[3]: string with whitespaces
a[4]: string with whitespaces
RANDOM-WORD2 ==> "another string" (and another)
a[1]: RANDOM-WORD2
a[2]: ==>
a[3]: another string
a[4]: and another
RANDOM-WORD3 ==> "yet another string" (and another)
a[1]: RANDOM-WORD3
a[2]: ==>
a[3]: yet another string
a[4]: and another
