Extract data from multi-line blocks of text to generate one line per block using UNIX tools



I am trying to extract specific parts of a file that looks like this:

name = "Account - UU ",
source = "1-account",
destination = "account-manager12",
other = 111111 
name = "Account - PP,
source = "2-account",
destination = "account-manager1234",
other = 1212
name = "Account - GG ",
source = "3-account",
destination = "account-manager12345",
other = 44444
name = "Account - QQ,
source = "4-account",
destination = "account-manager123456"
other = 23232323

My expected output is:

name = "Account - UU" | source = "1-account" | destination = "account-manager12"
name = "Account - PP" | source = "2-account" | destination = "account-manager1234"
name = "Account - GG" | source = "3-account" | destination = "account-manager12345"
name = "Account - QQ" | source = "4-account" | destination = "account-manager123456"

Is there a way to achieve this using grep/awk commands? I would really appreciate any suggestions. Thanks.

See this running at https://ideone.com/0O8t3U:

#!/usr/bin/env bash
shopt -s extglob  # enable extended globbing, of which @(one|two|three) is an example
output=""
while IFS= read -r line; do
  case $line in
    @(name|source|destination)" = "*)  # "name = " or "source = " or "destination = "
      output+="${line%,} | " ;;        # strip trailing comma before appending to output
    "")                                # matches only an empty line
      printf '%s\n' "${output%' | '}"  # print our output, without the last " | "
      output=""                        # ...then reset that output to empty
      ;;
  esac
done
# finally, print anything that didn't have a blank line after it (last block of input)
[[ $output ]] && printf '%s\n' "${output% | }"
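
The two suffix-stripping expansions the script relies on can be tried in isolation. The following is a minimal sketch; the variable names and sample values are hypothetical and chosen only to illustrate the expansions.

```shell
# Hypothetical sample values, chosen only to illustrate the expansions.
line='source = "1-account",'
output='name = "Account - UU " | source = "1-account" | '

# ${var%pattern} removes the shortest match of pattern from the end of var.
trimmed_line=${line%,}          # drops the trailing comma, if there is one
trimmed_output=${output%' | '}  # drops the final " | " separator

printf '%s\n' "$trimmed_line"
printf '%s\n' "$trimmed_output"
```

If the value does not end with the pattern, the expansion leaves it unchanged, which is why the script can apply `${line%,}` to lines without a trailing comma as well.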

If we really don't have to handle the missing quote and trailing whitespace in $1:

$ awk -v RS= -F',?\n' -v OFS=' | ' '{print $1, $2, $3}' file
name = "Account - UU " | source = "1-account" | destination = "account-manager12"
name = "Account - PP | source = "2-account" | destination = "account-manager1234"
name = "Account - GG " | source = "3-account" | destination = "account-manager12345"
name = "Account - QQ | source = "4-account" | destination = "account-manager123456"

Or if we do:

$ awk -v RS= -F',?\n' -v OFS=' | ' '{sub(/ *"?$/,"\"",$1); print $1, $2, $3}' file
name = "Account - UU" | source = "1-account" | destination = "account-manager12"
name = "Account - PP" | source = "2-account" | destination = "account-manager1234"
name = "Account - GG" | source = "3-account" | destination = "account-manager12345"
name = "Account - QQ" | source = "4-account" | destination = "account-manager123456"
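
Both awk commands depend on paragraph mode: an empty RS makes each blank-line-separated block one record, and the field separator (an optional comma followed by a newline) makes each line of the block one field. The sketch below only reports how the input is split; the sample data and the function name show_fields are assumptions made for illustration.

```shell
# Hypothetical sample data: two blocks separated by a blank line.
sample='name = "A ",
source = "1-a",
destination = "m1",
other = 1

name = "B,
source = "2-b",
destination = "m2"
other = 2'

# RS= turns on paragraph mode, so each block is one record.
# FS is an optional comma plus a newline, so each line is one field.
show_fields() {
  awk -v RS= -F',?\n' '{ printf "record %d has %d fields; $1=[%s]\n", NR, NF, $1 }'
}

printf '%s\n' "$sample" | show_fields
```

This only works when the blocks in the file really are separated by blank lines; without them, the whole file is read as a single record.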

Could you please try the following, written and tested with the shown samples in GNU awk.

awk '
BEGIN{
  OFS=" | "
}
/^ +name/{
  if(val){
    print val
    val=""
  }
  found=1
}
found{
  val=(val?val OFS:"")$0
}
/^ +other/{
  found=""
}
END{
  if(val){
    print val
  }
}'  Input_file
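
As a variation on the same idea, here is a sketch that keys on the field names instead of on blank lines or indentation. It assumes every block starts with a name line and ends with an other line; the function name extract_blocks is hypothetical, and the sample input is taken from the question.

```shell
# extract_blocks: hypothetical helper; reads blocks on stdin and prints one line per block.
extract_blocks() {
  awk -v OFS=' | ' '
    # trim leading indentation, then a trailing comma and/or trailing blanks
    { sub(/^[ \t]+/, ""); sub(/,?[ \t]*$/, "") }
    # name line: normalise the end of the value to a single closing quote
    /^name /                 { sub(/ *"?$/, "\""); val = $0; next }
    # source and destination lines are appended to the current block
    /^(source|destination) / { val = val OFS $0; next }
    # other line closes the block
    /^other /                { if (val != "") print val; val = "" }
    END                      { if (val != "") print val }
  '
}

extract_blocks <<'EOF'
name = "Account - UU ",
source = "1-account",
destination = "account-manager12",
other = 111111 
name = "Account - PP,
source = "2-account",
destination = "account-manager1234",
other = 1212
EOF
```

The other line is used only as an end-of-block marker and is never printed, which matches the expected output in the question.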

Use a combination of two Perl one-liners and paste:

perl -lne 'print for /^\s*((?:name|source|destination)\s*=\s*[^,]*)/' input_file | paste - - - | perl -pe 's/\t/ | /g'
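The Perl one-liners use these command line flags:
-e: Tells Perl to look for code in-line, instead of in a file.
-n: Loop over the input one line at a time, assigning it to $_ by default.
-p: Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
-l: Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.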

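The first Perl one-liner prints every group captured with parentheses in the regex below. Each capture becomes one cell of the output table, 1 cell per line:
/^\s*((?:name|source|destination)\s*=\s*[^,]*)/: beginning of the line, followed by 0 or more whitespace characters, followed by a key = value pair, where the key is specified inside non-capturing parentheses (?:PATTERN) and the value is a stretch of 0 or more non-comma characters ([^,]*).

The second Perl one-liner uses the /g (multiple matches) regex modifier to replace all tabs with " | ".

paste - - -: joins every 3 lines of input on TAB and prints them as one line.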
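
The joining step can be tried on its own. The sketch below feeds six already-cleaned cells to paste and then swaps the tabs for " | " with awk instead of Perl; the helper name join_cells and the sample cells are assumptions made for illustration.

```shell
# join_cells: hypothetical helper; joins every 3 input lines into one " | "-separated line.
join_cells() {
  # paste - - - reads 3 lines at a time and joins them with TABs;
  # awk then rebuilds each line with " | " in place of the TABs.
  paste - - - | awk -F'\t' -v OFS=' | ' '{ $1 = $1; print }'
}

printf '%s\n' 'name = "A"' 'source = "1"' 'destination = "x"' \
              'name = "B"' 'source = "2"' 'destination = "y"' | join_cells
```

Because paste groups lines purely by position, this step depends on the previous one emitting exactly three cells per block, in the same order each time.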

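See also:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlre: Perl regular expressions (regexes)
perldoc perlre: Perl regular expressions (regexes): Quantifiers; Character Classes and other Special Escapes; Assertions; Capture groups
perldoc perlrequick: Perl regular expressions quick start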
