Extract data from multi-line text blocks with UNIX tools to produce one line per block

I am trying to extract specific parts of a file that looks like this:

name = "Account - UU ",
source = "1-account",
destination = "account-manager12",
other = 111111 
name = "Account - PP,
source = "2-account",
destination = "account-manager1234",
other = 1212
name = "Account - GG ",
source = "3-account",
destination = "account-manager12345",
other = 44444
name = "Account - QQ,
source = "4-account",
destination = "account-manager123456"
other = 23232323

My expected output is:

name = "Account - UU" | source = "1-account" | destination = "account-manager12"
name = "Account - PP" | source = "2-account" | destination = "account-manager1234"
name = "Account - GG" | source = "3-account" | destination = "account-manager12345"
name = "Account - QQ" | source = "4-account" | destination = "account-manager123456"

Is there a way to achieve this with grep/awk commands? I would really appreciate any suggestions. Thanks.

See this running at https://ideone.com/0O8t3U:

#!/usr/bin/env bash
shopt -s extglob  # enable extended globbing, of which @(one|two|three) is an example
output=""
while IFS= read -r line; do
  case $line in
    @(name|source|destination)" = "*)  # "name = " or "source = " or "destination = "
      output+="${line%,} | " ;;        # strip trailing comma before appending to output
    "")                                # matches only an empty line
      printf '%s\n' "${output%' | '}"  # print our output, without the last " | "
      output=""                        # ...then reset that output to empty
      ;;
  esac
done
# finally, print anything that didn't have a blank line after it (last block of input)
[[ $output ]] && printf '%s\n' "${output% | }"

If we don't actually have to handle the missing quote and trailing whitespace in $1:

$ awk -v RS= -F',?\n' -v OFS=' | ' '{print $1, $2, $3}' file
name = "Account - UU " | source = "1-account" | destination = "account-manager12"
name = "Account - PP | source = "2-account" | destination = "account-manager1234"
name = "Account - GG " | source = "3-account" | destination = "account-manager12345"
name = "Account - QQ | source = "4-account" | destination = "account-manager123456"

Or if we do:

$ awk -v RS= -F',?\n' -v OFS=' | ' '{gsub(/^"? *| *"?$/,"\"",$1); print $1, $2, $3}' file
"name = "Account - UU" | source = "1-account" | destination = "account-manager12"
"name = "Account - PP" | source = "2-account" | destination = "account-manager1234"
"name = "Account - GG" | source = "3-account" | destination = "account-manager12345"
"name = "Account - QQ" | source = "4-account" | destination = "account-manager123456"

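The output above starts every line with an extra quote, because `^"? *` matches at the beginning of $1 even when there is no quote there. Below is a minimal sketch of a variant that only touches the end of $1. It assumes the blocks are separated by blank lines and that lines carry no leading whitespace; `join_blocks` is a made-up helper name, not part of the answer.

```shell
#!/usr/bin/env bash
# join_blocks (hypothetical name) reads blank-line-separated blocks on stdin
# and prints the first three fields of each block joined by " | ".
join_blocks() {
  awk -v RS= -F',?\n' -v OFS=' | ' '{
    # drop trailing spaces and an optional quote at the end of $1,
    # then put back exactly one closing quote
    sub(/ *"?$/, "\"", $1)
    print $1, $2, $3
  }'
}

# Two sample blocks taken from the question, separated by an empty line.
printf '%s\n' \
  'name = "Account - UU ",' 'source = "1-account",' \
  'destination = "account-manager12",' 'other = 111111' \
  '' \
  'name = "Account - PP,' 'source = "2-account",' \
  'destination = "account-manager1234",' 'other = 1212' |
  join_blocks
```

With these two blocks the sketch should print `name = "Account - UU" | source = "1-account" | destination = "account-manager12"` followed by the matching line for `Account - PP`, without the leading quote.
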
Could you please try the following, written and tested with the shown samples in GNU awk.

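awk '
BEGIN{
  OFS=" | "
}
/^ *name/{             # a "name" line starts a new block
  if(val){
    print val
    val=""
  }
  found=1
}
/^ *other/{            # an "other" line ends the block and is not collected
  found=""
}
found{
  sub(/^ +/,"")        # drop leading spaces
  sub(/, *$/,"")       # drop the trailing comma
  val=(val?val OFS:"")$0
}
END{
  if(val){
    print val
  }
}'  Input_file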

Use a combination of two Perl one-liners and paste:

perl -lne 'print for /^\s*((?:name|source|destination)\s*=\s*[^,]*)/' input_file | paste - - - | perl -pe 's/\t/ | /g'

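The Perl one-liners use these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.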

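The first Perl one-liner prints all the groups captured with parentheses in this regex, which create the output table cells, 1 cell per line:
/^\s*((?:name|source|destination)\s*=\s*[^,]*)/ : beginning of the line, followed by 0 or more whitespace characters, followed by a key = value pair, where the key is specified inside non-capturing parentheses (?:PATTERN). The value is a stretch of 0 or more non-comma characters ([^,]*).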

The second Perl one-liner replaces all tabs with " | ", using the /g (multiple matches) regex modifier.

paste - - - : Joins every 3 lines of input on TAB and prints them as a single line.
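
As a quick illustration of the paste step on its own, using made-up single-letter input:

```shell
# Each "-" reads the next line from standard input, so three dashes
# consume three lines per output row and join them with TAB characters.
printf '%s\n' a b c d e f | paste - - -
```

Six input lines become two rows, `a b c` and `d e f`, with a TAB between the cells.
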

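See also:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlre: Perl regular expressions (regexes)
perldoc perlre: Perl regular expressions (regexes): Quantifiers; Character Classes and other Special Escapes; Assertions; Capture groups
perldoc perlrequick: Perl regular expressions quick start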
