I have a file containing text with the following "structure":
>some multiline-
>text
---
>in multiple chunks (this one for instance is the second of this sample)
---
>Their number, sizes and content are irregular
---
>they can't
>be known in
>advance
---
>And they'll contain pretty much any char whatsoever known or unknown in the universe
>like 𒊺/🫘/n/...
---
I'd like to be able to iterate over these chunks with a for loop (this is a strong preference), along the lines of:
for chunk in $(someUnknownMagic --over content.file)
do
echo "I'll do something with ${chunk}"
done
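I'm pretty sure there's a simple answer, but I can't use:
- IFS: it is a list of single-character separators
- sed (to reduce my pattern '\n---\n' to something simpler) + IFS: because I'd have to pick a separator character that might appear in my chunks
So I'm out of ideas (but I'm sure there are plenty of options)…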
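"sed (to simplify my pattern '\n---\n') + IFS" is a great idea! Insert a unique byte (a zero byte below) to separate the chunks, then read them as a stream delimited by that byte. With GNU sed: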
sed -z 's/\n---\n/\x00/g' content.file |
while IFS= read -r -d '' chunk; do
    echo "$chunk"
done
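Since a strict for loop was the stated preference, the same NUL-delimited stream can be loaded into an array first and then iterated. A minimal sketch, assuming bash 4.4 or later (for `mapfile -d`) and GNU sed; the sample file created with `mktemp` and the names `sample` and `chunks` are only there for illustration:

```bash
#!/usr/bin/env bash
# Build a small sample file (hypothetical content, stands in for content.file).
sample=$(mktemp)
printf '%s\n' '>some multiline-' '>text' '---' '>second chunk' '---' > "$sample"

# Turn every "\n---\n" separator into a NUL byte, then read the
# NUL-delimited records into an array. -t drops the delimiter itself.
mapfile -d '' -t chunks < <(sed -z 's/\n---\n/\x00/g' "$sample")

for chunk in "${chunks[@]}"; do
    printf 'I will do something with:\n%s\n' "$chunk"
done

rm -f -- "$sample"
```

The array costs one extra pass over the data, but the loop body then runs in the current shell rather than in a pipeline subshell, so variables set inside it remain visible afterwards.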
Really, do you need anything complicated? Just iterate over the lines and accumulate them until you find a --- line:
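chunk=""
while IFS= read -r line; do
    # a line consisting solely of --- ends the current chunk
    if [[ $line == '---' ]]; then
        echo "$chunk"
        chunk=""
        # skip the separator so it does not leak into the next chunk
        continue
    fi
    chunk+=$line$'\n'
done < content.file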
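The same accumulation loop can also fill an array, which then allows a plain for loop and needs nothing beyond bash itself. A sketch, under the assumption that a chunk not followed by a closing --- line should still be kept; `read_chunks`, `chunks` and `sample` are hypothetical names:

```bash
#!/usr/bin/env bash
# read_chunks FILE: fill the global array "chunks" with the blocks of FILE
# that are separated by lines consisting solely of ---.
read_chunks() {
    local line chunk=""
    chunks=()
    # "|| [[ -n $line ]]" also handles a last line without a final newline.
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line == '---' ]]; then
            chunks+=("${chunk%$'\n'}")   # drop the newline appended below
            chunk=""
            continue
        fi
        chunk+=$line$'\n'
    done < "$1"
    # Keep a trailing chunk that has no closing --- line.
    if [[ -n $chunk ]]; then
        chunks+=("${chunk%$'\n'}")
    fi
}

# Hypothetical sample input, standing in for content.file.
sample=$(mktemp)
printf '%s\n' '>they can'\''t' '>be known' '---' '>last one' > "$sample"
read_chunks "$sample"
rm -f -- "$sample"

for chunk in "${chunks[@]}"; do
    printf 'chunk: %s\n' "$chunk"
done
```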
When there is a repeating pattern, we can iterate over it. In your example it is ---, so we can solve it like this…
#!/bin/bash
file=$(<"$1");
# read-only numeric value; -e keeps grep from parsing '---' as an option
declare -ir chunk_max=$(grep -c -e '---' <<< "$file");
for ((index=0; index < $chunk_max; ++index )); do
    chunk="${file%%---*}";
    file="${file#*---}";
    echo "chunk[ $index ]";
    echo "$chunk";
done
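The script:
- counts how many --- we have
- loops up to chunk_max
- extracts the first chunk | by removing the match on the right
- updates file by removing the chunk we just extracted | by removing the match on the left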
chunk[ 0 ]
>some multiline-
>text
chunk[ 1 ]
>in multiple chunks (this one for instance is the second of this sample)
chunk[ 2 ]
>Their number, sizes and content are irregular
chunk[ 3 ]
>they can't
>be known in
>advance
chunk[ 4 ]
>And they'll contain pretty much any char whatsoever known or unknown in the universe
>like 𒊺/🫘/n/...
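Some bug fixes:
- if a line merely contains --- inside other text, the grep count is wrong
- the newlines before and after each chunk are removed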
#!/bin/bash
file=$(<"$1");
# count only the lines that consist solely of ---
declare -ir chunk_max=$(grep -c '^---$' <<< "$file");
for ((index=0; index < $chunk_max; ++index )); do
    # from the end, delete everything back to the first "\n---"
    chunk="${file%%$'\n'---*}";
    # from the beginning, delete everything up to and including the first "---\n"
    file="${file#*---$'\n'}";
    # print
    echo "chunk[ $index ]";
    echo "$chunk";
done
Output:
chunk[ 0 ]
>some multiline-
>text
chunk[ 1 ]
>in multiple chunks (this one for instance is the second of this sample)
chunk[ 2 ]
>Their number, sizes and content are irregular
chunk[ 3 ]
>they can't
>be known in
>advance
chunk[ 4 ]
>And they'll contain pretty much any char whatsoever known or unknown in the universe
>like 𒊺/🫘/n/...
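Notes
- $'...' interprets backslash escapes in the value, so $'\n' is a newline character rather than a backslash followed by n
- ${VAR%%PATTERN} removes the longest match of PATTERN from the right (the end) of the value
- ${VAR##PATTERN} removes the longest match of PATTERN from the left (the start) of the value; with a single % or #, as in ${VAR#PATTERN}, the shortest match is removed instead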
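To make the difference between these expansions concrete, here is a small illustration on a made-up string. The variable `s`, its value, and the result names are purely illustrative; the patterns are the ones used in the script above:

```bash
#!/usr/bin/env bash
# Three one-letter "chunks" separated by --- lines.
s=$'a\n---\nb\n---\nc'

first=${s%%$'\n'---*}          # longest suffix removed: only the first chunk stays
all_but_last=${s%$'\n'---*}    # shortest suffix removed: only the last chunk goes
rest=${s#*---$'\n'}            # shortest prefix removed: the first chunk goes
last=${s##*---$'\n'}           # longest prefix removed: only the last chunk stays

# %q prints each value in a reusable, escaped form so newlines stay visible.
printf '%q\n' "$first" "$all_but_last" "$rest" "$last"
```

This is why the script pairs %% (to cut everything after the first separator) with a single # (to drop just the chunk that was extracted).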