Replacing spaces between two patterns

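I have a log file in which the fields are separated by spaces. Unfortunately, one of the fields also contains spaces. I want to replace those spaces with "%20". It looks like this: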

2012-11-02 23:48:36 INFO 10.2.3.23 something strange name.doc 3.0.0 view1 orientation_right

The expected result is

2012-11-02 23:48:36 INFO 10.2.3.23 something%20strange%20name.doc 3.0.0 view1 orientation_right

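There is no way to predict how many spaces appear between the IP address and ".doc". So, if possible, I would like to replace them between these two patterns using pure bash.

Thanks for your help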

$ cat file
2012-11-02 23:48:36 INFO 10.2.3.23 something strange name.doc 3.0.0 view1 orientation_right

Using Perl:

$ perl -lne 'if (/(.*([0-9]{1,3}\.){3}[0-9]{1,3} )(.*)(\.doc.*)/){($a,$b,$c)=($1,$3,$4);$b=~s/ /%20/g;print $a.$b.$c;}' file
2012-11-02 23:48:36 INFO 10.2.3.23 something%20strange%20name.doc 3.0.0 view1 orientation_right

This might work for you (GNU sed):

sed 's/\S*\s/&\n/4;s/\(\s\S*\)\{3\}$/\n&/;h;s/ /%20/g;H;g;s/\(\n.*\n\)\(.*\)\n.*\n\(.*\)\n.*/\3\2/' file

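This splits the line into three parts, makes a copy of the line, replaces each space with %20 in one of the copies, and reassembles the line, discarding the parts that are not needed.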
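The same split, replace, and reassemble idea can be sketched in plain bash. This is only an illustration: `split_encode` is a hypothetical helper, and it assumes four leading fields and a file name that ends at the last ".doc" on the line.

```shell
#!/usr/bin/env bash
# Sketch only: split_encode is a hypothetical helper, not taken from any answer.
# Assumption: four leading fields, and the file name ends at the last ".doc".
split_encode() {
    local line=$1 f1 f2 f3 f4 rest middle tail
    # Part 1: the first four fields; everything else lands in "rest".
    read -r f1 f2 f3 f4 rest <<< "$line"
    # Part 2: the file name without its extension (text before the last ".doc").
    middle=${rest%.doc*}
    # Part 3: the extension plus the trailing fields.
    tail=${rest#"$middle"}
    # Encode spaces in the middle part only, then reassemble the line.
    printf '%s %s %s %s %s%s\n' "$f1" "$f2" "$f3" "$f4" "${middle// /%20}" "$tail"
}

split_encode '2012-11-02 23:48:36 INFO 10.2.3.23 something strange name.doc 3.0.0 view1 orientation_right'
```

If a line contains no ".doc" at all, this sketch would encode every space after the fourth field, so real input may need an extra check.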

EDIT:

With reference to the comments below, the above solution can be amended to:

sed -r 's/\S*\s/&\n/4;s/.*\.doc/&\n/;h;s/ /%20/g;H;g;s/(\n.*\n)(.*)\n.*\n(.*)\n.*/\3\2/' file

Not tested, but in Bash 4 this can be done:

re='(.*([0-9]+\.){3}[0-9]+ +)([^ ].*\.doc)(.*)'   # kept in a variable so the spaces need no quoting
if [[ $line =~ $re ]]; then
    nospace=${BASH_REMATCH[3]// /%20}
    printf "%s%s%s\n" "${BASH_REMATCH[1]}" "$nospace" "${BASH_REMATCH[4]}"
fi

Here is one way using GNU sed:

echo "2012-11-02 23:48:36 INFO 10.2.3.23 something strange name.doc 3.0.0 view1 orientation_right" |
sed -r 's/(([0-9]+\.){3}[0-9]+\s+)(.*\.doc)/\1\n\3\n/; h; s/[^\n]+\n([^\n]+)\n.*$/\1/; s/\s/%20/g; G; s/([^\n]+)\n([^\n]+)\n([^\n]+)\n(.*)$/\2\1\4/'

Output:

2012-11-02 23:48:36 INFO 10.2.3.23 something%20strange%20name.doc 3.0.0 view1 orientation_right

Explanation

s/(([0-9]+\.){3}[0-9]+\s+)(.*\.doc)/\1\n\3\n/          # Separate the interesting bit on its own line
h                                                       # Store the whole line in the hold space (HS) for later
s/[^\n]+\n([^\n]+)\n.*$/\1/                             # Isolate the interesting bit
s/\s/%20/g                                              # Do the replacement
G                                                       # Fetch the stored copy back
s/([^\n]+)\n([^\n]+)\n([^\n]+)\n(.*)$/\2\1\4/           # Reorganize into the correct order
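The hold-space mechanics behind `h` and `G` may be easier to see in isolation. The following is a small sketch with a made-up input string; `hold_demo` is a hypothetical name, and only POSIX sed commands are used.

```shell
#!/usr/bin/env bash
# Sketch only: hold_demo is a hypothetical helper for illustration.
hold_demo() {
    # h: copy the pattern space into the hold space (an untouched backup)
    # s: edit the pattern space only
    # G: append a newline plus the hold space to the pattern space
    printf '%s\n' "$1" | sed -e 'h' -e 's/ /%20/g' -e 'G'
}

# Prints the edited text on the first line and the untouched copy on the second.
hold_demo 'strange name.doc'
```

The edited text and the original copy end up in one pattern space separated by a newline, which is what lets a final substitution pick pieces from both.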

Just bash. This assumes 4 fields appear before the space-containing string and 3 fields after it:

reformat_line() {
    local sep i new=""
    for ((i=1; i<=$#; i++)); do
        if (( i==1 )); then
            sep=""
        elif (( (1<i && i<=5) || ($#-3<i && i<=$#) )); then
            sep=" "
        else
            sep="%20"
        fi
        new+="$sep${!i}"
    done
    echo "$new"
}
while IFS= read -r line; do
    reformat_line $line    # unquoted variable here
done < filename

Output

2012-11-02 23:48:36 INFO 10.2.3.23 something%20strange%20name.doc 3.0.0 view1 orientation_right
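One caveat: an unquoted `$line` is subject to pathname expansion as well as word splitting, so a field such as `*` would be replaced by file names. A minimal sketch that splits with `read -a` instead, assuming the same layout of 4 fields before and 3 fields after the file name (`reformat_fields` is a hypothetical name):

```shell
#!/usr/bin/env bash
# Sketch only: reformat_fields is a hypothetical variant, not the answer's code.
# Assumption: 4 fields before the file name and 3 fields after it.
reformat_fields() {
    local -a f
    # read -a splits on whitespace without performing pathname expansion.
    read -r -a f <<< "$1"
    local n=${#f[@]} i out=${f[0]}
    for ((i = 1; i < n; i++)); do
        if (( i <= 4 || i >= n - 3 )); then
            out+=" ${f[i]}"      # ordinary field boundary
        else
            out+="%20${f[i]}"    # boundary inside the file name
        fi
    done
    printf '%s\n' "$out"
}

reformat_fields '2012-11-02 23:48:36 INFO 10.2.3.23 something strange name.doc 3.0.0 view1 orientation_right'
```

Like the original function, this collapses runs of spaces inside the file name into a single %20.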

A variation on Thor's answer, but using 3 processes (4 with the cat below, which you can drop by passing your_file as the last argument of the first sed):

cat your_file |
sed -r -e 's/ (([0-9]+\.){3}[0-9]+) +(.*\.doc) / \1\n\3\n/' |
sed -e '2~3s/ /%20/g' |
paste -s -d "  \n"

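As Thor explained:

The first sed (s/ (([0-9]+\.){3}[0-9]+) +(.*\.doc) / \1\n\3\n/) separates the interesting bit onto its own line.

Then:

The second sed replaces every space with %20 on line 2 and on every third line after it.
Finally, paste joins the lines back together.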
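To see how the final paste step reassembles the lines, here is a small sketch with made-up input. The delimiter list is cycled (space, space, newline), so every group of three input lines becomes one output line. `join_triples` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch only: join_triples is a hypothetical wrapper around the paste step.
join_triples() {
    # -s joins the lines of the input serially; -d gives the delimiters to cycle
    # through: space, space, newline. "-" makes paste read standard input.
    paste -s -d '  \n' -
}

# Six input lines become two output lines of three fields each.
printf '%s\n' 'A' 'b%20c' 'D' 'E' 'f' 'G' | join_triples
```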

Note that the 2~3 address is a GNU sed extension. If you do not have GNU sed, you can do:

cat your_file |
sed -r -e 's/ (([0-9]+\.){3}[0-9]+) +(.*\.doc) / \1\n\3\n/' |
sed -e 'N;P;s/.*\n//;s/ /%20/g;N' |
paste -s -d "  \n"
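The N;P;... script in the second sed can be checked on its own. The sketch below feeds it made-up groups of three lines; `encode_middle_lines` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch only: encode_middle_lines is a hypothetical wrapper for illustration.
encode_middle_lines() {
    # N         append the 2nd line of the group to the pattern space
    # P         print the 1st line unchanged
    # s/.*\n//  drop the 1st line, leaving only the 2nd
    # s/ /%20/g encode the spaces of the 2nd line
    # N         append the 3rd line; both are printed when the cycle ends
    sed -e 'N;P;s/.*\n//;s/ /%20/g;N'
}

printf '%s\n' 'head one' 'mid dle' 'tail one' 'head two' 'm i d' 'tail two' | encode_middle_lines
```

Only the second line of each group of three has its spaces encoded, which is the same effect as the 2~3 address.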
