In Bash, how do I convert only extended ASCII characters to their hex codes?

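I need to check whether my string variable contains any extended ASCII characters (single bytes with decimal codes 128-255). If it does, each one should be replaced with its multi-character hex equivalent, so the string is ready for further grep commands and the like.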

Example string: "Ørsted\ Salg", which I need to convert to "\xD8rsted\ Salg".

I know a way to do this in Bash 4 with a hash table:

declare -A symbolHashTable=(
    ["Ø"]="D8"
)
currSearchTerm="Ørsted Salg"
for curRow in "${!symbolHashTable[@]}"; do
    # The doubled backslash makes sed emit a literal \x rather than interpret an escape
    currSearchTerm=$(echo "$currSearchTerm" | sed "s/$curRow/\\\\x${symbolHashTable[$curRow]}/")
done

However, that seems too tedious for 128 cases. There should be a shorter, and probably faster, way to do it without writing out every symbol.

I can detect whether the string contains any such characters with:

echo "$currSearchTerm" | grep -P "[\x80-\xFF]"

I'm almost sure there is a way to make sed do this, but I get lost somewhere in the "replace with" part.

You can do this easily with Perl:

#!/bin/bash
original='Ørsted'
replaced=$(perl -pe 's/([\x80-\xFF])/"\\x".unpack "H*", $1/eg' <<< "$original")
echo "The original variable's hex encoding is:"
od -t x1 <<< "$original"
echo "Therefore I converted $original into $replaced"

Here is the output when the file and terminal are ISO-8859-1:

The original variable's hex encoding is:
0000000 d8 72 73 74 65 64 0a
0000007
Therefore I converted Ørsted into \xd8rsted

Here is the output when the file and terminal are UTF-8:

The original variable's hex encoding is:
0000000 c3 98 72 73 74 65 64 0a
0000010
Therefore I converted Ørsted into \xc3\x98rsted

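In both cases it works as expected: every byte in the range 0x80-0xFF is replaced by its own \xNN escape, so the two-byte UTF-8 encoding of Ø yields two escapes.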
