How to decode IFC with Ruby

In Ruby, I'm reading a .ifc file to extract some information, but I can't decode it. For example, the file contains:

"'S\X2\00E9\X0\jour/Cuisine'"

which should be:

"'Séjour/Cuisine'"

I have tried re-encoding it with the following:

puts ifcFileLine.encode("Windows-1252")
puts ifcFileLine.encode("ISO-8859-1")
puts ifcFileLine.encode("ISO-8859-5")
puts ifcFileLine.encode("iso-8859-1").force_encoding("utf-8")

But none of these gives me what I need.

I know nothing about IFC, but going only by the page Denis linked to and your sample input, this works:

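# Matches \X2\ ... \X0\ and captures the hex payload in between.
ESCAPE_SEQUENCE_EXPR = /\\X2\\(.*?)\\X0\\/

def decode_ifc(str)
  str.gsub(ESCAPE_SEQUENCE_EXPR) do
    # Each group of four hex digits is one Unicode code point.
    $1.gsub(/..../) { $&.to_i(16).chr(Encoding::UTF_8) }
  end
end

# Single quotes keep the backslashes literal.
str = 'S\X2\00E9\X0\jour/Cuisine'
puts "Input:", str
puts "Output:", decode_ifc(str)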

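All this code does is replace each four-character sequence (/..../) between the delimiters with the corresponding Unicode character; each of those sequences is a Unicode code point written in hexadecimal.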
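
To make that inner step concrete, here is a small sketch that decodes a payload on its own, with the delimiters already stripped. The helper name `decode_x2_payload` is made up for illustration and is not part of any IFC library.

```ruby
# Hypothetical helper: decodes the text found between \X2\ and \X0\.
# Assumes the payload length is a multiple of four hex digits.
def decode_x2_payload(payload)
  payload.gsub(/..../) { |hex| hex.to_i(16).chr(Encoding::UTF_8) }
end

puts decode_x2_payload('00E9')      # "00E9" is 233 in decimal, i.e. U+00E9
puts decode_x2_payload('00E900E8')  # two code points in a row
```

Any trailing characters that do not fill a complete group of four are left untouched by `gsub`.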

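Note that this code only handles this one encoding. A quick glance at the implementation guide shows there are others, including an \X4\ directive for Unicode characters outside the Basic Multilingual Plane. This should get you started, though.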

See it on eval.in: https://eval.in/776980
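
As a possible next step, the same approach could be stretched to cover \X4\ as well. This is only a sketch: it assumes \X4\ payloads use eight hex digits per character and are closed by \X0\ just like \X2\, and the names `X2_X4_EXPR` and `decode_ifc_x2_x4` are invented here.

```ruby
# Hypothetical extension: handles both \X2\...\X0\ and \X4\...\X0\ runs.
# Assumption: \X2\ uses 4 hex digits per character, \X4\ uses 8.
X2_X4_EXPR = /\\X(2|4)\\(\h+?)\\X0\\/

def decode_ifc_x2_x4(str)
  str.gsub(X2_X4_EXPR) do
    # Save the captures first, since scan below overwrites the match data.
    width = Regexp.last_match(1) == '2' ? 4 : 8
    payload = Regexp.last_match(2)
    payload.scan(/\h{#{width}}/)
           .map { |hex| hex.to_i(16).chr(Encoding::UTF_8) }
           .join
  end
end

puts decode_ifc_x2_x4('S\X2\00E9\X0\jour/Cuisine')
puts decode_ifc_x2_x4('Smile \X4\0001F600\X0\ here')
```

Text outside the escape sequences passes through unchanged, so this can be run on a whole line of the file.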

If anyone is interested, here is some Python code I wrote that decodes three IFC encodings: \X\, \X2\ and \S\

    import re
    
    def decodeIfc(txt):
        # In a regex, "\" is hard to manage in Python... I use this workaround:
        # swap every backslash for a placeholder, then swap back at the end.
        txt = txt.replace('\\', 'µµµ')
        txt = re.sub('µµµX2µµµ([0-9A-F]{4,})+µµµX0µµµ', decodeIfcX2, txt)
        txt = re.sub('µµµSµµµ(.)', decodeIfcS, txt)
        txt = re.sub('µµµXµµµ([0-9A-F]{2})', decodeIfcX, txt)
        txt = txt.replace('µµµ', '\\')
        return txt
    
    def decodeIfcX2(match):
        # X2 encodes characters with multiple of 4 hexadecimal numbers.
        return ''.join(list(map(lambda x : chr(int(x,16)), re.findall('([0-9A-F]{4})',match.group(1)))))
    
    def decodeIfcS(match):
        return chr(ord(match.group(1))+128)
    
    def decodeIfcX(match):
        # Sometimes IFC files were made on an old Mac, which uses MacRoman encoding.
        num = int(match.group(1), 16)
        if num <= 127 or num >= 160:
            return chr(num)
        else:
            return bytes.fromhex(match.group(1)).decode("macroman")
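
For anyone staying in Ruby, a rough equivalent of the \S\ and \X\ branches above might look like this. It is a sketch under assumptions: it treats both directives as Latin-1 code points (the character after \S\ is shifted up by 128, the two hex digits after \X\ are used directly) and leaves out the MacRoman fallback. The method name `decode_ifc_s_x` is made up for this example.

```ruby
# Hypothetical Ruby counterpart of the \S\ and \X\ handling.
# Assumption: both map to code points in the Latin-1 range; no MacRoman fallback.
def decode_ifc_s_x(str)
  str
    .gsub(/\\S\\(.)/) { (Regexp.last_match(1).ord + 128).chr(Encoding::UTF_8) }
    .gsub(/\\X\\(\h{2})/) { Regexp.last_match(1).to_i(16).chr(Encoding::UTF_8) }
end

puts decode_ifc_s_x('S\S\ijour/Cuisine')   # "i" is 105, and 105 + 128 = 233
puts decode_ifc_s_x('S\X\E9jour/Cuisine')  # E9 is 233
```

Like the Python version, it handles \S\ before \X\.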
