Retrieving text that was encoded with the wrong encoding

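>I have a text file exported from a FoxPro (DOS-based) program, but the text contains non-English characters (Arabic, right-to-left), and the exported strings now look like this: "¤"îگüَن".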

Is there any way to convert them back to their original values?

You should read the data using the correct code page.

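using System.IO;
using System.Text;

// Reads the raw bytes of the file and decodes them with the given code page.
// On .NET Core / .NET 5+, call Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) first.
public static string ReadFile(string path, int codepage)
{
    return Encoding.GetEncoding(codepage)
        .GetString(File.ReadAllBytes(path));
}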

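Call the function with the correct code page ID. For MS-DOS Arabic it should be 708; if that does not produce readable text, 720 and 864 are other Arabic DOS code pages worth trying. For a full list of code pages, Wikipedia is a good place to start.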

string content = ReadFile(@"c:\test.txt", 708);
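
If it is unclear which code page the exporting program used, one option is to decode the same bytes with a few candidates and compare the results by eye. The following is a minimal sketch under assumptions: the class name `CodePageProbe` and the method `DecodeWithCandidates` are illustrative and not part of the original answer, and the `Encoding.RegisterProvider` call is only needed on .NET Core and .NET 5+, where legacy code pages come from the System.Text.Encoding.CodePages provider.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CodePageProbe
{
    // Hypothetical helper: decodes the same bytes with each candidate
    // code page so the results can be compared side by side.
    public static Dictionary<int, string> DecodeWithCandidates(byte[] data, IEnumerable<int> codepages)
    {
        // Needed on .NET Core / .NET 5+; on .NET Framework the legacy
        // code pages are built in and this call can be dropped.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        var results = new Dictionary<int, string>();
        foreach (int codepage in codepages)
        {
            results[codepage] = Encoding.GetEncoding(codepage).GetString(data);
        }
        return results;
    }
}
```

Passing the bytes from `File.ReadAllBytes` together with a candidate list such as `new[] { 708, 720, 864, 1256 }` and printing each entry should show which code page, if any, yields readable Arabic.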

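If the encoding is not supported, here is a solution that uses a lookup table to translate the bytes into a supported encoding first (only characters > 127 need to be mapped):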

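// translationTable must have 128 entries: entry i is the replacement for byte (i + 128).
public static string ReadFile(string path, byte[] translationTable, int codepage)
{
    byte[] content = File.ReadAllBytes(path);
    for (int i = 0; i < content.Length; ++i)
    {
        byte value = content[i];
        // Bytes up to 127 need no mapping.
        if (value > 127)
            content[i] = translationTable[value - 128];
    }
    // The translated bytes are now decoded with the supported target code page.
    return Encoding.GetEncoding(codepage)
        .GetString(content);
}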

Example of the translation table:

Index   Original (IS)   Translated (1256)
...
13      141             194
...
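
A table like the one above can be built from a list of known byte pairs. This is a sketch under assumptions: the names `TranslationTableBuilder`, `Build` and `Apply` are hypothetical, and passing unmapped bytes through unchanged is a choice made here, not something the original answer specifies. Only the single pair from the example row (141 to 194) is known from the source; the remaining pairs have to come from the documentation of the source encoding.

```csharp
using System;
using System.Collections.Generic;

public static class TranslationTableBuilder
{
    // Builds a 128-entry table for bytes 128..255. Entry i describes
    // source byte (i + 128). Bytes without a known mapping stay unchanged.
    public static byte[] Build(IDictionary<byte, byte> knownPairs)
    {
        var table = new byte[128];
        for (int i = 0; i < table.Length; ++i)
            table[i] = (byte)(i + 128);

        foreach (KeyValuePair<byte, byte> pair in knownPairs)
        {
            if (pair.Key < 128)
                throw new ArgumentException("Only bytes above 127 can be mapped.");
            table[pair.Key - 128] = pair.Value;
        }
        return table;
    }

    // Returns a translated copy of the input; bytes up to 127 are left as they are.
    public static byte[] Apply(byte[] content, byte[] table)
    {
        var result = (byte[])content.Clone();
        for (int i = 0; i < result.Length; ++i)
        {
            if (result[i] > 127)
                result[i] = table[result[i] - 128];
        }
        return result;
    }
}
```

With the example row, `TranslationTableBuilder.Build(new Dictionary<byte, byte> { { 141, 194 } })` produces a table whose entry 13 is 194, which can then be passed as `translationTable` to the `ReadFile` overload with code page 1256.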
