c - How to convert an unsigned byte to an integer

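I am trying to read a file that contains unsigned bytes, and I want to read each byte as an integer in the range [0, 255].

According to the extended ASCII table I am looking at, the character "┌" should be 218, but my program reads 195 or 226 instead, and I don't know why.

The same problem occurs for many characters in the extended range (codes above 128).

Why can't I read the value that matches the ASCII table, and how can I fix this? Thanks for any replies.

Here is my code: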


    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main()
    {
        unsigned int temp = 0;
        int bytesread;
        int fd = open("inputs.txt", O_RDONLY);
        if(fd == -1)
        {
            printf("An error occurred..\n");
            exit(-1);
        }
        else
        {
            bytesread = read(fd, &temp, 1);
        }
        printf("%d", temp);
        return 0;
    }

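If you are seeing a lot of 195s, the input is probably encoded as UTF-8.

ASCII only goes up to 127, and there is no single standard "extended ASCII". There is ISO-8859-1, but it has no ┌. The table you are looking at may be CP437, where ┌ is 218.

From here, your way forward is one of two broad approaches:

1. Use a tool for your operating system, or some other means, to convert the file from UTF-8 to another encoding such as CP437.
2. Read the UTF-8 in your C program; you can write the decoding from scratch or use an existing library.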
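
For the "from scratch" route, here is a minimal sketch of decoding one UTF-8 sequence into a code point. The function name `utf8_decode` and its interface are made up for illustration, and the sketch is deliberately simplified: it assumes 8-bit bytes and does not reject overlong encodings or surrogate code points, which a production decoder or an existing library would.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one UTF-8 sequence starting at s, where
 * n bytes are available. Stores the code point in *cp and returns the
 * number of bytes consumed, or 0 if the input is truncated or malformed.
 * Simplified: overlong encodings and surrogates are not rejected. */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    if (n == 0) return 0;

    size_t len;
    uint32_t value;
    if (s[0] < 0x80)                { len = 1; value = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; value = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; value = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; value = s[0] & 0x07; }
    else return 0;  /* stray continuation byte or invalid lead byte */

    if (n < len) return 0;  /* sequence is cut off */
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;  /* not a continuation byte */
        value = (value << 6) | (s[i] & 0x3F);
    }
    *cp = value;
    return len;
}
```

With this sketch, the bytes e2 94 8c decode to U+250C (┌). The two bytes c3 9a (195, 154) decode to U+00DA, which is 218, so a file that was converted from a single-byte encoding to UTF-8 at some point could plausibly explain the 195 as well.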

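The characters are probably stored in the file using the UTF-8 encoding.

For example, the character ┌ has the Unicode code point U+250C, and its UTF-8 byte sequence is e2 94 8c. e2 is your decimal 226, which suggests that your character is in this Unicode block or a nearby one and is UTF-8 encoded.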
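
To make the arithmetic behind e2 94 8c concrete, here is a small sketch of the three-byte UTF-8 encoding rule. The name `utf8_encode3` is hypothetical, and the function only covers the three-byte range U+0800..U+FFFF (it does not exclude surrogates).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode a code point in the range U+0800..U+FFFF
 * as three UTF-8 bytes. Returns 3 on success, 0 if cp is out of range.
 * Simplified: surrogate code points (U+D800..U+DFFF) are not rejected. */
size_t utf8_encode3(uint32_t cp, unsigned char out[3])
{
    if (cp < 0x0800 || cp > 0xFFFF) return 0;
    out[0] = (unsigned char)(0xE0 | (cp >> 12));          /* 1110xxxx: top 4 bits    */
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));  /* 10xxxxxx: middle 6 bits */
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));         /* 10xxxxxx: low 6 bits    */
    return 3;
}
```

For U+250C the top four bits are 0x2, the middle six are 0x14 and the low six are 0x0C, giving 0xE2, 0x94 and 0x8C, that is 226, 148 and 140 in decimal.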

As suggested in the comments, it would help a lot if you provided a hex dump of the file, for example:

    hexdump -C inputs.txt

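This code

    bytesread = read(fd, &temp, 1);

reads a single byte into the first byte (the lowest address) of an unsigned int, which is almost certainly larger than one byte. Where that byte ends up within the int's value therefore depends on your system's byte order.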
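
The effect can be reproduced without a file. The sketch below uses a hypothetical helper, `store_in_first_byte`, that mimics the one-byte read by copying a byte to the lowest address of a zero-initialized unsigned int; it assumes 8-bit bytes.

```c
#include <string.h>

/* Hypothetical helper: mimic read(fd, &temp, 1) by copying one byte
 * to the lowest address of a zero-initialized unsigned int. */
unsigned int store_in_first_byte(unsigned char byte)
{
    unsigned int temp = 0;
    memcpy(&temp, &byte, 1);  /* only the first byte of temp is written */
    return temp;
}
```

On a little-endian machine the result equals the byte value, because the lowest address holds the least significant byte. On a big-endian machine with a 4-byte int, the byte lands in the most significant position and the result is the byte value shifted left by 24 bits. Since the question initializes temp to 0, a little-endian system would still print the byte's true value, which is consistent with 195 and 226 coming from the file's encoding rather than from byte order.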

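If you are reading single bytes, it is usually much easier to just use [unsigned] char, so that you always know where the data ends up. To convert an unsigned char to an int, you simply assign it: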

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main( void )
    {
        int fd = open( "inputs.txt", O_RDONLY );
        if ( fd == -1 )
        {
            // perror() will tell you **WHAT** error occurred
            perror( "open()" );
            exit( EXIT_FAILURE );
        }
        // this is now an unsigned char
        unsigned char temp;
        // read() returns ssize_t, not int
        ssize_t bytesread = read( fd, &temp, sizeof( temp ) );
        if ( bytesread != ( ssize_t ) sizeof( temp ) )
        {
            // read() sets errno only on error (-1), not at end of file (0)
            if ( bytesread == -1 )
                perror( "read()" );
            else
                fprintf( stderr, "read(): unexpected end of file\n" );
            close( fd );
            exit( EXIT_FAILURE );
        }
        close( fd );
        // there are a lot of ways to do this
        printf( "unsigned int value: %u\n", ( unsigned int ) temp );
        // this is another way - it prints the hex value
        printf( "hex value: %hhx\n", temp );
        // this prints the char value:
        printf( "char value: '%c'\n", temp );
        // this converts that unsigned char into an int:
        int intvalue = temp;
        // yes, it's that simple.
        printf( "int value: %d\n", intvalue );
        return 0;
    }

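Note that this conversion is only guaranteed to preserve the value if int can represent every unsigned char value. If sizeof( int ) == sizeof( unsigned char ), there may be unsigned char values that cannot be represented as an int.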
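
As a sketch of how that caveat could be checked, and of reading a whole file as integers in [0, 255], the following uses the standard `fgetc`, which returns each byte as an unsigned char converted to int. The helper name `read_bytes_as_ints` is hypothetical, and `_Static_assert` assumes a C11 compiler.

```c
#include <limits.h>
#include <stdio.h>

/* Fail at compile time on an unusual platform where some unsigned char
 * values would not fit in an int (C11 _Static_assert). */
_Static_assert(UCHAR_MAX <= INT_MAX, "unsigned char values must fit in int");

/* Hypothetical helper: read up to max bytes from fp into out[] as ints
 * in the range [0, UCHAR_MAX]. Returns the number of bytes stored. */
size_t read_bytes_as_ints(FILE *fp, int *out, size_t max)
{
    size_t count = 0;
    int c;  /* int, not char, so EOF can be told apart from byte 255 */
    while (count < max && (c = fgetc(fp)) != EOF) {
        out[count++] = c;
    }
    return count;
}
```

A caller would typically open the file with `fopen("inputs.txt", "rb")` so that no text-mode translation is applied on platforms that distinguish text and binary streams.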
