A fast, simple way to read stdin one byte at a time in C++

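The following naive code, which reads stdin and counts how many times each byte occurs, is very slow: it takes about 1m40s to process 1 GiB of data on my machine.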

int counts[256] {0};
uint8_t byte;
while (std::cin >> std::noskipws >> byte) {
  ++counts[byte];
}

Of course, a buffered read is much faster, processing 1 GiB in under a second.

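// read() is POSIX (declared in <unistd.h>); counts is the array from the first snippet.
uint8_t buf[4096];
ssize_t n;  // read() returns ssize_t; -1 signals an error
while ((n = read(0, buf, sizeof buf)) > 0) {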
  for (int i = 0; i < n; ++i) {
    ++counts[buf[i]];
  }
}

However, it has the drawback of being more complex, since it requires manual buffer management.

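Is there a way to read a stream byte by byte in standard C++ that is as simple, obvious, and idiomatic as the first snippet, but as efficient as the second?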

This seemed like an interesting question. Here are my results:

without cin sync      : 34.178s
with cin sync         : 14.347s
with getchar          : 03.911s
with getchar_unlocked : 00.700s

The source file was generated with:

$ dd if=/dev/urandom of=file.txt count=1024 bs=1048576

The first is my baseline, with no changes: 34.178s

#include <bits/stdc++.h>
int main(int argc, char **argv) {
    FILE *f = freopen(argv[1], "rb", stdin);
    int counts[256] {0};
    uint8_t byte;
    while (std::cin >> std::noskipws >> byte) {
      ++counts[byte];
    }
    return 0;
}

With std::ios::sync_with_stdio(false): 14.347s

#include <bits/stdc++.h>
int main(int argc, char **argv) {
    std::ios::sync_with_stdio(false);
    FILE *f = freopen(argv[1], "rb", stdin);
    int counts[256] {0};
    uint8_t byte;
    while (std::cin >> std::noskipws >> byte) {
      ++counts[byte];
    }
    return 0;
}

With getchar: 3.911s

#include <bits/stdc++.h>
int main(int argc, char **argv) {
    FILE *f = freopen(argv[1], "rb", stdin);
    int v[256] {0};
    int b;  // getchar() returns int so that EOF stays distinct from byte values
    while ((b = getchar()) != EOF) {
        ++v[b];
    }
    return 0;
}

With getchar_unlocked: 0.700s

#include <bits/stdc++.h>
int main(int argc, char **argv) {
    FILE *f = freopen(argv[1], "rb", stdin);
    int v[256] {0};
    int b;  // getchar_unlocked() returns int so that EOF stays distinct from byte values
    while ((b = getchar_unlocked()) != EOF) {
        ++v[b];
    }
    return 0;
}

My machine configuration:

CPU  : Intel(R) Core(TM) i5-3210M CPU @ 2.50GHz
MEM  : 12GB
Build: g++ speed.cc -O3 -o speed
g++ v: g++ (Ubuntu 7.4.0-1ubuntu1~18.04) 7.4.0
exec : time ./speed file.txt

For me, getchar_unlocked is the fastest way to read bytes without maintaining a buffer myself.
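Note that getchar_unlocked comes from POSIX, not standard C++, and it skips the per-call stream lock. If other threads might touch the same stream, one option is to take the lock once around the whole loop. The sketch below assumes a POSIX system where <cstdio> exposes flockfile, getc_unlocked, and funlockfile; the function name count_bytes_unlocked is hypothetical, and this variant was not part of the timings above.

```cpp
#include <array>
#include <cstdint>
#include <cstdio>  // on POSIX systems also declares flockfile/getc_unlocked/funlockfile

// Hypothetical helper: counts byte values from a FILE*, locking the stream
// once for the whole loop instead of once per character.
std::array<std::uint64_t, 256> count_bytes_unlocked(std::FILE* f) {
    std::array<std::uint64_t, 256> counts{};
    flockfile(f);  // acquire the stream lock once
    int c;         // int, so EOF is distinguishable from every byte value
    while ((c = getc_unlocked(f)) != EOF) {
        ++counts[static_cast<unsigned char>(c)];
    }
    funlockfile(f);  // release the lock
    return counts;
}
```

Calling count_bytes_unlocked(stdin) would read standard input the same way the getchar_unlocked loop does.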

I would try this:

std::ios::sync_with_stdio(false);

It will speed up cin considerably.
