Reading all characters in OCaml is too slow

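I'm a beginner in OCaml. I want to read lines from a file and then examine every character in each line. As a toy example, suppose we want to count the occurrences of the character 'A' in the file.

I tried the following: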

open Core.Std

(* Count the 'A' characters in one line and add them to the running total. *)
let count_a acc string =
    let rec count_help res stream =
        match Stream.peek stream with
        | None -> res
        | Some char ->
            Stream.junk stream;
            if char = 'A' then count_help (res + 1) stream else count_help res stream
    in acc + count_help 0 (Stream.of_string string)

(* Fold over every line of stdin, accumulating the total count. *)
let count_a = In_channel.fold_lines stdin ~init:0 ~f:count_a
let () = print_string (string_of_int count_a ^ "\n")

I compiled it with

ocamlfind ocamlc -linkpkg -thread -package core -o solution solution.ml

and ran it with

$ ./solution < huge_file.txt

On a file with one million lines, it gives the following timing:

real    0m16.337s
user    0m16.302s
sys     0m0.027s

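That is four times slower than my Python implementation. I'm fairly sure it should be possible to make this much faster, but how?

To count the 'A' characters in a string, just use Core's String.count function. In fact, the simplest solution is: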

open Core.Std
let () =
  In_channel.input_all stdin |>
  String.count ~f:(fun c -> c = 'A') |>
  printf "we have %d A's\n"
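For comparison, the same read-everything-then-count approach can be sketched with only the OCaml standard library, without Core. This is an illustrative sketch, not the answer's code: count_char and read_all are hypothetical helper names, the 64 KiB chunk size is an arbitrary choice, and a plain index loop stands in for Core's String.count. The point it illustrates is that scanning a string by index avoids the per-character Stream.peek/Stream.junk calls used in the question.

```ocaml
(* Stdlib-only sketch (no Core). Helper names are hypothetical. *)

(* Count occurrences of [c] in [s] with a plain index loop. *)
let count_char c s =
  let n = ref 0 in
  for i = 0 to String.length s - 1 do
    if s.[i] = c then incr n
  done;
  !n

(* Read an entire channel into a string, chunk by chunk, so it also
   works when stdin is a pipe rather than a regular file. *)
let read_all ic =
  let buf = Buffer.create 65536 in
  let chunk = Bytes.create 65536 in
  let rec loop () =
    (* [input] returns 0 only at end of file. *)
    let k = input ic chunk 0 (Bytes.length chunk) in
    if k > 0 then begin
      Buffer.add_subbytes buf chunk 0 k;
      loop ()
    end
  in
  loop ();
  Buffer.contents buf
```

Wiring it to stdin is then a single line, for example let () = Printf.printf "we have %d A's\n" (count_char 'A' (read_all stdin)). Like the Core version, this holds the whole file in memory.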

Update

A slightly more complex (and less memory-hungry) solution using [fold_lines] would look like this:

open Core.Std
let () =
  In_channel.fold_lines stdin ~init:0 ~f:(fun n s ->
    n + String.count ~f:(fun c -> c = 'A') s) |>
  printf "we have %d A's\n"

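In practice, though, it is slower than the previous one. On my 8-year-old laptop, counting the 'A's in a 20-megabyte text file takes 7.3 seconds with this version, versus 3 seconds with the previous solution.

Also, I hope you find this article interesting.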
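One plausible cost in the [fold_lines] version is allocating a fresh string for every line. A sketch that keeps memory bounded without that per-line allocation reads fixed-size chunks into a single reused buffer and scans each chunk directly. This uses only the OCaml standard library; count_in_chunk and count_char_in_channel are hypothetical names, the 64 KiB default chunk size is an arbitrary assumption, and no timings are claimed for it.

```ocaml
(* Stdlib-only sketch: constant memory, no per-line strings. *)

(* Count occurrences of [c] in the first [len] bytes of [buf]. *)
let count_in_chunk c buf len =
  let n = ref 0 in
  for i = 0 to len - 1 do
    if Bytes.get buf i = c then incr n
  done;
  !n

(* Count occurrences of [c] in channel [ic], reusing one buffer of
   [chunk_size] bytes for every read. *)
let count_char_in_channel ?(chunk_size = 65536) c ic =
  let buf = Bytes.create chunk_size in
  let rec loop acc =
    (* [input] may return fewer bytes than requested; 0 means EOF. *)
    let k = input ic buf 0 chunk_size in
    if k = 0 then acc else loop (acc + count_in_chunk c buf k)
  in
  loop 0
```

Usage would be let () = Printf.printf "we have %d A's\n" (count_char_in_channel 'A' stdin). Separately, the question compiles with ocamlc, which produces bytecode; building with ocamlopt (native code) instead is likely to make a large difference for a tight loop like this.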
