Haskell speed / memory usage



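I am trying to process some point cloud data with Haskell, and it seems to use a lot of memory. The code I am using is below; it basically parses the data into a format I can work with. The dataset is 440MB with 10M lines. When I run it with runhaskell, it eats all available memory (~3-4GB) within a short time and then crashes. If I compile it with -O2 and run it, it uses 100% CPU and takes a long time (about 3 minutes) to finish. I should mention that I am on an i7 CPU with 4GB RAM and an SSD, so there should be plenty of resources. How can I improve its performance?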

{-# LANGUAGE OverloadedStrings #-}
import Prelude hiding (lines, readFile)
import Data.Text.Lazy (Text, splitOn, unpack, lines)
import Data.Text.Lazy.IO (readFile)
import Data.Maybe (fromJust)
import Text.Read (readMaybe)
filename :: FilePath
filename = "sample.txt"
readTextMaybe = readMaybe . unpack
data Classification = Classification
    { id :: Int, description :: Text
    } deriving (Show)
data Point = Point
    { x :: Int, y :: Int, z :: Int, classification :: Classification
    } deriving (Show)
type PointCloud = [Point]
maybeReadPoint :: Text -> Maybe Point
maybeReadPoint text = parse $ splitOn "," text
    where toMaybePoint :: Maybe Int -> Maybe Int -> Maybe Int -> Maybe Int -> Text -> Maybe Point
          toMaybePoint (Just x) (Just y) (Just z) (Just cid) cdesc = Just (Point x y z (Classification cid cdesc))
          toMaybePoint _ _ _ _ _                                   = Nothing
          parse :: [Text] -> Maybe Point
          parse [x, y, z, cid, cdesc] = toMaybePoint (readTextMaybe x) (readTextMaybe y) (readTextMaybe z) (readTextMaybe cid) cdesc
          parse _                     = Nothing
readPointCloud :: Text -> PointCloud
readPointCloud = map (fromJust . maybeReadPoint) . lines
main = (readFile filename) >>= (putStrLn . show . sum . map x . readPointCloud)

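The reason this uses all your memory when compiled without optimization is most likely that sum is defined using foldl. Without the strictness analysis that comes with optimization, the lazy left fold builds a chain of unevaluated thunks as long as the list before adding anything up, which is very bad for 10M elements. You can try using this function instead: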

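import Data.List (foldl')

-- Strict left fold: the accumulator is forced at every step,
-- so no chain of thunks builds up.
sum' :: Num n => [n] -> n
sum' = foldl' (+) 0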

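The reason it is slow when compiled with optimization appears to be the way the input is parsed. A cons cell is allocated for every character when the input is read, again when it is split into lines, and probably once more when it is split on commas. Using a proper parsing library (any of them) will almost certainly help; using one like CCD_ 5 or CCD_.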
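As a rough illustration of avoiding the per-character String allocation, here is a minimal sketch that reads the integer fields directly from lazy Text using Data.Text.Lazy.Read from the text package. This is not a full parsing library, only a lighter-weight substitute for readMaybe . unpack. The names readInt and parseFields are hypothetical, and the tuple result is a stand-in for the Point type in the question.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text.Lazy as TL
import Data.Text.Lazy.Read (decimal, signed)

-- Hypothetical replacement for readTextMaybe: parse a whole field as a
-- signed decimal Int without going through String.
-- Fields with trailing characters (e.g. "4x") are rejected.
readInt :: TL.Text -> Maybe Int
readInt t = case signed decimal t of
  Right (n, rest) | TL.null rest -> Just n
  _                              -> Nothing

-- Hypothetical line parser: "x,y,z,cid,cdesc" becomes a 5-tuple.
-- A tuple is used here only to keep the sketch short.
parseFields :: TL.Text -> Maybe (Int, Int, Int, Int, TL.Text)
parseFields line = case TL.splitOn "," line of
  [x, y, z, cid, cdesc] ->
    (,,,,) <$> readInt x <*> readInt y <*> readInt z <*> readInt cid <*> pure cdesc
  _ -> Nothing
```

Unlike readMaybe, this version does not skip surrounding whitespace, so whether it accepts exactly the same input depends on how clean the file is. How much time it actually saves would need to be measured on the real data.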

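Another issue, unrelated to performance: fromJust is generally rather bad form, and it is a very bad idea when processing user input. You should instead use mapM over the list in the Maybe monad, which will give you a Maybe [Point].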
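The mapM suggestion can be sketched as follows. To keep the example self-contained, parseLine is a hypothetical stand-in for maybeReadPoint that parses a single Int per line; the shape of parseAll is the part that carries over.

```haskell
import Text.Read (readMaybe)

-- Hypothetical stand-in for maybeReadPoint: one Int per line.
parseLine :: String -> Maybe Int
parseLine = readMaybe

-- mapM in the Maybe monad: Just the full list if every line parses,
-- Nothing as soon as any single line fails.
parseAll :: [String] -> Maybe [Int]
parseAll = mapM parseLine
```

For example, parseAll ["1", "2", "3"] evaluates to Just [1,2,3], while parseAll ["1", "oops", "3"] evaluates to Nothing, so the caller can report malformed input instead of crashing. One trade-off to be aware of: the result cannot be consumed until every line has been checked, so the whole list is held in memory, which may matter for a 10M-line file.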
