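I'm writing a simple sitemap.xml crawler. The code is below. My question is why the code at the end of main doesn't print anything. I suspect it's because of Haskell's laziness, but I don't know how to deal with it here: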
import Network.HTTP.Conduit
import qualified Data.ByteString.Lazy as L
import Text.XML.Light
import Control.Monad.Trans (liftIO)
import Control.Monad
import Data.String.Utils
import Control.Exception

download :: Manager -> Request -> IO (Either HttpException L.ByteString)
download manager req = do
  try $
    fmap responseBody (httpLbs req manager)

downloadUrl :: Manager -> String -> IO (Either HttpException L.ByteString)
downloadUrl manager url = do
  request <- parseUrl url
  download manager request

getPages :: Manager -> [String] -> IO [Either HttpException L.ByteString]
getPages manager urls =
  sequence $ map (downloadUrl manager) urls

main = withManager $ \manager -> do
  -- I know simpleHttp is bad here
  mapSource <- liftIO $ simpleHttp "http://example.com/sitemap.xml"
  let elements = (parseXMLDoc mapSource) >>= Just . findElements (mapElement "loc")
      Just urls = liftM (map $ (replace "/#!" "?_escaped_fragment_=") . strContent) elements
      mapElement name = QName name (Just "http://www.sitemaps.org/schemas/sitemap/0.9") Nothing
  return $
    getPages manager urls >>= \pages -> do
      print "evaluate me!"
      sequence $ map print pages
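You've run into the same problem I described here, at least as far as the faulty code goes: it type-checks when it really ought to be a type error (see "Why is the type of Main.main IO () and not IO a?"). Without a signature, your main has type IO (IO [()]): the do block inside withManager returns the printing action as a value instead of running it, and GHC accepts any IO t as main and discards the result, so the inner action is never executed. Laziness is not the cause. This is why you should always give main the explicit type signature main :: IO ().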
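A minimal, self-contained sketch of the same mistake using only base. The names bump, deferred, outerOnly and outerAndInner are hypothetical, and incrementing a counter stands in for the printing so the effect can be observed. A value of type IO (IO ()) merely returns its inner action, so discarding the result leaves that action unrun, while join from Control.Monad runs both layers.

```haskell
import Control.Monad (join)
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

-- Hypothetical stand-in for the question's "getPages ... >>= print" action:
-- it bumps a counter so we can tell whether it actually ran.
bump :: IORef Int -> IO ()
bump ref = modifyIORef ref (+ 1)

-- Same shape as the question's main: the outer action only *returns*
-- the inner one, so the type is IO (IO ()).
deferred :: IORef Int -> IO (IO ())
deferred ref = return (bump ref)

-- Runs only the outer action; the returned inner action is thrown away.
outerOnly :: IO Int
outerOnly = do
  ref <- newIORef 0
  _ <- deferred ref
  readIORef ref

-- join :: IO (IO a) -> IO a runs the outer action and then the inner one.
outerAndInner :: IO Int
outerAndInner = do
  ref <- newIORef 0
  join (deferred ref)
  readIORef ref

-- With this signature, "main = newIORef 0 >>= deferred" would be rejected,
-- because IO (IO ()) is not IO ().
main :: IO ()
main = do
  outerOnly >>= print      -- prints 0
  outerAndInner >>= print  -- prints 1
```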
To fix this, replace return with lift (see http://hackage.haskell.org/package/transformers/docs/Control-Monad-Trans-Class.html#v:lift), and replace sequence $ map ... with mapM_. mapM_ f is equivalent to sequence_ . map f.
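A sketch of the return-versus-lift fix that needs neither http-conduit nor network access. It assumes ReaderT () IO from transformers as a stand-in for the ResourceT-based monad of the question's withManager body (both are transformers over IO, and lift has type IO a -> t IO a for either). logPage, withReturn, withLift and runWith are hypothetical names, and appending to an IORef replaces print.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Reader (ReaderT, runReaderT)
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

-- Stand-in for the monad of the withManager body (an assumption, see above).
type App = ReaderT () IO

-- Hypothetical replacement for "print page": records the page in a log.
logPage :: IORef [String] -> String -> IO ()
logPage ref page = modifyIORef ref (++ [page])

-- Buggy shape: return wraps the IO action as a value; nothing runs.
withReturn :: IORef [String] -> [String] -> App (IO ())
withReturn ref pages = return (mapM_ (logPage ref) pages)

-- Fixed shape: lift runs the IO action inside App.
-- mapM_ f = sequence_ . map f, so the result is () rather than [()].
withLift :: IORef [String] -> [String] -> App ()
withLift ref pages = lift (mapM_ (logPage ref) pages)

-- Runs a body down to IO and returns whatever was logged.
runWith :: (IORef [String] -> [String] -> App a) -> IO [String]
runWith body = do
  ref <- newIORef []
  _ <- runReaderT (body ref ["page1", "page2"]) ()
  readIORef ref

main :: IO ()
main = do
  runWith withReturn >>= print  -- prints []
  runWith withLift >>= print    -- prints ["page1","page2"]
```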
Replace the final return with runResourceT (http://hackage.haskell.org/package/resourcet-1.1.1/docs/Control-Monad-Trans-Resource.html#v:runResourceT). As its type suggests, it turns a ResourceT action into an IO action.
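A sketch of what runResourceT does with an action of type ResourceT IO (), assuming the resourcet package is installed. It does not reproduce the crawler, and record, useResource and events are hypothetical names. allocate registers a release action, lift runs a plain IO action inside ResourceT, and runResourceT yields an ordinary IO () that runs the release action when it finishes.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Resource (ResourceT, allocate, runResourceT)
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

-- Appends an event name to a log so the order of effects can be inspected.
record :: IORef [String] -> String -> IO ()
record ref ev = modifyIORef ref (++ [ev])

-- A ResourceT IO action: acquires a (dummy) resource with a release
-- handler, then does some plain IO work via lift.
useResource :: IORef [String] -> ResourceT IO ()
useResource ref = do
  _ <- allocate (record ref "acquire") (\_ -> record ref "release")
  lift (record ref "use")

-- runResourceT turns ResourceT IO () into IO () and runs every
-- registered release action when the block exits.
events :: IO [String]
events = do
  ref <- newIORef []
  runResourceT (useResource ref)
  readIORef ref

main :: IO ()
main = events >>= print  -- prints ["acquire","use","release"]
```

The printed order shows that the handler registered with allocate runs only when runResourceT exits, after the body has finished.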