Combining multiple lists of strings in Haskell



For an assignment, I am trying to combine 4 lists of scraped data into 1. All 4 are already in the correct order, as shown below.

["Een gezonde samenleving? Het belang van sporten wordt onderschat","Zo vader, zo dochter","Milieuvriendelijk vervoer met waterstof","\"Ik heb zin in wat nog komen gaat\"","Oorlog in Oekraïne"]
["Teamsport","Carsten en Kirsten","Kennisclip","Master Mind","Statement van het CvB"]
["16 maart 2022","10 maart 2022","09 maart 2022","08 maart 2022","07 maart 2022"]
["Directie","Bot","CB","Moniek","Christian"]

The output I want would look like this:

[["Een gezonde samenleving? Het belang van sporten wordt onderschat", "Teamsport", "16 maart 2022", "Directie"], [...], [...], [...], [...]]

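I tried a few solutions I found on the internet, but I do not understand some of them, most of them only deal with 2 lists, or they produce errors when I try to implement them.

For reference, my code is as follows: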

{-# LANGUAGE OverloadedStrings #-}

import Text.HTML.Scalpel

urlString :: String
urlString = "https://www.example.com"

-- Main function in which we call the other functions
main :: IO ()
main = do
  resultTitle <- scrapeURL urlString scrapeHANTitle
  resultSubtitle <- scrapeURL urlString scrapeHANSubtitle
  resultDate <- scrapeURL urlString scrapeHANDate
  resultAuthor <- scrapeURL urlString scrapeHANAuthor
  print resultTitle
  print resultSubtitle
  print resultDate
  print resultAuthor

scrapeHANTitle :: Scraper String [String]
scrapeHANTitle =
  chroots ("div" @: [hasClass "card-news__body"]) scrapeTitle

scrapeHANSubtitle :: Scraper String [String]
scrapeHANSubtitle =
  chroots ("div" @: [hasClass "card-news__body"]) scrapeSubTitle

scrapeHANDate :: Scraper String [String]
scrapeHANDate =
  chroots ("div" @: [hasClass "card-article__meta__body"]) scrapeDate

scrapeHANAuthor :: Scraper String [String]
scrapeHANAuthor =
  chroots ("div" @: [hasClass "card-article__meta__body"]) scrapeAuthor

-- gets the title of news items
-- https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8192&number=128&utf8=dec
-- some titles contain special characters so use this utf8 table to add conversion
scrapeTitle :: Scraper String String
scrapeTitle = do
  text $ "a" @: [hasClass "card-news__body__title"]

-- gets the subtitle of news items
scrapeSubTitle :: Scraper String String
scrapeSubTitle = do
  text $ "span" @: [hasClass "card-news__body__eyebrow"]

-- gets the date on which the news item was posted
scrapeDate :: Scraper String String
scrapeDate = do
  text $ "div" @: [hasClass "card-news__footer__body__date"]

-- gets the author of the news item
scrapeAuthor :: Scraper String String
scrapeAuthor = do
  text $ "div" @: [hasClass "card-news__footer__body__author"]

I also tried the following, but it gave me a bunch of type errors.

mergeLists :: Maybe [String] -> Maybe [String] -> Maybe [String] -> Maybe [String] -> Maybe [String]
mergeLists = \s1 -> \s2 -> \s3 -> \s4 -> s1 ++ s2 ++ s3 ++ s4

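The type errors arise because (++) works on lists, not on Maybe values. You can use the Monoid instance of Maybe instead and write: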

mergeLists :: Maybe [String] -> Maybe [String] -> Maybe [String] -> Maybe [String] -> Maybe [String]
mergeLists s1 s2 s3 s4 = s1 <> s2 <> s3 <> s4
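
To make the behaviour of (<>) on Maybe [String] concrete, here is a minimal sketch using made-up sample values (the names titles, subtitles and missing are illustrative only, not from the question). It shows that the inner lists are appended into one flat list, and that a Nothing is simply skipped rather than making the whole result Nothing. Note that this gives a single concatenated list, not the row-by-row grouping asked for.

```haskell
-- Illustrative values; these names are invented for this sketch.
titles, subtitles, missing :: Maybe [String]
titles = Just ["Zo vader, zo dochter", "Master Mind title"]
subtitles = Just ["Carsten en Kirsten", "Master Mind"]
missing = Nothing

-- (<>) on Maybe [a] appends the inner lists when both sides are Just.
bothPresent :: Maybe [String]
bothPresent = titles <> subtitles

-- Nothing acts as the identity element, so the other side is returned as-is.
oneMissing :: Maybe [String]
oneMissing = titles <> missing
```

In GHCi, bothPresent should be a Just holding all four strings in order, and oneMissing should equal titles.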

However, since you are scraping the same page each time, you can combine the data inside a single scraper:

myScraper :: Scraper String [String]
myScraper = do
  da <- scrapeHANTitle
  db <- scrapeHANSubtitle
  dc <- scrapeHANDate
  dd <- scrapeHANAuthor
  -- parentheses are required: "return da ++ db" would parse as "(return da) ++ db"
  return (da ++ db ++ dc ++ dd)

Then run it with:

main :: IO ()
main = do
  result <- scrapeURL urlString myScraper
  print result

Or shorter:

main :: IO()
main = scrapeURL urlString myScraper >>= print
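
The scraper above still yields one flat list. If the goal is one row per news item, the same idea can be sketched with a small helper that works in any Applicative. The helper name combineRows is hypothetical (it is not part of scalpel), and Maybe is used here as a stand-in context so the example runs without the scraping library. Assuming Scraper is used as in the code above, combineRows scrapeHANTitle scrapeHANSubtitle scrapeHANDate scrapeHANAuthor should have type Scraper String [[String]].

```haskell
import Data.List (zipWith4)

-- Hypothetical helper (not from scalpel): run four list-producing actions
-- in the same context and group their results row by row.
combineRows :: Applicative f => f [a] -> f [a] -> f [a] -> f [a] -> f [[a]]
combineRows fa fb fc fd = zipWith4 row <$> fa <*> fb <*> fc <*> fd
  where
    row a b c d = [a, b, c, d]

-- Stand-in for four successful scrapes, using Maybe as the context.
sampleRows :: Maybe [[String]]
sampleRows =
  combineRows
    (Just ["Zo vader, zo dochter", "Milieuvriendelijk vervoer met waterstof"])
    (Just ["Carsten en Kirsten", "Kennisclip"])
    (Just ["10 maart 2022", "09 maart 2022"])
    (Just ["Bot", "CB"])

-- If any one of the four actions fails, the combined result fails too.
failedRows :: Maybe [[String]]
failedRows = combineRows (Just ["Zo vader, zo dochter"]) Nothing (Just ["10 maart 2022"]) (Just ["Bot"])
```

Unlike the (<>) version, a single Nothing here makes the whole result Nothing, which may or may not be what you want for partially failed scrapes.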

You can use zip4 from Data.List to combine the four lists.

import Data.List
list1 = ["Een gezonde samenleving? Het belang van sporten wordt onderschat","Zo vader, zo dochter","Milieuvriendelijk vervoer met waterstof","\"Ik heb zin in wat nog komen gaat\"","Oorlog in Oekraïne"]
list2 = ["Teamsport","Carsten en Kirsten","Kennisclip","Master Mind","Statement van het CvB"]
list3 = ["16 maart 2022","10 maart 2022","09 maart 2022","08 maart 2022","07 maart 2022"]
list4 = ["Directie","Bot","CB","Moniek","Christian"]
result = zip4 list1 list2 list3 list4
result2 = [[x1,x2,x3,x4] | (x1,x2,x3,x4) <- zip4 list1 list2 list3 list4]

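The two results differ slightly. result is a list of tuples, while result2 is a list of lists, as requested. A list of tuples is probably the better choice, because:

  • A list can hold any number of values, all of the same type (Haskell lists are homogeneous)
  • A tuple can hold values of different types, which makes it more flexible
  • A tuple with two values has a different type from a tuple with three values, so if you expect a tuple to hold four values, the type checker stops anyone from squeezing in a collection of three or five values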
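
Taking the fixed-size argument one step further, the four fields can also be given names. The following is a minimal sketch using zipWith4 from Data.List with a record type; NewsItem, its field names and toNewsItems are hypothetical and chosen only for this illustration.

```haskell
import Data.List (zipWith4)

-- Hypothetical record type; the field names are invented for this sketch.
data NewsItem = NewsItem
  { title    :: String
  , subtitle :: String
  , date     :: String
  , author   :: String
  } deriving (Show, Eq)

-- Build one NewsItem per position from the four parallel lists.
toNewsItems :: [String] -> [String] -> [String] -> [String] -> [NewsItem]
toNewsItems = zipWith4 NewsItem

sampleItems :: [NewsItem]
sampleItems =
  toNewsItems
    ["Zo vader, zo dochter", "Milieuvriendelijk vervoer met waterstof"]
    ["Carsten en Kirsten", "Kennisclip"]
    ["10 maart 2022", "09 maart 2022"]
    ["Bot", "CB"]
```

Like zip4, zipWith4 stops at the shortest input list, so a scrape that returns fewer dates than titles silently drops the trailing items. It may be worth comparing the list lengths first if that matters.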
