Finding phrases in a string and the frequency of each phrase



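I am writing a script in F# that finds phrases in a given string or text, along with the frequency of each phrase.

A phrase consists of two or more words.

I know how to do this in other languages, but I am interested in anonymous functions in F#, which I am currently learning and exploring.

This is a fairly complex and useful idea, because a phrase contains two or more words.

What I have so far: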

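    open System

    // Returns true if `phrase` starts at the beginning of some word in `text`.
    let containsPhrase (phrase: string) (text: string) =
        // Stop once the remaining text is shorter than the phrase.
        let rec contains index =
            if index <= text.Length - phrase.Length then compare index
            else false
        // Case-sensitive comparison of the phrase against the text at `index`.
        and compare index =
            if String.Compare(text, index, phrase, 0, phrase.Length) <> 0
            then nextWord index
            else true
        // Jump to the character after the next space, if there is one.
        and nextWord index =
            let index = text.IndexOf(' ', index)
            if index >= 0 then contains (index + 1)
            else false
        contains 0

    // `text` is the input string, assumed to be defined earlier. The phrases are
    // lower-cased before the search, so `text` is assumed to be lower case too.
    let Phrases = ["Good morning"; "Take care"; "black Friday"]
    for phrase in Phrases do
        printfn "[%A] was found %b" phrase (containsPhrase (phrase.ToLower()) text)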

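I was able to find a solution for the first part of the problem, but after many attempts at counting how many times each phrase is used in the string, I am lost.

The code above can check whether any given phrase occurs in a string.

Can anyone help me add a counter for the frequency of each phrase?

Something like this?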

let text = """
Good morning Take care black Friday
Good morning Take care black Friday
Good morning Take care black Friday
Good morning Take care black Friday
Good morning Take care black Friday
"""
let phrases = ["Good morning";"Take care";"black Friday"] 
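// Count non-overlapping occurrences of `phrase` in `text`.
let occurrences (phrase: string) =
  let rec loop (index: int) count =
    match text.IndexOf(phrase, index) with
    | -1 -> count                                 // no further match
    | n -> loop (n + phrase.Length) (count + 1)   // resume after this match
  loop 0 0
// Pair each phrase with its count.
phrases |> List.map (fun s -> s, occurrences s)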
> val it : (string * int) list =
  [("Good morning", 5); ("Take care", 5); ("black Friday", 5)]
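
The question lower-cases each phrase before searching, so a case-insensitive variant of the counter may be closer to what is wanted. The following is only a sketch under that assumption: `countOccurrences`, `sample`, and `counts` are hypothetical names that do not appear in the posts above, and the text is passed as a parameter instead of being captured from the enclosing scope. It relies on the `String.IndexOf(string, int, StringComparison)` overload with `StringComparison.OrdinalIgnoreCase`.

```fsharp
open System

// Hypothetical helper (not from the original posts): counts non-overlapping,
// case-insensitive occurrences of `phrase` in `text`.
let countOccurrences (phrase: string) (text: string) =
    // An empty phrase would match at every index and never advance, so return 0.
    if String.IsNullOrEmpty phrase then 0
    else
        let rec loop (index: int) count =
            match text.IndexOf(phrase, index, StringComparison.OrdinalIgnoreCase) with
            | -1 -> count
            | n -> loop (n + phrase.Length) (count + 1)
        loop 0 0

// Made-up sample text, for illustration only.
let sample = "Good morning, take care. GOOD MORNING again, and take care on Black Friday."

// Anonymous functions pair each phrase with its count and print the result.
let counts =
    ["Good morning"; "Take care"; "black Friday"]
    |> List.map (fun p -> p, countOccurrences p sample)

counts |> List.iter (fun (p, n) -> printfn "%s: %d" p n)
```

With this sample text, the counts should come out as 2 for "Good morning", 2 for "Take care", and 1 for "black Friday". Because the search resumes after each match, overlapping occurrences are not counted, which is the same behaviour as the `occurrences` function above.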
