Debugging a filter operation in Scala with "in-code variable inspection"



I'm new to Scala and want to debug this code to see why I'm not getting any results.

>  import org.apache.log4j._
>  import org.apache.spark.SparkContext
>
>  object WordCountRe {
>   def main(args: Array[String]): Unit = {
>     Logger.getLogger("org").setLevel(Level.ERROR)
>     val sc = new SparkContext("local[*]", "WordCountRe")
>     val input = sc.textFile("data/book.txt")
>     // Split into words with a regexp (the backslash must be escaped in a Scala string)
>     val words = input.flatMap(x => x.split("\\W+"))
>     // Lower-case every word
>     val lowerCaseWords = words.map(x => x.toLowerCase())
>     val wordCounts = lowerCaseWords.map(x => (x, 1)).reduceByKey((x, y) => x + y)
>     val sortedWordCounts = wordCounts.sortBy(-_._2)
>     val commonEnglishStopWords = List("you","to","your","the","a","of","and","that","it","in","is","for","on","are","if","s","i","with","t","this","or","but","they","will","what","at","my","re","do","not","about","more","an","up","need","them","from","how","there","out","new","work","so","just","don","","get","their","by","some","ll","self","make","may","even","when","one","than","also","much","job","who","was","these","find","into","only")
>     val filteredWordCounts = sortedWordCounts.filter {
>       x =>
>         val inspectVariable = commonEnglishStopWords.contains(x._1)} //Error here
>     filteredWordCounts.collect().foreach(println)
>   }
>  }

When I try this code, I get a compile error:

type mismatch; found : Unit required: Boolean    WordCountRe.scala    /SparkScalaCourse/src/com/sundogsoftware/spark    line 29    Scala Problem

The thread "How to find data in an RDD" seems to have the solution I'm trying to apply; I must just be using it wrong.

Thanks for your help.

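Edit: I found what was wrong with my code (I needed to put ._1 inside contains to extract the word from the (word, count) tuple), but I still don't know how to debug/inspect values in a situation like this.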
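A minimal sketch of why the version without ._1 returned nothing, using plain Scala collections instead of an RDD (the names stopWords and pair are hypothetical stand-ins for commonEnglishStopWords and one element of sortedWordCounts). List.contains accepts any supertype of the element type, so passing the whole tuple compiles, but a tuple never equals a String:

```scala
object TupleContainsDemo {
  // Hypothetical stand-in for commonEnglishStopWords
  val stopWords: List[String] = List("the", "a", "of")

  // Hypothetical (word, count) element, shaped like those in sortedWordCounts
  val pair: (String, Int) = ("the", 42)

  // Compiles (contains takes any supertype of String) but compares
  // a tuple against Strings, so it is always false
  val wholeTuple: Boolean = stopWords.contains(pair)

  // Compares only the word, so it is true for "the"
  val wordOnly: Boolean = stopWords.contains(pair._1)

  def main(args: Array[String]): Unit = {
    println(s"contains(pair)    = $wholeTuple")
    println(s"contains(pair._1) = $wordOnly")
  }
}
```

A filter whose predicate is always false keeps no elements, which matches the empty output from the original code.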

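The problem is that you assign the Boolean result of the contains method to val inspectVariable. In Scala a block evaluates to its last expression, and a block that ends with a val definition has type Unit. But the filter method expects a function that returns a Boolean.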
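The typing rule can be seen in isolation with plain Scala collections (a sketch; the object and value names are illustrative only). A block ending in a val definition evaluates to (), while a block ending in an expression evaluates to that expression:

```scala
object BlockValueDemo {
  val stopWords: List[String] = List("the", "a", "of")

  // The block ends with a val definition, so its value is () and its type is Unit
  val endsWithVal: Unit = { val found = stopWords.contains("the") }

  // The block ends with the expression `found`, so its type is Boolean
  val endsWithExpr: Boolean = { val found = stopWords.contains("the"); found }

  // filter needs an A => Boolean, so only the second shape works as its body
  val kept: List[String] = List("the", "spark", "of").filter { w =>
    val found = stopWords.contains(w)
    found
  }
}
```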

Simply removing val inspectVariable = fixes it, because the block then ends with the Boolean expression itself.

Alternatively, keep the assignment and return the value by adding a new line containing just inspectVariable after it.

Like this:

val filteredWordCounts = sortedWordCounts.filter { x =>
  val inspectVariable = commonEnglishStopWords.contains(x._1) // put your breakpoint here
  inspectVariable // last expression: this Boolean is what filter receives
}
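If a breakpoint is inconvenient, another option is to log inside the predicate. The sketch below uses a plain Scala List in place of the RDD, and isStopWord is a hypothetical helper; the same shape should carry over to sortedWordCounts.filter(isStopWord). Keep in mind that RDD transformations are lazy, so with Spark the println output only appears once an action such as collect() runs. In local[*] mode it shows up in the driver console; on a cluster it goes to the executor logs instead.

```scala
object InspectInFilterDemo {
  val commonEnglishStopWords: List[String] = List("the", "a", "of")

  // Hypothetical sample data shaped like sortedWordCounts
  val sortedWordCounts: List[(String, Int)] =
    List(("the", 10), ("a", 7), ("spark", 3))

  // Hypothetical helper: prints each element with its predicate result,
  // then returns that result so filter still receives a Boolean
  def isStopWord(x: (String, Int)): Boolean = {
    val inspectVariable = commonEnglishStopWords.contains(x._1)
    println(s"filter: $x -> $inspectVariable")
    inspectVariable
  }

  val filteredWordCounts: List[(String, Int)] = sortedWordCounts.filter(isStopWord)

  def main(args: Array[String]): Unit = filteredWordCounts.foreach(println)
}
```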
