I am new to Scala and would like to debug this code to see why I am not getting any results.
> def main(args:Array[String]){
>   Logger.getLogger("org").setLevel(Level.ERROR)
>   val sc = new SparkContext("local[*]","WordCountRe")
>   val input = sc.textFile("data/book.txt")
>   //With regexp
>   val words = input.flatMap(x=>x.split("\\W+"))
>   //Lower case
>   val lowerCaseWords = words.map(x => x.toLowerCase())
>   val wordCounts = lowerCaseWords.map(x => (x,1)).reduceByKey((x,y)=>x+y)
>   val sortedWordCounts = wordCounts.sortBy(-_._2)
>   val commonEnglishStopWords = List("you","to","your","the","a","of","and","that","it","in","is","for","on","are","if","s","i","with","t","this","or","but","they","will","what","at","my","re","do","not","about","more","an","up","need","them","from","how","there","out","new","work","so","just","don","","get","their","by","some","ll","self","make","may","even","when","one","than","also","much","job","who","was","these","find","into","only")
>   val filteredWordCounts = sortedWordCounts.filter{
>     x =>
>       val inspectVariable = commonEnglishStopWords.contains(x._1)} //Error here
>   filteredWordCounts.collect().foreach(println) } }
When I try to use this code, I get a compile error:
type mismatch; found : Unit required: Boolean WordCountRe.scala /SparkScalaCourse/src/com/sundogsoftware/spark line 29 Scala Problem
The thread "How to find data in an RDD" seems to contain the solution I am trying to apply, but I must be using it wrong.
Thanks for your help.
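Edit: I found what was wrong with my code (I needed to put ._1 inside contains to extract the word from the (word, count) tuple), but I still don't know how to debug or inspect values in a situation like this.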
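The problem is that you assign the Boolean result of the contains method to val inspectVariable. A val definition is a statement, so a block that ends with one has type Unit. The filter method, however, requires a Boolean.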
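The difference can be reproduced without Spark. The following is a minimal sketch, assuming a shortened stop-word list; the names BlockTypeSketch, asUnit and asBoolean are hypothetical and exist only for illustration.

```scala
object BlockTypeSketch {
  // Shortened stand-in for commonEnglishStopWords (assumption for illustration).
  val stopWords = List("the", "a", "of")

  // The block ends with a val definition, so its type is Unit.
  val asUnit: Unit = {
    val inspectVariable = stopWords.contains("the")
  }

  // The block ends with the expression inspectVariable, so its type is Boolean.
  val asBoolean: Boolean = {
    val inspectVariable = stopWords.contains("the")
    inspectVariable
  }

  def main(args: Array[String]): Unit = {
    println(asUnit)    // prints ()
    println(asBoolean) // prints true
  }
}
```

Passing a function whose body looks like the first block to filter produces the "found: Unit, required: Boolean" mismatch, because filter expects a function that returns Boolean.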
Simply remove val inspectVariable = and it is fixed.
Alternatively, return the value after assigning it by adding a new line that contains only inspectVariable.
For example:
val filteredWordCounts = sortedWordCounts.filter { x =>
  val inspectVariable = commonEnglishStopWords.contains(x._1) // put your breakpoint here
  inspectVariable
}
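If a breakpoint is inconvenient, one option is to try the predicate on a small local collection and print each decision. The sketch below is only an illustration under stated assumptions: sampleCounts is made-up data standing in for the (word, count) pairs, the stop-word set is shortened, and InspectFilterSketch and isStopWord are hypothetical names. It also shows the negated predicate, on the assumption that the goal is to drop stop words rather than keep them.

```scala
object InspectFilterSketch {
  // Shortened stand-in for commonEnglishStopWords (assumption for illustration).
  val stopWords = Set("the", "a", "of")

  // Made-up (word, count) pairs standing in for sortedWordCounts.
  val sampleCounts = List(("the", 10), ("spark", 4), ("of", 3), ("scala", 2))

  // Prints the decision for each pair, then returns the Boolean that filter expects.
  def isStopWord(pair: (String, Int)): Boolean = {
    val inspectVariable = stopWords.contains(pair._1)
    println(s"${pair._1} -> $inspectVariable")
    inspectVariable
  }

  // Same direction as the predicate in the question: keeps only the stop words.
  val onlyStopWords = sampleCounts.filter(isStopWord)

  // Negated predicate: drops the stop words (assumed to be the actual intent).
  val withoutStopWords = sampleCounts.filter(pair => !isStopWord(pair))

  def main(args: Array[String]): Unit = {
    println(onlyStopWords)
    println(withoutStopWords)
  }
}
```

The same println can be placed inside the RDD filter; in local[*] mode the output appears in the driver console, while on a cluster it ends up in the executor logs. Calling sortedWordCounts.take(10).foreach(println) is another way to look at a few elements without collecting the whole RDD.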