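I have review_ratingremoved, which has the columns [review_id, text]. I want to write a map-reduce function whose mapper outputs (review_id, word) pairs. To do that, I have to split the text into words and emit every one of those words together with its review_id.
Scala code: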
val reviews = spark.read.option("header","true").option("inferSchema","true").csv(review_path)
val review_ratingremoved = review_afterstep1.select("review_id","text")
val reviewmap = review_ratingremoved.map(_.map(_._2.split(" ")))
// Not working; compilation fails with this error:
//notebook:23: error: value map is not a member of org.apache.spark.sql.Row
//val reviewmap = review_ratingremoved.map(_.map(_._2.split(" ")))
Please help me figure this out. Thanks.
You can do the following:
review_ratingremoved.selectExpr("review_id", "split(text, ' ') as text_words")
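The `split` call gives one array of words per review, not one (review_id, word) row per word. The compiler error in the question arises because, inside `review_ratingremoved.map(...)`, each element is a `Row`, and `Row` has no `map` method; its fields have to be read with getters such as `getString`. The sketch below, which assumes a local SparkSession and a hypothetical two-row sample standing in for the real CSV data, shows two ways to get the pairs: `explode(split(...))` on the DataFrame, and a typed `flatMap` that acts like the mapper described in the question. The helper name `wordsOf` and the assumption that `review_id` is read as a string are illustrative only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode, split}

object ReviewWords {
  // Hypothetical helper: emit one (review_id, word) pair per space-separated word.
  // A null text yields no pairs, matching how explode drops null arrays.
  def wordsOf(id: String, text: String): Seq[(String, String)] =
    Option(text).toSeq.flatMap(_.split(" ")).map(word => (id, word))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("review-words")
      .master("local[*]") // assumption: local run for illustration
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data standing in for review_ratingremoved
    val review_ratingremoved = Seq(
      ("r1", "good food"),
      ("r2", "slow service")
    ).toDF("review_id", "text")

    // DataFrame approach: split text into an array, then explode to one row per word
    val wordsDf = review_ratingremoved
      .select(col("review_id"), explode(split(col("text"), " ")).as("word"))

    // Dataset approach: each element is a Row, so read its fields with getters
    // (assumes review_id is a string column) and flatten the pairs
    val wordsDs = review_ratingremoved
      .flatMap(row => wordsOf(row.getString(0), row.getString(1)))
      .toDF("review_id", "word")

    wordsDf.show()
    wordsDs.show()
    spark.stop()
  }
}
```

Both versions should produce the same rows; the `explode` form stays inside Spark SQL's optimizer, while the `flatMap` form reads more like a hand-written mapper and is easier to extend with custom tokenization.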