How to use map-reduce on a Scala DataFrame with 2 fields when the second field needs to be split



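I have review_ratingremoved, which has the columns [review_id, text]. I want to write a map-reduce function whose mapper outputs (review_id, word). To do that, I need to split the text into words and output each of those words together with its review_id.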

Scala code:

val reviews = spark.read.option("header","true").option("inferSchema","true").csv(review_path)
val review_ratingremoved = review_afterstep1.select("review_id","text")
val reviewmap = review_ratingremoved.map(_.map(_._2.split(" ")))
// Not working; fails with this error:
//notebook:23: error: value map is not a member of org.apache.spark.sql.Row
//val reviewmap = review_ratingremoved.map(_.map(_._2.split(" ")))
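The error occurs because review_ratingremoved is a DataFrame, i.e. a Dataset[Row], so the outer lambda passed to map receives a Row, which has no map method and no _2 field. A minimal sketch, assuming both columns are strings: convert to a typed Dataset[(String, String)] with .as, then use flatMap so each review emits one (review_id, word) pair per word. The object name ReviewWordsTyped, the local SparkSession, and the sample rows are illustrative assumptions, not part of the original code.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

object ReviewWordsTyped {
  // Mapper step: emit one (review_id, word) pair per whitespace-separated word.
  // Assumes review_id and text are both string columns.
  def reviewWords(spark: SparkSession, reviews: DataFrame): Dataset[(String, String)] = {
    import spark.implicits._
    reviews
      .select("review_id", "text")
      .as[(String, String)] // tuple encoder maps columns by position: _1 = review_id, _2 = text
      .flatMap { case (id, text) =>
        // Treat null text as "no words"; split on runs of whitespace and drop empty tokens.
        Option(text).toSeq
          .flatMap(_.split("\\s+"))
          .filter(_.nonEmpty)
          .map(word => (id, word))
      }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("review-words").getOrCreate()
    import spark.implicits._
    // Hypothetical sample data standing in for review_ratingremoved.
    val reviewRatingRemoved =
      Seq(("r1", "great food"), ("r2", "slow service")).toDF("review_id", "text")
    reviewWords(spark, reviewRatingRemoved).show()
    spark.stop()
  }
}
```

flatMap is used instead of map because each input row produces a variable number of output rows; map would instead give one row holding an array.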

Please help me figure this out. Thanks.

You can do the following:

review_ratingremoved.selectExpr("review_id", "split(text, ' ') as text_words")
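split on its own yields one row per review with an array column text_words. To get the (review_id, word) pairs the question asks for, one row per word, the array can be flattened with Spark's built-in explode function. A sketch, assuming text is a string column; the object name ReviewWordsExplode and the sample data are illustrative.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode, split}

object ReviewWordsExplode {
  // One output row per (review_id, word). The "\\s+" regex splits on runs of whitespace;
  // a null text yields a null array, which explode turns into zero rows.
  def reviewWords(reviews: DataFrame): DataFrame =
    reviews
      .select(col("review_id"), explode(split(col("text"), "\\s+")).as("word"))
      .filter(col("word") =!= "") // drop the empty token produced by leading whitespace

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("review-words").getOrCreate()
    import spark.implicits._
    // Hypothetical stand-in for review_ratingremoved.
    val reviewRatingRemoved =
      Seq(("r1", "great food"), ("r2", "slow service")).toDF("review_id", "text")
    reviewWords(reviewRatingRemoved).show()
    spark.stop()
  }
}
```

The same result can be written in the answer's selectExpr style as review_ratingremoved.selectExpr("review_id", "explode(split(text, ' ')) as word"), which splits on single spaces only.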
