spark scala reducekey dataframe operation

Updated: 2023-09-09

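I'm trying to do a count in Scala using DataFrames. My data has 3 columns; I've already loaded it and split each line on tabs. With RDDs I would do something like this: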

val fields = file.map(line => line.split("\t"))   // split each line on the tab character
val x = fields.map(line => (line(0), line(2).toInt)).reduceByKey(_ + _, 1)   // sum column 2 per key, into 1 partition
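To make the target result concrete, here is a minimal sketch, assuming a plain Scala Seq of lines stands in for the RDD so no Spark is needed, of what reduceByKey(_ + _) computes here: for every distinct value in column 0, the sum of the integers in column 2. Spark combines values within and then across partitions, but because + is associative and commutative the result matches this sequential fold. The object name ReduceByKeyLocal is hypothetical.

```scala
object ReduceByKeyLocal {
  // Local stand-in for:
  // file.map(_.split("\t")).map(f => (f(0), f(2).toInt)).reduceByKey(_ + _)
  def sumByKey(lines: Seq[String]): Map[String, Int] =
    lines
      .map(_.split("\t"))             // split each line on the tab character
      .map(f => (f(0), f(2).toInt))   // (key from column 0, value from column 2)
      .foldLeft(Map.empty[String, Int]) { case (acc, (k, v)) =>
        acc.updated(k, acc.getOrElse(k, 0) + v) // combine values of the same key with +
      }
}
```

For example, ReduceByKeyLocal.sumByKey(Seq("a\tx\t1", "a\ty\t2", "b\tz\t5")) returns Map(a -> 3, b -> 5).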

I want to put the data in a DataFrame instead, but I'm having some trouble with the syntax:

val file = file.map(line=>line.split("\t")).toDF
val file.groupby(line(0))
        .count()

Can someone help me check whether this is correct?

Spark needs to know the schema of the df. There are many ways to specify the schema; here is one option:

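import spark.implicits._ // assumes a SparkSession named spark (as in spark-shell); needed for .toDF and $"a"

val df = file
   .map(line => line.split("\t"))
   .map(l => (l(0), l(1).toInt)) // at this point Spark knows the number of columns and their types
   .toDF("a", "b") // give the columns names for ease of use
df
 .groupBy($"a")
 .count()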
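Note that count() returns the number of rows per key, whereas reduceByKey(_ + _) in the question sums the values of column 2 per key. The sketch below, assuming Spark 2.2 or later, computes both with the DataFrame API, and also shows a second way of giving Spark the schema: an explicit StructType passed to spark.read.csv with a tab separator. The object and method names (DataFrameWordCount, run, sumsWithSchema), the column names a, c, b, and the use of column 2 as the value column are assumptions taken from the question, not part of the answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.sum
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object DataFrameWordCount {
  // Collects a two-column (String, Long) result into a Scala Map.
  private def collectToMap(df: DataFrame): Map[String, Long] =
    df.collect().map(r => r.getString(0) -> r.getLong(1)).toMap

  // Hypothetical helper: key in column 0, numeric value in column 2 (as in the question).
  def run(spark: SparkSession, lines: Seq[String]): (Map[String, Long], Map[String, Long]) = {
    import spark.implicits._ // enables .toDF on an RDD of tuples and the $"col" syntax

    val df = spark.sparkContext
      .parallelize(lines)
      .map(_.split("\t"))
      .map(f => (f(0), f(2).toInt)) // the tuple type tells Spark the column count and types
      .toDF("a", "b")

    // What the answer computes: number of rows per key.
    val counts = collectToMap(df.groupBy($"a").count())

    // What reduceByKey(_ + _) computes: sum of column b per key (Spark sums Int into Long).
    val sums = collectToMap(df.groupBy($"a").agg(sum($"b")))

    (counts, sums)
  }

  // Alternative: declare the schema explicitly and let the CSV reader split on tabs.
  def sumsWithSchema(spark: SparkSession, lines: Seq[String]): Map[String, Long] = {
    import spark.implicits._ // provides the Encoder for lines.toDS() and $"col"

    val schema = StructType(Seq(
      StructField("a", StringType),
      StructField("c", StringType), // hypothetical name for the unused middle column
      StructField("b", IntegerType)
    ))

    val df = spark.read
      .option("sep", "\t")
      .schema(schema)
      .csv(lines.toDS()) // DataFrameReader.csv(Dataset[String]) is available since Spark 2.2

    collectToMap(df.groupBy($"a").agg(sum($"b")))
  }
}
```

The aggregation can also be written df.groupBy($"a").sum("b"); either form yields one row per key with the summed value.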
