GroupBy and Concatenate rows of DataFrame for Apache Spark in Java

Tags: java, apache-spark, apache-spark-sql
Updated: 2023-09-06

I have a DataFrame with the following structure:

id      user        keywords
1       u1, u2      key1, key2  
1       u3, u4      key3, key4
1       u5, u6      key5, key6
2       u7, u8      key7, key8
2       u9, u10     key9, key10
3       u11, u12    key11, key12
3       u13, u14    key13, key14

I need a way to group the rows by id and concatenate the strings in the user and keywords columns, so that the result looks like this:

id      user                            keywords
1       u1, u2, u3, u4, u5, u6          key1, key2, key3, key4, key5, key6
2       u7, u8, u9, u10                 key7, key8, key9, key10
3       u11, u12, u13, u14              key11, key12, key13, key14

How can I do this in Java?

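Do the following:

1. Create a pair RDD of (id, (user, keywords)), keyed by id.
2. Call groupByKey on this RDD.
3. Run a flatMap-style step over each group's user and keywords values to concatenate them.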
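
The steps above can be sketched in plain Java, without a Spark dependency, to show the shape of the computation. This is only an illustration under stated assumptions: the class `GroupConcatSketch` and the row holder `Rec` are hypothetical names, not Spark types, and an in-memory map stands in for the RDD. In Spark's Java API the corresponding operations would roughly be `mapToPair` on a `JavaRDD`, followed by `groupByKey` and `mapValues` on the resulting `JavaPairRDD`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupConcatSketch {

    // Hypothetical row holder standing in for a Spark Row (id, user, keywords).
    public static final class Rec {
        public final int id;
        public final String user;
        public final String keywords;

        public Rec(int id, String user, String keywords) {
            this.id = id;
            this.user = user;
            this.keywords = keywords;
        }
    }

    // Steps 1 and 2: key every row by id and collect the rows that share an id.
    // LinkedHashMap keeps ids in first-seen order; a Spark groupByKey makes no
    // such ordering promise.
    public static Map<Integer, List<Rec>> groupById(List<Rec> rows) {
        Map<Integer, List<Rec>> groups = new LinkedHashMap<>();
        for (Rec r : rows) {
            groups.computeIfAbsent(r.id, k -> new ArrayList<>()).add(r);
        }
        return groups;
    }

    // Step 3: within each group, join the user strings and the keyword strings with ", ".
    public static List<Rec> concatenate(Map<Integer, List<Rec>> groups) {
        List<Rec> out = new ArrayList<>();
        for (Map.Entry<Integer, List<Rec>> e : groups.entrySet()) {
            List<String> users = new ArrayList<>();
            List<String> keywords = new ArrayList<>();
            for (Rec r : e.getValue()) {
                users.add(r.user);
                keywords.add(r.keywords);
            }
            out.add(new Rec(e.getKey(),
                    String.join(", ", users),
                    String.join(", ", keywords)));
        }
        return out;
    }

    public static void main(String[] args) {
        // A few of the rows from the question.
        List<Rec> rows = Arrays.asList(
                new Rec(1, "u1, u2", "key1, key2"),
                new Rec(1, "u3, u4", "key3, key4"),
                new Rec(2, "u7, u8", "key7, key8"),
                new Rec(2, "u9, u10", "key9, key10"));
        for (Rec r : concatenate(groupById(rows))) {
            System.out.println(r.id + "\t" + r.user + "\t" + r.keywords);
        }
    }
}
```

If staying in the DataFrame API is preferable to dropping down to RDDs, one alternative (not the approach described above) would be `groupBy("id")` followed by `agg` with `concat_ws(", ", collect_list(col("user")))` from `org.apache.spark.sql.functions`, and the same for `keywords`. In either Spark variant the order of values inside a group is not guaranteed after the shuffle.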
