Good day! I can't figure out how to do this transformation.
Example of the dataset:
+---------+---------+------+------+------+
| Col1    | Col2    | Col3 | Col4 | Col5 |
+---------+---------+------+------+------+
| Value 1 | Value 2 | 123  | null | null |
| Value 1 | Value 2 | null | 124  | null |
| Value 1 | Value 2 | null | null | 125  |
+---------+---------+------+------+------+
The output I need:
+---------+---------+------+------+------+
| Col1    | Col2    | Col3 | Col4 | Col5 |
+---------+---------+------+------+------+
| Value 1 | Value 2 | 123  | 124  | 125  |
+---------+---------+------+------+------+
I managed to do it with Apache Toree in Jupyter; it looks like this:
val z = spark.read.parquet("/*/*.parquet")
val d = z.groupBy("Col1", "Col2")
  .agg(first(col("Col3"), true).as("Col3"),
       first(col("Col4"), true).as("Col4"),
       first(col("Col5"), true).as("Col5"))
How can I do the same with the Java Spark API?
I found a way to do it in Java:
private Dataset<Row> getRCR() {
    Dataset<RCR> read = respCookieRelReader.read(false, inputPath);
    // groupBy/agg produces a new Dataset<Row>; the original version discarded it
    // and returned `read` unchanged. The second argument of functions.first is
    // ignoreNulls, matching first(col, true) in the Scala snippet above.
    return read
            .groupBy("col1", "col2", "col3")
            .agg(functions.first(new Column("col4"), true).as("col4"),
                 functions.first(new Column("col5"), true).as("col5"),
                 functions.first(new Column("col6"), true).as("col6"),
                 functions.first(new Column("col7"), true).as("col7"));
}
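For completeness, here is a minimal, self-contained sketch of the same transformation as a standalone Java program. It mirrors the Scala snippet above; the class name, appName, local master and the parquet path are placeholders, not values from a real project.

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.first;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class CollapseRowsExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("collapse-rows-example")   // placeholder app name
                .master("local[*]")                 // placeholder: local run for testing
                .getOrCreate();

        // Same source as the Scala snippet above.
        Dataset<Row> z = spark.read().parquet("/*/*.parquet");

        // first(column, true) is "first with ignoreNulls": within each (Col1, Col2)
        // group it keeps the first non-null value of Col3, Col4 and Col5, which
        // collapses the three input rows into the single desired output row.
        Dataset<Row> d = z.groupBy("Col1", "Col2")
                .agg(first(col("Col3"), true).as("Col3"),
                     first(col("Col4"), true).as("Col4"),
                     first(col("Col5"), true).as("Col5"));

        d.show();
        spark.stop();
    }
}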