Merging rows from different DataFrames together in Scala

For example, I start with a DataFrame like this:

+----+-----+-----+--------------------+-----+
|year| make|model|             comment|blank|
+----+-----+-----+--------------------+-----+
|2012|Tesla|    S|          No comment|     |
|1997| Ford| E350|Go get one now th...|     |
|2015|Chevy| Volt|                null| null|
+----+-----+-----+--------------------+-----+

It has rows for the years 2012, 1997, and 2015. I also have another DataFrame like this:

+----+-----+-----+--------------------+-----+
|year| make|model|             comment|blank|
+----+-----+-----+--------------------+-----+
|2012|BMW  |    3|          No comment|     |
|1997|VW   | GTI |   get              |     |
|2015|MB   | C200|                good| null|
+----+-----+-----+--------------------+-----+

It also has rows for 2012, 1997, and 2015. How can I merge the rows that share the same year? Thanks.

The output should look like this:

+----+-----+-----+--------------------+-----+-----+-----+--------------------+-----+
|year| make|model|             comment|blank| make|model|             comment|blank|
+----+-----+-----+--------------------+-----+-----+-----+--------------------+-----+
|2012|Tesla|    S|          No comment|     |  BMW|    3|          No comment|     |
|1997| Ford| E350|Go get one now th...|     |   VW|  GTI|                 get|     |
|2015|Chevy| Volt|                null| null|   MB| C200|                good| null|
+----+-----+-----+--------------------+-----+-----+-----+--------------------+-----+

You can get the table you want with a simple join. Something like:

val joined = df1.join(df2, df1("year") === df2("year"))

I loaded your input, and this is what I see:

scala> df1.show
...
year make  model comment
2012 Tesla S     No comment
1997 Ford  E350  Go get one now
2015 Chevy Volt  null
scala> df2.show
...
year make model comment
2012 BMW  3     No comment
1997 VW   GTI   get
2015 MB   C200  good

When I run the join, I get:

scala> val joined = df1.join(df2, df1("year") === df2("year"))
joined: org.apache.spark.sql.DataFrame = [year: string, make: string, model: string, comment: string, year: string, make: string, model: string, comment: string]
scala> joined.show
...
year make  model comment        year make model comment
2012 Tesla S     No comment     2012 BMW  3     No comment
2015 Chevy Volt  null           2015 MB   C200  good
1997 Ford  E350  Go get one now 1997 VW   GTI   get
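
The joined result above carries the year column twice, while the desired output in the question shows it only once. A minimal sketch of one way to get a single year column is to join on a sequence of column names rather than on an equality expression. The sample rows mirror the answer's loaded data; the names joinedOnKey and the local SparkSession setup are assumptions added for illustration (in spark-shell, getOrCreate simply reuses the existing session).

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration; spark-shell already provides one.
val spark = SparkSession.builder().master("local[*]").appName("join-on-year").getOrCreate()
import spark.implicits._

// Sample rows copied from the df1.show / df2.show output above.
val df1 = Seq[(String, String, String, String)](
  ("2012", "Tesla", "S", "No comment"),
  ("1997", "Ford", "E350", "Go get one now"),
  ("2015", "Chevy", "Volt", null)
).toDF("year", "make", "model", "comment")

val df2 = Seq[(String, String, String, String)](
  ("2012", "BMW", "3", "No comment"),
  ("1997", "VW", "GTI", "get"),
  ("2015", "MB", "C200", "good")
).toDF("year", "make", "model", "comment")

// Joining on Seq("year") is an inner equi-join that keeps the key column once.
val joinedOnKey = df1.join(df2, Seq("year"))

joinedOnKey.printSchema()
joinedOnKey.show()
```

The remaining columns (make, model, comment) still appear twice under the same names, so selecting them by name alone stays ambiguous.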

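One thing to note is that your column names may be ambiguous, because they are the same in both DataFrames. You may want to rename them so that operations on the resulting DataFrame are easier to write.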
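
As a rough sketch of the renaming idea, the non-key columns of the second DataFrame can be given a suffix before joining, so that every column in the result has a unique name. The helper withSuffix, the "_2" suffix, and the names carsA, carsB, and renamedJoin are hypothetical and not part of the original answer; the sample rows are the ones shown in the answer's output.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Assumption: a local session for illustration; spark-shell already provides one.
val spark = SparkSession.builder().master("local[*]").appName("rename-before-join").getOrCreate()
import spark.implicits._

// Hypothetical helper: append a suffix to every column except the ones in `keep`.
def withSuffix(df: DataFrame, suffix: String, keep: Set[String]): DataFrame =
  df.columns.foldLeft(df) { (acc, name) =>
    if (keep.contains(name)) acc else acc.withColumnRenamed(name, name + suffix)
  }

val carsA = Seq[(String, String, String, String)](
  ("2012", "Tesla", "S", "No comment"),
  ("1997", "Ford", "E350", "Go get one now"),
  ("2015", "Chevy", "Volt", null)
).toDF("year", "make", "model", "comment")

val carsB = Seq[(String, String, String, String)](
  ("2012", "BMW", "3", "No comment"),
  ("1997", "VW", "GTI", "get"),
  ("2015", "MB", "C200", "good")
).toDF("year", "make", "model", "comment")

// Rename make/model/comment on the right side, then join on the shared key.
val renamedJoin = carsA.join(withSuffix(carsB, "_2", keep = Set("year")), Seq("year"))

// Columns can now be selected by name without ambiguity.
renamedJoin.select("year", "make", "make_2").show()
```

An alternative that avoids renaming is to alias each DataFrame (for example with as("a") and as("b")) and refer to columns through the alias, at the cost of slightly more verbose selects.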
