Create a dataframe from the StructType/printSchema results of two PySpark dataframes



I have two Spark dataframes and I want to compare their data types, but side by side, by putting the schemas of both into a single dataframe:

df1.printSchema()

returns

root
|-- orgID: string (nullable = true)
|-- deptID: string (nullable = true)
|-- systemID: string (nullable = true)
|-- eventId: string (nullable = true)
|-- eventType: string (nullable = true)
|-- autoID: string (nullable = true)
|-- personID: string (nullable = true)
|-- employeeFirst: string (nullable = true)
|-- employeeMiddle: string (nullable = true)
|-- employeeLast: string (nullable = true)
|-- employeeDOB: string (nullable = true)
df2.printSchema()

returns

root
|-- orgID: integer (nullable = true)
|-- deptID: string (nullable = true)
|-- systemID: string (nullable = true)
|-- eventId: integer (nullable = true)
|-- eventType: string (nullable = true)
|-- autoID: string (nullable = true)
|-- personID: integer (nullable = true)
|-- employeeFirst: string (nullable = true)
|-- employeeMiddle: string (nullable = true)
|-- employeeLast: string (nullable = true)
|-- employeeDOB: timestamp (nullable = false)

I want to put the two together and then create another column that compares df1type and df2type ['True','False']:

+-------------+----------+---------+
|       column|   df1type|  df2type|
+-------------+----------+---------+
|        orgID|    string|  integer|
|       deptID|    string|   string|
...
|  employeeDOB|    string|timestamp|
+-------------+----------+---------+
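One way to build that table (a minimal sketch, assuming df1 and df2 are flat, non-nested dataframes in the same SparkSession; the names spark, d1, d2 and schema_compare are just illustrative) is to read the type of each field from df.schema.fields, where dataType.typeName() gives the same names printSchema prints, and assemble the rows in Python:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# map column name -> type name for each dataframe
# (dataType.typeName() yields 'string', 'integer', 'timestamp', ... as printSchema does)
d1 = {f.name: f.dataType.typeName() for f in df1.schema.fields}
d2 = {f.name: f.dataType.typeName() for f in df2.schema.fields}

# one row per column appearing in either dataframe, plus a 'True'/'False' match flag
rows = [(c, d1.get(c), d2.get(c), str(d1.get(c) == d2.get(c)))
        for c in sorted(set(d1) | set(d2))]

schema_compare = spark.createDataFrame(rows, ["column", "df1type", "df2type", "match"])
schema_compare.show(truncate=False)

sorted keeps the output deterministic, and any column that exists in only one of the dataframes shows up with a null on the other side.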

So far I can tell from:

df1.schema == df2.schema

that the two dataframes are not equal; the above returns False.

I tried turning each printSchema into a table and then merging them, but getting the printSchema() output into a table is proving challenging.

I need to find out which of the common columns have different StructTypes. Is there another way to do this?
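If all that's needed is the list of common columns whose types differ, a quick check over df.dtypes (which returns (column, type) pairs, with short type names such as 'int' rather than 'integer') avoids building a dataframe at all; again just a sketch under the same flat-column assumption:

# columns present in both dataframes
common = set(df1.columns) & set(df2.columns)

d1, d2 = dict(df1.dtypes), dict(df2.dtypes)

# common columns whose type differs between df1 and df2
mismatched = {c: (d1[c], d2[c]) for c in common if d1[c] != d2[c]}
print(mismatched)
# e.g. {'orgID': ('string', 'int'), 'employeeDOB': ('string', 'timestamp'), ...}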

I don't know if this is what you're after, but try this (in PySpark the method is called exceptAll rather than except):

df2.exceptAll(df1)
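Note that exceptAll (and subtract) compare row contents rather than column types, so on their own they won't tell you which common columns have different data types; the dtypes/schema comparisons sketched above target the schema itself.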
