I have two Spark DataFrames and I want to compare their data types side by side, in a single DataFrame that contains the schemas of both:
df1.printSchema()
returns
root
|-- orgID: string (nullable = true)
|-- deptID: string (nullable = true)
|-- systemID: string (nullable = true)
|-- eventId: string (nullable = true)
|-- eventType: string (nullable = true)
|-- autoID: string (nullable = true)
|-- personID: string (nullable = true)
|-- employeeFirst: string (nullable = true)
|-- employeeMiddle: string (nullable = true)
|-- employeeLast: string (nullable = true)
|-- employeeDOB: string (nullable = true)
df2.printSchema()
returns
root
|-- orgID: integer (nullable = true)
|-- deptID: string (nullable = true)
|-- systemID: string (nullable = true)
|-- eventId: integer (nullable = true)
|-- eventType: string (nullable = true)
|-- autoID: string (nullable = true)
|-- personID: integer (nullable = true)
|-- employeeFirst: string (nullable = true)
|-- employeeMiddle: string (nullable = true)
|-- employeeLast: string (nullable = true)
|-- employeeDOB: timestamp (nullable = false)
I want to put the two schemas together like this, and then add another column that compares df1type with df2type, with values ['True', 'False']:
+-------------+----------+---------+
|       column|   df1type|  df2type|
+-------------+----------+---------+
|        orgID|    string|  integer|
|       deptID|    string|   string|
...
|  employeeDOB|    string|timestamp|
+-------------+----------+---------+
So far, the only thing I can tell comes from the following:
df1.schema == df2.schema
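This returns False, so the two schemas are not equal.
I tried turning each printSchema() output into a table and then joining the two, but capturing the output of printSchema() turned out to be challenging.
I need to find out which of the common columns have different structTypes. Is there another way?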
I'm not sure whether this is what you need, but try this.
df2.except(df1)
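Note that except compares rows rather than column types (and in PySpark, where except is a reserved word, the equivalent methods are subtract and exceptAll), so it may not answer the question on its own. One possible alternative, offered as a minimal sketch rather than a definitive solution, is to compare the (name, type) pairs that df.dtypes returns. The helper name compare_dtypes and the sample lists below are hypothetical; the lists stand in for df1.dtypes and df2.dtypes so the snippet runs without a Spark session. In PySpark, dtypes reports short type names such as int where printSchema shows integer.

```python
def compare_dtypes(dtypes1, dtypes2):
    """Build (column, df1type, df2type, match) rows from two dtypes lists.

    Each argument is a list of (column_name, type_string) tuples,
    which is the shape that a PySpark DataFrame's .dtypes returns.
    """
    types1 = dict(dtypes1)
    types2 = dict(dtypes2)
    # Keep df1's column order, then append columns that exist only in df2.
    columns = list(types1) + [c for c in types2 if c not in types1]
    rows = []
    for col in columns:
        t1 = types1.get(col)  # None if the column is missing from df1
        t2 = types2.get(col)  # None if the column is missing from df2
        rows.append((col, t1, t2, t1 == t2))
    return rows


# Hypothetical stand-ins for df1.dtypes and df2.dtypes,
# abbreviated from the schemas shown in the question.
df1_dtypes = [("orgID", "string"), ("deptID", "string"), ("employeeDOB", "string")]
df2_dtypes = [("orgID", "int"), ("deptID", "string"), ("employeeDOB", "timestamp")]

rows = compare_dtypes(df1_dtypes, df2_dtypes)

# Names of the columns whose types differ between the two lists.
mismatched = [col for col, _, _, match in rows if not match]
```

With real DataFrames, and assuming a SparkSession named spark, the rows could be turned back into a DataFrame with spark.createDataFrame(compare_dtypes(df1.dtypes, df2.dtypes), "column string, df1type string, df2type string, match boolean"). This sketch compares type names only; it does not look at nullability, so the nullable = false on employeeDOB in df2 would not be flagged by itself.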