如何在spark-scala数据帧的枢轴列中具有值



我有值为的数据帧

+---+-----------------------+----------+---------+------------+---------+
|id |database_name          |users     |groups   |type        |isAllowed|
+---+-----------------------+----------+---------+------------+---------+
|73 |[ww_hr_dl_highsecure]  |[hive]    |[hrhs]   |select      |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |select      |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |update      |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |create      |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |drop        |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |alter       |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |index       |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |lock        |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |all         |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |read        |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |write       |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |repladmin   |true     |
|73 |[ww_hr_dl_highsecure]  |[svchrdat]|[]       |serviceadmin|true     |
|84 |[ww_core_dim_dl_tables]|[svc02001]|[]       |select      |true     |
|84 |[ww_core_dim_dl_tables]|[svc02001]|[]       |update      |true     |
|84 |[ww_core_dim_dl_tables]|[svc02001]|[]       |create      |true     |
|84 |[ww_core_dim_dl_tables]|[svc02001]|[]       |drop        |true     |
|84 |[ww_core_dim_dl_tables]|[svc02001]|[]       |alter       |true     |
|84 |[ww_core_dim_dl_tables]|[svc02001]|[]       |all         |true     |
|84 |[ww_core_dim_dl_tables]|[svc02001]|[]       |read        |true     |
|84 |[ww_core_dim_dl_tables]|[svc02001]|[]       |write       |true     |
|84 |[ww_core_dim_dl_tables]|[]        |[walmart]|select      |true     |
|84 |[ww_core_dim_dl_tables]|[]        |[walmart]|read        |true     |
+---+-----------------------+----------+---------+------------+---------+

我想根据类型列调整我的数据帧。所以我需要的结果数据帧就像

id db_name                 users    group select update create  drop  alter
73 ww_hr_dl_highsecure     hive     hrhs   true    null    null  null  null
73 ww_hr_dl_highsecure     svchrdat null   true    true    true  true  true
84 ww_core_dim_dl_tables   svc02001 true   true    true    true  true  true

我不知道如何将新列中的值作为原始数据帧中isAllowed列的值。

到目前为止我所做的是

val dfs3 = dfs2.groupBy("database_name","users").pivot("type").expr("isAllowed")

将组添加到groupBy中是可以接受的结果吗?

df.groupBy("database_name","users","groups").pivot("type").agg(first("isAllowed")).show(false)

输出:

+-----------------------+----------+---------+----+-----+------+----+-----+----+----+---------+------+------------+------+-----+
|database_name          |users     |groups   |all |alter|create|drop|index|lock|read|repladmin|select|serviceadmin|update|write|
+-----------------------+----------+---------+----+-----+------+----+-----+----+----+---------+------+------------+------+-----+
|[ww_core_dim_dl_tables]|[svc02001]|[]       |true|true |true  |true|null |null|true|null     |true  |null        |true  |true |
|[ww_hr_dl_highsecure]  |[svchrdat]|[]       |true|true |true  |true|true |true|true|true     |true  |true        |true  |true |
|[ww_hr_dl_highsecure]  |[hive]    |[hrhe]   |null|null |null  |null|null |null|null|null     |true  |null        |null  |null |
|[ww_core_dim_dl_tables]|[]        |[walmart]|null|null |null  |null|null |null|true|null     |true  |null        |null  |null |
+-----------------------+----------+---------+----+-----+------+----+-----+----+----+---------+------+------------+------+-----+

最新更新