如何在struct中使用列值更改df列名称


df.withColumn("storeInfo", struct($"store", struct($"inhand", $"storeQuantity")))
.groupBy("sku").agg(collect_list("storeInfo").as("info"))
.show(false)
+---+---------------------------------------------------+
|sku|info                                               |
+---+---------------------------------------------------+
|1  |[{2222, {3, 34}}, {3333, {5, 45}}]                 |
|2  |[{4444, {5, 56}}, {5555, {6, 67}}, {6666, {7, 67}}]|
+---+---------------------------------------------------+

当我发送它到couchbase

{
"SKU": "1",
"info": [
{
"col2": {
"inhand": "3",
"storeQuantity": "34"
},
"Store": "2222"
},
{
"col2": {
"inhand": "5",
"storeQuantity": "45"
},
"Store": "3333"
}}
]}

是否可以将带有值的col2重命名为store的值?我想让它看起来像下面这样。因此,每个结构体的键都是存储值的值。

{
"SKU": "1",
"info": [
{
"2222": {
"inhand": "3",
"storeQuantity": "34"
},
"Store": "2222"
},
{
"3333": {
"inhand": "5",
"storeQuantity": "45"
},
"Store": "3333"
}}
]}

简单地说,我们不能按照您的要求构造列。两个限制:

  1. struct类型的字段名必须是固定的,我们可以将'col2'更改为其他名称(例如;'fixedFieldName'在演示1),但它不能是动态的(类似于Java类字段名)
  2. map类型的key可以是动态的,但map类型的value必须是相同类型的,参见演示2中的例外。

也许你应该改变模式,看看演示1,3的输出

演示1

df.withColumn(
"storeInfo", struct($"store", struct($"inhand", $"storeQuantity").as("fixedFieldName"))).
groupBy("sku").agg(collect_list("storeInfo").as("info")).
toJSON.show(false)
// output:
//+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
//|value                                                                                                                                                                                              |
//+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
//|{"sku":1,"info":[{"store":2222,"fixedFieldName":{"inhand":3,"storeQuantity":34}},{"store":3333,"fixedFieldName":{"inhand":5,"storeQuantity":45}}]}                                                           |
//|{"sku":2,"info":[{"store":4444,"fixedFieldName":{"inhand":5,"storeQuantity":56}},{"store":5555,"fixedFieldName":{"inhand":6,"storeQuantity":67}},{"store":6666,"fixedFieldName":{"inhand":7,"storeQuantity":67}}]}|
//+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

演示2

df.withColumn(
"storeInfo",
map($"store", struct($"inhand", $"storeQuantity"), lit("Store"), $"store")).
groupBy("sku").agg(collect_list("storeInfo").as("info")).
toJSON.show(false)
// output exception:
// The given values of function map should all be the same type, but they are [struct<inhand:int,storeQuantity:int>, int]
演示

3

df.withColumn(
"storeInfo",
map($"store", struct($"inhand", $"storeQuantity"))).
groupBy("sku").agg(collect_list("storeInfo").as("info")).
toJSON.show(false)
//+---------------------------------------------------------------------------------------------------------------------------------------------+
//|value                                                                                                                                        |
//+---------------------------------------------------------------------------------------------------------------------------------------------+
//|{"sku":1,"info":[{"2222":{"inhand":3,"storeQuantity":34}},{"3333":{"inhand":5,"storeQuantity":45}}]}                                         |
//|{"sku":2,"info":[{"4444":{"inhand":5,"storeQuantity":56}},{"5555":{"inhand":6,"storeQuantity":67}},{"6666":{"inhand":7,"storeQuantity":67}}]}|
//+---------------------------------------------------------------------------------------------------------------------------------------------+

最新更新