jq -将嵌套对象提取到CSV中



我想将以下内容(经过简化的elasticsearch结果)扁平化为具有重复父值的CSV文件。

聚合有三个级别,我想要扁平化:站点名称、帐户id、用户名和计数。如所示,重命名CSV头会很好。

我尝试使用map_values(),但嵌套结构给了我一个困难的时间。

输入文件内容

{
  "aggregations": {
    "by_site_group_and_account_and_user_name": {
      "doc_count_error_upper_bound": 4880,
      "sum_other_doc_count": 760270,
      "buckets": [
        {
          "key": "S1",
          "doc_count": 335216,
          "by_account_Id": {
            "doc_count_error_upper_bound": 429,
            "sum_other_doc_count": 34655,
            "buckets": [
              {
                "key": 1396220,
                "doc_count": 277532,
                "by_user_name": {
                  "doc_count_error_upper_bound": 0,
                  "sum_other_doc_count": 2580,
                  "buckets": [
                    {
                      "key": "S1_U1",
                      "doc_count": 244602
                    },
                    {
                      "key": "S1_U2",
                      "doc_count": 30350
                    }
                  ]}},
              {
                "key": 14770032,
                "doc_count": 23029,
                "by_user_name": {
                  "doc_count_error_upper_bound": 0,
                  "sum_other_doc_count": 0,
                  "buckets": [
                    {
                      "key": "S1_U3",
                      "doc_count": 23016
                    }
                  ]}}]}},
        {
          "key": "S2",
          "doc_count": 146453,
          "by_account_Id": {
            "doc_count_error_upper_bound": 2380,
            "sum_other_doc_count": 108259,
            "buckets": [
              {
                "key": 14733289,
                "doc_count": 20138,
                "by_user_name": {
                  "doc_count_error_upper_bound": 0,
                  "sum_other_doc_count": 2406,
                  "buckets": [
                    {
                      "key": "S2_U1",
                      "doc_count": 13247
                    },
                    {
                      "key": "S2_U2",
                      "doc_count": 4485
                    }
                  ]}},
              {
                "key": 14769976,
                "doc_count": 18056,
                "by_user_name": {
                  "doc_count_error_upper_bound": 0,
                  "sum_other_doc_count": 0,
                  "buckets": [
                    {
                      "key": "S2_U3",
                      "doc_count": 17015
                    },
                    {
                      "key": "S2_U4",
                      "doc_count": 1041
                    }
                  ]}}]}}]}}}

期望输出值

"Site", "Account", "User", "Counted"
"S1", 1396220, "S1_U1", 244602 
"S1", 1396220, "S1_U2", 30350
"S1", 14770032, "S1_U3", 23016
"S2", 14733289, "S2_U1", 13247
"S2", 14733289, "S2_U2", 4485
"S2", 14769976, "S2_U3", 17015
"S2", 14769976, "S2_U4", 1041

事先感谢您的帮助。

这是一个非常简单的组合,获取一些信息,然后循环遍历后代。

(
   [ "Site", "Account", "User", "Counted" ],
   (
      .aggregations.by_site_group_and_account_and_user_name.buckets[] |
      .key as $site |
      .by_account_Id.buckets[] |
      .key as $account |
      .by_user_name.buckets[] |
      [ $site, $account, .key, .doc_count ]
   )
) | @csv

jqplay的演示

最新更新