如何在U-SQL中提取大型JSON阵列并包括元数据



我正在尝试从JSON文档中提取大量数据。每个JSON文档大约有1500个节点。当我尝试加载身体节点时,我会得到128KB限制错误。我找到了一种加载节点的方法,但是我必须一直走到数组列表。jsonextractor(" body.nprobe.items [*]");我遇到的问题是,我无法访问JSON文档的任何其他部分,我需要获取元数据,例如:ID,SerialNumber等。是否应该以某种方式更改JSON文件?我需要的数据是3级。JSON已被混淆和缩短,实际文件约为33K格式的JSON行约1500个项目。

 {
    "headers": {
        "pin": "12345",
        "Type": "URJ201W-GNZGK",
        "RTicks": "3345",
        "SD": "211",
        "Jov": "juju",
        "Market": "Dal",
        "Drst": "derre",
        "Model": "qw22",
        "DNum": "de34",
        "API": "34f",
        "Id": "821402444150002501"
    },
    "Id": "db5aacae3778",
    "ModelType": "URJ",
    "body": {
        "uHeader": {
            "ID": "821402444150002501",
            "SerialNo": "ee028861",
            "ServerName": "BRXTTY123"
        },
        "header": {
            "form": 4,
            "app": 0,
            "Flg": 1,
            "Version": 11056,
            "uploadID": 1,
            "uDate": "2016-04-14T18:29"
        },
        "nprobe": {
            "items": [{
                "purchaseDate": "2016-04-14T18:21:09",
                "storeLoc": {
                    "latitude": 135.052335,
                    "longitude": 77.167005
                },
                "sr": {
                    "ticks": 3822,
                    "SkuId": "24",
                    "Data": {
                        "X": "0.00068",
                        "Y": "0.07246",
                    }
                }
            },
            {
                "purchaseDate": "2016-04-14T18:21:09",
                "storeLoc": {
                    "latitude": 135.052335,
                    "longitude": 77.167005
                },
                "sr": {
                    "ticks": 3823,
                    "SkuId": "25",
                    "Data": {
                        "X": "0",
                        "Y": "2",
                    }
                }
            }]
        }
    },
    "Messages": []
}

谢谢。

您必须使用Cross Apply:https://msdn.microsoft.com/en-us/library/azure/mt621307.aspx并爆炸:https://msdn.microsoft.com/en-us/library/azure/mt621306.aspx

在此处查看解决方案:

https://github.com/algattik/usqlhackathon/blob/master/vs-solution/usqlapplication/ciam-to-sqlapllication/ciam-to-sqldw.usqlhttps://github.com/algattik/usqlhackathon/blob/master/samples/customer/customers 2016-08-10.json

- 改进答案: -

由于此解决方案对您不起作用,因为您的内部JSON太大而无法安装在字符串中,因此您可以两次解析输入:

DECLARE @input string = @"/so.json";
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];
@meta = 
EXTRACT Id string
  FROM @input 
USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor();
@items = 
EXTRACT purchaseDate string
  FROM @input 
USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor("body.nprobe.items[*]");
@itemsFull =
    SELECT Id,
           purchaseDate
    FROM @meta
         CROSS JOIN @items;
OUTPUT @itemsFull
TO "/items_full.csv"
USING Outputters.Csv();

相关内容

  • 没有找到相关文章

最新更新