将动态 JSON 数据高效反序列化为数据表



我们正在开发一个程序,从一组没有一致架构设置的服务器中获取幻灯片图像数据(我担心它是无效的,但我不够熟练,无法进行调用)。 作为独立的无关研究人员,我们对服务器没有影响力。

数据是通过大量表格(n>50)手动输入的(大部分),字段不一致(数据可以追溯到90年代)。 下面是一个响应示例:

{
"form12873": [
{
"id": "9202075838",
"timestamp": "2015-06-25 10:24:51",
"user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/600.6.3 (KHTML, like Gecko) Version/8.0.6 Safari/600.6.3",
"remote_addr": "[Re.dact.ed]",
"processed": "1",
"data": {
"33885124": {
"field": "33885124",
"value": "CDat Lab",
"flat_value": "CDat Lab",
"label": "Completed by:",
"type": "select"
},
''**Several more fields as above**''...
"33884660": {
"field": "33884660",
"value": {
"slideX": "2456123",
"slideY": "456632",
"label": "K-20150322148",
"approved": "1",
"score": "30144"
},
"flat_value": "slideX = 2456123nslideY = 456632nlabel = K-20150322148napproved = 1nscore = 30144",
"label": "Slide Stats:",
"type": "slidestats"
},
''**Some of the fields are as above...
"31970564": {
"field": "31970564",
"value": [
"System",
"Crated",
"Mirax",
"NanoZoomer",
"ThinPrep",
"Aperio",
"Intellisite"
],
"flat_value": "SystemnCratednMiraxnNanoZoomernThinPrepnAperionIntellisite",
"label": "System Information",
"type": "checkbox"
},
''**Some of the values are Arrays...
"33883781": {
"field": "33883781",
"selection": "Retain",
"label": "4. Retain/Remove/Review",
"type": "selectdrop"
},
''**Some of the fields don't have the same children
"52792890": {
"field": "52792890",
"image": "'A really large byte[], removed for ease of reading'",
"type": "image"
}
''**Somewhere near the end of each response is the actual image...
}
},
{
"id": "33884681",
''**Then it continues on as above until the end:
}
], "total": 170, "pages": 5, "pretty_id": "478125624983" }

过去,当我能够为 JSON 的结构建模/类时,我已经知道如何处理它(创建一个定义了字段、值等的数据类)。

尝试以下解决方案:

var result = JsonConvert.DeserializeObject<List<Dictionary<string, 
Dictionary<string, string>>>>(content);

总是导致数组错误或强制转换问题(即使添加了直接转换)。 我能够使用以下方法获取实际的第一个数组:

Public Shared Function Tabulate(json As String) As DataTable
Dim jsonLinq = Newtonsoft.Json.Linq.JObject.Parse(json)
' Find the first array using Linq
Dim srcArray = jsonLinq.Descendants().Where(Function(d) TypeOf d Is JArray).First()
Dim trgArray = New Newtonsoft.Json.Linq.JArray()
For Each row As JObject In srcArray.Children(Of JObject)()
Dim cleanRow = New JObject()
For Each column As JProperty In row.Properties()
' Only include JValue types
If TypeOf column.Value Is JValue Then
cleanRow.Add(column.Name, column.Value)
End If
Next
trgArray.Add(cleanRow)
Next

Return JsonConvert.DeserializeObject(Of DataTable)(trgArray.ToString())
End Function

我的最终目标也是获得一个数据表,循环/图像字节让我担心试图回归到更多的孩子。 然后我尝试使用第一个数组进行反序列化,然后出现了 nill。

如果有快速处理此问题的方法,我会喜欢解决方案。 如果问题是我正在尝试处理垃圾 JSON,我希望参考当前标准被破坏的地方(所以我至少可以尝试让其他机构更改他们的服务器)。 也就是说,我可能无论如何都要处理它,即使它是循环。

*注意:该项目于 VB.net 年启动,因此我们一直保持这种状态,但我可能还是决定移植到 C#。 任何一个代码都很棒。

下面是一个应该可用于测试的 Json 的未标记示例。 我的最终目标是将其扁平化为数据表:

{
"form12873": [
{
"id": "9202075838",
"timestamp": "2015-06-25 10:24:51",
"user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/600.6.3 (KHTML, like Gecko) Version/8.0.6 Safari/600.6.3",
"remote_addr": "[Re.dact.ed]",
"processed": "1",
"data": {
"33885124": {
"field": "33885124",
"value": "CDat Lab",
"flat_value": "CDat Lab",
"label": "Completed by:",
"type": "select"
},
"33884660": {
"field": "33884660",
"value": {
"slideX": "2456123",
"slideY": "456632",
"label": "K-20150322148",
"approved": "1",
"score": "30144"
},
"flat_value": "slideX = 2456123nslideY = 456632nlabel = K-20150322148napproved = 1nscore = 30144",
"label": "Slide Stats:",
"type": "slidestats"
},
"31970564": {
"field": "31970564",
"value": [
"System",
"Crated",
"Mirax",
"NanoZoomer",
"ThinPrep",
"Aperio",
"Intellisite"
],
"flat_value": "SystemnCratednMiraxnNanoZoomernThinPrepnAperionIntellisite",
"label": "System Information",
"type": "checkbox"
},

"33883781": {
"field": "33883781",
"selection": "Retain",
"label": "4. Retain/Remove/Review",
"type": "select"
}
}
}
], "total": 170, "pages": 5, "pretty_id": "478125624983" }

下面的丑陋装置能够(大致)做你想做的事。将 json 源字符串作为参数提供给DeserializeToDataTable并收集结果数据表。它适用于您的样品。我不能保证它会在您的其余数据中工作。这里的目的是提供一个工作入门套件,您可以研究、理解、调试和适应以满足您的需求。

Private Function DeserializeToDataTable(ByVal jsource As String)
Dim JRootObject = JObject.Parse(jsource)
Dim Children = JRootObject.SelectTokens("$..data.*").ToArray
Dim Records = Children.OfType(Of JObject).ToArray
Dim dicList As New List(Of Dictionary(Of String, Object))
For Each rec In Records
dicList.Add(DeserializeToDictionary(rec))
Next
Dim fieldnames = dicList.SelectMany(Function(d) d.Keys).Distinct.ToArray
Dim dt As New DataTable
For Each fieldname In fieldnames
dt.Columns.Add(fieldname, GetType(Object))
Next
Dim row As DataRow
For Each dic In dicList
row = dt.NewRow
For Each kvp In dic
row.SetField(kvp.Key, kvp.Value)
Next
dt.Rows.Add(row)
Next
Return dt
End Function
Private Function DeserializeToDictionary(ByVal json_object As JObject) As Dictionary(Of String, Object)
Dim dic = New Dictionary(Of String, Object)
For Each field In json_object.Properties
Select Case field.Value.Type
Case JTokenType.Array
Dim subobject = New JObject
Dim item = 0
For Each token In field.Value
subobject("item" & item) = token
item += 1
Next
Dim subdic = DeserializeToDictionary(subobject)
For Each kvp In subdic
dic(kvp.Key) = kvp.Value
Next
Case JTokenType.Boolean
dic(field.Name) = field.Value.ToObject(Of Boolean)
Case JTokenType.Bytes
dic(field.Name) = field.Value.ToObject(Of Byte())
Case JTokenType.Date
dic(field.Name) = field.Value.ToObject(Of Date)
Case JTokenType.Float
dic(field.Name) = field.Value.ToObject(Of Double)
Case JTokenType.Guid
dic(field.Name) = field.Value.ToObject(Of Guid)
Case JTokenType.Integer
dic(field.Name) = field.Value.ToObject(Of Integer)
Case JTokenType.Object
Dim subdic = DeserializeToDictionary(field.Value)
For Each kvp In subdic
dic(kvp.Key) = kvp.Value
Next
Case JTokenType.String
Try
dic(field.Name) = field.Value.ToObject(Of String)
Catch ex As Exception
dic(field.Name) = field.Value.ToObject(Of Object)
End Try
Case JTokenType.TimeSpan
dic(field.Name) = field.Value.ToObject(Of TimeSpan)
Case Else
dic(field.Name) = field.Value.ToString
End Select
Next
Return dic
End Function

使用上面的代码时,您必须意识到这一点:

  1. 它使用递归来展平多分支结构。所以

    {
    "A":"aaaa",
    "B":"bbbb",
    "C":{
    "D":"dddd",
    "E":"eeee",
    "F":"ffff"
    }
    }
    }
    

    将成为

    A   |B   |D   |E   |F
    ----+----+----+----+----
    aaaa|bbbb|dddd|eeee|ffff
    
  2. 我所做的方式假设在展平时不会有重复;如果有这些,它将只保留最后一个。所以

    {
    "A":"aaaa",
    "B":"bbbb",
    "C":{
    "D":"d1d1",
    "E":"e1e1",
    "F":"f1f1"
    },
    "G":{
    "D":"d2d2",
    "E":"e2e2",
    "F":"f2f2"
    }
    }
    }
    

    将成为

    A   |B   |D   |E   |F
    ----+----+----+----+----
    aaaa|bbbb|d2d2|e2e2|f2f2
    

    这是一个明显有缺陷的、有缺陷的行为,这将需要一种更复杂的方法,我留给你在我的划痕基础上再接再厉。

即使DataTable已经包含DataRows,也可以将DataColumns添加到中。

我不怎么做 JSON,但我对狡猾的 XML 的一般方法是分解为键值对流,其中键是 XPATH "地址",值是节点的内容(不包括子节点),然后遍历流以构建DataTable。也许在这里可以采用类似的方法使用JSONPath。

相关内容

  • 没有找到相关文章

最新更新