How to import a JSON file with duplicate keys into a DataFrame



For example: I have a program that writes usage logs like the one below to a JSON file. The log contains many identical keys named "activity":

"probe": "PROCESS_PROBE",
"status": "ProcessCreated",
"processName": "backgroundTaskHost.exe",
"path": "C:\WINDOWS\system32\backgroundTaskHost.exe",
"creationClassName": "Win32_Process",
"handle": "21632",
"priority": "Normal",
"commandLine": ""C:\WINDOWS\system32\backgroundTaskHost.exe" -ServerName:CortanaUI.AppXy7vb4pc2dr3kc93kfc509b1d0arkfb2x.mca",
"handleCount": 236,
"processId": 21632,
"parentProcessId": 112,
"pageFileUsage": 4244,
"creationDate": "20200410172922.614702+120",
"annotations": {
"userName": "datta",
"timeSinceStartup": 259878750,
"ticksOfEvent": 637221365629757593
}
},
"activity":{
"probe": "PROCESS_PROBE",
"status": "ProcessDeleted",
"processName": "RuntimeBroker.exe",
"path": "C:\Windows\System32\RuntimeBroker.exe",
"creationClassName": "Win32_Process",
"handle": "8504",
"priority": "Normal",
"handleCount": 285,
"processId": 8504,
"parentProcessId": 112,
"pageFileUsage": 3180,
"creationDate": "20200410172757.934567+120",
"terminationDate": null,
"annotations": {
"userName": "datta",
"timeSinceStartup": 259883953,
"ticksOfEvent": 637221365681937472
}
},
"activity":{
"probe": "FILERESOURCE_PROBE",
"status": "Changed",
"path": "C:\Users\datta\eclipse\jee-2019-12",
"entityName": "eclipse",
"extension": "",
"attributes": "Directory",
"owner": "null",
"length": 0,
"isReadOnly": false,
"creationTime": "2020-01-17T09:42:08.5092897+01:00",
"lastWriteTime": "2020-03-25T10:56:10.7382329+01:00",
"lastAccessTime": "2020-04-10T17:29:29.9811767+02:00",
"annotations": {
"userName": "datta",
"timeSinceStartup": 259885750,
"ticksOfEvent": 637221365699837331
}
},
"activity":{
"probe": "FILERESOURCE_PROBE",
"status": "Changed",
"path": "C:\Users\datta\eclipse",
"entityName": "jee-2019-12",
"extension": "",
"attributes": "Directory",
"owner": "null",
"length": 0,
"isReadOnly": false,
"creationTime": "2020-01-17T09:42:08.5083+01:00",
"lastWriteTime": "2020-01-17T09:42:08.5092897+01:00",
"lastAccessTime": "2020-04-10T17:29:29.9801436+02:00",
"annotations": {
"userName": "datta",
"timeSinceStartup": 259885750,
"ticksOfEvent": 637221365699906960
}
},
"activity":{
"probe": "FILERESOURCE_PROBE",
"status": "Changed",
"path": "C:\Users\datta",
"entityName": "eclipse",
"extension": "",
"attributes": "Directory",
"owner": "null",
"length": 0,
"isReadOnly": false,
"creationTime": "2020-01-17T09:42:08.5083+01:00",
"lastWriteTime": "2020-01-17T09:42:08.5083+01:00",
"lastAccessTime": "2020-04-10T17:29:29.9922013+02:00",
"annotations": {
"userName": "datta",
"timeSinceStartup": 259885765,
"ticksOfEvent": 637221365699922013
}
}
}

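I want to load the data under the activity keys as the columns of a DataFrame. That is, each activity would be one row of the DataFrame, and the columns would be "probe", "status", "processName", and so on.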

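The problem is that when I load the data with logData = json.load(logfile), only the last activity key survives, because each duplicate overwrites the previous one. I also tried logData = json.load(logfile, object_pairs_hook=tuple), which loads the data as one giant tuple. I am not sure how to get from there to the DataFrame I want. Thanks in advance.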

See: Does JSON syntax allow duplicate keys in an object?

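The problem here is not with JSON but with the target structure you are using. Python's json module by default imports JSON objects as dictionaries, and a dictionary cannot hold duplicate properties (keys).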
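
One possible way around the dictionary limitation is to pass object_pairs_hook a function that keeps every key/value pair instead of letting later keys overwrite earlier ones. The following is a minimal sketch, assuming the top-level object holds only repeated "activity" keys as in the question; keep_duplicates and sample are hypothetical names used for illustration, and sample is a shortened stand-in for the real log file.

```python
import json

def keep_duplicates(pairs):
    """Build a dict from (key, value) pairs; repeated keys are gathered into a list."""
    seen = {}
    for key, value in pairs:
        seen.setdefault(key, []).append(value)
    # Keys that occurred once keep their plain value; repeated keys become lists.
    # Caveat: a file with a single "activity" would yield a dict here, not a list.
    return {key: values[0] if len(values) == 1 else values
            for key, values in seen.items()}

# Shortened stand-in for the log file in the question (hypothetical sample).
sample = """
{
  "activity": {"probe": "PROCESS_PROBE", "status": "ProcessCreated",
               "annotations": {"userName": "datta"}},
  "activity": {"probe": "FILERESOURCE_PROBE", "status": "Changed",
               "annotations": {"userName": "datta"}}
}
"""

# The hook is called for every JSON object, nested ones included.
log_data = json.loads(sample, object_pairs_hook=keep_duplicates)
records = log_data["activity"]  # one dict per activity
print(len(records), records[0]["status"])
```

If pandas is installed, pandas.json_normalize(records) should then give one row per activity, with nested fields flattened into columns such as annotations.userName.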

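The real problem lies with the producer of this JSON. It would have been very easy to emit a list of records, or even a list of dictionaries (with "activity" as the only key in each). For reasons of their own, the producers chose this structure, which is (almost) legal but which most JSON consumers cannot handle (I know that Python, PHP and, most importantly, JavaScript all run into this).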
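
For comparison, here is a sketch of what a friendlier producer could have emitted: a JSON array with one object per activity. The field values are shortened from the question's log, and activities is a hypothetical name.

```python
import json

# Each activity is one element of a list, so no key is ever repeated.
activities = [
    {"probe": "PROCESS_PROBE", "status": "ProcessCreated", "processId": 21632},
    {"probe": "PROCESS_PROBE", "status": "ProcessDeleted", "processId": 8504},
]

text = json.dumps(activities, indent=2)  # what the producer would write
round_trip = json.loads(text)            # what an ordinary JSON consumer reads back

print(round_trip == activities)
```

Because nothing is overwritten on the way back in, a plain json.load is enough for a file shaped like this.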

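It is also safe to assume that the program that generated the file you are trying to read did not produce it with a JSON package (at least not the whole file). It probably generates chunks of text and appends them to a stream.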
