Python: Converting multiple YAML documents to JSON



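I'm currently trying to convert some YAML to JSON with Python, and I'm having a hard time getting the JSON formatted correctly. My YAML file contains multiple documents, like this: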

title: Windows Shell Spawning Suspicious Program
status: experimental
description: Detects a suspicious child process of a Windows shell
references:
    - https://mgreen27.github.io/posts/2018/04/02/DownloadCradle.html
author: Florian Roth
date: 2018/04/06
logsource:
    product: windows
    service: sysmon
detection:
    selection:
        EventID: 1
        ParentImage:
            - '*\mshta.exe'
            - '*\powershell.exe'
            - '*\cmd.exe'
            - '*\rundll32.exe'
            - '*\cscript.exe'
            - '*\wscript.exe'
            - '*\wmiprvse.exe'
        Image:
            - '*\schtasks.exe'
            - '*\nslookup.exe'
            - '*\certutil.exe'
            - '*\bitsadmin.exe'
            - '*\mshta.exe'
    condition: selection
fields:
    - CommandLine
    - ParentCommandLine
falsepositives:
    - Administrative scripts
level: medium
...

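For each document I'm trying to extract detection, fields, falsepositives and level, and put them into the JSON document as separate arrays. My first attempt is pretty bad; it just lumps the groups from every document together into lists: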

import json
import yaml

data = {}
data['indicator'] = {}
data['indicator']['detection'] = []
data['indicator']['fields'] = []
data['indicator']['false positives'] = []
data['indicator']['level'] = []
with open(yaml_file, 'r') as yaml_in, open(json_file, 'a') as definition:
    loadyaml = yaml.safe_load_all(yaml_in)
    for item in loadyaml:
        # items() works on both Python 2 and 3 (iteritems() is Python 2 only)
        for header, subsections in item.items():
            if header == 'detection':
                data['indicator']['detection'].append(subsections)
            elif header == 'fields':
                data['indicator']['fields'].append(subsections)
            elif header == 'falsepositives':  # the YAML key has no space
                data['indicator']['false positives'].append(subsections)
            elif header == 'level':
                data['indicator']['level'].append(subsections)
    json.dump(data, definition, indent=4)

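I'd like each of my documents to go into my JSON document as a separate indicator, with its detection, fields, falsepositives and level grouped together, but my Python skills have let me down.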

Any insight into this would be greatly appreciated!

You can get the output you want with a much smaller program by iterating over .load_all():

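import sys
import json
import ruamel.yaml

yaml = ruamel.yaml.YAML(typ='safe')
ind = dict()
data = dict(indicator=ind)
with open('input.yaml') as fp:
    # load_all() yields one Python object per YAML document in the stream
    for d in yaml.load_all(fp):
        # append each document's value to a per-key list, creating the list on first use
        for k in ('detection', 'fields', 'falsepositives', 'level'):
            ind.setdefault(k, []).append(d[k])
json.dump(data, sys.stdout, indent=2)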

Given a file input.yaml like this:

---
title: Windows Shell Spawning Suspicious Program
status: experimental
description: Detects a suspicious child process of a Windows shell
references:
    - https://mgreen27.github.io/posts/2018/04/02/DownloadCradle.html
author: Florian Roth
date: 2018/04/06
logsource:
    product: windows
    service: sysmon
detection:
    selection:
        EventID: 1
        ParentImage:
            - '*\mshta.exe'
            - '*\powershell.exe'
            - '*\cmd.exe'
            - '*\rundll32.exe'
            - '*\cscript.exe'
            - '*\wscript.exe'
            - '*\wmiprvse.exe'
        Image:
            - '*\schtasks.exe'
            - '*\nslookup.exe'
            - '*\certutil.exe'
            - '*\bitsadmin.exe'
            - '*\mshta.exe'
    condition: selection
fields:
    - CommandLine
    - ParentCommandLine
falsepositives:
    - Administrative scripts
level: medium
...
---
title: Bash starting just what is asked
status: stable
description: No negative side effects
references:
    - https://nblue24.github.io/posts/2019/04/01/DownloadBed.html
author: Axel Roth
date: 2019/04/01
logsource:
    product: linux
    service: good
detection:
    selection:
        EventID: 42
        ParentImage:
            - '*/bash'
            - '*/ash'
        Image:
            - systemctl
            - init
    condition: selection
fields:
    - Shell
    - ParentShell
falsepositives:
    - root programs
level: high
...

the output will be:

{
  "indicator": {
    "detection": [
      {
        "selection": {
          "EventID": 1,
          "ParentImage": [
            "*\\mshta.exe",
            "*\\powershell.exe",
            "*\\cmd.exe",
            "*\\rundll32.exe",
            "*\\cscript.exe",
            "*\\wscript.exe",
            "*\\wmiprvse.exe"
          ],
          "Image": [
            "*\\schtasks.exe",
            "*\\nslookup.exe",
            "*\\certutil.exe",
            "*\\bitsadmin.exe",
            "*\\mshta.exe"
          ]
        },
        "condition": "selection"
      },
      {
        "selection": {
          "EventID": 42,
          "ParentImage": [
            "*/bash",
            "*/ash"
          ],
          "Image": [
            "systemctl",
            "init"
          ]
        },
        "condition": "selection"
      }
    ],
    "fields": [
      [
        "CommandLine",
        "ParentCommandLine"
      ],
      [
        "Shell",
        "ParentShell"
      ]
    ],
    "falsepositives": [
      [
        "Administrative scripts"
      ],
      [
        "root programs"
      ]
    ],
    "level": [
      "medium",
      "high"
    ]
  }
}
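In this layout the i-th element of every list comes from the i-th YAML document. If you later need one record per document, the lists can be regrouped with zip. A minimal standard-library sketch; regroup is a hypothetical helper name, and it assumes all four lists have the same length (zip silently stops at the shortest one):

```python
import json

# The four keys collected by the program above.
KEYS = ('detection', 'fields', 'falsepositives', 'level')


def regroup(indicator, keys=KEYS):
    """Turn {'level': [l0, l1], 'fields': [f0, f1], ...} into
    [{'level': l0, 'fields': f0, ...}, {'level': l1, 'fields': f1, ...}].

    Hypothetical helper: assumes every list holds one entry per document;
    unequal lengths would silently drop the trailing records.
    """
    columns = [indicator[k] for k in keys]
    # zip(*columns) walks the lists in lockstep, one row per document
    return [dict(zip(keys, row)) for row in zip(*columns)]


if __name__ == '__main__':
    # Abbreviated copy of the "indicator" object printed above.
    indicator = {
        'detection': [{'condition': 'selection'}, {'condition': 'selection'}],
        'fields': [['CommandLine', 'ParentCommandLine'], ['Shell', 'ParentShell']],
        'falsepositives': [['Administrative scripts'], ['root programs']],
        'level': ['medium', 'high'],
    }
    for record in regroup(indicator):
        print(json.dumps(record))
```

Regrouping after the fact keeps the collection loop unchanged; the trade-off is that a document missing one of the keys would already have made the loop raise KeyError before regroup is ever called.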

This works with both Python 2 and Python 3.

import json
import yaml

data = {}
data['indicator'] = {}
data['indicator']['detection'] = []
data['indicator']['fields'] = []
data['indicator']['falsepositives'] = []
data['indicator']['level'] = []


def parse_string(s, data):
    # Iterate over every document; next(...) would only read the first one
    for doc in yaml.safe_load_all(s):
        data['indicator']['detection'].append(doc['detection'])
        data['indicator']['fields'].append(doc['fields'])
        data['indicator']['falsepositives'].append(doc['falsepositives'])
        data['indicator']['level'].append(doc['level'])


with open(yaml_file, 'r') as yaml_in, open(json_file, 'a') as definition:
    parse_string(yaml_in.read(), data)
    json.dump(data, definition, indent=4)
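The programs above collect each key into one list spanning all documents. The question, though, asks for each document to become its own indicator. A minimal sketch of that layout, assuming the documents have already been loaded into dicts (for example by yaml.safe_load_all). The top-level key "indicators", the helper name to_indicators, and storing None for a missing key are illustrative assumptions, not part of the original thread:

```python
import json

# Keys to keep from each YAML document (taken from the question).
KEYS = ('detection', 'fields', 'falsepositives', 'level')


def to_indicators(docs):
    """Build one indicator dict per loaded YAML document.

    Hypothetical helper: 'indicators' as the top-level key is an assumed
    name, and a key missing from a document is stored as None
    instead of raising KeyError.
    """
    return {'indicators': [{k: doc.get(k) for k in KEYS} for doc in docs]}


if __name__ == '__main__':
    # Two already-parsed documents standing in for yaml.safe_load_all() output.
    docs = [
        {'title': 'Windows Shell Spawning Suspicious Program',
         'detection': {'condition': 'selection'},
         'fields': ['CommandLine', 'ParentCommandLine'],
         'falsepositives': ['Administrative scripts'],
         'level': 'medium'},
        {'title': 'Bash starting just what is asked',
         'detection': {'condition': 'selection'},
         'fields': ['Shell', 'ParentShell'],
         'falsepositives': ['root programs'],
         'level': 'high'},
    ]
    print(json.dumps(to_indicators(docs), indent=4))
```

With PyYAML this would be fed as to_indicators(yaml.safe_load_all(yaml_in)) inside the with block. safe_load_all is lazy, so the documents must be consumed before the file is closed, which the list comprehension does.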
