Loading a CSV into a nested dictionary to extract dynamic values



I received a CSV from our developers that looks like this (example values only):

```
Category, Sub-Category, Template Name
Health Check,CPU,checkCPU
Health Check,Memory,checkMemory
Service Request,Reboot Device, rebootDevice
Service Request,Check CPU,checkCPU-SR
```

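I need my Python script to read this file (the values will grow and change over time) and give me the template name for a given category and sub-category. I can do that by reading the CSV and looping over it until I find the values I want, but it seems like there must be a simpler way.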

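If I load a JSON file like the one below, I can use `json.load` to turn its contents into a dict and then retrieve the value I want directly, without iterating over the contents.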

```json
{
    "Health Check": {
        "CPU": "checkCPU",
        "Memory": "checkMemory"
    },
    "Service Request": {
        "Reboot Device": "rebootDevice",
        "Check CPU": "checkCPU-SR"
    }
}
```

Then I use something like this:

```python
import json

with open('categories.json', 'r') as f:
    myDict = json.load(f)

print(myDict["Health Check"]["CPU"])
```

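I'd prefer the dict approach, but I don't know whether there's a way to get there from a CSV file. I've tried `csv.DictReader` and Pandas, but couldn't get either to work. With Pandas I can set an index, but none of the values here are unique. With `csv.DictReader`, the nesting in this data (several key/value pairs under one heading, such as "Health Check") doesn't seem to be supported.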
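For reference, `csv.DictReader` can produce exactly the nested dict from the JSON example; the nesting just has to be built while iterating over the rows. A minimal sketch, assuming the sample rows above (fed through `io.StringIO` so it runs as-is; with a real file you would use `open(path, newline="")` instead) and using `load_templates` as a hypothetical helper name:

```python
import csv
import io

# Sample data copied from the question; note the spaces after some commas.
SAMPLE_CSV = """Category, Sub-Category, Template Name
Health Check,CPU,checkCPU
Health Check,Memory,checkMemory
Service Request,Reboot Device, rebootDevice
Service Request,Check CPU,checkCPU-SR
"""


def load_templates(f):
    """Build {category: {sub_category: template_name}} from a CSV file object."""
    templates = {}
    # skipinitialspace=True drops the blank after each delimiter, so the header
    # "Category, Sub-Category, Template Name" yields clean field names and
    # " rebootDevice" becomes "rebootDevice".
    reader = csv.DictReader(f, skipinitialspace=True)
    for row in reader:
        # setdefault creates the inner dict the first time a category is seen.
        inner = templates.setdefault(row["Category"], {})
        inner[row["Sub-Category"]] = row["Template Name"]
    return templates


templates = load_templates(io.StringIO(SAMPLE_CSV))
print(templates["Health Check"]["CPU"])  # checkCPU
```

The result has the same shape as the JSON file, so the `myDict[...][...]` lookup style from the question works unchanged.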

If you use pandas, you don't need an explicit loop and you don't need to convert anything to JSON:

```python
all_results = df[(df["Category"] == "Health Check") & (df["Sub-Category"] == "CPU")]["Template Name"]
```

or, more readably:

```python
mask = (df["Category"] == "Health Check") & (df["Sub-Category"] == "CPU")
all_results = df[mask]["Template Name"]
```
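The mask combines both columns because neither column is unique on its own, while each Category/Sub-Category pair is. The same observation works without pandas: a plain dict keyed by `(category, sub_category)` tuples. A rough sketch, assuming pairs never repeat (a repeated pair would silently keep the last row) and reusing the question's sample data:

```python
import csv
import io

SAMPLE_CSV = """Category, Sub-Category, Template Name
Health Check,CPU,checkCPU
Health Check,Memory,checkMemory
Service Request,Reboot Device, rebootDevice
Service Request,Check CPU,checkCPU-SR
"""

# skipinitialspace=True strips the blank that follows some commas.
reader = csv.reader(io.StringIO(SAMPLE_CSV), skipinitialspace=True)
next(reader)  # skip the header row

# The (Category, Sub-Category) pair is unique, so it can act as a single key.
by_pair = {(category, sub): template for category, sub, template in reader}

print(by_pair[("Health Check", "CPU")])  # checkCPU
```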

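Minimal working code:

I use `io` only to simulate a file so that anyone can copy and test this directly; in your own code, pass the filename to `read_csv` instead.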

```python
import io

import pandas as pd

text = '''Category, Sub-Category, Template Name
Health Check,CPU,checkCPU
Health Check,Memory,checkMemory
Service Request,Reboot Device, rebootDevice
Service Request,Check CPU,checkCPU-SR'''

# `,\s*` is a regex separator that also swallows spaces after each `,`;
# regex separators require the Python parser engine.
df = pd.read_csv(io.StringIO(text), sep=r',\s*', engine='python')
print(df)
print('---')
mask = (df["Category"] == "Health Check") & (df["Sub-Category"] == "CPU")
all_results = df[mask]["Template Name"]
print(all_results.iloc[0])
```

Result:

```
          Category   Sub-Category Template Name
0     Health Check            CPU      checkCPU
1     Health Check         Memory   checkMemory
2  Service Request  Reboot Device  rebootDevice
3  Service Request      Check CPU   checkCPU-SR
---
checkCPU
```
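Note that `all_results.iloc[0]` raises `IndexError` when no row matches the mask, so real code should check `all_results.empty` first. With a nested dict (the JSON shape from the question), the equivalent guard is a pair of chained `.get` calls. A small sketch; `find_template` is a hypothetical helper name and the dict is the question's sample data typed in by hand:

```python
def find_template(templates, category, sub_category, default=None):
    """Return the template name for a category/sub-category pair, or default."""
    # Chained .get calls avoid a KeyError when either level is missing.
    return templates.get(category, {}).get(sub_category, default)


# Nested dict in the same shape as the JSON example from the question.
templates = {
    "Health Check": {"CPU": "checkCPU", "Memory": "checkMemory"},
    "Service Request": {"Reboot Device": "rebootDevice", "Check CPU": "checkCPU-SR"},
}

print(find_template(templates, "Health Check", "CPU"))   # checkCPU
print(find_template(templates, "Health Check", "Disk"))  # None
```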

With pandas you can also easily select every row that matches a pattern, e.g. all rows whose Sub-Category contains the substring "CPU":

```python
mask = df["Sub-Category"].str.contains("CPU")
all_results = df[mask]
for index, item in all_results.iterrows():
    print(item['Category'], '|', item["Sub-Category"], '|', item["Template Name"])
```

Result:

```
Health Check | CPU | checkCPU
Service Request | Check CPU | checkCPU-SR
```
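If the data is kept as a nested dict instead of a DataFrame, the same substring search is a short nested loop. A sketch using a hypothetical `search_sub_categories` generator; note it does a plain, case-sensitive substring test, whereas `str.contains` treats its pattern as a regular expression by default (the two agree for a literal such as "CPU"):

```python
# Nested dict in the same shape as the JSON example from the question.
templates = {
    "Health Check": {"CPU": "checkCPU", "Memory": "checkMemory"},
    "Service Request": {"Reboot Device": "rebootDevice", "Check CPU": "checkCPU-SR"},
}


def search_sub_categories(templates, text):
    """Yield (category, sub_category, template) for sub-categories containing text."""
    for category, subs in templates.items():
        for sub_category, template in subs.items():
            if text in sub_category:  # plain substring test, case-sensitive
                yield category, sub_category, template


for category, sub_category, template in search_sub_categories(templates, "CPU"):
    print(category, "|", sub_category, "|", template)
```

Because dicts keep insertion order, this prints the same two lines as the pandas version.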
Without pandas, you can build the nested dict yourself (replace `<file_name>` with the path to your CSV):

```python
import json


def convert():
    output_dict = {}
    with open("<file_name>", "r") as source:
        file_lines = source.read().splitlines()
    file_lines.pop(0)  # Remove the header line; we don't need it
    for line in file_lines:
        if not line.strip():
            continue  # Skip blank lines
        # Simple CSV without quoted fields: split on ',' and strip stray spaces
        category, sub_category, template = (part.strip() for part in line.split(","))
        # If the category isn't in output_dict yet, create its inner dict
        if category not in output_dict:
            output_dict[category] = {}
        # Map the sub-category to its template name under that category
        output_dict[category][sub_category] = template
    return output_dict  # {category: {sub_category: template_name}}


if __name__ == "__main__":
    print(json.dumps(convert(), indent=4))
```
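If you would rather keep the question's `json.load` code untouched, one option is to convert the CSV into `categories.json` once and load that. A sketch under stated assumptions: the file names are hypothetical, a temporary directory stands in for real paths so the example runs anywhere, and `csv_file_to_json` is an illustrative helper, not an existing API:

```python
import csv
import json
import os
import tempfile
from collections import defaultdict

SAMPLE_CSV = """Category, Sub-Category, Template Name
Health Check,CPU,checkCPU
Health Check,Memory,checkMemory
Service Request,Reboot Device, rebootDevice
Service Request,Check CPU,checkCPU-SR
"""


def csv_file_to_json(csv_path, json_path):
    """Convert the category CSV into the nested JSON layout from the question."""
    nested = defaultdict(dict)  # inner dicts are created on first access
    with open(csv_path, newline="") as src:
        reader = csv.reader(src, skipinitialspace=True)
        next(reader)  # skip the header row
        for category, sub_category, template in reader:
            nested[category][sub_category] = template
    with open(json_path, "w") as dst:
        # defaultdict is a dict subclass, so json can serialise it directly.
        json.dump(nested, dst, indent=4)


with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "categories.csv")
    json_path = os.path.join(tmp, "categories.json")
    with open(csv_path, "w", newline="") as f:
        f.write(SAMPLE_CSV)
    csv_file_to_json(csv_path, json_path)

    # Same lookup as in the question.
    with open(json_path, "r") as f:
        myDict = json.load(f)
    print(myDict["Health Check"]["CPU"])  # checkCPU
```

Whether the extra JSON file is worth it depends on how often the CSV changes; building the dict directly from the CSV on each run avoids a second copy that can go stale.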
