I got a CSV from our developers that looks like this (example values only):
Category, Sub-Category, Template Name
Health Check,CPU,checkCPU
Health Check,Memory,checkMemory
Service Request,Reboot Device, rebootDevice
Service Request,Check CPU,checkCPU-SR
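I need my Python script to read this file (the values will grow and change over time) and give me the template name for a given category and sub-category. I can do this by reading the CSV and looping through it, searching for the values I want, but it seems like there must be an easier way.
If I load a JSON file like the one below, I can use json.load to convert the contents to a dict and then easily retrieve the value I want without having to loop through the contents.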
{
    "Health Check": {
        "CPU": "checkCPU",
        "Memory": "checkMemory"
    },
    "Service Request": {
        "Reboot Device": "rebootDevice",
        "Check CPU": "checkCPU-SR"
    }
}
Then I use something like:
import json
with open('categories.json', 'r') as f:
    myDict = json.load(f)
print(myDict["Health Check"]["CPU"])
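I would much rather use the dict approach, but I don't know whether there is a way to get there from a CSV file. I have tried a few things, such as csv.DictReader and Pandas, but I couldn't get either of them to work. With Pandas I can set a key column, but none of the values here are unique. With csv.DictReader, the way this data is nested (multiple key/value pairs under one heading such as Health Check) doesn't seem to fit.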
If you use pandas, you don't need a loop and you don't have to convert to JSON.
all_results = df[ (df["Category"] == "Health Check") & (df["Sub-Category"] == "CPU") ]["Template Name"]
or, more readably:
mask = (df["Category"] == "Health Check") & (df["Sub-Category"] == "CPU")
all_results = df[mask]["Template Name"]
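Minimal working code.
I use io only to simulate a file, so that everyone can simply copy and test it, but you should pass a filename to read_csv instead.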
text = '''Category, Sub-Category, Template Name
Health Check,CPU,checkCPU
Health Check,Memory,checkMemory
Service Request,Reboot Device, rebootDevice
Service Request,Check CPU,checkCPU-SR'''
import pandas as pd
import io
df = pd.read_csv(io.StringIO(text), sep=r',\s*', engine='python') # I use `,\s*` to remove spaces after `,`; a regex separator needs the python engine
print(df)
print('---')
mask = (df["Category"] == "Health Check") & (df["Sub-Category"] == "CPU")
all_results = df[mask]["Template Name"]
print(all_results.iloc[0])
Result:
          Category   Sub-Category Template Name
0     Health Check            CPU      checkCPU
1     Health Check         Memory   checkMemory
2  Service Request  Reboot Device  rebootDevice
3  Service Request      Check CPU   checkCPU-SR
---
checkCPU
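One thing to watch with the mask approach: if nothing matches, `all_results` is empty and `.iloc[0]` raises an `IndexError`. A minimal sketch of a lookup that returns `None` in that case follows. The helper name `find_template` is hypothetical (it is not part of pandas), and `skipinitialspace=True` is used here as an alternative to the regex separator.

```python
import io

import pandas as pd

text = '''Category, Sub-Category, Template Name
Health Check,CPU,checkCPU
Health Check,Memory,checkMemory
Service Request,Reboot Device, rebootDevice
Service Request,Check CPU,checkCPU-SR'''

# skipinitialspace=True drops the blanks that follow each comma,
# both in the header and in the data rows.
df = pd.read_csv(io.StringIO(text), skipinitialspace=True)


def find_template(df, category, sub_category):
    """Hypothetical helper: first matching template name, or None."""
    mask = (df["Category"] == category) & (df["Sub-Category"] == sub_category)
    matches = df.loc[mask, "Template Name"]
    # An empty selection means the pair is not in the CSV.
    return None if matches.empty else matches.iloc[0]


print(find_template(df, "Health Check", "CPU"))
print(find_template(df, "Health Check", "Disk"))
```

The first call should print the CPU template name and the second should print `None`, since the sample data has no "Disk" row.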
With pandas you can also easily select all items that match some value, e.g. all items whose Sub-Category contains the substring CPU:
mask = df["Sub-Category"].str.contains("CPU")
all_results = df[mask]
for index, item in all_results.iterrows():
    print(item['Category'], '|', item["Sub-Category"], '|', item["Template Name"])
Result:
Health Check | CPU | checkCPU
Service Request | Check CPU | checkCPU-SR
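If the nested dict from the question is still the preferred shape, the DataFrame can be folded into one. The sketch below assumes each (Category, Sub-Category) pair appears only once; if a pair repeats, the last row wins. The variable name `nested` is just illustrative.

```python
import io

import pandas as pd

text = '''Category, Sub-Category, Template Name
Health Check,CPU,checkCPU
Health Check,Memory,checkMemory
Service Request,Reboot Device, rebootDevice
Service Request,Check CPU,checkCPU-SR'''

df = pd.read_csv(io.StringIO(text), skipinitialspace=True)

# One inner dict per category, mapping sub-category -> template name.
nested = {
    category: dict(zip(group["Sub-Category"], group["Template Name"]))
    for category, group in df.groupby("Category")
}

print(nested["Health Check"]["CPU"])
```

This gives the same `myDict["Health Check"]["CPU"]` style of access as the JSON version, without an intermediate JSON file.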
import json

def convert():
    output_dict = {}
    with open("<file_name>", "r") as source:
        file_lines = source.read().splitlines()
    file_lines.pop(0) # Remove header line as we don't need it
    for line in file_lines:
        # CSV, so split the line on ',' and strip the stray spaces after the commas
        line_contents = [field.strip() for field in line.split(",")]
        if line_contents[0] not in output_dict: # If the category isn't in output_dict yet, make it a dict
            output_dict[line_contents[0]] = {}
        # The file has three columns, so map sub-category -> template name under its category
        output_dict[line_contents[0]][line_contents[1]] = line_contents[2]
    return output_dict # return the dict

if __name__ == "__main__":
    print(json.dumps(convert(), indent=4))
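For comparison, here is a minimal sketch of the same conversion using the standard library's `csv.DictReader`, which handles quoting and, with `skipinitialspace=True`, the spaces after the commas. The function name `load_templates` is hypothetical, and `io.StringIO` only stands in for the real file; with a real file you would pass the object returned by `open(filename, newline='')`.

```python
import csv
import io


def load_templates(csv_file):
    """Build {category: {sub_category: template_name}} from an open CSV file."""
    templates = {}
    # skipinitialspace=True ignores blanks right after each delimiter,
    # so the header " Sub-Category" is read as "Sub-Category".
    reader = csv.DictReader(csv_file, skipinitialspace=True)
    for row in reader:
        # setdefault creates the inner dict the first time a category is seen.
        templates.setdefault(row["Category"], {})[row["Sub-Category"]] = row["Template Name"]
    return templates


sample = io.StringIO('''Category, Sub-Category, Template Name
Health Check,CPU,checkCPU
Health Check,Memory,checkMemory
Service Request,Reboot Device, rebootDevice
Service Request,Check CPU,checkCPU-SR''')

templates = load_templates(sample)
print(templates["Health Check"]["CPU"])
```

The loop runs once at load time; after that, every lookup is a plain dict access, just as with the JSON file.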