I have a CSV file in the following format:
a | b | c | d | e |
---|---|---|---|---|
1 | 2 | 3 | 4 | 5 |
9 | 8 | 7 | 6 | 5 |
A simple approach is to add more columns and then use the to_json method in pandas:
import pandas as pd
df = pd.read_csv('your_file.csv')
# Pack each group of columns into one dict per row.
df['Purchase'] = df[['b','c','d']].to_dict('records')
df['Sales'] = df[['d','e']].to_dict('records')
# Serialize only the key column and the two nested columns.
out = df[['a', 'Purchase', 'Sales']].to_json(orient='records', indent=4)
Output:
[
    {
        "a":1,
        "Purchase":{
            "b":2,
            "c":3,
            "d":4
        },
        "Sales":{
            "d":4,
            "e":5
        }
    },
    {
        "a":9,
        "Purchase":{
            "b":8,
            "c":7,
            "d":6
        },
        "Sales":{
            "d":6,
            "e":5
        }
    }
]
You don't need any third-party library; just specify the correct dialect. For example, for a tab-delimited file:
import csv
import json

with open("tmp4.csv", "r") as f:
    result = [
        {
            "a": row["a"],
            "Purchase": {
                "b": row["b"],
                "c": row["c"],
            },
            "Sales": {
                "d": row["d"],
                "e": row["e"],
            },
        }
        # excel-tab is the csv module's built-in tab-delimited dialect.
        for row in csv.DictReader(f, dialect='excel-tab')
    ]

# DictReader yields every value as a string, hence "1" rather than 1.
assert (
    json.dumps(result)
    == '[{"a": "1", "Purchase": {"b": "2", "c": "3"}, "Sales": {"d": "4", "e": "5"}}, {"a": "9", "Purchase": {"b": "8", "c": "7"}, "Sales": {"d": "6", "e": "5"}}]'
)
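If numeric JSON values are wanted, as in the pandas output, the strings returned by csv.DictReader can be converted before nesting. The following is only a minimal sketch: it assumes every column holds an integer, and the in-memory sample is a hypothetical stand-in for tmp4.csv.

```python
import csv
import io
import json

# Hypothetical tab-separated sample standing in for tmp4.csv.
sample = io.StringIO("a\tb\tc\td\te\n1\t2\t3\t4\t5\n9\t8\t7\t6\t5\n")

result = []
for row in csv.DictReader(sample, dialect="excel-tab"):
    # Assumption: every column holds an integer; int() raises ValueError otherwise.
    n = {key: int(value) for key, value in row.items()}
    result.append({
        "a": n["a"],
        "Purchase": {"b": n["b"], "c": n["c"]},
        "Sales": {"d": n["d"], "e": n["e"]},
    })

print(json.dumps(result))
```

If some columns may hold text or empty cells, the conversion would need to be applied per column rather than to the whole row.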
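When you execute r["purchase"] = {"b": ...}, you assign the dictionary back to the per-row object r, which is discarded at the end of each loop iteration. Instead, create a new dictionary for each record and append it to a list. Something like: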
import csv

result = []
with open("new_data.csv") as f:
    reader = csv.DictReader(f)
    for r in reader:
        # Build a fresh nested dict per row and keep it in the list.
        result.append({
            "a": r["a"],
            "Purchase" : {
                "b": r["b"],
                "c": r["c"],
                "d": r["d"],
            },
            "Sales": {
                "d": r["d"],
                "e": r["e"],
            },
        })
Or build result with a list comprehension:
import csv

with open("new_data.csv") as f:
    reader = csv.DictReader(f)
    result = [{
        "a": r["a"],
        "Purchase" : {
            "b": r["b"],
            "c": r["c"],
            "d": r["d"],
        },
        "Sales": {
            "d": r["d"],
            "e": r["e"],
        },
    } for r in reader]
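Either way, result is still a list of Python dictionaries, so one more step is needed to get JSON text. A minimal sketch of that step, assuming a comma-separated file; the in-memory sample here is a hypothetical stand-in for new_data.csv:

```python
import csv
import io
import json

# Hypothetical comma-separated sample standing in for new_data.csv.
sample = io.StringIO("a,b,c,d,e\n1,2,3,4,5\n9,8,7,6,5\n")

result = [
    {
        "a": r["a"],
        "Purchase": {"b": r["b"], "c": r["c"], "d": r["d"]},
        "Sales": {"d": r["d"], "e": r["e"]},
    }
    for r in csv.DictReader(sample)
]

# json.dumps returns the JSON text as a string; json.dump(result, file_obj, indent=4)
# would write the same text to an already opened file instead.
text = json.dumps(result, indent=4)
print(text)
```

The values stay strings here because csv.DictReader does not infer types.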