如何使用csv文件作为源和panda创建具有内部逻辑的嵌套json



解释:需要将csv文件转换为json文件。csv包含以下信息:给定一个帐户,该帐户必须支付1或n份文档,其中一份的到期日为当天,另一份的日期为之前。csv文件具有以下结构:

due_dateid_typeto_pay//tr>>987654123987654124<15022022>199,75<15032022>113,259875649998765477987444123<15032022>
账户 customer_nameid_document
2001 John Doe 05082022FC113,25
2002 Karl Steven 05082022 FC113,25
2003 Sarah MillerFC98756425
2004 Marty McflyFC98756426
2001 John Doe 06032022 FC150,00
2001 John Doe 15042022 FC15000
2003 Sarah Miller 05082022 FC113,25
2001 John DoeFC987654123113,25

您可以使用.groupby('account')单独处理每个帐户。

当您有一个帐户时,您可以将第一行([0](作为主文档,将其他行([1:](作为pending_documents

使用.to_dict(),您可以为account创建dict,稍后您可以将其附加到某个列表中以拥有所有帐户。


具有示例数据的最小工作代码。

我使用io在内存中创建文件,这样每个人都可以简单地复制和测试它,但您可以使用filename

我使用engine='python',所以我可以在分隔符sep='s+|s*'中使用regex,并删除值周围的所有空格。

data = '''account | customer_name | due_date | id_type | id_document | to_pay |
2001     | John Doe      | 05082022 | FC      | 987654123   | 113,25 |
2002     | Karl Steven   | 05082022 | FC      | 987654124   | 113,25 |
2003     | Sarah Miller  | 15022022 | FC      | 987654125   | 199,75 |
2004     | Marty Mcfly   | 15032022 | FC      | 987654126   | 113,25 |
2001     | John Doe      | 06032022 | FC      | 987654999   | 150,00 |
2001     | John Doe      | 15042022 | FC      | 987654777   | 15,00  |
2003     | Sarah Miller  | 05082022 | FC      | 987444123   | 113,25 |
2001     | John Doe      | 15032022 | FC      | 987654123   | 113,25 |'''
import pandas as pd
import io
#df = pd.read_csv('filename.csv')
df = pd.read_csv(io.StringIO(data), sep='s+|s*', engine='python')
#print(df)
df = df.drop('Unnamed: 6', axis=1)  # remove column after last `|`
#print(df)
all_accounts = []
for selected, group in df.groupby(['account']):
#print('---', selected, '---')
account = group.iloc[0].to_dict()
docs = group.iloc[1:][['id_type', 'id_document', 'to_pay', 'due_date']]
account["pending_documents"] = docs.to_dict(orient='records')
#print(account)
all_accounts.append(account)
import json
print(json.dumps(all_accounts, indent=2))

结果:

[
{
"account": 2001,
"customer_name": "John Doe",
"due_date": 5082022,
"id_type": "FC",
"id_document": 987654123,
"to_pay": "113,25",
"pending_documents": [
{
"id_type": "FC",
"id_document": 987654999,
"to_pay": "150,00",
"due_date": 6032022
},
{
"id_type": "FC",
"id_document": 987654777,
"to_pay": "15,00",
"due_date": 15042022
},
{
"id_type": "FC",
"id_document": 987654123,
"to_pay": "113,25",
"due_date": 15032022
}
]
},
{
"account": 2002,
"customer_name": "Karl Steven",
"due_date": 5082022,
"id_type": "FC",
"id_document": 987654124,
"to_pay": "113,25",
"pending_documents": []
},
{
"account": 2003,
"customer_name": "Sarah Miller",
"due_date": 15022022,
"id_type": "FC",
"id_document": 987654125,
"to_pay": "199,75",
"pending_documents": [
{
"id_type": "FC",
"id_document": 987444123,
"to_pay": "113,25",
"due_date": 5082022
}
]
},
{
"account": 2004,
"customer_name": "Marty Mcfly",
"due_date": 15032022,
"id_type": "FC",
"id_document": 987654126,
"to_pay": "113,25",
"pending_documents": []
}
]

最新更新