我想询问如何合并2个csv文件,以便生成查询。请注意,我不被允许使用";熊猫;图书馆
作为一个例子,我有这2 csv:
数据.csv:
cod_pers, cod_enti, fec_venc
2317422,208,04/12/2022
2392115,210,04/02/2022
2086638,211,31/03/2022
2086638,212,03/13/2022
enti.csv:
cod_enti,cod_market
208,40
209,50
210,16
211,40
212,50
我想要的是能够通过cod_enti
加入他们,从而能够评估他们在过去15天内作为cod_mercado = 40
的cod_pers
的数量。
为此,我理解我可以生成如下csv文件的读取:
import csv
import numpy as np
from time import strftime
from datetime import datetime, date, time, timedelta
from dateutil.relativedelta import relativedelta
#Read the CSV file
str2date = lambda x: datetime.strptime(x, '%d/%m/%Y')
data_datos = np.genfromtxt('datos.csv', delimiter=',', dtype=None, names=True, converters={'fec_venc':str2date}, encoding="UTF-8")
data_enti = np.genfromtxt('enti.csv', delimiter=',', dtype=None, names=True, encoding="UTF-8")
稍后可以使用类似于以下的方法按天搜索:
#definition of days
today = datetime.now()
yesterday = today - timedelta(days=15)
# Generate array of dates
values_on_date =[]
calc_date = data_datos['fec_venc']
for date_obt in calc_date:
if (yesterday <= date_obt):
values_on_date.append(date_obt)
tot_doc_mor_15d = len(values_on_date)
print(tot_doc_mor_15d)
但为此,我必须首先连接2个csv文件才能生成查询。
我期待你的评论和任何形式的帮助,非常感谢。谢谢!!:D
修改:
我在代码中添加了以下行:
from numpy.lib import recfunctions
merged_array = recfunctions.join_by('cod_enti', data_datos, data_enti)
它工作正常,但当我想输入带有更多数据的csv时,它会给我以下错误:
TypeError: invalid type promotion with structured datatype(s).
您只需使用CSV模块就可以用非常直接的方法实现这一点。
我创建了data.csv中每一行到其code_enti
值的映射。然后,对于enti.csv中具有匹配code_enti的每一行,我更新映射中的行:
import csv
import pprint
# Create a mapping of a data row to its cod_enti, e.g.:
# {208: {cod_pers:2317422, cod_enti:208, fec_venc:04/12/2022}, ...}
cod_enti_row_map = {}
with open("data.csv", newline="") as f:
reader = csv.DictReader(f, skipinitialspace=True) # because your header row has leading spaces
for row in reader:
cod_enti = row["cod_enti"]
cod_enti_row_map[cod_enti] = row
print(f"Map before join")
pprint.pprint(cod_enti_row_map, width=100, sort_dicts=False)
# Now, update each row in the map with cod_market for the key, cod_enti
with open("enti.csv", newline="") as f:
reader = csv.DictReader(f)
for row in reader:
cod_enti = row["cod_enti"]
# skip cod_enti in enti.csv that is not in data.csv, like 209
if cod_enti not in cod_enti_row_map:
continue
cod_enti_row_map[cod_enti].update(row)
print(f"Map after join")
pprint.pprint(cod_enti_row_map, width=100, sort_dicts=False)
以下是我运行时得到的:
Map before join
{'208': {'cod_pers': '2317422', 'cod_enti': '208', 'fec_venc': '04/12/2022'},
'210': {'cod_pers': '2392115', 'cod_enti': '210', 'fec_venc': '04/02/2022'},
'211': {'cod_pers': '2086638', 'cod_enti': '211', 'fec_venc': '31/03/2022'},
'212': {'cod_pers': '2086638', 'cod_enti': '212', 'fec_venc': '03/13/2022'}}
Map after join
{'208': {'cod_pers': '2317422', 'cod_enti': '208', 'fec_venc': '04/12/2022', 'cod_market': '40'},
'210': {'cod_pers': '2392115', 'cod_enti': '210', 'fec_venc': '04/02/2022', 'cod_market': '16'},
'211': {'cod_pers': '2086638', 'cod_enti': '211', 'fec_venc': '31/03/2022', 'cod_market': '40'},
'212': {'cod_pers': '2086638', 'cod_enti': '212', 'fec_venc': '03/13/2022', 'cod_market': '50'}}
从那里,您可以将行提取到一个普通列表中,并按键值或其他方式进行所有筛选。