Need help extracting the year from a list of dictionaries



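I can extract some data with the code below, but I would like the extracted data to include the year as well as the month. Sample data is shown below. I am looking for a breakdown based on the date in the data.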

people = [
{"name": "Tom", "age": 10, "city": "NewYork", "Date": '01/01/2021'},
{"name": "Mark", "age": 5, "country": "Japan", "Date": '05/01/2021'},
{"name": "Pam", "age": 7, "city": "London", "Date": '03/06/2021'},
{"name": "Tom", "hight": 163, "city": "California", "Date": '04/06/2021'},
{"name": "Lena", "weight": 45, "country": "Italy", "Date": '12/12/2020'},
{"name": "Ben", "age": 17, "city": "Colombo", "Date": '11/12/2020'},
{"name": "Lena", "gender": "Female", "country": "Italy", "Date": '8/01/2020'},
{"name": "Ben", "gender": "Male", "city": "Colombo", "Date": '7/01/2020'},
{"name": "Tom", "age": 10, "country": "Italy", "Date": '01/01/2021'},
{"name": "Mark", "age": 5, "country": "Japan", "Date": '05/01/2021'},
{"name": "Tom", "age": 7, "city": "London", "Date": '03/06/2021'},
{"name": "Tom", "hight": 163, "country": "Japan", "Date": '04/06/2021'}
]

def groupby( fld ):
    vals = { fld: 0 }
    for row in people:
        if fld in row:
            vals[fld] += 1
            if row[fld] not in vals:
                vals[row[fld]] = 1
            else:
                vals[row[fld]] += 1
    return vals

months = ('Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec')

def groupbydate( fld ):
    vals = {}
    for row in people:
        if fld in row and 'Date' in row:
            month = months[int(row['Date'].lstrip('0').split('/')[0])-1]
            if row[fld] not in vals:
                vals[row[fld]] = {}
            if month not in vals[row[fld]]:
                vals[row[fld]][month] = 1
            else:
                vals[row[fld]][month] += 1
    return vals

print( groupby( 'name' ) )
print( groupby( 'city' ) )
print( groupby( 'country' ) )
print( )
print( groupbydate( 'city' ) )

Current output

{'name': 12, 'Tom': 5, 'Mark': 2, 'Pam': 1, 'Lena': 2, 'Ben': 2}
{'city': 6, 'NewYork': 1, 'London': 2, 'California': 1, 'Colombo': 2}
{'country': 6, 'Japan': 3, 'Italy': 3}
{'NewYork': {'Jan': 1}, 'London': {'Mar': 2}, 'California': {'Apr': 1}, 'Colombo': {'Nov': 1, 'Jul': 1}}

Expected output

{'NewYork': {'Jan 21': 1}, 'London': {'Mar 21': 2}, 'California': {'Apr 21': 1}, 'Colombo': {'Nov 20': 1, 'Jul 20': 1}}

Try:

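import pandas as pd

df = pd.DataFrame(people)
# Parse the MM/DD/YYYY strings, then format as abbreviated month + two-digit year
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y').dt.strftime('%b %y')
# Rows are the 'Mon YY' labels, columns are the cities, cells are the counts
out = pd.crosstab(df['Date'], df['city']).rename_axis(columns=None)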

Finally:

d1=df['name'].value_counts().to_dict()
d1['name']=len(df)
d2=df['city'].value_counts().to_dict()
d2['city']=df['city'].value_counts().sum()
d3=df['country'].value_counts().to_dict()
d3['country']=df['country'].value_counts().sum()
d4=out.apply(lambda x:dict(x[x.ne(0)])).to_dict()

Output:

print(d1)
#output
{'Tom': 5, 'Lena': 2, 'Ben': 2, 'Mark': 2, 'Pam': 1, 'name': 12}
print(d2)
#output
{'London': 2, 'Colombo': 2, 'NewYork': 1, 'California': 1, 'city': 6}
print(d3)
#output
{'Japan': 3, 'Italy': 3, 'country': 6}
print(d4)
#output
{'California': {'Apr 21': 1},
'Colombo': {'Jul 20': 1, 'Nov 20': 1},
'London': {'Mar 21': 2},
'NewYork': {'Jan 21': 1}}

You can convert those dates to datetime objects and then format them however you need.

from datetime import datetime
for record in people:
    record["Date"] = datetime.strptime(record["Date"], "%m/%d/%Y")

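Then you can read it back as record["Date"].strftime("%b %y"), which gives you the abbreviated month plus two-digit year format (for example, Jan 21).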
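
As a quick check of the two format strings, here is a minimal sketch that round-trips a single date string taken from the sample data. The variable names raw, parsed, and label are only for illustration, and the month abbreviation assumes the default C/English locale.

```python
from datetime import datetime

raw = "11/12/2020"  # MM/DD/YYYY, as in the sample data
parsed = datetime.strptime(raw, "%m/%d/%Y")  # string -> datetime
label = parsed.strftime("%b %y")  # datetime -> abbreviated month + 2-digit year
print(label)  # Nov 20 (under the default locale)
```

Note that strptime also accepts months without a leading zero, such as "7/01/2020", so the sample data needs no cleanup first.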

Optional: one observation is that what you are really doing is a group-by, so you could use pandas for it.

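import pandas as pd
from datetime import datetime

# Assumes the "Date" values are still the original MM/DD/YYYY strings
df = pd.DataFrame(people)

def groupby_date(fld):
    # Count rows for every (fld value, date string) pair, in order of first appearance
    counts = df.groupby([fld, "Date"], sort=False).size()
    result = dict()
    for (value, date), count in counts.items():
        label = datetime.strptime(date, "%m/%d/%Y").strftime("%b %y")
        # Accumulate per value instead of overwriting earlier months
        bucket = result.setdefault(value, {})
        bucket[label] = bucket.get(label, 0) + int(count)
    return result

The output of groupby_date('city') is:

{'NewYork': {'Jan 21': 1},
'London': {'Mar 21': 2},
'California': {'Apr 21': 1},
'Colombo': {'Nov 20': 1, 'Jul 20': 1}}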

Convert the date string to a datetime object and then format it as needed.

Have a look at this code; it is very similar to the code you posted.

from datetime import datetime

def groupbydate( fld ):
    vals = {}
    for row in people:
        if fld in row and 'Date' in row:
            datestring = row['Date']
            # Parse MM/DD/YYYY, then build a label such as 'Jan 21'
            dt = datetime.strptime(datestring, '%m/%d/%Y')
            month = dt.strftime("%b %y")
            if row[fld] not in vals:
                vals[row[fld]] = {}
            if month not in vals[row[fld]]:
                vals[row[fld]][month] = 1
            else:
                vals[row[fld]][month] += 1
    return vals
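
The same counting can be written more compactly with the standard library's collections module. The following is only a sketch of that alternative, not code from the original post: the function name group_by_month_year, its date_key and fmt parameters, and the shortened sample list are all made up for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime

def group_by_month_year(records, fld, date_key="Date", fmt="%m/%d/%Y"):
    """Count records per value of `fld`, broken down by 'Mon YY' labels."""
    counts = defaultdict(Counter)
    for row in records:
        # Skip rows that lack either the grouping field or the date
        if fld in row and date_key in row:
            label = datetime.strptime(row[date_key], fmt).strftime("%b %y")
            counts[row[fld]][label] += 1
    # Convert to plain dicts so the result prints like the expected output
    return {value: dict(by_month) for value, by_month in counts.items()}

# Hypothetical, shortened version of the sample data
sample = [
    {"name": "Tom", "city": "NewYork", "Date": "01/01/2021"},
    {"name": "Pam", "city": "London", "Date": "03/06/2021"},
    {"name": "Tom", "city": "London", "Date": "03/06/2021"},
    {"name": "Ben", "city": "Colombo", "Date": "11/12/2020"},
    {"name": "Ben", "city": "Colombo", "Date": "7/01/2020"},
    {"name": "Mark", "country": "Japan", "Date": "05/01/2021"},
]
result = group_by_month_year(sample, "city")
print(result)
```

Passing the records in as an argument, rather than reading a global people list, keeps the function reusable; defaultdict(Counter) removes the two "not in" checks from the loop.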
