我们如何像下面提到的那样分割数据?
样本数据:
EmployeeId City join_month Year
0 001 Mumbai 1 2018
1 001 Bangalore 3 2018
2 002 Pune 2 2019
3 002 Mumbai 6 2017
4 003 Delhi 9 2018
5 003 Mumbai 12 2019
6 004 Bangalore 11 2017
7 004 Pune 10 2018
8 005 Mumbai 5 2017
所需输出应为-
EmployeeId City join_month Year 2018_jan_count 2018_feb_count 2018_march_count
0 001 Mumbai 1 2018
1 001 Bangalore 3 2018
2 002 Pune 2 2019
3 002 Mumbai 6 2017
4 003 Delhi 9 2018
5 003 Mumbai 12 2019
6 004 Bangalore 11 2017
7 004 Pune 10 2018
8 005 Mumbai 5 2017
您可以使用df.apply
df.apply(pd.value_counts)
这将应用一个基于列的聚合函数(在本例中为date)
构建year - month值,然后将其用作数据透视表上的一列。我计算按城市和年月聚合的员工id
months=[(1,'Jan'),(2,'Feb'),(3,'Mar'),(4,'Apr'),(5,'May'),(6,'Jun'),(7,'Jul'),(8,'Aug'),(9,'Sept'),(10,'Oct'),(11,'Nov'),(12,'Dec')]
employeeId=['001','001','002','002','003','003','004','004','005']
city=['Mumbai', 'Bangalore','Pune','Mumbai','Delhi','Mumbai','Bangalore','Pune','Mumbai']
join_month=[1,3,2,6,9,12,11,10,1]
char_month=[b for item in join_month for a,b in months if item==a ]
year=[2018, 2018,2019,2017,2018,2017,2017,2018,2018]
char_yearmonth=[]
[char_yearmonth.append(str(year[i])+"_"+char_month[i]) for i in range(len(year))]
df=pd.DataFrame({'EmployeeId': employeeId,'City':city,'YearMonth':char_yearmonth})
fp=df.pivot_table(index=['City'], columns=['YearMonth'],aggfunc='count').fillna(0)
print(fp)
EmployeeId
YearMonth 2017_Dec 2017_Jun 2017_Nov 2018_Jan 2018_Mar 2018_Oct 2018_Sept
City
Bangalore 0.0 0.0 1.0 0.0 1.0 0.0 0.0
Delhi 0.0 0.0 0.0 0.0 0.0 0.0 1.0
Mumbai 1.0 1.0 0.0 2.0 0.0 0.0 0.0
Pune 0.0 0.0 0.0 0.0 0.0 1.0 0.0
YearMonth 2019_Feb
City
Bangalore 0.0
Delhi 0.0
Mumbai 0.0
Pune 1.0