Given a DataFrame:
import pandas as pd
df = pd.DataFrame({'c':[0,1,1,2,2,2],'date':pd.to_datetime(['2016-01-01','2016-02-01','2016-03-01','2016-04-01','2016-05-01','2016-06-05'])})
How can I tag the latest month as M1, the second-latest month as M2, and so on?
The desired output for this example looks like this:
df = pd.DataFrame({'c':[0,1,1,2,2,2],'date':pd.to_datetime(['2016-01-01','2016-02-01','2016-03-01','2016-04-01','2016-05-01','2016-06-05']),
'tag':['M6', 'M5', 'M4', 'M3', 'M2', 'M1']})
+----+-----+------------+-----+
|    |   c | date       | tag |
+----+-----+------------+-----+
|  0 |   0 | 2016-01-01 | M6  |
|  1 |   1 | 2016-02-01 | M5  |
|  2 |   1 | 2016-03-01 | M4  |
|  3 |   2 | 2016-04-01 | M3  |
|  4 |   2 | 2016-05-01 | M2  |
|  5 |   2 | 2016-06-05 | M1  |
+----+-----+------------+-----+
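If you want a robust approach, you can convert the dates to monthly periods (using to_period), then rank them in descending order with a dense rank, and convert the result to a string: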
month = pd.to_datetime(df['date']).dt.to_period('M')
df['tag'] = 'M'+month.rank(method='dense', ascending=False).astype(int).astype(str)
Output:
c date tag
0 0 2016-01-01 M6
1 1 2016-02-01 M5
2 1 2016-03-01 M4
3 2 2016-04-01 M3
4 2 2016-05-01 M2
5 2 2016-06-05 M1
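Note that a dense rank numbers only the months that actually occur in the data, so rows in the same month share a tag and months with no rows are skipped. The sketch below illustrates this on a small made-up frame (df2 and its dates are hypothetical, not from the question). It also shows one possible alternative, which is not part of the answer above, that counts calendar months back from the latest month, in case gaps should be reflected in the tag.

```python
import pandas as pd

# Hypothetical data: two rows in June and no rows for April or May.
df2 = pd.DataFrame({
    'date': pd.to_datetime(['2016-02-10', '2016-03-01', '2016-06-05', '2016-06-20'])
})

month = df2['date'].dt.to_period('M')

# Dense rank: rows in the same month share a tag, and missing months are skipped.
df2['tag_rank'] = 'M' + month.rank(method='dense', ascending=False).astype(int).astype(str)

# Alternative (an assumption about what may be wanted): calendar offset,
# counting whole months back from the latest month in the data.
month_index = df2['date'].dt.year * 12 + df2['date'].dt.month
df2['tag_offset'] = 'M' + (month_index.max() - month_index + 1).astype(str)

print(df2)
```

Here tag_rank comes out as M3, M2, M1, M1, while tag_offset comes out as M5, M4, M1, M1. Which one is appropriate depends on whether "second-latest month" means the second-latest month present in the data or the previous calendar month.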