How to label previous months with a sequence number



Given a DataFrame:

import pandas as pd

df = pd.DataFrame({'c':[0,1,1,2,2,2],'date':pd.to_datetime(['2016-01-01','2016-02-01','2016-03-01','2016-04-01','2016-05-01','2016-06-05'])})

How can I label the latest month as M1, the second-latest month as M2, and so on?

So the expected input and output look like this:

df = pd.DataFrame({'c':[0,1,1,2,2,2],'date':pd.to_datetime(['2016-01-01','2016-02-01','2016-03-01','2016-04-01','2016-05-01','2016-06-05']), 
'tag':['M6', 'M5', 'M4', 'M3', 'M2', 'M1']})

+----+---+------------+-----+
|    | c | date       | tag |
+----+---+------------+-----+
| 0  | 0 | 2016-01-01 | M6  |
| 1  | 1 | 2016-02-01 | M5  |
| 2  | 1 | 2016-03-01 | M4  |
| 3  | 2 | 2016-04-01 | M3  |
| 4  | 2 | 2016-05-01 | M2  |
| 5  | 2 | 2016-06-05 | M1  |
+----+---+------------+-----+

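If you want a robust approach, convert each date to a monthly period (with to_period('M')), then rank the periods in descending order and convert the rank to a string. Because the ranking works on calendar months rather than exact dates, rows from the same month get the same tag, and method='dense' keeps the numbering free of gaps: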

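month = pd.to_datetime(df['date']).dt.to_period('M')  # e.g. 2016-06-05 -> 2016-06
# dense rank, latest month first: equal months share a rank, no gaps
df['tag'] = 'M' + month.rank(method='dense', ascending=False).astype(int).astype(str)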

Output:

   c       date tag
0  0 2016-01-01  M6
1  1 2016-02-01  M5
2  1 2016-03-01  M4
3  2 2016-04-01  M3
4  2 2016-05-01  M2
5  2 2016-06-05  M1
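The sample above has exactly one row per month, so it does not show why the dense rank matters. The sketch below wraps the same to_period/rank idea in a small helper (tag_months is a hypothetical name, not a pandas function) and runs it on made-up, unsorted data where two rows fall in June and two in March. It assumes the date column has no missing values, since astype(int) fails on the NaN rank a missing date would produce.

```python
import pandas as pd

def tag_months(dates: pd.Series) -> pd.Series:
    """Label the latest calendar month 'M1', the next-latest 'M2', and so on.

    Hypothetical helper; assumes `dates` contains no missing values.
    """
    # Collapse each timestamp to its calendar month, so 2016-06-05 and
    # 2016-06-20 compare equal
    month = pd.to_datetime(dates).dt.to_period('M')
    # Dense rank, latest first: equal months share a rank and ranks have no gaps
    rank = month.rank(method='dense', ascending=False).astype(int)
    return 'M' + rank.astype(str)

# Made-up data: rows are unsorted, June and March each appear twice
df = pd.DataFrame({'date': pd.to_datetime(['2016-06-20', '2016-03-01', '2016-06-05',
                                           '2016-04-15', '2016-03-31'])})
df['tag'] = tag_months(df['date'])
print(df['tag'].tolist())  # ['M1', 'M3', 'M1', 'M2', 'M3']
```

With the default method='average' instead, the two June rows would get rank 1.5 and the two March rows 4.5, which astype(int) truncates, so the same data would be tagged M1, M4, M1, M3, M4 and the numbering would skip M2.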
