I want to count the number of cases per ID in fixed 2-month periods (for example, Jan-Feb, Mar-Apr, May-Jun, Jul-Aug, and so on). For example, given
import pandas as pd
d1 = pd.DataFrame({'ID': ["A", "A", "A", "B", "B", "C", "C", "C", "C", "D", "D", "D"],
"date": ["2010-12-30", "2010-02-27", "2010-02-26", "2012-01-01", "2012-01-03",
"2011-01-01", "2011-01-02", "2011-01-08", "2014-02-21", "2010-08-31", "2010-08-30", "2010-09-01"]})
I would like to produce the following result:
ID date count
0 A 2010-01_02 2
1 A 2010-11_12 1
2 B 2012-01_02 2
3 C 2011-01_02 3
4 C 2014-01_02 1
5 D 2010-07_08 2
6 D 2010-09_10 1
Do you have any good ideas? Counting the cases per month is easy, but this one is hard for me. Thanks in advance!
Use Grouper with a frequency of 2 months:
d1['date'] = pd.to_datetime(d1['date'])
# 'M' is the month-end alias; pandas 2.2 and later spell it 'ME', so use freq='2ME' there
df = (d1.groupby(['ID', pd.Grouper(freq='2M', key='date')])
.size()
.reset_index(name='count'))
m = df['date'].dt.month
df['date'] = (df['date'].dt.year.astype(str) + '-' +
m.sub(1).astype(str).str.zfill(2) + '_' +
m.astype(str).str.zfill(2))
print (df)
ID date count
0 A 2010-01_02 2
1 A 2010-11_12 1
2 B 2012-01_02 2
3 C 2011-01_02 3
4 C 2014-01_02 1
5 D 2010-07_08 2
6 D 2010-09_10 1
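Because Grouper works dynamically, the bin edges depend on the first datetime in the data rather than on the calendar, so with other data the 2-month groups can be shifted (for example Feb-Mar instead of Jan-Feb). To get fixed groups, map each month to its group instead: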
d1['date'] = pd.to_datetime(d1['date'])
N = 3  # months per group; for correct groups use a value that divides 12 evenly, e.g. 2, 3, 4, 6
df1 = pd.DataFrame({'month':range(1, 13)})
df1.index = df1.index // N
df1['group'] = (df1['month'].astype(str).str.zfill(2)
.groupby(level=0)
.transform(lambda x: x.iat[0] + '_' + x.iat[-1]))
d = df1.set_index('month')['group'].to_dict()
print (d)
{1: '01_03', 2: '01_03', 3: '01_03', 4: '04_06',
5: '04_06', 6: '04_06', 7: '07_09', 8: '07_09',
9: '07_09', 10: '10_12', 11: '10_12', 12: '10_12'}
df = d1.groupby(['ID',
d1['date'].dt.strftime('%Y-').rename('Y'),
d1['date'].dt.month.map(d)]).size().reset_index(name="count")
df['date'] = df.pop('Y') + df['date']
print (df)
ID date count
0 A 2010-01_03 2
1 A 2010-10_12 1
2 B 2012-01_03 2
3 C 2011-01_03 3
4 C 2014-01_03 1
5 D 2010-07_09 3
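As a possible alternative to the mapping dictionary, the group label can also be computed arithmetically from the month number. This is only a sketch, not part of the answer above: the names `start`, `end`, `period`, and `result` are introduced here for illustration, and it assumes `N` divides 12 evenly. With `N = 2` it should give the fixed Jan-Feb, Mar-Apr, ... periods asked for in the question, independent of which dates happen to be in the data.

```python
import pandas as pd

d1 = pd.DataFrame({
    "ID": ["A", "A", "A", "B", "B", "C", "C", "C", "C", "D", "D", "D"],
    "date": ["2010-12-30", "2010-02-27", "2010-02-26", "2012-01-01", "2012-01-03",
             "2011-01-01", "2011-01-02", "2011-01-08", "2014-02-21",
             "2010-08-31", "2010-08-30", "2010-09-01"],
})
d1["date"] = pd.to_datetime(d1["date"])

N = 2  # months per period; assumed to divide 12 evenly

month = d1["date"].dt.month
start = (month - 1) // N * N + 1  # first month of the period: 1, 3, 5, ... for N = 2
end = start + N - 1               # last month of the period: 2, 4, 6, ... for N = 2

# Build labels such as '2010-01_02' from the year and the period's month range
period = (d1["date"].dt.year.astype(str) + "-"
          + start.astype(str).str.zfill(2) + "_"
          + end.astype(str).str.zfill(2))

# Group by ID and the computed label, then count the rows in each group
result = (d1.groupby(["ID", period.rename("date")])
            .size()
            .reset_index(name="count"))
print(result)
```

Because the label is derived only from each row's own year and month, the periods never cross a year boundary and do not move when rows are added or removed.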
def solve(intervals):
if not intervals:
return 0
intervals.sort(key=lambda x: (x[0], -x[1]))
end_mx = float("-inf")
ans = 0
for start, end in intervals:
if end <= end_mx:
ans += 1
end_mx = max(end_mx, end)
return ans
intervals = [[2, 6],[3, 4],[4, 7],[5, 5]]
print(solve(intervals))