How can I count daily cases in fixed 2-month intervals?



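I want to count the number of daily cases that fall within fixed 2-month intervals (for example, Jan-Feb, Mar-Apr, May-Jun, Jul-Aug, and so on). For example, given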

import pandas as pd
d1 = pd.DataFrame({'ID': ["A", "A", "A", "B", "B", "C", "C", "C", "C", "D", "D", "D"],
                   "date": ["2010-12-30", "2010-02-27", "2010-02-26", "2012-01-01", "2012-01-03",
                            "2011-01-01", "2011-01-02", "2011-01-08", "2014-02-21", "2010-08-31", "2010-08-30", "2010-09-01"]})

I would like to produce the following result:

  ID        date  count
0  A  2010-01_02      2
1  A  2010-11_12      1
2  B  2012-01_02      2
3  C  2011-01_02      3
4  C  2014-01_02      1
5  D  2010-07_08      2
6  D  2010-09_10      1

Do you have any good ideas? Counting cases per month is easy, but this problem is difficult for me. Thanks in advance!

Use Grouper with a 2-month frequency:

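# Convert the strings to datetimes so they can be binned by frequency
d1['date'] = pd.to_datetime(d1['date'])
# Count rows per ID and per 2-month bin; each bin is labelled by its closing month-end
# (recent pandas versions may expect the alias '2ME' instead of '2M')
df = (d1.groupby(['ID', pd.Grouper(freq='2M', key='date')])
        .size()
        .reset_index(name='count'))
# Build a 'YYYY-MM_MM' label from the closing month of each bin
m = df['date'].dt.month
df['date'] = (df['date'].dt.year.astype(str) + '-' +
              m.sub(1).astype(str).str.zfill(2) + '_' +
              m.astype(str).str.zfill(2))
print(df)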
  ID        date  count
0  A  2010-01_02      2
1  A  2010-11_12      1
2  B  2012-01_02      2
3  C  2011-01_02      3
4  C  2014-01_02      1
5  D  2010-07_08      2
6  D  2010-09_10      1

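Because Grouper works dynamically (it uses the first datetime to decide where the groups start), a fixed mapping from months to groups can be used instead: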

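d1['date'] = pd.to_datetime(d1['date'])
# Months per group; use 2, 3, 4 or 6 so the groups divide the year evenly
# (N = 3 here; N = 2 gives the 2-month groups from the question)
N = 3
# One row per calendar month; integer division of the index puts N consecutive months in one group
df1 = pd.DataFrame({'month': range(1, 13)})
df1.index = df1.index // N
# Label each group as 'first_last' month, zero-padded
df1['group'] = (df1['month'].astype(str).str.zfill(2)
                            .groupby(level=0)
                            .transform(lambda x: x.iat[0] + '_' + x.iat[-1]))
# Dictionary mapping month number -> group label
d = df1.set_index('month')['group'].to_dict()
print(d)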
{1: '01_03', 2: '01_03', 3: '01_03', 4: '04_06',
5: '04_06', 6: '04_06', 7: '07_09', 8: '07_09',
9: '07_09', 10: '10_12', 11: '10_12', 12: '10_12'}
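# Group by ID, year prefix and mapped month group, then count rows
df = d1.groupby(['ID',
                 d1['date'].dt.strftime('%Y-').rename('Y'),
                 d1['date'].dt.month.map(d)]).size().reset_index(name="count")
# Join the year prefix and the month-group label into a single 'date' column
df['date'] = df.pop('Y') + df['date']
print(df)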
  ID        date  count
0  A  2010-01_03      2
1  A  2010-10_12      1
2  B  2012-01_03      2
3  C  2011-01_03      3
4  C  2014-01_03      1
5  D  2010-07_09      3
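
For the 2-month groups asked about in the question, the same fixed mapping could also be computed arithmetically rather than through a lookup dictionary. The following is a minimal sketch and not part of the original answer: the helper name `bimonthly_label` is hypothetical, and it assumes that groups always start on an odd month (Jan-Feb, Mar-Apr, and so on).

```python
import pandas as pd

d1 = pd.DataFrame({
    "ID": ["A", "A", "A", "B", "B", "C", "C", "C", "C", "D", "D", "D"],
    "date": ["2010-12-30", "2010-02-27", "2010-02-26", "2012-01-01", "2012-01-03",
             "2011-01-01", "2011-01-02", "2011-01-08", "2014-02-21",
             "2010-08-31", "2010-08-30", "2010-09-01"],
})
d1["date"] = pd.to_datetime(d1["date"])


def bimonthly_label(dates: pd.Series) -> pd.Series:
    """Hypothetical helper: label each date as 'YYYY-MM_MM' for fixed 2-month groups."""
    # First month of the group: 1, 3, 5, 7, 9 or 11 (assumes groups start on odd months)
    start = (dates.dt.month - 1) // 2 * 2 + 1
    return (dates.dt.year.astype(str) + "-"
            + start.astype(str).str.zfill(2) + "_"
            + (start + 1).astype(str).str.zfill(2))


# Count rows per ID and per fixed 2-month label
result = (d1.groupby(["ID", bimonthly_label(d1["date"]).rename("date")])
            .size()
            .reset_index(name="count"))
print(result)
```

Because the label depends only on the calendar month of each row, the groups do not shift with the earliest date in the data. On the sample data this should reproduce the result requested in the question.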
