Tracking claims using date/timestamp columns and creating final counts with pandas



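I have a problem where I need to track the progression of patients' insurance claim statuses based on the dates of those statuses. I also need to create counts of each status based on certain conditions.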

Columns: ClaimID, New, Accepted, Denied, Pending, Expired, Group

Timestamps per claim, in column order (empty cells omitted):

001: 2021-01-01T09:58:35:335Z, 2021-01-01T10:05:43:000Z
002: 2021-01-01T06:30:30:000Z, 2021-03-01T04:11:45:000Z, 2021-03-01T04:11:53:000Z
003: 2021-02-14T14:23:54:154Z, 2021-02-15T11:11:56:000Z, 2021-02-15T11:15:00:000Z
004: 2021-02-14T15:36:05:335Z, 2021-02-14T17:15:30:000Z
005: 2021-02-14T15:56:59:009Z, 2021-03-01T10:05:43:000Z

First, convert the date columns to datetime:

for i in ['New', 'Accepted', 'Denied', 'Pending', 'Expired']:
    df[i] = pd.to_datetime(df[i], format="%Y-%m-%dT%H:%M:%S:%f%z")
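The explicit format is needed because the sample timestamps put a colon, not a dot, before the milliseconds. A small check of that format string on one sample value plus a missing value; the `raw` and `parsed` names are introduced here for illustration only:

```python
import pandas as pd

# One timestamp copied from the sample data, plus a missing value.
raw = pd.Series(["2021-01-01T09:58:35:335Z", None])

# %f consumes the "335" after the last colon; %z consumes the trailing "Z".
parsed = pd.to_datetime(raw, format="%Y-%m-%dT%H:%M:%S:%f%z")

print(parsed)
```

Missing entries come back as NaT, which is what the later `pd.notnull` checks rely on.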

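Then build the applicable date range from your column conditions. In this logic, if Denied exists the range is New -> Denied; otherwise, if Accepted exists, the range is New -> Accepted; if neither exists, the range is New -> now. The code is (alter as per rules):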

df['new_range'] = df[['New', 'Accepted', 'Denied']].apply(
    lambda x: pd.date_range(x['New'], x['Denied']).date.tolist()
    if pd.notnull(x['Denied'])
    else pd.date_range(x['New'], x['Accepted']).date.tolist()
    if pd.notnull(x['Accepted'])
    # "now" takes the timezone of New so aware and naive columns both work
    else pd.date_range(x['New'], pd.Timestamp.now(tz=x['New'].tz)).date.tolist(),
    axis=1)
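The same rule may be easier to adjust when written as a named function. This is a minimal sketch, assuming the three columns are already datetimes; `claim_range`, the `now` parameter and the `demo` frame are hypothetical names and data introduced here, not part of the original answer:

```python
import pandas as pd

def claim_range(row, now):
    """Days a claim counts as New: up to Denied, else Accepted, else `now`."""
    if pd.notnull(row["Denied"]):
        end = row["Denied"]
    elif pd.notnull(row["Accepted"]):
        end = row["Accepted"]
    else:
        end = now
    return pd.date_range(row["New"], end).date.tolist()

# Hypothetical toy data: one accepted, one denied, one still open.
demo = pd.DataFrame({
    "New": pd.to_datetime(["2021-01-01", "2021-01-01", "2021-01-03"]),
    "Accepted": pd.to_datetime(["2021-01-02", None, None]),
    "Denied": pd.to_datetime([None, "2021-01-03", None]),
})

# A fixed "now" keeps the result reproducible; real code would use the current time.
now = pd.Timestamp("2021-01-04")
demo["new_range"] = demo.apply(claim_range, axis=1, now=now)
print(demo["new_range"])
```

Passing `now` in explicitly also makes the rule testable, since the open-ended case no longer depends on the day the code runs.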

You should then be able to filter on a group and see the date ranges in the df, like this:

df[df['Group']=='A']['new_range']
0                                         [2021-01-01]
1    [2021-01-01, 2021-01-02, 2021-01-03, 2021-01-0...
2                                         [2021-02-14]
3                                         [2021-02-14]
4    [2021-02-14, 2021-02-15, 2021-02-16, 2021-02-1..

Then you can explode the date ranges and group by date to get the New count for each day, with code like this:

new = pd.to_datetime(df[df['Group']=='A']['new_range'].explode()).reset_index()

newc = new.groupby('new_range').count()
newc
new_range
2021-01-01    2
2021-01-02    1
2021-01-03    1
2021-01-04    1
2021-01-05    1
2021-01-06    1...

Similarly, get the counts for Accepted and Denied, then left join on the date to arrive at the final table, filling NaN with 0.
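The combining step could look roughly like the sketch below. It uses hypothetical toy data (`toy`) and `value_counts` in place of the `groupby(...).count()` shown earlier; `new_counts`, `acc_counts` and `daily` are names introduced here, and only the Accepted status is joined to keep it short:

```python
import pandas as pd
from datetime import date

# Hypothetical toy data: new_range holds per-claim lists of days,
# Accepted holds the acceptance timestamp (NaT when not accepted).
toy = pd.DataFrame({
    "new_range": [[date(2021, 1, 1), date(2021, 1, 2)], [date(2021, 1, 2)]],
    "Accepted": pd.to_datetime(["2021-01-02", None]),
})

# One row per (claim, day), then count claims per day.
new_counts = toy["new_range"].explode().value_counts().rename("New")

# Count acceptances per calendar day, ignoring claims never accepted.
acc_counts = toy["Accepted"].dropna().dt.date.value_counts().rename("Accepted")

# Left join on the date index and fill days with no acceptances with 0.
daily = (new_counts.to_frame()
         .join(acc_counts, how="left")
         .fillna(0)
         .astype(int)
         .sort_index())
print(daily)
```

Denied, Pending and Expired counts would be joined the same way, one column per status.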

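By defining rules for the date ranges, then exploding the ranges and grouping to get the counts, you should be able to avoid many expensive operations.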

I think this is what you want, or it can easily be modified to fit your needs:

import pandas as pd
import numpy as np
from datetime import timedelta
from datetime import date

def dateRange(d1, d2):
    # Every day from d1 up to, but not including, d2
    return [d1 + timedelta(days=x) for x in range((d2 - d1).days)]

def addCount(dic, group, dat, cat):
    # Increment dic[group][dat][cat], creating missing levels on the way
    if group not in dic:
        dic[group] = {}
    if dat not in dic[group]:
        dic[group][dat] = {}
    if cat not in dic[group][dat]:
        dic[group][dat][cat] = 0
    dic[group][dat][cat] += 1

df = pd.read_csv("testdf.csv",
                 parse_dates=["New", "Accepted", "Denied", "Pending", "Expired"])
cdic = {}
for i, row in df.iterrows():
    cid = row["ClaimID"]
    dnew = row["New"].date()
    dacc = row["Accepted"].date()
    dden = row["Denied"].date()
    dpen = row["Pending"].date()
    dexp = row["Expired"].date()
    group = row["Group"]

    if not pd.isna(dacc):  # Claim has been accepted
        if dnew == dacc:
            dacc += timedelta(days=1)
        nend = dacc
        addCount(cdic, group, dacc, "acc")
    if not pd.isna(dden):  # Claim has been denied
        if dnew == dden:
            dden += timedelta(days=1)
        if pd.isna(dacc):
            nend = dden
        addCount(cdic, group, dden, "den")
    if not pd.isna(dpen):
        addCount(cdic, group, dpen, "pen")  # Claim is pending
    if not pd.isna(dexp):
        addCount(cdic, group, dexp, "exp")  # Claim is expired
    if pd.isna(dacc) and pd.isna(dden):
        nend = date.today() + timedelta(days=1)
    for d in dateRange(dnew, nend):  # Fill new status until first change
        addCount(cdic, group, d, "new")

# Flatten the nested dict into one row per (group, date)
ndfl = []
for group in cdic:
    for dat in sorted(cdic[group].keys()):
        r = cdic[group][dat]
        ndfl.append([group, dat, r.get("new", 0), r.get("acc", 0),
                     r.get("den", 0), r.get("pen", 0), r.get("exp", 0)])
ndf = pd.DataFrame(ndfl, columns=["Group", "Date", "New", "Accepted", "Denied", "Pending", "Expired"])
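One detail in the loop is that `dateRange` excludes its end date. That appears to be why a same-day acceptance or denial is pushed forward by one day: otherwise the claim would never be counted as new. A small standalone check, with the helper re-declared so the snippet runs by itself (`same_day` and `bumped` are names introduced here):

```python
from datetime import date, timedelta

def dateRange(d1, d2):
    # Half-open range: includes d1, excludes d2.
    return [d1 + timedelta(days=x) for x in range((d2 - d1).days)]

# Claim opened and resolved on the same day: no days in the range.
same_day = dateRange(date(2021, 1, 1), date(2021, 1, 1))

# After bumping the end by one day, the opening day is counted once.
bumped = dateRange(date(2021, 1, 1), date(2021, 1, 1) + timedelta(days=1))

print(same_day)  # []
print(bumped)    # [datetime.date(2021, 1, 1)]
```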
