Group by min and fill NAs from another column, part 2



This is the original question: Group by min and fill NAs from another column

I have this DataFrame:

import numpy as np
import pandas as pd

mydf = pd.DataFrame(data={
    'uid': [1, 1, 1, 2, 2, 3, 4, 4, 4, 4],
    'pagename': ['home', 'blah', 'blah', 'home', 'blah', 'blah', 'blah', 'home', 'blah', 'blah'],
    'startpage': [np.nan, np.nan, np.nan, 'home', 'home', 'blah', np.nan, np.nan, np.nan, np.nan],
    'date_time': [0, 1, 2, 5, 9, 1, 1, 2, 3, 4],
    'page_event': [0, 0, 0, 0, 0, 0, 10, 0, 0, 10],
})

I want to get this DataFrame:

endingdf = pd.DataFrame(data={
    'uid': [1, 1, 1, 2, 2, 3, 4, 4, 4, 4],
    'pagename': ['home', 'blah', 'blah', 'home', 'blah', 'blah', 'blah', 'home', 'blah', 'blah'],
    'startpage': [np.nan, np.nan, np.nan, 'home', 'home', 'blah', np.nan, np.nan, np.nan, np.nan],
    'date_time': [0, 1, 2, 5, 9, 1, 1, 2, 3, 4],
    'page_event': [0, 0, 0, 0, 0, 0, 10, 0, 0, 10],
    'new_start_page': ['home', 'home', 'home', 'home', 'home', 'blah', 'home', 'home', 'home', 'home'],
})

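What I want to do is group by uid and, where startpage is null, use the first pagename visited (the one with the minimum date_time), but only from rows where page_event = 0. So if the first pagename has page_event = 10, skip ahead to the first row with page_event = 0.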

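# Assumes rows are already sorted by date_time within each uid.
e = mydf.page_event
p = mydf.pagename
s = mydf.startpage
u = mydf.uid
# Hide rows with page_event == 10, then find each uid's first remaining row label.
m = e.mask(e == 10).groupby(u).apply(pd.Series.first_valid_index)
# Map uid -> row label -> pagename, and assign back rather than filling a column in place.
mydf['startpage'] = s.fillna(u.map(m).map(p))
print(mydf)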
   date_time  page_event pagename startpage  uid
0          0           0     home      home    1
1          1           0     blah      home    1
2          2           0     blah      home    1
3          5           0     home      home    2
4          9           0     blah      home    2
5          1           0     blah      blah    3
6          1          10     blah      home    4
7          2           0     home      home    4
8          3           0     blah      home    4
9          4          10     blah      home    4
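
The answer above overwrites startpage and relies on the rows already being in date_time order. As a minimal sketch of an alternative (not the answerer's method), the version below sorts explicitly and writes the result to a separate new_start_page column, as in the desired frame from the question. The name first_page is a hypothetical helper introduced only for this example.

```python
import numpy as np
import pandas as pd

mydf = pd.DataFrame({
    'uid': [1, 1, 1, 2, 2, 3, 4, 4, 4, 4],
    'pagename': ['home', 'blah', 'blah', 'home', 'blah',
                 'blah', 'blah', 'home', 'blah', 'blah'],
    'startpage': [np.nan, np.nan, np.nan, 'home', 'home',
                  'blah', np.nan, np.nan, np.nan, np.nan],
    'date_time': [0, 1, 2, 5, 9, 1, 1, 2, 3, 4],
    'page_event': [0, 0, 0, 0, 0, 0, 10, 0, 0, 10],
})

# first_page (hypothetical name): for each uid, the pagename of the earliest
# row (by date_time) among the rows where page_event == 0.
first_page = (
    mydf[mydf['page_event'] == 0]
    .sort_values(['uid', 'date_time'])
    .groupby('uid')['pagename']
    .first()
)

# Keep existing startpage values; fill only the missing ones from the lookup.
mydf['new_start_page'] = mydf['startpage'].fillna(mydf['uid'].map(first_page))

print(mydf['new_start_page'].tolist())
```

For the sample data this gives the same new_start_page values as the desired frame in the question. A uid with no page_event == 0 rows would be absent from first_page, so its missing values would stay NaN.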
