Below is my DataFrame.
import pandas as pd

df = pd.DataFrame({'vin':['aaa','aaa','aaa','bbb','bbb','bbb','ddd','eee','eee','fff'],
'c1':[35,'NA','NA','NA',4,'NA','NA','NA',56,406],
'c2':[35,'NA','NA','NA',43,'NA','NA','NA',67,77],
'c3':[36,'NA','NA','NA',104,'NA','NA','NA',44,306]})
I want to forward-fill the integer values in columns "c1", "c2" and "c3" with respect to column "vin".
Below is the expected output:
df = pd.DataFrame({'vin':['aaa','aaa','aaa','bbb','bbb','bbb','ddd','eee','eee','fff'],
'c1':[35,35,35,4,4,4,'NA',56,56,406],
'c2':[35,35,35,43,43,43,'NA',67,67,77],
'c3':[36,36,36,104,104,104,'NA',44,44,306]})
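Note: the row where "vin" is "ddd" should stay "NA", because there is no integer value available for "ddd".
I tried ffill() and groupby with ffill(), but did not get the expected output. Below are the lines of code I tried.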
#(1)
df['c1'] = df.groupby(['vin'])['c1'].ffill()
#(2)
df[["c1", "c2", "c3"]] = df[["c1", "c2", "c3"]].fillna(df.groupby(['vin'])[["c1", "c2", "c3"]].ffill())
#(3)
df["c1"] = df.groupby('vin')['c1'].transform(lambda x: x.ffill())
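If each group has only one non-missing value, replace the 'NA' strings with NaN, sort so that value comes first within its group, and then use ffill: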
import numpy as np

c = ["c1", "c2", "c3"]
df[c] = df.replace('NA',np.nan).sort_values(['vin', 'c1']).groupby('vin')[c].ffill()
print (df)
vin c1 c2 c3
0 aaa 35.0 35.0 36.0
1 aaa 35.0 35.0 36.0
2 aaa 35.0 35.0 36.0
3 bbb 4.0 43.0 104.0
4 bbb 4.0 43.0 104.0
5 bbb 4.0 43.0 104.0
6 ddd NaN NaN NaN
7 eee 56.0 67.0 44.0
8 eee 56.0 67.0 44.0
9 fff 406.0 77.0 306.0
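To see why the attempts in the question leave the data unchanged and why the sort matters, here is a minimal sketch on a shortened copy of the data. It assumes the 'NA' entries are plain strings, as in the question, and it uses pd.to_numeric with errors='coerce' for the conversion instead of replace; that substitution is my choice for the illustration, not part of the approach above.

```python
import pandas as pd

df = pd.DataFrame({
    'vin': ['aaa', 'aaa', 'aaa', 'bbb', 'bbb', 'bbb'],
    'c1': [35, 'NA', 'NA', 'NA', 4, 'NA'],
})

# 'NA' is an ordinary string here, so pandas sees nothing to fill.
n_missing_before = df['c1'].isna().sum()

# Turn the strings into real NaN values.
s = pd.to_numeric(df['c1'], errors='coerce')

# ffill only propagates forward: the leading NaN of 'bbb' (row 3) survives.
unsorted_fill = s.groupby(df['vin']).ffill()

# Sorting puts NaN last inside each group, so ffill reaches every row.
tmp = df.assign(c1=s).sort_values(['vin', 'c1'])
sorted_fill = tmp.groupby('vin')['c1'].ffill()

print(n_missing_before)
print(unsorted_fill)
print(sorted_fill.sort_index())
```

Assigning sorted_fill back to df['c1'] aligns on the index, so the original row order is kept even though the fill was computed on the sorted frame.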
Or use GroupBy.transform with 'first' to broadcast the first non-missing value of each group to every row of that group:
c = ["c1","c2", "c3"]
df[c] = df.replace('NA',np.nan).groupby('vin')[c].transform('first')
print (df)
vin c1 c2 c3
0 aaa 35.0 35.0 36.0
1 aaa 35.0 35.0 36.0
2 aaa 35.0 35.0 36.0
3 bbb 4.0 43.0 104.0
4 bbb 4.0 43.0 104.0
5 bbb 4.0 43.0 104.0
6 ddd NaN NaN NaN
7 eee 56.0 67.0 44.0
8 eee 56.0 67.0 44.0
9 fff 406.0 77.0 306.0
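The outputs above show floats (35.0) because NaN forces a float column. If integer values matter, one possible variation is to convert to pandas' nullable Int64 dtype before grouping. This is a sketch under that assumption, not something the answer above does; the names nums and out are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'vin': ['aaa', 'aaa', 'aaa', 'bbb', 'bbb', 'bbb', 'ddd', 'eee', 'eee', 'fff'],
                   'c1': [35, 'NA', 'NA', 'NA', 4, 'NA', 'NA', 'NA', 56, 406],
                   'c2': [35, 'NA', 'NA', 'NA', 43, 'NA', 'NA', 'NA', 67, 77],
                   'c3': [36, 'NA', 'NA', 'NA', 104, 'NA', 'NA', 'NA', 44, 306]})

c = ["c1", "c2", "c3"]

# Coerce the 'NA' strings to missing values, then switch to the nullable
# integer dtype so the remaining values can stay integers.
nums = df[c].apply(pd.to_numeric, errors='coerce').astype('Int64')

# Broadcast each group's first non-missing value to all rows of the group.
out = df.copy()
out[c] = nums.groupby(df['vin'])[c].transform('first')

print(out)
```

Note that transform('first') overwrites every row of a group with that group's first non-missing value, so it matches ffill only when each group holds at most one distinct value per column, as in this data.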