I have the following table, in which some (but not all) user IDs are missing:
user ID | item ID | item type |
---|---|---|
10 | 223 | question |
NaN | 126 | answer |
14 | 129 | question |
Try numpy.select:
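import numpy as np
# Rows with a missing user ID, split by item type
conditions = [df["user ID"].isnull() & df["item type"].eq("question"),
              df["user ID"].isnull() & df["item type"].eq("answer")]
# Look up the user ID by item ID in the table matching the item type
choices = [df["item ID"].map(dict(zip(question["item ID"], question["user ID"]))),
           df["item ID"].map(dict(zip(answer["item ID"], answer["user ID"])))]
# Rows matching neither condition keep their existing user ID
df["user ID"] = np.select(conditions, choices, df["user ID"])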
>>> df
user ID item ID item type
0 10.0 123 question
1 10.0 126 answer
2 14.0 129 question
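For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same numpy.select approach. The contents of the question and answer lookup tables are assumptions made up for illustration (the post does not show them), and the names question_map and answer_map are hypothetical helpers.

```python
import numpy as np
import pandas as pd

# Sample data; the two lookup tables below are assumed, not from the post.
df = pd.DataFrame({
    "user ID": [10, np.nan, 14],
    "item ID": [123, 126, 129],
    "item type": ["question", "answer", "question"],
})
question = pd.DataFrame({"item ID": [123, 129], "user ID": [10, 14]})
answer = pd.DataFrame({"item ID": [126], "user ID": [10]})

# One item ID -> user ID lookup per item type (hypothetical helper names).
question_map = dict(zip(question["item ID"], question["user ID"]))
answer_map = dict(zip(answer["item ID"], answer["user ID"]))

missing = df["user ID"].isnull()
conditions = [
    missing & df["item type"].eq("question"),
    missing & df["item type"].eq("answer"),
]
choices = [
    df["item ID"].map(question_map),
    df["item ID"].map(answer_map),
]

# Rows that match neither condition keep their existing user ID.
df["user ID"] = np.select(conditions, choices, default=df["user ID"])
print(df)
```

Building each dictionary once keeps the lookups readable; if an item ID is absent from its lookup table, map returns NaN and the user ID stays missing.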
You can use np.where() and merge to get the desired data:
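import numpy as np
import pandas as pd
# Use 0 as a placeholder for missing user IDs so the column can be cast to int
df['user ID'] = df['user ID'].fillna(0).astype(int)
df_final = pd.merge(left=df, right=answer_df, on='item ID', how='outer', suffixes=('', '_right'))
# Where the placeholder is present, take the user ID from answer_df instead
df_final['user ID'] = np.where(df_final['user ID'] == 0, df_final['user ID_right'], df_final['user ID']).astype(int)
df_final[['user ID', 'item ID', 'item type']]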
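As a rough, runnable variation on the merge approach, the sketch below swaps in two changes that are my own assumptions rather than part of the answer: it tests for missing values with isna() instead of a 0 placeholder, and it uses how="left" so that only rows already in df are kept. The contents of answer_df are invented for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "user ID": [10, np.nan, 14],
    "item ID": [123, 126, 129],
    "item type": ["question", "answer", "question"],
})
# Hypothetical lookup table; the post does not show its contents.
answer_df = pd.DataFrame({"item ID": [126], "user ID": [10]})

# A left merge keeps every row of df and adds the looked-up user ID
# in a separate "user ID_right" column.
merged = df.merge(answer_df, on="item ID", how="left", suffixes=("", "_right"))

# Take the looked-up value only where the original user ID is missing.
merged["user ID"] = np.where(
    merged["user ID"].isna(), merged["user ID_right"], merged["user ID"]
)

# The integer cast is safe here only because no user ID is left missing.
result = merged[["user ID", "item ID", "item type"]].astype({"user ID": int})
print(result)
```

If some user IDs could remain missing after the lookup, the final integer cast would raise an error, so in that case it may be safer to skip the cast or use the nullable "Int64" dtype.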