Computing across many columns with similar suffixes



The sample dataframe looks like this:

import pandas as pd

d = {'id': [1, 2], 'year': [1999, 2020],
     'cat_up': [0, 15], 'cat_down': [2, 60],
     'dog_up': [1, 11], 'dog_down': [2, 12],
     'fish_up': [12, 11], 'fish_down': [22, 12]}

df = pd.DataFrame(data=d)

   id  year  cat_up  cat_down  dog_up  dog_down  fish_up  fish_down
0   1  1999       0         2       1         2       12         22
1   2  2020      15        60      11        12       11         12

The real dataframe has roughly 60 more columns, all with an _up or _down suffix.

I want to add new _res columns by dividing each _up column by its _down counterpart, for example:

df['cat_res'] = df['cat_up']/df['cat_down']
df['dog_res'] = df['dog_up']/df['dog_down']
...

I tried a for loop, but the one below is not correct:

up_cols = [col for col in df.columns if '_up' in col]
down_cols  = [col for col in df.columns if '_down' in col]

for up, down in enumerate(up_cols, down_cols):
    df[column + '_res'] =  up_cols/ down_cols

Can someone tell me how to iterate over the columns and compute this?

If you want to be careful with your columns, because your dataframe may contain an _up column that has no matching _down column, you can use sets.

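up_cols = [col[:-3] for col in df.columns if col.endswith("_up")]
down_cols = [col[:-5] for col in df.columns if col.endswith("_down")]
# Intersect the two sets of prefixes so an X_up without a matching X_down is skipped
for col in set(up_cols) & set(down_cols):
    df[col + "_res"] = df[col + "_up"] / df[col + "_down"]

# Alternative: if every _up column has a matching _down column in the same order,
# pair the full column names with zip instead
up_full = [col for col in df.columns if col.endswith("_up")]
down_full = [col for col in df.columns if col.endswith("_down")]
for up, down in zip(up_full, down_full):
    df[up[:-3] + "_res"] = df[up] / df[down]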
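The same prefix-matching idea can also be written without an explicit loop over the pairs. The following is only a sketch, not the answerer's code: the helper name `add_ratio_columns` and its parameters are made up for illustration, and it assumes that every column ending in _up or _down is numeric. It strips the suffixes so that the two groups share labels, keeps only the prefixes present in both groups, and divides the two blocks in one step.

```python
import pandas as pd


def add_ratio_columns(df, up="_up", down="_down", res="_res"):
    """Hypothetical helper: return a copy of df with one <prefix>_res
    column for every prefix that has both an _up and a _down column."""
    # Select each group of columns and strip the suffix so labels line up.
    ups = df[[c for c in df.columns if c.endswith(up)]]
    downs = df[[c for c in df.columns if c.endswith(down)]]
    ups = ups.rename(columns=lambda c: c[: -len(up)])
    downs = downs.rename(columns=lambda c: c[: -len(down)])
    # Keep only prefixes found in both groups, in original column order.
    common = [c for c in ups.columns if c in downs.columns]
    # Element-wise division of the two aligned blocks, then re-label.
    ratios = (ups[common] / downs[common]).add_suffix(res)
    return pd.concat([df, ratios], axis=1)


d = {'id': [1, 2], 'year': [1999, 2020],
     'cat_up': [0, 15], 'cat_down': [2, 60],
     'dog_up': [1, 11], 'dog_down': [2, 12],
     'fish_up': [12, 11], 'fish_down': [22, 12]}

df = add_ratio_columns(pd.DataFrame(data=d))
print(df[["cat_res", "dog_res", "fish_res"]])
```

Unlike iterating over a set, this keeps the new columns in the order of the original _up columns. As with the loop version, a zero in a _down column does not raise an error; pandas returns inf (or NaN for 0/0) in that cell, so you may want to check for those values afterwards.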
