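I have two separate dataframes and I want to do the same thing to both of them. I want to pair up the columns based on the first substring before the underscore (a, b, y, z). Then, if a row in the first column of a pair contains a word but the corresponding row in the totals column is null, I want to update the total to zero. I want to update the dataframes and then output them separately.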
import pandas as pd
import numpy as np
d1 = pd.DataFrame(data={'a': ['yes', 'no', 'maybe', 'sometimes', np.nan],
                        'a_total': [5, 12, 4, np.nan, 0],
                        'b': ['blue', 'orange', 'pink', np.nan, np.nan],
                        'b_total': [12, 6, 0, 0, np.nan]})
d2 = pd.DataFrame(data={'y': ['frog', 'snail', 'snake', 'spider', 'pig'],
                        'y_total': [182, 32, 13, np.nan, 8],
                        'z': ['car', 'bike', 'walk', np.nan, np.nan],
                        'z_total': [12, 6, np.nan, np.nan, np.nan]})
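I then want to apply the update to both dataframes and output the updated versions separately. My current code, copied below, does not produce the correct output. I'm trying to output a collection of dataframes, but if I could somehow just output the two dataframes (d1 and d2), that would be fine too.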
out = {}
for i, df in enumerate([d1, d2]):
    key_id = [*df.loc[:,~df.columns.str.endswith('total')].columns]
    totals = [*df.loc[:,df.columns.str.endswith('total')].columns]
    for col in key_id:
        pairs = df.loc[:, df.columns.str.startswith(col)]
        pairs[col+'_total'].loc[(pairs[col].notnull()) & (pairs[col+'_total'].isnull())] = 0
        out[i] = pd.concat([pairs], axis=1)
Thanks for looking.
I'm not sure I fully understand what you need in the output, but this might work?
import pandas as pd
import numpy as np
d1 = pd.DataFrame(data={'a': ['yes', 'no', 'maybe', 'sometimes', np.nan],
                        'a_total': [5, 12, 4, np.nan, 0],
                        'b': ['blue', 'orange', 'pink', np.nan, np.nan],
                        'b_total': [12, 6, 0, 0, np.nan]})
d2 = pd.DataFrame(data={'y': ['frog', 'snail', 'snake', 'spider', 'pig'],
                        'y_total': [182, 32, 13, np.nan, 8],
                        'z': ['car', 'bike', 'walk', np.nan, np.nan],
                        'z_total': [12, 6, np.nan, np.nan, np.nan]})
# show d1 before making changes
print(d1)

# make the changes directly to d1 and d2 (in place)
for df in [d1, d2]:
    # key columns are the ones that do not end in 'total'
    cols = [c for c in df.columns if not c.endswith('total')]
    for col in cols:
        tot_col = col + '_total'
        # key has a value but its total is null -> set the total to 0
        df.loc[df[col].notnull() & df[tot_col].isnull(), tot_col] = 0

# show d1 after making changes
print(d1)
d1 before the changes:
           a  a_total       b  b_total
0        yes      5.0    blue     12.0
1         no     12.0  orange      6.0
2      maybe      4.0    pink      0.0
3  sometimes      NaN     NaN      0.0
4        NaN      0.0     NaN      NaN
d1 after the changes:
           a  a_total       b  b_total
0        yes      5.0    blue     12.0
1         no     12.0  orange      6.0
2      maybe      4.0    pink      0.0
3  sometimes      0.0     NaN      0.0
4        NaN      0.0     NaN      NaN
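
If you would rather keep d1 and d2 untouched and collect the updated versions in a dict, as the question attempts with out, here is a minimal sketch. The function name fill_missing_totals, its suffix parameter, and the dict keys "d1" and "d2" are my own names for illustration, not anything from pandas. As far as I can tell, the loop in the question goes wrong for two reasons: pairs is a separate object sliced out of df, and the chained pairs[...].loc[...] = 0 assignment is not guaranteed to write through (pandas typically warns about this); and out[i] is overwritten on every pass, so only the last column pair survives. Assigning through a single .loc call on a copy avoids both.

```python
import numpy as np
import pandas as pd


def fill_missing_totals(df: pd.DataFrame, suffix: str = "_total") -> pd.DataFrame:
    """Return a copy of df in which a null total becomes 0 wherever
    the paired key column has a value. (Hypothetical helper name.)"""
    result = df.copy()  # leave the caller's dataframe unchanged
    key_cols = [c for c in result.columns if not c.endswith(suffix)]
    for col in key_cols:
        tot_col = col + suffix
        if tot_col not in result.columns:
            continue  # key column without a paired totals column
        # rows where the key is present but the total is missing
        mask = result[col].notna() & result[tot_col].isna()
        result.loc[mask, tot_col] = 0  # one .loc call, no chained indexing
    return result


d1 = pd.DataFrame(data={'a': ['yes', 'no', 'maybe', 'sometimes', np.nan],
                        'a_total': [5, 12, 4, np.nan, 0],
                        'b': ['blue', 'orange', 'pink', np.nan, np.nan],
                        'b_total': [12, 6, 0, 0, np.nan]})
d2 = pd.DataFrame(data={'y': ['frog', 'snail', 'snake', 'spider', 'pig'],
                        'y_total': [182, 32, 13, np.nan, 8],
                        'z': ['car', 'bike', 'walk', np.nan, np.nan],
                        'z_total': [12, 6, np.nan, np.nan, np.nan]})

# dict of updated copies, keyed by a label of your choosing
out = {name: fill_missing_totals(df) for name, df in {"d1": d1, "d2": d2}.items()}

print(out["d1"])
print(out["d2"])
```

Because the helper works on a copy, d1 and d2 keep their original NaN values and the updated frames live only in out. If you want the in-place behaviour of the loop above instead, drop the df.copy() call.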