I have two DataFrames, df_criterias and df_tofill.
df_criterias
   goto_emptycol1  goto_emptycol2  data1  data2
0     some value1  another value1      a   val1
1     some value2  another value2      b   val2
2     some value3  another value3      c   val3
3     some value4  another value4      d   val4
4     some value5  another value5      e   val5
5     some value6  another value6      f   val6
6     some value7  another value7      g   val7
df_tofill
   emptycol1  emptycol2  data1  data2
0                            f   val6
1                          nok    nok
2                          nok    nok
3                            a   val1
4                          nok    nok
5                            g   val7
6                            d   val4
expected_results
     emptycol1       emptycol2  data1  data2
0  some value6  another value6      f   val6
1                                 nok    nok
2                                 nok    nok
3  some value1  another value1      a   val1
4                                 nok    nok
5  some value7  another value7      g   val7
6  some value4  another value4      d   val4
From these two DataFrames, I created 2 lists of indexes, one per DataFrame, holding the rows where the criteria columns "data1" and "data2" match in both dfs:
list_fill = [0,3,5,6] #from df_tofill
list_crt = [5,0,6,3] #from df_criterias
Here list_crt[0] (element 5) is matched with list_fill[0] (element 0), and so on.
To get expected_results I am trying this:
for i, icrt in enumerate(list_crt):
    # Get the value
    val1 = df_criterias.loc[icrt, "goto_emptycol1"]
    val2 = df_criterias.loc[icrt, "goto_emptycol2"]
    # Set the value
    df_tofill.loc[list_fill[i], "emptycol1"] = val1
    df_tofill.loc[list_fill[i], "emptycol2"] = val2
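I am struggling to get the "expected_results" df. Is the algorithm correct?
Update: managed to get it working - .at gave me some strange errors, so I replaced it with .loc. A .reset_index() was also needed before creating the lists of indexes.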
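For reference, the loop can also be written as a single assignment. This is only a minimal sketch, assuming the toy data from the tables above and that both frames have a default 0..n-1 index (hence the reset_index call), so that the positions in list_fill and list_crt are also valid labels for .loc:

```python
import pandas as pd

# Toy data modelled on the tables in the question.
df_criterias = pd.DataFrame({
    "goto_emptycol1": [f"some value{i}" for i in range(1, 8)],
    "goto_emptycol2": [f"another value{i}" for i in range(1, 8)],
    "data1": list("abcdefg"),
    "data2": [f"val{i}" for i in range(1, 8)],
})
df_tofill = pd.DataFrame({
    "emptycol1": [""] * 7,
    "emptycol2": [""] * 7,
    "data1": ["f", "nok", "nok", "a", "nok", "g", "d"],
    "data2": ["val6", "nok", "nok", "val1", "nok", "val7", "val4"],
})

# The lists hold positions, so make labels equal positions first.
df_criterias = df_criterias.reset_index(drop=True)
df_tofill = df_tofill.reset_index(drop=True)

list_fill = [0, 3, 5, 6]  # rows of df_tofill
list_crt = [5, 0, 6, 3]   # matching rows of df_criterias

# .to_numpy() drops the source labels, so rows are paired by their
# order in the two lists rather than by index alignment.
df_tofill.loc[list_fill, ["emptycol1", "emptycol2"]] = (
    df_criterias.loc[list_crt, ["goto_emptycol1", "goto_emptycol2"]].to_numpy()
)
print(df_tofill)
```

Without .to_numpy(), pandas would try to align the right-hand side on its own row and column labels, which do not line up with the target here.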
The lists of indexes are created with the following function:
def common_elements(crtlist, radlist):
    # crtlist holds all criteria, radlist all entries to be checked
    # returns 2 lists with the indexes where elements were a match
    crtli_idx = []
    radli_idx = []
    for idx1, crt in enumerate(crtlist):
        for idx2, rad in enumerate(radlist):
            if rad.startswith(crt):
                crtli_idx.append(idx1)
                radli_idx.append(idx2)
    return crtli_idx, radli_idx

crtlist = ['1', '21', '444']
radlist = ['asda', 'aererv', '1vrvssq', '4447676767']
idxcrt, idxrad = common_elements(crtlist, radlist)
print(idxcrt, idxrad)
OUT:
[0, 2] [2, 3]
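To connect this helper to the DataFrames, here is a rough sketch of how the two index lists could be derived from the key columns. It assumes that matching on "data1" alone is enough (the question matches on both "data1" and "data2"), and the names crit_keys and fill_keys are made up for the illustration:

```python
import pandas as pd

def common_elements(crtlist, radlist):
    """Return positions (crt, rad) where a rad entry starts with a crt entry."""
    crtli_idx, radli_idx = [], []
    for idx1, crt in enumerate(crtlist):
        for idx2, rad in enumerate(radlist):
            if rad.startswith(crt):
                crtli_idx.append(idx1)
                radli_idx.append(idx2)
    return crtli_idx, radli_idx

# Key columns only, copied from the example tables (hypothetical names).
crit_keys = pd.Series(list("abcdefg"), name="data1")
fill_keys = pd.Series(["f", "nok", "nok", "a", "nok", "g", "d"], name="data1")

# The first returned list indexes the criteria, the second the rows to fill.
list_crt, list_fill = common_elements(crit_keys.tolist(), fill_keys.tolist())
print(list_crt, list_fill)
```

The pairs come out ordered by criteria row rather than by the row to fill, so the lists look different from the ones in the question, but they describe the same four matches.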
One way is to align the indexes and columns, replace '' with np.nan in the target dataframe, and then assign one dataframe to the other via .loc.
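import numpy as np

# Give both frames the same column names and the same (data1, data2) index.
df_criterias = (df_criterias
                .rename(columns={'goto_emptycol1': 'emptycol1',
                                 'goto_emptycol2': 'emptycol2'})
                .set_index(['data1', 'data2']))
df_tofill = (df_tofill
             .replace('', np.nan)
             .set_index(['data1', 'data2']))
# Assign the matching rows; they are lined up on the (data1, data2) index.
df_tofill.loc[:] = df_criterias.loc[df_criterias.index.isin(df_tofill.index)]
df_tofill = df_tofill.reset_index()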
#    data1  data2   emptycol1      emptycol2
# 0      f   val6  somevalue6  anothervalue6
# 1    nok    nok         NaN            NaN
# 2    nok    nok         NaN            NaN
# 3      a   val1  somevalue1  anothervalue1
# 4    nok    nok         NaN            NaN
# 5      g   val7  somevalue7  anothervalue7
# 6      d   val4  somevalue4  anothervalue4
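A different technique that should give the same table is a left merge on the key columns, which avoids building index lists at all. This is only a sketch, not the method used above: it assumes the toy data from the question and that each (data1, data2) pair occurs at most once in df_criterias. The names lookup and result are made up for the illustration.

```python
import pandas as pd

# Toy data modelled on the tables in the question.
df_criterias = pd.DataFrame({
    "goto_emptycol1": [f"some value{i}" for i in range(1, 8)],
    "goto_emptycol2": [f"another value{i}" for i in range(1, 8)],
    "data1": list("abcdefg"),
    "data2": [f"val{i}" for i in range(1, 8)],
})
df_tofill = pd.DataFrame({
    "emptycol1": [""] * 7,
    "emptycol2": [""] * 7,
    "data1": ["f", "nok", "nok", "a", "nok", "g", "d"],
    "data2": ["val6", "nok", "nok", "val1", "nok", "val7", "val4"],
})

# Rename the source columns so they land in the right place after the merge.
lookup = df_criterias.rename(columns={"goto_emptycol1": "emptycol1",
                                      "goto_emptycol2": "emptycol2"})

# Drop the empty placeholders, then pull the values in by key.
# validate="m:1" raises if a key pair is duplicated in lookup.
result = (
    df_tofill.drop(columns=["emptycol1", "emptycol2"])
    .merge(lookup, on=["data1", "data2"], how="left", validate="m:1")
)

# Restore the original column order.
result = result[["emptycol1", "emptycol2", "data1", "data2"]]
print(result)
```

Rows without a match (the "nok" rows) end up with NaN in emptycol1 and emptycol2; a .fillna('') afterwards would restore empty strings if that is preferred.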