How to get data from one df and place it in another df at cell level - pandas



I have 2 dataframes, df_criterias and df_tofill.

df_criterias

     goto_emptycol1     goto_emptycol2    data1     data2
0    some value1        another value1    a         val1
1    some value2        another value2    b         val2
2    some value3        another value3    c         val3
3    some value4        another value4    d         val4
4    some value5        another value5    e         val5
5    some value6        another value6    f         val6
6    some value7        another value7    g         val7

df_tofill

     emptycol1          emptycol2         data1     data2
0                                         f         val6
1                                         nok       nok
2                                         nok       nok
3                                         a         val1
4                                         nok       nok
5                                         g         val7
6                                         d         val4

expected_results

     emptycol1          emptycol2         data1     data2
0    some value6        another value6    f         val6
1                                         nok       nok
2                                         nok       nok
3    some value1        another value1    a         val1
4                                         nok       nok
5    some value7        another value7    g         val7
6    some value4        another value4    d         val4

From these two dataframes I created 2 lists of row indexes, one per dataframe, for the rows where the columns "data1" and "data2" match in both dfs:

list_fill = [0,3,5,6] #from df_tofill
list_crt = [5,0,6,3] #from df_criterias

Here list_crt[0] (element 5) matches list_fill[0] (element 0), and so on for each position.
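The two index lists above could also be derived directly from the dataframes. The following is a minimal sketch, assuming an exact match on "data1" and "data2" is what defines a pair (the question itself uses a prefix match further down); the helper column names `index_fill` and `index_crt` are only illustrative.

```python
import pandas as pd

# Only the key columns are needed to find the matching row pairs.
df_criterias = pd.DataFrame({
    "data1": list("abcdefg"),
    "data2": [f"val{i}" for i in range(1, 8)],
})
df_tofill = pd.DataFrame({
    "data1": ["f", "nok", "nok", "a", "nok", "g", "d"],
    "data2": ["val6", "nok", "nok", "val1", "nok", "val7", "val4"],
})

# reset_index() exposes the row labels as an "index" column, so an inner
# merge on the key columns yields one row per (df_tofill, df_criterias) match.
pairs = df_tofill.reset_index().merge(
    df_criterias.reset_index(),
    on=["data1", "data2"],
    how="inner",
    suffixes=("_fill", "_crt"),
)

list_fill = pairs["index_fill"].tolist()  # row labels in df_tofill
list_crt = pairs["index_crt"].tolist()    # matching row labels in df_criterias
```

Each position in `list_fill` corresponds to the same position in `list_crt`, which is the pairing the loop in the question relies on.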

To get expected_results, I am trying this:

for i, icrt in enumerate(list_crt):
    # Get the values from the matching row of df_criterias
    val1 = df_criterias.loc[icrt, "goto_emptycol1"]
    val2 = df_criterias.loc[icrt, "goto_emptycol2"]
    # Set the values on the paired row of df_tofill
    df_tofill.loc[list_fill[i], "emptycol1"] = val1
    df_tofill.loc[list_fill[i], "emptycol2"] = val2

I am struggling to get the "expected_results" df. Is the algorithm correct?
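For reference, the row-by-row loop can be collapsed into a single assignment. This is a sketch rather than the asker's code, assuming both dataframes carry the default 0..n-1 index; the sample frames are rebuilt from the tables in the question, and `src` is an illustrative name.

```python
import pandas as pd

df_criterias = pd.DataFrame({
    "goto_emptycol1": [f"some value{i}" for i in range(1, 8)],
    "goto_emptycol2": [f"another value{i}" for i in range(1, 8)],
    "data1": list("abcdefg"),
    "data2": [f"val{i}" for i in range(1, 8)],
})
df_tofill = pd.DataFrame({
    "emptycol1": [""] * 7,
    "emptycol2": [""] * 7,
    "data1": ["f", "nok", "nok", "a", "nok", "g", "d"],
    "data2": ["val6", "nok", "nok", "val1", "nok", "val7", "val4"],
})

list_fill = [0, 3, 5, 6]  # rows of df_tofill to fill
list_crt = [5, 0, 6, 3]   # paired rows of df_criterias

# Pull the source rows in pairing order, then drop the index with
# to_numpy() so the values are written positionally, not aligned by label.
src = df_criterias.loc[list_crt, ["goto_emptycol1", "goto_emptycol2"]]
df_tofill.loc[list_fill, ["emptycol1", "emptycol2"]] = src.to_numpy()
```

Converting to a NumPy array matters here: assigning the DataFrame itself would make pandas align on row and column labels, which do not correspond between the two frames.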

Update: I managed to get it working. .at gave me some strange errors, so I replaced it with .loc. A .reset_index() was also needed before creating the lists of indexes.
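One plausible reason the reset was needed (an assumption, since the update does not say): `enumerate` produces positions 0, 1, 2, ..., while `.loc` looks rows up by label. After filtering, the labels no longer equal the positions. A small sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({"data1": ["x", "a", "y", "d"]})

# Filtering keeps the original row labels (1 and 3 here), so label 0
# no longer exists and .loc[0] would raise a KeyError.
filtered = df[df["data1"].isin(["a", "d"])]
has_label_zero_before = 0 in filtered.index

# reset_index(drop=True) relabels the rows 0..n-1, so positions coming
# from enumerate() are valid .loc labels again.
filtered = filtered.reset_index(drop=True)
first_value = filtered.loc[0, "data1"]
```

With `drop=True` the old labels are discarded instead of being kept as an extra column.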

The index lists were created with the following function:

def common_elements(crtlist, radlist):
    # crtlist holds all criteria, radlist holds all strings to be checked.
    # Returns 2 lists of indexes for the pairs where an element of radlist
    # starts with an element of crtlist.
    crtli_idx = []
    radli_idx = []
    for idx1, crt in enumerate(crtlist):
        for idx2, rad in enumerate(radlist):
            if rad.startswith(crt):
                crtli_idx.append(idx1)
                radli_idx.append(idx2)
    return crtli_idx, radli_idx

crtlist = ['1', '21', '444']
radlist = ['asda','aererv','1vrvssq','4447676767']
idxcrt, idxrad = common_elements(crtlist, radlist)
print(idxcrt, idxrad)
Output:
[0, 2] [2, 3]

One way is to align the indices and columns, replace '' with np.nan in the target dataframe, and then assign one dataframe to the other via .loc.

import numpy as np

# Rename the source columns and index both frames by (data1, data2)
df_criterias = (df_criterias.rename(columns={'goto_emptycol1': 'emptycol1',
                                             'goto_emptycol2': 'emptycol2'})
                            .set_index(['data1', 'data2']))
df_tofill = (df_tofill.replace('', np.nan)
                      .set_index(['data1', 'data2']))
# Assign the matching rows; pandas aligns them on the shared index
df_tofill.loc[:] = df_criterias.loc[df_criterias.index.isin(df_tofill.index)]
df_tofill = df_tofill.reset_index()
#   data1 data2   emptycol1      emptycol2
# 0     f  val6  somevalue6  anothervalue6
# 1   nok   nok         NaN            NaN
# 2   nok   nok         NaN            NaN
# 3     a  val1  somevalue1  anothervalue1
# 4   nok   nok         NaN            NaN
# 5     g  val7  somevalue7  anothervalue7
# 6     d  val4  somevalue4  anothervalue4
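A left merge is another common way to express the same lookup. This is an alternative sketch, not the method from the answer above; it assumes each ("data1", "data2") pair appears at most once in df_criterias (duplicates would multiply rows), and `lookup` and `result` are illustrative names.

```python
import pandas as pd

df_criterias = pd.DataFrame({
    "goto_emptycol1": [f"some value{i}" for i in range(1, 8)],
    "goto_emptycol2": [f"another value{i}" for i in range(1, 8)],
    "data1": list("abcdefg"),
    "data2": [f"val{i}" for i in range(1, 8)],
})
df_tofill = pd.DataFrame({
    "emptycol1": [""] * 7,
    "emptycol2": [""] * 7,
    "data1": ["f", "nok", "nok", "a", "nok", "g", "d"],
    "data2": ["val6", "nok", "nok", "val1", "nok", "val7", "val4"],
})

# Give the source columns their target names so the merge output
# already has the right headers.
lookup = df_criterias.rename(columns={
    "goto_emptycol1": "emptycol1",
    "goto_emptycol2": "emptycol2",
})

# how="left" keeps every row of df_tofill in its original order;
# rows without a match get NaN, which is turned back into "".
result = df_tofill[["data1", "data2"]].merge(
    lookup, on=["data1", "data2"], how="left"
)
result = result[["emptycol1", "emptycol2", "data1", "data2"]].fillna("")
```

Unlike the index-based assignment, this leaves df_tofill untouched and returns a new frame, and it does not need the intermediate index lists at all.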
