Yet another pandas SettingWithCopyWarning question

Yes, this question has been asked many times. And no, I still can't figure out how to run this boolean filter without triggering pandas' SettingWithCopyWarning.

for x in range(len(df_A)):
    df_C = df_A.loc[(df_A['age'] >= df_B['age_limits'].iloc[x][0]) &
                    (df_A['age'] <= df_B['age_limits'].iloc[x][1])]
    df_D['count'].iloc[x] = len(df_C) # triggers warning

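I have tried:

Copying df_A and df_B at every possible point
Using a mask
Using query

I know I can suppress the warning, but I'd rather not do that.

What am I missing? I know it's probably something obvious.

Many thanks!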

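For more detail on why you get a SettingWithCopyWarning, I suggest reading this answer. Here it happens mainly because selecting the column with df_D['count'] and then indexing it with iloc[x] is a "chained assignment", which pandas flags this way.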
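As a minimal sketch of the difference, the snippet below contrasts the chained form with a single indexing call on the frame itself. The small df_D here is a made-up stand-in for illustration, not the asker's actual data.

```python
import pandas as pd

# Hypothetical frame, for illustration only
df_D = pd.DataFrame({'count': [0, 0, 0]})

# Chained form (left commented out): df_D['count'] is evaluated first,
# then .iloc[0] assigns into that intermediate object, so pandas cannot
# guarantee that the write reaches df_D.
# df_D['count'].iloc[0] = 5

# Single indexing call: row label and column name go to df_D.loc together
df_D.loc[df_D.index[0], 'count'] = 5
print(df_D['count'].tolist())
```

Because the row and the column are passed in one call, the assignment goes through the frame's own setter rather than through a temporary Series.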

To avoid this, you can look up the position of the target column in df_D and then use iloc for both the row and the column inside the for loop:

pos_col_D = df_D.columns.get_loc('count')  # get_loc is a method: call it with (), not []
for x in range(len(df_A)):
    df_C = df_A.loc[(df_A['age'] >= df_B['age_limits'].iloc[x][0]) &
                    (df_A['age'] <= df_B['age_limits'].iloc[x][1])]
    df_D.iloc[x, pos_col_D] = len(df_C)  # no more warning

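Also, since you compare every value of df_A.age against the bounds in df_B.age_limits, I think you can speed the code up with numpy.ufunc.outer, using the ufuncs greater_equal and less_equal, and then sum over axis=0.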

# Setup
import numpy as np
import pandas as pd
df_A = pd.DataFrame({'age': [12, 25, 32]})
df_B = pd.DataFrame({'age_limits': [[3, 99], [20, 45], [15, 30]]})
# your result
for x in range(len(df_A)):
    df_C = df_A.loc[(df_A['age'] >= df_B['age_limits'].iloc[x][0]) &
                    (df_A['age'] <= df_B['age_limits'].iloc[x][1])]
    print(len(df_C))
3
2
1
# with numpy (inputs converted to arrays first, since recent pandas
# versions may not support ufunc.outer directly on a Series)
ages = df_A['age'].to_numpy()
print((np.greater_equal.outer(ages, df_B['age_limits'].str[0].to_numpy())
       & np.less_equal.outer(ages, df_B['age_limits'].str[1].to_numpy()))
      .sum(0))
[3 2 1]

So you can assign the result of that last expression directly to df_D['count'], with no for loop needed. Hope this works for you.
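A sketch of that loop-free assignment might look like the following. It assumes df_D has one row per row of df_B, which the question does not state explicitly, and the names ages, limits, lower, upper and in_range are introduced here only for readability.

```python
import numpy as np
import pandas as pd

df_A = pd.DataFrame({'age': [12, 25, 32]})
df_B = pd.DataFrame({'age_limits': [[3, 99], [20, 45], [15, 30]]})
# Assumption: df_D has one row per age range in df_B
df_D = pd.DataFrame(index=df_B.index)

ages = df_A['age'].to_numpy()
# Turn the column of [low, high] lists into an integer array of shape (n_ranges, 2)
limits = np.array(df_B['age_limits'].tolist())
lower, upper = limits[:, 0], limits[:, 1]

# in_range[i, j] is True when ages[i] lies within the j-th range (inclusive)
in_range = (np.greater_equal.outer(ages, lower)
            & np.less_equal.outer(ages, upper))

# Summing over axis 0 counts, for each range, how many ages fall inside it
df_D['count'] = in_range.sum(axis=0)
print(df_D['count'].tolist())
```

Since the whole column is assigned in one step, there is no element-wise write into df_D and therefore nothing for the chained-assignment check to flag.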
