Pandas: exclude total rows when detail rows exist

The DataFrame looks like this:

import pandas as pd
df = pd.DataFrame({'Art': [210, 211, 212, 310, 420, 421], 'Sum': [300, 120, 180, 250, 650, 650]})

As a table:

   Art  Sum
0  210  300  # this is a total
1  211  120  # child of index 0
2  212  180  # child of index 0
3  310  250  # !!! this is NOT a total
4  420  650  # this is a total
5  421  650  # child of index 4

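A total row is a row whose Art ends in 0 and that has child rows whose Art starts with the same two digits.

Art 210 has children: 211, 212.

Art 310 has no children: no other row starts with 31, so it is not a total.

Problem: the total rows need to be removed.

Desired result: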

   Art  Sum
1  211  120
2  212  180
3  310  250  # !! this is NOT a total
5  421  650

How can I do this?

You can bucket the Art column by its first two digits and filter accordingly:

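# Count how many rows share each two-digit prefix (Art // 10 for three-digit codes).
buckets = (df['Art'] // 10).value_counts()
# Keep rows whose prefix occurs only once, or whose Art does not end in 0.
df = df.loc[(df['Art'] // 10).isin(buckets.loc[buckets == 1].index) |
            (df['Art'] % 10 != 0)]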

Which outputs:

   Art  Sum
1  211  120
2  212  180
3  310  250
5  421  650
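The bucketing logic can also be written out step by step, which makes each intermediate value easier to inspect. This is only a sketch under the same assumption as the answer above, namely that every Art code has three digits, so `Art // 10` gives the two-digit prefix. The names `prefix`, `lonely`, `is_total`, and `result` are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'Art': [210, 211, 212, 310, 420, 421],
                   'Sum': [300, 120, 180, 250, 650, 650]})

# Two-digit prefix of each code (assumes three-digit Art values).
prefix = df['Art'] // 10

# Number of rows sharing each prefix.
buckets = prefix.value_counts()

# Prefixes that occur exactly once: such rows have no children.
lonely = buckets.index[buckets == 1]

# A total ends in 0 and shares its prefix with at least one other row.
is_total = (df['Art'] % 10 == 0) & ~prefix.isin(lonely)

# Drop the totals; the original index labels are preserved.
result = df.loc[~is_total]
print(result)
```

Building the `is_total` mask first and then negating it is logically equivalent to the one-liner above, and it keeps the original `df` intact in case the totals are needed later.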

This also works:

>>> df[~(df.Art.astype(str).str.endswith("0") & df.Art.astype(str).str[:2].duplicated(keep=False))]
   Art  Sum
1  211  120
2  212  180
3  310  250
5  421  650
>>>

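Explanation:

A = df.Art.astype(str).str.endswith("0"): checks which values end in 0.
B = df.Art.astype(str).str[:2].duplicated(keep=False): checks which values share their first two digits with at least one other row.
C = the negation of A & B.
Use C as a mask to filter the DataFrame.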
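The three masks described above can be built one at a time to see what each contributes. This is a minimal sketch using the sample data from the question; the variable names `art` and `filtered` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Art': [210, 211, 212, 310, 420, 421],
                   'Sum': [300, 120, 180, 250, 650, 650]})

# Work on the string form of the codes.
art = df['Art'].astype(str)

# A: True where the code ends in "0" (candidate totals).
A = art.str.endswith('0')

# B: True where the first two characters also appear in another row.
B = art.str[:2].duplicated(keep=False)

# C: keep every row that is not both a candidate total and part of a group.
C = ~(A & B)

filtered = df[C]
print(filtered)
```

Because the prefix is taken with string slicing, this version does not rely on integer division, but it still assumes that the first two characters of the code identify the group.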
