How to use .le() and .ge() when filtering pandas DataFrame columns



Here is a sample pandas DataFrame:

import pandas as pd
import numpy as np
data = {"first_column": ["item1", "item2", "item3", "item4", "item5", "item6", "item7"],
        "second_column": ["cat1", "cat1", "cat1", "cat2", "cat2", "cat2", "cat2"],
        "third_column": [5, 1, 8, 3, 731, 189, 9]}
df = pd.DataFrame(data)
df
     first_column second_column  third_column
0        item1          cat1             5
1        item2          cat1             1
2        item3          cat1             8
3        item4          cat2             3
4        item5          cat2           731
5        item6          cat2           189
6        item7          cat2             9

I want to create a new column based on the condition 10 <= third_column <= 1000.

If I test for greater than or equal to 10, it is:

df['greater_than_ten'] = df.third_column.ge(10).astype(np.uint8)

If I test for less than or equal to 1000, it is:

df['less_than_1K'] = df.third_column.le(1000).astype(np.uint8)

But I can't do both operations at the same time, i.e.

df['both'] = df.third_column.le(1000).ge(10).astype(np.uint8)

Applying the operations one after another does not work either.

How can I use .ge() and .le() together?

You can use between() on the Series you are interested in.

df['both'] = df.third_column.between(10, 1000).astype(np.uint8)

which yields

>>> df
  first_column second_column  third_column  both
0        item1          cat1             5     0
1        item2          cat1             1     0
2        item3          cat1             8     0
3        item4          cat2             3     0
4        item5          cat2           731     1
5        item6          cat2           189     1
6        item7          cat2             9     0
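
As a small sketch of how between() treats the boundaries: by default both ends are included, which matches combining .ge(10) with .le(1000). The Series `s` and its values below are made up for illustration, and the string form of the `inclusive` argument assumes pandas 1.3 or newer.

```python
import pandas as pd

# Hypothetical values chosen to sit on and around the two boundaries.
s = pd.Series([9, 10, 1000, 1001])

# Default behaviour: 10 <= value <= 1000 (both ends included).
both_ends = s.between(10, 1000)

# Assumes pandas >= 1.3: include only the left end, i.e. 10 <= value < 1000.
left_only = s.between(10, 1000, inclusive="left")

print(both_ends.tolist())
print(left_only.tolist())
```

If a strict upper bound is what you actually need (less than 1000 rather than less than or equal), the `inclusive` argument is probably the simplest way to express it.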

Use & to combine the conditions:

In [28]:
df['both'] = df['third_column'].ge(10) & df['third_column'].le(1000)
df
Out[28]:
  first_column second_column  third_column   both
0        item1          cat1             5  False
1        item2          cat1             1  False
2        item3          cat1             8  False
3        item4          cat2             3  False
4        item5          cat2           731   True
5        item6          cat2           189   True
6        item7          cat2             9  False
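
For what it's worth, here is a minimal sketch of why the chained call from the question does not give the intended result, next to the operator form of the & approach. The one-column frame is a cut-down copy of the question's data, and the names `chained` and `mask` are just illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"third_column": [5, 1, 8, 3, 731, 189, 9]})

# .le(1000) returns a boolean Series, so the chained .ge(10) compares
# True/False (treated as 1/0) against 10, which is never satisfied.
chained = df["third_column"].le(1000).ge(10)

# Apply both comparisons to the original column and combine them with &.
# The parentheses are needed because & binds more tightly than >= and <=.
mask = (df["third_column"] >= 10) & (df["third_column"] <= 1000)
df["both"] = mask.astype(np.uint8)

print(chained.tolist())
print(df["both"].tolist())
```

The method form (.ge() / .le()) avoids the parentheses, so which spelling to use is mostly a matter of taste.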

Alternatively, DataFrame.eval() accepts the chained comparison directly:

In [11]: df['both'] = df.eval("10 <= third_column <= 1000").astype(np.uint8)
In [12]: df
Out[12]:
  first_column second_column  third_column  both
0        item1          cat1             5     0
1        item2          cat1             1     0
2        item3          cat1             8     0
3        item4          cat2             3     0
4        item5          cat2           731     1
5        item6          cat2           189     1
6        item7          cat2             9     0

Update:

In [13]: df.eval("second_column in ['cat2'] and 10 <= third_column <= 1000").astype(np.uint8)
Out[13]:
0    0
1    0
2    0
3    0
4    1
5    1
6    0
dtype: uint8
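
If the goal is to keep only the matching rows rather than add a flag column, the same expression string can be passed to DataFrame.query(). This is a sketch on a copy of the question's data; the name `in_range` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "first_column": ["item1", "item2", "item3", "item4", "item5", "item6", "item7"],
    "second_column": ["cat1", "cat1", "cat1", "cat2", "cat2", "cat2", "cat2"],
    "third_column": [5, 1, 8, 3, 731, 189, 9],
})

# query() evaluates the expression and returns only the rows where it is True.
in_range = df.query("10 <= third_column <= 1000")

print(in_range)
```

The original index labels are kept on the filtered rows, so call reset_index(drop=True) afterwards if you want them renumbered.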
