How to sample a dataframe according to a given distribution, where classes with few examples limit the other classes



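Given a distribution over classes and a dataframe of example rows for those classes, is there a simple/fast way to sample from the dataframe so that the result matches the given distribution, where a class without enough examples reduces the number of examples taken from the other classes?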

For example:

+------+-------+-------+
| col1 | col2  | class |
+------+-------+-------+
| 4    | 45    | A     |
+------+-------+-------+
| 5    | 66    | B     |
+------+-------+-------+
| 5    | 6     | C     |
+------+-------+-------+
| 4    | 6     | A     |
+------+-------+-------+
| 321  | 1     | A     |
+------+-------+-------+
| 32   | 432   | A     |
+------+-------+-------+
| 5    | 3     | B     |
+------+-------+-------+
Given a dataframe like the one above and a distribution like the one below:
+-------+--------------+
| class | proportion   |
+-------+--------------+
| A     | 0.50         |
+-------+--------------+
| B     | 0.25         |
+-------+--------------+
| C     | 0.25         |
+-------+--------------+
I would like to return something like:
+------+-------+-------+
| col1 | col2  | class |
+------+-------+-------+
| 5    | 66    | B     |
+------+-------+-------+
| 5    | 6     | C     |
+------+-------+-------+
| 4    | 6     | A     |
+------+-------+-------+
| 32   | 432   | A     |
+------+-------+-------+

df.sample supports per-row weights:

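import pandas as pd
s = pd.Series({'A': 0.5, 'B': 0.25, 'C': 0.25})
# Per-row weight = target class proportion / class count; n is the desired sample size
df.sample(n, weights=df['class'].map(s / df['class'].value_counts()))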

For more information on this topic, search for "label shift".
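Because the weighted draw is random, it matches the target distribution only approximately, and n still has to be chosen by hand. If exact per-class counts are wanted, one possible approach is to let the scarcest class determine the total size and then sample each class separately. The sketch below is only an illustration under stated assumptions: the helper name sample_to_distribution and its parameters are hypothetical (not part of pandas), all proportions are assumed to be positive, and per-class counts are rounded down.

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    "col1": [4, 5, 5, 4, 321, 32, 5],
    "col2": [45, 66, 6, 6, 1, 432, 3],
    "class": ["A", "B", "C", "A", "A", "A", "B"],
})
s = pd.Series({"A": 0.5, "B": 0.25, "C": 0.25})


def sample_to_distribution(df, proportions, label="class", random_state=None):
    """Hypothetical helper: sample rows so class counts follow `proportions`.

    Assumes every proportion is strictly positive.
    """
    # Rows available per class, in the same order as `proportions`
    counts = df[label].value_counts().reindex(proportions.index, fill_value=0)
    # Largest total size that every class can support
    n_total = (counts / proportions).min()
    # Rows to draw per class; the small epsilon guards against float error
    # before truncating to an integer (values are non-negative)
    per_class = (proportions * n_total + 1e-9).astype(int)
    parts = [
        df[df[label] == cls].sample(n=k, random_state=random_state)
        for cls, k in per_class.items()
    ]
    return pd.concat(parts)


result = sample_to_distribution(df, s, random_state=0)
print(result)
```

With the example data, class C has a single row and a target share of 0.25, so the total is capped at 4 rows: 2 from A, 1 from B and 1 from C. Which A and B rows are picked depends on random_state.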
