How do I group by a column and compute a new field with Python pandas?

I want to group a DataFrame by a specific column, "Fruit", and calculate the percentage of "Good" rows for each fruit.

See the initial DataFrame below:

import pandas as pd
df = pd.DataFrame({'Fruit': ['Apple','Apple','Banana'], 'Condition': ['Good','Bad','Good']})

DataFrame:

Fruit   Condition
0   Apple   Good
1   Apple   Bad
2   Banana  Good

See my desired output DataFrame below:

Fruit   Percentage
0   Apple   50%
1   Banana  100%

Note: since there is 1 "Good" apple and 1 "Bad" apple, the percentage of good apples is 50%.

See my attempt below, which overwrites all of the columns:

groupedDF = df.groupby('Fruit')
groupedDF.apply(lambda x: x[(x['Condition'] == 'Good')].count()/x.count())

See the resulting table below. It seems to calculate the percentage in the existing columns rather than in a new column:

Fruit Condition
Fruit       
Apple   0.5 0.5
Banana  1.0 1.0
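The ratio lands in every column because `count()` on a DataFrame returns one count per column, so the division is carried out column by column. A minimal sketch of the same idea restricted to the `Condition` column (the variable name `ratio` is only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Fruit': ['Apple', 'Apple', 'Banana'],
                   'Condition': ['Good', 'Bad', 'Good']})

# Select only the Condition column so each group is a Series, not a DataFrame.
# The lambda then returns a single number per fruit instead of one per column.
ratio = df.groupby('Fruit')['Condition'].apply(
    lambda s: (s == 'Good').sum() / s.count()
)
print(ratio)
```

This yields one value per fruit, indexed by `Fruit`, rather than a value repeated across the existing columns.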

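We can compare Condition to 'Good' with eq and take advantage of the fact that True is treated as 1 and False as 0 in numeric operations, so the groupby mean over Fruit gives the share of "Good" rows per fruit: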

new_df = (
    df['Condition'].eq('Good').groupby(df['Fruit']).mean().reset_index()
)

new_df:

Fruit  Condition
0   Apple        0.5
1  Banana        1.0
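To see why the mean works here, it may help to look at the boolean Series on its own. A small sketch using just the two Apple rows (the name `is_good` is only illustrative):

```python
import pandas as pd

# The Condition values for the Apple group only
is_good = pd.Series(['Good', 'Bad']).eq('Good')

print(is_good.tolist())  # [True, False]
print(is_good.sum())     # 1   -> number of True values
print(is_good.mean())    # 0.5 -> number of True values / number of rows
```

Because True counts as 1 and False as 0, the mean of the boolean Series is exactly the fraction of rows equal to 'Good'.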

We can further map the values to a format string and rename the Series to get the output shown in the question:

new_df = (
    df['Condition'].eq('Good')
    .groupby(df['Fruit']).mean()
    .map('{:.0%}'.format)  # Change to Percent Format
    .rename('Percentage')  # Rename Column to Percentage
    .reset_index()  # Restore RangeIndex and make Fruit a Column
)

new_df:

Fruit Percentage
0   Apple        50%
1  Banana       100%

*Naturally, further manipulation can be done as well.
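If the same calculation is needed for other columns or target values, the chain above could be wrapped in a small helper. This is only a sketch: the function name `percent_matching` and its parameters are hypothetical and not part of pandas.

```python
import pandas as pd

def percent_matching(df, group_col, value_col, target):
    """Return one row per group with the share of rows where value_col == target.

    Hypothetical helper: it simply wraps the eq / groupby / mean chain.
    """
    return (
        df[value_col].eq(target)       # boolean Series: True where the value matches
        .groupby(df[group_col]).mean() # fraction of True values per group
        .map('{:.0%}'.format)          # format as a percent string, e.g. '50%'
        .rename('Percentage')
        .reset_index()                 # make the group column a regular column
    )

df = pd.DataFrame({'Fruit': ['Apple', 'Apple', 'Banana'],
                   'Condition': ['Good', 'Bad', 'Good']})

result = percent_matching(df, 'Fruit', 'Condition', 'Good')
print(result)
```

Note that the Percentage column holds strings after formatting, so any sorting or arithmetic on the shares is best done before the `map` step.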
