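I'm trying to write code that runs the following snippet in a loop over the columns of a DataFrame, four times for four different columns: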
median_alcohol = df.alcohol.median()
for i, alcohol in enumerate(df.alcohol):
    if alcohol >= median_alcohol:
        df.loc[i, 'alcohol'] = 'high'
    else:
        df.loc[i, 'alcohol'] = 'low'
df.groupby('alcohol').quality.mean()
The columns in the DataFrame include:
alcohol
pH
residual_sugar
citric_acid
I'm trying to come up with a way to capture the results for all four columns. Any ideas on how I should go about it?
I'm not sure exactly what you want to do, but from what I understand, you could try something like this:
import pandas as pd

df = pd.DataFrame({'alcohol': [45, 88, 56, 15, 71], 'pH': [12, 83, 56, 25, 71], 'residual_sugar': [14, 25, 55, 8, 21]})
print(df)
# Output:
#    alcohol  pH  residual_sugar
# 0       45  12              14
# 1       88  83              25
# 2       56  56              55
# 3       15  25               8
# 4       71  71              21

def func(colum):
    # Work on a copy so the original numeric columns stay intact
    dftemp = df.copy()
    median_colum = df[colum].median()
    # Label each row 'high' or 'low' relative to the column median
    dftemp[colum] = ['high' if item >= median_colum else 'low' for item in df[colum]]
    # Mean of every other column within each bucket
    return dftemp.groupby(colum).mean()

diferrentarrays = [func(i) for i in df.columns]
for array in diferrentarrays:
    print(array)
Output:
           pH  residual_sugar
alcohol
high     70.0       33.666667
low      18.5       11.000000
        alcohol  residual_sugar
pH
high  71.666667       33.666667
low   30.000000       11.000000
                  alcohol    pH
residual_sugar
high            71.666667  70.0
low             30.000000  18.5
def numeric_to_buckets(df, column_name):
    # Replace each value in the column with 'high' or 'low' relative to the median
    median = df[column_name].median()
    for i, val in enumerate(df[column_name]):
        if val >= median:
            df.loc[i, column_name] = 'high'
        else:
            df.loc[i, column_name] = 'low'

# Assumes 'quality' is the last column, so it is excluded from the features
for feature in df.columns[:-1]:
    numeric_to_buckets(df, feature)
    print(df.groupby(feature).quality.mean(), '\n')
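Both approaches above label rows one at a time, and the second one overwrites the numeric columns in place. As a rough sketch of an alternative, the bucketing can be done on whole columns at once while leaving the DataFrame untouched, with the per-column results collected in a dict. The sample values, the `quality_by_bucket` helper, and the `results` dict are made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; the real DataFrame is assumed to have a 'quality' column.
df = pd.DataFrame({
    'alcohol': [9.4, 9.8, 10.5, 11.2, 12.0, 9.0],
    'pH': [3.51, 3.20, 3.26, 3.16, 3.40, 3.30],
    'residual_sugar': [1.9, 2.6, 2.3, 1.9, 6.1, 1.2],
    'citric_acid': [0.00, 0.04, 0.56, 0.36, 0.10, 0.20],
    'quality': [5, 5, 6, 6, 7, 4],
})

def quality_by_bucket(frame, column, target='quality'):
    """Mean of `target` for rows at or above vs. below the median of `column`."""
    median = frame[column].median()
    # Boolean mask mapped to labels; the original column is not modified
    buckets = (frame[column] >= median).map({True: 'high', False: 'low'})
    return frame[target].groupby(buckets).mean()

features = ['alcohol', 'pH', 'residual_sugar', 'citric_acid']
# One result Series per feature, keyed by column name
results = {col: quality_by_bucket(df, col) for col in features}

for col, means in results.items():
    print(means, '\n')
```

Because the labels live in a separate Series, `df` keeps its numeric dtypes and the function can be called again with a different threshold or target column.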