Looping over arrays in a dataframe with an if statement



I'm trying to write code that runs the following block once for each of four columns (four different arrays) in a dataframe:

    median_alcohol = df.alcohol.median()
    for i, alcohol in enumerate(df.alcohol):
        if alcohol >= median_alcohol:
            df.loc[i, 'alcohol'] = 'high'
        else:
            df.loc[i, 'alcohol'] = 'low'
    df.groupby('alcohol').quality.mean()

The columns in the dataframe include:

alcohol
pH
residual_sugar
citric_acid

I'm trying to come up with a way to handle all four arrays. Any ideas on how I should do this?

I'm not sure exactly what you want to do, but as I understand it, you could try something like this:

    import pandas as pd

    df = pd.DataFrame({'alcohol': [45, 88, 56, 15, 71],
                       'pH': [12, 83, 56, 25, 71],
                       'residual_sugar': [14, 25, 55, 8, 21]})
    print(df)
    # Output:
    #    alcohol  pH  residual_sugar
    # 0       45  12              14
    # 1       88  83              25
    # 2       56  56              55
    # 3       15  25               8
    # 4       71  71              21

    def func(colum):
        # Work on a copy so the original numeric data stays intact.
        dftemp = df.copy()
        median_colum = df[colum].median()
        # Cast to object so the column can hold the string labels.
        dftemp[colum] = dftemp[colum].astype(object)
        for i, item in enumerate(df[colum]):
            dftemp.loc[i, colum] = 'high' if item >= median_colum else 'low'
        # Mean of every other column within each 'high'/'low' bucket.
        return dftemp.groupby(colum).mean()

    diferrentarrays = [func(i) for i in df.columns]
    for array in diferrentarrays:
        print(array)

Output:

               pH  residual_sugar
    alcohol
    high     70.0       33.666667
    low      18.5       11.000000

            alcohol  residual_sugar
    pH
    high  71.666667       33.666667
    low   30.000000       11.000000

                      alcohol    pH
    residual_sugar
    high            71.666667  70.0
    low             30.000000  18.5
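
    def numeric_to_buckets(df, column_name):
        # Replace each value in the column with 'high' or 'low' relative to the median.
        median = df[column_name].median()
        # Cast to object so the numeric column can hold string labels.
        df[column_name] = df[column_name].astype(object)
        for i, val in enumerate(df[column_name]):
            if val >= median:
                df.loc[i, column_name] = 'high'
            else:
                df.loc[i, column_name] = 'low'

    # Assumes 'quality' is the last column, so it is skipped here.
    for feature in df.columns[:-1]:
        numeric_to_buckets(df, feature)
        print(df.groupby(feature).quality.mean(), '\n')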
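
For comparison, here is a minimal vectorized sketch of the same idea that avoids the row-by-row loop and leaves the original dataframe untouched. The sample values, the helper name mean_quality_by_bucket, and its target parameter are assumptions made up for illustration; the real dataset is not shown in the question. It uses numpy.where to build the 'high'/'low' labels and groups the quality column by them.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    'alcohol': [9.4, 9.8, 10.5, 11.2, 12.0, 12.8],
    'pH': [3.51, 3.20, 3.26, 3.16, 3.40, 3.30],
    'residual_sugar': [1.9, 2.6, 2.3, 1.8, 6.1, 2.0],
    'citric_acid': [0.00, 0.04, 0.56, 0.30, 0.10, 0.45],
    'quality': [5, 5, 6, 6, 7, 7],
})

def mean_quality_by_bucket(frame, column, target='quality'):
    """Mean of `target` for rows at/above vs. below the median of `column`."""
    # Label every row in one step instead of assigning with .loc in a loop.
    labels = np.where(frame[column] >= frame[column].median(), 'high', 'low')
    # Group the target by the label array; `frame` itself is not modified.
    return frame[target].groupby(labels).mean()

# One result per feature column, keyed by column name.
results = {col: mean_quality_by_bucket(df, col)
           for col in ['alcohol', 'pH', 'residual_sugar', 'citric_acid']}

for col, res in results.items():
    print(col)
    print(res, '\n')
```

Because the labels live in a separate array, the numeric columns keep their dtype and can be reused for further analysis afterwards.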
