Run a scipy.stats ANOVA test on all unique values in a DataFrame column



I have a DataFrame consisting of many cities and their corresponding temperatures:

               CurrentThermostatTemp
City                                
Cradley Heath                   20.0
Cradley Heath                   20.0
Cradley Heath                   18.0
Cradley Heath                   15.0
Cradley Heath                   19.0
...                              ...
Walsall                         16.0
Walsall                         22.0
Walsall                         20.0
Walsall                         20.0
Walsall                         20.0
[6249 rows x 1 columns]

The unique values are:

Index(['Cradley Heath', 'ROWLEY REGIS', 'Smethwick', 'Oldbury',
       'West Bromwich', 'Bradford', 'Bournemouth', 'Poole', 'Wareham',
       'Wimborne',
       ...
       'St. Helens', 'Altrincham', 'Runcorn', 'Widnes', 'St Helens',
       'Wakefield', 'Castleford', 'Pontefract', 'Walsall', 'Wednesbury'],
      dtype='object', name='City', length=137)

My goal is to run a one-way ANOVA test, i.e.

from scipy.stats import f_oneway

on all unique values in the DataFrame. In other words, something like

scipy.stats.f_oneway("all unique values")

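and receive the output: "A one-way ANOVA test across all variables gives {} with a p-value of {}".

This is what I have tried many times, without success: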

all = Tempvs.index.unique()
Tempvs.sort_index(inplace=True)
for n in range(len(all)):
    truncated = Tempvs.truncate(all[n], all[n])
    print(f_oneway(truncated))

IIUC, you want an ANOVA test in which each sample contains the Temp values for one unique City. If that is the case, you can do

import numpy as np
import pandas as pd
import scipy.stats as sps

# I create a sample dataset
index = ['Cradley Heath', 'ROWLEY REGIS',
         'Smethwick', 'Oldbury',
         'West Bromwich', 'Bradford',
         'Bournemouth', 'Poole', 'Wareham',
         'Wimborne', 'St. Helens', 'Altrincham',
         'Runcorn', 'Widnes', 'St Helens',
         'Wakefield', 'Castleford', 'Pontefract',
         'Walsall', 'Wednesbury']
np.random.seed(1)
df = pd.DataFrame({
    'City': np.random.choice(index, 500),
    'Temp': np.random.uniform(15, 25, 500)
})

# populate a list with all
# values of unique Cities
values = []
for city in df.City.unique():
    _df = df[df.City == city]
    values.append(_df.Temp.values)

# compute the ANOVA
# with starred *list
# as arguments
sps.f_oneway(*values)

which, in this case, gives

F_onewayResult(statistic=0.4513685152123563, pvalue=0.9788508507035195)
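The same idea can be applied to the layout shown in the question, where City is the index rather than a column. The following is a minimal sketch, not the asker's actual data: the frame `tempvs`, the four city names, and the random temperatures are stand-ins made up for illustration. It groups on the index level, passes one array per city to f_oneway, and prints the result in roughly the sentence form the question asks for.

```python
import numpy as np
import pandas as pd
from scipy.stats import f_oneway

# Hypothetical stand-in for the question's frame: City as the index,
# one temperature column. The values are random, not the real data.
rng = np.random.default_rng(0)
cities = ["Cradley Heath", "Smethwick", "Oldbury", "Walsall"]
tempvs = pd.DataFrame(
    {"CurrentThermostatTemp": rng.uniform(15, 25, 200)},
    index=pd.Index(rng.choice(cities, 200), name="City"),
)

# One array of temperatures per city, grouped on the index level.
samples = [
    group.to_numpy()
    for _, group in tempvs.groupby(level="City")["CurrentThermostatTemp"]
]

# f_oneway takes each sample as a separate positional argument.
result = f_oneway(*samples)
print(
    f"One-way ANOVA over {len(samples)} cities gives "
    f"F = {result.statistic:.4f} with p-value {result.pvalue:.4f}"
)
```

Grouping avoids filtering the frame once per city, and the test result should not depend on the order in which the groups are passed.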

PS: Don't use all as a variable name, because it is a built-in Python function; see https://docs.python.org/3/library/functions.html#all
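To illustrate why shadowing all can bite, here is a small sketch. The function names `check_shadowed` and `check_renamed` are hypothetical and exist only for this example. Inside the first function the name all is rebound to a list, so the later call to the built-in fails; the second function uses a different name and works.

```python
def check_shadowed(cities):
    # Rebinding `all` hides the built-in inside this function.
    all = list(cities)
    try:
        return all(len(c) > 0 for c in all)
    except TypeError as exc:
        # The list is not callable, so the call above raises.
        return str(exc)


def check_renamed(cities):
    # A distinct name leaves the built-in `all` usable.
    unique_cities = list(cities)
    return all(len(c) > 0 for c in unique_cities)


shadowed_result = check_shadowed(["Cradley Heath", "Walsall"])
renamed_result = check_renamed(["Cradley Heath", "Walsall"])
print(shadowed_result)
print(renamed_result)
```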
