How to apply a Box-Cox transformation to a single column in Python



I am trying to apply a Box-Cox transformation to a single column, but I can't get it to work. Can anyone help me with this?

from sklearn.datasets import fetch_california_housing
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import statsmodels.api as sm
from sklearn.preprocessing import PowerTransformer
california_housing = fetch_california_housing(as_frame=True).frame
california_housing
power = PowerTransformer(method='box-cox', standardize=True)
california_housing['MedHouseVal']=power.fit_transform(california_housing['MedHouseVal'])

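power.fit_transform requires the input to have shape (n, 1) rather than (n,) when there is a single feature. california_housing['MedHouseVal'] has shape (n,) because it is a pd.Series. You can fix this by reshaping, i.e. by replacing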

power.fit_transform(california_housing['MedHouseVal'])

with

power.fit_transform(california_housing['MedHouseVal'].to_numpy().reshape(-1, 1))

Alternatively, select a list of columns with california_housing[['MedHouseVal']] (which gives a pd.DataFrame) instead of a single column with california_housing['MedHouseVal'] (which gives a pd.Series), i.e. use

power.fit_transform(california_housing[['MedHouseVal']])

Note that

print(california_housing['MedHouseVal'].shape)
print(california_housing[['MedHouseVal']].shape)

prints

(20640,)
(20640, 1)
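
Putting the pieces above together, here is a minimal sketch of the whole round trip: fit on a one-column DataFrame, then flatten the (n, 1) result before writing it back. To keep it runnable without downloading the dataset, it uses a synthetic, strictly positive column as a stand-in for MedHouseVal; the names df, MedHouseVal_boxcox and fitted_lambda are my own and not part of the question.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import PowerTransformer

# Assumption: synthetic, strictly positive data standing in for the real column.
rng = np.random.default_rng(0)
df = pd.DataFrame({"MedHouseVal": rng.lognormal(mean=0.5, sigma=0.6, size=1000)})

print(df["MedHouseVal"].shape)    # 1-D Series, rejected by fit_transform
print(df[["MedHouseVal"]].shape)  # 2-D one-column DataFrame, accepted

power = PowerTransformer(method="box-cox", standardize=True)

# fit_transform returns a 2-D result with one column.
transformed = power.fit_transform(df[["MedHouseVal"]])

# Flatten to 1-D before storing it as a regular column.
df["MedHouseVal_boxcox"] = np.asarray(transformed).ravel()

# The fitted Box-Cox lambda for the (only) feature.
fitted_lambda = power.lambdas_[0]
print(fitted_lambda)
```

Because standardize=True, the new column should have mean close to 0 and unit variance. If you need the original scale back later, power.inverse_transform takes the same (n, 1) shape.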

Another option is to use scipy.stats.boxcox:

from sklearn.datasets import fetch_california_housing
from scipy.stats import boxcox
california_housing = fetch_california_housing(as_frame=True).frame
california_housing['MedHouseVal'] = boxcox(california_housing['MedHouseVal'])[0]
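
A side note on the snippet above: with no lambda passed, boxcox returns a tuple of the transformed values and the fitted lambda, and the [0] throws the lambda away. If you may need to undo the transformation later, it can be worth keeping it. The sketch below shows this with scipy.special.inv_boxcox, again on a synthetic positive column rather than the real dataset (an assumption so it runs offline; the variable names are mine).

```python
import numpy as np
import pandas as pd
from scipy.special import inv_boxcox
from scipy.stats import boxcox

# Assumption: synthetic, strictly positive data standing in for the real column.
rng = np.random.default_rng(0)
df = pd.DataFrame({"MedHouseVal": rng.lognormal(mean=0.5, sigma=0.6, size=1000)})

# boxcox expects 1-D positive data, so a plain Series/array is fine here.
values, fitted_lambda = boxcox(df["MedHouseVal"].to_numpy())
df["MedHouseVal_boxcox"] = values

# Undo the transformation using the lambda that was fitted above.
restored = inv_boxcox(df["MedHouseVal_boxcox"].to_numpy(), fitted_lambda)
print(np.allclose(restored, df["MedHouseVal"].to_numpy()))
```

Unlike PowerTransformer(standardize=True), scipy.stats.boxcox does not rescale the result to zero mean and unit variance, so the two approaches will generally give different numbers for the same column.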
