TypeError: float() argument must be a string or a number, not 'function' – Python/Sklearn



I have the following code snippet from a program called Flights.py:
...
#Load the Dataset
df = dataset
df.isnull().any()
df = df.fillna(lambda x: x.median())
# Define X and Y
X = df.iloc[:, 2:124].values
y = df.iloc[:, 136].values
X_tolist = X.tolist()
# Splitting the dataset into the Training set and Test set
from sklearn.cross_validation import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

The second-to-last line throws the following error:

Traceback (most recent call last):
  File "<ipython-input-14-d4add2ccf5ab>", line 3, in <module>
    X_train = sc.fit_transform(X_train)
  File "/Users/<username>/anaconda/lib/python3.6/site-packages/sklearn/base.py", line 494, in fit_transform
    return self.fit(X, **fit_params).transform(X)
  File "/Users/<username>/anaconda/lib/python3.6/site-packages/sklearn/preprocessing/data.py", line 560, in fit
    return self.partial_fit(X, y)
  File "/Users/<username>/anaconda/lib/python3.6/site-packages/sklearn/preprocessing/data.py", line 583, in partial_fit
    estimator=self, dtype=FLOAT_DTYPES)
  File "/Users/<username>/anaconda/lib/python3.6/site-packages/sklearn/utils/validation.py", line 382, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)
TypeError: float() argument must be a string or a number, not 'function'

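My dataframe df has shape (22587, 138).

I looked at the following question for inspiration:

TypeError: float() argument must be a string or a number, not 'method' in Geocoder

I tried the following adjustment: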

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train.as_matrix)
X_test = sc.transform(X_test.as_matrix)

This resulted in the following error:

AttributeError: 'numpy.ndarray' object has no attribute 'as_matrix'

I'm currently at a loss as to how to scan through the dataframe and find/convert the offending entries.

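As this answer explains, fillna is not designed to work with a callback. If you pass one, it is taken as the literal fill value, which means your NaNs are replaced with lambdas: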

df
      col1  col2  col3  col4
row1  65.0    24  47.0   NaN
row2  33.0    48   NaN  89.0
row3   NaN    34  67.0   NaN
row4  24.0    12  52.0  17.0
df.fillna(lambda x: x.median())
                                    col1  col2  
row1                                  65    24   
row2                                  33    48   
row3  <function <lambda> at 0x10bc47730>    34   
row4                                  24    12   
                                    col3                                col4  
row1                                  47  <function <lambda> at 0x10bc47730>  
row2  <function <lambda> at 0x10bc47730>                                  89  
row3                                  67  <function <lambda> at 0x10bc47730>  
row4                                  52                                  17 
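To check whether a frame has been affected in this way, one option is to flag every cell that holds a callable. The sketch below is illustrative rather than part of the original answer: the names `placeholder`, `is_func`, and `repaired` are made up here, and the small frame is built by hand to mimic the output above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the lambda that ended up inside the cells.
placeholder = lambda x: x.median()

# Hand-built frame: one cell in col1 holds a function instead of a number.
df = pd.DataFrame({
    "col1": [65.0, 33.0, placeholder, 24.0],
    "col2": [24, 48, 34, 12],
})

# Boolean frame: True wherever a cell is callable (a function, lambda, method).
is_func = df.apply(lambda col: col.map(callable))

bad_columns = is_func.columns[is_func.any()].tolist()      # columns with a callable
bad_rows = is_func.index[is_func.any(axis=1)].tolist()     # rows with a callable

# Turn the callables back into NaN, restore numeric dtypes, then fill with medians.
repaired = df.apply(
    lambda col: pd.to_numeric(col.map(lambda v: np.nan if callable(v) else v))
)
repaired = repaired.fillna(repaired.median())

print(bad_columns, bad_rows)
print(repaired)
```

This only repairs the symptom; replacing the original fillna call is still the cleaner fix.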

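If you want to fill with the median, compute the per-column medians (df.median() returns a Series indexed by column name) and pass that to fillna: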

df
      col1  col2  col3  col4
row1  65.0    24  47.0   NaN
row2  33.0    48   NaN  89.0
row3   NaN    34  67.0   NaN
row4  24.0    12  52.0  17.0
df = df.fillna(df.median())
df
      col1  col2  col3  col4
row1  65.0    24  47.0  53.0
row2  33.0    48  52.0  89.0
row3  33.0    34  67.0  53.0
row4  24.0    12  52.0  17.0
df = df.fillna(lambda x: x.median())

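This is not really a valid way to use fillna. It expects a literal value here, or a mapping from column to literal value. It does not apply the function you supply; instead, the value of each NA cell is simply set to the function itself. That function is what your estimator is trying to convert to a float.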

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.dataframe.fillna.html
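As a rough illustration of the forms of fill value described above (a scalar, a column-to-value mapping, or a Series of per-column values), here is a minimal sketch. The frame and the names `by_scalar`, `by_dict`, and `by_series` are assumptions made for this example, and the fill values 0 and -1.0 are arbitrary.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": [65.0, 33.0, np.nan, 24.0],
    "col3": [47.0, np.nan, 67.0, 52.0],
})

# 1. A single literal value used for every NaN.
by_scalar = df.fillna(0)

# 2. A mapping from column name to the literal value for that column.
by_dict = df.fillna({"col1": df["col1"].median(), "col3": -1.0})

# 3. A Series indexed by column name, here the per-column medians.
by_series = df.fillna(df.median())

print(by_series)
```

In all three cases the statistic is computed before fillna is called, so only plain numbers reach the cells.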

I ran into the same trouble using df = df.fillna(lambda x: x.median()). Here is my solution for getting real values instead of "function" into the dataframe:

# -*- coding: utf-8 -*-
import pandas as pd
import numpy as np

I create a dataframe with 10 rows and 3 columns that contain NaNs:

df = pd.DataFrame(np.random.randint(100,size=(10,3)))
df.iloc[3:5,0] = np.nan
df.iloc[4:6,1] = np.nan
df.iloc[5:8,2] = np.nan

Assign silly column labels for convenience later on:

df.columns=['Number_of_Holy_Hand_Grenades_of_Antioch', 'Number_of_knight_fleeings', 'Number_of_rabbits_of_Caerbannog']
print(df.isnull().any())  # report, per column, whether any NaN is present

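For each column label, we fill all NaN values with the median computed on that column itself. The same approach works with mean() and similar methods: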

for i in df.columns:     #df.columns[w:] if you have w column of line description 
    df[i] = df[i].fillna(df[i].median() )
print(df.isnull().any())

Now df contains the medians in place of the NaNs:

print(df)

You can then do, for example:

X = df.values  # df.ix was removed in pandas 1.0
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_std = scaler.fit_transform(X)

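df = df.fillna(lambda x: x.median()) does not work. After the loop above, df can be passed on to later methods because every value is a real number rather than a function, unlike the approaches that pass a lambda to DataFrame.fillna().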
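As a sanity check on the claim that the filled frame holds only real values, the sketch below compares the per-column loop with the single-call form df.fillna(df.median()) and then attempts the same float conversion that appears at the bottom of the traceback (np.array(array, dtype=dtype)). The seeded generator, the short column names, and the variable names `looped` and `vectorized` are assumptions made for this example.

```python
import numpy as np
import pandas as pd

# Seeded random data so the example is repeatable; floats so NaN fits the dtype.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.integers(0, 100, size=(10, 3)).astype(float),
    columns=["a", "b", "c"],
)
df.iloc[3:5, 0] = np.nan
df.iloc[4:6, 1] = np.nan
df.iloc[5:8, 2] = np.nan

# Per-column loop, as in the answer.
looped = df.copy()
for col in looped.columns:
    looped[col] = looped[col].fillna(looped[col].median())

# Single call with a Series of per-column medians.
vectorized = df.fillna(df.median())

# The conversion that failed in the traceback now succeeds.
X = np.asarray(vectorized.values, dtype=float)

print(looped.equals(vectorized))
print(X.dtype, X.shape)
```

If this conversion raises a TypeError on your own data, some cells still hold non-numeric objects.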
