Linear regression ValueError: Input contains NaN, infinity or a value too large for dtype('float64')



I am working on a linear regression model and I get the error:

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

Here is my code:

### List Column Data Types for df
# Convert 'Paid' column to float64 by first changing NaN to 0
Training_Data['Paid'].fillna(0).astype(float)
# Convert 'Sale Price' column to float64 by first changing NaN to 0
#print(df.loc[pd.to_numeric(df['Sale Price'], errors='coerce').isnull()])
#pd.to_numeric(df['Sale Price']).astype(int)
Training_Data["Sale Price"] = Training_Data["Sale Price"].astype(str).str.strip().replace("",0).astype(float)
# List Data Types
Training_Data.dtypes

Which returns:

Paid          float64
Sale Price    float64
dtype: object

### List Column Data Types for df2
# Convert 'Paid' column to float64 by first changing NaN to 0
Test_Data['Paid'].fillna(0).astype(float)
# Convert 'Sale Price' column to float64 by first changing NaN to 0
#print(df.loc[pd.to_numeric(df['Sale Price'], errors='coerce').isnull()])
#pd.to_numeric(df['Sale Price']).astype(int)
Test_Data["Sale Price"] = Test_Data["Sale Price"].astype(str).str.strip().replace("",0).astype(float)
# List Data Types
Test_Data.dtypes

Which returns:

Paid          float64
Sale Price    float64
dtype: object

### Declare and Drop Dependent (Measured) Variable
SourceData_train_independent = Training_Data.drop(['Sale Price'], axis = 1) # Drop dependent variable from training dataset
SourceData_train_dependent = Training_Data['Sale Price'].copy() # New dataframe with only Dependent variable value for training dataset
SourceData_test_independent = Test_Data.drop(['Sale Price'], axis = 1)
SourceData_test_dependent = Test_Data['Sale Price'].copy()
SourceData_train_independent.dtypes

Which returns:

Paid    float64
dtype: object

### Scaling Independent Train and Test Variable
sc_X = StandardScaler()
X_train = sc_X.fit_transform(SourceData_train_independent.values) # scale the independent variables
y_train = SourceData_train_dependent # scaling is not required for dependent variable
X_test = sc_X.transform(SourceData_test_independent)
y_test = SourceData_test_dependent

Finally, when I run:

### Feeding Train Data
reg = LinearRegression().fit(X_train, y_train)
print("The Linear regression score on training data is ", round(reg.score(X_train, y_train),2))

I get the error. So I think my file still has NaN values, which I thought I had already corrected. Can anyone help? Thanks.

Try this to see which columns still contain NaN or infinite values:

import numpy as np

def check_nan_inf(df):
    # Report every column that still contains NaN or infinite values.
    for col in df.columns:
        if df[col].isnull().any():
            print(col, 'has nan')
        if np.isinf(df[col]).any():
            print(col, 'has inf')
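
As for why the NaN survives, two things in the question's code look suspicious to me. First, `Training_Data['Paid'].fillna(0).astype(float)` is never assigned back, and `fillna` returns a new Series rather than changing the column. Second, in the pandas versions I have used, `astype(str)` turns a missing value into the text "nan", which `astype(float)` parses straight back to NaN, so `replace("", 0)` never sees it. Below is a minimal sketch of one possible fix. The sample DataFrame and the helper name `to_clean_float` are made up for illustration, since the real file is not shown.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Training_Data (the real file is not shown).
df = pd.DataFrame({
    "Paid": [100.0, np.nan, 250.0],
    "Sale Price": [" 1200 ", np.nan, ""],
})

# fillna() returns a new Series; without assignment the column keeps its NaN.
df["Paid"].fillna(0).astype(float)
nan_left_in_paid = int(df["Paid"].isna().sum())
print("NaN left in Paid:", nan_left_in_paid)


def to_clean_float(series):
    """Coerce a column to float64, mapping blanks, NaN and +/-inf to 0."""
    stripped = series.astype(str).str.strip()
    # errors="coerce" turns anything unparseable into NaN instead of raising.
    numeric = pd.to_numeric(stripped, errors="coerce")
    numeric = numeric.replace([np.inf, -np.inf], np.nan)
    return numeric.fillna(0).astype(float)


# Assign the cleaned result back to the column.
for col in ["Paid", "Sale Price"]:
    df[col] = to_clean_float(df[col])

# Final check on the values that would be handed to the scaler and the model.
all_finite = bool(np.isfinite(df.to_numpy(dtype=float)).all())
print(df)
print("All values finite:", all_finite)
```

Filling with 0 only mirrors what the question was attempting. Depending on the data, dropping the affected rows with `dropna()` may be the better choice. Whichever you pick, apply the same cleaning to `Test_Data` before calling `sc_X.transform`.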
