Problem with the LDA interface in Python



I want to classify data using runtime values with the LDA algorithm in Python. Below is the code I tried, but model.fit(X, y) raises this error:

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

Code:

from sklearn.model_selection import train_test_split
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.model_selection import cross_val_score
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis 
from sklearn import datasets
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
df = pd.read_excel (r'C:UsersAdil ArshadDesktopdataacquistion pyadc.xlsx')
#print (df)
last_column = df.iloc[: , -1:]
print("Last Column Of Dataframe : ")
print(last_column)
new_last = str(last_column)
print(type(new_last))
df.columns = ['TL', 'CL', 'TR', 'CR', 'new_last']
#view first six rows of DataFrame
df.head()
print(df)
X = df[['TL', 'CL', 'TR', 'CR', 'new_last']]
y = df['new_last']
#Fit the LDA model
model = LinearDiscriminantAnalysis()
model.fit(X, y)

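You need to clean your dataset and make sure it contains no NaN values. Always run an EDA pass first to check for missing values, incorrect entries, generic object values where strings are expected, and so on.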
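As a rough sketch of such a check, the snippet below counts NaN and infinite entries per column before any model is fitted. The DataFrame contents and the helper name summarize_bad_values are made up for illustration; only the column names TL, CL, TR and CR come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the spreadsheet data; the values are invented.
df = pd.DataFrame({
    "TL": [1.0, 2.0, np.nan, 4.0],
    "CL": [0.5, np.inf, 1.5, 2.0],
    "TR": [3.0, 3.5, 4.0, 4.5],
    "CR": [1.0, 1.0, 2.0, 2.0],
})

def summarize_bad_values(frame):
    """Return per-column dtype, NaN count and +/-inf count (illustrative helper)."""
    # np.isinf only works on numeric data, so restrict to numeric columns
    # and report 0 for every other column.
    numeric = frame.select_dtypes(include=[np.number])
    n_inf = np.isinf(numeric).sum().reindex(frame.columns, fill_value=0)
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "n_nan": frame.isna().sum(),
        "n_inf": n_inf,
    })

report = summarize_bad_values(df)
print(report)
```

Any column with a non-zero count here would trigger the ValueError from the question once it is passed to fit.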

Use this function to get a cleaned df:

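import pandas as pd
import numpy as np

def clean_dataset(df):
    assert isinstance(df, pd.DataFrame), "df needs to be a pd.DataFrame"
    # Drop rows that contain missing values (modifies df in place).
    df.dropna(inplace=True)
    # Renumber the rows; without drop/inplace this call would have no effect.
    df.reset_index(drop=True, inplace=True)
    # Keep only rows without NaN or +/-inf; axis must be passed by keyword in recent pandas.
    indices_to_keep = ~df.isin([np.nan, np.inf, -np.inf]).any(axis=1)
    return df[indices_to_keep].astype(np.float64)

clean_df = clean_dataset(df)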
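
For context, here is a minimal sketch of fitting the model once the non-finite rows are gone. It uses randomly generated data in place of the Excel file, and it assumes the last column (new_last) holds the class label and should not also be used as a feature; both are assumptions, not something the question confirms.

```python
import numpy as np
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical data standing in for the spreadsheet: 3 classes, 4 features.
rng = np.random.default_rng(0)
n = 60
labels = np.repeat([0, 1, 2], n // 3)
features = rng.normal(size=(n, 4)) + labels[:, None]
df = pd.DataFrame(features, columns=["TL", "CL", "TR", "CR"])
df["new_last"] = labels
df = df.astype(float)

# Inject the kind of bad values that cause the ValueError.
df.loc[3, "TL"] = np.nan
df.loc[7, "CR"] = np.inf

# Keep only rows in which every value is finite (no NaN, no +/-inf).
mask = np.isfinite(df).all(axis=1)
clean_df = df[mask]

# Assumption: features are the four measurement columns, target is new_last.
X = clean_df[["TL", "CL", "TR", "CR"]]
y = clean_df["new_last"].astype(int)

model = LinearDiscriminantAnalysis()
model.fit(X, y)
print(model.predict(X.iloc[:5]))
```

Note that the question's code passes new_last in both X and y; if that column really is the label, leaving it in X would leak the answer into the features.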
