Inconsistent numbers of samples when indexing a CSV for logistic regression



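I am currently indexing a CSV with the values shown below and running into this error:

ValueError: Found input variables with inconsistent numbers of samples: [1, 514]

It treats the data as 1 row with 514 columns, which suggests that either I am passing a specific parameter incorrectly, or the problem comes from my replacing the NaNs (which most of the data would default to?).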

"Classification","DGMLEN","IPLEN","TTL","IP"
"1","0.000000","192.168.1.5","185.60.216.35","TLSv1.2"
"2","0.000160","192.168.1.5","185.60.216.35","TCP"
"3","0.000161","192.168.1.5","185.60.216.35","TLSv1.2"

import pandas  
df = pandas.read_csv('wcdemo.csv', header=0,
                  names = ["Classification", "DGMLEN", "IPLEN", "TTL", "IP"], 
                  na_values='.')
df = df.apply(pandas.to_numeric, errors='coerce')
#Data=pd.read_csv ('wcdemo.csv').reset_index()#index_col='false')
feature_cols=['Classification','DGMLEN','IPLEN','IP']
X=df[feature_cols]

    #datanewframe = pandas.Series(['Classification', 'DGMLEN', 'IPLEN', 'TTL', 'IP'], dtype='object')
#df = pandas.read_csv('wcdemo.csv')
#indexed_df = df.set_index(['Classification', 'DGMLEN','IPLEN','TTL','IP']

df['IPLEN'] = pandas.to_numeric(df['IPLEN'], errors='coerce').fillna(0)
df['TTL'] = pandas.to_numeric(df['TTL'], errors='coerce').fillna(0)
#DEFINE X TRAIN
X_train = df['IPLEN']
y_train = df['TTL']
#s = pandas.Series(['Classification', 'DGMLEN', 'IPLEN', 'TTL', 'IP'])
Y=df['TTL'] 
from sklearn.linear_model import LogisticRegression
logreg=LogisticRegression()
logreg.fit(X_train,y_train,).fillna(0.0)
#with the error being triggered here 
logreg.fit(X_train,y_train,).fillna(0.0)

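Since X_train contains only one feature, its current shape is (n_samples,). Scikit-learn estimators, however, require X to have shape (n_samples, n_features). That mismatch is why the 514 values were read as 1 sample with 514 features, so you need to reshape the data.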

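Use this (X_train is a pandas Series, which has no reshape method in current pandas, so reshape its underlying NumPy array; the trailing .fillna(0.0) is dropped because fit returns the estimator, not a DataFrame):

logreg.fit(X_train.values.reshape(-1, 1), y_train)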
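
As a rough illustration of the shape issue described above, here is a minimal sketch. It does not use the asker's wcdemo.csv: the DataFrame values and the "label" column are made up for the example, and only the column name IPLEN is borrowed from the question. It shows two ways to obtain a 2-D feature matrix from a single column before calling fit.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data standing in for wcdemo.csv; the values are invented.
df = pd.DataFrame({
    "IPLEN": [40, 52, 60, 1500, 1400, 1480],
    "label": [0, 0, 0, 1, 1, 1],  # hypothetical binary target
})

# Selecting one column with single brackets gives a 1-D Series: shape (6,)
X_1d = df["IPLEN"]

# Option 1: double brackets keep a 2-D DataFrame: shape (6, 1)
X_2d = df[["IPLEN"]]

# Option 2: convert to a NumPy array and reshape to one column: shape (6, 1)
X_arr = df["IPLEN"].to_numpy().reshape(-1, 1)

y = df["label"]

# fit expects X with shape (n_samples, n_features) and y with shape (n_samples,)
logreg = LogisticRegression()
logreg.fit(X_arr, y)

# One prediction per sample
preds = logreg.predict(X_arr)
```

Either option should work here; the double-bracket form keeps the column name attached to the feature, while reshape(-1, 1) is the more general fix when the data is already a NumPy array.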
