How do I need to prepare a Linux path/dir for training with scikit-learn? This is what I have so far:
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

db = get_db()  # get_db() is defined elsewhere and returns the DB connection
df = pd.read_sql_query(
'SELECT Request,Used,Count FROM history where User = "john"', db)
print(df.head())
# Features and target
X = df[['Request','Count']]
y = df['Used']
X_train, X_test, y_train, y_test = train_test_split(X, y)
model = LinearRegression().fit(X_train, y_train)
print("LinearRegression Training set score: {:.2f}".format(model.score(X_train, y_train)))
print("LinearRegression Test set score: {:.2f}".format(model.score(X_test, y_test)))
#Output of print(df.head())
   Request  Used  Count
0     5400  3088     20
1     6400  3500     20
Now I want to change the SQL query and add the path to the result.
df = pd.read_sql_query(
'SELECT Request,Used,Count,Path FROM history where User = "john"', db)
df['Path'] = df['Path'].str.split("/")
print(df.head())
X = df[['Request','Count','Path']]
y = df['Used']
X_train, X_test, y_train, y_test = train_test_split(X, y)
...
#Output of print(df.head())
   Request  Used  Count                     Path
0     5400  3088     20  [, home, john, testdir]
1     6400  3500     20     [, home, john, blub]
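How can I convert df['Path'] into something scikit-learn can use? Maybe a representation matrix, like the one OneHotEncoder produces? Any help or tips would help me a lot.
Thanks in advance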
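One possible way to follow the OneHotEncoder idea is sketched below. It is only a sketch under assumptions: the rows are made-up stand-ins for the SQL result, Path is kept as a plain string (not split into a list), and each distinct path is treated as one category. The names preprocess, new and prediction are illustrative.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Illustrative rows only; in the question they come from the SQL query.
df = pd.DataFrame({
    "Request": [5400, 6400, 5000, 7000],
    "Used": [3088, 3500, 2900, 3900],
    "Count": [20, 20, 10, 30],
    "Path": ["/home/john/testdir", "/home/john/blub",
             "/home/john/testdir", "/home/john/blub"],
})

# One-hot encode the Path column; each distinct path string is one category.
preprocess = ColumnTransformer(
    [("path", OneHotEncoder(handle_unknown="ignore"), ["Path"])],
    remainder="passthrough",  # Request and Count pass through unchanged
)
model = make_pipeline(preprocess, LinearRegression())

X = df[["Request", "Count", "Path"]]
y = df["Used"]
model.fit(X, y)

# A path not seen during fitting is encoded as all zeros instead of raising.
new = pd.DataFrame({"Request": [6000], "Count": [20],
                    "Path": ["/home/john/other"]})
prediction = model.predict(new)
print(prediction)
```

Putting the encoder inside the pipeline means the same encoding is applied at fit and predict time, so the train and test columns cannot drift apart.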
I solved it this way.
dummy = df['Path'].str.split('/').apply(pd.Series).astype(str)
df = df.drop('Path', axis=1)
new_df = pd.concat([df, dummy], axis=1)
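Depending on how many different paths there are, the DataFrame can become very large. But this is enough to continue. I will look into df['Path'].str.extract() to break the paths down better.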
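The split columns above still hold strings, which LinearRegression cannot fit directly. A minimal pandas-only sketch of one way to turn them into numbers is shown here, assuming toy rows in place of the SQL result; the dir_N column names are made up for illustration.

```python
import pandas as pd

# Toy data mirroring the columns in the question (values are illustrative).
df = pd.DataFrame({
    "Request": [5400, 6400],
    "Used": [3088, 3500],
    "Count": [20, 20],
    "Path": ["/home/john/testdir", "/home/john/blub"],
})

# Strip the leading "/" so there is no empty first component, then split
# into one column per directory level.
parts = df["Path"].str.strip("/").str.split("/", expand=True)
parts.columns = [f"dir_{i}" for i in parts.columns]

# One-hot encode each level: every distinct directory name becomes a 0/1 column.
encoded = pd.get_dummies(parts, dtype=int)

X = pd.concat([df[["Request", "Count"]], encoded], axis=1)
y = df["Used"]
print(X.columns.tolist())
```

Paths of different depth leave missing values in the deeper levels; get_dummies ignores those by default, so such rows simply get zeros in the corresponding columns.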