How to convert a path for scikit-learn

How do I need to prepare a Linux path/dir so that I can train on it with scikit-learn? This is what I have so far:

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

db = get_db()  # my own helper that returns the DB connection
df = pd.read_sql_query(
    'SELECT Request,Used,Count FROM history where User = "john"', db)
print(df.head())
X = df[['Request', 'Count']]
y = df['Used']
X_train, X_test, y_train, y_test = train_test_split(X, y)
model = LinearRegression().fit(X_train, y_train)
print("LinearRegression Training set score: {:.2f}".format(model.score(X_train, y_train)))
print("LinearRegression Test set score: {:.2f}".format(model.score(X_test, y_test)))
# Output of print(df.head())
   Request  Used  Count
0     5400  3088     20
1     6400  3500     20

Now I want to change the SQL query and add the path to the result.

df = pd.read_sql_query(
    'SELECT Request,Used,Count,Path FROM history where User = "john"', db)
df['Path'] = df['Path'].str.split("/")
print(df.head())
X = df[['Request', 'Count', 'Path']]
y = df['Used']
X_train, X_test, y_train, y_test = train_test_split(X, y)
...
# Output of print(df.head())
   Request  Used  Count                     Path
0     5400  3088     20  [, home, john, testdir]
1     6400  3500     20     [, home, john, blub]

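How can I convert df['Path'] into something scikit-learn can use? Maybe a matrix representation, like the one OneHotEncoder produces? Any help or tips would help me a lot.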

Thanks in advance

I solved it this way:

dummy = df['Path'].str.split('/').apply(pd.Series).astype(str)
df = df.drop('Path', axis=1)
new_df = pd.concat([df, dummy], axis=1)

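Depending on how many different paths there are, the data frame can get very large. But this is enough to continue. I will look into df['Path'].str.extract() to break the paths down better.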
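One thing to keep in mind: the columns produced by the split still hold strings, and LinearRegression cannot fit on strings directly. Below is a minimal sketch of one way to go a step further, assuming that one-hot encoding each path level is acceptable for the model. The sample rows and the path_N column names are made up for illustration and do not come from the real history table.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical rows standing in for the SQL result; the values are invented.
df = pd.DataFrame({
    "Request": [5400, 6400, 7000, 5200],
    "Used": [3088, 3500, 4100, 2900],
    "Count": [20, 20, 25, 18],
    "Path": ["/home/john/testdir", "/home/john/blub",
             "/home/john/testdir", "/data/john/blub"],
})

# Strip the leading "/" so there is no empty first component, then split
# into one column per path level (expand=True returns a DataFrame).
parts = df["Path"].str.strip("/").str.split("/", expand=True)
parts.columns = [f"path_{i}" for i in range(parts.shape[1])]

# One-hot encode every level: one 0/1 column per distinct component.
encoded = pd.get_dummies(parts, dtype=int)

# Combine the numeric columns with the encoded path and fit as before.
X = pd.concat([df[["Request", "Count"]], encoded], axis=1)
y = df["Used"]
model = LinearRegression().fit(X, y)

print(X.columns.tolist())
```

If the paths have different depths, the shorter ones leave missing values in the deeper columns, and get_dummies by default just leaves those rows at zero for that level. The number of columns grows with the number of distinct components per level, so this has the same size concern as mentioned above.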
