Splitting train and test sets when df contains location points for multiple users



df contains the location points of multiple users:

                     tslot   user  location point
0      2015-12-04 13:00:00      0            4356
1      2015-12-04 13:15:00      0            4356
2      2015-12-04 13:30:00      0            4356
3      2015-12-04 13:45:00      0            4356
4      2015-12-04 14:00:00      0            4356
5      2015-12-04 14:15:00      0            4356
6      2015-12-04 14:30:00      0            4356
7      2015-12-04 14:45:00      0            4356
8      2015-12-04 15:00:00      0            7645
...                    ...    ...             ...
616688 2015-12-10 18:30:00  38204             820
616689 2015-12-10 18:45:00  38204            1081
616690 2015-12-10 19:00:00  38204             672
616691 2015-12-10 19:15:00  38204             694
616692 2015-12-10 19:30:00  38204              46
616693 2015-12-10 19:45:00  38204             360
616694 2015-12-10 20:00:00  38204            1380
616695 2015-12-10 20:15:00  38204            1380
616696 2015-12-10 20:30:00  38204            1380
616697 2015-12-10 20:45:00  38204            1381
616698 2015-12-10 21:00:00  38204            1380

I separate each user's data with the following code:

users = ["0", "6356"]
df_ = {}
for i in users:
    # keep only the rows that belong to this user
    df_[i] = newdataframe[newdataframe.user == int(i)]

I tried using

def split(dataframe, border, col):
    return dataframe.loc[:border, col], dataframe.loc[border:, col]

df_new = {}
for i in users:
    df_new[i] = {}
    df_new[i]["Train"], df_new[i]["Test"] = split(df_[i], "500", "location point")

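My requirement is a training set of 500 rows per user, with the remaining rows as the test set. How can I split each user's data into training and test values?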

I'm fairly sure your code works as intended:

import pandas as pd

df_rawdata = pd.DataFrame({'user': [1, 2, 1, 2, 1], 'location point': [4, 11, 10, 9, 7]})
users = ["1", "2"]
df_ = {}
for i in users:
    # keep only the rows that belong to this user
    df_[i] = df_rawdata[df_rawdata.user == int(i)]

def split(dataframe, border, col):
    # .loc slices by index label, and both end points are included
    return dataframe.loc[:border, col], dataframe.loc[border:, col]

df_new = {}
for i in users:
    df_new[i] = {}
    # border is passed as an integer so that it matches the integer index labels
    df_new[i]["Train"], df_new[i]["Test"] = split(df_[i], 2, "location point")

Then

print(df_new["1"]['Train'])

0     4
2    10
Name: location point, dtype: int64

that is, the index and location point of the first 2 data points for user 1.

print(df_new["1"]['Test'])

gives the last
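One caveat worth checking: `.loc` slices by index label and includes both end points, so a border of 500 selects rows by label rather than by count. For a user whose index labels start far above 500, the "Train" slice could come back empty. If the goal is a fixed number of rows per user, a positional split with `.iloc` may be closer to what was asked. The following is only a minimal sketch under that assumption; `split_by_position` and `n_train` are hypothetical names that do not appear in the original code, and `groupby` is swapped in for the manual per-user filtering.

```python
import pandas as pd


def split_by_position(frame, n_train, col):
    """Return the first n_train rows of `col` and the remaining rows (hypothetical helper)."""
    values = frame[col]
    # .iloc slices by row position, so the index labels do not matter
    return values.iloc[:n_train], values.iloc[n_train:]


df_rawdata = pd.DataFrame({
    "user": [1, 2, 1, 2, 1],
    "location point": [4, 11, 10, 9, 7],
})

df_new = {}
# groupby yields one sub-frame per user, in the original row order
for user, frame in df_rawdata.groupby("user"):
    train, test = split_by_position(frame, 2, "location point")
    df_new[str(user)] = {"Train": train, "Test": test}

print(df_new["1"]["Train"].tolist())
print(df_new["1"]["Test"].tolist())
```

With the real data, `n_train` would be 500. A user with fewer than 500 rows would then get an empty "Test" series, which may need separate handling.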
