df == location points for multiple users
                      tslot   user  location point
0       2015-12-04 13:00:00      0            4356
1       2015-12-04 13:15:00      0            4356
2       2015-12-04 13:30:00      0            4356
3       2015-12-04 13:45:00      0            4356
4       2015-12-04 14:00:00      0            4356
5       2015-12-04 14:15:00      0            4356
6       2015-12-04 14:30:00      0            4356
7       2015-12-04 14:45:00      0            4356
8       2015-12-04 15:00:00      0            7645
...                     ...    ...             ...
616688  2015-12-10 18:30:00  38204             820
616689  2015-12-10 18:45:00  38204            1081
616690  2015-12-10 19:00:00  38204             672
616691  2015-12-10 19:15:00  38204             694
616692  2015-12-10 19:30:00  38204              46
616693  2015-12-10 19:45:00  38204             360
616694  2015-12-10 20:00:00  38204            1380
616695  2015-12-10 20:15:00  38204            1380
616696  2015-12-10 20:30:00  38204            1380
616697  2015-12-10 20:45:00  38204            1381
616698  2015-12-10 21:00:00  38204            1380
I separate each user's data with the following code:
users = ["0", "6356"]
df_ = {}
for i in users:
    df_[i] = newdataframe[newdataframe.user == int(i)]
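The same per-user separation can also be sketched with groupby, which avoids filtering the whole frame once per user. This is only a minimal sketch: df_demo and its values are made up for illustration and are not the asker's data, and df_by_user is a hypothetical name.

```python
import pandas as pd

# Hypothetical stand-in for the asker's frame (values are made up).
df_demo = pd.DataFrame({
    "user": [0, 0, 6356, 0, 6356],
    "location point": [4356, 4356, 820, 7645, 1081],
})

# One sub-frame per user, keyed by the user id as a string
# to mirror the users = ["0", "6356"] list above.
df_by_user = {str(user): group for user, group in df_demo.groupby("user")}

print(sorted(df_by_user))    # ['0', '6356']
print(len(df_by_user["0"]))  # 3
```

Each value in df_by_user keeps the original index labels of its rows, just like the boolean-mask version.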
I tried using
def split(dataframe, border, col):
    return dataframe.loc[:border,col], dataframe.loc[border:,col]
df_new = {}
for i in users:
    df_new[i] = {}
    df_new[i]["Train"], df_new[i]["Test"] = split(df_[i], "500", "location point")
My requirement is a training set of 500 rows per user, with the remaining rows as the test set. How can I split each user's data into training and test values?
I'm fairly sure your code works as expected:
import pandas as pd
df_rawdata = pd.DataFrame({'user':[1,2,1,2,1],'location point':[4,11,10,9,7]})
users = ["1", "2"]
df_ = {}
for i in users:
    df_[i] = df_rawdata[df_rawdata.user == int(i)]
def split(dataframe, border, col):
    # .loc slices by index label and includes both endpoints
    return dataframe.loc[:border,col], dataframe.loc[border:,col]
df_new = {}
for i in users:
    df_new[i] = {}
    # border is passed as an integer label; a string such as "2" does not match the integer index
    df_new[i]["Train"], df_new[i]["Test"] = split(df_[i], 2, "location point")
Then
print(df_new["1"]['Train'])
gives
0     4
2    10
Name: location point, dtype: int64
that is, the index and location point of the first 2 data points for user 1
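Note that .loc slices by index label and includes both endpoints, so the two rows above come back because their labels (0 and 2) are at most 2, not because they are the first two rows. For a split by row count, such as the 500-row training set asked for, a positional slice with .iloc is one option. The following is a minimal sketch; split_by_position and df_sample are hypothetical names introduced for illustration.

```python
import pandas as pd

def split_by_position(dataframe, n_train, col):
    """Return the first n_train rows of col and the remaining rows."""
    series = dataframe[col]
    return series.iloc[:n_train], series.iloc[n_train:]

# Hypothetical sample: user 1 owns the rows labelled 0, 2 and 4.
df_sample = pd.DataFrame({"user": [1, 2, 1, 2, 1],
                          "location point": [4, 11, 10, 9, 7]})
user_1 = df_sample[df_sample.user == 1]

train, test = split_by_position(user_1, 2, "location point")
print(train.tolist())  # [4, 10]
print(test.tolist())   # [7]
```

With the asker's data, n_train would be 500. A user with fewer than 500 rows would end up with all rows in the training part and an empty test part.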
and
print(df_new["1"]['Test'])
gives the last