I have two DataFrames with datetime columns. The first:
df["datetime"] = df[["date","time"]].apply(lambda row: ' '.join(row.values.astype(str)), axis=1)
df["datetime"] = pd.to_datetime(df["datetime"], format='%Y-%m-%d %H:%M:%S')
And the second:
df_labels.columns = ["start_date","start_time","end_date","end_time","mode"]
df_labels["start_datetime"] = df_labels[["start_date","start_time"]].apply(lambda row: ' '.join(row.values.astype(str)), axis=1)
df_labels["end_datetime"] = df_labels[["end_date","end_time"]].apply(lambda row: ' '.join(row.values.astype(str)), axis=1)
df_labels["start_datetime"] = df_labels["start_datetime"].str.replace("/","-")
df_labels["end_datetime"] = df_labels["end_datetime"].str.replace("/","-")
df_labels["start_datetime"] = pd.to_datetime(df_labels["start_datetime"], format='%Y-%m-%d %H:%M:%S')
df_labels["end_datetime"] = pd.to_datetime(df_labels["end_datetime"], format='%Y-%m-%d %H:%M:%S')
All of the code above runs successfully.
Sample of df:
         lat        long  u1  alt             d        date      time             datetime  mode
0  39.921712  116.472343   0   13  39298.146204  2007-08-04  03:30:32  2007-08-04 03:30:32
1  39.921705  116.472343   0   13  39298.146215  2007-08-04  03:30:33  2007-08-04 03:30:33
2  39.921695  116.472345   0   13  39298.146227  2007-08-04  03:30:34  2007-08-04 03:30:34
3  39.921683  116.472342   0   13  39298.146238  2007-08-04  03:30:35  2007-08-04 03:30:35
4  39.921672  116.472342   0   13  39298.146250  2007-08-04  03:30:36  2007-08-04 03:30:36
Sample of df_labels:
   start_date  start_time    end_date  end_time   mode       start_datetime         end_datetime
0  2007/06/26    11:32:29  2007/06/26  11:40:29    bus  2007-06-26 11:32:29  2007-06-26 11:40:29
1  2008/03/28    14:52:54  2008/03/28  15:59:59  train  2008-03-28 14:52:54  2008-03-28 15:59:59
2  2008/03/28    16:00:00  2008/03/28  22:02:00  train  2008-03-28 16:00:00  2008-03-28 22:02:00
3  2008/03/29    01:27:50  2008/03/29  15:59:59  train  2008-03-29 01:27:50  2008-03-29 15:59:59
4  2008/03/29    16:00:00  2008/03/30  15:59:59  train  2008-03-29 16:00:00  2008-03-30 15:59:59
However, when I run this:
for index, row in df_labels.iterrows():
    df.loc[(df["datetime"] >= row["start_datetime"]) & (df["datetime"] < row["end_datetime"])] = row["mode"]
I get the following error:
TypeError: Cannot compare type 'Timestamp' with type 'str'
Please advise.
Assuming the datetime values are in dd/mm/yy hh:mm:ss format:
df['datetime'] = pd.to_datetime(df['datetime'], format='%d/%m/%y %H:%M:%S')
df_labels["start_datetime"] = pd.to_datetime(df_labels["start_datetime"], format='%d/%m/%y %H:%M:%S')
df_labels["end_datetime"] = pd.to_datetime(df_labels["end_datetime"], format='%d/%m/%y %H:%M:%S')
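The sample rows in the question look year-first (for example 2007-08-04 03:30:32) rather than dd/mm/yy. If the raw strings really look like that, a format string along the following lines may be the better fit. This is only a sketch: the Series `raw` is a made-up stand-in for the real column.

```python
import pandas as pd

# Hypothetical stand-in values, copied from the sample rows in the question.
raw = pd.Series(["2007-08-04 03:30:32", "2007-08-04 03:30:33"])

# Year-first strings need %Y-%m-%d; a day-first format such as
# '%d/%m/%y %H:%M:%S' would not match these strings and raises a ValueError.
parsed = pd.to_datetime(raw, format="%Y-%m-%d %H:%M:%S")
print(parsed.dtype)
```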
Check the dtypes:
df.dtypes
df_labels.dtypes
When the conversion has worked, the datetime columns should show datetime64[ns].
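If the dtypes are correct before the loop but the error still appears, a likely cause is the assignment in the loop itself. `df.loc[mask] = value` has no column selector, so it writes the mode string into every column of the matching rows, including datetime. The next iteration then compares strings with a Timestamp. A minimal sketch of the fix, using small made-up frames in place of the real data, names the target column explicitly:

```python
import pandas as pd

# Hypothetical miniature versions of df and df_labels from the question.
df = pd.DataFrame({
    "lat": [39.921712, 39.921705, 39.921695],
    "datetime": pd.to_datetime([
        "2007-08-04 03:30:32", "2007-08-04 03:30:33", "2007-08-04 03:30:34",
    ]),
})
df_labels = pd.DataFrame({
    "start_datetime": pd.to_datetime(["2007-08-04 03:30:32", "2007-08-04 03:30:34"]),
    "end_datetime": pd.to_datetime(["2007-08-04 03:30:34", "2007-08-04 03:30:40"]),
    "mode": ["bus", "train"],
})

df["mode"] = None  # column that will receive the labels
for _, row in df_labels.iterrows():
    in_range = (df["datetime"] >= row["start_datetime"]) & (df["datetime"] < row["end_datetime"])
    # Select the "mode" column so only that column is written,
    # leaving "datetime" and the other columns untouched.
    df.loc[in_range, "mode"] = row["mode"]

print(df)
```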
Bonus (efficiency):
import pandas as pd
import sqlite3
conn = sqlite3.connect(':memory:')
# Write both tables to the in-memory database
df.to_sql('df', conn, index=False)
df_labels.to_sql('df_label', conn, index=False)
qry = '''
    select *
    from df
    inner join
        (select mode df_label_mode, start_datetime, end_datetime from df_label) df_label
    on (df.datetime between df_label.start_datetime and df_label.end_datetime)
'''
df_x = pd.read_sql_query(qry, conn)
df_x.head()
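A pandas-only alternative to the SQL join is `pd.merge_asof`, which pairs each row with the most recent interval start and then lets you discard rows that fall past that interval's end. The sketch below assumes the intervals in df_labels do not overlap, uses the question's half-open comparison (`<` on the end), and runs on small made-up frames rather than the real data.

```python
import pandas as pd

# Hypothetical stand-ins for df and df_labels.
df = pd.DataFrame({"datetime": pd.to_datetime([
    "2008-03-28 14:00:00",  # before any interval
    "2008-03-28 15:00:00",  # inside the first interval
    "2008-03-28 17:00:00",  # inside the second interval
    "2008-03-28 23:00:00",  # after the second interval ends
])})
df_labels = pd.DataFrame({
    "start_datetime": pd.to_datetime(["2008-03-28 14:52:54", "2008-03-28 16:00:00"]),
    "end_datetime": pd.to_datetime(["2008-03-28 15:59:59", "2008-03-28 22:02:00"]),
    "mode": ["bus", "train"],
})

# merge_asof needs both frames sorted on their join keys.
merged = pd.merge_asof(
    df.sort_values("datetime"),
    df_labels.sort_values("start_datetime"),
    left_on="datetime",
    right_on="start_datetime",
    direction="backward",  # latest interval that started at or before the row
)

# Keep the label only when the row also falls before the interval's end.
merged["mode"] = merged["mode"].where(merged["datetime"] < merged["end_datetime"])
print(merged[["datetime", "mode"]])
```

Unlike the row-by-row loop, this does one sorted merge, which tends to scale better when df has many rows.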
Reference: converting date columns