Writing a function and assigning values to a datetime DataFrame



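I'm trying to complete the problem below and have run into trouble creating this function and assigning string values to the different dates. How can I write the function so that it returns the various string values?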

Here is my suggestion:

Edit: the time validity check uses the regular expression from this question.

import numpy as np
import pandas as pd

# Accepts H:MM:SS or HH:MM:SS with hours 0-23
TIME_PATTERN = r"^([0-1]?[0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]$"


def compute_time_day_year(data_dict):
    """
    returns: pandas DataFrame with variables weekday (Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday),
    time_of_day (Morning - 06:00-11:59, Afternoon - 12:00-17:59, Evening - 18:00-23:59, Night - 00:00 - 05:59),
    and season (Summer for June, July and August; Autumn for September, October, November; Winter for
    December, January, February; Spring for March, April, May). If either of the input parameters is in incorrect
    form, the function returns INVALID in all outputs
    """
    df = pd.DataFrame({'year': data_dict['Year'],
                       'month': data_dict['Crash_Month'],
                       'day': data_dict['Crash_Day'],
                       'time': data_dict['Crash_Time']})
    # Rows whose time string does not match the pattern are treated as invalid
    valid_time = df['time'].astype(str).str.match(TIME_PATTERN)
    # Build 'YYYY-MM-DD HH:MM:SS' strings; zfill pads single-digit values
    stamp = (df['year'].astype(str)
             + '-' + df['month'].astype(str).str.zfill(2)
             + '-' + df['day'].astype(str).str.zfill(2)
             + ' ' + df['time'].astype(str).str.zfill(8))
    # errors='coerce' turns impossible dates (e.g. 30 February) into NaT
    date = pd.to_datetime(stamp.where(valid_time),
                          format='%Y-%m-%d %H:%M:%S', errors='coerce')
    valid = date.notna()

    # Start with every cell marked invalid, then fill in the valid rows
    out = pd.DataFrame("Invalid", index=df.index,
                       columns=["weekday", "season", "time_of_day"])
    out.loc[valid, "weekday"] = date[valid].dt.day_name()
    # Shifting forward one month puts Dec-Feb in quarter 1, Mar-May in quarter 2, ...
    out.loc[valid, "season"] = ((date[valid] + pd.DateOffset(months=1)).dt.quarter
                                .map({1: 'Winter', 2: 'Spring', 3: 'Summer', 4: 'Autumn'}))
    # The first matching condition wins, so the bins are 0-5, 6-11, 12-17, 18-23
    hour = date[valid].dt.hour
    out.loc[valid, "time_of_day"] = np.select(
        [hour < 6, hour < 12, hour < 18],
        ["Night", "Morning", "Afternoon"],
        default="Evening")
    return out


data_dict = {'Year': [2018, 2019, 2020],
             'Crash_Month': [1, 2, 3],
             'Crash_Day': [4, 5, 6],
             'Crash_Time': ["8:00:00", '26:22:00', '8:12:00']}
print(compute_time_day_year(data_dict))

For this example, it returns:

    weekday   season time_of_day
0  Thursday   Winter     Morning
1   Invalid  Invalid     Invalid
2    Friday   Spring     Morning
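
The boundary cases are where this kind of function most easily goes wrong (hour 0, hour 6, December versus January), so a small independent cross-check may be useful. The sketch below is not part of the function above: the helper names `season_of` and `part_of_day` are hypothetical, the sample timestamps are made up for illustration, and it uses plain integer arithmetic on the month and hour rather than a date offset.

```python
import pandas as pd

SEASONS = ["Winter", "Spring", "Summer", "Autumn"]
PARTS = ["Night", "Morning", "Afternoon", "Evening"]


def season_of(month):
    # Hypothetical helper: month % 12 maps December to 0,
    # so Dec/Jan/Feb -> 0, Mar-May -> 1, Jun-Aug -> 2, Sep-Nov -> 3
    return SEASONS[(month % 12) // 3]


def part_of_day(hour):
    # Hypothetical helper: 0-5 -> Night, 6-11 -> Morning,
    # 12-17 -> Afternoon, 18-23 -> Evening
    return PARTS[hour // 6]


# Illustrative timestamps chosen to hit the boundaries
stamps = pd.to_datetime(pd.Series(["2018-01-04 08:00:00",
                                   "2019-12-31 00:30:00",
                                   "2020-03-06 18:00:00"]))
summary = pd.DataFrame({
    "weekday": stamps.dt.day_name(),
    "season": stamps.dt.month.map(season_of),
    "time_of_day": stamps.dt.hour.map(part_of_day),
})
print(summary)
```

Because each season spans three months and each part of the day spans six hours, integer division is enough here; a lookup like this would need rethinking if the bins were of unequal size.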

Hope this helps.
