Using the Python holidays library on Timestamp values under specific conditions



I have the following DataFrame:

|      ID      |     date                                |
|--------------|-----------------------------------------|
|      0       |         2022-01-01 12:00:01+05:00       |
|      1       |         2022-01-30 21:30:01+03:00       |
|      2       |         2022-02-15 13:04:02+02:00       |
|      3       |         2022-09-05 15:30:01+00:00       |
|      4       |         2022-04-21 13:18:02+02:00       |

The date column holds Python Timestamp values. I am using the Python holidays library, and I would like to use the following code:

from datetime import date
import holidays

usa_holidays = holidays.country_holidays('US')
texas_holidays = holidays.country_holidays('US', subdiv='TX') 
florida_holidays = holidays.country_holidays('US', subdiv='FL')
california_holidays = holidays.country_holidays('US', subdiv='CA')

# df is the dataframe above 
# It doesn't work.
df['only_date'] = df['date'].apply(lambda x: x.date())
df['federal_holiday'] = df['only_date'].isin(usa_holidays)
# Returns holiday name 'New Year's Day'
print(usa_holidays.get('2022-01-01'))

I would like to add the following columns:

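  1. federal_holiday: True if the day is a bank holiday according to the country-level dictionary, False otherwise
  2. holiday_state: True if the day is a holiday in at least one of the state-level dictionaries, False otherwise
  3. name_state: the names of the states in which the day is a public holiday, or all if it is a holiday in every state
  4. holiday_name: the name of the holiday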

The resulting DataFrame should look like this:

| ID | date                      | federal_holiday | holiday_state | name_state | holiday_name         |
|----|---------------------------|-----------------|---------------|------------|----------------------|
| 0  | 2022-01-01 12:00:01+05:00 | True            | True          | all        | New Year's Day       |
| 1  | 2022-01-30 21:30:01+03:00 | False           | False         | NaN        | NaN                  |
| 2  | 2022-02-15 13:04:02+02:00 | False           | True          | FL,CA      | Susan B. Anthony Day |
| 3  | 2022-09-05 15:30:01+00:00 | True            | True          | all        | Labor Day            |
| 4  | 2022-04-21 13:18:02+02:00 | False           | True          | TX         | San Jacinto Day      |
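The four column rules can be written down for a single date as in the sketch below. The function name `describe_day` and the hand-written calendars are hypothetical: plain dicts stand in for the dict-like objects returned by `holidays.country_holidays`, which can also be queried with `.get(date)`.

```python
from datetime import date


def describe_day(day, federal, states):
    """Return (federal_holiday, holiday_state, name_state, holiday_name).

    `federal` is a dict-like calendar mapping dates to holiday names;
    `states` maps a state code to such a calendar.
    """
    # State codes whose calendar lists this day, in insertion order.
    hits = [code for code, cal in states.items() if cal.get(day)]

    # Distinct holiday names across all calendars, first occurrence first.
    names = []
    for cal in [federal, *states.values()]:
        name = cal.get(day)
        if name and name not in names:
            names.append(name)

    if not hits:
        name_state = None
    elif len(hits) == len(states):
        name_state = "all"
    else:
        name_state = ",".join(hits)

    return (
        federal.get(day) is not None,
        bool(hits),
        name_state,
        ", ".join(names) if names else None,
    )


# Hand-written stand-in calendars, for illustration only.
new_year = date(2022, 1, 1)
san_jacinto = date(2022, 4, 21)
federal = {new_year: "New Year's Day"}
states = {
    "TX": {new_year: "New Year's Day", san_jacinto: "San Jacinto Day"},
    "FL": {new_year: "New Year's Day"},
    "CA": {new_year: "New Year's Day"},
}

print(describe_day(new_year, federal, states))
print(describe_day(san_jacinto, federal, states))
print(describe_day(date(2022, 1, 30), federal, states))
```

Keeping the rules in one function makes it easier to check each case by hand before applying it to a whole column.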

With the following DataFrame:

import holidays
import pandas as pd

pd.options.display.max_columns = 500
df = pd.DataFrame(
    {
        "ID": [0, 1, 2, 3, 4],
        "date": [
            "2022-01-01 12:00:01+05:00",
            "2022-01-30 21:30:01+03:00",
            "2022-02-15 13:04:02+02:00",
            "2022-09-05 15:30:01+00:00",
            "2022-04-21 13:18:02+02:00",
        ],
    }
)

You can try this:

cal = {
    "USA": holidays.country_holidays("US"),
    "TX": holidays.country_holidays("US", subdiv="TX"),
    "FL": holidays.country_holidays("US", subdiv="FL"),
    "CA": holidays.country_holidays("US", subdiv="CA"),
}
fmt = "%Y-%m-%d"

df = (
    df.assign(
        date=lambda df_: pd.to_datetime(
            df_["date"], format="%Y-%m-%d %H:%M:%S%z", utc=True
        )
    )  # parse the strings (including their UTC offsets) and convert to UTC
    .assign(
        federal_holiday=lambda df_: df_["date"].apply(
            lambda x: x.strftime(fmt) in cal["USA"]
        )
    )  # True if the date is in the nationwide calendar
    .assign(
        name_state=lambda df_: df_["date"].apply(
            lambda x: [
                state
                for state, calendar in cal.items()
                if state != "USA" and calendar.get(x.strftime(fmt))
            ]
        )
    )  # list the states whose calendar contains the date
    .assign(
        holiday_name=lambda df_: df_["date"].apply(
            lambda x: list(
                {
                    calendar.get(x.strftime(fmt))
                    for calendar in cal.values()
                    if calendar.get(x.strftime(fmt))
                }
            )
        )
    )  # collect the distinct holiday names across all calendars
    .assign(
        holiday_name=lambda df_: df_["holiday_name"].apply(
            lambda x: ", ".join(x) if len(x) > 0 else pd.NA
        )
    )  # convert the list of holiday names to a string
    .assign(
        name_state=lambda df_: df_["name_state"]
        .apply(lambda x: ", ".join(x) if len(x) > 0 else pd.NA)
        .str.replace("TX, FL, CA", "all", regex=False)
    )  # convert the list of states to a string; every state becomes 'all'
    .assign(holiday_state=lambda df_: df_["name_state"].notna())  # any state match
    .reindex(
        [
            "ID",
            "date",
            "federal_holiday",
            "holiday_state",
            "name_state",
            "holiday_name",
        ],
        axis=1,
    )  # reorder the columns
)

Which gives:

print(df)
# Output
ID                      date federal_holiday  holiday_state name_state  
0   0 2022-01-01 07:00:01+00:00            True           True        all   
1   1 2022-01-30 18:30:01+00:00           False          False       <NA>   
2   2 2022-02-15 11:04:02+00:00           False           True     FL, CA   
3   3 2022-09-05 15:30:01+00:00            True           True        all   
4   4 2022-04-21 11:18:02+00:00           False           True         TX   
holiday_name  
0        New Year's Day  
1                  <NA>  
2  Susan B. Anthony Day  
3             Labor Day  
4       San Jacinto Day 
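
Because the timestamps are parsed with `utc=True`, the date column above is shown in UTC, and the holiday lookup uses the UTC calendar date. For a timestamp close to midnight, that date can differ from the local date written in the original string. A minimal standard-library sketch of the difference, using a made-up timestamp that is not in the data above:

```python
from datetime import datetime, timezone

# Hypothetical timestamp: early morning on 1 January in a +05:00 zone.
raw = "2022-01-01 02:00:01+05:00"

stamp = datetime.fromisoformat(raw)
local_day = stamp.date()  # calendar date as written in the string
utc_day = stamp.astimezone(timezone.utc).date()  # calendar date after conversion to UTC

print(local_day, utc_day)
```

If the local calendar date is what matters, one option is to take the date from each timestamp before converting to UTC and look that up instead.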
