I have the following data:
fip_code npi start_date
0 1 gathering_size_10_0 3/28/2020
1 1 gathering_size_25_to_11 3/19/2020
2 1 non-essential_services_closure 3/28/2020
. . .
. . .
. . .
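I want to convert each value in the start_date column to a datetime object, say x, and then, given a datetime object y = 2020-03-12 00:00:00, replace each value in the start_date column with x - y.
Here is the code that generates the DataFrame: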
import pandas as pd
import numpy as np
from datetime import datetime
from dateutil import parser
url_npi = 'https://raw.githubusercontent.com/Keystone-Strategy/covid19-interventiondata/master/complete_npis_raw_policies.csv'
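# error_bad_lines was removed in pandas 2.0; on_bad_lines='skip' (pandas >= 1.3) also skips malformed rows
df = pd.read_csv(url_npi, on_bad_lines='skip')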
df = df[['fip_code','npi','start_date']]
OK, I figured it out:
# Parse the dates; values that cannot be parsed become NaT
df['start_date'] = pd.to_datetime(df['start_date'], errors="coerce")
base_str = "3/1/2020"; print("\n\n base date: ", base_str)
end_str = "4/29/2020"; print("\n\n end date: ", end_str)
base = pd.to_datetime(base_str)
end = pd.to_datetime(end_str)
# Vectorized subtraction gives Timedeltas; .dt.days extracts whole days (NaN where start_date is NaT)
df['days_in_effect'] = (end - df['start_date']).dt.days
df['days_from_base'] = (df['start_date'] - base).dt.days
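
For the exact case asked about in the question (y = 2020-03-12), here is a minimal sketch on a few rows copied from the sample above, so it runs without downloading the CSV. It assumes the dates are in M/D/YYYY form, and the names `sample` and `days_from_y` are made up for illustration only.

```python
import pandas as pd

# Toy rows copied from the sample in the question (no network access needed)
sample = pd.DataFrame({
    "fip_code": [1, 1, 1],
    "npi": [
        "gathering_size_10_0",
        "gathering_size_25_to_11",
        "non-essential_services_closure",
    ],
    "start_date": ["3/28/2020", "3/19/2020", "3/28/2020"],
})

# Reference date y from the question
y = pd.Timestamp("2020-03-12")

# Assumption: dates are M/D/YYYY; anything that does not match becomes NaT
x = pd.to_datetime(sample["start_date"], format="%m/%d/%Y", errors="coerce")

# Replace start_date with the Timedelta x - y
sample["start_date"] = x - y

# Hypothetical extra column holding the difference as whole days
sample["days_from_y"] = sample["start_date"].dt.days

print(sample)
```

After this, start_date holds Timedelta values rather than dates. If plain day counts are all that is needed, the `.dt.days` column is probably easier to work with downstream.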