df
f
0 l2y_q1_eps_gg
1 l2y_q2_eps_gg
2 l2y_q3_eps_gg
3 l2y_q4_eps_gg
4 l1y_q1_eps_gg
Goal:
f fr_date
0 l2y_q1_eps_gg 20190331
1 l2y_q2_eps_gg 20190630
2 l2y_q3_eps_gg 20190930
3 l2y_q4_eps_gg 20191231
4 l1y_q1_eps_gg 20200331
5 cy_q1_eps_gg 20210331
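The values in the fr_date column are the last day of each quarter of the given year, and fr_date is of type int. The rules are:
- l2y: 2019
- l1y: 2020
- cy: 2021
- q1-q4: the last day of each quarter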
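Note:
- The values in column f always start with l2y/l1y/cy followed by q1/q2/q3/q4.
- If the current year changes, the rules change with it. For example, if the current year is 2022, then l2y→2020, l1y→2021, cy→2022.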
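The rule in the note amounts to subtracting a fixed lag from a reference year. As a minimal sketch (year_mapping is a hypothetical helper name, not something from the question), the mapping can be rebuilt for any year:

```python
def year_mapping(current_year):
    """Map each year code in column f to a calendar year (hypothetical helper)."""
    # l2y = two years back, l1y = one year back, cy = the reference year itself
    return {"l2y": current_year - 2, "l1y": current_year - 1, "cy": current_year}


print(year_mapping(2021))  # {'l2y': 2019, 'l1y': 2020, 'cy': 2021}
print(year_mapping(2022))  # {'l2y': 2020, 'l1y': 2021, 'cy': 2022}
```

Taking the year as a parameter instead of reading the clock inside the function also makes the conversion reproducible when testing.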
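You need two things: a translation function, and a way to apply that function to a column of the pandas DataFrame to get the new column.
Translation function
There are several ways to do this, but here is one: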
from datetime import datetime

# Last days of quarters are always the same
last_quarter_days = {"q1": "0331", "q2": "0630", "q3": "0930", "q4": "1231"}

def translate_date(string):
    # Extract year and quarter from the full string
    year_str, quarter_str, _, _ = string.split("_")
    # Compute year automatically
    current_year = datetime.today().year
    if year_str == "cy":
        year = current_year
    else:
        # This is a dumb extractor, you could do a pattern search
        # and raise an exception if the string is not correct
        sub = int(year_str[1])
        year = current_year - sub
    # Translate the quarter string thanks to the translation table
    day = last_quarter_days[quarter_str]
    # Return the date as an integer (but maybe you want a string?)
    return int("{year}{day}".format(year=year, day=day))
Which gives (when run in 2021):
>>> translate_date("cy_q1_eps_gg")
20210331
How to apply it to the DataFrame
Use the pandas map method:
df["fr_date"] = df["f"].map(translate_date)
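The comment inside translate_date suggests a pattern search that rejects malformed strings. A minimal sketch of that idea follows; translate_date_strict and its current_year parameter are hypothetical names introduced here, and the pattern assumes the f values look like the ones in the question:

```python
import re
from datetime import date

# Last day of each quarter as "MMDD", keyed by quarter digit
LAST_QUARTER_DAYS = {"1": "0331", "2": "0630", "3": "0930", "4": "1231"}

# "cy" or "l<N>y", then "_q1".."_q4", then an optional "_..." suffix
PATTERN = re.compile(r"(cy|l(\d+)y)_q([1-4])(?:_.*)?")


def translate_date_strict(string, current_year=None):
    """Convert e.g. 'l2y_q1_eps_gg' to an int such as 20190331.

    current_year defaults to today's year; pass it explicitly for
    reproducible results. Raises ValueError on malformed input.
    """
    if current_year is None:
        current_year = date.today().year
    match = PATTERN.fullmatch(string)
    if match is None:
        raise ValueError(f"unexpected format: {string!r}")
    prefix, lag, quarter = match.groups()
    # "cy" leaves the lag group empty, so it maps to the reference year
    year = current_year if prefix == "cy" else current_year - int(lag)
    return int(f"{year}{LAST_QUARTER_DAYS[quarter]}")


print(translate_date_strict("cy_q1_eps_gg", current_year=2021))   # 20210331
print(translate_date_strict("l2y_q4_eps_gg", current_year=2021))  # 20191231
```

With pandas, df["f"].map(lambda s: translate_date_strict(s, current_year=2021)) pins the year; a malformed value then fails loudly instead of silently producing a wrong date.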
You can use the QuarterEnd offset to compute the date on which each quarter ends:
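import pandas as pd
# pd.datetime was deprecated in pandas 1.0 and later removed; use pd.Timestamp instead
current_year = pd.Timestamp.now().year
mapping = {"l2y": current_year - 2, "l1y": current_year - 1, "cy": current_year}
# Year code: everything before the first underscore
df["year"] = df.f.str.extract(r"([^_]+)", expand=False)
df["year"] = df["year"].map(mapping)
# Quarter number: the digit after "_q"
df["quarter"] = df.f.str.extract(r"_q(\d)", expand=False)
df["fr_date"] = df.apply(
    lambda x: (
        pd.Timestamp(year=x["year"], month=int(x["quarter"]) * 3, day=1)
        + pd.tseries.offsets.QuarterEnd()
    ).strftime("%Y%m%d"),
    axis=1,
)
df["fr_date"] = df["fr_date"].astype(int)  # the question asks for int
print(df[["f", "fr_date"]])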
Prints (in 2021):
f fr_date
0 l2y_q1_eps_gg 20190331
1 l2y_q2_eps_gg 20190630
2 l2y_q3_eps_gg 20190930
3 l2y_q4_eps_gg 20191231
4 l1y_q1_eps_gg 20200331
5 cy_q1_eps_gg 20210331
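Why day=1 is used: adding QuarterEnd() moves a date forward to the next quarter end, and a date that is already a quarter end moves a full quarter ahead. A small check of that behaviour, relying only on the default startingMonth=3:

```python
import pandas as pd

# Default QuarterEnd anchors on March, June, September and December
qe = pd.tseries.offsets.QuarterEnd()

# Day 1 of a quarter's last month rolls to that month's end
print(pd.Timestamp(2019, 3, 1) + qe)    # 2019-03-31 00:00:00
# Any date inside the quarter works too, e.g. its first day
print(pd.Timestamp(2019, 1, 1) + qe)    # 2019-03-31 00:00:00
# A date that is already a quarter end jumps to the *next* quarter end
print(pd.Timestamp(2019, 3, 31) + qe)   # 2019-06-30 00:00:00
# rollforward() leaves dates that are already on a quarter end unchanged
print(qe.rollforward(pd.Timestamp(2019, 3, 31)))  # 2019-03-31 00:00:00
```

So building the timestamp from the quarter's actual last day and then adding QuarterEnd() would be off by one quarter; starting from any earlier day in the quarter avoids that.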
df = pd.concat([df, df['f'].str.split('_', expand=True)], axis=1)
df
f 0 1 2 3
0 l2y_q1_eps_gg l2y q1 eps gg
1 l2y_q2_eps_gg l2y q2 eps gg
2 l2y_q3_eps_gg l2y q3 eps gg
3 l2y_q4_eps_gg l2y q4 eps gg
4 l1y_q1_eps_gg l1y q1 eps gg
# Hard-coded for 2021; the question wants this mapping to follow the current year
df['year']=df[0].map({'l2y':'2019','l1y':'2020','cy':'2021'})
df['quarter']=df[1].str.upper()
df['fr_date'] = df['year'] + '-' + df['quarter']
df = df.drop([0,1,2,3], axis=1)
print(df)
f year quarter fr_date
0 l2y_q1_eps_gg 2019 Q1 2019-Q1
1 l2y_q2_eps_gg 2019 Q2 2019-Q2
2 l2y_q3_eps_gg 2019 Q3 2019-Q3
3 l2y_q4_eps_gg 2019 Q4 2019-Q4
4 l1y_q1_eps_gg 2020 Q1 2020-Q1
df['fr_date'] = pd.to_datetime([f'{x[:4]}{x[-2:]}' for x in df['fr_date']])
df
f year quarter fr_date
0 l2y_q1_eps_gg 2019 Q1 2019-01-01
1 l2y_q2_eps_gg 2019 Q2 2019-04-01
2 l2y_q3_eps_gg 2019 Q3 2019-07-01
3 l2y_q4_eps_gg 2019 Q4 2019-10-01
4 l1y_q1_eps_gg 2020 Q1 2020-01-01
df['fr_date'] = pd.to_datetime(df['fr_date']) + pd.tseries.offsets.QuarterEnd()
df['fr_date'] = df['fr_date'].dt.strftime('%Y%m%d')
df = df.drop(['year', 'quarter'], axis=1)
print(df)
f fr_date
0 l2y_q1_eps_gg 20190331
1 l2y_q2_eps_gg 20190630
2 l2y_q3_eps_gg 20190930
3 l2y_q4_eps_gg 20191231
4 l1y_q1_eps_gg 20200331
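The intermediate "2019-Q1" strings can also be turned into quarter-end dates with pandas periods instead of to_datetime plus QuarterEnd. A minimal sketch, assuming the strings look exactly like the intermediate fr_date column above; the sample frame is rebuilt so the block runs on its own:

```python
import pandas as pd

# Same "YYYY-Qn" strings as the intermediate fr_date column
df = pd.DataFrame({"fr_date": ["2019-Q1", "2019-Q2", "2019-Q3", "2019-Q4", "2020-Q1"]})

# Drop the dash ("2019Q1") and parse each value as a calendar quarter
periods = pd.PeriodIndex(df["fr_date"].str.replace("-", "", regex=False), freq="Q")

# end_time is the last instant of each quarter, so %Y%m%d yields its last day
df["fr_date"] = periods.end_time.strftime("%Y%m%d")
df["fr_date"] = df["fr_date"].astype(int)  # the question asks for int
print(df)
```

A Period already knows where its quarter ends, so no offset arithmetic is needed.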
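Write a function change_string and apply it to column f. The function does the following:
- Creates a dictionary that maps each year code to a year
- Uses a regular expression to extract the year code from the string, then looks up the year in the dictionary
- Uses a regular expression to extract the quarter number from the string
- Uses pd.Timestamp with year, month=quarter*3 and day=1 to build the first day of the quarter's last month, then adds pd.tseries.offsets.QuarterEnd() to get the quarter end
- Finally uses strftime to return the date in the required string format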
import re
from datetime import date
import pandas as pd
def change_string(data):
    this_year = date.today().year
    changes = {"cy": this_year, "l1y": this_year - 1, "l2y": this_year - 2}
    # Year code at the start of the string: "cy" or "l<digit>y"
    year = changes[re.findall(r"^(cy|l\dy)", data)[0]]
    quarter = int(re.findall(r"_q(\d)", data)[0])
    data = (pd.Timestamp(year=year, month=quarter * 3, day=1) + pd.tseries.offsets.QuarterEnd()).strftime("%Y%m%d")
    return data
df = pd.DataFrame({"f":["l2y_q1_eps_gg","l2y_q2_eps_gg","l2y_q3_eps_gg","l2y_q4_eps_gg","l1y_q1_eps_gg"]})
df["fr_date"] = df.f.apply(change_string)
print(df)
f fr_date
0 l2y_q1_eps_gg 20190331
1 l2y_q2_eps_gg 20190630
2 l2y_q3_eps_gg 20190930
3 l2y_q4_eps_gg 20191231
4 l1y_q1_eps_gg 20200331
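The same computation can also be done column-wise, without calling a Python function once per row. A minimal sketch: add_fr_date is a hypothetical helper name, and current_year is passed in explicitly (an assumption made so the output is reproducible); pass pd.Timestamp.now().year to get the live behaviour.

```python
import pandas as pd


def add_fr_date(df, current_year):
    """Return a copy of df with an int fr_date column derived from column f."""
    mapping = {"l2y": current_year - 2, "l1y": current_year - 1, "cy": current_year}
    # Year code before the first underscore ("l2y", "l1y", "cy") -> calendar year
    year = df["f"].str.split("_").str[0].map(mapping)
    # Quarter digit after "_q"
    quarter = df["f"].str.extract(r"_q([1-4])", expand=False).astype(int)
    # First day of each quarter's last month, assembled from year/month/day columns
    start = pd.to_datetime(pd.DataFrame({"year": year, "month": quarter * 3, "day": 1}))
    # Roll forward to the quarter end, then format as an int like 20190331
    end = start + pd.tseries.offsets.QuarterEnd()
    return df.assign(fr_date=end.dt.strftime("%Y%m%d").astype(int))


df = pd.DataFrame({"f": ["l2y_q1_eps_gg", "l2y_q4_eps_gg", "l1y_q1_eps_gg", "cy_q1_eps_gg"]})
print(add_fr_date(df, current_year=2021))
```

Compared with apply, this keeps the work inside pandas' vectorized string and datetime operations, which mostly matters for large frames.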