我有以下数据框架:
df
Current_date ID Current_date+1 Current_Date+2
19/10/2021 11 0.9 0.8
19/10/2021 11 0.7 0.6
19/10/2021 12 1.0 1.5
19/10/2021 13 0.8 0.5
我想重新整理一下,得到下面的表格。为了明确起见,要获得Date
列值,如果它是Current_date+1
,并且在第一列上有1天(' ' Current_date) and if it is
Current_date+2 ' ' ' '),则在当前日期上添加两天。然后创建一个新列来赋值。
Current_date ID Date Value
19/10/2021 11 20/10/2021 0.9
19/10/2021 11 20/10/2021 0.7
19/10/2021 12 20/10/2021 1.0
19/10/2021 13 20/10/2021 0.8
19/10/2021 11 21/10/2021 0.8
19/10/2021 11 21/10/2021 0.6
19/10/2021 12 21/10/2021 1.5
19/10/2021 13 21/10/2021 0.5
在Pandas Python中有人能帮我吗?
通过Series.str.extract
和to_timedelta
列的最后整数提取add timedeltas来使用DataFrame.melt
:
df = df.melt(['Current_date','ID'], var_name='Date', value_name='Value')
td = pd.to_timedelta(df['Date'].str.extract('(d+)$', expand=False).astype(int), unit='D')
df['Date']=pd.to_datetime(df['Current_date'],dayfirst=True).add(td).dt.strftime('%d/%m/%Y')
print (df)
Current_date ID Date Value
0 19/10/2021 11 20/10/2021 0.9
1 19/10/2021 11 20/10/2021 0.7
2 19/10/2021 12 20/10/2021 1.0
3 19/10/2021 13 20/10/2021 0.8
4 19/10/2021 11 21/10/2021 0.8
5 19/10/2021 11 21/10/2021 0.6
6 19/10/2021 12 21/10/2021 1.5
7 19/10/2021 13 21/10/2021 0.5
一个选项是使用pivot_longer from pyjanitor:
# pip install pyjanitor
import pandas as pd
import janitor
df['Current_date'] = pd.to_datetime(df.Current_date)
(df.rename(columns = {"Current_date+1":"Value+1",
"Current_Date+2":"Value+2"})
.pivot_longer(index=['Current_date', 'ID'],
names_to=(".value", "Date"),
names_sep="+")
.assign(Date = lambda df: df.Current_date +
pd.to_timedelta(df.Date.astype(int), unit='D'))
)
Current_date ID Date Value
0 2021-10-19 11 2021-10-20 0.9
1 2021-10-19 11 2021-10-20 0.7
2 2021-10-19 12 2021-10-20 1.0
3 2021-10-19 13 2021-10-20 0.8
4 2021-10-19 11 2021-10-21 0.8
5 2021-10-19 11 2021-10-21 0.6
6 2021-10-19 12 2021-10-21 1.5
7 2021-10-19 13 2021-10-21 0.5
您也可以使用pd.wide_to_long:
实现此目的df = df.rename(columns = {"Current_date+1":"Value+1",
"Current_Date+2":"Value+2"})
(pd.wide_to_long(df.reset_index(),
stubnames = 'Value',
i = ['Current_date', 'ID', 'index'],
j= 'Date',
sep='+')
.droplevel('index')
.reset_index()
.assign(Date = lambda df: df.Current_date +
pd.to_timedelta(df.Date.astype(int), unit='D'))
)
Current_date ID Date Value
0 2021-10-19 11 2021-10-20 0.9
1 2021-10-19 11 2021-10-21 0.8
2 2021-10-19 11 2021-10-20 0.7
3 2021-10-19 11 2021-10-21 0.6
4 2021-10-19 12 2021-10-20 1.0
5 2021-10-19 12 2021-10-21 1.5
6 2021-10-19 13 2021-10-20 0.8
7 2021-10-19 13 2021-10-21 0.5