I have a DataFrame like the one below:
Col1 6/13/2020-6/15/2020 6/16/2020
A1 2.3 1.65
A2 1.4 1.4
A3 1.3 1.3
I want to explode a column whenever its name holds a date range, which is indicated by a - in the column name.
The expected result is:
Col1 6/13/2020 6/14/2020 6/15/2020 6/16/2020
A1 2.3 2.3 2.3 1.65
A2 1.4 1.4 1.4 1.4
A3 1.3 1.3 1.3 1.3
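I'm not sure how to explode it across columns.
We can still use explode, after transposing so that the column labels become values: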
s=df.set_index('Col1').T.reset_index()
s
Out[49]:
Col1 index A1 A2 A3
0 6/13/2020-6/15/2020 2.30 1.4 1.3
1 6/16/2020 1.65 1.4 1.3
s['index']=[pd.date_range(x.split('-')[0],x.split('-')[-1]) for x in s['index']]
s=s.explode('index').set_index('index').T.reset_index()
s
Out[52]:
index Col1 2020-06-13 00:00:00 ... 2020-06-15 00:00:00 2020-06-16 00:00:00
0 A1 2.3 ... 2.3 1.65
1 A2 1.4 ... 1.4 1.40
2 A3 1.3 ... 1.3 1.30
[3 rows x 5 columns]
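The result above carries Timestamp column labels, while the question asks for labels such as 6/14/2020. A minimal end-to-end sketch of the same explode idea that keeps the labels as text could look like this; the helper `expand_label` and the variable `out` are names made up for illustration, and the sample frame is retyped from the question.

```python
import pandas as pd

# Sample frame retyped from the question.
df = pd.DataFrame({
    "Col1": ["A1", "A2", "A3"],
    "6/13/2020-6/15/2020": [2.3, 1.4, 1.3],
    "6/16/2020": [1.65, 1.4, 1.3],
})


def expand_label(label):
    """Hypothetical helper: list every day covered by a 'start-end' or single-date label."""
    start, end = label.split("-")[0], label.split("-")[-1]
    # Build the m/d/Y text by hand so the labels carry no zero padding.
    return [f"{d.month}/{d.day}/{d.year}" for d in pd.date_range(start, end)]


# Transpose so the column labels become values in a regular column named 'index'.
s = df.set_index("Col1").T.reset_index()
s["index"] = [expand_label(x) for x in s["index"]]

# One row per day, then transpose back to the wide layout.
out = s.explode("index").set_index("index").T.reset_index()
out.columns.name = None  # drop the leftover 'index' axis name
print(out)
```

This assumes every label is either a single m/d/Y date or two such dates joined by one hyphen; labels that contain hyphens inside the dates themselves (for example 2020-06-13) would need a different split rule.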
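Convert the non-datetime column to the index, then create the new DataFrames in a list comprehension with numpy.broadcast_to, and finally join them together with concat: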
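import numpy as np
import pandas as pd

df1 = df.set_index('Col1')
# Build one block per original column: a range label is repeated across every
# day it covers; a single-date label (no end date after the split) stays one column.
dfs = [pd.DataFrame(data=np.broadcast_to(df1.iloc[:, [i]].to_numpy(),
                                         shape=(len(df1), len(pd.date_range(s, e)))),
                    index=df1.index,
                    columns=pd.date_range(s, e))
       if pd.notna(e)
       else pd.DataFrame(df1.iloc[:, [i]].to_numpy(),
                         index=df1.index,
                         columns=[pd.to_datetime(s)])
       for i, (s, e) in enumerate(df1.columns.str.split('-', expand=True))]
df = pd.concat(dfs, axis=1)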
print (df)
2020-06-13 2020-06-14 2020-06-15 2020-06-16
Col1
A1 2.3 2.3 2.3 1.65
A2 1.4 1.4 1.4 1.40
A3 1.3 1.3 1.3 1.30
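To make the broadcasting step easier to follow, here is a small sketch of what happens for a single range column. The array `col` and the names `days`, `wide` and `block` are illustrative only; the values are retyped from the first column of the question.

```python
import numpy as np
import pandas as pd

# One source column as a 2-D array of shape (3, 1), like df1.iloc[:, [i]].to_numpy().
col = np.array([[2.3], [1.4], [1.3]])

# The days covered by the label '6/13/2020-6/15/2020'.
days = pd.date_range("6/13/2020", "6/15/2020")

# Repeat the single column once per day without copying the data.
wide = np.broadcast_to(col, (len(col), len(days)))

block = pd.DataFrame(wide, index=["A1", "A2", "A3"], columns=days)
print(block)
```

np.broadcast_to returns a read-only view, which is sufficient here because the values are only read when the blocks are concatenated.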
If the ranges may overlap:
print (df)
Col1 6/13/2020-6/16/2020 6/16/2020
0 A1 2.3 1.65 <- 6/16/2020 is overlap
1 A2 1.4 1.40
2 A3 1.3 1.30
df1 = df.set_index('Col1')
dfs = [pd.DataFrame(data=np.broadcast_to(df1.iloc[:, [i]].to_numpy(),
                                         shape=(len(df1), len(pd.date_range(s, e)))),
                    index=df1.index,
                    columns=pd.date_range(s, e))
       if pd.notna(e)
       else pd.DataFrame(df1.iloc[:, [i]].to_numpy(),
                         index=df1.index,
                         columns=[pd.to_datetime(s)])
       for i, (s, e) in enumerate(df1.columns.str.split('-', expand=True))]
# Sum the values that land on the same date. The original sum(level=0, axis=1)
# was removed in pandas 2.0, so group on the transposed frame instead.
df = pd.concat(dfs, axis=1).T.groupby(level=0).sum().T
print (df)
2020-06-13 2020-06-14 2020-06-15 2020-06-16
Col1
A1 2.3 2.3 2.3 3.95
A2 1.4 1.4 1.4 2.80
A3 1.3 1.3 1.3 2.60
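Summing is only one way to resolve an overlap. If a single value per date is wanted instead, one possible sketch is to drop the duplicated column labels and keep either the first or the last occurrence. The frame `combined` below stands in for the concatenated result before aggregation and is built by hand for illustration; `first`, `last` and `summed` are likewise illustrative names.

```python
import pandas as pd

# Stand-in for pd.concat(dfs, axis=1): 2020-06-16 appears twice.
days = pd.to_datetime(["2020-06-15", "2020-06-16", "2020-06-16"])
combined = pd.DataFrame(
    [[2.3, 2.3, 1.65], [1.4, 1.4, 1.4], [1.3, 1.3, 1.3]],
    index=pd.Index(["A1", "A2", "A3"], name="Col1"),
    columns=days,
)

# Keep the value from the first column that covers each date.
first = combined.loc[:, ~combined.columns.duplicated(keep="first")]

# Keep the value from the last column that covers each date.
last = combined.loc[:, ~combined.columns.duplicated(keep="last")]

# For comparison: add the overlapping values together.
summed = combined.T.groupby(level=0).sum().T

print(first)
print(last)
print(summed)
```

Which rule is right depends on what the overlapping columns mean in the data, which the question does not state.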