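I have a DataFrame that contains prices for certain days of the year. I want to create a larger DataFrame that covers every day from the start of the year up to a specific date. It should use the prices from the original DataFrame on the days that have one, and fill each day without a price with the most recent price before it.
For example: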
import pandas as pd

df = pd.DataFrame({
    'timestamps': pd.to_datetime(
        ['2021-01-04', '2021-01-07', '2021-01-14', '2021-01-21', '2021-01-28', '2021-01-29',
         '2021-02-04', '2021-02-12', '2021-02-18', '2021-02-25']),
    'LastPrice': ['113.4377', '115.0741', '115.5709', '116.5197', '116.681', '116.4198', '117.5749', '117.2175',
                  '117.0541', '117.5977']})
I want my new date range to look like this:
index=pd.date_range('2021-01-01', '2021-02-28')
dfObj = pd.DataFrame(columns=['new_Date','new_LastPrice'])
dfObj['new_Date'] = index
So ideally I should end up with a DataFrame like the one below (top part only):
new_Date new_LastPrice
0 2021-01-01 0
1 2021-01-02 0
2 2021-01-03 0
3 2021-01-04 113.4377
4 2021-01-05 113.4377
5 2021-01-06 113.4377
6 2021-01-07 115.0741
7 2021-01-08 115.0741
8 2021-01-09 115.0741
9 2021-01-10 115.0741
10 2021-01-11 115.0741
11 2021-01-12 115.0741
12 2021-01-13 115.0741
Can anyone help me with this?

Use DataFrame.reindex with method='ffill':
index=pd.date_range('2021-01-01', '2021-02-28')
dfObj = (df.set_index('timestamps')
.reindex(index, method='ffill')
.fillna(0)
.add_prefix('new_')
.rename_axis('new_Date')
.reset_index())
print(dfObj.head(13))
new_Date new_LastPrice
0 2021-01-01 0
1 2021-01-02 0
2 2021-01-03 0
3 2021-01-04 113.4377
4 2021-01-05 113.4377
5 2021-01-06 113.4377
6 2021-01-07 115.0741
7 2021-01-08 115.0741
8 2021-01-09 115.0741
9 2021-01-10 115.0741
10 2021-01-11 115.0741
11 2021-01-12 115.0741
12 2021-01-13 115.0741
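One detail worth noting with the reindex approach: the sample prices are strings, so fillna(0) puts an integer 0 into an object column next to string values. A minimal sketch, assuming you want numeric prices and a configurable end date, converts LastPrice with pd.to_numeric first and wraps the same set_index/reindex/fillna chain in a helper. The name fill_daily_prices and its parameters are hypothetical, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({
    'timestamps': pd.to_datetime(
        ['2021-01-04', '2021-01-07', '2021-01-14', '2021-01-21', '2021-01-28', '2021-01-29',
         '2021-02-04', '2021-02-12', '2021-02-18', '2021-02-25']),
    'LastPrice': ['113.4377', '115.0741', '115.5709', '116.5197', '116.681', '116.4198',
                  '117.5749', '117.2175', '117.0541', '117.5977']})


def fill_daily_prices(prices, start, end):
    """Hypothetical helper: one row per day from start to end, forward-filled."""
    daily = pd.date_range(start, end)
    return (prices.assign(LastPrice=pd.to_numeric(prices['LastPrice']))  # str -> float64
                  .set_index('timestamps')
                  .sort_index()                    # reindex(method='ffill') needs a monotonic index
                  .reindex(daily, method='ffill')  # carry the last known price forward
                  .fillna(0.0)                     # days before the first price get 0.0
                  .add_prefix('new_')
                  .rename_axis('new_Date')
                  .reset_index())


dfObj = fill_daily_prices(df, '2021-01-01', '2021-02-28')
print(dfObj.head(13))
```

Changing the end argument lets you stop at any date; days after the last observation keep the last price.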
This will work for your case: merge the DataFrames, forward-fill the missing values, then use fillna to set the initial records to 0.
import pandas as pd

df = pd.DataFrame({
    'timestamps': pd.to_datetime(
        ['2021-01-04', '2021-01-07', '2021-01-14', '2021-01-21', '2021-01-28', '2021-01-29',
         '2021-02-04', '2021-02-12', '2021-02-18', '2021-02-25']),
    'LastPrice': ['113.4377', '115.0741', '115.5709', '116.5197', '116.681', '116.4198', '117.5749', '117.2175',
                  '117.0541', '117.5977']})
index=pd.date_range('2021-01-01', '2021-02-28')
dfObj = pd.DataFrame(columns=['new_Date','new_LastPrice'])
dfObj['new_Date'] = index
dfObj = dfObj.merge(df,how='left', left_on='new_Date', right_on='timestamps')
dfObj = dfObj[['new_Date', 'LastPrice']]
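dfObj = dfObj.ffill()     # fillna(method='ffill') is deprecated in recent pandas
dfObj = dfObj.fillna(0)   # days before the first price have nothing to carry forward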
Output:
new_Date LastPrice
0 2021-01-01 0
1 2021-01-02 0
2 2021-01-03 0
3 2021-01-04 113.4377
4 2021-01-05 113.4377
5 2021-01-06 113.4377
6 2021-01-07 115.0741
7 2021-01-08 115.0741
8 2021-01-09 115.0741
9 2021-01-10 115.0741
10 2021-01-11 115.0741
...
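A related option, swapped in here rather than taken from the answer above, is pd.merge_asof. Instead of a left merge followed by a forward fill, it matches each calendar day directly to the last price on or before that day (direction='backward'). This is a sketch assuming both frames are sorted on their datetime keys, which merge_asof requires; the names calendar, prices and dfAsof are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'timestamps': pd.to_datetime(
        ['2021-01-04', '2021-01-07', '2021-01-14', '2021-01-21', '2021-01-28', '2021-01-29',
         '2021-02-04', '2021-02-12', '2021-02-18', '2021-02-25']),
    'LastPrice': ['113.4377', '115.0741', '115.5709', '116.5197', '116.681', '116.4198',
                  '117.5749', '117.2175', '117.0541', '117.5977']})

# One row per calendar day; merge_asof needs both sides sorted on the key.
calendar = pd.DataFrame({'new_Date': pd.date_range('2021-01-01', '2021-02-28')})
prices = df.sort_values('timestamps')

# direction='backward' picks the last price at or before each calendar day.
dfAsof = pd.merge_asof(calendar, prices,
                       left_on='new_Date', right_on='timestamps',
                       direction='backward')

dfAsof = (dfAsof[['new_Date', 'LastPrice']]
          .rename(columns={'LastPrice': 'new_LastPrice'})
          .fillna(0))  # days before the first price get 0
print(dfAsof.head(13))
```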
You can use the complete function from pyjanitor to abstract away the process of exposing missing values/rows:
#pip install pyjanitor
import janitor
import pandas as pd
index=pd.date_range('2021-01-01', '2021-02-28')
# assign the new values as a dictionary,
# with the column name as the key
new_dates = {"timestamps": index} # accepts a callable too
(df.complete([new_dates])
.ffill()
.fillna(0)
.set_axis(['new_Date', 'new_LastPrice'],
axis = 'columns')
.head(10) # shows the first 10 rows, you can get rid of this line
)
new_Date new_LastPrice
0 2021-01-01 0
1 2021-01-02 0
2 2021-01-03 0
3 2021-01-04 113.4377
4 2021-01-05 113.4377
5 2021-01-06 113.4377
6 2021-01-07 115.0741
7 2021-01-08 115.0741
8 2021-01-09 115.0741
9 2021-01-10 115.0741