Pandas: create a new DataFrame with the 2-minute data aligned to the 1-minute data



Here is the sample CSV:

Time,o1,h1,l1,c1,o2,h2,l2,c2
2019.04.08 23:59:00,1.12582,1.1261,1.12582,1.12609,1.12588,1.12591,1.12587,1.12588
2019.04.09 00:00:00,1.12587,1.12598,1.12587,1.12598,1.12583,1.1261,1.12582,1.12609
2019.04.09 00:01:00,1.12599,1.12607,1.12587,1.12607,1.12598,1.12607,1.12587,1.12607
2019.04.09 00:02:00,1.12607,1.12632,1.12605,1.12632,1.12598,1.12607,1.12587,1.12607
2019.04.09 00:03:00,1.12603,1.12614,1.12603,1.12613,1.12607,1.12632,1.12603,1.12613
2019.04.09 00:04:00,1.12613,1.12613,1.12609,1.12611,1.12607,1.12632,1.12603,1.12613
2019.04.09 00:05:00,1.12612,1.12613,1.12612,1.12613,1.12613,1.12613,1.12609,1.12613
2019.04.09 00:06:00,1.12613,1.12613,1.12613,1.12613,1.12613,1.12613,1.12609,1.12613
2019.04.09 00:07:00,1.12613,1.12613,1.1261,1.12612,1.12613,1.12613,1.1261,1.12612
2019.04.09 00:08:00,1.12612,1.12612,1.12612,1.12612,1.12613,1.12613,1.1261,1.12612
2019.04.09 00:09:00,1.12612,1.12612,1.12609,1.12609,1.12612,1.12612,1.12609,1.12609
2019.04.09 00:10:00,1.12609,1.12609,1.12602,1.12607,1.12612,1.12612,1.12609,1.12609
2019.04.09 00:11:00,1.12607,1.12609,1.12607,1.12608,1.12609,1.12609,1.12602,1.12608
2019.04.09 00:12:00,1.12608,1.12608,1.12608,1.12608,1.12609,1.12609,1.12602,1.12608
2019.04.09 00:13:00,1.12608,1.12608,1.126,1.126,1.12608,1.12608,1.126,1.126
2019.04.09 00:14:00,1.126,1.126,1.1259,1.12592,1.12608,1.12608,1.126,1.126
2019.04.09 00:15:00,1.12592,1.12614,1.12592,1.12613,1.126,1.12614,1.1259,1.12613
2019.04.09 00:16:00,1.12614,1.12619,1.12612,1.12619,1.126,1.12614,1.1259,1.12613
2019.04.09 00:17:00,1.12619,1.1262,1.12615,1.12615,1.12614,1.1262,1.12612,1.12615
2019.04.09 00:18:00,1.12616,1.12616,1.12608,1.12608,1.12614,1.1262,1.12612,1.12615
2019.04.09 00:19:00,1.12608,1.12611,1.12608,1.12611,1.12616,1.12616,1.12608,1.12611
2019.04.09 00:20:00,1.12608,1.12637,1.12608,1.1262,1.12616,1.12616,1.12608,1.12611
2019.04.09 00:21:00,1.1262,1.12631,1.12614,1.12623,1.12608,1.12637,1.12608,1.12623
2019.04.09 00:22:00,1.12624,1.12646,1.12619,1.12626,1.12608,1.12637,1.12608,1.12623
2019.04.09 00:23:00,1.12626,1.12628,1.12585,1.1261,1.12624,1.12646,1.12585,1.1261
2019.04.09 00:24:00,1.12609,1.12621,1.12607,1.12616,1.12624,1.12646,1.12585,1.1261
2019.04.09 00:25:00,1.12615,1.12625,1.12609,1.1262,1.12609,1.12625,1.12607,1.1262
2019.04.09 00:26:00,1.1262,1.12629,1.12615,1.12618,1.12609,1.12625,1.12607,1.1262
2019.04.09 00:27:00,1.12618,1.12621,1.12614,1.12617,1.1262,1.12629,1.12614,1.12617
2019.04.09 00:28:00,1.12618,1.12618,1.12586,1.12587,1.1262,1.12629,1.12614,1.12617
2019.04.09 00:29:00,1.12586,1.12598,1.12586,1.12597,1.12618,1.12618,1.12586,1.12597
2019.04.09 00:30:00,1.12598,1.12608,1.12594,1.12604,1.12618,1.12618,1.12586,1.12597

I would like to know how to produce this in a pandas DataFrame:

Time,o1,h1,l1,c1,o2,h2,l2,c2,o11,h11,l11,c11
2019.04.08 23:59:00,1.12582,1.1261,1.12582,1.12609,1.12588,1.12591,1.12587,1.12588,1.12587,1.12598,1.12587,1.12598
2019.04.09 00:01:00,1.12599,1.12607,1.12587,1.12607,1.12598,1.12607,1.12587,1.12607,1.12607,1.12632,1.12605,1.12632
2019.04.09 00:03:00,1.12603,1.12614,1.12603,1.12613,1.12607,1.12632,1.12603,1.12613,1.12613,1.12613,1.12609,1.12611
2019.04.09 00:05:00,1.12612,1.12613,1.12612,1.12613,1.12613,1.12613,1.12609,1.12613,1.12613,1.12613,1.12613,1.12613
2019.04.09 00:07:00,1.12613,1.12613,1.1261,1.12612,1.12613,1.12613,1.1261,1.12612,1.12612,1.12612,1.12612,1.12612
2019.04.09 00:09:00,1.12612,1.12612,1.12609,1.12609,1.12612,1.12612,1.12609,1.12609,1.12609,1.12609,1.12602,1.12607

I want to append the 2-minute data to the 1-minute data as shown in the example above. Please tell me how.
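
For reference, a minimal sketch for reading the sample above into a DataFrame (assuming it is saved as data.csv, a placeholder path):

import pandas as pd

# 'data.csv' is a placeholder for wherever the sample CSV above is stored
df = pd.read_csv('data.csv')
print(df.head())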

Use Index.get_indexer to get the positions of the columns from the list, select the alternating rows and those columns with DataFrame.iloc, and finally join everything together with DataFrame.join:

# positions of the 1-minute OHLC columns
pos2 = df.columns.get_indexer(['o1','h1','l1','c1'])
# even rows keep all columns, odd rows keep only the 1-minute columns
df1 = df.iloc[0::2].reset_index(drop=True)
df2 = df.iloc[1::2, pos2].reset_index(drop=True)
# join side by side; rsuffix='1' produces o11, h11, l11, c11
df = df1.join(df2, rsuffix='1')

print (df)
                   Time       o1       h1       l1       c1       o2       h2  
0   2019.04.08 23:59:00  1.12582  1.12610  1.12582  1.12609  1.12588  1.12591   
1   2019.04.09 00:01:00  1.12599  1.12607  1.12587  1.12607  1.12598  1.12607   
2   2019.04.09 00:03:00  1.12603  1.12614  1.12603  1.12613  1.12607  1.12632   
3   2019.04.09 00:05:00  1.12612  1.12613  1.12612  1.12613  1.12613  1.12613   
4   2019.04.09 00:07:00  1.12613  1.12613  1.12610  1.12612  1.12613  1.12613   
5   2019.04.09 00:09:00  1.12612  1.12612  1.12609  1.12609  1.12612  1.12612   
6   2019.04.09 00:11:00  1.12607  1.12609  1.12607  1.12608  1.12609  1.12609   
7   2019.04.09 00:13:00  1.12608  1.12608  1.12600  1.12600  1.12608  1.12608   
8   2019.04.09 00:15:00  1.12592  1.12614  1.12592  1.12613  1.12600  1.12614   
9   2019.04.09 00:17:00  1.12619  1.12620  1.12615  1.12615  1.12614  1.12620   
10  2019.04.09 00:19:00  1.12608  1.12611  1.12608  1.12611  1.12616  1.12616   
11  2019.04.09 00:21:00  1.12620  1.12631  1.12614  1.12623  1.12608  1.12637   
12  2019.04.09 00:23:00  1.12626  1.12628  1.12585  1.12610  1.12624  1.12646   
13  2019.04.09 00:25:00  1.12615  1.12625  1.12609  1.12620  1.12609  1.12625   
14  2019.04.09 00:27:00  1.12618  1.12621  1.12614  1.12617  1.12620  1.12629   
15  2019.04.09 00:29:00  1.12586  1.12598  1.12586  1.12597  1.12618  1.12618   
         l2       c2      o11      h11      l11      c11  
0   1.12587  1.12588  1.12587  1.12598  1.12587  1.12598  
1   1.12587  1.12607  1.12607  1.12632  1.12605  1.12632  
2   1.12603  1.12613  1.12613  1.12613  1.12609  1.12611  
3   1.12609  1.12613  1.12613  1.12613  1.12613  1.12613  
4   1.12610  1.12612  1.12612  1.12612  1.12612  1.12612  
5   1.12609  1.12609  1.12609  1.12609  1.12602  1.12607  
6   1.12602  1.12608  1.12608  1.12608  1.12608  1.12608  
7   1.12600  1.12600  1.12600  1.12600  1.12590  1.12592  
8   1.12590  1.12613  1.12614  1.12619  1.12612  1.12619  
9   1.12612  1.12615  1.12616  1.12616  1.12608  1.12608  
10  1.12608  1.12611  1.12608  1.12637  1.12608  1.12620  
11  1.12608  1.12623  1.12624  1.12646  1.12619  1.12626  
12  1.12585  1.12610  1.12609  1.12621  1.12607  1.12616  
13  1.12607  1.12620  1.12620  1.12629  1.12615  1.12618  
14  1.12614  1.12617  1.12618  1.12618  1.12586  1.12587  
15  1.12586  1.12597  1.12598  1.12608  1.12594  1.12604  

EDIT: To add the missing rows at a one-minute frequency, use DataFrame.asfreq:

print (df)
                  Time       o1       h1       l1       c1       o2       h2  
0  2019.04.10 16:16:00  1.12308  1.12340  1.12304  1.12327  1.12350  1.12369   
1  2019.04.10 16:17:00  1.12327  1.12333  1.12302  1.12322  1.12308  1.12340   
2  2019.04.10 16:21:00  1.12314  1.12354  1.12313  1.12353  1.12314  1.12354   
3  2019.04.10 16:22:00  1.12353  1.12362  1.12345  1.12359  1.12314  1.12354   
4  2019.04.10 16:23:00  1.12359  1.12389  1.12357  1.12383  1.12353  1.12389   
5  2019.04.10 16:24:00  1.12383  1.12396  1.12375  1.12384  1.12353  1.12389   
6  2019.04.10 16:25:00  1.12383  1.12389  1.12373  1.12376  1.12383  1.12396   
7  2019.04.10 16:26:00  1.12376  1.12386  1.12355  1.12361  1.12383  1.12396   
        l2       c2  
0  1.12300  1.12308  
1  1.12302  1.12322  
2  1.12313  1.12353  
3  1.12313  1.12353  
4  1.12345  1.12383  
5  1.12345  1.12383  
6  1.12373  1.12376  
7  1.12373  1.12376  

# parse Time and reindex to a regular 1-minute frequency
df['Time'] = pd.to_datetime(df['Time'])
df = df.set_index('Time').asfreq('1min').reset_index()
# if forward-filling of the missing values is needed:
# df = df.set_index('Time').asfreq('1min', method='ffill').reset_index()
print (df)
                  Time       o1       h1       l1       c1       o2       h2  
0  2019-04-10 16:16:00  1.12308  1.12340  1.12304  1.12327  1.12350  1.12369   
1  2019-04-10 16:17:00  1.12327  1.12333  1.12302  1.12322  1.12308  1.12340   
2  2019-04-10 16:18:00      NaN      NaN      NaN      NaN      NaN      NaN   
3  2019-04-10 16:19:00      NaN      NaN      NaN      NaN      NaN      NaN   
4  2019-04-10 16:20:00      NaN      NaN      NaN      NaN      NaN      NaN   
5  2019-04-10 16:21:00  1.12314  1.12354  1.12313  1.12353  1.12314  1.12354   
6  2019-04-10 16:22:00  1.12353  1.12362  1.12345  1.12359  1.12314  1.12354   
7  2019-04-10 16:23:00  1.12359  1.12389  1.12357  1.12383  1.12353  1.12389   
8  2019-04-10 16:24:00  1.12383  1.12396  1.12375  1.12384  1.12353  1.12389   
9  2019-04-10 16:25:00  1.12383  1.12389  1.12373  1.12376  1.12383  1.12396   
10 2019-04-10 16:26:00  1.12376  1.12386  1.12355  1.12361  1.12383  1.12396   
         l2       c2  
0   1.12300  1.12308  
1   1.12302  1.12322  
2       NaN      NaN  
3       NaN      NaN  
4       NaN      NaN  
5   1.12313  1.12353  
6   1.12313  1.12353  
7   1.12345  1.12383  
8   1.12345  1.12383  
9   1.12373  1.12376  
10  1.12373  1.12376 

# apply the same split / join as above, now on the 1-minute-frequency frame
pos2 = df.columns.get_indexer(['o1','h1','l1','c1'])
df1 = df.iloc[0::2].reset_index(drop=True)
df2 = df.iloc[1::2, pos2].reset_index(drop=True)
df = df1.join(df2, rsuffix='1')
print (df)
                 Time       o1       h1       l1       c1       o2       h2  
0 2019-04-10 16:16:00  1.12308  1.12340  1.12304  1.12327  1.12350  1.12369   
1 2019-04-10 16:18:00      NaN      NaN      NaN      NaN      NaN      NaN   
2 2019-04-10 16:20:00      NaN      NaN      NaN      NaN      NaN      NaN   
3 2019-04-10 16:22:00  1.12353  1.12362  1.12345  1.12359  1.12314  1.12354   
4 2019-04-10 16:24:00  1.12383  1.12396  1.12375  1.12384  1.12353  1.12389   
5 2019-04-10 16:26:00  1.12376  1.12386  1.12355  1.12361  1.12383  1.12396   
        l2       c2      o11      h11      l11      c11  
0  1.12300  1.12308  1.12327  1.12333  1.12302  1.12322  
1      NaN      NaN      NaN      NaN      NaN      NaN  
2      NaN      NaN  1.12314  1.12354  1.12313  1.12353  
3  1.12313  1.12353  1.12359  1.12389  1.12357  1.12383  
4  1.12345  1.12383  1.12383  1.12389  1.12373  1.12376  
5  1.12373  1.12376      NaN      NaN      NaN      NaN  

Split the data every other row, then join:

# split data every other row
n = df[0::2]
n1 = df[1::2]
# reset_index and join
new_df = n.reset_index(drop=True).join(n1.reset_index(drop=True), rsuffix='_other')

Then drop the columns you don't need.
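
For example, one option (assuming the rsuffix='_other' join above) is to keep only the 1-minute OHLC columns from the second half and drop the duplicated Time and the 2-minute columns:

# column names follow from rsuffix='_other' in the join above
new_df = new_df.drop(columns=['Time_other', 'o2_other', 'h2_other', 'l2_other', 'c2_other'])
print(new_df.head())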
