I'm loading some stock data into a DataFrame through a finance library. The DataFrame looks like this:
In [43]:stockinfo['AAPL']
Out[43]:
Open High ... Adj Close Volume
Date ...
2011-01-10 NaN NaN ... NaN NaN
2011-01-11 12.317142 12.320000 ... 10.525573 444108000.0
2011-01-12 12.258928 12.301071 ... 10.611223 302590400.0
2011-01-13 12.327143 12.380000 ... 10.650039 296780400.0
2011-01-14 12.353214 12.445714 ... 10.736307 308840000.0
... ... ... ... ...
2021-01-04 133.520004 133.610001 ... 129.410004 143301900.0
2021-01-05 128.889999 131.740005 ... 131.009995 97664900.0
2021-01-06 127.720001 131.050003 ... 126.599998 155088000.0
2021-01-07 128.360001 131.630005 ... 130.919998 109578200.0
2021-01-08 132.429993 132.630005 ... 132.050003 104965400.0
[2585 rows x 6 columns]
The DataFrame's columns are structured like this:
In [44]: stockinfo.columns
Out[44]:
MultiIndex([('AAPL', 'Open'),
('AAPL', 'High'),
('AAPL', 'Low'),
('AAPL', 'Close'),
('AAPL', 'Adj Close'),
('AAPL', 'Volume'),
( 'GOOG', 'Open'),
( 'GOOG', 'High'),
( 'GOOG', 'Low'),
( 'GOOG', 'Close'),
...
( 'TSLA', 'Open'),
( 'TSLA', 'High'),
( 'TSLA', 'Low'),
( 'TSLA', 'Close'),
( 'TSLA', 'Adj Close'),
( 'TSLA', 'Volume')],
length=24822)
I want to add an extra field. For a single ticker this can be done with:
stockinfo['AAPL', 'NEWFIELD'] = stockinfo['AAPL', 'Close'].rolling(window=50, min_periods=1).mean()
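However, I need to do this for every ticker (AAPL, GOOG, TSLA, …). Looping over them is very slow, so I need a way to do the update in one line. For an ordinary DataFrame I could write update['NEWFIELD'] = value, but because this one has a multi-level column header I don't know the syntax, and I haven't found an answer to this on Stack Overflow. Any ideas?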
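One way to avoid the per-ticker loop, sketched under the assumption that every ticker has a 'Close' field on the second column level: take a cross-section of all Close columns at once with `xs`, run a single `rolling` call over that frame (rolling works column by column, so one call covers every ticker), then relabel the result as `(ticker, '50SMA')` and attach it with one `concat`. The toy `stockinfo` below is made-up data, not the asker's, and the final `sort_index` is only there to keep each ticker's columns together:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for stockinfo: columns are (ticker, field)
dates = pd.date_range("2021-01-04", periods=5, freq="B", name="Date")
columns = pd.MultiIndex.from_product([["AAPL", "TSLA"], ["Open", "Close"]])
stockinfo = pd.DataFrame(
    np.arange(20, dtype=float).reshape(5, 4), index=dates, columns=columns
)

# Every ticker's Close column at once -> a flat frame with columns AAPL, TSLA
close = stockinfo.xs("Close", axis=1, level=1)

# One rolling call computes the SMA for all tickers column by column
sma = close.rolling(window=50, min_periods=1).mean()

# Put the result back into the two-level layout: (ticker, '50SMA')
sma.columns = pd.MultiIndex.from_product([sma.columns, ["50SMA"]])

# Attach all new columns in a single concat, then group columns by ticker
stockinfo = pd.concat([stockinfo, sma], axis=1).sort_index(axis=1)
```

With `min_periods=1` the first 49 rows hold the mean of however many closes are available so far, which matches the single-ticker command in the question.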
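Here is the iterative solution:
for a in range(len(stockinfo.columns)):
    # stockinfo.columns[a][0] is the ticker; each ticker appears once per field
    stockinfo[stockinfo.columns[a][0], '50SMA'] = stockinfo[stockinfo.columns[a][0], 'Close'].rolling(window=50, min_periods=1).mean()
But this approach is slow.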
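The loop above runs once per column rather than once per ticker, so with six fields per ticker every 50-day SMA is computed six times (the first assignment inserts the column, the next five overwrite it), and each insert goes into an already very wide frame. A minimal sketch that keeps a loop but visits each ticker once and inserts all new columns with a single `concat`; the toy frame and the names `tickers` and `new_cols` are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with the same (ticker, field) column layout as stockinfo
dates = pd.date_range("2021-01-04", periods=4, freq="B", name="Date")
columns = pd.MultiIndex.from_product([["AAPL", "GOOG", "TSLA"], ["Close", "Volume"]])
stockinfo = pd.DataFrame(
    np.arange(24, dtype=float).reshape(4, 6), index=dates, columns=columns
)

# Unique first-level labels: one entry per ticker, not one per column
tickers = stockinfo.columns.get_level_values(0).unique()

# Compute every SMA first, keyed by ticker ...
new_cols = {
    ticker: stockinfo[ticker, "Close"].rolling(window=50, min_periods=1).mean()
    for ticker in tickers
}

# ... then insert them all at once; the dict keys become the column labels
sma = pd.concat(new_cols, axis=1)
sma.columns = pd.MultiIndex.from_product([sma.columns, ["50SMA"]])
stockinfo = pd.concat([stockinfo, sma], axis=1)
```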
John
Here is my output:
df
Out[97]:
TSLA ... AAPL
Open High ... Adj Close Volume
Date ...
2020-12-30 672.000000 696.599976 ... 133.720001 96452100
2020-12-31 699.989990 718.719971 ... 132.690002 98990400
2021-01-04 719.460022 744.489990 ... 129.410004 143301900
2021-01-05 723.659973 740.840027 ... 131.009995 97664900
2021-01-06 758.489990 774.000000 ... 126.599998 155088000
2021-01-07 777.630005 816.989990 ... 130.919998 109578200
2021-01-08 856.000000 884.489990 ... 132.050003 104965400
[7 rows x 12 columns]
df_new = df.unstack().to_frame().unstack(1).reset_index(col_level=1).droplevel(0, axis=1)
df_new
Out[99]:
level_0 Date Open ... Close Adj Close Volume
0 TSLA 2020-12-30 672.000000 ... 694.780029 694.780029 42846000.0
1 TSLA 2020-12-31 699.989990 ... 705.669983 705.669983 49570900.0
2 TSLA 2021-01-04 719.460022 ... 729.770020 729.770020 48638200.0
3 TSLA 2021-01-05 723.659973 ... 735.109985 735.109985 32245200.0
4 TSLA 2021-01-06 758.489990 ... 755.979980 755.979980 44700000.0
5 TSLA 2021-01-07 777.630005 ... 816.039978 816.039978 51498900.0
6 TSLA 2021-01-08 856.000000 ... 880.020020 880.020020 74661000.0
7 AAPL 2020-12-30 135.580002 ... 133.720001 133.720001 96452100.0
8 AAPL 2020-12-31 134.080002 ... 132.690002 132.690002 98990400.0
9 AAPL 2021-01-04 133.520004 ... 129.410004 129.410004 143301900.0
10 AAPL 2021-01-05 128.889999 ... 131.009995 131.009995 97664900.0
11 AAPL 2021-01-06 127.720001 ... 126.599998 126.599998 155088000.0
12 AAPL 2021-01-07 128.360001 ... 130.919998 130.919998 109578200.0
13 AAPL 2021-01-08 132.429993 ... 132.050003 132.050003 104965400.0
[14 rows x 8 columns]
But the second command fails, because df_new has no level_1 column (the date column is named Date):
KeyError: 'level_1'
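The `KeyError` comes from `reset_index`: it names each inserted column after its index level's name and only falls back to `level_<n>` for unnamed levels. Here the row index is named `Date`, so the date column comes out as `Date` (as in the `df_new` output above) and `level_1` never exists. A small sketch on hypothetical data showing both cases; either drop the index name first with `rename_axis(None)`, or use `'Date'` instead of `'level_1'` in the later commands:

```python
import pandas as pd

# Hypothetical toy frame: row index named "Date", unnamed column levels,
# mirroring the layout in the question
dates = pd.date_range("2021-01-04", periods=2, freq="B", name="Date")
columns = pd.MultiIndex.from_product([["TSLA", "AAPL"], ["Open", "Close"]])
df = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]], index=dates, columns=columns
)

# Same chain as in the answer: the date level keeps its name "Date"
reshaped = (
    df.unstack().to_frame().unstack(1)
    .reset_index(col_level=1).droplevel(0, axis=1)
)
print(list(reshaped.columns))

# Dropping the index name first makes reset_index fall back to "level_1"
renamed = (
    df.rename_axis(None).unstack().to_frame().unstack(1)
    .reset_index(col_level=1).droplevel(0, axis=1)
)
print(list(renamed.columns))
```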
Could you show what df_new looks like after the first command?
If this is your DataFrame:
TSLA AAPL
Open High Low Close Adj Close Volume Open High Low Close Adj Close Volume
2020-12-30 672.000000 696.599976 668.359985 694.780029 694.780029 42846000 135.580002 135.990005 133.399994 133.720001 133.720001 96452100
2020-12-31 699.989990 718.719971 691.119995 705.669983 705.669983 49570900 134.080002 134.740005 131.720001 132.690002 132.690002 98990400
2021-01-04 719.460022 744.489990 717.190002 729.770020 729.770020 48638200 133.520004 133.610001 126.760002 129.410004 129.410004 143301900
2021-01-05 723.659973 740.840027 719.200012 735.109985 735.109985 32245200 128.889999 131.740005 128.429993 131.009995 131.009995 97664900
2021-01-06 758.489990 774.000000 749.099976 755.979980 755.979980 44700000 127.720001 131.050003 126.379997 126.599998 126.599998 155088000
2021-01-07 777.630005 816.989990 775.200012 816.039978 816.039978 51498900 128.360001 131.630005 127.860001 130.919998 130.919998 109578200
2021-01-08 856.000000 884.489990 838.390015 880.020020 880.020020 74661000 132.429993 132.630005 130.229996 132.050003 132.050003 104965400
this should do the job:
>>> df_new = df.unstack().to_frame().unstack(1).reset_index(col_level=1).droplevel(0, axis=1)
>>> df_new['CloseMean'] = df_new.groupby('level_0')['Close'].rolling(window=50, min_periods=1).mean().reset_index(level=0, drop=True)
>>> df_new.set_index(['level_0','level_1']).unstack(0).swaplevel(1,0, axis=1).stack().unstack().rename_axis(None)
level_0 AAPL TSLA
Adj Close Close CloseMean High Low Open Volume Adj Close Close CloseMean High Low Open Volume
2020-12-30 133.720001 133.720001 133.720001 135.990005 133.399994 135.580002 96452100.0 694.780029 694.780029 694.780029 696.599976 668.359985 672.000000 42846000.0
2020-12-31 132.690002 132.690002 133.205002 134.740005 131.720001 134.080002 98990400.0 705.669983 705.669983 700.225006 718.719971 691.119995 699.989990 49570900.0
2021-01-04 129.410004 129.410004 131.940002 133.610001 126.760002 133.520004 143301900.0 729.770020 729.770020 710.073344 744.489990 717.190002 719.460022 48638200.0
2021-01-05 131.009995 131.009995 131.707500 131.740005 128.429993 128.889999 97664900.0 735.109985 735.109985 716.332504 740.840027 719.200012 723.659973 32245200.0
2021-01-06 126.599998 126.599998 130.686000 131.050003 126.379997 127.720001 155088000.0 755.979980 755.979980 724.262000 774.000000 749.099976 758.489990 44700000.0
2021-01-07 130.919998 130.919998 130.725000 131.630005 127.860001 128.360001 109578200.0 816.039978 816.039978 739.558329 816.989990 775.200012 777.630005 51498900.0
2021-01-08 132.050003 132.050003 130.914286 132.630005 130.229996 132.429993 104965400.0 880.020020 880.020020 759.624285 884.489990 838.390015 856.000000 74661000.0
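One caution about the `CloseMean` assignment: `groupby` sorts the groups, so the rolled series lists AAPL before TSLA even when `df_new` lists TSLA first (as in the asker's output). Assigning through `.reset_index()['Close']` matches rows by position, whereas `.reset_index(level=0, drop=True)` keeps the original row labels and lets pandas align by label. A minimal sketch on a hypothetical four-row frame:

```python
import pandas as pd

# Hypothetical long-format frame shaped like df_new, with TSLA rows first
df_new = pd.DataFrame({
    "level_0": ["TSLA", "TSLA", "AAPL", "AAPL"],
    "Close": [10.0, 20.0, 1.0, 3.0],
})

# Result index is (group key, original row label), groups in sorted order
rolled = df_new.groupby("level_0")["Close"].rolling(window=50, min_periods=1).mean()

# Positional: reset_index() renumbers rows 0..n-1 in group order (AAPL first),
# so AAPL's means land on the TSLA rows
df_new["PositionalMean"] = rolled.reset_index()["Close"]

# Label-based: drop only the group level so each mean aligns to its own row
df_new["CloseMean"] = rolled.reset_index(level=0, drop=True)
print(df_new)
```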