我有一只熊猫,我的分组操作将索引变成糊状。我需要日期作为我的索引,在每个股票组中排序
举例来说。像这样设置熊猫:
import pandas as pd
from StringIO import StringIO
text = """Date Ticker Open High Low Adj_Close Volume
2015-04-09 vws.co 315.000000 316.100000 312.500000 311.520000 1686800
2015-04-10 vws.co 317.000000 319.700000 316.400000 312.700000 1396500
2015-04-13 vws.co 317.900000 321.500000 315.200000 315.850000 1564500
2015-04-14 vws.co 320.000000 322.400000 318.700000 314.870000 1370600
2015-04-15 vws.co 320.000000 321.500000 319.200000 316.150000 945000
2015-04-16 vws.co 319.000000 320.200000 310.400000 307.870000 2236100
2015-04-17 vws.co 309.900000 310.000000 302.500000 299.100000 2711900
2015-04-20 vws.co 303.000000 312.000000 303.000000 306.490000 1629700
2016-03-31 mmm 166.750000 167.500000 166.500000 166.630005 1762800
2016-04-01 mmm 165.630005 167.740005 164.789993 167.529999 1993700
2016-04-04 mmm 167.110001 167.490005 165.919998 166.399994 2022800
2016-04-05 mmm 165.179993 166.550003 164.649994 165.809998 1610300
2016-04-06 mmm 165.339996 167.080002 164.839996 166.809998 2092200
2016-04-07 mmm 165.880005 167.229996 165.250000 167.160004 2721900"""
df = pd.read_csv(StringIO(text), delim_whitespace=1, parse_dates=[0], index_col=0)
和代码
import pandas as pd
from pandas.io.data import DataReader
import numpy as np
import time
import os
stocklist = ['vws.co','nflx','mmm']
print ('df.tail (Input df)n',df.tail(6),'n')
def Screener(group):
def diff_calc(group):
df['Difference'] = df['Adj_Close'].diff()
return df['Difference']
df['Difference'] = diff_calc(group)
return df
if __name__ == '__main__':
df = GetStock(stocklist, start, end)
df['Adj_Close'] = df['Adj Close']
for ticker in stocklist:
### groupby screeener (filtering to only rel ticker group)
df = df.groupby('Ticker', as_index=False).Adj_Close.apply(Screener)
df.reset_index().sort(['Ticker', 'Date'], ascending=[1,1]).set_index('Ticker')
print ('(Output df)n',df,'n')
# Test the first 7 rows of each group for rolling_mean transgress groups...
df_test = df.groupby('Ticker').head(7).reset_index().set_index('Date')
print ('df_test (summary from df) (Output)n',df_test,'n')
显然我的索引现在搞砸了,我不知道这是怎么发生的。
(Output df)
Ticker Open High Low Adj Close Adj_Close Date
0 0 0 2016-05-20 vws.co 443.00 446.30 441.40 442.90 442.90
2016-05-23 vws.co 442.00 446.70 439.90 439.90 439.90
2016-05-24 vws.co 439.10 450.00 438.10 450.00 450.00
2016-05-25 vws.co 455.50 466.10 454.30 464.90 464.90
2016-05-26 vws.co 465.00 470.80 464.60 464.60 464.60
2016-05-27 vws.co 464.00 480.70 461.20 476.00 476.00
2016-05-30 vws.co 477.00 481.80 473.10 475.00 475.00
2016-05-31 vws.co 474.00 479.30 472.20 479.00 479.00
2016-06-01 vws.co 477.40 480.20 472.90 474.40 474.40
2016-05-20 nflx 90.08 93.28 89.98 92.49 92.49
2016-05-23 nflx 92.98 95.29 92.85 94.89 94.89
2016-05-24 nflx 95.98 99.14 95.75 97.89 97.89
2016-05-25 nflx 99.00 100.31 98.30 100.20 100.20
我需要日期作为我的索引,在每个股票组中排序
谁能帮忙?
我终于找到了解决方案。这句台词是我的秘诀
df = df.reset_index(level=0, drop=True)
这个问题帮助我把索引恢复到我想要的样子。如何获取熊猫数据框中的行,一列中有最大值并保留原始索引?
下面的代码将让我摆脱索引中添加的不需要的 pr. 迭代 col。谢谢大家!
import pandas as pd
from pandas.io.data import DataReader
import numpy as np
import time
import os
from io import StringIO
text = """Date Ticker Open High Low Adj_Close Volume
2015-04-09 vws.co 315.000000 316.100000 312.500000 311.520000 1686800
2015-04-10 vws.co 317.000000 319.700000 316.400000 312.700000 1396500
2015-04-13 vws.co 317.900000 321.500000 315.200000 315.850000 1564500
2015-04-14 vws.co 320.000000 322.400000 318.700000 314.870000 1370600
2015-04-15 vws.co 320.000000 321.500000 319.200000 316.150000 945000
2015-04-16 vws.co 319.000000 320.200000 310.400000 307.870000 2236100
2015-04-17 vws.co 309.900000 310.000000 302.500000 299.100000 2711900
2015-04-20 vws.co 303.000000 312.000000 303.000000 306.490000 1629700
2016-03-31 mmm 166.750000 167.500000 166.500000 166.630005 1762800
2016-04-01 mmm 165.630005 167.740005 164.789993 167.529999 1993700
2016-04-04 mmm 167.110001 167.490005 165.919998 166.399994 2022800
2016-04-05 mmm 165.179993 166.550003 164.649994 165.809998 1610300
2016-04-06 mmm 165.339996 167.080002 164.839996 166.809998 2092200
2016-04-07 mmm 165.880005 167.229996 165.250000 167.160004 2721900"""
df = pd.read_csv(StringIO(text), delim_whitespace=1, parse_dates=[0], index_col=0)
runstart = time.time() # Start script timer
stocklist = ['vws.co','nflx','mmm']#,'msft','tsla']
tickers = []
def Screener(group):
def diff_calc(group):
df['Difference'] = df['Adj_Close'].diff()
return df['Difference']
df['Difference'] = diff_calc(group)
return df
if __name__ == '__main__':
for ticker in stocklist:
### groupby screeener (filtering to only rel ticker group)
df = df.groupby('Ticker', as_index=False).Adj_Close.apply(Screener) #.reset_index()
df = df.reset_index(level=0, drop=True)
print ('(Output df)n',df,'n')
# Test the first 7 rows of each group for rolling_mean transgress groups...
df_test = df.groupby('Ticker').head(7).reset_index().set_index('Date')
print ('df_test (summary from df) (Output)n',df_test,'n')
groupby 函数会自动将您分组的列设置为索引。要避免这种情况,请传递以下选项:在您的分组中as_index = False
:
df_test = df.groupby('Ticker', as_index = False).head(7).reset_index().set_index('Date')
GroupBy
对象的diff
方法将日期保留为索引:
# sort if needed (use 'mergesort' algorithm to preserve date order)
df = df.sort().sort_values('Ticker', kind='mergesort')
df['Difference'] = df.groupby('Ticker')['Adj_Close'].diff()
注意:如果输入未排序,则需要先对其进行排序。上面的代码首先按索引(即日期)排序,然后使用 mergesort
按股票代码排序以保留日期的顺序。
输出
In [21]: df[['Ticker', 'Adj_Close', 'Difference']]
Out[21]:
Ticker Adj_Close Difference
Date
2015-04-09 vws.co 311.520000 NaN
2015-04-10 vws.co 312.700000 1.180000
2015-04-13 vws.co 315.850000 3.150000
2015-04-14 vws.co 314.870000 -0.980000
2015-04-15 vws.co 316.150000 1.280000
2015-04-16 vws.co 307.870000 -8.280000
2015-04-17 vws.co 299.100000 -8.770000
2015-04-20 vws.co 306.490000 7.390000
2016-03-31 mmm 166.630005 NaN
2016-04-01 mmm 167.529999 0.899994
2016-04-04 mmm 166.399994 -1.130005
2016-04-05 mmm 165.809998 -0.589996
2016-04-06 mmm 166.809998 1.000000
2016-04-07 mmm 167.160004 0.350006