我正在分析雅虎财经的股票数据,我目前有我的DataFrame = Df经过筛选后仅显示自1991年以来三月份的天数。我想知道每年三月的月回报率是多少。或任何其他月份的组合,即自1991年以来,一月至三月的回报率是多少。我还想要按年份分列。
我还想知道当天的情况,例如,自1991年以来,苹果的库存在所有星期五都发生了多少变化。这是另一个示例问题
我想把这个弄到我可以打印一个实际的纸质副本,按年分解它;就像一个报告。
我试着在pandas.pydata.org/上阅读多索引教程和组,但它非常令人困惑,我不确定是否这样是我需要的。
这是我当前的代码
from pandas_datareader import data as dreader
import pandas as pd
from datetime import datetime
import dateutil.parser
from tkinter import *
# Sets the max rows that can be displayed
# when the program is executed
pd.options.display.max_rows = 120
# df is the name of the dataframe, it is
# reading the csv file containing data loaded
# from yahoo finance(Date,Open,High,Low,Close
# volume,adj close,)the name of the ticker
# is placed before _data.csv i.e. the ticker aapl
# would have a csv file named aapl_data.csv.
df = pd.read_csv("cde_data.csv")
# resets the index back to the pandas default
# i.e. index starts at 0 for the first row and
# 1 for the second and continues by one till the
# end of the data in the above csv file.
df.reset_index()
# the following code will allow for filtering of the datafram
# based on the year, day of week (dow), and month. It then gets
# applied to the dataframe and then can be used to sort data i.e
# print(df[(df.year == 2015) & (df.month == 5) & (df.dow == 4)])
# which will give you all the days in the month of May(df.month == 5),
# that fall on a Thursday(df.dow == 4), in the year 2015
# (df.year == 2015)
#
# Month Dow Year
# January = 1 Monday = 1 The year will be dispaly in a four
# February = 2 Tuesday = 2 digit format i.e. 2015
# March = 3 Wednesday = 3
# April = 4 Thursday = 4
# May = 5 Friday = 5
# June = 6
# July = 7
# August = 8
# September = 9
# October = 10
# November = 11
# December = 12
def year(x):
return(x.year)
def dow(x):
return(x.isoweekday())
def month(x):
return(x.month)
df.Date = df.Date.apply(dateutil.parser.parse)
df['year'] = df.Date.apply(year)
df['dow'] = df.Date.apply(dow)
df['month'] = df.Date.apply(month)
# The code below has a total of five sections all labeled by number.
# They are #1, #2, #3, #4, #5. Number one adds new columns to the df
# and populates them with data, number two filters out all the days
# that the market went down or flat for the day, number three filters
# out all of the days that the market went up or flat, number four
# filters all of the days that the market went up or down, and
# number five drops the excess columns and concats steps #2, #3, & #4.
# 1
# there are five columns that are being added, up_down, up, down,
# flat, and %chg. up, down, and flat are temporary and will be
# deleted later on the other two up_down, and %chg will be permeant.
# The up_down column is derived from taking the 'close' column minus the
# 'open'column, this tells you how much the stock has moved for the day.
# The 'up' column is temporary and has a value of 'up' for all the rows
# of the DataFrame df. The 'down' column is temporary and has a value of
# 'down' for all the rows of the DataFrame df. The 'down' column is
# temporary and has a value of 'flat' for all the rows of the DataFrame
# df. The '%chg' column is calculated by taking the results of the
# 'up_down' divided by the 'close' column, and then times 100, which
# turns it into a percentage show what percent the stock moved up or
# down for the day. All of the columns added below are added to the
# DataFrame called df, which contains a a csv file(see code lines 14-20
# for information on the csv file contained in the DataFrame df).
df['up'] = 'up'
df['down'] = 'down'
df['flat'] = 'flat'
df['up_down'] = df['Close'] - df['Open']
df['%chg'] = ((df['up_down']/df['Close'])*100)
# 2
# df column[up_down] is first filtered on the greater than zero
# criteria from the year 1984 on up and then is turned into df2.
# If the up_down column is greater than zero than this means that
# the stock went up. Next df3 is set = to df2['up'], df3 now holds
# just the days where the asset went up
df2= (df[(df.year > 1984) & (df.up_down > 0)])
df3 = df2['up']
# 3
# df column[up_down] is first filtered on the less than zero
# criteria from the year 1984 on up and then is turned into df4.
# If the up_down column is less than zero than this means that
# the stock went Down. Next df5 is set = to df4['down'], df5 now holds
# just the days where the asset went down
df4= (df[(df.year > 1984) & (df.up_down < 0)])
df5 = df4['down']
# 4
# df column[up_down] is first filtered on the equal to zero
# criteria from the year 1984 on up and then is turned into df6.
# If the up_down column is equal to zero than this means that
# the stock did not move. Next df7 is set = to df6['flat'],df5
# now holds just the days where the asset did not move at all
df6= (df[(df.year > 1984) & (df.up_down == 0)])
df7 = df6['flat']
# 5
# The code below starts by droping the columns 'up', 'down', and 'flat'.
# These were temporary and were used to help filter data in the above
# code in sections two, three, and four. Finally we concat the
# DataFrames df, df3, df5, and df7. We now have new 'up', 'down' and
# 'flat' columns that only display up, down, or flat when the criteria
# is true.
df = df.drop(['up'], axis = 1)
df = df.drop(['down'], axis = 1)
df = df.drop(['flat'], axis = 1)
df = pd.concat([df,df3,df5,df7],axis =1, join_axes=[df.index])
# The difference between the close of current day and the previous day
# non percentage
df['Up_Down'] = df.Close.diff()
# The percentage of change on the Up_Down column
df['%Chg'] = ((df['up_down']/df['Close'])*100)
# How much the current opening price has moved up from the previous
# opening price in terms of percentage,
df['Open%pd'] = df.Open.pct_change()*100
# How much the current high price has moved up from the previous high
# price in terms of percentage.
df['High%pd'] = df.High.pct_change()*100
# How much the current low price has moved up from the previous low
# price in terms of percentage
df['Low%pd'] = df.Low.pct_change()*100
# How much the current close price has moved up from the previous close
# price in terms of percentage
df['Close%pd'] = df.Close.pct_change()*100
# How much the current volume price has moved up from the previous days
# volume in terms of percetage
df['Volume%pd'] = df.Volume.pct_change()*100
# Both columns take the percentage of change from open to high and open
# to low
df['High%fo'] = ((df.High - df.Open)/(df.Open))*100
df['Low%fo'] = ((df.Open - df.Low) / (df.Open))*100
# Takes the difference from the high price and the low price non
# percentage
df['HighLowRange'] = df.High - df.Low
# Measures how much the range the high minus low has changed verses the
# previous day
df['HighLowRange%pd'] = df.HighLowRange.pct_change()*100
# df now is equal to only the months of March and only has the date and
# Close%pd column
df=df[['Date','Close%pd']][(df.month == 3)]
print(df)
这是df打印的内容(自1991年以来3月的所有日子)
Date Close%pd
223 1991-03-01 2.097902
224 1991-03-04 1.369863
225 1991-03-05 -2.702703
226 1991-03-06 1.388889
227 1991-03-07 0.000000
228 1991-03-08 6.164384
229 1991-03-11 -4.516129
230 1991-03-12 0.675676
231 1991-03-13 2.684564
232 1991-03-14 -2.614379
233 1991-03-15 -1.342282
234 1991-03-18 -7.482993
235 1991-03-19 0.000000
236 1991-03-20 0.735294
237 1991-03-21 0.000000
238 1991-03-22 0.000000
239 1991-03-25 -0.729927
240 1991-03-26 0.000000
241 1991-03-27 0.735294
242 1991-03-28 0.000000
476 1992-03-02 0.000000
477 1992-03-03 0.000000
478 1992-03-04 0.000000
479 1992-03-05 0.000000
480 1992-03-06 0.000000
481 1992-03-09 -3.174603
482 1992-03-10 2.459016
483 1992-03-11 0.000000
484 1992-03-12 -1.600000
485 1992-03-13 0.813008
486 1992-03-16 -1.612903
487 1992-03-17 -1.639344
488 1992-03-18 -2.500000
489 1992-03-19 1.709402
490 1992-03-20 -1.680672
491 1992-03-23 -1.709402
492 1992-03-24 -1.739130
493 1992-03-25 2.654867
494 1992-03-26 -0.862069
495 1992-03-27 4.347826
496 1992-03-30 -1.666667
497 1992-03-31 -1.694915
728 1993-03-01 2.000000
729 1993-03-02 0.980392
730 1993-03-03 0.000000
731 1993-03-04 1.941748
732 1993-03-05 1.904762
733 1993-03-08 1.869159
734 1993-03-09 0.000000
735 1993-03-10 0.000000
736 1993-03-11 -1.834862
737 1993-03-12 3.738318
738 1993-03-15 4.504505
739 1993-03-16 -1.724138
740 1993-03-17 0.000000
741 1993-03-18 2.631579
742 1993-03-19 0.000000
743 1993-03-22 5.128205
744 1993-03-23 0.000000
745 1993-03-24 2.439024
... ... ...
6023 2014-03-10 -3.372835
6024 2014-03-11 -0.943396
6025 2014-03-12 2.761905
6026 2014-03-13 -1.019462
6027 2014-03-14 1.029963
6028 2014-03-17 -1.853568
6029 2014-03-18 4.815864
6030 2014-03-19 -2.792793
6031 2014-03-20 1.297498
6032 2014-03-21 -0.548948
6033 2014-03-24 -7.083717
6034 2014-03-25 0.198020
6035 2014-03-26 -8.003953
6036 2014-03-27 0.214823
6037 2014-03-28 3.215434
6038 2014-03-31 -3.530633
6269 2015-03-02 2.226027
6270 2015-03-03 0.670017
6271 2015-03-04 -4.991681
6272 2015-03-05 -2.101576
6273 2015-03-06 -10.017889
6274 2015-03-09 -5.367793
6275 2015-03-10 -6.722689
6276 2015-03-11 4.504505
6277 2015-03-12 1.293103
6278 2015-03-13 1.063830
6279 2015-03-16 6.315789
6280 2015-03-17 -5.544554
6281 2015-03-18 8.805031
6282 2015-03-19 -3.853565
6283 2015-03-20 7.014028
6284 2015-03-23 0.561798
6285 2015-03-24 1.862197
6286 2015-03-25 -3.473492
6287 2015-03-26 -1.325758
6288 2015-03-27 -3.262956
6289 2015-03-30 -3.968254
6290 2015-03-31 -2.685950
6521 2016-03-01 -1.295337
6522 2016-03-02 4.986877
6523 2016-03-03 12.250000
6524 2016-03-04 0.668151
6525 2016-03-07 11.061947
6526 2016-03-08 -7.370518
6527 2016-03-09 1.720430
6528 2016-03-10 3.382664
6529 2016-03-11 2.862986
6530 2016-03-14 -2.783300
6531 2016-03-15 -1.226994
6532 2016-03-16 9.730849
6533 2016-03-17 3.207547
6534 2016-03-18 2.193784
6535 2016-03-21 3.041145
6536 2016-03-22 0.694444
6537 2016-03-23 -9.482759
6538 2016-03-24 0.571429
6539 2016-03-28 3.030303
6540 2016-03-29 4.963235
6541 2016-03-30 -1.050788
6542 2016-03-31 -0.530973
[568 rows x 2 columns]
Press any key to continue . . .
在评论中,您可以使用groupby:
查找每月总数#change the previous last line of code to this
df=df[['Date','year','month','Close%pd']][(df.month == 3)]
#make a new dataframe
new_df = df.groupby(['year','month']).sum()
另一种方法是使用resample
命令(docs)。这可能是获得每周总数的最佳方法,特别是因为您没有指示"一年中的几周"的变量,而这是您将传递给groupby的变量。
df = df.resample('W', how='sum') #weekly totals
df = df.resample('M', how='sum') #monthly totals