How to make this function iterable (getting an IndexError)



I am fairly new to Python and coding. I have a large data file that provides daily data for a number of stock tickers (~300) over the period 2011-2018.

The data is a .csv file with about 150k rows that looks like this (short example):

Date,Symbol,ShortExemptVolume,ShortVolume,TotalVolume
20110103,AAWW,0.0,28369,78113.0
20110103,AMD,0.0,3183556,8095093.0
20110103,AMRS,0.0,14196,18811.0
20110103,ARAY,0.0,31685,77976.0
20110103,ARCC,0.0,177208,423768.0
20110103,ASCMA,0.0,3930,26527.0
20110103,ATI,0.0,193772,301287.0
20110103,ATSG,0.0,23659,72965.0
20110103,AVID,0.0,7211,18896.0
20110103,BMRN,0.0,21740,213974.0
20110103,CAMP,0.0,2000,11401.0
20110103,CIEN,0.0,625165,1309490.0
20110103,COWN,0.0,3195,24293.0
20110103,CSV,0.0,6133,25394.0

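I have a function that lets me filter for a specific ticker and get the 10 observations before and after a specified date (which could be any date between 2011 and 2018).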

import pandas as pd
from datetime import datetime
import urllib
import datetime

def get_data(issue_date, stock_ticker):
    df = pd.read_csv(r'D:ProjectDataShort_Interestexampledata.csv')
    df['Date'] = pd.to_datetime(df['Date'], format="%Y%m%d")
    d = df
    df = pd.DataFrame(d)
    short = df.loc[df.Symbol.eq(stock_ticker)]
    # get the index of the row of interest
    ix = short[short.Date.eq(issue_date)].index[0]
    # get the item row for that row's index
    iloc_ix = short.index.get_loc(ix)
    # get the +/-10 iloc rows (+11 because the slice end is exclusive), basically 10 trading days before and after
    short_data = short.iloc[iloc_ix-10: iloc_ix+11]
    return [short_data]

I would like to create a script that iterates over a list of "issue_dates" and "stock_tickers". The list (a .csv) looks like this:

ARAY,07/08/2017
ARAY,24/04/2014
ACETQ,16/11/2015
ACETQ,16/11/2015
NVLNA,15/08/2014
ATSG,29/09/2017
ATI,24/05/2016
MDRX,18/06/2013
MDRX,18/06/2013
AMAGX,10/05/2017
AMAGX,14/02/2014
AMD,14/09/2016

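To break my problem down into parts, I would like to know how to do the following:

First, how do I load the inputs?

Second, how do I call the function on each input?

Finally, how do I accumulate all the function returns into one DataFrame?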

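Load the inputs and call the function for each row: iterate over the csv file, pass each row's values to the function, and accumulate the returned rows in a list.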

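I modified your function a bit: I removed the DataFrame creation so that it is only done once, and added a try/except block to account for missing dates or tickers (your example data doesn't quite match up). The dates in the second csv look like they are day/month/year, so I parse them with that format.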
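As a small illustration of where the IndexError comes from (a sketch on a tiny made-up frame, not your real data): when the date filter matches no rows, the result is empty, and asking an empty index for element 0 raises the error.

```python
import pandas as pd

# Tiny made-up frame in the same shape as the question's data (values are illustrative).
df = pd.DataFrame({
    "Date": pd.to_datetime(["20110103", "20110104"], format="%Y%m%d"),
    "Symbol": ["AMD", "AMD"],
    "ShortVolume": [3183556, 2900000],
})

short = df.loc[df.Symbol.eq("AMD")]
# 2011-01-05 is not in the frame, so this filter matches nothing.
missing = short[short.Date.eq(pd.Timestamp("2011-01-05"))]
print(missing.empty)

error = None
try:
    missing.index[0]  # an empty index has no element 0
except IndexError as exc:
    error = exc
    print("IndexError:", exc)
```

The revised function below catches that exception and returns None instead of failing.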

import pandas as pd
import datetime, csv

def get_data(df, issue_date, stock_ticker):
    '''Return the rows for the ticker centered on the issue date,
    or None if there is no row for that ticker and date.
    '''
    short = df.loc[df.Symbol.eq(stock_ticker)]
    try:
        # get the index label of the row of interest
        ix = short[short.Date.eq(issue_date)].index[0]
        # get the integer position of that label
        iloc_ix = short.index.get_loc(ix)
        # take 10 rows before and 10 after (the slice end is exclusive, hence +11);
        # clamp the start at 0 so a negative start does not wrap around
        short_data = short.iloc[max(iloc_ix-10, 0): iloc_ix+11]
    except IndexError:
        msg = f'no data for {stock_ticker} on {issue_date}'
        #log.info(msg)
        print(msg)
        short_data = None
    return short_data

# path as given in the question
datafile = r'D:ProjectDataShort_Interestexampledata.csv'
df = pd.read_csv(datafile)
df['Date'] = pd.to_datetime(df['Date'], format="%Y%m%d")

results = []
with open('issues.csv', newline='') as issues:
    for ticker,date in csv.reader(issues):
        # dates are day/month/year; parse to a datetime so it compares
        # equal to the datetime64 values in df['Date']
        date = datetime.datetime.strptime(date, '%d/%m/%Y')
        s = get_data(df,date,ticker)
        results.append(s)
        #    print(s)
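
If you would rather stay in pandas for loading the inputs, the list can also be read with read_csv. This is only a sketch: the column names Symbol and IssueDate are made up for illustration, an in-memory string stands in for issues.csv, and dropping the repeated rows is an assumption about what you want.

```python
import io
import pandas as pd

# Stand-in for issues.csv: three rows copied from the question's list.
issues_csv = io.StringIO(
    "ARAY,07/08/2017\n"
    "ACETQ,16/11/2015\n"
    "ACETQ,16/11/2015\n"
)

# The file has no header row, so supply (hypothetical) column names.
issues = pd.read_csv(issues_csv, header=None, names=["Symbol", "IssueDate"])
# The dates are day/month/year, so state the format explicitly.
issues["IssueDate"] = pd.to_datetime(issues["IssueDate"], format="%d/%m/%Y")
# The list contains repeated rows; drop them so each pair is looked up once.
issues = issues.drop_duplicates().reset_index(drop=True)

for row in issues.itertuples(index=False):
    print(row.Symbol, row.IssueDate.date())
```

Inside that loop, row.IssueDate and row.Symbol could be passed to get_data in place of the values read with csv.reader.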

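Creating a single DataFrame or table for all of this information could be problematic, especially since the date ranges are all different. That should probably be a separate question. Its MCVE should probably include just a few minimal pandas Series with a few different date ranges and tickers.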
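
That said, if simply stacking the pieces on top of each other is good enough, one possible approach is pd.concat with a key per request. This is a sketch under assumptions: combine is a hypothetical helper, the level names request and row are made up, and the two small frames are invented stand-ins for what get_data returns.

```python
import pandas as pd

def combine(results):
    """Stack the non-None frames, keyed by their position in `results`."""
    frames = {i: r for i, r in enumerate(results) if r is not None}
    if not frames:
        return pd.DataFrame()
    # The dict keys become the outer index level, so each block stays identifiable.
    return pd.concat(frames, names=["request", "row"])

# Made-up stand-ins for get_data results; None marks a failed lookup.
a = pd.DataFrame({"Symbol": ["AMD", "AMD"], "ShortVolume": [1, 2]}, index=[5, 9])
b = pd.DataFrame({"Symbol": ["ATI"], "ShortVolume": [3]}, index=[7])

combined = combine([a, None, b])
print(combined)
```

The outer index level records which request each block of rows came from, so windows with different date ranges can still be told apart after stacking.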
