Error when adding a new column to a pandas DataFrame using a rolling mean function



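I have a script that downloads some foreign exchange rates from the web, and I want to compute a rolling mean. When I run the script, I get an error related to the rates column that I am trying to compute the rolling mean on. I would like to produce an extra column showing the rolling mean. Here is what I have so far. The last 3 lines of code above the trailing comments seem to be where the error is.

Right now I get the following error: KeyError: 'rates'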

import pandas as pd
import matplotlib.pyplot as plt
url1 = 'http://www.bankofcanada.ca/'
url2 = 'valet/observations/group/FX_RATES_DAILY/csv?start_date='
start_date = '2017-01-03'  # Earliest start date is 2017-01-03
url = url1 + url2 + start_date  # Complete url to download csv file
# Read in rates for different currencies for a range of dates
rates = pd.read_csv(url, skiprows=39, index_col='date')
rates.index = pd.to_datetime(rates.index)  # assures data type to be a datetime
print("The pandas dataframe with the rates ")
print(rates)
# Get the number of days and the number of currencies from the shape of rates,
# which is a tuple in the format (rows, columns)
days, currencies = rates.shape
# Read in the currency codes and strip off the extraneous part. Uses the url string,
# skips the first 10 rows and keeps the columns at index 0 and 2. It reads as many
# rows as there are currencies, a value taken from rates.shape above
codes = pd.read_csv(url, skiprows=10, usecols=[0, 2],
                    nrows=currencies)
# Print out the dataframe read from the web
print("Dataframe with the codes")
print(codes)
# Loop through the codes dataframe. For each row i, split the string in the
# column at index 1 on ' to Canadian' and keep the first part
for i in range(currencies):
    codes.iloc[i, 1] = codes.iloc[i, 1].split(' to Canadian')[0]
# Report exchange rates for the most recent date available
date = rates.index[-1]  # most recent date available
print('\nCurrency values on {0}'.format(date))
# Using a for loop and zip, pair up the values in codes and rates
# and print them to the screen in a new format
for (code, rate) in zip(codes.iloc[:, 1], rates.loc[date]):
    print("{0:20s}  Can$ {1:8.6g}".format(code, rate))
#Assign values into a dataframe/slice rates dataframe
FXAUDCAD_daily = pd.DataFrame(index=['dates'], columns={'dates', 'rates'})
FXAUDCAD_daily = FXAUDCAD
FXAUDCAD_daily['rolling mean'] = FXAUDCAD_daily.loc['rates'].rolling_mean()
print(FXAUDCAD_daily)
#Print the values to the screen
#Calculate the rolling average using the rolling average pandas function
#Create a figure object using matplotlib/pandas
#Plot values on figure on the figure object. 

Updated code. Using the feedback, I did the following:

import pandas as pd
import matplotlib.pyplot as plt
import datetime

url1 = 'http://www.bankofcanada.ca/'
url2 = 'valet/observations/group/FX_RATES_DAILY/csv?start_date='
start_date = '2017-01-03'  # Earliest start date is 2017-01-03
url = url1 + url2 + start_date  # Complete url to download csv file
# Read in rates for different currencies for a range of dates
rates = pd.read_csv(url, skiprows=39, index_col='date')
rates.index = pd.to_datetime(rates.index)  # assures data type to be a datetime
#print("The pandas dataframe with the rates ")
#print(rates)
# Get the number of days and the number of currencies from the shape of rates,
# which is a tuple in the format (rows, columns)
days, currencies = rates.shape
# Read in the currency codes and strip off the extraneous part. Uses the url string,
# skips the first 10 rows and keeps the columns at index 0 and 2. It reads as many
# rows as there are currencies, a value taken from rates.shape above
codes = pd.read_csv(url, skiprows=10, usecols=[0, 2],
                    nrows=currencies)
# Print out the dataframe read from the web
#print("Dataframe with the codes")
#print(codes)
# Loop through the codes dataframe. For each row i, split the string in the
# column at index 1 on ' to Canadian' and keep the first part
for i in range(currencies):
    codes.iloc[i, 1] = codes.iloc[i, 1].split(' to Canadian')[0]
# Report exchange rates for the most recent date available
date = rates.index[-1]  # most recent date available
#print('\nCurrency values on {0}'.format(date))
# Using a for loop and zip, pair up the values in codes and rates
# and print them to the screen in a new format
#for (code, rate) in zip(codes.iloc[:, 1], rates.loc[date]):
#    print("{0:20s}  Can$ {1:8.6g}".format(code, rate))
# Create dataframe with columns of date and rates
# Assign values into a dataframe/slice rates dataframe
FXAUDCAD_daily = pd.DataFrame(index=['date'], columns={'date', 'rates'})
FXAUDCAD_daily = rates['FXAUDCAD']
print(FXAUDCAD_daily)
FXAUDCAD_daily['rolling mean'] = FXAUDCAD_daily['rates'].rolling(1).mean()

Let's try to fix your code. First, this line looks odd to me, because FXAUDCAD is never defined:

FXAUDCAD_daily = FXAUDCAD

Then you could rewrite your rolling mean calculation like this:

FXAUDCAD_daily['rolling mean'] = FXAUDCAD_daily['rates'].rolling(WINDOW_SIZE).mean()
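To make the suggestion concrete, here is a minimal sketch on made-up numbers rather than the Bank of Canada data. WINDOW_SIZE = 3 and the frame name fx are assumptions for illustration only; the 'rates' column name follows the snippet above.

```python
import pandas as pd

WINDOW_SIZE = 3  # hypothetical window length, counted in rows

# Made-up values standing in for the downloaded exchange rates
dates = pd.date_range("2017-01-03", periods=6, freq="D")
fx = pd.DataFrame({"rates": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}, index=dates)

# rolling() builds the window object, mean() averages each window
fx["rolling mean"] = fx["rates"].rolling(WINDOW_SIZE).mean()
print(fx)
```

The first WINDOW_SIZE - 1 rows come out as NaN because their windows are incomplete. With a window of 1, as in the updated question, each window holds a single row, so the rolling mean simply repeats the original values.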

Which pandas version are you using? pd.rolling_mean() was deprecated in pandas 0.18.0 and removed in later releases.

Update your pandas library:

pip3 install --upgrade pandas

Then use the rolling() method (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rolling.html):

FXAUDCAD_daily['rolling mean'] = FXAUDCAD_daily['rates'].rolling(window_size).mean()
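If you are unsure which API your installation offers, a quick check along these lines may help. This is only a sketch: the series s and window_size = 2 are made up, and the hasattr line is expected to print False on releases where the old function has been removed.

```python
import pandas as pd

# Show the installed version and whether the old top-level function still exists
print(pd.__version__)
print(hasattr(pd, "rolling_mean"))

# The method-based replacement: call rolling() on the Series, then mean()
s = pd.Series([1.0, 2.0, 3.0, 4.0])
window_size = 2  # hypothetical window length
rolled = s.rolling(window_size).mean()
print(rolled)
```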

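I managed to solve the problem. When I sliced the original rates dataframe into FXAUDCAD_daily, the slice already carried the same date index. I was getting a KeyError because the column is named after the currency code, not the string 'rates'.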
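The cause can be reproduced on a small made-up frame. This is a sketch under assumptions: the numbers are invented, and aud_daily is a hypothetical name standing in for FXAUDCAD_daily.

```python
import pandas as pd

# Made-up rates with the same layout: date index, one column per currency code
dates = pd.date_range("2017-01-03", periods=4, freq="D")
rates = pd.DataFrame(
    {"FXAUDCAD": [0.97, 0.98, 0.99, 1.00], "FXUSDCAD": [1.34, 1.33, 1.32, 1.31]},
    index=dates,
)

# Double brackets keep a DataFrame; its only column is still called 'FXAUDCAD'
aud_daily = rates[["FXAUDCAD"]].copy()
print(list(aud_daily.columns))

# Asking for a column that does not exist raises KeyError
try:
    aud_daily["rates"]
    found = True
except KeyError:
    found = False
print(found)

# Using the real column name works
aud_daily["rolling mean"] = aud_daily["FXAUDCAD"].rolling(2).mean()
print(aud_daily)
```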

But now I have another small problem: how do I rename the FXAUDCAD column to just rate? I will post another question for that.

import pandas as pd
import matplotlib.pyplot as plt
import datetime
url1 = 'http://www.bankofcanada.ca/'
url2 = 'valet/observations/group/FX_RATES_DAILY/csv?start_date='
start_date = '2017-01-03'  
url = url1 + url2 + start_date  

rates = pd.read_csv(url, skiprows=39, index_col='date')
rates.index = pd.to_datetime(rates.index)  # assures data type to be a datetime
print("Print rates to the screen", rates)
# Print the index
print("Print index to the screen", rates.index)
days, currencies = rates.shape
codes = pd.read_csv(url, skiprows=10, usecols=[0, 2],
                    nrows=currencies)
for i in range(currencies):
    codes.iloc[i, 1] = codes.iloc[i, 1].split(' to Canadian')[0]
#date = rates.index[-1]
# Make a dataframe holding just the FXAUDCAD rates
FXAUDCAD_daily = pd.DataFrame(rates['FXAUDCAD'])
# Print the FXAUDCAD rates to the screen
print(FXAUDCAD_daily)
# Calculate the moving average using the rolling function with a window size of 1
FXAUDCAD_daily['rolling mean'] = FXAUDCAD_daily['FXAUDCAD'].rolling(1).mean()
# Print out the new dataframe with the calculation
print(FXAUDCAD_daily)
# Rename one of the dataframe columns from FXAUDCAD to Exchange Rate
FXAUDCAD_daily.rename(columns={'rate':'FXAUDCAD'})
#print out the new dataframe with calculation
print(FXAUDCAD_daily)
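On the open renaming question, a possible sketch is below. It uses made-up data and the hypothetical names fx and renamed. Two things stand out in the rename call in the script: the mapping passed to rename() goes from the current name to the new name, so it appears reversed there, and rename() returns a new dataframe rather than changing the original, so the result needs to be assigned.

```python
import pandas as pd

# Made-up frame with a single column named after the currency code
dates = pd.date_range("2017-01-03", periods=3, freq="D")
fx = pd.DataFrame({"FXAUDCAD": [0.97, 0.98, 0.99]}, index=dates)

# Keys are the current column names, values are the new ones.
# rename() returns a new dataframe, so keep the result.
renamed = fx.rename(columns={"FXAUDCAD": "rate"})

print(list(fx.columns))       # the original frame is unchanged
print(list(renamed.columns))  # the renamed copy
```

Assigning the result back to the same variable (fx = fx.rename(...)) would have the same effect if the old name is no longer needed.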

Latest update