Program can't find a specific value in a CSV dataset



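I'm trying to write a program that downloads a CSV file from GitHub and graphs COVID cases with matplotlib.

I've added comments to the program, so it should be self-explanatory.

The first part below is the error and the second part is the program itself.

This is the error it gives me. As far as I can tell, it can't locate the data for Orange County.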

Traceback (most recent call last):
  File ~\anaconda3\lib\site-packages\pandas\core\indexes\base.py:3621 in get_loc
    return self._engine.get_loc(casted_key)
  File pandas\_libs\index.pyx:136 in pandas._libs.index.IndexEngine.get_loc
  File pandas\_libs\index.pyx:163 in pandas._libs.index.IndexEngine.get_loc
  File pandas\_libs\hashtable_class_helper.pxi:5198 in pandas._libs.hashtable.PyObjectHashTable.get_item
  File pandas\_libs\hashtable_class_helper.pxi:5206 in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Orange'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File ~\OneDrive\Desktop\CSC 314\untitled0.py:48 in <module>
    df.loc["Orange"]
  File ~\anaconda3\lib\site-packages\pandas\core\indexing.py:967 in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File ~\anaconda3\lib\site-packages\pandas\core\indexing.py:1202 in _getitem_axis
    return self._get_label(key, axis=axis)
  File ~\anaconda3\lib\site-packages\pandas\core\indexing.py:1153 in _get_label
    return self.obj.xs(label, axis=axis)
  File ~\anaconda3\lib\site-packages\pandas\core\generic.py:3864 in xs
    loc = index.get_loc(key)
  File ~\anaconda3\lib\site-packages\pandas\core\indexes\base.py:3623 in get_loc
    raise KeyError(key) from err
KeyError: 'Orange'
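Both tracebacks come down to one thing: .loc looks rows up by index label, and "Orange" is not a label in the index of df. The minimal sketch below reproduces that with a made-up two-row frame (the county names and values are hypothetical) and shows two quick checks, for an empty frame and for a missing label, to run before indexing.

```python
import pandas as pd

# Hypothetical stand-in for the merged df: dates as columns, counties as row labels
df = pd.DataFrame({"1/22/20": [0, 0], "1/23/20": [1, 3]}, index=["Alameda", "Los Angeles"])

# Cheap sanity checks before calling .loc
print(df.empty)               # False: the frame has rows
print("Orange" in df.index)   # False: no row is labelled "Orange"

# .loc raises KeyError when the label is missing, as in the traceback
try:
    df.loc["Orange"]
except KeyError as err:
    print("KeyError:", err)   # KeyError: 'Orange'
```

If df.empty is already True right after the merge, the problem lies in the merge rather than in the .loc call.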

import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime

# Download the data from the internet
covid_url  = "https://github.com/CSSEGISandData/COVID-19/raw/master/csse_covid_19_data/csse_covid_19_time_series/"
covid_file = "time_series_covid19_deaths_US.csv"
covid = pd.read_csv(covid_url + covid_file, delimiter=",")
# Basic cleanup
covid = covid.rename(columns={"Admin2":"County", "Province_State":"State"})
unused_columns = ["UID", "iso2", "iso3", "code3", "FIPS", "Long_", "Lat", "Country_Region", "Combined_Key"]
covid = covid.drop(columns=unused_columns)
col_a = covid.columns.get_loc("1/22/20")                  # Index of the first data column
col_z = covid.shape[1]-1                                  # Index of the last column
for c in range(col_z, col_a, -1):                         # Walk backwards from the last column
    covid.iloc[:, c] = covid.iloc[:, c] - covid.iloc[:, c-1]  # Perform the subtraction

# Merge the CSV and TXT datasets
stats = pd.read_csv("california_county_stats.txt", delimiter=",")
covid = covid.set_index("County")
# Filter down to CA counties and *then* perform the merge
covid = covid[covid["State"] == "California"]
covid = covid.drop(columns=['State'])
df = pd.merge(stats, covid, left_index=True, right_index=True)
first_column = df.columns.get_loc("1/22/20")
last_column = df.shape[1]-1
# Let's get the x- and y-values
df.loc["Orange"]
df.loc["Orange"][first_column:last_column]
y_vals = df.loc["Orange"][first_column:last_column]
x_vals = df.loc["Orange"][first_column:last_column].index
x_vals = [datetime.strptime(day, '%m/%d/%y') for day in x_vals]
# Plot the daily COVID case statistics for Orange County
plt.figure(figsize=(10,5))
plt.gca().yaxis.grid()
plt.bar(x_vals, y_vals, width=1, color="orangered")
plt.title("Daily New COVID-19 Cases in Orange County", fontsize=14, pad=15)
plt.show()
# Create a copy of the original dataframe to work from
df = covid.copy()
# Average the last seven days worth of positive COVID tests
df["sum"] = df.iloc[:,last_column-7:last_column].sum(axis=1)
df["avg"] = round(df["sum"] / 7, 1)
# Plot the data
plt.figure(figsize=(10,5))
plt.gca().yaxis.grid()
plt.plot(x_vals, y_vals, "-", color="orangered")
plt.fill_between(x_vals, y_vals, color="orangered", alpha=0.4)
plt.title("Daily New COVID-19 Cases in Orange County (7-Day Rolling Average)", fontsize=14, pad=15)
plt.show()
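Two steps in the program can also be expressed with built-in pandas methods. The backwards loop that turns cumulative totals into daily counts computes the same differences as DataFrame.diff(axis=1), and the 7-day rolling average named in the second chart's title can be taken with Series.rolling(7).mean(). The sketch below is only an illustration on a made-up cumulative series for one county (the numbers and the nine dates are hypothetical); like the original loop, it keeps the first date's value unchanged.

```python
from datetime import datetime

import pandas as pd

# Hypothetical cumulative counts for one county, laid out like the JHU time series:
# one row per county, one column per date in "m/d/yy" form
dates = [f"1/{day}/20" for day in range(22, 31)]
cumulative = pd.DataFrame([[0, 1, 3, 6, 10, 15, 21, 28, 36]], index=["Orange"], columns=dates)

# Daily counts: each column minus the previous one (same result as the backwards loop)
daily = cumulative.diff(axis=1)
# diff leaves the first column as NaN; restore the original value, as the loop does
daily.iloc[:, 0] = cumulative.iloc[:, 0]

# Series for one county, indexed by the date strings
y_vals = daily.loc["Orange"]
x_vals = [datetime.strptime(day, "%m/%d/%y") for day in y_vals.index]

# 7-day rolling mean; the first six entries are NaN because the window is not yet full
avg7 = y_vals.rolling(window=7).mean()
print(avg7.tail(3).tolist())  # [3.0, 4.0, 5.0]
```

Passing x_vals and avg7 to plt.plot would then draw the smoothed line rather than the raw daily values.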

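To merge the two DataFrames on their indexes, stats also has to be indexed by County. With left_index=True and right_index=True, pd.merge matches rows by index label. covid is indexed by county name, but stats still has its default integer index (0, 1, 2, ...), so the county names never line up, "Orange" never ends up in the index of df, and df.loc["Orange"] raises the KeyError. Add this line to the code: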

stats = pd.read_csv("california_county_stats.txt", delimiter=",")
covid = covid.set_index("County")
stats = stats.set_index("County") # <- add this
...
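A minimal, self-contained sketch of the fix, using made-up rows in place of california_county_stats.txt and the JHU file (the Population figures and case counts are hypothetical). Once both frames are indexed by county name, the index join lines the rows up and "Orange" becomes available to .loc.

```python
import pandas as pd

# Hypothetical stand-ins for the two inputs
stats = pd.DataFrame({"County": ["Orange", "Alameda"], "Population": [3_000_000, 1_600_000]})
covid = pd.DataFrame({"County": ["Alameda", "Orange"], "1/22/20": [0, 0], "1/23/20": [1, 2]})

# Index both frames by county name so left_index/right_index join on the same labels
covid = covid.set_index("County")
stats = stats.set_index("County")

# Inner join on the county-name index of both frames
df = pd.merge(stats, covid, left_index=True, right_index=True)
print(df.loc["Orange"])
```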
