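I am trying to use a for loop to automatically load 12 pickle files with similar names.
I have AirBnB data for 3 different cities (Jersey City, New York City and Rio de Janeiro), with 4 types of file per city (listings, calendar, locale and reviews). That gives 12 files in total, all named with the same pattern (city_fileType.pkl).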
jc_listings.pkl, jc_calendar.pkl, jc_locale.pkl, jc_reviews.pkl # Jersey City dataset
nyc_listings.pkl, nyc_calendar.pkl, nyc_locale.pkl, nyc_reviews.pkl # New York City dataset
rio_listings.pkl, rio_calendar.pkl, rio_locale.pkl, rio_reviews.pkl # Rio de Janeiro dataset
I am trying to load these files automatically.
Loading a single file by hand works fine:
path_data = '../Data/' # local path
jc_listings = pd.read_pickle(path_data+'jc_listings.pkl')
jc_listings.info()
But when I try to automate this, it does not work. I am trying:
# load data
path_data = '../Data/'
# list of all data names
city_data = ['jc_listings', 'jc_calendar', 'jc_locale', 'jc_reviews',
             'nyc_listings', 'nyc_calendar', 'nyc_locale', 'nyc_reviews',
             'rio_listings', 'rio_calendar', 'rio_locale', 'rio_reviews']
# loop to load all the data with respective name
for city in city_data:
    data_name = city
    print(data_name) # just to inspect and troubleshoot
    city = pd.read_pickle(path_data+data_name+'.pkl')
    print(type(city)) # just to inspect and troubleshoot
This runs without errors and the printed output looks fine. However, when I try
rio_reviews.info()
I get the following error:
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In [37], line 3
1 # inspecting the data
----> 3 rio_reviews.info()
NameError: name 'rio_reviews' is not defined
I suggest a different approach:
import pandas as pd
from pathlib import Path
data = Path('../Data')
cities = ['jc', 'nyc', 'rio']
files = ['listings', 'calendar', 'locale', 'reviews']
dfs = {}
for city in cities:
    dfs[city] = {}  # create the inner dictionary before filling it
    for file in files:
        dfs[city][file] = pd.read_pickle(data / f'{city}_{file}.pkl')
This gives you a dictionary dfs, from which you can access each city's data like this:
dfs['jc']['listings'].info()
dfs['rio']['reviews'].info()
…and so on.
We can simplify the code further with itertools.product:
import pandas as pd
from pathlib import Path
from itertools import product
data = Path('../Data')
cities = ['jc', 'nyc', 'rio']
files = ['listings', 'calendar', 'locale', 'reviews']
dfs = {}
for city, file in product(cities, files):
    # setdefault creates the inner dictionary the first time a city is seen
    dfs.setdefault(city, {})[file] = pd.read_pickle(data / f'{city}_{file}.pkl')
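If you would rather have lookups that mirror the variable names from the question, a flat dictionary keyed by the file stem may be enough. The following is only a sketch: so that it runs without the real data, it writes small placeholder pickles to a temporary directory and reads them back with the standard-library pickle module. The placeholder contents and the temporary directory are assumptions for illustration; with the real files, the load line would be pd.read_pickle as above.

```python
import pickle
import tempfile
from itertools import product
from pathlib import Path

cities = ['jc', 'nyc', 'rio']
files = ['listings', 'calendar', 'locale', 'reviews']

with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp)  # stand-in for Path('../Data')

    # Write small placeholder pickles so the sketch runs without the real data.
    for city, file in product(cities, files):
        with open(data / f'{city}_{file}.pkl', 'wb') as fh:
            pickle.dump({'city': city, 'file': file}, fh)

    # One flat dictionary keyed by the file stem, e.g. 'rio_reviews'.
    dfs = {}
    for city, file in product(cities, files):
        name = f'{city}_{file}'
        with open(data / f'{name}.pkl', 'rb') as fh:
            # With the real files: dfs[name] = pd.read_pickle(data / f'{name}.pkl')
            dfs[name] = pickle.load(fh)

print(len(dfs))            # 12
print(dfs['rio_reviews'])  # {'city': 'rio', 'file': 'reviews'}
```

With this layout, dfs['rio_reviews'] takes the place of the rio_reviews variable the question was expecting.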
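It looks like every file you load is assigned to city, which is overwritten on each pass through the loop, and a variable named "rio_reviews" is never defined. That is why you get this error.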
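A small sketch of what happens in that loop, using a plain dictionary as a stand-in for the DataFrame that pd.read_pickle would return (the stand-in objects and the shortened name list are assumptions for illustration):

```python
names = ['jc_listings', 'rio_reviews']

for city in names:
    data_name = city
    # Stand-in for pd.read_pickle(...): any object works for this demo.
    city = {'loaded_from': data_name + '.pkl'}

# After the loop, city refers only to the object loaded last.
print(city)  # {'loaded_from': 'rio_reviews.pkl'}

# Assigning to city never created a variable called rio_reviews.
try:
    rio_reviews
    name_exists = True
except NameError:
    name_exists = False

print(name_exists)  # False
```

Assigning to the loop variable rebinds the name city to a new object; the string it held before ('rio_reviews') is just a value and does not become a variable name.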