Opening multiple pickle files from a folder in a Jupyter notebook does not work

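I am using a Jupyter notebook on a server (the folder is not on my own computer). I have a folder containing 30 DataFrames that all have exactly the same columns. They are all saved under the following path: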

Reut/folder_no_one/here_the_files_located

I want to open them and concatenate them. I know I can do something like this:

df1=pd.read_pickle('table1')
df2=pd.read_pickle('table2')
df3=pd.read_pickle('table3')
...
#and then concat

But I am sure there is a better, smarter way to do this. I tried to open all the files and store each one separately, like this:

num=list(range(1, 33)) #number of tables I have in the folder
path_to_files=r'Reut/here_the_files_located'
Path=r'Reut/folder_no_one/here_the_files_located'
{f"df{num}" : pd.read_pickle(file) for num, file in enumerate(Path(path_to_files).glob('*.pickle'))}

But I get an error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
----> 1 {f"df{num}" : pd.read_pickle(file) for num, file in enumerate(Path(path_to_files).glob('*.pickle'))}

TypeError: 'str' object is not callable

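I tried playing around with different versions of the path, and also leaving the path out entirely (since my notebook is in the same place as the files), but I kept getting the same error.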

*It is worth mentioning that when the notebook is in that same folder, I can open these files without specifying a path.

My end goal is to open all of these tables automatically and merge them into one big table.

Edit: I also tried this:

path = r'file_name/file_location_with_all_pickles'
all_files = glob.glob(path + "/*.pkl")
li = []
for filename in all_files:
    df = pd.read_pickle(filename)
    li.append(df)
frame = pd.concat(li, axis=0, ignore_index=True)

and

path_to_files = r'file_name/file_location_with_all_pickles'
tables = []
for table in pathlib.Path(path_to_files).glob("*.pkl"):
    print(table)
    tables.append(pd.read_pickle(table))

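But in both cases I get the error ValueError: No objects to concatenate when I try to concatenate. Also, when I tell it to print the file names/tables, nothing is printed. Even if I try to print a plain string inside the loop (such as print('hello')), nothing happens. There seems to be a problem with the path, but when I open one specific pickle like this: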

pd.read_pickle(r'file_name/file_location_with_all_pickles/specific_table.pkl')

it opens.

Update:

This is what eventually worked for me:

import pandas as pd
import glob
path = r'folder' # use your path
all_files = glob.glob(path + "/*.pkl")
li = []
for filename in all_files:
    df = pd.read_pickle(filename)
    li.append(df)
frame = pd.concat(li, axis=0, ignore_index=True)


How about:

path_to_files = r'Reut/here_the_files_located'
df = pd.concat([pd.read_pickle(f'{path_to_files}/table{num}.pickle') for num in range(1, 33)])

This is equivalent to:

path_to_files = r'Reut/here_the_files_located'
tables = []
for num in range(1, 33):
    filename = f'{path_to_files}/table{num}.pickle'
    print(filename)
    tables.append(pd.read_pickle(filename))
df = pd.concat(tables)

Output:

Reut/here_the_files_located/table1.pickle
Reut/here_the_files_located/table2.pickle
Reut/here_the_files_located/table3.pickle
Reut/here_the_files_located/table4.pickle
Reut/here_the_files_located/table5.pickle
Reut/here_the_files_located/table6.pickle
Reut/here_the_files_located/table7.pickle
Reut/here_the_files_located/table8.pickle
Reut/here_the_files_located/table9.pickle
Reut/here_the_files_located/table10.pickle
Reut/here_the_files_located/table11.pickle
Reut/here_the_files_located/table12.pickle
Reut/here_the_files_located/table13.pickle
Reut/here_the_files_located/table14.pickle
Reut/here_the_files_located/table15.pickle
Reut/here_the_files_located/table16.pickle
Reut/here_the_files_located/table17.pickle
Reut/here_the_files_located/table18.pickle
Reut/here_the_files_located/table19.pickle
Reut/here_the_files_located/table20.pickle
Reut/here_the_files_located/table21.pickle
Reut/here_the_files_located/table22.pickle
Reut/here_the_files_located/table23.pickle
Reut/here_the_files_located/table24.pickle
Reut/here_the_files_located/table25.pickle
Reut/here_the_files_located/table26.pickle
Reut/here_the_files_located/table27.pickle
Reut/here_the_files_located/table28.pickle
Reut/here_the_files_located/table29.pickle
Reut/here_the_files_located/table30.pickle
Reut/here_the_files_located/table31.pickle
Reut/here_the_files_located/table32.pickle

Some comments on your code:

num=list(range(1, 33)) #number of tables I have in the folder
path_to_files=r'Reut/here_the_files_located'
Path=r'Reut/folder_no_one/here_the_files_located'
{f"df{num}" : pd.read_pickle(file) for num, file in enumerate(Path(path_to_files).glob('*.pickle'))}

num=list(range(1, 33)) #number of tables I have in the folder

There is no need to build a list from range. Using range directly in a for loop or in a list/dict comprehension works well and is efficient.
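
As a small illustration of that point, the sketch below builds the same file names with and without the intermediate list. The path and the table{num}.pickle naming follow the question; nothing is read from disk.

```python
path_to_files = r'Reut/here_the_files_located'

# Intermediate list: works, but the list() call adds nothing here.
nums = list(range(1, 33))
names_from_list = [f'{path_to_files}/table{num}.pickle' for num in nums]

# range used directly in the comprehension.
names_from_range = [f'{path_to_files}/table{num}.pickle' for num in range(1, 33)]

print(names_from_list == names_from_range)  # True
print(len(names_from_range))                # 32
```

Note also that in the dict comprehension from the question, num is the loop variable supplied by enumerate, so the list assigned to num beforehand is never used.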

Path=r'Reut/folder_no_one/here_the_files_located'

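I guess you had previously imported the Path class from pathlib. The assignment above rebinds the name Path to a string, so if you want to call Path as usual, you need to choose a different name for that variable. That is why you get the error TypeError: 'str' object is not callable.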
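
A minimal sketch of that name clash, assuming Path was imported from pathlib as guessed above. The names RealPath and folder are hypothetical and only serve the demonstration.

```python
from pathlib import Path

path_to_files = r'Reut/here_the_files_located'

# Keep a reference to the real class so it can be restored afterwards.
RealPath = Path

# Rebinding the name, as in the question: Path is now a plain string.
Path = r'Reut/folder_no_one/here_the_files_located'
try:
    Path(path_to_files)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)  # 'str' object is not callable

# Fix: restore the class and store the string under a different name.
Path = RealPath
folder = r'Reut/folder_no_one/here_the_files_located'  # hypothetical variable name
folder_path = Path(folder)
print(type(folder_path).__name__)
```

In a notebook the same effect persists across cells: once a cell has run Path = '...', every later cell sees the string until the import is executed again.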

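Is there another way to do this if the table names are not "the same"? For example, if one is table1 and another is dataframe3, so that reading them does not depend on their names.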

Sure. Assuming the file names of all saved tables end with .pickle, you can use the glob method as in your first attempt. Don't forget to import pathlib.

import pathlib
import pandas as pd
path_to_files = r'Reut/here_the_files_located'
tables = []
# glob matches on the extension only, so the file names can differ
for table in pathlib.Path(path_to_files).glob("*.pickle"):
    tables.append(pd.read_pickle(table))
df = pd.concat(tables)
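
The "No objects to concatenate" error and the silent loop described in the question are consistent with glob matching no files at all, for instance if the pattern says *.pkl while the files end in .pickle, or if a relative path is resolved against a different working directory. That is only one possible explanation. The sketch below wraps the glob approach in a hypothetical helper, load_all_pickles, that fails with a clearer message in that case; the demo file names and data are made up.

```python
import pathlib
import tempfile

import pandas as pd


def load_all_pickles(folder, pattern="*.pickle"):
    """Read every file in `folder` matching `pattern` and concatenate them."""
    files = sorted(pathlib.Path(folder).glob(pattern))
    if not files:
        # An empty match means a loop over the files would never run and
        # pd.concat([]) would raise "No objects to concatenate".
        raise FileNotFoundError(
            f"no files matching {pattern!r} in {pathlib.Path(folder).resolve()}"
        )
    return pd.concat([pd.read_pickle(f) for f in files], ignore_index=True)


# Demo on a temporary folder with made-up file names and data.
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"a": [1, 2]}).to_pickle(f"{tmp}/table1.pickle")
    pd.DataFrame({"a": [3]}).to_pickle(f"{tmp}/dataframe3.pickle")
    combined = load_all_pickles(tmp)
    print(combined.shape)  # (3, 1)

    try:
        load_all_pickles(tmp, pattern="*.pkl")  # wrong extension: nothing matches
    except FileNotFoundError as exc:
        missing = exc
        print("no match:", exc)
```

Printing the resolved absolute path in the error message makes it easier to see which folder the notebook is actually searching.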
