Opening multiple pickle files from a folder in a Jupyter notebook doesn't work



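I'm using a Jupyter notebook on a server (the folder is not on my computer). I have a folder with 30 dataframes that all have exactly the same columns. They are all saved under the following path: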

Reut/folder_no_one/here_the_files_located

I want to open them and concatenate them. I know I can do something like this:

df1=pd.read_pickle('table1')
df2=pd.read_pickle('table2')
df3=pd.read_pickle('table3')
...
#and then concat

But I'm sure there is a better, smarter way to do this. I tried to open all the files and save each one separately like this:

num=list(range(1, 33)) #number of tables I have in the folder
path_to_files=r'Reut/here_the_files_located'
Path=r'Reut/folder_no_one/here_the_files_located'
{f"df{num}" : pd.read_pickle(file) for num, file in enumerate(Path(path_to_files).glob('*.pickle'))}

But I get an error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
----> 1 {f"df{num}" : pd.read_pickle(file) for num, file in enumerate(Path(path_to_files).glob('*.pickle'))}

TypeError: 'str' object is not callable

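I tried playing around with different versions of the path, and also with no path at all (since my notebook is in the same place as these files), but I keep getting the same error.

*It's worth mentioning that when the notebook is also in that folder, I can open the files without specifying a path.

My end goal is to automatically open all of these tables and concatenate them into one big table.

Edit: I also tried this: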

path = r'file_name/file_location_with_all_pickles'
all_files = glob.glob(path + "/*.pkl")
li = []
for filename in all_files:
    df = pd.read_pickle(filename)
    li.append(df)
frame = pd.concat(li, axis=0, ignore_index=True)

and

path_to_files = r'file_name/file_location_with_all_pickles'
tables = []
for table in pathlib.Path(path_to_files).glob("*.pkl"):
    print(table)
    tables.append(pd.read_pickle(table))

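But in both cases I get the error

ValueError: No objects to concatenate

when I try to concatenate. Also, when I tell it to print the file name/table, it prints nothing. Even if I put a plain string print inside the loop (like print('hello')), nothing happens. Something seems to be wrong with the path, but when I open one specific pickle like this: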

pd.read_pickle(r'file_name/file_location_with_all_pickles/specific_table.pkl')

it opens.

Update:

This is what eventually worked for me:

import pandas as pd
import glob
path = r'folder' # use your path
all_files = glob.glob(path + "/*.pkl")
li = []
for filename in all_files:
    df = pd.read_pickle(filename)
    li.append(df)
frame = pd.concat(li, axis=0, ignore_index=True)

Taken from here (Opening multiple pickle files from a Jupyter notebook folder doesn't work)

How about:

path_to_files = r'Reut/here_the_files_located'
df = pd.concat([pd.read_pickle(f'{path_to_files}/table{num}.pickle') for num in range(1, 33)])

This is equivalent to:

path_to_files = r'Reut/here_the_files_located'
tables = []
for num in range(1, 33):
    filename = f'{path_to_files}/table{num}.pickle'
    print(filename)
    tables.append(pd.read_pickle(filename))
df = pd.concat(tables)

Output:

Reut/here_the_files_located/table1.pickle
Reut/here_the_files_located/table2.pickle
Reut/here_the_files_located/table3.pickle
Reut/here_the_files_located/table4.pickle
Reut/here_the_files_located/table5.pickle
Reut/here_the_files_located/table6.pickle
Reut/here_the_files_located/table7.pickle
Reut/here_the_files_located/table8.pickle
Reut/here_the_files_located/table9.pickle
Reut/here_the_files_located/table10.pickle
Reut/here_the_files_located/table11.pickle
Reut/here_the_files_located/table12.pickle
Reut/here_the_files_located/table13.pickle
Reut/here_the_files_located/table14.pickle
Reut/here_the_files_located/table15.pickle
Reut/here_the_files_located/table16.pickle
Reut/here_the_files_located/table17.pickle
Reut/here_the_files_located/table18.pickle
Reut/here_the_files_located/table19.pickle
Reut/here_the_files_located/table20.pickle
Reut/here_the_files_located/table21.pickle
Reut/here_the_files_located/table22.pickle
Reut/here_the_files_located/table23.pickle
Reut/here_the_files_located/table24.pickle
Reut/here_the_files_located/table25.pickle
Reut/here_the_files_located/table26.pickle
Reut/here_the_files_located/table27.pickle
Reut/here_the_files_located/table28.pickle
Reut/here_the_files_located/table29.pickle
Reut/here_the_files_located/table30.pickle
Reut/here_the_files_located/table31.pickle
Reut/here_the_files_located/table32.pickle

Some comments on your code:

num=list(range(1, 33)) #number of tables I have in the folder
path_to_files=r'Reut/here_the_files_located'
Path=r'Reut/folder_no_one/here_the_files_located'
{f"df{num}" : pd.read_pickle(file) for num, file in enumerate(Path(path_to_files).glob('*.pickle'))}

num=list(range(1, 33)) #number of tables I have in the folder

There is no need to create a list from range. Using range directly in a for loop or in a list/dict comprehension works just fine.
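
As a small illustration of that point, the sketch below builds the 32 file names by iterating over range directly. The table{num}.pickle naming is the same assumption made earlier in this answer; `filenames` is just a name chosen for this example.

```python
path_to_files = r'Reut/here_the_files_located'

# range is iterated directly; wrapping it in list(...) is not needed.
filenames = [f'{path_to_files}/table{num}.pickle' for num in range(1, 33)]

print(len(filenames))   # number of generated names
print(filenames[0])     # first name
print(filenames[-1])    # last name
```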

Path=r'Reut/folder_no_one/here_the_files_located'

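I guess you had previously imported the Path class from pathlib. If you want to call Path as usual, you need to choose a different name for that variable. Assigning a string to Path hides the class, which is why you get the error TypeError: 'str' object is not callable.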
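
A minimal sketch that reproduces the problem and one possible fix, assuming Path was imported from pathlib beforehand. `folder_path`, `folder` and `shadow_error` are hypothetical names used only here.

```python
from pathlib import Path

path_to_files = r'Reut/here_the_files_located'

# Rebinding the name Path to a string hides the pathlib class.
Path = r'Reut/folder_no_one/here_the_files_located'
try:
    Path(path_to_files)  # this now tries to call a str
except TypeError as err:
    shadow_error = str(err)
    print(shadow_error)

# Possible fix: restore the class and keep the string under another name.
from pathlib import Path
folder_path = r'Reut/folder_no_one/here_the_files_located'
folder = Path(folder_path)
print(folder.name)
```

In a notebook the rebinding survives across cells, so after renaming the variable you also have to re-run the import (or restart the kernel) before Path is callable again.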


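Is there another way to do this if the table names are not "the same"? For example, if one is table1 and another is dataframe3, so that reading them doesn't depend on their names?

Sure. Assuming the file names of all saved tables end with .pickle, you can use the glob method as in your first attempt. Don't forget to import pathlib: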

import pathlib
path_to_files = r'Reut/here_the_files_located'
tables = []
for table in pathlib.Path(path_to_files).glob("*.pickle"):
    tables.append(pd.read_pickle(table))
df = pd.concat(tables)
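
If the glob pattern matches nothing (for example the files end in .pkl rather than .pickle, or the relative path does not resolve from the notebook's working directory), the loop body never runs and pd.concat raises ValueError: No objects to concatenate. The sketch below is one way to make that failure more explicit. `load_pickled_tables` is a hypothetical helper written for this answer, not a pandas function, and the two suffixes it checks are an assumption based on the extensions mentioned in the question.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_pickled_tables(folder, patterns=("*.pickle", "*.pkl")):
    """Read every pickle in `folder` matching `patterns` and concatenate them."""
    folder = Path(folder)
    if not folder.is_dir():
        # Relative paths are resolved against the current working directory.
        raise FileNotFoundError(f"{folder.resolve()} is not a directory")
    # Collect matches for every pattern; sort for a reproducible order.
    files = sorted(f for pattern in patterns for f in folder.glob(pattern))
    if not files:
        raise FileNotFoundError(f"no files matching {patterns} in {folder.resolve()}")
    return pd.concat([pd.read_pickle(f) for f in files], ignore_index=True)


# Small demonstration with throwaway data in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"a": [1, 2]}).to_pickle(Path(tmp) / "table1.pickle")
    pd.DataFrame({"a": [3]}).to_pickle(Path(tmp) / "dataframe3.pkl")
    combined = load_pickled_tables(tmp)

print(combined)
```

The files are read in sorted path order, which is alphabetical rather than numeric (table10 comes before table2), so sort the result afterwards if row order matters.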
