Using os.listdir(os.path.join(...)) with values from a pandas Series to list the files in several folders

Task

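In Excel I have the cells A1 = numero PPAS, A2 = 1973.01, A3 = 1975.01 and A4 = 1975.02. The values in cells A2, A3 and A4 are folder names: "1973.01", "1975.01" and "1975.02". I use them to access the directories F:/Comune/Breggia_test/1973.01, F:/Comune/Breggia_test/1975.01 and F:/Comune/Breggia_test/1975.02. For each directory, I want the list of files.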

import pandas as pd
df = pd.read_excel (r'P:/Breggia_Tresa_ZP_test.xlsx')
y=df['numero PPAS']
print(y)

The result is:

0    1973.01
1    1975.01
2    1975.02
Name: numero PPAS, dtype: float64

As a next step, I convert the Series values to strings and strip the distracting index (0, 1, 2) that precedes each cell value.

for index, value in y.items():
    z=f" {index} : {value}"
    k=z[-7:]
    print(k)

The result is the following, and each value is a string (confirmed with the type function, not shown):

1973.01
1975.01
1975.02

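I know that os.path.join only accepts strings, so this should work now, because the for loop above uses the items function. Now I want to get the three file lists for 1973.01 (first iteration), 1975.01 (second iteration) and 1975.02 (third iteration).

for item in k:
    item=os.listdir(os.path.join('F:/Comune/Breggia_test', k) )
    print(item)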

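Unfortunately, the result is the listing of F:\Comune\Breggia_test\1975.02 repeated seven times, which matches the number of characters in the string created with k=z[-7:]:

['apm_19761129.pdf', 'apcst_19780823.pdf', 'apad_19771213.pdf']

The desired result is three lists, one from each of the following directories:

F:\Comune\Breggia_test\1973.01

F:\Comune\Breggia_test\1975.01

F:\Comune\Breggia_test\1975.02

Can someone explain what is not working?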

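I'm not sure I understand what you want to do, but here is how to take a pandas Series, combine its values with a base path, and list the contents of each resulting directory: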


from pathlib import Path
import pandas as pd

# Path we'll be using as common base path.
base_path = Path(r'/content/sample_data')
# Our initial dataset, built as a `pandas.Series` of folder numbers.
# The `.astype(str)` at the end of the next line converts the values to strings.
y = pd.Series([1973.01, 1975.01, 1975.02], name='PPAS').astype(str)
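One caveat about the string conversion, offered as an aside rather than as part of the setup above: a float cannot remember a trailing zero, so a folder named, say, 1973.10 would be looked up as 1973.1. The sketch below uses a hypothetical value 1973.10 that is not in the original data, and it assumes every folder name has exactly two decimals.

```python
import pandas as pd

# Hypothetical folder numbers; 1973.10 is NOT in the original data.
numbers = pd.Series([1973.01, 1973.10, 1975.02], name='numero PPAS')

# Plain conversion: the float 1973.1 has no trailing zero to keep.
plain = numbers.astype(str).to_list()

# Fixed two-decimal formatting (assumes two-decimal folder names).
fixed = numbers.map('{:.2f}'.format).to_list()

print(plain)
print(fixed)
```

If the folder names in the spreadsheet never end in a zero, the plain `.astype(str)` conversion is enough.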

Now, depending on what you want to retrieve, pick one of the following snippets.

Option 1: retrieve only the immediate files and directories

Code:


base_path = Path(r'/content/sample_data')
list_of_subdirs = y.astype(str).apply(
    lambda value: [
        str(file) for file in base_path.joinpath(value).glob('*')]
).to_list()

In my case, it returns:


[['/content/sample_data/1973.01/README.md'],
 ['/content/sample_data/1975.01/california_housing_test.csv',
  '/content/sample_data/1975.01/1975.01',
  '/content/sample_data/1975.01/.ipynb_checkpoints'],
 ['/content/sample_data/1975.02/california_housing_train.csv']]

Option 2: retrieve only the immediate files


base_path = Path(r'/content/sample_data')
list_of_subdirs = y.astype(str).apply(
    lambda value: [str(file) for file in base_path.joinpath(value).glob('*') if file.is_file()]
).to_list()
list_of_subdirs

In my case, it returns:


[['/content/sample_data/1973.01/README.md'],
 ['/content/sample_data/1975.01/california_housing_test.csv'],
 ['/content/sample_data/1975.02/california_housing_train.csv']]

Option 3: retrieve everything recursively, including subdirectories


base_path = Path(r'/content/sample_data')
list_of_subdirs = y.astype(str).apply(
    lambda value: [str(file) for file in base_path.joinpath(value).glob('**/*')]
).to_list()
list_of_subdirs

In my case, it returns:


[['/content/sample_data/1973.01/README.md'],
 ['/content/sample_data/1975.01/california_housing_test.csv',
  '/content/sample_data/1975.01/1975.01',
  '/content/sample_data/1975.01/.ipynb_checkpoints',
  '/content/sample_data/1975.01/1975.01/mnist_test.csv'],
 ['/content/sample_data/1975.02/california_housing_train.csv']]

Option 4: retrieve only the files, from all subdirectories


base_path = Path(r'/content/sample_data')
list_of_subdirs = y.astype(str).apply(
    lambda value: [str(file) for file in base_path.joinpath(value).glob('**/*') if file.is_file()]
).to_list()
list_of_subdirs

In my case, it returns:


[
    ['/content/sample_data/1973.01/README.md'],
    [
        '/content/sample_data/1975.01/california_housing_test.csv',
        '/content/sample_data/1975.01/1975.01/mnist_test.csv'
    ],
    ['/content/sample_data/1975.02/california_housing_train.csv']
]
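The four options differ only in the glob pattern ('*' versus '**/*') and in whether `is_file()` filters the results, so they can be folded into one function. This is a minimal sketch, not part of the original answer: `list_entries`, its parameters `recursive` and `files_only`, and the throwaway demo tree are all hypothetical names chosen for illustration.

```python
from pathlib import Path
import tempfile


def list_entries(base_path, names, recursive=False, files_only=False):
    """Return one sorted list of path strings per folder name (hypothetical helper)."""
    pattern = '**/*' if recursive else '*'
    result = []
    for name in names:
        entries = Path(base_path).joinpath(str(name)).glob(pattern)
        result.append(sorted(
            str(entry) for entry in entries
            if not files_only or entry.is_file()
        ))
    return result


# Small throwaway tree so the sketch runs anywhere (file names are made up).
demo_base = Path(tempfile.mkdtemp())
(demo_base / '1975.01' / '1975.01').mkdir(parents=True)
(demo_base / '1975.01' / 'a.csv').touch()
(demo_base / '1975.01' / '1975.01' / 'b.csv').touch()

option_1 = list_entries(demo_base, ['1975.01'])
option_2 = list_entries(demo_base, ['1975.01'], files_only=True)
option_4 = list_entries(demo_base, ['1975.01'], recursive=True, files_only=True)
print(option_1)
print(option_2)
print(option_4)
```

With the Series from the answer, the call would look like `list_entries(base_path, y)`, since the helper converts each name with `str()` itself.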

For some additional context, here is a tree view of all the subdirectories:

sample_data
├── 1973.01
│   └── README.md
├── 1975.01
│   ├── 1975.01
│   │   └── mnist_test.csv
│   └── california_housing_test.csv
├── 1975.02
│   └── california_housing_train.csv
├── anscombe.json
└── mnist_train_small.csv
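As for why the loop in the question prints the same listing seven times: `k` is overwritten on every pass of the first loop, so afterwards it only holds the last string, '1975.02'. The second loop, `for item in k`, then walks over the seven characters of that string, and its body passes `k` (not `item`) to os.path.join, so the same folder is listed once per character. A minimal sketch of the os.listdir approach with a single loop follows; the temporary directory and the file names are assumptions standing in for F:/Comune/Breggia_test so that the example runs anywhere.

```python
import os
import tempfile

# Stand-in for the question's data; iterating over df['numero PPAS']
# in pandas yields the same float values one by one.
y = [1973.01, 1975.01, 1975.02]

# Temporary directory standing in for 'F:/Comune/Breggia_test' (assumption);
# each folder gets one made-up file so there is something to list.
base_dir = tempfile.mkdtemp()
for value in y:
    folder = os.path.join(base_dir, str(value))
    os.makedirs(folder)
    open(os.path.join(folder, f'file_{value}.pdf'), 'w').close()

# One os.listdir call per folder name, inside a single loop.
listings = {}
for value in y:
    name = str(value)  # os.path.join needs a string, not a float
    listings[name] = sorted(os.listdir(os.path.join(base_dir, name)))
    print(name, listings[name])
```

Collecting the results in a dictionary keyed by folder name keeps the three listings separate instead of overwriting one variable on each pass.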
