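Task
In Excel, I have the cells A1 = numero PPAS, A2 = 1973.01, A3 = 1975.01, A4 = 1975.02. The values in cells A2, A3 and A4 are folder names: "1973.01", "1975.01", "1975.02". I use them to access the directories F:/Comune/Breggia_test/1973.01, F:/Comune/Breggia_test/1975.01 and F:/Comune/Breggia_test/1975.02. For each directory, I want the list of files.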
import pandas as pd
df = pd.read_excel (r'P:/Breggia_Tresa_ZP_test.xlsx')
y=df['numero PPAS']
print(y)
The result is:
0 1973.01
1 1975.01
2 1975.02
Name: numero PPAS, dtype: float64
Next, I convert the Series to strings and strip the unwanted index (0, 1, 2) that precedes each cell value:
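Because the column is read as float64 (see the dtype in the output above), turning the numbers back into folder names can silently change them. A minimal sketch, assuming a hypothetical extra folder named 1975.10, shows that a plain string conversion drops the trailing zero while a fixed two-decimal format keeps it. Reading the column as text in the first place (for example through the dtype argument of pd.read_excel) would avoid the float round trip altogether.

```python
import pandas as pd

# Folder names as pandas parses them from Excel: float64.
# 1975.10 is a hypothetical extra folder, added only to show the pitfall.
y = pd.Series([1973.01, 1975.01, 1975.02, 1975.10], name="numero PPAS")

# astype(str) uses the shortest float representation, so 1975.10 becomes "1975.1".
naive = y.astype(str).tolist()

# Formatting with exactly two decimals keeps the trailing zero.
folders = y.map("{:.2f}".format).tolist()

print(naive)    # ['1973.01', '1975.01', '1975.02', '1975.1']
print(folders)  # ['1973.01', '1975.01', '1975.02', '1975.10']
```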
for index, value in y.items():
    z=f" {index} : {value}"
    k=z[-7:]
    print(k)
The result is the following, and it is a string (confirmed with type(), not shown):
1973.01
1975.01
1975.02
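I know that os.path.join only accepts strings, which should now be satisfied, since the for loop above uses items(). Now I want to get the three file lists from 1973.01 (first iteration), 1975.01 (second iteration) and 1975.02 (third iteration).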
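Note that the loop above reassigns k on every pass, so once it finishes, k holds only the last value. A minimal sketch, with a hand-built Series standing in for the Excel column, that shows this and then collects all the names in a list instead:

```python
import pandas as pd

# Hand-built stand-in for the Excel column.
y = pd.Series([1973.01, 1975.01, 1975.02], name="numero PPAS")

# The original loop: k is overwritten on every iteration.
for index, value in y.items():
    z = f" {index} : {value}"
    k = z[-7:]
print(k)  # only the last value survives: '1975.02'

# Collecting the names keeps all three, without the index/slicing trick.
folder_names = [str(value) for value in y]
print(folder_names)  # ['1973.01', '1975.01', '1975.02']
```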
import os

for item in k:
    item=os.listdir(os.path.join('F:/Comune/Breggia_test', k) )
    print(item)
Unfortunately, the result is the listing of F:\Comune\Breggia_test\1975.02 repeated seven times, once for each character of the string created with k=z[-7:]:
['apm_19761129.pdf', 'apcst_19780823.pdf', 'apad_19771213.pdf']
['apm_19761129.pdf', 'apcst_19780823.pdf', 'apad_19771213.pdf']
['apm_19761129.pdf', 'apcst_19780823.pdf', 'apad_19771213.pdf']
['apm_19761129.pdf', 'apcst_19780823.pdf', 'apad_19771213.pdf']
['apm_19761129.pdf', 'apcst_19780823.pdf', 'apad_19771213.pdf']
['apm_19761129.pdf', 'apcst_19780823.pdf', 'apad_19771213.pdf']
['apm_19761129.pdf', 'apcst_19780823.pdf', 'apad_19771213.pdf']
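This output follows from the two loops together: k holds only '1975.02' (the last value left by the first loop), and for item in k iterates over that string's seven characters while the body joins k rather than item. A minimal sketch that records the joined path instead of calling os.listdir, so it runs without the F: drive:

```python
import os

k = "1975.02"  # what the first loop leaves in k: only the last folder name

paths = []
for item in k:  # iterates over the 7 characters '1', '9', '7', '5', '.', '0', '2'
    # The body joins k, not item, so every pass builds the same path.
    paths.append(os.path.join("F:/Comune/Breggia_test", k))

print(len(paths))       # 7
print(len(set(paths)))  # 1
```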
The desired result is three lists, one from each of the following directories:
F:\Comune\Breggia_test\1973.01
F:\Comune\Breggia_test\1975.01
F:\Comune\Breggia_test\1975.02
Can someone explain what isn't working?
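Answer
I'm not sure I understand exactly what you want to do, but here is how to take a pandas Series, combine each value with a base path, and list the contents of each resulting directory: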
from pathlib import Path
import pandas as pd
# Path we'll be using as common base path.
base_path = Path(r'/content/sample_data')
# Our initial dataset, built as a `pandas.Series`.
# `.astype(str)` at the end of the next line converts the values into strings.
y = pd.Series([1973.01, 1975.01, 1975.02], name='PPAS').astype(str)
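Now, depending on what you want to retrieve, pick one of the following snippets.
Option 1: Retrieve only the immediate files and directories
Code: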
base_path = Path(r'/content/sample_data')
list_of_subdirs = y.astype(str).apply(
lambda value: [
str(file) for file in base_path.joinpath(value).glob('*')]
).to_list()
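If it is handier to know which listing belongs to which folder, the Option 1 lookup can also be returned as a dict keyed by folder name. The sketch below uses Path.iterdir(), which yields a directory's immediate entries, and builds a temporary directory with a hypothetical README.md in each folder as a stand-in for /content/sample_data:

```python
import tempfile
from pathlib import Path

import pandas as pd

y = pd.Series([1973.01, 1975.01, 1975.02], name="PPAS").astype(str)

with tempfile.TemporaryDirectory() as tmp:
    base_path = Path(tmp)  # stand-in for /content/sample_data
    for name in y:
        (base_path / name).mkdir()
        (base_path / name / "README.md").touch()  # hypothetical file

    # Map each folder name to the sorted names of its immediate entries.
    listing = {
        name: sorted(entry.name for entry in (base_path / name).iterdir())
        for name in y
    }

print(listing)
```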
In my case, it returns:
[['/content/sample_data/1973.01/README.md'],
['/content/sample_data/1975.01/california_housing_test.csv',
'/content/sample_data/1975.01/1975.01',
'/content/sample_data/1975.01/.ipynb_checkpoints'],
['/content/sample_data/1975.02/california_housing_train.csv']]
Option 2: Retrieve only the immediate files
base_path = Path(r'/content/sample_data')
list_of_subdirs = y.astype(str).apply(
lambda value: [str(file) for file in base_path.joinpath(value).glob('*') if file.is_file()]
).to_list()
list_of_subdirs
In my case, it returns:
[['/content/sample_data/1973.01/README.md'],
['/content/sample_data/1975.01/california_housing_test.csv'],
['/content/sample_data/1975.02/california_housing_train.csv']]
Option 3: Recursively retrieve all files and subdirectories
base_path = Path(r'/content/sample_data')
list_of_subdirs = y.astype(str).apply(
lambda value: [str(file) for file in base_path.joinpath(value).glob('**/*')]
).to_list()
list_of_subdirs
In my case, it returns:
[['/content/sample_data/1973.01/README.md'],
['/content/sample_data/1975.01/california_housing_test.csv',
'/content/sample_data/1975.01/1975.01',
'/content/sample_data/1975.01/.ipynb_checkpoints',
'/content/sample_data/1975.01/1975.01/mnist_test.csv'],
['/content/sample_data/1975.02/california_housing_train.csv']]
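The pathlib documentation describes Path.rglob(pattern) as equivalent to calling glob() with "**/" added in front of the pattern, so Option 3 could also be written with rglob('*'). A small sketch on a temporary tree that mirrors the 1975.01 folder from the tree view, checking that both calls return the same entries:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "1975.01"
    (folder / "1975.01").mkdir(parents=True)  # nested folder, as in the tree view
    (folder / "california_housing_test.csv").touch()
    (folder / "1975.01" / "mnist_test.csv").touch()

    # Relative POSIX-style paths make the comparison independent of the temp location.
    via_glob = sorted(p.relative_to(folder).as_posix() for p in folder.glob("**/*"))
    via_rglob = sorted(p.relative_to(folder).as_posix() for p in folder.rglob("*"))

print(via_glob == via_rglob)  # True
print(via_rglob)  # ['1975.01', '1975.01/mnist_test.csv', 'california_housing_test.csv']
```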
Option 4: Retrieve only the files, in all subdirectories
base_path = Path(r'/content/sample_data')
list_of_subdirs = y.astype(str).apply(
lambda value: [str(file) for file in base_path.joinpath(value).glob('**/*') if file.is_file()]
).to_list()
list_of_subdirs
In my case, it returns:
[
['/content/sample_data/1973.01/README.md'],
[
'/content/sample_data/1975.01/california_housing_test.csv',
'/content/sample_data/1975.01/1975.01/mnist_test.csv'
],
['/content/sample_data/1975.02/california_housing_train.csv']
]
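One practical difference from os.listdir: when a folder named in the spreadsheet does not exist, Path.glob simply yields nothing, whereas os.listdir raises FileNotFoundError, so with the glob-based options a mistyped folder name can go unnoticed as an empty list. A minimal sketch, on a temporary tree where one folder is deliberately left out and the file name a.pdf is hypothetical, of checking is_dir() first:

```python
import tempfile
from pathlib import Path

folder_names = ["1973.01", "1975.01", "1975.02"]

with tempfile.TemporaryDirectory() as tmp:
    base_path = Path(tmp)
    for name in folder_names[:2]:  # deliberately leave out 1975.02
        (base_path / name).mkdir()
        (base_path / name / "a.pdf").touch()

    files = {}
    missing = []
    for name in folder_names:
        folder = base_path / name
        if not folder.is_dir():
            missing.append(name)  # report it instead of returning an empty list
            continue
        # Option 4 logic: only files, at any depth.
        files[name] = sorted(p.name for p in folder.glob("**/*") if p.is_file())

print(files)    # {'1973.01': ['a.pdf'], '1975.01': ['a.pdf']}
print(missing)  # ['1975.02']
```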
For additional context, here is a tree view of all the subdirectories:
sample_data
├── 1973.01
│ └── README.md
├── 1975.01
│ ├── 1975.01
│ │ └── mnist_test.csv
│ └── california_housing_test.csv
├── 1975.02
│ └── california_housing_train.csv
├── anscombe.json
└── mnist_train_small.csv
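For comparison with the question's original os-based attempt, a corrected version iterates over a list of folder names and joins each name, not the leftover k. The sketch below builds a throwaway tree with tempfile as a stand-in for F:/Comune/Breggia_test; the file names are made up so that it can run anywhere:

```python
import os
import tempfile

folder_names = ["1973.01", "1975.01", "1975.02"]

with tempfile.TemporaryDirectory() as base:
    # Stand-in for F:/Comune/Breggia_test: one hypothetical PDF per folder.
    for name in folder_names:
        os.makedirs(os.path.join(base, name))
        open(os.path.join(base, name, f"doc_{name}.pdf"), "w").close()

    listings = {}
    for name in folder_names:
        # Join the current name (not a leftover variable) and list that directory.
        listings[name] = sorted(os.listdir(os.path.join(base, name)))

for name, files in listings.items():
    print(name, files)
```

On the real data, base would be 'F:/Comune/Breggia_test' and folder_names the strings built from the Excel column.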