Python pandas regex to find a pattern from another row



I have a Python pandas DataFrame whose file_path column follows this pattern:

file_path
/home
/home/folder1
/home/folder1/file1.xlsx
/home/folder1/file2.xlsx
/home/folder2
/home/folder2/date
/home/folder2/date/dates.txt
/home/folder3

Extract the parent with pathlib.Path.parent, as shown below:

import pathlib

import pandas as pd

df = pd.DataFrame(
    ["/home", "/home/folder1", "/home/folder1/file1.xlsx",
     "/home/folder1/file1.xlsx", "/home/folder1/file2.xlsx", "/home/folder2",
     "/home/folder2/date", "/home/folder2/date/dates.txt", "/home/folder3"],
    columns=["file_path"],
)

# Path.parent drops the last component of each path, e.g. "/home/folder1" -> "/home"
df["parent"] = df["file_path"].apply(lambda x: pathlib.Path(x).parent)
print(df)

Output

                      file_path              parent
0                         /home                   /
1                 /home/folder1               /home
2      /home/folder1/file1.xlsx       /home/folder1
3      /home/folder1/file1.xlsx       /home/folder1
4      /home/folder1/file2.xlsx       /home/folder1
5                 /home/folder2               /home
6            /home/folder2/date       /home/folder2
7  /home/folder2/date/dates.txt  /home/folder2/date
8                 /home/folder3               /home
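Since the title mentions regex, here is a minimal sketch of a regex-based alternative, assuming every path is absolute, POSIX-style and has no trailing slash. It swaps in pandas' vectorized Series.str.replace to strip the last path component instead of calling pathlib per row; unlike the approach above, it returns plain strings rather than Path objects.

```python
import pandas as pd

df = pd.DataFrame(
    {"file_path": ["/home", "/home/folder1", "/home/folder1/file1.xlsx",
                   "/home/folder2/date", "/home/folder2/date/dates.txt"]}
)

# Drop the final "/component" from each path (assumes no trailing slash).
parent = df["file_path"].str.replace(r"/[^/]+$", "", regex=True)

# A top-level entry such as "/home" collapses to "", so map it back to "/",
# which is what pathlib.Path("/home").parent gives.
df["parent"] = parent.replace("", "/")
print(df)
```

Paths with trailing slashes or relative paths would need a different pattern, which is one reason pathlib is usually the safer choice.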

To match your exact expected output, where the parent of a top-level entry is shown as ROOT instead of /:

# Same as above, but replace the root parent "/" with "ROOT" (":=" needs Python 3.8+)
df["parent"] = df["file_path"].apply(
    lambda x: res if (res := pathlib.Path(x).parent) != pathlib.Path("/") else "ROOT"
)
print(df)

Output

                      file_path              parent
0                         /home                ROOT
1                 /home/folder1               /home
2      /home/folder1/file1.xlsx       /home/folder1
3      /home/folder1/file1.xlsx       /home/folder1
4      /home/folder1/file2.xlsx       /home/folder1
5                 /home/folder2               /home
6            /home/folder2/date       /home/folder2
7  /home/folder2/date/dates.txt  /home/folder2/date
8                 /home/folder3               /home
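If the walrus operator is not available (it needs Python 3.8 or later), a sketch that avoids it computes the parents once as strings and then swaps "/" for "ROOT" with Series.where. Using pathlib.PurePosixPath here is an assumption on my part: it applies POSIX path rules regardless of the operating system and never touches the filesystem.

```python
import pathlib

import pandas as pd

df = pd.DataFrame(
    {"file_path": ["/home", "/home/folder1", "/home/folder1/file2.xlsx",
                   "/home/folder3"]}
)

# Parent of each path as a string, using pure POSIX path logic.
parents = df["file_path"].map(lambda p: str(pathlib.PurePosixPath(p).parent))

# Keep every parent except the filesystem root, which becomes "ROOT".
df["parent"] = parents.where(parents != "/", "ROOT")
print(df)
```

Keeping the column as strings also avoids mixing Path objects and the "ROOT" string in one object column.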
