I have a dataframe like this:
file name folder 1 folder 2 folder 3
0 1.txt 2021-05-11 14:04:30 2021-05-11 15:04:30 2021-05-11 16:04:30
1 2.txt 2021-05-12 14:04:30 2021-05-12 15:04:30 NaN
2 3.txt 2021-05-13 14:04:30 NaN NaN
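So, I have some files, and I record a timestamp whenever they are moved to another folder. Basically, what I want is to create another column in this dataframe named "Location of the file", so that if the last timestamp is in a particular folder, that folder's name goes into the new column. The ideal output looks like this: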
file name folder 1 folder 2 folder 3 Location of the file
0 1.txt 2021-05-11 14:04:30 2021-05-11 15:04:30 2021-05-11 16:04:30 file is in folder 3
1 2.txt 2021-05-12 14:04:30 2021-05-12 15:04:30 NaN file is in folder 2
2 3.txt 2021-05-13 14:04:30 NaN NaN file is in folder 1
Edit:
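Hey @Andrej, in my real scenario I have some columns between "file name" and "folder 1", like the one I added here to show the file type. If I use your solution, the location of file 4.xlsx comes out as in this dataframe: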
file name filetype .. folder 1 .. folder 3 Loc. of file
0 1.txt txt .. 2021-05-11 14:04:30 .. 2021-05-11 16:04:30 file is in folder 3
1 2.txt txt .. 2021-05-12 14:04:30 .. NaN file is in folder 2
2 3.txt txt .. 2021-05-13 14:04:30 .. NaN file is in folder 3
3 4.xlsx xlsx .. NaN .. NaN file is in filetype
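Can I somehow get the location of the file only from the columns "folder 1", "folder 2" and "folder 3"? And if there is no timestamp in folder 1, put NaN in the "Location of the file" column, as in the df below: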
file name filetype .. folder 1 .. folder 3 Loc. of file
0 1.txt txt .. 2021-05-11 14:04:30 .. 2021-05-11 16:04:30 file is in folder 3
1 2.txt txt .. 2021-05-12 14:04:30 .. NaN file is in folder 2
2 3.txt txt .. 2021-05-13 14:04:30 .. NaN file is in folder 3
3 4.xlsx xlsx .. NaN .. NaN NaN
# For each row, take the label of the last non-empty cell
df["Location of the file"] = df.apply(
    lambda x: "File is in {}".format(x.index[x.notna()][-1]),
    axis=1,
)
print(df)
Prints:
file name folder 1 folder 2 folder 3 Location of the file
0 1.txt 2021-05-11 14:04:30 2021-05-11 15:04:30 2021-05-11 16:04:30 File is in folder 3
1 2.txt 2021-05-12 14:04:30 2021-05-12 15:04:30 NaN File is in folder 2
2 3.txt 2021-05-13 14:04:30 NaN NaN File is in folder 1
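To see what the lambda does, here is a small sketch on one hand-built row (the 2.txt row from the question), which is what apply(axis=1) passes in: x.notna() is a boolean mask, x.index[mask] keeps the labels of the filled cells, and [-1] picks the right-most one. This assumes the folder columns appear left to right in the order the file visits them.

```python
import numpy as np
import pandas as pd

# One row of the question's data, as apply(axis=1) would hand it to the lambda
row = pd.Series(
    {
        "file name": "2.txt",
        "folder 1": "2021-05-12 14:04:30",
        "folder 2": "2021-05-12 15:04:30",
        "folder 3": np.nan,
    }
)

mask = row.notna()               # True for every filled cell
filled_labels = row.index[mask]  # labels 'file name', 'folder 1', 'folder 2'
last_label = filled_labels[-1]   # label of the right-most filled cell
print("File is in {}".format(last_label))  # File is in folder 2
```

Because "file name" is always filled, a row with no timestamps would report "file name" (or "filetype" in the edited example), which is exactly the problem raised in the question's edit.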
Edit: handle rows where every folder is NaN:
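import numpy as np

# Only look at the folder columns, so "file name", "filetype" etc. are ignored
df["Location of the file"] = df.loc[:, "folder 1":"folder 3"].apply(
    lambda x: "File is in {}".format(x.index[x.notna()][-1])
    if not x.isna().all()
    else np.nan,  # no timestamp in any folder
    axis=1,
)
print(df)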
Prints:
file name folder 1 folder 2 folder 3 Location of the file
0 1.txt 2021-05-11 14:04:30 2021-05-11 15:04:30 2021-05-11 16:04:30 File is in folder 3
1 2.txt 2021-05-12 14:04:30 2021-05-12 15:04:30 NaN File is in folder 2
2 3.txt NaN NaN NaN NaN
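A possible alternative, sketched under assumptions: select the folder columns by name and use pandas' Series.last_valid_index(), which returns None when a row has no non-null value. The sample frame is hand-built from the question's tables (with the 4.xlsx row from the edit), and folder_cols is an assumption about the real column names.

```python
import numpy as np
import pandas as pd

# Hand-built sample: a "filetype" column sits before the folders,
# and 4.xlsx has no timestamps at all
df = pd.DataFrame(
    {
        "file name": ["1.txt", "2.txt", "3.txt", "4.xlsx"],
        "filetype": ["txt", "txt", "txt", "xlsx"],
        "folder 1": ["2021-05-11 14:04:30", "2021-05-12 14:04:30", "2021-05-13 14:04:30", np.nan],
        "folder 2": ["2021-05-11 15:04:30", "2021-05-12 15:04:30", np.nan, np.nan],
        "folder 3": ["2021-05-11 16:04:30", np.nan, np.nan, np.nan],
    }
)

# Assumption: these are the only columns that describe a location
folder_cols = ["folder 1", "folder 2", "folder 3"]

# Label of the last non-null folder per row, or None if the row is empty
last_folder = df[folder_cols].apply(lambda row: row.last_valid_index(), axis=1)

# Build the text only where a folder was found; empty rows get NaN
df["Location of the file"] = last_folder.map(
    lambda f: "File is in {}".format(f) if pd.notna(f) else np.nan
)
print(df)
```

Since only folder_cols is inspected, extra columns such as filetype can never be reported as the location.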
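You can use .idxmax() on axis=1 (the column index) to get the index label of the element with the maximum datetime value, then concatenate it with the text "file is in " to get the new column's contents, as follows: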
df['Location of the file'] = 'file is in ' + df.iloc[:, 1:].idxmax(axis=1)
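Note that this solution does not assume that the timestamps in folder 1, folder 2 and folder 3 are in chronological order (folder 3 later than folder 2, and folder 2 later than folder 1). As long as these 3 columns hold datetime values (datetime64, e.g. after pd.to_datetime), they can be compared to find the latest time.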
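The idxmax approach needs real datetime columns, and rows with no timestamp at all need care because newer pandas versions deprecate (and eventually reject) idxmax on all-NA rows. A minimal sketch under those assumptions: convert the (assumed) folder columns with pd.to_datetime, run idxmax only on rows that have at least one timestamp, then reindex so the remaining rows get NaN. The sample data is hand-built for illustration and includes the 4.xlsx row from the question's edit.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "file name": ["1.txt", "2.txt", "3.txt", "4.xlsx"],
        "filetype": ["txt", "txt", "txt", "xlsx"],
        "folder 1": ["2021-05-11 14:04:30", "2021-05-12 14:04:30", "2021-05-13 14:04:30", None],
        "folder 2": ["2021-05-11 15:04:30", "2021-05-12 15:04:30", None, None],
        "folder 3": ["2021-05-11 16:04:30", None, None, None],
    }
)

# Assumption: these are the only location columns
folder_cols = ["folder 1", "folder 2", "folder 3"]

# idxmax compares values, so the folder columns must be datetimes, not strings
folders = df[folder_cols].apply(pd.to_datetime)

# Skip rows with no timestamp at all (idxmax on all-NaT rows is deprecated)
has_time = folders.notna().any(axis=1)
latest = folders[has_time].idxmax(axis=1)

# Reindexing back to df's index leaves NaN for the skipped rows
df["Location of the file"] = ("file is in " + latest).reindex(df.index)
print(df)
```

Restricting to folder_cols also keeps columns like filetype out of the comparison.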
Result:
print(df)
file name folder 1 folder 2 folder 3 Location of the file
0 1.txt 2021-05-11 14:04:30 2021-05-11 15:04:30 2021-05-11 16:04:30 file is in folder 3
1 2.txt 2021-05-12 14:04:30 2021-05-12 15:04:30 NaT file is in folder 2
2 3.txt 2021-05-13 14:04:30 NaT NaT file is in folder 1