Get date information from a folder in Linux and save it in a column of my DataFrame



I'm not sure how to ask this question, but here goes.

I have this df:

df

JOB_STREAM_NAME         JOB_NAME                        JOB_Command
0   P26_NEXT_MAU_TOD    PP_NEXT_RTBA_MAU_IND_INVE_D     /data/application/AANX/aanx-dataeng-slas-sysyphus/scripts/s_shell/call_iws/call_PP_NEXT_RTBA_MAU_IND_INVE_D.sh
1   P26_NEXT_MAU_TOD    PP_NEXT_RTBA_MAU_IND_EMPF_D     /data/application/AANX/aanx-dataeng-slas-sysyphus/scripts/s_shell/call_iws/call_PP_NEXT_RTBA_MAU_IND_EMPF_D.sh
2   P26_NEXT_NBA_TOD    PP_NEXT_NBA_AS110001_D          /data/app_next_best_action/call_nba_as11.sh
3   P26_AAIN_TOD        PP_AAIN_SPARK_CDLC_ING_DFLT_D   /data/application/AAIN/aain-srv-motor-extracao-next/iws/call_run_extract_default.sh cdlc_ing

For each row, I want to get the date (from the Linux OS) of the 4th item in the path tree in JOB_Command.

Folder aanx-dataeng-slas-sysyphus:

[m292121@mz-vl-vb-415 ~]$ ll /data/application/AANX/
total 1348
ldrwxrwsr-x 12 root bgdt 4096 Sep 26 11:30 aanx-dataeng-slas-sysyphus

Here there is no 4th item, so it takes the last one, which is the file call_nba_as11.sh:

[m292121@al-vl-vb-408 ~]$ ll /data/app_next_best_action/call_nba_as11.sh
-rwxrwsr-x 1 root bgdt 371 Sep 20 19:20 /data/app_next_best_action/call_nba_as11.sh

Folder aain-srv-motor-extracao-next:

[m292121@mz-vl-vb-415 ~]$ ll /data/application/AAIN/
total 136
ldrwxrwsr-x 12 root bgdt 4096 Jul 15 10:30 aain-srv-motor-extracao-next

Basically, this is what I am trying to achieve:

df

JOB_STREAM_NAME         JOB_NAME                        Last_Update         JOB_Command
0   P26_NEXT_MAU_TOD    PP_NEXT_RTBA_MAU_IND_INVE_D     2022-09-26 11:30:00 /data/application/AANX/aanx-dataeng-slas-sysyphus/scripts/s_shell/call_iws/call_PP_NEXT_RTBA_MAU_IND_INVE_D.sh
1   P26_NEXT_MAU_TOD    PP_NEXT_RTBA_MAU_IND_EMPF_D     2022-09-26 11:30:00 /data/application/AANX/aanx-dataeng-slas-sysyphus/scripts/s_shell/call_iws/call_PP_NEXT_RTBA_MAU_IND_EMPF_D.sh
2   P26_NEXT_NBA_TOD    PP_NEXT_NBA_AS110001_D          2022-09-20 19:20:00 /data/app_next_best_action/call_nba_as11.sh
3   P26_AAIN_TOD        PP_AAIN_SPARK_CDLC_ING_DFLT_D   2022-07-15 10:30:00 /data/application/AAIN/aain-srv-motor-extracao-next/iws/call_run_extract_default.sh cdlc_ing

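I was thinking of splitting JOB_Command into a new column and using that for the lookup, but I still need to figure out how to get the date information.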
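A minimal sketch of that splitting idea, using only the standard library. The helper name `lookup_path` and its `depth` parameter are made up for illustration, and it assumes that anything after the first space in JOB_Command is an argument rather than part of the path:

```python
import shlex
from pathlib import PurePosixPath


def lookup_path(command, depth=4):
    """Return the first `depth` path elements of the script in `command`.

    `lookup_path` and `depth` are hypothetical names used only in this sketch.
    """
    # Drop any arguments after the script path, e.g. "... .sh cdlc_ing".
    script = shlex.split(command)[0]
    parts = PurePosixPath(script).parts  # ("/", "data", "application", ...)
    # parts[0] is the root "/", so keep depth + 1 entries.
    # Paths shorter than `depth` elements are returned whole.
    return str(PurePosixPath(*parts[: depth + 1]))


commands = [
    "/data/app_next_best_action/call_nba_as11.sh",
    "/data/application/AAIN/aain-srv-motor-extracao-next/iws/call_run_extract_default.sh cdlc_ing",
]
results = [lookup_path(c) for c in commands]
print(results)
```

On the dataframe this could be applied as `df["JOB_Command"].apply(lookup_path)` to build the new column.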

Any ideas?

With the dataframe you provided:

import pandas as pd

df = pd.DataFrame(
    {
        "JOB_STREAM_NAME": [
            "P26_NEXT_MAU_TOD",
            "P26_NEXT_MAU_TOD",
            "P26_NEXT_NBA_TOD",
            "P26_AAIN_TOD",
        ],
        "JOB_NAME": [
            "PP_NEXT_RTBA_MAU_IND_INVE_D",
            "PP_NEXT_RTBA_MAU_IND_EMPF_D",
            "PP_NEXT_NBA_AS110001_D",
            "PP_AAIN_SPARK_CDLC_ING_DFLT_D",
        ],
        "JOB_Command": [
            "/data/application/AANX/aanx-dataeng-slas-sysyphus/scripts/s_shell/call_iws/call_PP_NEXT_RTBA_MAU_IND_INVE_D.sh",
            "/data/application/AANX/aanx-dataeng-slas-sysyphus/scripts/s_shell/call_iws/call_PP_NEXT_RTBA_MAU_IND_EMPF_D.sh",
            "/data/app_next_best_action/call_nba_as11.sh",
            "/data/application/AAIN/aain-srv-motor-extracao-next/iws/call_run_extract_default.sh cdlc_ing",
        ],
    }
)

Here is one way to do it with the pathlib and datetime modules from the Python standard library:

import datetime
import numpy as np
from pathlib import Path


def get_fourth_elem(file_path):
    """Helper function.
    Args:
        file_path: file path as a string.
    Returns:
        absolute path to the fourth element (or last one if shorter) as a pathlib.Path object.
    """
    file_path_length = len(file_path.strip("/").split("/"))
    file_path = Path(file_path)
    if file_path_length > 4:
        # Walk up the tree until only the first four elements remain
        for _ in range(file_path_length - 4):
            file_path = file_path.parent
    return file_path


df["Last_Update"] = df["JOB_Command"].apply(
    lambda x: datetime.datetime.fromtimestamp(
        get_fourth_elem(x).stat().st_mtime
    ).strftime("%Y-%m-%d %H:%M:%S")  # %M is minutes
    # Test the target path itself: a command with trailing arguments
    # (e.g. "... .sh cdlc_ing") is not an existing file
    if get_fourth_elem(x).exists()
    else np.nan
)
df = df.reindex(columns=["JOB_STREAM_NAME", "JOB_NAME", "Last_Update", "JOB_Command"])
print(df)
# Output
JOB_STREAM_NAME         JOB_NAME                        Last_Update         JOB_Command
0   P26_NEXT_MAU_TOD    PP_NEXT_RTBA_MAU_IND_INVE_D     2022-09-26 11:30:00 /data/application/AANX/aanx-dataeng-slas-sysyphus/scripts/s_shell/call_iws/call_PP_NEXT_RTBA_MAU_IND_INVE_D.sh
1   P26_NEXT_MAU_TOD    PP_NEXT_RTBA_MAU_IND_EMPF_D     2022-09-26 11:30:00 /data/application/AANX/aanx-dataeng-slas-sysyphus/scripts/s_shell/call_iws/call_PP_NEXT_RTBA_MAU_IND_EMPF_D.sh
2   P26_NEXT_NBA_TOD    PP_NEXT_NBA_AS110001_D          2022-09-20 19:20:00 /data/app_next_best_action/call_nba_as11.sh
3   P26_AAIN_TOD        PP_AAIN_SPARK_CDLC_ING_DFLT_D   2022-07-15 10:30:00 /data/application/AAIN/aain-srv-motor-extracao-next/iws/call_run_extract_default.sh cdlc_ing
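The date that `ll` prints is the modification time, which Python exposes as `st_mtime`. As a rough, self-contained check of that lookup step, the sketch below sets a known modification time on a temporary folder and reads it back. The helper name `last_update` and the folder name are placeholders for illustration, not part of the code above:

```python
import os
import tempfile
from datetime import datetime
from pathlib import Path


def last_update(path, fmt="%Y-%m-%d %H:%M:%S"):
    """Return the formatted modification time of `path`, or None if it is missing.

    `last_update` is a hypothetical helper used only in this sketch.
    """
    try:
        mtime = Path(path).stat().st_mtime
    except OSError:
        # Missing or unreadable path: no date available.
        return None
    return datetime.fromtimestamp(mtime).strftime(fmt)


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "example-folder"  # placeholder name
    folder.mkdir()
    # Set access and modification times to a known local timestamp.
    known = datetime(2022, 9, 26, 11, 30).timestamp()
    os.utime(folder, (known, known))
    existing = last_update(folder)
    missing = last_update(folder / "does-not-exist")

print(existing)
print(missing)
```

Returning None instead of raising lets the result go straight into a dataframe column, where missing paths show up as empty values.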
