I'm not sure how best to phrase this question, but here goes.
I have this df:
df
JOB_STREAM_NAME JOB_NAME JOB_Command
0 P26_NEXT_MAU_TOD PP_NEXT_RTBA_MAU_IND_INVE_D /data/application/AANX/aanx-dataeng-slas-sysyphus/scripts/s_shell/call_iws/call_PP_NEXT_RTBA_MAU_IND_INVE_D.sh
1 P26_NEXT_MAU_TOD PP_NEXT_RTBA_MAU_IND_EMPF_D /data/application/AANX/aanx-dataeng-slas-sysyphus/scripts/s_shell/call_iws/call_PP_NEXT_RTBA_MAU_IND_EMPF_D.sh
2 P26_NEXT_NBA_TOD PP_NEXT_NBA_AS110001_D /data/app_next_best_action/call_nba_as11.sh
3 P26_AAIN_TOD PP_AAIN_SPARK_CDLC_ING_DFLT_D /data/application/AAIN/aain-srv-motor-extracao-next/iws/call_run_extract_default.sh cdlc_ing
I want to get the last-modified date (from the Linux OS) of the 4th item in the path tree of JOB_Command, for example:
Folder aanx-dataeng-slas-sysyphus:
[m292121@mz-vl-vb-415 ~]$ ll /data/application/AANX/
total 1348
drwxrwsr-x 12 root bgdt 4096 Sep 26 11:30 aanx-dataeng-slas-sysyphus
Here there is no 4th item, so it should take the last one, which is the file call_nba_as11.sh:
[m292121@al-vl-vb-408 ~]$ ll /data/app_next_best_action/call_nba_as11.sh
-rwxrwsr-x 1 root bgdt 371 Sep 20 19:20 /data/app_next_best_action/call_nba_as11.sh
Folder aain-srv-motor-extracao-next:
[m292121@mz-vl-vb-415 ~]$ ll /data/application/AAIN/
total 136
drwxrwsr-x 12 root bgdt 4096 Jul 15 10:30 aain-srv-motor-extracao-next
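The date that `ll` (typically an alias for `ls -l`) prints is the last-modification time (mtime). A minimal sketch of reading the same value from Python, run against a temporary directory instead of the real /data folders; the helper name `modified_at` is hypothetical, and the year 2022 is assumed from the expected output:

```python
import datetime
import os
import tempfile
from pathlib import Path


def modified_at(path):
    """Return the last-modification time (st_mtime) of `path` as a local datetime."""
    return datetime.datetime.fromtimestamp(Path(path).stat().st_mtime)


# Demo on a throwaway directory instead of the real /data/... folders
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "aanx-dataeng-slas-sysyphus"
    folder.mkdir()
    # Pin the mtime to Sep 26 2022 11:30 (local time), like the listing above
    ts = datetime.datetime(2022, 9, 26, 11, 30).timestamp()
    os.utime(folder, (ts, ts))
    stamp = modified_at(folder).strftime("%Y-%m-%d %H:%M:%S")

print(stamp)  # 2022-09-26 11:30:00
```

For recent files `ls -l` omits the year and the seconds, so the full timestamp taken from `st_mtime` can contain seconds that the listing does not show.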
Basically, I'm trying to get this:
df
JOB_STREAM_NAME JOB_NAME Last_Update JOB_Command
0 P26_NEXT_MAU_TOD PP_NEXT_RTBA_MAU_IND_INVE_D 2022-09-26 11:30:00 /data/application/AANX/aanx-dataeng-slas-sysyphus/scripts/s_shell/call_iws/call_PP_NEXT_RTBA_MAU_IND_INVE_D.sh
1 P26_NEXT_MAU_TOD PP_NEXT_RTBA_MAU_IND_EMPF_D 2022-09-26 11:30:00 /data/application/AANX/aanx-dataeng-slas-sysyphus/scripts/s_shell/call_iws/call_PP_NEXT_RTBA_MAU_IND_EMPF_D.sh
2 P26_NEXT_NBA_TOD PP_NEXT_NBA_AS110001_D 2022-09-20 19:20:00 /data/app_next_best_action/call_nba_as11.sh
3 P26_AAIN_TOD PP_AAIN_SPARK_CDLC_ING_DFLT_D 2022-07-15 10:30:00 /data/application/AAIN/aain-srv-motor-extracao-next/iws/call_run_extract_default.sh cdlc_ing
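I thought about splitting JOB_Command into a new column and using that for the lookup, but I still need to figure out how to get the information.
Any ideas?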
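The idea of splitting JOB_Command into a helper column can be sketched with pandas string methods. This only builds the path whose date is wanted (the 4th component, or the whole path if it is shorter); the column name `Target_Path` is hypothetical, and the sketch assumes the script path itself contains no spaces:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "JOB_Command": [
            "/data/application/AANX/aanx-dataeng-slas-sysyphus/scripts/s_shell/call_iws/call_PP_NEXT_RTBA_MAU_IND_INVE_D.sh",
            "/data/app_next_best_action/call_nba_as11.sh",
            "/data/application/AAIN/aain-srv-motor-extracao-next/iws/call_run_extract_default.sh cdlc_ing",
        ]
    }
)

# 1. Keep only the script path (drop arguments such as "cdlc_ing")
script = df["JOB_Command"].str.split().str[0]
# 2. Split on "/" (the leading "" comes from the absolute path), keep ""
#    plus at most four components, and join them back together
df["Target_Path"] = script.str.split("/").str[:5].str.join("/")
print(df["Target_Path"].tolist())
```

The date for each row can then be read from `Target_Path` with `os.stat` or `Path.stat`, as long as the code runs on the host that has these folders.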
Using the dataframe you provided:
import pandas as pd

df = pd.DataFrame(
    {
        "JOB_STREAM_NAME": [
            "P26_NEXT_MAU_TOD",
            "P26_NEXT_MAU_TOD",
            "P26_NEXT_NBA_TOD",
            "P26_AAIN_TOD",
        ],
        "JOB_NAME": [
            "PP_NEXT_RTBA_MAU_IND_INVE_D",
            "PP_NEXT_RTBA_MAU_IND_EMPF_D",
            "PP_NEXT_NBA_AS110001_D",
            "PP_AAIN_SPARK_CDLC_ING_DFLT_D",
        ],
        "JOB_Command": [
            "/data/application/AANX/aanx-dataeng-slas-sysyphus/scripts/s_shell/call_iws/call_PP_NEXT_RTBA_MAU_IND_INVE_D.sh",
            "/data/application/AANX/aanx-dataeng-slas-sysyphus/scripts/s_shell/call_iws/call_PP_NEXT_RTBA_MAU_IND_EMPF_D.sh",
            "/data/app_next_best_action/call_nba_as11.sh",
            "/data/application/AAIN/aain-srv-motor-extracao-next/iws/call_run_extract_default.sh cdlc_ing",
        ],
    }
)
Here is one way to do it with the pathlib and datetime modules from Python's standard library (plus numpy for NaN):
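import datetime
from pathlib import Path

import numpy as np


def get_fourth_elem(command):
    """Helper function.

    Args:
        command: JOB_Command string (script path, optionally followed by arguments).

    Returns:
        Absolute path to the fourth element (or the last one if the path is shorter)
        as a pathlib.Path object.
    """
    # Keep only the script path and drop trailing arguments such as "cdlc_ing"
    file_path = command.split()[0]
    file_path_length = len(file_path.strip("/").split("/"))
    file_path = Path(file_path)
    # Walk up one level at a time until only four components remain
    # (range() is empty when the path has four components or fewer)
    for _ in range(file_path_length - 4):
        file_path = file_path.parent
    return file_path


# Check the path that is actually stat'ed: Path(x).exists() on the raw command
# is normally False when the command carries arguments (row 3 would become NaN)
df["Last_Update"] = df["JOB_Command"].apply(
    lambda x: datetime.datetime.fromtimestamp(
        get_fourth_elem(x).stat().st_mtime
    ).strftime("%Y-%m-%d %H:%M:%S")  # %M is minutes (%H:%H would repeat the hour)
    if get_fourth_elem(x).exists()
    else np.nan
)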
df = df.reindex(columns=["JOB_STREAM_NAME", "JOB_NAME", "Last_Update", "JOB_Command"])
print(df)
# Output
JOB_STREAM_NAME JOB_NAME Last_Update JOB_Command
0 P26_NEXT_MAU_TOD PP_NEXT_RTBA_MAU_IND_INVE_D 2022-09-26 11:30:00 /data/application/AANX/aanx-dataeng-slas-sysyphus/scripts/s_shell/call_iws/call_PP_NEXT_RTBA_MAU_IND_INVE_D.sh
1 P26_NEXT_MAU_TOD PP_NEXT_RTBA_MAU_IND_EMPF_D 2022-09-26 11:30:00 /data/application/AANX/aanx-dataeng-slas-sysyphus/scripts/s_shell/call_iws/call_PP_NEXT_RTBA_MAU_IND_EMPF_D.sh
2 P26_NEXT_NBA_TOD PP_NEXT_NBA_AS110001_D 2022-09-20 19:20:00 /data/app_next_best_action/call_nba_as11.sh
3 P26_AAIN_TOD PP_AAIN_SPARK_CDLC_ING_DFLT_D 2022-07-15 10:30:00 /data/application/AAIN/aain-srv-motor-extracao-next/iws/call_run_extract_default.sh cdlc_ing
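The approach can be exercised without access to the real /data tree. The sketch below is a variant, not the answer's exact code: it replaces the parent-walking loop with `Path.parts` slicing and adds a hypothetical `depth` parameter (default 4, matching the question) so that the same rule can be pointed at a temporary directory whose mtime is pinned with `os.utime`. The names `nth_elem`, `last_update` and `depth` are illustrative, and it assumes absolute POSIX-style paths with no spaces in the script path:

```python
import datetime
import os
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd


def nth_elem(command, depth=4):
    """Script path cut to `depth` components below the root (whole path if shorter)."""
    path = Path(command.split()[0])  # drop arguments such as "cdlc_ing"
    # For an absolute path, parts[0] is the root "/", so keep it plus `depth` components
    return Path(*path.parts[: depth + 1])


def last_update(command, depth=4):
    """mtime of nth_elem(command, depth) as 'YYYY-MM-DD HH:MM:SS', or NaN if missing."""
    target = nth_elem(command, depth)
    if not target.exists():
        return np.nan
    return datetime.datetime.fromtimestamp(target.stat().st_mtime).strftime(
        "%Y-%m-%d %H:%M:%S"
    )


with tempfile.TemporaryDirectory() as tmp:
    app = Path(tmp) / "app"  # plays the role of e.g. aanx-dataeng-slas-sysyphus
    script = app / "scripts" / "call_job.sh"
    script.parent.mkdir(parents=True)
    script.write_text("#!/bin/sh\n")
    # Set the folder's mtime last: creating "scripts" inside it updates that mtime
    ts = datetime.datetime(2022, 9, 26, 11, 30).timestamp()
    os.utime(app, (ts, ts))
    # In the real data the folder is the 4th component; here it sits deeper under tmp
    depth = len(app.parts) - 1
    jobs = pd.DataFrame({"JOB_Command": [f"{script} some_arg", "/no/such/dir/call.sh"]})
    jobs["Last_Update"] = jobs["JOB_Command"].apply(lambda c: last_update(c, depth))
    result = jobs["Last_Update"].tolist()

print(result)
```

Slicing `Path.parts` gives the same folder as walking up with `.parent`, but states the "keep the first four components" rule directly.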