In a pandas DataFrame, how to conditionally output the last two parts of a list as a single string



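I'm making some modifications to a CSV with pandas. For one of the cases, I want to parse the URL into a list, take the last two items of that list, and output a string that combines those two elements. I'd like to do this in a single line of code that I can plug into an np.where call.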

For example, in the CSV I have the URL "https://companymax/servicecards/city/ININ0000085013D/1664645.TIF", and I want to output the string "ININ0000085013D_1664645". So far I've gotten this far with:

from urllib.parse import urlparse
testurl = "https://companymax/servicecards/city/ININ0000085013D/1664645.TIF"
print(urlparse(testurl).path[1:].split('/')[2:])  # ['ININ0000085013D', '1664645.TIF']
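A minimal sketch of collapsing that list into one string: take the last two path segments and join them with "_". The question's target string drops ".TIF" while the extension is otherwise kept, so both variants are shown; using `posixpath.splitext` to strip the extension is an assumption about what is wanted.

```python
from posixpath import splitext
from urllib.parse import urlparse

testurl = "https://companymax/servicecards/city/ININ0000085013D/1664645.TIF"

# One line: last two path segments (subfolder, file name) joined with "_"
joined = '_'.join(urlparse(testurl).path.split('/')[-2:])
print(joined)  # ININ0000085013D_1664645.TIF

# Variant that also drops the file extension (".TIF")
folder, filename = urlparse(testurl).path.split('/')[-2:]
no_ext = f"{folder}_{splitext(filename)[0]}"
print(no_ext)  # ININ0000085013D_1664645
```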

However, I need the urlparse line to produce its output as a string in a form I can push into an np.where statement like the one below, where x is the string above.

import pandas
import numpy as np
svc_df = pandas.read_csv(r"\filelocServiceLines.txt",
                         usecols=['Location', 'URLName', 'createdate'],
                         dtype={'Location': 'string', 'URLName': 'string'},
                         parse_dates=['createdate'])
# Create FieldNote column based on URLName
svc_df['FieldNote'] = np.where(svc_df['URLName'].str.contains('servicecards'),
                               x,
                               svc_df['URLName'].apply(lambda x: x[x.rfind('/')+1:]))

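I also feel like I'm getting lost in the weeds here; maybe there's a simpler way to do this? Basically, I'm trying to create the FieldNote column from URLName: it takes the file name (after the last /), unless URLName contains "servicecards" (only those are duplicated), in which case I want the subfolder name + file name.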
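In the np.where call, x has to be a per-row Series (or array) aligned with svc_df rather than a single string, because np.where chooses element-wise between its second and third arguments. A minimal sketch using pandas' vectorized `.str` methods; the two-row DataFrame is hypothetical sample data standing in for the CSV, and `na=False` is an assumption so that missing URLs are treated as non-matching.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the CSV's URLName column
svc_df = pd.DataFrame({
    'URLName': [
        "https://companymax/servicecards/city/ININ0000085013D/1664645.TIF",
        "https://companymax/otherstring/city/ININ0000085013E/1664646.TIF",
    ]
})

parts = svc_df['URLName'].str.split('/')
# "x" from the question: subfolder + file name joined with "_", one value per row
x = parts.str[-2:].str.join('_')
# Default branch: only the file name after the last "/"
filename = parts.str[-1]

svc_df['FieldNote'] = np.where(
    svc_df['URLName'].str.contains('servicecards', regex=False, na=False),
    x,
    filename,
)
print(svc_df['FieldNote'].tolist())
# ['ININ0000085013D_1664645.TIF', '1664646.TIF']
```

Because both branches are computed column-wise, no Python-level function is called per row.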

As an alternative, you can use the pandas apply function to get behavior similar to the where command.

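def get_field(d):
    # rsplit with maxsplit=2 -> [everything before, subfolder, file name]
    s = d.rsplit('/', 2)
    if 'servicecards' in d:
        # subfolder + file name, e.g. ININ0000085013D_1664645.TIF
        return '_'.join(s[-2:])
    # otherwise just the file name after the last '/'
    return s[-1]

df['FieldNote'] = df['URLName'].apply(get_field)
print(df)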

Output of df:

                                                            URLName                    FieldNote
0  https://companymax/servicecards/city/ININ0000085013D/1664645.TIF  ININ0000085013D_1664645.TIF
1   https://companymax/otherstring/city/ININ0000085013E/1664646.TIF                  1664646.TIF
2   https://companymax/otherstring/city/ININ0000085013F/1664647.TIF                  1664647.TIF
3   https://companymax/otherstring/city/ININ0000085013G/1664648.TIF                  1664648.TIF
4  https://companymax/servicecards/city/ININ0000085013H/1664649.TIF  ININ0000085013H_1664649.TIF
5  https://companymax/servicecards/city/ININ0000085013I/1664650.TIF  ININ0000085013I_1664650.TIF
6   https://companymax/otherstring/city/ININ0000085013J/1664651.TIF                  1664651.TIF
7  https://companymax/servicecards/city/ININ0000085013K/1664652.TIF  ININ0000085013K_1664652.TIF
8   https://companymax/otherstring/city/ININ0000085013L/1664653.TIF                  1664653.TIF
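One caveat with get_field, assuming URLName can contain empty cells: with dtype 'string' those arrive as pd.NA (or NaN with the default dtype), and `d.rsplit` would raise on them. A hedged sketch that passes missing values through unchanged; the `isinstance` guard is an addition not in the answer above, and the sample rows are hypothetical.

```python
import pandas as pd

def get_field(d):
    # Added guard: missing URLs (pd.NA / NaN) are returned as-is instead of raising
    if not isinstance(d, str):
        return d
    s = d.rsplit('/', 2)
    if 'servicecards' in d:
        return '_'.join(s[-2:])
    return s[-1]

# Hypothetical sample with a missing URL, using the same 'string' dtype as the question
df = pd.DataFrame({'URLName': pd.array([
    "https://companymax/servicecards/city/ININ0000085013D/1664645.TIF",
    None,
    "https://companymax/otherstring/city/ININ0000085013E/1664646.TIF",
], dtype='string')})

df['FieldNote'] = df['URLName'].apply(get_field)
print(df)
```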
