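I am making some modifications to a CSV with pandas. For one of the cases, I want to parse a URL into a list, take the last two items of that list, and output a string that combines those two elements. I would like to do this in a single line that I can plug into an np.where case.
For example, the CSV contains the URL "https://companymax/servicecards/city/ININ0000085013D/1664645.TIF", and I want to output the string "ININ0000085013D_1664645". So far I have managed to get the two pieces as a list with: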
from urllib.parse import urlparse
testurl = "https://companymax/servicecards/city/ININ0000085013D/1664645.TIF"
print(urlparse(testurl).path[1:].split('/')[2:])
However, I need the urlparse line to produce a string, in a form I can push into the np.where statement shown below, where x stands for that string.
import pandas
import numpy as np
svc_df = pandas.read_csv(r"\filelocServiceLines.txt",
                         usecols=['Location', 'URLName', 'createdate'],
                         dtype={'Location':'string', 'URLName':'string'},
                         parse_dates=['createdate'])
# Create FieldNote column based on URLName
svc_df['FieldNote'] = np.where(svc_df['URLName'].str.contains('servicecards'), x, svc_df['URLName'].apply(lambda x: x[x.rfind('/')+1:]))
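I also feel like I am lost in the weeds here, and maybe there is a simpler way to do this? Basically, I am trying to create the FieldNote column from URLName: it should take the file name (everything after the last /), unless URLName contains "servicecards" (the only ones with duplicates), in which case I want the subfolder name + file name.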
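The list printed by the urlparse snippet in the question can be collapsed into a single string with str.join. A minimal sketch, assuming the last two path segments are always the subfolder and the file name; the helper name last_two is made up for illustration:

```python
from urllib.parse import urlparse

def last_two(url: str) -> str:
    """Join the last two path segments of url with an underscore."""
    # .path drops the scheme and host; split("/") yields the path segments
    return "_".join(urlparse(url).path.split("/")[-2:])

testurl = "https://companymax/servicecards/city/ININ0000085013D/1664645.TIF"
print(last_two(testurl))  # ININ0000085013D_1664645.TIF
```

Note that np.where evaluates its arguments for the whole column, so x would have to be something like svc_df['URLName'].apply(last_two) rather than a single string.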
As an alternative, you can use the pandas apply function to get behavior similar to the where call.
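def get_field(d):
    # Split from the right into at most three parts: prefix, subfolder, file name
    s = d.rsplit('/', 2)
    if 'servicecards' in d:
        # servicecards URLs become subfolder_filename
        return '_'.join(s[-2:])
    # every other URL keeps only the file name
    return s[-1]

df['FieldNote'] = df['URLName'].apply(get_field)
print(df)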
Output of df:
   URLName                                                           FieldNote
0  https://companymax/servicecards/city/ININ0000085013D/1664645.TIF  ININ0000085013D_1664645.TIF
1  https://companymax/otherstring/city/ININ0000085013E/1664646.TIF   1664646.TIF
2  https://companymax/otherstring/city/ININ0000085013F/1664647.TIF   1664647.TIF
3  https://companymax/otherstring/city/ININ0000085013G/1664648.TIF   1664648.TIF
4  https://companymax/servicecards/city/ININ0000085013H/1664649.TIF  ININ0000085013H_1664649.TIF
5  https://companymax/servicecards/city/ININ0000085013I/1664650.TIF  ININ0000085013I_1664650.TIF
6  https://companymax/otherstring/city/ININ0000085013J/1664651.TIF   1664651.TIF
7  https://companymax/servicecards/city/ININ0000085013K/1664652.TIF  ININ0000085013K_1664652.TIF
8  https://companymax/otherstring/city/ININ0000085013L/1664653.TIF   1664653.TIF
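If you would rather keep the np.where form from the question, the same result can probably be had with the pandas string accessor instead of apply. This is a sketch under assumptions: the two-row DataFrame below is illustrative sample data taken from the URLs above, and every URLName is assumed to be a non-null string with at least two slashes.

```python
import numpy as np
import pandas as pd

# Illustrative sample data, not the real CSV
df = pd.DataFrame({"URLName": [
    "https://companymax/servicecards/city/ININ0000085013D/1664645.TIF",
    "https://companymax/otherstring/city/ININ0000085013E/1664646.TIF",
]})

# Each row becomes a list: [prefix, subfolder, file name]
parts = df["URLName"].str.rsplit(pat="/", n=2)

df["FieldNote"] = np.where(
    df["URLName"].str.contains("servicecards"),
    parts.str[-2] + "_" + parts.str[-1],  # subfolder_filename
    parts.str[-1],                        # file name only
)
print(df["FieldNote"].tolist())
```

Both branches are computed for every row before np.where picks between them, so this trades a little extra work for staying vectorized.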