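I have a 'leads' dataset with a 'ref_url' column. This column contains links. I want to parse them, keep only a specific part of each one, and replace the old values with the parsed ones.
This is what the old values look like: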
https://regalia-deyaar.sales-centre.properties/?utm_source=email&utm_medial=mailerlite&utm_campaign=regalia&utm_id=regalia
This is what I want them to look like:
https://regalia-deyaar.sales-centre.properties/
Here is what I did:
from urllib.parse import urlparse

def parsing_url(Series):
    for rows in Series:
        parsed_url = urlparse(rows)
        parsed=(f"{parsed_url.scheme}://{parsed_url.netloc}{parsed_url.path}")
        rows=parsed

leads['ref_url'].apply(parsing_url)
However, this did not work. It only returns NaN values. Can you help me?
I assume you are using pandas. You can use a lambda and split each string on "?":
import pandas as pd

df = pd.DataFrame({
    'url': [
        "https://regalia-deyaar.sales-centre.properties/?utm_source=email&utm_medium=mailerlite&utm_campaign=regalia&utm_id=regalia",
        "https://regalia-deyaar.sales-centre.properties/?utm_source=email&utm_medium=mailerlite&utm_campaign=regalia&utm_id=regalia",
        "https://regalia-deyaar.sales-centre.properties/?utm_source=email&utm_medium=mailerlite&utm_campaign=regalia&utm_id=regalia",
    ]
})

# Split on the first "?" and keep the part before it.
# This assumes the base URL itself never contains "?".
df['url'] = df['url'].apply(lambda x: x.split("?", 1)[0])
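If you would rather keep the urlparse approach from the question, here is a minimal sketch of how it could look. Series.apply calls the function once per cell rather than once on the whole Series, and a function with no return statement gives back None for every cell, which is the likely reason the column shows up empty. The helper name strip_query and the one-row leads frame are made up for illustration, and the sketch assumes every cell holds a string.

```python
from urllib.parse import urlparse

import pandas as pd


def strip_query(url):
    """Rebuild the URL from scheme, host and path, dropping query and fragment."""
    parsed = urlparse(url)
    return f"{parsed.scheme}://{parsed.netloc}{parsed.path}"


# Hypothetical sample data modeled on the URL in the question.
leads = pd.DataFrame({
    "ref_url": [
        "https://regalia-deyaar.sales-centre.properties/?utm_source=email&utm_medial=mailerlite&utm_campaign=regalia&utm_id=regalia",
    ]
})

# apply passes one cell at a time; the returned value becomes the new cell.
# Assign the result back, because apply does not modify the column in place.
leads["ref_url"] = leads["ref_url"].apply(strip_query)
print(leads["ref_url"].tolist())
```

Compared with splitting on "?", this version also drops any "#fragment" part. If the column can contain missing values, they would need to be handled first (for example by filtering them out), since urlparse expects a string.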