I have a string like this:
string = 'https://somewebsite.com/itesr0824/products/YYA-002/fdrop-tQ?position=6'
How can I extract
'https://somewebsite.com/itesr0824'
using regex?
I tried
re.sub('[^https://somewebsite.com/[a-zA-Z0-9].+$','',string)
but it only gives me
'https://somewebsite.com/itesr0824/products/YYA'
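Why use a regular expression for this when Python has a built-in URL parser? Rather than reinventing the wheel and having to account for every odd edge case a URL can throw at you, use urllib.parse.urlparse() and urllib.parse.urljoin():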
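import urllib.parse
string = "https://somewebsite.com/itesr0824/products/YYA-002/fdrop-tQ?position=6"
# Split the URL into its components (scheme, netloc, path, query, ...)
parsedURL = urllib.parse.urlparse(string)
# Rebuild scheme + host, then append only the first path segment
trimmedURL = urllib.parse.urljoin(parsedURL.scheme + "://" + parsedURL.netloc, parsedURL.path.split("/")[1])
print(trimmedURL)  # https://somewebsite.com/itesr0824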
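If you need this in more than one place, the same idea can be wrapped in a small helper. This is only a sketch: the function name first_path_segment_url is made up for illustration, and it assumes you want just the scheme and host back when the URL has no path at all. It uses urlsplit/urlunsplit instead of urljoin, which is a swap on my part, not something the snippet above requires.

```python
from urllib.parse import urlsplit, urlunsplit


def first_path_segment_url(url: str) -> str:
    """Return scheme://host/first-segment, dropping the rest of the path, query and fragment.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    # The path looks like '/itesr0824/products/...'; skip empty pieces
    # produced by the leading slash or by doubled slashes.
    segments = [s for s in parts.path.split("/") if s]
    path = "/" + segments[0] if segments else ""
    # Empty query and fragment so '?position=6' is discarded.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


url = "https://somewebsite.com/itesr0824/products/YYA-002/fdrop-tQ?position=6"
print(first_path_segment_url(url))
```

Unlike indexing with [1] directly, this does not raise an IndexError when the URL has no path.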
import re
string = 'https://somewebsite.com/itesr0824/products/YYA-002/fdrop-tQ?position=6'
x = re.sub(r'https://somewebsite\.com/\w+', '', string)
# where:
# \w - matches any letter, digit or underscore. For ASCII text, equivalent to [a-zA-Z0-9_]
# + - one or more
print(x)
Prints:
/products/YYA-002/fdrop-tQ?position=6
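Note that re.sub removes the matched prefix and leaves the rest of the URL. If you want the prefix itself, which is what the question asks for, one option is to match it and read the match back. A minimal sketch, assuming the host is always somewebsite.com and the first path segment contains only word characters:

```python
import re

string = 'https://somewebsite.com/itesr0824/products/YYA-002/fdrop-tQ?position=6'

# re.match anchors at the start of the string; \w+ stops at the next '/'.
match = re.match(r'https://somewebsite\.com/\w+', string)

# Guard against strings that do not start with the expected prefix.
prefix = match.group(0) if match else None
print(prefix)
```

This prints https://somewebsite.com/itesr0824 for the string above, and prefix is None when the pattern does not match.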