My file has URLs of different kinds:

www.example.com
www.example.com/validagain
www.example.com/search?q=jsdajasj;kdas  --> trying to get rid of this
www.example.com/anothervalid

I was able to isolate /search using a regex:
import re

generate_links = re.compile('http://(.*)')       # compile all http links
generate_links2 = re.compile('(.*)/eng/(.*)')    # compile all English URLs

with open("VACqueue.txt", "r") as queued_list, open('newqueue.txt', 'w') as queued_list_updated:
    for links in queued_list:
        url = ""
        services_url = ""
        valid_url = ""
        match = generate_links2.search(links)
        if match is not None:
            url = match.group()
        generate_links3 = re.compile('(.*)/services/(.*)')  # compile all services links
        match2 = generate_links3.search(links)
        if match2 is not None:
            services_url = match2.group()
            print(services_url)
        # note: the '?' must be escaped, or it makes the preceding 'h' optional
        generate_links4 = re.compile(r'(.*)/search\?(.*)')  # compile all error links
        match3 = generate_links4.search(links)              # match all error links
But how do I use the match3 variable, alongside services_url, to delete or replace the matched links?
So the expected result would be:
www.example.com
www.example.com/validagain
www.example.com/anothervalid
If you want to get rid of the URLs containing 'search?', try:
from __future__ import print_function

with open('VACqueue.txt') as fin, open('newqueue.txt', 'w') as fout:
    cured_urls = [line for line in fin if 'search?' not in line]
    for url in cured_urls:
        # the lines already end in '\n', so suppress print's own newline
        print(url, end='', file=fout)
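If you would rather reuse the compiled match3 pattern instead of a plain substring test, the same filtering works by keeping only the lines the regex does not match. A minimal sketch on the sample URLs from the question (remember to escape the '?' in the pattern):

```python
import re

# the "error link" pattern, with the literal '?' escaped
search_pattern = re.compile(r'/search\?')

urls = [
    'www.example.com',
    'www.example.com/validagain',
    'www.example.com/search?q=jsdajasj;kdas',
    'www.example.com/anothervalid',
]

# keep only the URLs the pattern does NOT match
cured = [u for u in urls if search_pattern.search(u) is None]
```

Applied line by line to the file, this drops exactly the /search?... entries and leaves the other URLs untouched.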