How do I remove bad links from a list?



While parsing a response from a server, I found that some of the links are bad. I tried to remove every link that ends in .txt:

# raw strings (r'...') keep the backslashes literal: in a normal string '\r' becomes a carriage return and '\x\T' is a SyntaxError
out1 = [r'https://www.itu.int./htmldoc.asp?doc=t\rec\q\T-REC-Q.1238.3-200006-I!!SUM-TXT-E.txt',
        'https://www.itu.int/dms_pubrec/itu-t/rec/q/T-REC-Q.1248.1-200107-I!!SUM-HTM-E.htm',
        r'https://www.itu.int./htmldoc.asp?doc=t\rec\q\T-REC-Q.1238.4-200006-I!!SUM-TXT-E.txt',
        r'https://www.itu.int./htmldoc.asp?doc=t\rec\x\T-REC-X.42-200003-S!!SUM-TXT-E.txt',
        'https://www.itu.int/rec/recommendation.asp?lang=en&parent=T-REC-X.51-198811-I',]

But this is what gets printed:

a = ['https://www.itu.int/dms_pubrec/itu-t/rec/q/T-REC-Q.1248.1-200107-I!!SUM-HTM-E.htm',
'https://www.itu.int./htmldoc.asp?doc=t\rec\x\T-REC-X.42-200003-S!!SUM-TXT-E.txt']

My code:

for ii in out1:
    if ii.find('.txt'):
        out1.remove(ii)
print(out1)
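The loop above goes wrong for two separate reasons, which this small sketch reproduces with hypothetical short names standing in for the real links: str.find returns -1 (a truthy value) when the substring is absent, and removing items from a list while iterating over it makes the loop skip the element after each removal.

```python
# Hypothetical short names standing in for the question's links.
items = ["a.txt", "b.htm", "c.txt", "d.txt", "e"]

# str.find returns -1 when ".txt" is absent, and -1 is truthy,
# so `if item.find(".txt"):` is True even for "b.htm".
print(bool("b.htm".find(".txt")))  # True

# Removing while iterating shifts the remaining items left,
# so the loop skips the item right after each removal.
buggy = list(items)
for item in buggy:
    if item.find(".txt"):
        buggy.remove(item)
print(buggy)  # ['b.htm', 'd.txt']

# Minimal fix: test membership with `in` and iterate over a copy (fixed[:]).
fixed = list(items)
for item in fixed[:]:
    if ".txt" in item:
        fixed.remove(item)
print(fixed)  # ['b.htm', 'e']
```

The buggy result has the same shape as the output in the question: one good link and one .txt link survive.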

How can I remove the bad links that end in .txt? Thanks! Update: I have now written this:

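import re

r_list = []
for ii in out1:
    # replace any link matching "http...txt" with an empty string
    d = re.sub(r'http\S+txt', '', ii)
    r_list.append(d)
# drop the empty strings left behind by the substitution
res = list(filter(lambda x: x, r_list))
print(res)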

As mentioned, a regex isn't necessary, but if you prefer to use one, anchor the search to the end of the string:

import re
# "\." matches a literal dot; "$" anchors the match to the end of the string
[l for l in out1 if not re.search(r'\.txt$', l)]
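In a regular expression an unescaped "." matches any character, so the suffix pattern needs "\." to match only a literal dot. The sample URLs below are hypothetical and exist only to show the difference.

```python
import re

# Hypothetical sample links; "footxt" has no dot before "txt".
links = [
    "https://example.com/notes.txt",
    "https://example.com/footxt",
    "https://example.com/page.htm",
]

# Unescaped ".": matches any character before "txt", so "footxt" is dropped too.
loose = [l for l in links if not re.search(r'.txt$', l)]
# Escaped "\.": matches only a literal dot, so "footxt" is kept.
strict = [l for l in links if not re.search(r'\.txt$', l)]

print(loose)   # ['https://example.com/page.htm']
print(strict)  # ['https://example.com/footxt', 'https://example.com/page.htm']
```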

Without a regex, str.endswith() does the same job:

[l for l in out1 if not l.endswith('.txt')]
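If more than one suffix should be filtered, str.endswith also accepts a tuple of suffixes, and lowercasing first makes the check case-insensitive. The uppercase ".TXT" link and the ".pdf" suffix below are made-up examples, not part of the original data.

```python
# Hypothetical links; ".TXT" and ".pdf" are illustrative assumptions.
links = [
    "https://example.com/a.txt",
    "https://example.com/b.TXT",
    "https://example.com/c.pdf",
    "https://example.com/d.htm",
]

# endswith accepts a tuple; lower() makes ".TXT" match ".txt".
unwanted = (".txt", ".pdf")
clean = [l for l in links if not l.lower().endswith(unwanted)]
print(clean)  # ['https://example.com/d.htm']
```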

Both give you a clean list:

['https://www.itu.int/dms_pubrec/itu-t/rec/q/T-REC-Q.1248.1-200107-I!!SUM-HTM-E.htm','https://www.itu.int/rec/recommendation.asp?lang=en&parent=T-REC-X.51-198811-I']

Alternatively, you can check each link for .txt and skip it with the continue keyword.

# raw strings (r'...') keep the backslashes literal
out1 = [r'https://www.itu.int./htmldoc.asp?doc=t\rec\q\T-REC-Q.1238.3-200006-I!!SUM-TXT-E.txt',
        'https://www.itu.int/dms_pubrec/itu-t/rec/q/T-REC-Q.1248.1-200107-I!!SUM-HTM-E.htm',
        r'https://www.itu.int./htmldoc.asp?doc=t\rec\q\T-REC-Q.1238.4-200006-I!!SUM-TXT-E.txt',
        r'https://www.itu.int./htmldoc.asp?doc=t\rec\x\T-REC-X.42-200003-S!!SUM-TXT-E.txt',
        'https://www.itu.int/rec/recommendation.asp?lang=en&parent=T-REC-X.51-198811-I',]

for i in out1:
    if '.txt' in i:
        continue  # skip links containing ".txt"
    print(i)

Output:

https://www.itu.int/dms_pubrec/itu-t/rec/q/T-REC-Q.1248.1-200107-I!!SUM-HTM-E.htm
https://www.itu.int/rec/recommendation.asp?lang=en&parent=T-REC-X.51-198811-I
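The continue loop only prints the good links; out1 itself is unchanged. A minimal sketch, using hypothetical short names in place of the real links, of collecting the kept links and then writing them back into the same list object with slice assignment:

```python
# Hypothetical stand-ins for the question's links.
out1 = ["a.txt", "b.htm", "c.txt", "e"]
alias = out1  # a second reference to the same list object

# Same continue-based loop, but collecting links instead of printing them.
kept = []
for link in out1:
    if ".txt" in link:
        continue  # skip any link containing ".txt"
    kept.append(link)
print(kept)  # ['b.htm', 'e']

# Assigning to the full slice replaces the list's contents in place,
# so every reference to the same list (here, alias) sees the filtered result.
out1[:] = kept
print(alias)  # ['b.htm', 'e']
```

Plain reassignment (out1 = kept) would also work when nothing else holds a reference to the original list.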
