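I have a parsing problem when I fetch responses from a server: I found that some of the links are wrong. I am trying to remove every link that ends in .txt from this list: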
# Raw strings (r'...') keep the backslashes literal; without them '\x' is a syntax error
out1 = [r'https://www.itu.int./htmldoc.asp?doc=t\rec\q\T-REC-Q.1238.3-200006-I!!SUM-TXT-E.txt',
        'https://www.itu.int/dms_pubrec/itu-t/rec/q/T-REC-Q.1248.1-200107-I!!SUM-HTM-E.htm',
        r'https://www.itu.int./htmldoc.asp?doc=t\rec\q\T-REC-Q.1238.4-200006-I!!SUM-TXT-E.txt',
        r'https://www.itu.int./htmldoc.asp?doc=t\rec\x\T-REC-X.42-200003-S!!SUM-TXT-E.txt',
        'https://www.itu.int/rec/recommendation.asp?lang=en&parent=T-REC-X.51-198811-I',]
I get the following output:
a = ['https://www.itu.int/dms_pubrec/itu-t/rec/q/T-REC-Q.1248.1-200107-I!!SUM-HTM-E.htm',
'https://www.itu.int./htmldoc.asp?doc=t\rec\x\T-REC-X.42-200003-S!!SUM-TXT-E.txt']
My code:
for ii in out1:
    if ii.find('.txt'):
        out1.remove(ii)
print(out1)
How can I remove the bad links ending in .txt? Thank you! Update: I have now written this:
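import re

r_list = []
for ii in out1:
    # Replace any string matching http...txt with an empty string
    d = re.sub(r'http\S+txt', '', ii)
    r_list.append(d)
# Drop the empty strings left behind by the substitution
res = list(filter(lambda x: x, r_list))
print(res)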
As already mentioned, regex is not required, but if you prefer to use it, try searching for the ending:
import re
# Escape the dot so it matches a literal '.', and anchor the match at the end with '$'
[l for l in out1 if not re.search(r'\.txt$', l)]
Without regex, simply using endswith() does the same job:
[l for l in out1 if not l.endswith('.txt')]
Both give you a clean list:
['https://www.itu.int/dms_pubrec/itu-t/rec/q/T-REC-Q.1248.1-200107-I!!SUM-HTM-E.htm','https://www.itu.int/rec/recommendation.asp?lang=en&parent=T-REC-X.51-198811-I']
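For anyone wondering why the loop in the question leaves a .txt link behind, here is a minimal sketch of the two pitfalls involved, followed by the endswith() fix. The names links, buggy, drop_txt and cleaned are just illustrative, and the URLs are written as raw strings on the assumption that the backslashes are meant literally.

```python
# The five links from the question; raw strings keep the backslashes literal.
links = [
    r'https://www.itu.int./htmldoc.asp?doc=t\rec\q\T-REC-Q.1238.3-200006-I!!SUM-TXT-E.txt',
    'https://www.itu.int/dms_pubrec/itu-t/rec/q/T-REC-Q.1248.1-200107-I!!SUM-HTM-E.htm',
    r'https://www.itu.int./htmldoc.asp?doc=t\rec\q\T-REC-Q.1238.4-200006-I!!SUM-TXT-E.txt',
    r'https://www.itu.int./htmldoc.asp?doc=t\rec\x\T-REC-X.42-200003-S!!SUM-TXT-E.txt',
    'https://www.itu.int/rec/recommendation.asp?lang=en&parent=T-REC-X.51-198811-I',
]

# Pitfall 1: str.find() returns -1 when nothing is found, and -1 is truthy,
# so "if s.find('.txt'):" is true even for strings without '.txt'.
not_found = 'page.htm'.find('.txt')

# Pitfall 2: removing items from a list while looping over it makes the
# iterator skip the element that slides into the freed position.
buggy = list(links)          # work on a copy so links stays intact
for u in buggy:
    if u.find('.txt'):       # true for every link in this list
        buggy.remove(u)


def drop_txt(urls):
    """Return a new list without the entries that end in '.txt'."""
    return [u for u in urls if not u.endswith('.txt')]


cleaned = drop_txt(links)
print(cleaned)
```

Building a new list sidesteps both problems: nothing is mutated during iteration, and endswith() returns a real boolean. If the original list object has to be updated in place, assigning to a slice (links[:] = drop_txt(links)) should work as well.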
Put simply, you can check for .txt and skip those links with the continue keyword.
# Raw strings (r'...') keep the backslashes literal; without them '\x' is a syntax error
out1 = [r'https://www.itu.int./htmldoc.asp?doc=t\rec\q\T-REC-Q.1238.3-200006-I!!SUM-TXT-E.txt',
        'https://www.itu.int/dms_pubrec/itu-t/rec/q/T-REC-Q.1248.1-200107-I!!SUM-HTM-E.htm',
        r'https://www.itu.int./htmldoc.asp?doc=t\rec\q\T-REC-Q.1238.4-200006-I!!SUM-TXT-E.txt',
        r'https://www.itu.int./htmldoc.asp?doc=t\rec\x\T-REC-X.42-200003-S!!SUM-TXT-E.txt',
        'https://www.itu.int/rec/recommendation.asp?lang=en&parent=T-REC-X.51-198811-I',]
for i in out1:
    # Skip any link that contains '.txt'
    if '.txt' in i:
        continue
    print(i)
Output:
https://www.itu.int/dms_pubrec/itu-t/rec/q/T-REC-Q.1248.1-200107-I!!SUM-HTM-E.htm
https://www.itu.int/rec/recommendation.asp?lang=en&parent=T-REC-X.51-198811-I
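If you need the filtered links as a list rather than printed, the same continue pattern can collect them. This is only a sketch: the function name without_txt and the example.com URLs are made up for illustration. It also tests the end of the string instead of using in, since '.txt' in url would also match a link such as notes.txt.html, and it lowercases first on the assumption that .TXT should be dropped too.

```python
def without_txt(urls):
    """Collect every URL that does not end in '.txt' (case-insensitive)."""
    kept = []
    for url in urls:
        # Skip links whose final extension is .txt, regardless of case
        if url.lower().endswith('.txt'):
            continue
        kept.append(url)
    return kept


# Hypothetical sample data, not taken from the question
sample = [
    'https://example.com/a.txt',
    'https://example.com/b.htm',
    'https://example.com/c.TXT',
    'https://example.com/notes.txt.html',
]

result = without_txt(sample)
print(result)
```

Whether a substring test or an end-of-string test is the right one depends on what the bad links look like; for the list in the question both give the same result.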