
Regex with Python: can't find the correct regular expression

Updated: 2023-08-24

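I'm trying to extract some text from a badly designed web page for a project. After a lot of research, and learning Python along the way, I'm close to getting it done, but the page's markup is so poor that I can't find the right regular expression for it.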

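Here is what I have so far. From the source of this page, http://coj.uci.cu/24h/status.xhtml?username=Diego1149&abb=1006, I want to get the entire row of the first instance of an accepted problem. So I came up with this: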

exprespatFinderTitle = re.compile('<table id="submission" class="volume">.*(<tr class=.*>.*<label class="AC">.*Accepted.*</label>.*</tr>).*</table>')

But what it actually does is run on through the last <tr> of the table. Can someone help me figure this out?

I'm using Python 2.7 with BeautifulSoup and urllib.
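For anyone wondering why the pattern overshoots: each `.*` is greedy, so the first one swallows as much of the table as it can before backtracking, and the one in front of `</tr>` stretches to the last `</tr>` it can reach. The sketch below reproduces that on made-up markup. The row layout, the submission ids and the `row` helper are assumptions for illustration, not the real page. It also shows a "tempered" pattern that refuses to cross a `</tr>`, only to make the behaviour visible; it is still brittle on real HTML.

```python
import re

def row(sub_id, css, verdict):
    # Hypothetical row layout; the real page's markup will differ.
    return ('<tr class="row"><td>%d</td>'
            '<td><label class="%s">%s</label></td></tr>' % (sub_id, css, verdict))

rows = [
    row(101, 'WA', 'Wrong Answer'),
    row(102, 'AC', 'Accepted'),      # the row we actually want
    row(103, 'WA', 'Wrong Answer'),
    row(104, 'AC', 'Accepted'),
    row(105, 'WA', 'Wrong Answer'),
]
html = '<table id="submission" class="volume">' + ''.join(rows) + '</table>'

# The pattern from the question, unchanged: every .* is greedy.
greedy = re.search(
    '<table id="submission" class="volume">.*'
    '(<tr class=.*>.*<label class="AC">.*Accepted.*</label>.*</tr>)'
    '.*</table>', html)

# Tempered pattern: (?:(?!</tr>).)*? consumes characters one at a time
# and stops before any </tr>, so a match cannot span two rows.
tempered = re.search(
    r'<tr class=[^>]*>(?:(?!</tr>).)*?<label class="AC">(?:(?!</tr>).)*?</tr>',
    html, re.DOTALL)

print(greedy.group(1))    # starts at the LAST accepted row, runs to the final </tr>
print(tempered.group(0))  # the first accepted row only
```

On this sample the greedy capture begins at submission 104 and drags row 105 along with it, while the tempered version stops at row 102.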

Stick with BeautifulSoup; regular expressions are not the tool for parsing HTML:

table = soup.find('table', id='submission')
accepted = table.tbody.find('label', class_='AC')
if accepted:
    row = accepted.parent.parent  # row with accepted column
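To try the idea end to end without hitting the network, here is a minimal, self-contained sketch. The HTML string is invented sample data (the column layout and ids are assumptions, not the real page), and it assumes BeautifulSoup 4 (`bs4`) with the built-in `html.parser`. It uses `find_parent('tr')` instead of `.parent.parent` so that it still works if the label is nested a level deeper or shallower than expected.

```python
from bs4 import BeautifulSoup

# Invented sample markup standing in for the downloaded page source.
html = """
<table id="submission" class="volume">
  <tbody>
    <tr class="odd"><td>201</td><td><label class="WA">Wrong Answer</label></td></tr>
    <tr class="even"><td>202</td><td><label class="AC">Accepted</label></td></tr>
    <tr class="odd"><td>203</td><td><label class="AC">Accepted</label></td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, 'html.parser')
table = soup.find('table', id='submission')

# find() returns the first match in document order, i.e. the first accepted label.
accepted = table.find('label', class_='AC')

# Walk up to the enclosing <tr>, however deeply the label is nested.
first_row = accepted.find_parent('tr') if accepted else None

# Text of each cell in that row, with surrounding whitespace stripped.
cells = [td.get_text(strip=True) for td in first_row.find_all('td')] if first_row else []
print(cells)
```

With the real page you would pass the fetched source to `BeautifulSoup` in place of the sample string; the lookup itself stays the same.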
