Python - Using wildcard string matching to extract float from a website's source code

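I am working on some code for a web scraper in Python.

Given a website's source code, I need to extract the relevant data points. The source code looks like this: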

</sup>73.00</span> </td> </tr> <tr class="highlight"> <td><span class="data_lbl">Average</span></td> <td> <span class="data_data"><sup>
</sup>86.06</span> </td> </tr> <tr> <td><span class="data_lbl">Current Price</span></td> <td> <span class="data_data"><sup> </sup>83.20</span> </td>
 </tr> </tbody> </table> </div> </div> <!--data-module-name="quotes.module.researchratings.Module"--> </div> <div class="column at8-
col4 at16-col4 at12-col6" id="adCol"> <div intent in-at4units-prepend="#adCol" in-at8units-prepend="#adCol" in-at12units-prepend="#adCol

This is the regex I am using:

regex = re.compile('Average*</sup>.....')

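It is meant to grab the 5 characters after the first "/sup" tag encountered after "Average", which in this case would be " 86.06" (although I would still need to clean up the match before I am left with just a float).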
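For reference, here is a minimal sketch of why the attempt above finds nothing. It assumes an abbreviated copy of the fragment shown earlier (the variable name `html` is just for illustration). In a regex, `*` is a quantifier rather than a wildcard, so `e*` only means "zero or more e"; the closest thing to a wildcard is `.*?`. The `re.DOTALL` flag is an added assumption so that `.` can cross the line break in the fragment.

```python
import re

# Abbreviated copy of the fragment from the question (line break kept).
html = (
    '</sup>73.00</span> </td> </tr> <tr class="highlight"> <td>'
    '<span class="data_lbl">Average</span></td> <td> <span class="data_data"><sup>\n'
    '</sup>86.06</span> </td> </tr> <tr> <td>'
    '<span class="data_lbl">Current Price</span></td> <td> '
    '<span class="data_data"><sup> </sup>83.20</span> </td>'
)

# "Average*" means "Averag" followed by zero or more "e", and "</sup>" must
# come right after it. In the fragment "Average" is followed by "</span>".
attempt = re.compile('Average*</sup>.....')
print(attempt.search(html))  # None

# ".*?" skips as few characters as possible between "Average" and "</sup>".
wildcard = re.compile(r'Average.*?</sup>.....', re.DOTALL)
m = wildcard.search(html)
print(m.group(0)[-5:])  # 86.06
```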

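Is there a more elegant way to output the first float that appears after the string "Average"?

I am very new to regex and apologize if the question is not clear enough.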

I was able to do it using lookbehind assertions combined with a non-greedy (ungreedy) search:

(?<=Average).*?(?<=</sup>)([0-9.]{5})

Working example here

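Explanation

([0-9.]{5}): once the following three points are satisfied, look for 5 characters, each a digit from 0 to 9 or a dot.
1. (?<=Average): the word Average must appear immediately before this point
2. .*?: any number of characters in between. Non-greedy (will match as few characters as possible)
3. (?<=</sup>): the tag </sup> must appear immediately before the number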
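To illustrate point 2, here is a small sketch comparing the non-greedy pattern with a greedy variant. It assumes an abbreviated copy of the fragment from the question (the name `html` is illustrative) and adds `re.DOTALL`, which is an assumption on my part so that `.` can cross the line break in that fragment.

```python
import re

# Abbreviated copy of the fragment from the question (line break kept).
html = (
    '</sup>73.00</span> </td> </tr> <tr class="highlight"> <td>'
    '<span class="data_lbl">Average</span></td> <td> <span class="data_data"><sup>\n'
    '</sup>86.06</span> </td> </tr> <tr> <td>'
    '<span class="data_lbl">Current Price</span></td> <td> '
    '<span class="data_data"><sup> </sup>83.20</span> </td>'
)

# Non-greedy: stop at the first "</sup>" that is followed by 5 number characters.
lazy = re.compile(r'(?<=Average).*?(?<=</sup>)([0-9.]{5})', re.DOTALL)

# Greedy variant (not from the answer): runs on to the last such "</sup>".
greedy = re.compile(r'(?<=Average).*(?<=</sup>)([0-9.]{5})', re.DOTALL)

lazy_value = lazy.search(html).group(1)
greedy_value = greedy.search(html).group(1)
print(lazy_value)    # 86.06 (the Average row)
print(greedy_value)  # 83.20 (the Current Price row)
```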

The number you are looking for will be in the first capture group.
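Reading the capture group from Python might look roughly like this. It is a sketch under a few assumptions: `html` is an abbreviated copy of the fragment from the question, `re.DOTALL` is added so `.` can cross the line break, and `first_float_after` is a hypothetical helper that is not part of the pattern above. The helper uses `\d+(?:\.\d+)?` instead of a fixed 5 characters, since `{5}` would cut off a value such as 103.25.

```python
import re
from typing import Optional

# Abbreviated copy of the fragment from the question (line break kept).
html = (
    '</sup>73.00</span> </td> </tr> <tr class="highlight"> <td>'
    '<span class="data_lbl">Average</span></td> <td> <span class="data_data"><sup>\n'
    '</sup>86.06</span> </td> </tr> <tr> <td>'
    '<span class="data_lbl">Current Price</span></td> <td> '
    '<span class="data_data"><sup> </sup>83.20</span> </td>'
)

# Pattern from the answer; group(1) holds the 5 captured characters.
pattern = re.compile(r'(?<=Average).*?(?<=</sup>)([0-9.]{5})', re.DOTALL)
match = pattern.search(html)
average = float(match.group(1)) if match else None
print(average)  # 86.06


def first_float_after(label: str, text: str) -> Optional[float]:
    """Hypothetical helper: first number that follows `label` and a </sup> tag."""
    m = re.search(
        re.escape(label) + r'.*?</sup>\s*(\d+(?:\.\d+)?)',
        text,
        re.DOTALL,
    )
    return float(m.group(1)) if m else None


print(first_float_after('Current Price', html))  # 83.2
```

Checking for `None` before calling `group(1)` avoids an AttributeError when the label is missing from the page.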
