How to extract and then print certain words from a link

I'm making a weather forecast program for homework, and it needs to print:

Today's temperatures: maximum 2ºC, minimum -1ºC

Currently it prints:

Today's temperatures:      <title>Thursday: Light Snow Shower, Maximum 
Temperature: 2Â°C (36Â°F) Minimum Temperature: -1Â°C (30Â°F)</title>.

How do I make sure it prints only the correct information? Here is my code:

import urllib
url = 'http://open.live.bbc.co.uk/weather/feeds/en/2654993/3dayforecast.rss'
web_connection = urllib.urlopen(url)
for line in web_connection.readlines():
    if line.find('Thursday:') != -1:
        print "Today's temperatures:" + line
web_connection.close()

You can use a regular expression to do this:

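import re
# 'line' is the matching line from the loop in your code
TEMP_REGEX = r"^.*Maximum\s+Temperature:\s+(?P<max>([+-]?[0-9]*[.,]?[0-9]*)).*Minimum\s+Temperature:\s+(?P<min>([+-]?[0-9]*[.,]?[0-9]*)).*$"
matched = re.match(TEMP_REGEX, line)
if matched:
    # named groups; avoid shadowing the built-in max() and min()
    max_temp = matched.group("max")
    min_temp = matched.group("min")
    # ... use max_temp and min_temp here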
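Here is the regex approach as a self-contained sketch. It assumes Python 3 and runs offline on the line from the question, with the degree sign decoded. The pattern is the answer's own, written as a raw string so that `\s` keeps its backslash. `parse_temperatures` is a hypothetical helper name. `matched.group("max")` returns the same value as `matched.groupdict()["max"]`.

```python
import re

# The answer's named-group pattern, as a raw string so "\s" means whitespace.
TEMP_REGEX = re.compile(
    r"^.*Maximum\s+Temperature:\s+(?P<max>[+-]?[0-9]*[.,]?[0-9]*)"
    r".*Minimum\s+Temperature:\s+(?P<min>[+-]?[0-9]*[.,]?[0-9]*).*$"
)

def parse_temperatures(line):
    """Return (max, min) as strings, or None if the line has no temperatures."""
    matched = TEMP_REGEX.match(line)
    if matched is None:
        return None
    return matched.group("max"), matched.group("min")

# Sample line taken from the question's output, with the degree sign decoded.
line = ("<title>Thursday: Light Snow Shower, Maximum Temperature: 2°C (36°F) "
        "Minimum Temperature: -1°C (30°F)</title>")

temps = parse_temperatures(line)
if temps:
    max_temp, min_temp = temps
    print("Today's temperatures: maximum " + max_temp + "°C, minimum " + min_temp + "°C")
```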

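The proper way is to parse the RSS file, which is in XML format. You can start with the documentation for Python's XML modules. Here is a small snippet to get you started: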

import urllib
from xml.etree import ElementTree as ET
url = 'http://open.live.bbc.co.uk/weather/feeds/en/2654993/3dayforecast.rss'
web_conn = urllib.urlopen(url)
rss = web_conn.read()
web_conn.close()
weather_data = ET.fromstring(rss)  # parse the whole feed into an element tree
for node in weather_data.iter():  # walk every element in the document
    if node.tag == "item":  # each <item> holds one day's forecast
        title = node.find("title").text
        if title.find("Thursday") != -1:
            # the description is a comma-separated list of "Key: value" entries
            todays_weather = node.find("description").text.split(',')
            for entry in todays_weather:
                print entry.strip()

This outputs:

Maximum Temperature: 2°C (36°F)
Minimum Temperature: -1°C (30°F)
Wind Direction: Westerly
Wind Speed: 6mph
Visibility: Very Good
Pressure: 977mb
Humidity: 87%
UV Risk: 1
Pollution: Low
Sunrise: 07:59 GMT
Sunset: 16:42 GMT
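The snippet stops at printing the entries. To reach the one-line format the question asks for, you can turn the entries into a dictionary keyed on the text before the first colon. This is a sketch assuming Python 3 and assuming every entry has the `Key: value` shape shown in the output. `to_fields` and `leading_number` are hypothetical helpers.

```python
import re

# A few of the "Key: value" entries printed above.
entries = [
    "Maximum Temperature: 2°C (36°F)",
    "Minimum Temperature: -1°C (30°F)",
    "Wind Direction: Westerly",
    "Sunrise: 07:59 GMT",
]

def to_fields(entries):
    """Split each 'Key: value' entry on its first colon only (so '07:59' survives)."""
    pairs = (entry.split(":", 1) for entry in entries)
    return {key.strip(): value.strip() for key, value in pairs}

def leading_number(value):
    """Return the signed number at the start of a value such as '-1°C (30°F)'."""
    match = re.match(r"[+-]?\d+(?:\.\d+)?", value)
    return match.group(0) if match else None

fields = to_fields(entries)
max_temp = leading_number(fields["Maximum Temperature"])
min_temp = leading_number(fields["Minimum Temperature"])
print("Today's temperatures: maximum " + max_temp + "°C, minimum " + min_temp + "°C")
```

Splitting on the first colon only keeps values such as `07:59 GMT` intact.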

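How and why? If you open the RSS file in a browser, you will see that it is XML, which means it has a specific structure. Looking at the data, you will see that each day's forecast is contained in an <item>, which has a <title> and a <description> along with other information. An XML parser lets you navigate that structure with straightforward methods such as .find() and .findall(), and read the data through the .text attribute.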
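The navigation described here can be tried without a network connection. This sketch assumes Python 3 and uses a hypothetical, trimmed-down feed with the same `<rss>`/`<channel>`/`<item>` shape. The Thursday values are copied from the output above, and the Friday item is only a placeholder. `forecast_entries` is a hypothetical helper that uses `findall()`, `find()` and `.text`. It also assumes, as the output suggests, that the description is comma-separated.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed shaped like the BBC one: rss > channel > item > title/description.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <item>
      <title>Thursday: Light Snow Shower, Maximum Temperature: 2°C (36°F) Minimum Temperature: -1°C (30°F)</title>
      <description>Maximum Temperature: 2°C (36°F), Minimum Temperature: -1°C (30°F), Wind Direction: Westerly</description>
    </item>
    <item>
      <title>Friday: placeholder</title>
      <description>placeholder</description>
    </item>
  </channel>
</rss>"""

def forecast_entries(rss_text, day_name):
    """Return the description entries of the item whose title starts with day_name."""
    root = ET.fromstring(rss_text)               # root is the <rss> element
    for item in root.findall("channel/item"):    # every <item> directly under <channel>
        title = item.find("title").text          # .text is the element's text content
        if title.startswith(day_name + ":"):
            description = item.find("description").text
            return [entry.strip() for entry in description.split(",")]
    return None                                  # no item for that day

for entry in forecast_entries(SAMPLE_RSS, "Thursday"):
    print(entry)
```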

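You have three problems to solve: first, find the name of the current day of the week; second, find the line that contains the maximum and minimum temperatures; and third, parse those temperatures. I think this should work:

# -*- coding: utf-8 -*-
import re
import time
import urllib
url = 'http://open.live.bbc.co.uk/weather/feeds/en/2654993/3dayforecast.rss'
# Full weekday name, e.g. "Thursday" (depends on the locale)
day_of_the_week = time.strftime("%A")
web_connection = urllib.urlopen(url)
for line in web_connection.readlines():
    if '<title>' + day_of_the_week + ':' in line:
        m = re.match(r'.+Maximum Temperature:\s(.+?)°C.+Minimum Temperature:\s(.+?)°C.+', line)
        if m:  # only print when the temperatures were actually found
            max_temp = m.group(1)
            min_temp = m.group(2)
            print("Today's temperatures: maximum " + max_temp + "°C, minimum " + min_temp + "°C")
web_connection.close()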

To get the day of the week, see https://docs.python.org/2/library/time.html#time.strftime

Then I did the same thing as you to find the right line (just using Python's `in` operator).
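The first two steps can be checked on their own. This sketch assumes Python 3. It uses `datetime.date.strftime` rather than `time.strftime` so that a fixed date can be passed in; the `%A` directive is the same. The input lines are hypothetical stand-ins for what `readlines()` returns. `title_marker` and `find_title_line` are hypothetical helper names.

```python
import datetime

def title_marker(day):
    """Build the '<title>Weekday:' text the answer searches for."""
    # %A is the full weekday name in the current locale (English under the default C locale).
    return "<title>" + day.strftime("%A") + ":"

def find_title_line(lines, day):
    """Return the first line containing the marker for `day`, or None."""
    marker = title_marker(day)
    for line in lines:
        if marker in line:  # substring test; same result as line.find(marker) != -1
            return line
    return None

# Hypothetical input; in the real program these lines come from readlines().
lines = [
    "<channel>",
    "<title>Thursday: Light Snow Shower, Maximum Temperature: 2°C (36°F) Minimum Temperature: -1°C (30°F)</title>",
]
# 15 January 2015 was a Thursday; pass datetime.date.today() for the current day.
print(find_title_line(lines, datetime.date(2015, 1, 15)))
```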

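After that, I applied a regular expression with groups to parse the numbers (including the sign!). To help you design regular expressions, you can try https://regex101.com/#python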
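This is a variant of the same grouping idea. It assumes Python 3 and whole-degree temperatures, and captures only an optional sign and the digits after each label. Because nothing after the number is matched, the pattern also works on the "Â°" seen in the question's output. That text is what the UTF-8 bytes for "°" look like when read as Latin-1 or cp1252. `TEMPS` and `max_min` are hypothetical names.

```python
import re

# Capture an optional sign and digits right after each label; the unit text that
# follows (however the degree sign was decoded) is never part of the match.
TEMPS = re.compile(
    r"Maximum Temperature:\s*([+-]?\d+).*?Minimum Temperature:\s*([+-]?\d+)"
)

def max_min(line):
    """Return (maximum, minimum) as ints, or None when the line has no temperatures."""
    m = TEMPS.search(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))

print(max_min("Maximum Temperature: 2°C (36°F) Minimum Temperature: -1°C (30°F)"))
print(max_min("Maximum Temperature: 2Â°C (36Â°F) Minimum Temperature: -1Â°C (30Â°F)"))
```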

Have fun!
