How to get weather data from a website?



I can't get the Dew_Point and wind data from this site: https://weather.gc.ca/city/pages/ab-52_metric_e.html

import requests
from lxml import html

# Get the html page
resp=requests.get("https://weather.gc.ca/city/pages/ab-52_metric_e.html")
# Build html tree
html_tree=html.fromstring(resp.text)
#Dew_point=html_tree.xpath("//dd[@class='mrgn-bttm-0 wxo-metric-hide'][(parent::dl[@class='dl-horizontal wxo-conds-col2'])]//text()")[1].replace("Â", "")
# Print Dew_point
#print(f"Dew_point in {city_name} is {Dew_point}")  
#Wind=html_tree.xpath("//dd[@class='longContent mrgn-bttm-0 wxo-metric-hide'][(parent::dl[@class='dl-horizontal wxo-conds-col2'])]//text()")[0].replace("Â", "")
# Print Wind
#print(f"Wind in {city_name} is {Wind}")  

The data should be in this format: Dew point: -2.3°C Wind: NE 9 km/h

The wind direction may change.

I'm not sure how to parse the following HTML. Thanks again for your help!

<dt>Temperature:</dt>
<dd class="mrgn-bttm-0 wxo-metric-hide">13.2°<abbr title="Celsius">C</abbr>
</dd>
<dd class="mrgn-bttm-0 wxo-imperial-hide wxo-city-hidden">55.8°
<abbr title="Fahrenheit">F</abbr>
</dd>
<dt>Dew point:</dt>
<dd class="mrgn-bttm-0 wxo-metric-hide">-2.3°<abbr title="Celsius">C</abbr>
</dd>
<dd class="mrgn-bttm-0 wxo-imperial-hide wxo-city-hidden">27.9°<abbr title="Fahrenheit">F</abbr>
</dd>
<dt>Humidity:</dt>
<dd class="mrgn-bttm-0">34%</dd>
</dl></div>
<div class="col-sm-4"><dl class="dl-horizontal wxo-conds-col3">
<dt>Wind:</dt>
<dd class="longContent mrgn-bttm-0 wxo-metric-hide">
<abbr title="Northeast">NE</abbr> 9 <abbr title="kilometres per hour">km/h</abbr>
</dd>
<dd class="longContent mrgn-bttm-0 wxo-imperial-hide wxo-city-hidden">
<abbr title="Northeast">NE</abbr> 6 <abbr title="miles per hour">mph</abbr>
</dd>
<dt>Visibility:</dt>
<dd class="mrgn-bttm-0 wxo-metric-hide">48 <abbr title="kilometres">km</abbr>
</dd>
<dd class="mrgn-bttm-0 wxo-imperial-hide wxo-city-hidden">30 miles</dd>
</dl></div> 

The best way is to parse the page with BeautifulSoup.

This does what you want:

import requests
from bs4 import BeautifulSoup

resp = requests.get("https://weather.gc.ca/city/pages/ab-52_metric_e.html").content
soup = BeautifulSoup(resp, "html.parser")
all_dt = soup.find_all("dt")
# if you want more metrics, just add them to the list
metrics = ["Dew point:", "Wind:", "Pressure:", "Condition:", "Tendency:", "Temperature:", "Humidity:", "Visibility:"]
data = {}
for metric in metrics:
    data[metric] = []
for elem in all_dt:
    if elem.text in metrics:
        # the first sibling after a <dt> is the "\n" text node; the second is the <dd>
        value = elem.next_sibling.next_sibling
        data[elem.text].append(value.text.strip() if value is not None else "No Data")
print(data)
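The code above relies on calling next_sibling twice. Here is a small offline sketch of the same idea on the HTML from the question, using find_next_sibling("dd") instead. It assumes BeautifulSoup 4 is installed, and it parses a trimmed copy of the question's snippet rather than the live page. metric_value is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the HTML from the question; no network request is made here.
SNIPPET = """
<dl class="dl-horizontal wxo-conds-col2">
<dt>Dew point:</dt>
<dd class="mrgn-bttm-0 wxo-metric-hide">-2.3°<abbr title="Celsius">C</abbr>
</dd>
<dd class="mrgn-bttm-0 wxo-imperial-hide wxo-city-hidden">27.9°<abbr title="Fahrenheit">F</abbr>
</dd>
</dl>
<dl class="dl-horizontal wxo-conds-col3">
<dt>Wind:</dt>
<dd class="longContent mrgn-bttm-0 wxo-metric-hide">
<abbr title="Northeast">NE</abbr> 9 <abbr title="kilometres per hour">km/h</abbr>
</dd>
<dd class="longContent mrgn-bttm-0 wxo-imperial-hide wxo-city-hidden">
<abbr title="Northeast">NE</abbr> 6 <abbr title="miles per hour">mph</abbr>
</dd>
</dl>
"""


def metric_value(soup, label):
    """Hypothetical helper: text of the first <dd> after the <dt> named `label`."""
    dt = soup.find("dt", string=label)
    if dt is None:
        return None
    # The node right after </dt> is a newline text node, which is why
    # next_sibling has to be chained twice; find_next_sibling("dd") skips it.
    dd = dt.find_next_sibling("dd")
    # Collapse the newlines and spaces left around the <abbr> tags.
    return " ".join(dd.get_text().split())


soup = BeautifulSoup(SNIPPET, "html.parser")
dew_point = metric_value(soup, "Dew point:")
wind = metric_value(soup, "Wind:")
print(f"Dew point: {dew_point} Wind: {wind}")  # Dew point: -2.3°C Wind: NE 9 km/h
```

In the question's markup the metric <dd> (class wxo-metric-hide) comes before the imperial one. Taking the first <dd> after each <dt> therefore gives the °C and km/h values.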

BeautifulSoup (from bs4) is the usual tool for parsing HTML and extracting data. In your case it can be used in several ways. First, parse the page to get the HTML tree (soup):

import requests
from bs4 import BeautifulSoup
resp = requests.get("https://weather.gc.ca/city/pages/ab-52_metric_e.html")
soup = BeautifulSoup(resp.content, "html.parser")

I like to use the select function. If any of the selectors confuse you, you can look them up in a CSS selector reference.


Now, if you want the current weather:

selector = '#mainContent details.visible-xs dt+dd'
for s in soup.select(selector):
    print(s.find_previous_sibling().text.strip(), s.text.strip(), end=' ')
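The dt+dd selector works because + is the CSS adjacent-sibling combinator. It matches a <dd> only when the element directly before it is a <dt>, and text nodes are ignored. Below is a small sketch on hand-written markup modelled on the question's snippet. It assumes BeautifulSoup 4, whose select method is backed by the soupsieve package, and shows that the imperial <dd> is not picked up.

```python
from bs4 import BeautifulSoup

# Hand-written markup modelled on the question's snippet (metric <dd> first).
markup = """
<dl>
<dt>Temperature:</dt>
<dd class="wxo-metric-hide">13.2°<abbr title="Celsius">C</abbr></dd>
<dd class="wxo-imperial-hide">55.8°<abbr title="Fahrenheit">F</abbr></dd>
<dt>Humidity:</dt>
<dd>34%</dd>
</dl>
"""
soup = BeautifulSoup(markup, "html.parser")

# "dt + dd" matches a <dd> only if the element right before it is a <dt>,
# so the imperial <dd> (preceded by another <dd>) is not selected.
pairs = [
    (dd.find_previous_sibling("dt").get_text(strip=True), dd.get_text(strip=True))
    for dd in soup.select("dt + dd")
]
print(pairs)  # [('Temperature:', '13.2°C'), ('Humidity:', '34%')]
```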

will output:

Wind: NE 8 km/h Temperature: 18.2°C Pressure: 101.9 kPa Dew point: -8.1°C Visibility: 24 km Humidity: 16% Date: 8:00 PM MDT Wednesday 19 October 2022 Observed at: Calgary Int'l Airport 

You also mentioned in a comment that you want the condition and the tendency. The condition is easy to get:

print(f"Condition: {soup.select_one('details.visible-xs div>img+p').get_text(strip=True)}")

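The tendency, however, is in a different, hidden section. You could change selector to '#mainContent section:first-of-type dt+dd', or even just 'dt+dd', and run for s in soup.select(selector)... again. But then you would get several repeated values and some unwanted data, so a more structured approach is needed.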
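The sketch below shows where the repeats come from and the simplest way to drop them. It uses hypothetical, simplified markup in which the same reading appears in both a visible-xs block and a hidden-xs block; the real page's layout may differ. It keeps the first value seen for each label with dict.setdefault and assumes BeautifulSoup 4.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: the same reading appears twice,
# once in a mobile (visible-xs) block and once in a hidden-xs block.
markup = """
<section>
  <details class="visible-xs"><dl>
    <dt>Wind:</dt><dd>NE 8 km/h</dd>
  </dl></details>
  <div class="hidden-xs"><dl>
    <dt>Wind:</dt><dd>NE 8 km/h</dd>
    <dt>Tendency:</dt><dd>Rising</dd>
  </dl></div>
</section>
"""
soup = BeautifulSoup(markup, "html.parser")

readings = {}
for dd in soup.select("dt + dd"):
    label = dd.find_previous_sibling("dt").get_text(strip=True).rstrip(":")
    # setdefault keeps the first value seen, so later repeats are dropped.
    readings.setdefault(label, dd.get_text(strip=True))

print(readings)  # {'Wind': 'NE 8 km/h', 'Tendency': 'Rising'}
```

This version simply keeps whichever copy comes first in document order. The function that follows instead sorts visible values ahead of hidden ones before picking one.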



So I defined a function that takes one section of the tree and returns a single value for each weather condition:

def getWeatherData(sectionSoup, pKeys='all', cwcSelector='dt+dd', pEnd=' '):
    # collect (name, value, visibility) for every <dt>/<dd> pair in the section
    wc = []
    for s in sectionSoup.select(cwcSelector):
        wcName = s.find_previous_sibling().text.strip()
        wcVal = s.text.strip()
        vis = 'hidden' if s.find_parent(class_='hidden-xs') else 'visible'
        wc.append((wcName, wcVal, vis))
    wcs = sorted(wc, key=lambda c: c[2], reverse=True)  # visible first

    if pKeys == 'all' or type(pKeys) != list:
        pKeys = list(set([c[0] for c in wc]))
        allKeys = True
    else:
        allKeys = False

    forOp = []
    for k in pKeys:
        kvp = [c for c in wcs if c[0].replace(':', '') == k.replace(':', '')]
        if kvp == []:
            # continue  # if you want to skip missing keys silently
            if pEnd is None or type(pEnd) == str:
                print(f'! UNAVAILABLE : "{k}" !', end=pEnd)
        else:
            if allKeys:
                k = k.replace(':', '')  # remove if you want to preserve the original text
            forOp.append((k, kvp[0][1]))
            if pEnd is None or type(pEnd) == str:
                print(kvp[0][0], kvp[0][1], end=pEnd)

    if pEnd is None or type(pEnd) == str:
        print()  # finish the printed line
    return dict(forOp)

    # if you want all the values, including repeats, return this instead:
    # return {'filtered': forOp, 'unfiltered': wc}
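The "# visible first" line relies on plain string ordering: 'visible' sorts after 'hidden', so reverse=True puts visible entries first. Python's sort is also stable (including with reverse=True), so entries keep their original order within each group. Here is a tiny check with made-up tuples in the same (name, value, visibility) shape as wc in the function.

```python
# Made-up (name, value, visibility) tuples in the same shape as `wc`.
wc = [
    ("Wind:", "N 40 km/h", "hidden"),
    ("Wind:", "N 43 km/h", "visible"),
    ("Humidity:", "52%", "hidden"),
]
wcs = sorted(wc, key=lambda c: c[2], reverse=True)  # visible first
print(wcs[0])  # ('Wind:', 'N 43 km/h', 'visible')
```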

Now you can specify which data you want, and in what order:

toPrint = [
    'Condition', 'Pressure', 'Tendency', 'Temperature',
    'Dew point', 'Humidity', 'Wind', 'Visibility'
]  # the parts you mentioned wanting
cwc = getWeatherData(soup.select_one('#mainContent section'), toPrint)
print(f'\n##############\nAs Dictionary:\n{cwc}')

This will output:

Condition: Partly Cloudy Pressure: 101.3 kPa Tendency: Rising Temperature: 8.9°C Dew point: -0.3°C Humidity: 52% Wind: N 43  gust 54 km/h Visibility: 24 km 
##############
As Dictionary:
{'Condition': 'Partly Cloudy', 'Pressure': '101.3 kPa', 'Tendency': 'Rising', 'Temperature': '8.9°C', 'Dew point': '-0.3°C', 'Humidity': '52%', 'Wind': 'N 43  gust 54 km/h', 'Visibility': '24 km'}

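If you want each value printed on a separate line, pass pEnd='\n'. If you don't want anything printed at all, pass pEnd=False. If you want to see all the available data, pass pKeys='all' instead of toPrint.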
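To get back to the one-line format asked for in the question, you can format the returned dictionary yourself. The minimal sketch below hard-codes the dictionary printed above so it runs on its own without a network request. tidy is a hypothetical helper name.

```python
# Dictionary printed above, hard-coded so this example does not hit the network.
cwc = {'Condition': 'Partly Cloudy', 'Pressure': '101.3 kPa', 'Tendency': 'Rising',
       'Temperature': '8.9°C', 'Dew point': '-0.3°C', 'Humidity': '52%',
       'Wind': 'N 43  gust 54 km/h', 'Visibility': '24 km'}


def tidy(value):
    # Collapse runs of whitespace, e.g. the double space in "N 43  gust 54 km/h".
    return " ".join(value.split())


wanted = ["Dew point", "Wind"]
line = " ".join(f"{k}: {tidy(cwc.get(k, 'N/A'))}" for k in wanted)
print(line)  # Dew point: -0.3°C Wind: N 43 gust 54 km/h
```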



If you want, you can also print every section, with its heading:

wcSects = [s.find_parent('details') for s in soup.select('details dd+dt')]
for s in list(set(wcSects)):
    print('\n#######', end=' ')
    h2 = s.select_one('summary h2')
    print((h2 if h2 else s.summary).text.strip(), '#######')
    getWeatherData(s, 'all', pEnd=' | ')
