Splitting HTML text at <br> with Beautiful Soup



HTML code:

<td> <label class="identifier">Speed (avg./max):</label> </td> <td class="value"> <span class="block">4.5 kn<br>7.1 kn</span> </td>

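I need to get the values 4.5 kn and 7.1 kn as separate list items so that I can append them individually. I don't want to split it; I wanted to split the text string with re.sub, but it doesn't work. I also tried using replace to replace the br, but that doesn't work either. Can anyone offer any insight?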

Python code:

def NameSearch(shipLink, mmsi, shipName):
    from bs4 import BeautifulSoup
    import urllib2
    import csv
    import re
    values = []
    values.append(mmsi)
    values.append(shipName)
    regex = re.compile(r'[\n\r\t]')
    i = 0
    with open('Ship_indexname.csv', 'wb') as f:
        writer = csv.writer(f)
        while True:
            try:
                shipPage = urllib2.urlopen(shipLink, timeout=5)
            except urllib2.URLError:
                continue
            except:
                continue
            break
        soup = BeautifulSoup(shipPage, "html.parser")  # Read the web page HTML
        #soup.find('br').replaceWith(' ')
        #for br in soup('br'):
            #br.extract()
        table = soup.find_all("table", {"id": "vessel-related"})  # Finds the table with id vessel-related
        for mytable in table:                                   # Loops over the matching tables
            table_body = mytable.find_all('tbody')                  #Finds tbody section in table
            for body in table_body:
                rows = body.find_all('tr')                #Finds all rows
                for tr in rows:                                 #Loops rows
                    cols = tr.find_all('td')                    #Finds the columns
                    for td in cols:                             #Loops the columns
                        checker = td.text.encode('ascii', 'ignore')
                        check = regex.sub('', checker)
                        if check == ' Speed (avg./max): ':
                            i = 1
                        elif i == 1:
                            print td.text
                            pat = re.compile(r'<br\s*/>')
                            print pat.sub(" ", td.text)
                            values.append(td.text.strip("\n").encode('utf-8'))  #Takes the second column's value and appends it to the list called values
                            i = 0
    #print values
    return values

NameSearch('https://www.fleetmon.com/vessels/kind-of-magic_0_3478642/','230034570','KIND OF MAGIC')
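The re.sub attempt has nothing to match: td.text (like get_text()) returns only the text nodes, so the <br> tag is already gone by the time the pattern runs. A minimal sketch, assuming bs4 is installed and using the HTML snippet from the question, that lets Beautiful Soup insert a separator where the tag was and then splits on it:

```python
from bs4 import BeautifulSoup

html = ('<td> <label class="identifier">Speed (avg./max):</label> </td> '
        '<td class="value"> <span class="block">4.5 kn<br>7.1 kn</span> </td>')
soup = BeautifulSoup(html, "html.parser")
span = soup.find("span", class_="block")

# .text drops the <br> tag entirely, so a regex looking for "<br>" never matches
print(repr(span.text))  # '4.5 kn7.1 kn'

# get_text() puts the separator between text nodes, i.e. where the <br> was
parts = span.get_text(separator="\n", strip=True).split("\n")
print(parts)  # ['4.5 kn', '7.1 kn']
```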

First find the "Speed (avg./max):" label, then move to the value with .find_next():

from bs4 import BeautifulSoup
data = '<td> <label class="identifier">Speed (avg./max):</label> </td> <td class="value"> <span class="block">4.5 kn<br>7.1 kn</span> </td>'
soup = BeautifulSoup(data, "html.parser")
label = soup.find("label", class_="identifier", string="Speed (avg./max):")
value = label.find_next("td", class_="value").get_text(strip=True)
print(value)  # prints 4.5 kn7.1 kn
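If you would rather get the two readings as separate items than one concatenated string, a small variation (a sketch using the same data snippet, not part of the approach above) is to iterate over the cell's stripped_strings, which yields each non-whitespace text node on its own:

```python
from bs4 import BeautifulSoup

data = '<td> <label class="identifier">Speed (avg./max):</label> </td> <td class="value"> <span class="block">4.5 kn<br>7.1 kn</span> </td>'
soup = BeautifulSoup(data, "html.parser")
label = soup.find("label", class_="identifier", string="Speed (avg./max):")
value_td = label.find_next("td", class_="value")

# The text nodes on either side of <br> come out as separate stripped strings;
# whitespace-only nodes are skipped
speeds = list(value_td.stripped_strings)
print(speeds)  # ['4.5 kn', '7.1 kn']
```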

Now you can extract the actual numbers from the string:

import re
speed_values = re.findall(r"([0-9.]+) kn", value)
print(speed_values)

This prints ['4.5', '7.1'].

You can then go further and convert the values to floats, unpacking them into separate variables:

avg_speed, max_speed = map(float, speed_values)
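Putting these steps together for the original goal of appending each speed to the values list separately, here is a sketch that assumes the page keeps the same label text. extract_speeds is a hypothetical helper name, and it returns an empty list when the label or value cell is missing instead of raising:

```python
import re
from bs4 import BeautifulSoup

def extract_speeds(html):
    """Hypothetical helper: return the avg/max speeds as floats, or [] if absent."""
    soup = BeautifulSoup(html, "html.parser")
    label = soup.find("label", class_="identifier", string="Speed (avg./max):")
    if label is None:
        return []
    value_td = label.find_next("td", class_="value")
    if value_td is None:
        return []
    value = value_td.get_text(strip=True)
    # Same regex as above: each number followed by " kn"
    return [float(v) for v in re.findall(r"([0-9.]+) kn", value)]

data = '<td> <label class="identifier">Speed (avg./max):</label> </td> <td class="value"> <span class="block">4.5 kn<br>7.1 kn</span> </td>'
values = ['230034570', 'KIND OF MAGIC']  # mmsi and ship name, as in the question
values.extend(extract_speeds(data))  # each speed becomes its own list item
print(values)  # ['230034570', 'KIND OF MAGIC', 4.5, 7.1]
```

Using extend rather than append adds the two speeds as individual entries, which is what the question asks for.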
