How to encode the character '\u0107' in Python



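I'm trying to scrape data from Wikipedia pages (each one is a table of the top 100 singles for a given year). While saving the output to a CSV file, it gets through 1951-1959 and then raises an error: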

line 43, in <module> writer.writerow(songs) File "C:\Python36_64\lib\encodings\cp1252.py",

line 19, in encode return codecs.charmap_encode(input,self.errors,encoding_table)[0]

UnicodeEncodeError: 'charmap' codec can't encode character '\u0107' in position 29: character maps to <undefined>

Code:


from bs4 import BeautifulSoup
import requests
import csv
data = []

def scrape_data(search_year):
    year_data = []
    url = f'https://en.wikipedia.org/wiki/Billboard_Year-End_Hot_100_singles_of_{str(search_year)}'
    # Get a source code from url
    r = requests.get(url).text
    soup = BeautifulSoup(r, 'html.parser')
    # Isolate the table part from the source code
    table = soup.find('table', attrs={'class': 'wikitable'})
    # Extract every row of the table
    rows = table.find_all('tr')
    # Iterate through every row
    for row in rows[1:]:
        # Extract cols (with tags td and th)
        cols = row.find_all(['td', 'th'])
        # List comprehension (create a list of lists, list of rows, in which every row is a list of table text)
        year_data.append([col.text.replace('\n', '') for col in cols])
    # Insert the year this data is from at the beginning of each row
    for n in year_data:
        n.insert(0, search_year)
    return year_data

for year in range(1951, 2019):
    try:
        data.append(scrape_data(year))
        print(f'Year {str(year)} scraped')
    except AttributeError as e:
        print(f'Year {str(year)} is not available')
writer = csv.writer(open('songs.csv', 'w'), delimiter=',', lineterminator='\n', quotechar='"')
for year_data in data:
    for songs in year_data:
        writer.writerow(songs)
        print(songs)

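I think you can fix this by specifying the correct Unicode encoding when writing the output. Without an explicit encoding, open() uses the platform default (cp1252 here, as the traceback shows), which has no mapping for 'ć' (U+0107):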

writer = csv.writer(open('songs.csv', 'w', encoding='utf-8'),
                    delimiter=',', lineterminator='\n', quotechar='"')
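
To check the idea without scraping anything, here is a minimal sketch. The sample row, the temporary file name, and the helpers `can_encode`, `write_rows`, and `read_rows` are all made up for illustration and are not part of the original script. It first confirms that cp1252 cannot represent '\u0107' while UTF-8 can, then writes a row to a UTF-8 CSV and reads it back. It also passes `newline=''` and uses a `with` block, which the `csv` module documentation recommends. That goes slightly beyond the one-line fix above.

```python
import csv
import os
import tempfile

# Hypothetical sample row; the last field ends with the character U+0107.
sample_row = [1951, 1, "Example Song", "Some Artist\u0107"]


def can_encode(text, encoding):
    """Return True if every character in text can be encoded with encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def write_rows(path, rows):
    """Write rows to a CSV file encoded as UTF-8."""
    # newline='' lets the csv writer control line endings itself.
    with open(path, 'w', encoding='utf-8', newline='') as f:
        writer = csv.writer(f, delimiter=',', lineterminator='\n', quotechar='"')
        writer.writerows(rows)


def read_rows(path):
    """Read the CSV file back as a list of rows (every field is a string)."""
    with open(path, encoding='utf-8', newline='') as f:
        return list(csv.reader(f))


# Write to a throwaway directory so the demo does not touch songs.csv.
path = os.path.join(tempfile.mkdtemp(), 'songs_demo.csv')
write_rows(path, [sample_row])
rows_back = read_rows(path)

print(can_encode('\u0107', 'cp1252'))  # cp1252 has no slot for this character
print(can_encode('\u0107', 'utf-8'))   # UTF-8 can encode any Unicode character
print(rows_back)
```

If the file is meant to be opened in Excel on Windows, `encoding='utf-8-sig'` may be worth trying instead, since it writes a byte order mark that some spreadsheet programs use to detect UTF-8. That is a suggestion beyond the original answer.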
