Unicode problem in a list: unable to parse it in Python



I am extracting data from a website into a DataFrame using pandas, as follows:

import pandas as pd
jockeys = 'https://race.kra.co.kr/globalEn/jockeysBusan.do'
jdf = pd.read_html(jockeys)[0]
jdf_list = jdf.values.tolist()
print(jdf_list)

The result I get looks like this (only the first few rows are shown):

[[1,
'Chae Sang Hyun',
'FREE',
'2014/06/05',
'262 (16/19/22)',
'1789 (130/153/162)'],
[2,
'Choi Eun Gyeong',
'FREE',
'2016/06/18',
'317 (19/22/38)',
'1522 (90/120/140)'],
[3,
'Choi Si Dae',
'FREE',
'2007/05/18',
'409 (58/34/34)',
'5649 (750/658/594)'],
[4,
'Francisco Da Silva',
'FREE',
'2016/09/02',
'375 (61/45/42)',
'2255 (309/300/261)'],
[5,
'(-4)\xa0Gwon O Chan',
'FREE',
'2021/07/15',
'154 (4/12/10)',
'200 (4/14/10)']]

I keep getting this "(-4)\xa0" in front of the name. I tried the following approaches, but neither worked:

jdf_list_new =  jdf_list.encode('ascii', 'ignore').decode('utf-8')

jdf_list_new = unicodedata.normalize("NFKC", jdf_list)

Any help would be appreciated!

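\xa0 is the Unicode character 'NO-BREAK SPACE'. You need to encode and decode the column in the DataFrame before converting it to a list. (The (-4) is part of the table on the website, so it stays.)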

jdf = pd.read_html(jockeys)[0]
jdf['(allowance)Name'] = jdf['(allowance)Name'].str.encode('ascii', 'ignore').str.decode('utf-8')

Output:
[1, 'Chae Sang Hyun', 'FREE', '2014/06/05', '262 (16/19/22)', '1789 (130/153/162)']
[2, 'Choi Eun Gyeong', 'FREE', '2016/06/18', '317 (19/22/38)', '1522 (90/120/140)']
[3, 'Choi Si Dae', 'FREE', '2007/05/18', '409 (58/34/34)', '5649 (750/658/594)']
[4, 'Francisco Da Silva', 'FREE', '2016/09/02', '375 (61/45/42)', '2255 (309/300/261)']
[5, '(-4)Gwon O Chan', 'FREE', '2021/07/15', '154 (4/12/10)', '200 (4/14/10)']
[6, 'Jeon Jin Gu', 'FREE', '2017/06/02', '183 (2/12/7)', '914 (47/64/52)']
[7, 'Jeong Dong Cheol', 'FREE', '2011/08/24', '141 (6/3/4)', '2724 (169/183/195)']
[8, 'Jeong Woo Ju', 'FREE', '2018/06/14', '143 (2/6/8)', '987 (48/50/68)']
[9, 'Jo In Kwon', 'FREE', '2008/06/18', '355 (37/53/40)', '4592 (649/533/491)']
[10, 'Jung Do Yun', 'FREE', '2016/06/18', '260 (29/28/25)', '1921 (162/157/194)']
[11, 'Kim Cheol Ho', 'FREE', '2008/06/18', '164 (8/8/14)', '2640 (217/219/240)']
[12, 'Kim Eu Soo', 'FREE', '2005/05/04', '270 (11/13/14)', '4102 (243/306/344)']
[13, 'Kim Hye Sun', 'FREE', '2009/06/01', '415 (46/57/44)', '4275 (350/374/363)']
[14, '(-4)Lee Hong Rag', 'FREE', '2022/07/01', '91 (6/9/8)', '91 (6/9/8)']
[15, 'Lee Sung Jae', 'FREE', '2008/05/14', '396 (34/23/35)', '4244 (327/333/398)']
[16, 'Lim Sung Sil', 'FREE', '2002/09/13', '94 (5/8/14)', '2648 (353/296/279)']
[17, 'Mo Jun Ho', 'FREE', '2020/07/15', '340 (17/17/26)', '755 (45/54/64)']
[18, 'Park Jae I', 'FREE', '2015/06/17', '390 (62/52/50)', '2239 (167/223/227)']
[19, '(-4)Park Jong Ho', 'FREE', '2020/07/15', '74 (1/2/5)', '282 (8/7/14)']
[20, '(-2)Seo Gang Ju', 'FREE', '2021/07/15', '342 (28/41/40)', '385 (28/44/46)']
[21, 'Seo Seung Un', 'FREE', '2011/08/24', '368 (61/55/46)', '3973 (620/540/491)']
[22, '(-2)Shin Yun Seob', 'FREE', '2021/07/15', '313 (16/22/28)', '407 (24/26/38)']
[23, 'Song Kyeong Yun', 'FREE', '2007/05/18', '391 (39/34/40)', '4765 (361/450/461)']
[24, '(-3)Yoon Hyung Seok', 'FREE', '2021/07/15', '268 (13/19/23)', '317 (14/24/24)']
[25, 'You Hyun Myung', 'FREE', '2002/09/13', '387 (73/49/42)', '7104 (1199/940/750)']
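
As a side note on why the two attempts in the question fail: both `encode` and `unicodedata.normalize` operate on a single string, not on a list of lists, so they have to be applied cell by cell. The sketch below shows one way to do that on the nested list itself. It is only an illustration: the shortened sample rows and the helper name `clean_cell` are assumptions, not part of the original code. Note also that NFKC normalization turns the no-break space into a regular space, whereas the ASCII encode/decode above removes it entirely.

```python
import unicodedata

# Shortened sample rows copied from the output above (stand-in for jdf.values.tolist()).
jdf_list = [
    [4, 'Francisco Da Silva', 'FREE', '2016/09/02'],
    [5, '(-4)\xa0Gwon O Chan', 'FREE', '2021/07/15'],
]


def clean_cell(value):
    """Hypothetical helper: normalize strings, pass other types through unchanged."""
    if isinstance(value, str):
        # NFKC maps NO-BREAK SPACE (U+00A0) to an ordinary space (U+0020).
        return unicodedata.normalize("NFKC", value)
    return value


# Apply the helper to every cell of every row.
jdf_list_new = [[clean_cell(value) for value in row] for row in jdf_list]

for row in jdf_list_new:
    print(row)
```

If the space between the allowance and the name is not wanted, the ASCII encode/decode on the DataFrame column remains the simpler route, with the caveat that it drops every non-ASCII character, including accented letters in names.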
