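Using Python, given that string = "Tiësto & Sevenn - BOOM (Artelax Remix)" contains non-ASCII characters, how can I use unidecode to clean the string so that it no longer contains any non-ASCII characters?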
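import random

# Pick a random "Artist - Title" line from the file
with open('data.csv', encoding='utf-8') as f:
    string = random.choice(list(f)).rstrip()
print("[+] Starting search for:", string)
# Split on the last ' - ' into artist name and song name
artistname, songname = string.rsplit(' - ', 1)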
The snippet above gives me: artistname = Tiësto & Sevenn, songname = BOOM (Artelax Remix)
As you can see, artistname still contains non-ASCII characters. How can I use unidecode to fix this?
Simply call unidecode on the string:
>>> from unidecode import unidecode
>>> unidecode(string)
'Tiesto & Sevenn - BOOM (Artelax Remix)'
There is also the longer/slower route of removing combining characters after normalizing to the decomposed form (NFD):
>>> import unicodedata
>>> ''.join(s for s in unicodedata.normalize('NFD', string) if not unicodedata.combining(s))
'Tiesto & Sevenn - BOOM (Artelax Remix)'
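To tie the normalization route back to the question's variables, here is a minimal sketch that wraps it in a helper and applies it to each part after splitting. The name strip_accents is hypothetical (it is not part of unicodedata or unidecode), and the input string is hard-coded instead of being read from data.csv.

```python
import unicodedata

def strip_accents(text):
    """Hypothetical helper: drop combining marks after NFD decomposition."""
    decomposed = unicodedata.normalize('NFD', text)
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))

# Hard-coded here for illustration; the question reads this from data.csv
string = "Tiësto & Sevenn - BOOM (Artelax Remix)"
artistname, songname = string.rsplit(' - ', 1)

artistname = strip_accents(artistname)
songname = strip_accents(songname)
print(artistname)
print(songname)
```

Note that this only removes accents that decompose into a base letter plus a combining mark. Characters with no canonical decomposition, such as "ø" or "ß", pass through unchanged, whereas unidecode transliterates them, so unidecode is likely the better fit when the output must be pure ASCII.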