How to remove unwanted characters from a list of UTF-8 strings



I have the following snippet.

def profile_details():  #function to fetch people
    payload = 'grab'
    global result_people 
    result_people = []
    for i in range(0,5):
        git_url = "https://github.com/search?p="+str(i)+"&q="+str(payload)+"&type=Users"
        rr = requests.get(git_url, headers=burp0_headers, cookies=burp0_cookies)
        page =  bs4.BeautifulSoup(rr.text,"lxml")
        page_parse = page.select('.user-list-info p')
        for i in range(len(page_parse)): 
                test = page_parse[i].text
                if ('@ Grab' in test) or ('at Grab' in test) or ('@Grab' in test)  or ('@grab' in test):
                        a = result_people.append(page_parse[i].text.encode("utf-8"))
                else:
                        pass
profile_details()
for i in result_people:
        print(i)

The output looks like this:

[b'\n          Front End @facebook \xf0\x9f\x8c\x9d \xc2\xb7 Maintaining Docusaurus \xc2\xb7 Ex-@grab \xf0\x9f\x87\xb8\xf0\x9f\x87\xac\r\n\n        ', b'\n          Coding at Amazon, previously @Grab\n', b'\n          Software Engineer @grab \r\nPreviously @shopback \n        ', b'\n          Front End @facebook \xf0\x9f\x8c\x9d \xc2\xb7 Maintaining Docusaurus \xc2\xb7 Ex-@grab \xf0\x9f\x87\xb8\xf0\x9f\x87\xac\r\n\n        ', b'\n          Coding at Amazon, previously @Grab\n', b'\n          Software Engineer @grab \r\nPreviously @shopback \n        ', b'\n          UX Engineer @ Grab\n', b'\n          Designer at @Grab. Design Systems. Emerging tech (AR).\n        ', b'\n          Mobile Developer (iOS) @Grab. Previously Flipkart.\n        ', b'\n          Data science and engineering at Grab\n', b'\n          Software Engineer @ Grab.\n        ', b"\n          Finding top #talent for @Grab's #mobile #app development teams, software engineering, #iOS & #Android in #Singapore\n        ", b'\n          Frontend Software Engineer at Grab\n', b'\n          Developer @Grab(GrabTaxi)\n        ', b'\n          Full Stack - Software Engineer @ Grab | AI Enthusiast\n        ', b'\n          Software Engineer at Grab\n', b'\n          Software Engineer @Grab | Previous @udacity @disney | Open Source nut, right now juggling with iOS and Swift\n        ', b'\n          Ex-Engineering Lead @grab, Ex-DoE @90seconds\n        ', b'\n          Software Engineer/ Gopher. Worked @grab, @microsoft\n        ']

I want to remove byte sequences such as \xf0\x9f\x8c\x9d from the list.

The output looks messy:

b'\n          Front End @facebook \xf0\x9f\x8c\x9d \xc2\xb7 Maintaining Docusaurus \xc2\xb7 Ex-@grab \xf0\x9f\x87\xb8\xf0\x9f\x87\xac\r\n\n        '

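b'\n          Coding at Amazon, previously @Grab\n'
b'\n          Software Engineer @grab \r\nPreviously @shopback \n        '
b'\n          Front End @facebook \xf0\x9f\x8c\x9d \xc2\xb7 Maintaining Docusaurus \xc2\xb7 Ex-@grab \xf0\x9f\x87\xb8\xf0\x9f\x87\xac\r\n\n        '
b'\n          Coding at Amazon, previously @Grab\n'
b'\n          Software Engineer @grab \r\nPreviously @shopback \n        '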

What is the simplest and most convenient way to achieve this?

Thanks in advance.

Welcome to Stack Overflow!

You can do this by removing all non-ASCII characters from each string:

for i in result_people:
    print(i.decode('utf8').encode('ascii', errors='ignore'))
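
To see what that round trip does, here is a minimal sketch run on one entry copied from the question's output. The helper name to_ascii_bytes is hypothetical and only for illustration. Note that the result is still a bytes object, so the b'...' prefix and the \n and \r escapes remain when it is printed.

```python
# One entry copied from the output shown in the question.
raw = (b'\n          Front End @facebook \xf0\x9f\x8c\x9d \xc2\xb7 '
       b'Maintaining Docusaurus \xc2\xb7 Ex-@grab '
       b'\xf0\x9f\x87\xb8\xf0\x9f\x87\xac\r\n\n        ')


def to_ascii_bytes(data: bytes) -> bytes:
    """Hypothetical helper: decode UTF-8, then drop every non-ASCII character."""
    text = data.decode('utf-8')                     # bytes -> str
    return text.encode('ascii', errors='ignore')    # str -> ASCII-only bytes


cleaned = to_ascii_bytes(raw)
print(cleaned)  # emoji and the middle dot are gone, whitespace is untouched
```

If a plain string is wanted rather than bytes, calling .decode('ascii') on the result should give one.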

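Encoding to ASCII with 'ignore' as the errors argument skips every character that cannot be encoded, which solves the problem; the result is then decoded back into a string: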

result_people.append(page_parse[i].text.encode('ascii', 'ignore').decode("utf-8"))
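
The same idea can be wrapped in a small function and applied before appending. This is only a sketch: the name clean_bio is hypothetical, and collapsing the leftover whitespace with split/join is an extra step that neither answer mentions, added on the assumption that the padding and newlines in the scraped text are also unwanted.

```python
def clean_bio(text: str) -> str:
    """Hypothetical helper: keep ASCII only, then tidy up whitespace."""
    # Drop every character that has no ASCII representation (emoji, middle dot, ...).
    ascii_only = text.encode('ascii', 'ignore').decode('ascii')
    # Assumption: surrounding padding and embedded newlines are not wanted either,
    # so collapse every run of whitespace into a single space.
    return ' '.join(ascii_only.split())


# Sample text rebuilt from one entry of the output shown in the question.
sample = (b'\n          Front End @facebook \xf0\x9f\x8c\x9d \xc2\xb7 '
          b'Maintaining Docusaurus \xc2\xb7 Ex-@grab '
          b'\xf0\x9f\x87\xb8\xf0\x9f\x87\xac\r\n\n        ').decode('utf-8')

print(clean_bio(sample))
# prints: Front End @facebook Maintaining Docusaurus Ex-@grab
```

Inside the loop from the question this would be used as result_people.append(clean_bio(page_parse[i].text)), so the list holds str values instead of bytes.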
