I have the following snippet:
import bs4
import requests

# burp0_headers and burp0_cookies are defined elsewhere in the script
def profile_details():  # function to fetch people
    payload = 'grab'
    global result_people
    result_people = []
    # note: p=0 and p=1 probably return the same results page,
    # which would explain why the first entries appear twice in the output
    for i in range(0, 5):
        git_url = "https://github.com/search?p=" + str(i) + "&q=" + str(payload) + "&type=Users"
        rr = requests.get(git_url, headers=burp0_headers, cookies=burp0_cookies)
        page = bs4.BeautifulSoup(rr.text, "lxml")
        page_parse = page.select('.user-list-info p')
        for i in range(len(page_parse)):
            test = page_parse[i].text
            if ('@ Grab' in test) or ('at Grab' in test) or ('@Grab' in test) or ('@grab' in test):
                # .encode("utf-8") stores each bio as bytes, not str
                result_people.append(page_parse[i].text.encode("utf-8"))

profile_details()
for i in result_people:
    print(i)
The output looks like this:
[b'\n Front End @facebook \xf0\x9f\x8c\x9d \xc2\xb7 Maintaining Docusaurus \xc2\xb7 Ex-@grab \xf0\x9f\x87\xb8\xf0\x9f\x87\xac\r\n\n ', b'\n Coding at Amazon, previously @Grab\n', b'\n Software Engineer @grab \r\nPreviously @shopback \n ', b'\n Front End @facebook \xf0\x9f\x8c\x9d \xc2\xb7 Maintaining Docusaurus \xc2\xb7 Ex-@grab \xf0\x9f\x87\xb8\xf0\x9f\x87\xac\r\n\n ', b'\n Coding at Amazon, previously @Grab\n', b'\n Software Engineer @grab \r\nPreviously @shopback \n ', b'\n UX Engineer @ Grab\n', b'\n Designer at @Grab. Design Systems. Emerging tech (AR).\n ', b'\n Mobile Developer (iOS) @Grab. Previously Flipkart.\n ', b'\n Data science and engineering at Grab\n', b'\n Software Engineer @ Grab.\n ', b"\n Finding top #talent for @Grab's #mobile #app development teams, software engineering, #iOS & #Android in #Singapore\n ", b'\n Frontend Software Engineer at Grab\n', b'\n Developer @Grab(GrabTaxi)\n ', b'\n Full Stack - Software Engineer @ Grab | AI Enthusiast\n ', b'\n Software Engineer at Grab\n', b'\n Software Engineer @Grab | Previous @udacity @disney | Open Source nut, right now juggling with iOS and Swift\n ', b'\n Ex-Engineering Lead @grab, Ex-DoE @90seconds\n ', b'\n Software Engineer/ Gopher. Worked @grab, @microsoft\n ']
I want to remove characters such as \xf0\x9f\x8c\x9d from the list.
When I print the entries, the output is a mess:
b'\n Front End @facebook \xf0\x9f\x8c\x9d \xc2\xb7 Maintaining Docusaurus \xc2\xb7 Ex-@grab \xf0\x9f\x87\xb8\xf0\x9f\x87\xac\r\n\n '
b'\n Coding at Amazon, previously @Grab\n'
b'\n Software Engineer @grab \r\nPreviously @shopback \n '
b'\n Front End @facebook \xf0\x9f\x8c\x9d \xc2\xb7 Maintaining Docusaurus \xc2\xb7 Ex-@grab \xf0\x9f\x87\xb8\xf0\x9f\x87\xac\r\n\n '
b'\n Coding at Amazon, previously @Grab\n'
b'\n Software Engineer @grab \r\nPreviously @shopback \n '
What is the simplest and most convenient way to do this?
Thanks in advance.
Welcome to StackOverflow!
You can do this by removing all non-ASCII characters from each string:
for i in result_people:
    # decode the UTF-8 bytes to str, then re-encode as ASCII, dropping every non-ASCII character
    print(i.decode('utf8').encode('ascii', errors='ignore'))
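To see what the loop above does, here is a small self-contained run on two entries copied from the question's output (the scraping part is left out, so `result_people` is just a hard-coded sample). It first shows that the escapes are the UTF-8 bytes of real characters, then applies the same decode/encode step.

```python
# Two entries copied from the question's output (UTF-8 encoded bytes)
result_people = [
    b'\n Front End @facebook \xf0\x9f\x8c\x9d \xc2\xb7 Maintaining Docusaurus'
    b' \xc2\xb7 Ex-@grab \xf0\x9f\x87\xb8\xf0\x9f\x87\xac\r\n\n ',
    b'\n Coding at Amazon, previously @Grab\n',
]

# The escapes are UTF-8 bytes of real characters; ascii() prints them as code points,
# e.g. '\U0001f31d' (an emoji) and '\xb7' (a middle dot)
print(ascii(result_people[0].decode('utf-8')))

cleaned = []
for i in result_people:
    # bytes -> str, then str -> ASCII bytes, silently dropping non-ASCII characters
    ascii_only = i.decode('utf8').encode('ascii', errors='ignore')
    cleaned.append(ascii_only)
    print(ascii_only)
```

Each cleaned entry is still a `bytes` object, so `print` keeps showing the `b''` prefix and the `\n`/`\r` escapes; calling `.decode('ascii')` on it gives a plain `str`.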
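Passing errors='ignore' when encoding to ASCII silently drops every character that has no ASCII representation (the emoji, the flag, the middle dot), which solves the problem; decoding the result with UTF-8 then turns it back into a str, since ASCII bytes are also valid UTF-8. You can also do this directly in profile_details, replacing the append line with: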
result_people.append(page_parse[i].text.encode('ascii', 'ignore').decode("utf-8"))
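If the goal is readable one-line text, one option is a small helper that strips non-ASCII characters the same way and also collapses the `\r\n` and extra spaces. This is a sketch under that assumption: `clean_bio` is a hypothetical name, not part of the original code, and whitespace collapsing goes beyond what the answer does.

```python
def clean_bio(text):
    """Return text with non-ASCII characters removed and whitespace collapsed.

    clean_bio is a hypothetical helper, not part of the original code.
    """
    # Drop characters that have no ASCII representation (emoji, flags, '\xb7', ...)
    ascii_text = text.encode('ascii', 'ignore').decode('ascii')
    # split() with no arguments splits on any run of whitespace, including
    # '\r' and '\n', so joining with single spaces removes the line breaks
    return ' '.join(ascii_text.split())


bio = '\n Front End @facebook \U0001F31D \u00b7 Maintaining Docusaurus \u00b7 Ex-@grab \U0001F1F8\U0001F1EC\r\n\n '
print(clean_bio(bio))
# Front End @facebook Maintaining Docusaurus Ex-@grab
```

Inside the scraper this would become `result_people.append(clean_bio(page_parse[i].text))`. Keep in mind that `'ignore'` removes all non-ASCII characters, so accented letters in names (for example "é") are dropped as well.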