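I know how to scrape emoji from WhatsApp, but only when:
- the message is a single emoji with no text, or
- the message is text with an emoji.
However, when a message contains two emoji and no text, I can't get them. Here is the HTML of the message "🎂":
<div class="JwMbj i0jNr selectable-text copyable-text">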
<span class="_3R6rC">
<img crossorigin="anonymous"
src="/img/d07f9aca6938f691b840f97dd1cd67dd_w_638-64.png" alt="🎂" draggable="false"
class="_2UdhN _1xeoG i0jNr selectable-text copyable-text" data-plain-text="🎂"
style="visibility: visible;">
</span>
</div>
I tried this code to get the emoji:
m = s.find_all('div', attrs={'class':'i0jNr'})
v = m.find('span', attrs={'class':'_3R6rC'})
for i in v.children:
    if isinstance(i, NavigableString):
        print(i)
    elif isinstance(i, Tag):
        print(i.attrs['alt'])
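But this code only works when there is a single emoji. When the message contains two emoji, it prints only one: for the message "🔥🖐", the output is "🔥" (only the first emoji). Here is the HTML of that message:
<div class="JwMbj i0jNr selectable-text copyable-text">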
<span class="_3R6rC">
<img crossorigin="anonymous"
src="/img/d07f9aca6938f691b840f97dd1cd67dd_w_1749-40.png" alt="🔥" draggable="false"
class="_2UdhN _3zyju i0jNr selectable-text copyable-text" data-plain-text="🔥"
style="visibility: visible;">
</span>
<span class="_3R6rC">
<img crossorigin="anonymous"
src="/img/d07f9aca6938f691b840f97dd1cd67dd_w_1845-40.png" alt="🖐" draggable="false"
class="_2UdhN _3zyju i0jNr selectable-text copyable-text" data-plain-text="🖐"
style="visibility: visible;">
</span>
</div>
I tried this code to print both emoji, but it doesn't work:
msglist = []
m = s.find_all('div', attrs={'class':'i0jNr'})
for b in m:
    v = b.find_all('div', attrs={'class':'JwMbj'})
    for x in v:
        z = x.find_all('span', attrs={'class':'_3R6rC'})
        for i in z.children:
            if isinstance(i, NavigableString):
                print(i)
            elif isinstance(i, Tag):
                print(i.attrs['alt'])
It produces no output.
You can convert the <img> tags to plain text and then use .get_text() to get the text as usual. For example:
from bs4 import BeautifulSoup
html_doc = """
<div class="JwMbj i0jNr selectable-text copyable-text">
<span class="_3R6rC">
<img crossorigin="anonymous"
src="/img/d07f9aca6938f691b840f97dd1cd67dd_w_1749-40.png" alt="🔥" draggable="false"
class="_2UdhN _3zyju i0jNr selectable-text copyable-text" data-plain-text="🔥"
style="visibility: visible;">
</span>
<span class="_3R6rC">
<img crossorigin="anonymous"
src="/img/d07f9aca6938f691b840f97dd1cd67dd_w_1845-40.png" alt="🖐" draggable="false"
class="_2UdhN _3zyju i0jNr selectable-text copyable-text" data-plain-text="🖐"
style="visibility: visible;">
</span>
</div>
"""
soup = BeautifulSoup(html_doc, "html.parser")
# select the main text div
text_div = soup.select_one(".copyable-text")
# convert all <img> to plain-text:
for img in text_div.select("img[data-plain-text]"):
    img.replace_with(img["data-plain-text"])
# get text normally:
print(text_div.get_text(strip=True))
Prints:
🔥🖐
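If you would rather not modify the parsed tree, a variant closer to the loop in the question may also work. Note that find_all returns a list-like ResultSet, which has no .children attribute, so each matched element has to be walked individually. The sketch below is only an illustration: the helper name message_text is hypothetical, the markup is a trimmed copy of the HTML from the question, and it assumes every emoji <img> carries either a data-plain-text or an alt attribute.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag


def message_text(div):
    """Rebuild a message's text, replacing each emoji <img> with its character.

    Hypothetical helper: assumes emoji images expose the character in
    data-plain-text (preferred) or alt, as in the markup from the question.
    """
    parts = []
    # .descendants walks every nested node, so any number of <span> wrappers is fine.
    for node in div.descendants:
        if isinstance(node, NavigableString):
            # Strip surrounding whitespace, like get_text(strip=True) does.
            parts.append(node.strip())
        elif isinstance(node, Tag) and node.name == "img":
            parts.append(node.get("data-plain-text") or node.get("alt", ""))
    return "".join(parts)


# Trimmed copy of the two-emoji message from the question.
html_doc = """
<div class="JwMbj i0jNr selectable-text copyable-text">
<span class="_3R6rC">
<img alt="🔥" class="_2UdhN _3zyju i0jNr selectable-text copyable-text" data-plain-text="🔥">
</span>
<span class="_3R6rC">
<img alt="🖐" class="_2UdhN _3zyju i0jNr selectable-text copyable-text" data-plain-text="🖐">
</span>
</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Loop over every message div; the <img> tags share the class but are not divs.
for div in soup.select("div.copyable-text"):
    print(message_text(div))
```

Because the text nodes and images are collected in document order, the same helper should also cover the single-emoji and text-plus-emoji cases from the question. The class names are generated by WhatsApp Web and may change, so treat the selectors as placeholders.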