BeautifulSoup: grab visible webpage text when the URL does not end in .html



I like the answer I got on this page: BeautifulSoup Grab Visible Webpage Text

But my page doesn't end in .html. It is: https://biogmagscience.net

There must be a simple solution.

Cheers

You have a typo in your URL; it should be https://biomagscience.net/. This script prints the page's visible text using the get_text() method:

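import requests
from bs4 import BeautifulSoup

url = 'https://biomagscience.net/'
soup = BeautifulSoup(requests.get(url).text, 'lxml')

# Drop elements whose text is never rendered: styles, scripts, inline-hidden tags
for tag in soup.select('style, script, [style*="display:none"]'):
    tag.extract()

# One text fragment per line, with surrounding whitespace stripped
print(soup.get_text(strip=True, separator='\n'))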

Prints:

Best Magnets For Healing | Biomagnetic Therapy Products
The Future of Health & Well-Being —Today!
Advanced Therapy for Vitality, Nerve Regeneration & Pain Relief of Acute/Chronic Injuries & Illness
Acute Injuries
•
Alzheimer’s
•
Arthritis
•
Back Pain
•
Chronic Illness
•
EMF
•
Joint Pain
•
Muscle Pain
Magnet Therapy Articles
•
Products
BiomagScience
...and so on.
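
The same filtering can be tried offline by wrapping it in a function that takes an HTML string instead of a URL. This is only a sketch: the name `visible_text` and the sample markup are made up for illustration, it uses Python's built-in `html.parser` rather than `lxml`, and it also treats `display: none` (with a space) as hidden, which the selector in the script above would not match.

```python
from bs4 import BeautifulSoup


def visible_text(html: str) -> str:
    """Return the readable text of an HTML document, one fragment per line."""
    # html.parser ships with Python, so this sketch does not need lxml.
    soup = BeautifulSoup(html, "html.parser")

    # Remove elements whose contents are never rendered as page text.
    for tag in soup.select("style, script"):
        tag.extract()

    # Remove inline-hidden elements, accepting "display:none" and "display: none".
    for tag in soup.find_all(style=True):
        if "display:none" in tag["style"].replace(" ", "").lower():
            tag.extract()

    return soup.get_text(separator="\n", strip=True)


# Hypothetical sample page, used only to exercise the function.
sample = """
<html><head><title>Demo</title><style>p { color: red; }</style></head>
<body>
  <p>Hello</p>
  <div style="display: none">Hidden</div>
  <script>var x = 1;</script>
  <p>World</p>
</body></html>
"""
print(visible_text(sample))
```

Elements hidden through an external stylesheet or a CSS class are not detected by this approach; only inline `style` attributes are checked.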

https://biogmagscience.net is a URL, not a filename. Go to your website and download the page source; it will be HTML.
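
To see why the missing .html suffix does not matter, one can look at the Content-Type header the server sends instead of the URL's ending. The snippet below is a rough sketch using only the standard library; the helper names `is_html_media_type` and `url_serves_html` are invented for this example, and which media types count as HTML is an assumption of the sketch.

```python
from urllib.request import urlopen

# Media types treated as HTML in this sketch (an assumption, not an exhaustive list).
HTML_MEDIA_TYPES = {"text/html", "application/xhtml+xml"}


def is_html_media_type(content_type: str) -> bool:
    """Check a Content-Type header value such as 'text/html; charset=UTF-8'."""
    # Drop parameters like charset and compare the bare media type case-insensitively.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in HTML_MEDIA_TYPES


def url_serves_html(url: str) -> bool:
    """Fetch the URL and report whether the server labels the response as HTML."""
    with urlopen(url, timeout=10) as response:
        return is_html_media_type(response.headers.get("Content-Type", ""))
```

Calling `url_serves_html('https://biomagscience.net/')` needs network access, so it is left out of the block; `is_html_media_type` can be checked on its own with header strings.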
