Scraping multiple URLs from a CSV with Beautiful Soup & Python



I need to scrape a list of URLs stored in a CSV file.

I'm new to Beautiful Soup.

Assuming your urls.csv file looks like this:

https://stackoverflow.com;code site;
https://steemit.com;block chain social site;

the following code will work:

#!/usr/bin/python
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup  # required to parse html
import requests  # required to make requests
# read the file
with open('urls.csv', 'r') as f:
    csv_raw_cont = f.read()
# split by line
split_csv = csv_raw_cont.split('\n')
# remove empty lines
split_csv = [line for line in split_csv if line.strip()]
# specify the separator
separator = ";"
# iterate over each line
for each in split_csv:
    # specify the column index
    url_column_index = 0  # in our example csv file the url is in the first column, so we use 0
    # get the url
    url = each.split(separator)[url_column_index]
    # fetch the content from the server
    html = requests.get(url).content
    # parse the fetched content
    soup = BeautifulSoup(html, 'html.parser')
    # show the title from the soup
    print(soup.title.string)

Result:

Stack Overflow - Where Developers Learn, Share, & Build Careers
Steemit
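
As a side note (not part of the original answer), a minimal sketch of the same task using Python's built-in csv module, which handles the ';' separator for you; the timeout value and the 'html.parser' choice are my own assumptions:

#!/usr/bin/python
# -*- coding: utf-8 -*-
# Alternative sketch: let the csv module split each row on ';'.
import csv
import requests
from bs4 import BeautifulSoup

with open('urls.csv', newline='') as f:
    reader = csv.reader(f, delimiter=';')
    for row in reader:
        if not row:              # skip empty lines
            continue
        url = row[0]             # the URL is in the first column
        html = requests.get(url, timeout=10).content   # timeout is an assumed value
        soup = BeautifulSoup(html, 'html.parser')
        print(soup.title.string)

This avoids the manual split('\n') / split(';') steps and copes with blank lines automatically.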

More info: Beautiful Soup and requests
