Python web-scraped data into separate columns (Excel)



Dear Stack Overflow community,

I recently started playing around with Python. I have learned a lot by watching YouTube videos and browsing this platform, but I cannot solve my current problem.

I hope you can help me.

I am trying to scrape information from a website with Python (Anaconda) and write it to a CSV file. I tried to separate the columns by adding "," in the script, but when I open my CSV file, all the data ends up in a single column (A). Instead, I want the data split across separate columns (A and B, and later C, D, E, F, etc. as I add more fields).

What do I have to add to this code:

filename = "brands.csv"
f = open(filename, "w")
headers = "brand, shipping\n"
f.write(headers)
for container in containers:
    brand_container = container.findAll("h2", {"class": "product-name"})
    brand = brand_container[0].a.text
    shipping_container = container.findAll("p", {"class": "availability in-stock"})
    shipping = shipping_container[0].text.strip()
    print("brand: " + brand)
    print("shipping: " + shipping)
    f.write(brand + "," + shipping + "," + "\n")
f.close()

Thanks for your help!

Regards,


Completed script based on Game0ver's suggestion:

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://www.scraped-website.com'

# opening up connection, grabbing the page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

# html parsing
page_soup = soup(page_html, "html.parser")

# grabs each product
containers = page_soup.findAll("li", {"class": "item last"})
container = containers[0]

import csv

filename = "brands.csv"
with open(filename, 'w') as csvfile:
    fieldnames = ['brand', 'shipping']
    # define your delimiter
    writer = csv.DictWriter(csvfile, delimiter=',', fieldnames=fieldnames)
    writer.writeheader()
    for container in containers:
        brand_container = container.findAll("h2", {"class": "product-name"})
        brand = brand_container[0].a.text
        shipping_container = container.findAll("p", {"class": "availability in-stock"})
        shipping = shipping_container[0].text.strip()
        print("brand: " + brand)
        print("shipping: " + shipping)

As I mentioned, this code does not work. What am I doing wrong?

You would be better off using Python's csv module for this:

import csv

filename = "brands.csv"
with open(filename, 'w') as csvfile:
    fieldnames = ['brand', 'shipping']
    # define your delimiter
    writer = csv.DictWriter(csvfile, delimiter=',', fieldnames=fieldnames)
    writer.writeheader()
    # write rows...
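To illustrate what the row-writing part might look like, here is a minimal self-contained sketch; the brand/shipping values below are made-up placeholders, not scraped data, and an in-memory buffer stands in for the file:

```python
import csv
import io

fieldnames = ['brand', 'shipping']

# In-memory buffer standing in for the CSV file; with a real file,
# open it with newline='' to avoid blank lines in Excel on Windows.
csvfile = io.StringIO()
writer = csv.DictWriter(csvfile, delimiter=',', fieldnames=fieldnames)
writer.writeheader()

# Placeholder rows; in the real script each dict would come from
# one scraped container.
for brand, shipping in [("SomeBrand", "In stock"), ("OtherBrand", "2-3 days")]:
    writer.writerow({'brand': brand, 'shipping': shipping})

print(csvfile.getvalue())
```

Each `writerow` call takes a dict keyed by the declared fieldnames and emits one properly delimited line.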

Try wrapping your values in double quotes, like:

f.write('"' + brand + '","' + shipping + '"\n')
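Note that if the csv module does the writing, this manual quoting is unnecessary: it quotes fields automatically whenever they contain the delimiter. A small self-contained demonstration:

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["brand", "shipping"])
# The comma inside the first field is quoted automatically,
# so it does not spill into an extra column.
writer.writerow(["Acme, Inc.", "in stock"])
print(buf.getvalue())
```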

That said, there are better ways to handle this common task than this approach.

You can go either of the ways shown below. As I had no access to the URL used in your script, I have substituted a working one.

import csv
import requests
from bs4 import BeautifulSoup

url = "https://yts.am/browse-movies"
response = requests.get(url)
soup = BeautifulSoup(response.content, 'lxml')
with open("movieinfo.csv", 'w', newline="") as f:
    writer = csv.DictWriter(f, ['name', 'year'])
    writer.writeheader()
    for row in soup.select(".browse-movie-bottom"):
        d = {}
        d['name'] = row.select_one(".browse-movie-title").text
        d['year'] = row.select_one(".browse-movie-year").text
        writer.writerow(d)

Or you can try like this:

soup = BeautifulSoup(response.content, 'lxml')
with open("movieinfo.csv", 'w', newline="") as f:
    writer = csv.writer(f)
    writer.writerow(['name', 'year'])
    for row in soup.select(".browse-movie-bottom"):
        name = row.select_one(".browse-movie-title").text
        year = row.select_one(".browse-movie-year").text
        writer.writerow([name, year])
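One more thing worth checking: if a correctly written CSV still opens as a single column, Excel's locale may be using ";" rather than "," as its list separator. This is an assumption about the asker's Excel settings, not something confirmed in the question, but in that case writing with a semicolon delimiter makes Excel split the columns:

```python
import csv
import io

buf = io.StringIO()
# Semicolon delimiter for Excel locales whose list separator is ';'
writer = csv.writer(buf, delimiter=';')
writer.writerow(['name', 'year'])
writer.writerow(['Some Movie', '2018'])
print(buf.getvalue())
```

Alternatively, Excel's "Data > From Text/CSV" import dialog lets you choose the delimiter explicitly regardless of locale.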
