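I am trying to scrape multiple tables from a single web page, but I cannot save all of them to a .csv file: only the last table ends up in the file. My code is below. Please advise.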
import time
from selenium import webdriver
import pandas as pd
base_url = 'https://uk.insight.com/en_GB/shop/product/2W1F2EA%23ABU/HEWLETT-PACKARD-(HP-INC)/2W1F2EA%23ABU/HP-ProBook-440-G8--14"--Core-i7-1165G7--16-GB-RAM--1-TB-SSD--UK/'
print('Opening Chrome Browser Automatically in 5 secs')
time.sleep(5)
options = webdriver.ChromeOptions()
options.add_experimental_option("detach", True)
driver = webdriver.Chrome(options=options)
driver.get(base_url)
df = pd.read_html(driver.page_source)
df2 = df[4:]
for table in df2:
    df = pd.DataFrame(table)
    df.to_csv('table.csv',index=False)
I don't know how to save all of the DataFrames into a single .csv. As described above, only the last df is saved.
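According to the pandas to_csv() documentation, you can use the mode parameter to append data instead of overwriting it. The default is "w", which truncates the file on every call, so each pass through your loop replaces the previous table.
If you want to append data, switch the mode to "a":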
df.to_csv('table.csv', mode='a', index=False)
One thing to note: unless you set header=False, the column names are written again on every append.
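Applied to a loop over several tables, like the one in the question, this might look like the sketch below. The `tables` list and the `tables_out.csv` file name are hypothetical stand-ins for the DataFrames returned by `pd.read_html`, and the sketch assumes all tables share the same columns. The first table is written with mode="w" and a header; the rest are appended without one.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-ins for the tables returned by pd.read_html(...)
tables = [
    pd.DataFrame({"spec": ["CPU", "RAM"], "value": ["Core i7", "16 GB"]}),
    pd.DataFrame({"spec": ["Storage", "Screen"], "value": ["1 TB SSD", "14 in"]}),
]

# Assumed output location; a temporary directory keeps the example self-contained
out_path = os.path.join(tempfile.mkdtemp(), "tables_out.csv")

for i, table in enumerate(tables):
    first = i == 0
    table.to_csv(
        out_path,
        mode="w" if first else "a",  # overwrite once, then append
        header=first,                # write column names only for the first table
        index=False,
    )

combined = pd.read_csv(out_path)
print(combined)
```

Writing the first table with mode="w" also means that rerunning the script starts from a clean file instead of appending to the results of a previous run.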
Here is a quick reproducible example.
import uuid
import pandas as pd

# Two small DataFrames with random 7-character ids
dataframe = pd.DataFrame({
    "person_id": [str(uuid.uuid4())[:7] for _ in range(6)],
    "hours_worked": [38.5, 41.25, "35.0", 27.75, 22.25, -20.5],
    "wage_per_hour": [15.1, 15, 21.30, 17.5, 19.50, 25.50],
})
dataframe2 = pd.DataFrame({
    "person_id2": [str(uuid.uuid4())[:7] for _ in range(6)],
    "hours_worked2": [38.5, 41.25, "35.0", 27.75, 22.25, -20.5],
    "wage_per_hour2": [15.1, 15, 21.30, 17.5, 19.50, 25.50],
})

# First write creates (or overwrites) the file and includes the header
dataframe.to_csv('TEST.csv', mode='w', index=False)
# Second write appends rows only; header=False keeps the column names out of the data
dataframe2.to_csv('TEST.csv', mode='a', index=False, header=False)
print(pd.read_csv('TEST.csv'))
Output
    person_id  hours_worked  wage_per_hour
0     1aa66bc         38.50           15.1
1     b7abe05         41.25           15.0
2     15e1779         35.00           21.3
3     3c117d7         27.75           17.5
4     2e6494e         22.25           19.5
5     2a25e45        -20.50           25.5
6     b17d084         38.50           15.1
7     6ca361e         41.25           15.0
8     2cd18e4         35.00           21.3
9     9d120ff         27.75           17.5
10    a0b20d9         22.25           19.5
11    bf9a98d        -20.50           25.5
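As an alternative to appending, and not part of the approach above, the tables could be combined in memory with `pd.concat` and written once. The sketch below uses a hypothetical `tables` list in place of the scraped DataFrames and writes to an in-memory buffer so that it runs without touching the disk; passing a file name to `to_csv` instead would produce the file.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the scraped tables
tables = [
    pd.DataFrame({"spec": ["CPU", "RAM"], "value": ["Core i7", "16 GB"]}),
    pd.DataFrame({"spec": ["Storage"], "value": ["1 TB SSD"]}),
]

# Stack the tables vertically and renumber the rows from 0
combined = pd.concat(tables, ignore_index=True)

# Single write: one header row followed by every data row
buffer = io.StringIO()
combined.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Unlike appending with header=False, `pd.concat` aligns columns by name, so tables with different column names end up in separate columns padded with NaN rather than being stacked under the first table's headers.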