Putting a set of scraped URLs into an empty Pandas DataFrame in Python



I am scraping a series of URLs with the following code:

import time
import pandas as pd
from selenium import webdriver

df1 = pd.DataFrame()
url = 'https://www.welcometothejungle.com/fr/jobs?page=1&refinementList%5Bprofession_name.fr.Tech%5D%5B%5D=Data%20Science'
path = '/Users/jdkj/desktop/chromedriver 3'
options = webdriver.ChromeOptions()
driver = webdriver.Chrome((path), chrome_options=options)
html = driver.get(url)
time.sleep(3)
elems = driver.find_elements_by_xpath("//article/div[2]/header/a['href']")
for elem in elems:
    urls = elem.get_attribute("href")
    print(urls)

This prints the results I expect. The problem is that when I try to put these URLs into my empty DataFrame "df1" using the following code:

df_test = df1.append({'URLS' : urls}, ignore_index = True)
df_test.head()

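it does not show the URLs I want (it does not raise an error, but the result makes no sense).

I am just starting out with Python, so I suspect there is a simple answer to my problem. I hope I have been clear.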

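The problem with your code is that you overwrite the urls variable on every iteration of the loop, so only the last scraped URL is appended to the DataFrame. Move the df1.append statement inside the for block: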

import time
import pandas as pd
from selenium import webdriver

df1 = pd.DataFrame()
url = 'https://www.welcometothejungle.com/fr/jobs?page=1&refinementList%5Bprofession_name.fr.Tech%5D%5B%5D=Data%20Science'
path = '/Users/jdkj/desktop/chromedriver 3'
options = webdriver.ChromeOptions()
driver = webdriver.Chrome((path), chrome_options=options)
html = driver.get(url)
time.sleep(3)
elems = driver.find_elements_by_xpath("//article/div[2]/header/a['href']")
for elem in elems:
    url = elem.get_attribute("href")  # <--- get the url from the <a> tag
    df1 = df1.append({'URLS': url}, ignore_index=True)  # <--- add the url to the dataframe in the URLS column
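
Note that DataFrame.append was removed in pandas 2.0, so on a recent pandas the loop above will likely raise an AttributeError. A possible alternative is to collect the hrefs in a plain Python list and build the DataFrame once at the end. The sketch below is only an illustration: the helper name hrefs_to_frame and the example.com URLs are made up here, and the sample list stands in for [elem.get_attribute("href") for elem in elems] so that it runs without a browser.

```python
import pandas as pd


def hrefs_to_frame(hrefs):
    """Build a one-column DataFrame from an iterable of URL strings.

    Hypothetical helper, not part of the original answer.
    """
    # get_attribute("href") returns None when the attribute is missing,
    # so drop empty values before building the frame.
    urls = [h for h in hrefs if h]
    return pd.DataFrame({"URLS": urls})


# Placeholder values standing in for the scraped hrefs.
sample_hrefs = [
    "https://example.com/jobs/1",
    None,
    "https://example.com/jobs/2",
]

df1 = hrefs_to_frame(sample_hrefs)
print(df1)
```

With the real scraper you would pass the list built from elems instead of sample_hrefs. Building the frame once also avoids copying the whole DataFrame on every iteration.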
