How to split names into 3 Excel cells using Python and BeautifulSoup



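I'm trying to extract names from a website and import them into an Excel sheet for later use. The problem is that I need them in 3 separate cells: first, last, initial. In this case the script looks for the keyword "EST OF" and prints the whole line, which contains the full name followed by "EST OF". I need it to: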

  1. Remove the "EST OF" from the end
  2. Split the full name into 3 parts so they can be exported to the sheet

Here is the code:

#!python
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import NoSuchElementException
from random import randint
import pickle
import datetime
import os
import time
import sys
import openpyxl
from openpyxl import Workbook
import re
url = 'https://www.miamidade.gov/global/home.page'
current_time = datetime.datetime.now()
current_time.strftime("%m/%d/%Y")
options = webdriver.ChromeOptions()
options.headless = True
chromedriver = "chromedriver.exe"
number = "2080"
driver = webdriver.Chrome(chromedriver) #chromedriver
driver.get(url)
pickle.dump(driver.get_cookies() , open("cookies.pkl","wb"))
time.sleep(3)
nav1 = driver.find_element_by_xpath('/html/body/div[2]/div/div[1]/div/header/div[2]/nav/div/div[1]/div/div[1]/a').click()
time.sleep(1)
nav2 = driver.find_element_by_xpath('/html/body/div[2]/div/div[1]/div/header/div[2]/div[2]/div/div/div/ul/li[1]/button').click()
propsrch1 = driver.find_element_by_xpath('/html/body/div[2]/div/div[1]/div/header/div[2]/div[2]/div/div/div/ul/li[1]/ul/li[2]/ul/li[5]/a').click()
time.sleep(2)
propsrch2 = driver.find_element_by_xpath('/html/body/div[2]/div/main/div[2]/div/div[2]/div/div[1]/div[1]/ul/li[1]/span/a').click()
time.sleep(5)

subdivision = driver.find_element_by_xpath('/html/body/div/div[2]/div[3]/div[1]/ul/li[3]/a').click()
searchbar = driver.find_element_by_xpath('/html/body/div/div[2]/div[3]/div[1]/div[2]/div[2]/div/div[3]/div/input')
time.sleep(2)
searchbar.send_keys("RICHMOND HGTS")
search = driver.find_element_by_xpath('/html/body/div/div[2]/div[3]/div[1]/div[2]/div[2]/div/div[3]/div/span/button/span').click()
time.sleep(10)
table = driver.find_element_by_xpath('/html/body/div/div[2]/div[3]/div[1]/div[2]/div[4]/a').click()
main_window_handle = None
while not main_window_handle:
    main_window_handle = driver.current_window_handle
#driver.find_element_by_xpath(u'//a[text()="click here"]').click()
signin_window_handle = None
while not signin_window_handle:
    for handle in driver.window_handles:
        if handle != main_window_handle:
            signin_window_handle = handle
            break
driver.switch_to.window(signin_window_handle)
time.sleep(20)
page_source = driver.page_source
soup = BeautifulSoup(page_source, 'html.parser')
keyword = 'est of'
#keywords = soup.find(keyword)
counts = soup.find_all(text=re.compile("EST OF"))
for count in counts:
    print(count)

Right now it prints to cmd, just so I can see that it's working. The output looks like this:

GRACE K ROLLE EST OF    
ETHEL H FIFE EST OF 
BARBARA J BROUSSARD EST OF  
CLEMENTINA D RAHMING EST OF 
CHARLES B  CAMBRIDGE JR EST OF  
EMILY STATEN EST OF 
HATTIE S KING  EST OF   

What would be the best way to split the names?

You can use the split method to split on spaces:

for count in counts:
    # split() with no argument also collapses repeated spaces
    parts = count.split()
    first_name = parts[0]
    mid_name = parts[1]
    last_name = parts[2]

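If you know the name always consists of 3 space-separated words, you can use count.split()[:3]. Calling split() with no argument is safer than split(' ') here, because some lines contain double spaces (for example "HATTIE S KING  EST OF"), and split(' ') would produce empty strings for them.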
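To make the difference concrete, here is a small sketch using one of the lines from the question's output. The variable names are only for illustration.

```python
# Sample line copied from the question's output (note the double space before "EST").
line = "HATTIE S KING  EST OF   "

# Splitting on a single space keeps an empty string for every extra space.
naive = line.split(' ')

# Splitting with no argument collapses runs of whitespace and trims the ends.
clean = line.split()

# With a clean list, the first three items can be unpacked directly.
first_name, initial, last_name = clean[:3]

print(naive)
print(clean)
print(first_name, initial, last_name)
```

This only works when the name really has three words; a line such as "EMILY STATEN EST OF" would put "EST" into the third slot.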

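If you don't know how many words the name has, remove the "EST OF" suffix first and then split what is left. Be careful with count.rstrip('EST OF'): rstrip removes any trailing characters from the given set rather than the literal suffix, so "ETHEL H FIFE EST OF" would become "ETHEL H FI". Since re is already imported, re.sub(r'\s*EST OF\s*$', '', count).split() is a safer choice.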
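Putting both steps together, a minimal sketch might look like the following. The function name split_estate_name and the SUFFIXES set are hypothetical, and the rule "first word is the first name, last word is the last name, anything in between is the initial" is an assumption based on the sample output, not something the page guarantees.

```python
import re

# Assumption: generational suffixes that should stay attached to the last name.
SUFFIXES = {"JR", "SR", "II", "III", "IV"}

def split_estate_name(line):
    """Return (first, last, initial) from a line such as 'GRACE K ROLLE EST OF'."""
    # Remove the literal trailing "EST OF" together with surrounding whitespace.
    name = re.sub(r"\s*\bEST OF\s*$", "", line.strip())
    parts = name.split()
    if not parts:
        return ("", "", "")
    first = parts[0]
    rest = parts[1:]
    # Pull a trailing suffix such as JR off so it is not mistaken for the last name.
    suffix = rest.pop() if len(rest) > 1 and rest[-1] in SUFFIXES else ""
    last = rest.pop() if rest else ""
    if suffix:
        last = f"{last} {suffix}"
    # Whatever remains between first and last is treated as the middle initial.
    initial = " ".join(rest)
    return (first, last, initial)

# Lines copied from the question's output, including the irregular spacing.
lines = [
    "GRACE K ROLLE EST OF    ",
    "CHARLES B  CAMBRIDGE JR EST OF  ",
    "EMILY STATEN EST OF ",
]
rows = [split_estate_name(line) for line in lines]
for row in rows:
    print(row)
```

Each tuple in rows can then be written as one row of the sheet, for example with openpyxl's ws.append(row), which places each element in its own cell.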
