Beautiful Soup automatically converts a string to a time format



I'm trying to scrape a div containing "time" information from a website (using BeautifulSoup + Selenium):

from bs4 import BeautifulSoup as bs
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument('--no-sandbox')
options.add_argument('--window-size=1420,1080')
options.add_argument('--headless')
options.add_argument('--disable-dev-shm-usage')
options.add_argument('--disable-gpu')
options.add_argument("--disable-notifications")
options.add_experimental_option('useAutomationExtension', False)
options.binary_location = '/usr/bin/google-chrome-stable'
chrome_driver_binary = "/usr/bin/chromedriver"
driver = webdriver.Chrome(chrome_driver_binary,
                          chrome_options=options)
# Set base url (San Francisco)
base_url = 'https://www.bandsintown.com/?place_id=ChIJIQBpAG2ahYAR_6128GcTUEo&page='

events = []
eventContainerBucket = []
for i in range(1,35):
    # cycle through pages in range
    driver.get(base_url + str(i))
    pageURL = base_url + str(i)
    print(pageURL)
    # get events links
    event_list = driver.find_elements_by_css_selector('div[class^=_3buUBPWBhUz9KBQqgXm-gf] a[class^=_3UX9sLQPbNUbfbaigy35li]')
    # collect href attribute of events in event_list
    events.extend(list(event.get_attribute("href") for event in event_list))

# iterate through all events and open them.
item = {}
allEvents = []
for event in events:
    # open the event page so the selectors below read that event
    driver.get(event)
    soup = bs(driver.find_element_by_css_selector('[class^=Y_sOCKLIZzxDZWauPTJlk]').get_attribute('outerHTML'))
    soup2 = bs(driver.find_element_by_css_selector('[class^=_2j34xcqD4slSOyTCMbA1dY]').get_attribute('outerHTML'))

    # Get time
    time = soup.select_one('img + div + div').text
    print(time)

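It keeps converting the time to UTC even though I don't want it to. I just want to extract the original text each time, e.g. 9:00 PM. I've tried parsing the raw string immediately so that it only picks up the string: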

time = soup.select_one('img + div + div').text
' '.join(time.split(' ')[0:2])
#time.replace('UTC','')
print(time)

But it still prints in UTC, e.g. 2:00 AM UTC.

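Is there a way to extract just the raw string before it is automatically converted to a time? I don't want to deal with time zones, and I don't think I need to for this project. I only need the raw string.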

I'm not sure why you are using Beautiful Soup's select here. Could you get the element's text with Selenium instead?

for event in events:
    # using locator from your example below, although it did not work for me
    element = driver.find_element_by_css_selector('[class^=Y_sOCKLIZzxDZWauPTJlk]')
    # Get time
    time = element.text
    print(time)

Output:

6:00 PM PDT
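
If only the clock time is wanted without the zone label, the text can be trimmed after it is scraped. Note that in the question's second snippet the result of ' '.join(...) is never assigned, so the time variable is printed unchanged. Below is a minimal sketch, assuming the scraped text looks like "6:00 PM PDT"; extract_clock_time and TIME_PATTERN are hypothetical names for illustration, not part of Selenium or Beautiful Soup. It only removes the label. If the site itself renders times in the browser's zone, trimming will not bring back the local time.

```python
import re

# Hypothetical helper, not part of Selenium or Beautiful Soup.
# Assumes the scraped text contains a 12-hour time such as "6:00 PM PDT".
TIME_PATTERN = re.compile(r"\b(\d{1,2}:\d{2})\s*([AP]M)\b", re.IGNORECASE)


def extract_clock_time(text):
    """Return the 'H:MM AM/PM' part of text, or None if no time is found."""
    match = TIME_PATTERN.search(text)
    if match is None:
        return None
    return f"{match.group(1)} {match.group(2).upper()}"


raw = "6:00 PM PDT"

# String methods return new strings, so the result has to be assigned.
# Calling ' '.join(...) on a line by itself leaves the original variable as it was.
trimmed = " ".join(raw.split(" ")[0:2])

print(trimmed)
print(extract_clock_time("2:00 AM UTC"))
```

The split-and-join version depends on the time always being the first two space-separated tokens, while the regular expression also copes with text before the time or a missing space before AM/PM.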

I'm not sure this is what you wanted, but I hope it helps. Good luck!
