我正在使用Nokogiri从网页中抓取数据,我的印象是以下将抓取数据并返回为数组?相反,我得到一个大字符串,这引起了一些问题。
home_team = doc.css(".team-home.teams")
如果我要使用
home_team = doc.css(".team-home.teams").text
我可以理解作为字符串返回的数据。我看错了吗?
我甚至试过
home_team = doc.css(".team-home.teams").map(&:text)
,但这似乎是返回一个字符串?如果我得到一个数组在控制台返回它将在数组格式,是吗?
如果有人可以在他们的控制台尝试这个
require 'open-uri'
require 'nokogiri'
FIXTURE_URL = "http://www.bbc.co.uk/sport/football/premier-league/fixtures"
doc = Nokogiri::HTML(open(FIXTURE_URL))
home_team = doc.css(".team-home.teams").map(&:text)
#home_team = doc.css(".team-home.teams")
puts home_team
,确认两种情况下的输出都是字符串,以及两者之间的区别是什么。
谢谢
您正在获得一个数组。只是puts
做了一个to_s
。看看这个:
require 'open-uri'
require 'nokogiri'
FIXTURE_URL = "http://www.bbc.co.uk/sport/football/premier-league/fixtures"
doc = Nokogiri::HTML(open(FIXTURE_URL))
home_team = doc.css(".team-home.teams").map(&:text)
# home_team = doc.css(".team-home.teams")
puts home_team.class
puts home_team.map(&:strip).inspect
#=> Array
#=> ["Everton", "Aston Villa", "Southampton", "Stoke", "Swansea", "Man Utd", "Sunderland", "Tottenham", "Chelsea", "Wigan", "Sunderland", "Arsenal", "Man City", "Swansea", "West Ham", "Wigan", "Everton", "Aston Villa", "Southampton", "Fulham", "Reading", "Chelsea", "Newcastle", "Norwich", "Stoke", "West Brom", "Liverpool", "Tottenham", "QPR", "Man Utd", "Newcastle", "Arsenal", "Aston Villa", "Everton", "Reading", "Southampton", "Stoke", "Chelsea", "Arsenal", "Fulham", "Norwich", "QPR", "Sunderland", "Swansea", "West Brom", "West Ham", "Tottenham", "Liverpool", "Man Utd", "Man City", "Aston Villa", "Chelsea", "Everton", "Southampton", "Stoke", "Wigan", "Newcastle", "Reading", "Arsenal", "Fulham", "Liverpool", "Man Utd", "Norwich", "QPR", "Sunderland", "Swansea", "Tottenham", "West Brom", "West Ham", "Arsenal", "Aston Villa", "Everton", "Fulham", "Man Utd", "Norwich", "QPR", "Reading", "Stoke", "Sunderland", "Chelsea", "Liverpool", "Man City", "Newcastle", "Southampton", "Swansea", "Tottenham", "West Brom", "West Ham", "Wigan"]
数据中有很多空白。当我这样做时,我得到一个数组:
home_team = doc.css(".team-home.teams").map {|team| team.text.strip}