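While building a CLI Google Card viewer, I ran into the problem of rendering HTML on the command line, the way text browsers such as w3m or lynx do. The closest I've got is using the text that Nokogiri spits out: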
Nokogiri::HTML::parse(card_snippet).text
but it prints the following:
"Albert EinsteinTheoretical PhysicistAlbert Einstein was a German-born theoretical physicist. He developed the general theory of relativity, one of the two pillars of modern physics. Einstein's work is also known for its influence on the philosophy of science. WikipediaBorn: March 14, 1879, Ulm, GermanyDied: April 18, 1955, Princeton, New Jersey, United StatesInfluenced: Satyendra Nath Bose, Wolfgang Pauli, Leo Szilard, moreInfluenced by: Isaac Newton, Mahatma Gandhi, moreBooksThe World as I See It1949Relativity: The Special a...1916Ideas and Opinions2000Out of My Later Years2006The Meaning of Relativity1922The Evolution of Physics1938People also search forIsaac NewtonEduard EinsteinSonStephen HawkingElsa EinsteinSpouseMileva MarićFormer spouseThomas Edison"
But using lynx:
cat card_snippet.html | lynx -dump -stdin
Albert Einstein
Theoretical Physicist
Albert Einstein was a German-born theoretical physicist. He
developed the general theory of relativity, one of the two pillars
of modern physics. Einstein's work is also known for its influence
on the philosophy of science. Wikipedia
Born: March 14, 1879, Ulm, Germany
Died: April 18, 1955, Princeton, New Jersey, United States
Influenced: Satyendra Nath Bose, Wolfgang Pauli, Leo
Szilard,
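Note: this is after stripping out some noise. The line breaks, however, are where they should be.
Any ideas for a similar solution in Ruby? HTML snippet: Pastebin link.
This works for me: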
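require 'nokogiri'

# Fetch the raw HTML snippet
html = `curl http://pastebin.com/raw/pYKwACBp`
doc = Nokogiri::HTML(html)
# Collapse runs of line breaks into a single newline, then trim the ends
puts doc.text.gsub(/[\r\n]+/, "\n").strip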
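To see what the gsub step does on its own, here is a small sketch that needs only plain Ruby. The sample string is made up and merely stands in for doc.text, and squeeze_newlines is a hypothetical helper name, not part of Nokogiri. Note the approach assumes the text already contains line breaks between elements (they come from whitespace in the HTML source); it cannot add breaks that are not there.

```ruby
# Made-up sample standing in for doc.text (an assumption for illustration).
raw = "\r\n\r\nAlbert Einstein\r\n\r\n\r\nTheoretical Physicist\n\n\nBorn: March 14, 1879\n\n"

# Hypothetical helper: collapse each run of CR/LF characters into a
# single newline, then trim leading and trailing whitespace.
def squeeze_newlines(text)
  text.gsub(/[\r\n]+/, "\n").strip
end

cleaned = squeeze_newlines(raw)
puts cleaned
```

Each run of blank lines ends up as exactly one line break, so every piece of text lands on its own line.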