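I've been trying to count the lines of a poem with a simple regex, but it has never been very accurate. It doesn't have to be perfect.
The source contains a mix of <p> and <br> tags. I don't want to count blank lines.
The old way: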
self.words = rendered.gsub(/<p> <\/p>/,'').gsub(/<p><br\s?\/?>|(?:<br\s?\/?>){2,}/,'<br>').scan(/<br>|<br \/>|<p/).size+1
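The new way (not working): try to convert every run of one or more <br> tags into paragraphs, then load the result into Nokogiri and count the paragraph tags that contain more than 3 characters. (I don't know how to do that part. Counting 1-letter lines would be fine too, but this works well in JavaScript.)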
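require 'nokogiri'

h = rendered.dup                               # work on a copy of the markup
has_double_br = h =~ /<br>\s*<br>/i            # look for stanza breaks before rewriting them
h.gsub!(/<br>\s*<br>/i, "<p>")                 # a double <br> becomes a paragraph break
h.gsub!(/<br>/i, "<p>") if has_double_br       # remaining single <br> tags become paragraphs too
h.prepend("<p>") unless h =~ /\A\s*<p[^>]*>/i  # make sure the text starts with a <p>
h.gsub!(/<p>\s*<p>/, "<p> </p><p>")            # close paragraphs that were left empty
doc = Nokogiri::HTML(h)
# find+count p tags with at least 1-3 chars?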
# this is javascript not ruby, but you get the idea
$('p', c).each(function(i) {
  if ($(this).children('img').length) return; // skip if it's just an image.
  // had to trim it to remove whitespaces from start/end.
  if ($.trim($(this).text()).length > 3)
    $(this).append("<div class='num'>" + (n += 1) + "</div>");
});
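For comparison, the same filtering idea can be sketched in plain Ruby without a parser. This is only a rough sketch under assumptions: `count_poem_lines` and `min_chars` are hypothetical names, and it assumes <p>, </p>, and <br> are the only line separators in the markup. Any other tag (an <img>, for instance) is simply dropped, so an image-only paragraph ends up empty and is not counted.

```ruby
# Hypothetical helper: counts the non-trivial lines in poem HTML.
# Assumption: <p>, </p>, and <br> are the only line separators in the markup.
def count_poem_lines(html, min_chars: 1)
  html
    .gsub(/&nbsp;/i, ' ')                              # treat non-breaking spaces as blanks
    .split(/<\/?p(?:\s[^>]*)?>|<br\s*\/?>/i)           # break on paragraph and br tags
    .map { |chunk| chunk.gsub(/<[^>]+>/, '').strip }   # drop any other tags, trim whitespace
    .count { |line| line.length >= min_chars }         # keep lines that are long enough
end
```

Calling `count_poem_lines(rendered, min_chars: 4)` would mirror the `length > 3` check in the JavaScript, while the default of 1 also counts one-letter lines.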
Other approaches are welcome!
Sample poem (http://allpoetry.com/poem/7429983-the_many_endings-by-Kevin)
<p>
from the other side of silence<br>
you met me with change and a pocket<br>
of unhappy apples.</p>
<p>
</p>
<p>
<br>
we bled together to black<br>
and chose the path carefully to<br>
france.<br><br>
sometimes when you smile<br>
your radiant footsteps fall<br>
and all around us is silence:<br>
each dream step is<br>
false but full of such glory</p>
<p>
</p>
<p>
<br>
unhappiness never made a student of you:<br>
just two by two by two. now three<br>
this great we that overflows our<br>
heart-cave<br><br>
each jewel-like addition to the delicate<br>
crown. but flowers fall and dreams,<br>
all dreams, come to and end with death.</p>
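Thanks!
For posterity, here is what I'm using now, which seems to be quite accurate. Non-Latin characters sometimes cause problems in ckeditor, so I strip them out for now.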
html = Nokogiri::HTML(rendered)
text = html.at('body').inner_text rescue nil
# fall back to the old regex count when no body text could be extracted
return self.words = rendered.gsub(/<p> <\/p>/,'').gsub(/<p><br\s?\/?>|(?:<br\s?\/?>){2,}/,'<br>').scan(/<br>|<br \/>|<p/).size+1 if !text
#bonus points to strip lines entirely non-letter. idk
#d "text is", text.gsub!(/([\x09|\x0D|\t])|(\xc2\xa0){1,}|[^A-z]/u,'')
text.gsub!(/[^A-Za-z\n]/u,'')  # keep only Latin letters and newlines
#d "text is", text
self.words = text.strip.scan(/(\s*\n\s*)+/).size+1  # each run of newlines separates two lines
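
The "bonus points" idea in the comment above, skipping lines that contain no letters at all, could look roughly like this. It is a sketch under assumptions, not what I actually run: `count_letter_lines` is a hypothetical name, it takes the plain text already extracted with `inner_text`, and it uses `\p{L}` so that non-Latin letters count as letters too, which differs from the Latin-only stripping above.

```ruby
# Hypothetical helper: counts the lines that contain at least one letter.
# Input is plain text (for example, the result of inner_text), not HTML.
def count_letter_lines(text)
  # \p{L} matches any Unicode letter, so blank, numeric, and
  # punctuation-only lines are left out of the count.
  text.each_line.count { |line| line.match?(/\p{L}/) }
end
```

With this, `self.words = count_letter_lines(text)` would replace the strip-and-scan step, and no `+1` is needed because lines are counted directly instead of the separators between them.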