How to remove HTML tags with a specific class



I have HTML like this:

<div id="printready">
  <div class="box-single"></div>
  <div class="marker"></div>
    <h2>sometext</h2>
    <div id="news-single-img"></div>
    <p>...</p>
    <p>...</p>
    <p>...</p>
    <p>...</p>
    <p>...</p>
    <span class="cl"></span>
    ... (remove everything since the last paragraph)
</div>

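What is the best way to remove these tags: .box-single, .marker, h2, and #news-single-img? After that, I want to keep all the paragraphs and remove everything that follows the last one.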

I tried Nokogiri but could not find a good solution. The framework I am using is Ruby on Rails.

Remove the tags:

doc.search('.box-single', '.marker', 'h2', '#news-single-img').remove

Remove the nodes after the last p:

while (node = doc.at('p:last').next)
  node.remove
end

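What you want to do is somewhat ambiguous, so here is a first pass that clears the class and id attributes, removes the h2, and drops everything after span.cl: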

require 'nokogiri'
doc = Nokogiri::HTML(<<EOT)
<div id="printready">
  <div class="box-single"></div>
  <div class="marker"></div>
    <h2>sometext</h2>
    <div id="news-single-img"></div>
    <p>...</p>
    <p>...</p>
    <p>...</p>
    <p>...</p>
    <p>...</p>
    <span class="cl"></span>
    ... (remove everything since the last paragraph)
</div>
EOT
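# Blank out the class attribute on the matching elements.
%w[.box-single .marker].each do |klass|
  doc.search(klass).each do |tag|
    tag['class'] = nil
  end
end
# Remove the heading entirely.
doc.at('h2').remove
# Blank out the id attribute.
%w[#news-single-img].each do |tag_id|
  doc.at(tag_id)['id'] = nil
end
# Remove every sibling that follows span.cl.
loop do
  next_tag = doc.at('span.cl').next_sibling
  break unless next_tag
  next_tag.remove
end
puts doc.to_html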

Running that gives me:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><div id="printready">
  <div class=""></div>
  <div class=""></div>
    <div id=""></div>
    <p>...</p>
    <p>...</p>
    <p>...</p>
    <p>...</p>
    <p>...</p>
    <span class="cl"></span>
</div></body></html>

If you want to remove the class and id attributes entirely:

require 'nokogiri'
doc = Nokogiri::HTML(<<EOT)
<div id="printready">
  <div class="box-single"></div>
  <div class="marker"></div>
    <h2>sometext</h2>
    <div id="news-single-img"></div>
    <p>...</p>
    <p>...</p>
    <p>...</p>
    <p>...</p>
    <p>...</p>
    <span class="cl"></span>
    ... (remove everything since the last paragraph)
</div>
EOT
%w[.box-single .marker].each do |klass|
  doc.search(klass).remove_attr('class')
end
doc.at('h2').remove
%w[#news-single-img].each do |tag_id|
  doc.search(tag_id).remove_attr('id')
end
loop do
  next_tag = doc.at('span.cl').next_sibling
  break unless next_tag
  next_tag.remove
end
puts doc.to_html

After running it, the attributes are gone:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><div id="printready">
  <div></div>
  <div></div>
    <div></div>
    <p>...</p>
    <p>...</p>
    <p>...</p>
    <p>...</p>
    <p>...</p>
    <span class="cl"></span>
</div></body></html>

With JavaScript (jQuery), it can be done like this:

<script type="text/javascript">
    $(function () {
        $("button").click(function () {
            $(".box-single").remove();
        });
    });
</script>
