Get the first valid image in an HTML string using Jsoup



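This is my code, but for some HTML strings it fails to get a valid image.

If I log each image variable inside the loop, I can see that a valid image exists, but my code either uses a different image, which is sometimes invalid, or the function returns an empty image.

What's wrong? Thanks.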

    private String getFirstImage(String htmlString) {
        if (htmlString == null) return null;
        String img = "";
        Document doc = Jsoup.parse(htmlString);
        Elements imgs = doc.getElementsByTag("img");

        for (Element imageElement : imgs) {
            if (imageElement != null) {
                // for each element, get the src URL
                img = imageElement.absUrl("src");
                if (!img.contains("doubleclick.net") &&
                    !img.contains("feedburner.com") &&
                    !img.contains("feedsportal.com") &&
                    !img.contains("ads"))
                    return img;
            }
        }
        return null;
    }

/***** EDIT --> example

htmlString <p><a rel="attachment wp-att-182120" href="http://apple.hdblog.it/2013/09/08/arrivano-le-prime-immagini-rubate-della-scheda-logica-di-iphone-5c/logic5c/"><img height="390" alt="logic5c" width="520" src="http://apple.hdblog.it/wp-content/uploads/2013/09/logic5c-520x390.jpg"></a> <p>Nella rubrica “componenti leaked quotidiani” fa arrivo la scheda logica di iPhone 5C, grazie a diversi scatti fotografici diffusi tramite il social network cinese Weibo. Ormai sembra assodato che per Apple sia impossibile limitare le fughe di notizie quando sono previsti lanci di prodotti a breve distanza dalla loro presentazione. Le fabbriche cinesi alle quali Apple si affida sono armai un “colabrodo” di foto e notizie, che fanno perdere un po’ la <em>magia</em> dei keynote Apple. <p>Purtroppo o per fortuna le fotografie in questione non rivelano molto riguardo l’hardware interno dell’iPhone 5C e lasciano spazio alle speculazioni. La disposizione dei fori delle viti e dei connettori nella parte superiore è uguale a <a href="http://apple.hdblog.it/2013/06/21/nuove-foto-mostrano-il-prossimo-iphone-5s-rumor/">quella già vista sulla scheda madre di iPhone 5S.</a> <p>Ma questo non è abbastanza per far pensare ad un hardware uguale tra iPhone 5S e 5C. Mentre il primo nasconderà sotto la sua scocca le novità ingegneristiche di Apple -leggi chip A7- l’iPhone 5C dovrebbe essere <strong>niente di più che un iPhone 5 con scocca in plastica</strong>. Ma siamo pronti a sorprese <a title="Ufficiale | Evento Apple il 10 settembre!" href="http://apple.hdblog.it/2013/09/03/da-confermare-evento-apple-10-settembre-secondo-nuove-informazioni/">per il prossimo 10 settembre</a>! 
<div><div></div></div>  <img height="1" width="1" src="http://rss.feedsportal.com/c/33112/f/537497/s/30f29429/sc/28/mf.gif" border="0"><br clear="all"><div><table border="0"><tr><td valign="middle"><a target="_blank" href="http://share.feedsportal.com/share/twitter/?u=http%3A%2F%2Fapple.hdblog.it%2F2013%2F09%2F08%2Farrivano-le-prime-immagini-rubate-della-scheda-logica-di-iphone-5c%2F&t=Arrivano+le+prime+immagini+rubate+della+Scheda+Logica+di+iPhone+5C"><img src="http://res3.feedsportal.com/social/twitter.png" border="0"></a> <a target="_blank" href="http://share.feedsportal.com/share/facebook/?u=http%3A%2F%2Fapple.hdblog.it%2F2013%2F09%2F08%2Farrivano-le-prime-immagini-rubate-della-scheda-logica-di-iphone-5c%2F&t=Arrivano+le+prime+immagini+rubate+della+Scheda+Logica+di+iPhone+5C"><img src="http://res3.feedsportal.com/social/facebook.png" border="0"></a> <a target="_blank" href="http://share.feedsportal.com/share/linkedin/?u=http%3A%2F%2Fapple.hdblog.it%2F2013%2F09%2F08%2Farrivano-le-prime-immagini-rubate-della-scheda-logica-di-iphone-5c%2F&t=Arrivano+le+prime+immagini+rubate+della+Scheda+Logica+di+iPhone+5C"><img src="http://res3.feedsportal.com/social/linkedin.png" border="0"></a> <a target="_blank" href="http://share.feedsportal.com/share/gplus/?u=http%3A%2F%2Fapple.hdblog.it%2F2013%2F09%2F08%2Farrivano-le-prime-immagini-rubate-della-scheda-logica-di-iphone-5c%2F&t=Arrivano+le+prime+immagini+rubate+della+Scheda+Logica+di+iPhone+5C"><img src="http://res3.feedsportal.com/social/googleplus.png" border="0"></a> <a target="_blank" href="http://share.feedsportal.com/share/email/?u=http%3A%2F%2Fapple.hdblog.it%2F2013%2F09%2F08%2Farrivano-le-prime-immagini-rubate-della-scheda-logica-di-iphone-5c%2F&t=Arrivano+le+prime+immagini+rubate+della+Scheda+Logica+di+iPhone+5C"><img src="http://res3.feedsportal.com/social/email.png" border="0"></a></td><td valign="middle"></td></tr></table></div><br><br><a 
href="http://da.feedsportal.com/r/174726694371/u/49/f/537497/c/33112/s/30f29429/sc/28/rc/1/rc.htm"><img src="http://da.feedsportal.com/r/174726694371/u/49/f/537497/c/33112/s/30f29429/sc/28/rc/1/rc.img" border="0"></a><br><a href="http://da.feedsportal.com/r/174726694371/u/49/f/537497/c/33112/s/30f29429/sc/28/rc/2/rc.htm"><img src="http://da.feedsportal.com/r/174726694371/u/49/f/537497/c/33112/s/30f29429/sc/28/rc/2/rc.img" border="0"></a><
09-08 12:04:08.736: D/GoogleReader(539): image http://apple.hdblog.it/wp-content/uploads/2013/09/logic5c-520x390.jpg
09-08 12:04:08.747: D/GoogleReader(539): image http://rss.feedsportal.com/c/33112/f/537497/s/30f29429/sc/28/mf.gif
09-08 12:04:08.775: D/GoogleReader(539): image http://res3.feedsportal.com/social/twitter.png
09-08 12:04:08.775: D/GoogleReader(539): image http://res3.feedsportal.com/social/facebook.png
09-08 12:04:08.775: D/GoogleReader(539): image http://res3.feedsportal.com/social/linkedin.png
09-08 12:04:08.785: D/GoogleReader(539): image http://res3.feedsportal.com/social/googleplus.png
09-08 12:04:08.785: D/GoogleReader(539): image http://res3.feedsportal.com/social/email.png
09-08 12:04:08.785: D/GoogleReader(539): image http://da.feedsportal.com/r/174726694371/u/49/f/537497/c/33112/s/30f29429/sc/28/rc/1/rc.img
09-08 12:04:08.866: D/GoogleReader(539): image http://da.feedsportal.com/r/174726694371/u/49/f/537497/c/33112/s/30f29429/sc/28/rc/2/rc.img
09-08 12:04:08.866: D/GoogleReader(539): image http://da.feedsportal.com/r/174726694371/u/49/f/537497/c/33112/s/30f29429/sc/28/rc/3/rc.img
09-08 12:04:08.866: D/GoogleReader(539): image http://da.feedsportal.com/r/174726694371/u/49/f/537497/c/33112/s/30f29429/a2.img
09-08 12:04:08.916: D/GoogleReader(539): image http://pi.feedsportal.com/r/174726694371/u/49/f/537497/c/33112/s/30f29429/a2t.img
09-08 12:04:08.916: D/GoogleReader(539): image http://feeds.feedburner.com/~r/hd-blog/~4/4dvgXhJMTt8

In this example, "http://apple.hdblog.it/wp-content/uploads/2013/09/logic5c-520x390.jpg" is fine, but my code doesn't use it :S ... I don't know why.

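You need to change your filter, because

            !img.contains("ads"))

http://apple.hdblog.it/wp-content/uplo**ads**/2013/09/logic5c-520x390.jpg

The link contains "ads" (inside "uploads"), so it gets filtered out.

Maybe something like this:

            !img.contains("/ads/"))

Why did you add this filter? Maybe I can suggest something better than this.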
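
One possible way to make the filter less fragile is to compare the host of each URL against a block list instead of searching the whole URL for substrings. The following is only a sketch under assumptions: the class name `ImageUrlFilter` and the method `isAllowed` are hypothetical, the blocked hosts are the ones from the question, and the `/ads/` path check follows the suggestion above. It uses only `java.net.URI` from the standard library. It also rejects empty strings, since `absUrl("src")` returns an empty string when it cannot build an absolute URL, which may be why the function sometimes returns an empty image.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class ImageUrlFilter {
    // Hosts taken from the filter in the question (assumption: subdomains are blocked too).
    private static final String[] BLOCKED_HOSTS = {
        "doubleclick.net", "feedburner.com", "feedsportal.com"
    };

    /** Returns true if the URL is absolute, not on a blocked host, and not under an /ads/ path. */
    public static boolean isAllowed(String url) {
        if (url == null || url.isEmpty()) {
            return false; // nothing usable, e.g. an unresolved relative src
        }
        URI uri;
        try {
            uri = new URI(url);
        } catch (URISyntaxException e) {
            return false; // malformed URL
        }
        String host = uri.getHost();
        if (host == null) {
            return false; // relative or otherwise host-less URL
        }
        host = host.toLowerCase();
        for (String blocked : BLOCKED_HOSTS) {
            // Match the domain itself or any of its subdomains.
            if (host.equals(blocked) || host.endsWith("." + blocked)) {
                return false;
            }
        }
        // Match "ads" only as a whole path segment, so "uploads" passes.
        String path = uri.getPath();
        return path == null || !path.contains("/ads/");
    }

    public static void main(String[] args) {
        // prints "true"
        System.out.println(isAllowed(
            "http://apple.hdblog.it/wp-content/uploads/2013/09/logic5c-520x390.jpg"));
        // prints "false"
        System.out.println(isAllowed("http://res3.feedsportal.com/social/twitter.png"));
    }
}
```

In the loop from the question, the chained `contains` checks could then be replaced with a single `if (ImageUrlFilter.isAllowed(img)) return img;`.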

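There is a space after the if statement, and an if statement without braces applies only to the single statement that follows it. Try adding braces around the body of the inner if statement.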
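
The point about braces can be seen in isolation with a minimal sketch. The class and method names here are hypothetical, and the check is reduced to a single `contains` call for brevity; the two URLs come from the example in the question.

```java
public class BracelessIfDemo {
    // Without braces, only the println belongs to the if.
    static String firstAllowedWithoutBraces(String[] urls) {
        for (String src : urls) {
            if (!src.contains("feedsportal.com"))
                System.out.println("Image found!"); // conditional
            return src; // NOT conditional: always returns the first element
        }
        return null;
    }

    // With braces, the println and the return are both conditional.
    static String firstAllowedWithBraces(String[] urls) {
        for (String src : urls) {
            if (!src.contains("feedsportal.com")) {
                System.out.println("Image found!");
                return src;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://rss.feedsportal.com/c/33112/f/537497/s/30f29429/sc/28/mf.gif",
            "http://apple.hdblog.it/wp-content/uploads/2013/09/logic5c-520x390.jpg"
        };
        // Returns the feedsportal.com URL even though the condition rejects it.
        System.out.println(firstAllowedWithoutBraces(urls));
        // Returns the apple.hdblog.it URL.
        System.out.println(firstAllowedWithBraces(urls));
    }
}
```

Note that when the only statement after the braceless if is the `return` itself, the two forms behave the same; the difference appears as soon as a second statement is added.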

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;
    import java.io.IOException;

    public class StackoverflowTest {
        // The URL of the website. This is just an example.
        private static final String webSiteURL = "http://apple.hdblog.it/2013/09/08/arrivano-le-prime-immagini-rubate-della-scheda-logica-di-iphone-5c/logic5c/";

        public static void main(String[] args) {
            StackoverflowTest sot = new StackoverflowTest();
            sot.getFirstImage(webSiteURL);
        }

        private String getFirstImage(String url) {
            try {
                // Connect to the website and get the HTML
                Document doc = Jsoup.connect(url).get();
                // Get all elements with the img tag
                Elements img = doc.getElementsByTag("img");
                for (Element el : img) {
                    // For each element, get the absolute src URL
                    String src = el.absUrl("src");
                    // Braces keep the println calls and the return inside the condition
                    if (!src.contains("doubleclick.net") &&
                            !src.contains("feedburner.com") &&
                            !src.contains("feedsportal.com") &&
                            !src.contains("/ads/")) { // "/ads/" rather than "ads", so "uploads" is not filtered
                        System.out.println("Image Found!");
                        System.out.println("src attribute is : " + src);
                        return src;
                    }
                }
                return null;
            } catch (IOException ex) {
                System.err.println("There was an error");
                ex.printStackTrace();
            }
            return null;
        }
    }
