Spark Scala exception java.net.MalformedURLException: no protocol



I have an RDD containing an edge list, comma-separated (source_url,destination_url). I need to extract the source host from source_url. I tried the following code:

val edges = links.flatMap { case (src, dst) =>
  if (!src.startsWith("http://") || !src.startsWith("https://")) {
    val src_url = "http://" + src
    val url = new java.net.URL(src_url)
    val uri = url.getHost
    scala.util.Try {
      Some(uri, dst)
    }.getOrElse(None)
  } else {
    val src_url = src
    val url = new java.net.URL(src_url)
    val uri = url.getHost
    scala.util.Try {
      Some(uri, dst)
    }.getOrElse(None)
  }
}

Sample input:

http://www.belvini.de/weingut/mID/2530/max-markert.html,http://www.belvini.de/content.php/coID/299/kundenmeinungen.html
http://www.belvini.de/weingut/mID/2530/max-markert.html,http://www.belvini.de/weingueter
http://www.belvini.de/weingut/mID/2530/max-markert.html,http://www.belvini.de/filter/cID/10/country/suedafrika.137.html

Desired output:

www.belvini.de,http://www.belvini.de/content.php/coID/299/kundenmeinungen.html
www.belvini.de,http://www.belvini.de/weingueter
www.belvini.de,http://www.belvini.de/filter/cID/10/country/suedafrika.137.html

When I run the code, I get an exception:

 Job aborted due to stage failure: Task 935 in stage 3.0 failed 4 times, most recent failure: Lost task 935.3 in stage 3.0 (TID 1883, node27.ib, executor 248): 
java.net.MalformedURLException: For input string: "RC-a-shops.de"
at java.net.URL.<init>(URL.java:627)
at java.net.URL.<init>(URL.java:490)
at java.net.URL.<init>(URL.java:439)

The RDD has about 1 million edges, and I am running this on a cluster. Can someone suggest how to get rid of this exception?

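Edit: The question was edited so that the MalformedURLException now shows what looks like a well-formed URL. My answer still stands either way. The documentation for URL states that it only throws MalformedURLException when the URL is invalid in some way. More complete output would help debug this.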

MalformedURLException - if no protocol is specified, or an unknown protocol is found, or spec is null.

It looks like your src does not include the URL's protocol. You need something like

http://whatever.com/nlp-agm.php

rather than just nlp-agm.php

The URL must be of the form

<scheme>://<authority><path>?<query>#<fragment>

The <scheme> is required. If the scheme is invalid or not specified, new java.net.URL will throw a MalformedURLException. See here: https://docs.oracle.com/javase/7/docs/api/java/net/URL.html#URL(java.lang.String)
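For what it's worth, here is a minimal sketch of how the code in the question might be made tolerant of bad input. Two things stand out in the original: the prefix check uses ||, which is true for every string, and new java.net.URL(...) is called outside scala.util.Try, so the exception is never caught. The names HostExtraction and sourceHost are made up for illustration, links is a plain Seq standing in for the RDD, and the second sample entry is a hypothetical URL with a non-numeric port, which appears to be one way to get a "For input string" message.

```scala
import java.net.URL
import scala.util.Try

object HostExtraction {
  // Hypothetical helper (not from the original post): returns the host of
  // `src`, or None when java.net.URL rejects the string.
  def sourceHost(src: String): Option[String] = {
    // Prepend a scheme only when neither prefix is present (note &&, not ||).
    val withScheme =
      if (!src.startsWith("http://") && !src.startsWith("https://")) "http://" + src
      else src
    // The constructor call sits inside Try, so a MalformedURLException
    // becomes None instead of failing the task.
    Try(new URL(withScheme).getHost).toOption
  }

  // Stand-in for the RDD's contents; the second source is an assumed bad
  // input whose "port" is not a number.
  val links: Seq[(String, String)] = Seq(
    ("http://www.belvini.de/weingut/mID/2530/max-markert.html", "http://www.belvini.de/weingueter"),
    ("http://host:RC-a-shops.de/", "http://www.belvini.de/weingueter")
  )

  // Edges whose source cannot be parsed are dropped by flatMap.
  val edges: Seq[(String, String)] =
    links.flatMap { case (src, dst) => sourceHost(src).map(host => (host, dst)) }
}
```

On an actual RDD[(String, String)] the same flatMap body should apply unchanged. Silently dropping bad rows is a design choice; if you need to know which inputs failed, keeping the Try result and filtering on isFailure is an alternative.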

java.net.MalformedURLException can also occur when the string itself contains quotation marks:

new URL("\"http:www.example.com\"")
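If stray quotes are the problem, one option is to strip them before parsing. This is only a sketch: QuoteStripping, stripQuotes and hostOf are hypothetical names, and it assumes at most one double quote at each end of the string.

```scala
import java.net.URL
import scala.util.Try

object QuoteStripping {
  // Hypothetical helper: trims whitespace, then removes one leading and one
  // trailing double quote if present.
  def stripQuotes(raw: String): String =
    raw.trim.stripPrefix("\"").stripSuffix("\"")

  // Parses the cleaned string; None if java.net.URL still rejects it.
  def hostOf(raw: String): Option[String] =
    Try(new URL(stripQuotes(raw)).getHost).toOption
}
```

For example, QuoteStripping.hostOf("\"http://www.example.com\"") parses the URL without the surrounding quotes, whereas passing the quoted string straight to new URL fails because "http (with the leading quote) is not a valid scheme.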
