value join is not a member of org.apache.spark.rdd.RDD

I get this error:

value join is not a member of 
    org.apache.spark.rdd.RDD[(Long, (Int, (Long, String, Array[_0])))
        forSome { type _0 <: (String, Double) }]

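The only suggestion I have found is to import org.apache.spark.SparkContext._, which I am already doing.

What am I doing wrong?

EDIT: Changing the code to eliminate the forSome (i.e., so that the object has type org.apache.spark.rdd.RDD[(Long, (Int, (Long, String, Array[(String, Double)])))]) solved the problem. Is this a bug in Spark?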

join is a member of org.apache.spark.rdd.PairRDDFunctions. So why does the implicit conversion to it not kick in?

scala> val s = Seq[(Long, (Int, (Long, String, Array[_0]))) forSome { type _0 <: (String, Double) }]()
scala> val r = sc.parallelize(s)
scala> r.join(r) // Gives your error message.
scala> val p = new org.apache.spark.rdd.PairRDDFunctions(r)
<console>:25: error: no type parameters for constructor PairRDDFunctions: (self: org.apache.spark.rdd.RDD[(K, V)])(implicit kt: scala.reflect.ClassTag[K], implicit vt: scala.reflect.ClassTag[V], implicit ord: Ordering[K])org.apache.spark.rdd.PairRDDFunctions[K,V] exist so that it can be applied to arguments (org.apache.spark.rdd.RDD[(Long, (Int, (Long, String, Array[_0]))) forSome { type _0 <: (String, Double) }])
 --- because ---
argument expression's type is not compatible with formal parameter type;
 found   : org.apache.spark.rdd.RDD[(Long, (Int, (Long, String, Array[_0]))) forSome { type _0 <: (String, Double) }]
 required: org.apache.spark.rdd.RDD[(?K, ?V)]
Note: (Long, (Int, (Long, String, Array[_0]))) forSome { type _0 <: (String, Double) } >: (?K, ?V), but class RDD is invariant in type T.
You may wish to define T as -T instead. (SLS 4.5)
       val p = new org.apache.spark.rdd.PairRDDFunctions(r)
               ^
<console>:25: error: type mismatch;
 found   : org.apache.spark.rdd.RDD[(Long, (Int, (Long, String, Array[_0]))) forSome { type _0 <: (String, Double) }]
 required: org.apache.spark.rdd.RDD[(K, V)]
       val p = new org.apache.spark.rdd.PairRDDFunctions(r)

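I am sure this error message is clear to everyone else, but for my own sake let's try to make sense of it. PairRDDFunctions has two type parameters, K and V. Your forSome applies to the whole pair, so the pair cannot be split into separate K and V types. Because RDD is invariant in its type parameter, there are no K and V for which RDD[(K, V)] is equal to your RDD type.

However, you can have the forSome apply only to the value (the second element of the pair) instead of the whole pair. The join now works, because this type can be split into K and V.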

scala> val s2 = Seq[(Long, (Int, (Long, String, Array[_0])) forSome { type _0 <: (String, Double) })]()
scala> val r2 = sc.parallelize(s2)
scala> r2.join(r2)
res0: org.apache.spark.rdd.RDD[(Long, ((Int, (Long, String, Array[_0])) forSome { type _0 <: (String, Double) }, (Int, (Long, String, Array[_0])) forSome { type _0 <: (String, Double) }))] = MapPartitionsRDD[5] at join at <console>:26
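The same effect can be reproduced without Spark. The following is only a minimal sketch, assuming Scala 2 (forSome does not exist in Scala 3). Box, PairOps, toPairOps, and the two type aliases are hypothetical stand-ins invented for this illustration; they are not part of Spark. Box is invariant like RDD, and PairOps is only reachable for a Box[(K, V)], like PairRDDFunctions.

```scala
import scala.language.existentials
import scala.language.implicitConversions

// Hypothetical stand-in for RDD: invariant in T.
class Box[T](val items: Seq[T])

// Hypothetical stand-in for PairRDDFunctions: needs a Box of (K, V) pairs.
class PairOps[K, V](self: Box[(K, V)]) {
  def keys: Seq[K] = self.items.map(_._1)
}

object ExistentialDemo {
  // Plays the role of Spark's implicit conversion to PairRDDFunctions.
  implicit def toPairOps[K, V](box: Box[(K, V)]): PairOps[K, V] = new PairOps(box)

  // forSome wraps the whole pair: there is no single (K, V) to infer.
  type WholePair = (Long, Array[_0]) forSome { type _0 <: (String, Double) }
  // forSome wraps only the value: K = Long, V = the existential array type.
  type ValueOnly = (Long, Array[_0] forSome { type _0 <: (String, Double) })

  private val scores: Array[(String, Double)] = Array(("a", 1.0))

  val splittable = new Box[ValueOnly](Seq((1L, scores)))
  val splittableKeys: Seq[Long] = splittable.keys // conversion applies

  val unsplittable = new Box[WholePair](Seq((1L, scores)))
  // unsplittable.keys // expected not to compile, for the same reason r.join(r) fails
}
```

Uncommenting the last line should produce a "value keys is not a member of" error analogous to the one in the question, while splittable.keys resolves through the implicit conversion.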

Consider joining two Spark RDDs.

Say rdd1.first has the form (Int, Int, Float) = (1,957,299.98), while rdd2.first looks like (25876,1).

scala> rdd1.join(rdd2) --- results in an error: **: error: value join is not a member of org.apache.spark.rdd.RDD[(Int, Int, Float)]

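Reason

Both RDDs must be in the form of key-value pairs, that is, two-element tuples.

Here rdd1, whose elements have the form (1,957,299.98), does not satisfy this rule: it holds three-element tuples. rdd2, whose elements have the form (25876,1), does.

Resolution

Convert the elements of the first RDD from (1,957,299.98) into key-value pairs of the form (1,(957,299.98)), then join the result with rdd2, as shown below: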

scala> val rdd1KV = rdd1.map { case (k, a, b) => (k, (a, b)) } // re-keyed RDD of (key, value) pairs
scala> rdd1KV.first
res**: (Int, (Int, Float)) = (1,(957,299.98))
scala> rdd1KV.join(rdd2) // join now compiles
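To make the reshaping step concrete without a Spark cluster, here is a small sketch in which plain Scala collections stand in for the two RDDs. The object name RekeyDemo and the sample rows are hypothetical and chosen only so that one key matches; the for-comprehension is a naive inner join by key, not Spark's join implementation.

```scala
object RekeyDemo {
  // Stand-in for rdd1: three-element tuples, so there is no (key, value) shape yet.
  val left: Seq[(Int, Int, Float)] = Seq((1, 957, 299.98f), (2, 1073, 199.99f))

  // Stand-in for rdd2: already (key, value) pairs. Hypothetical sample data.
  val right: Seq[(Int, Int)] = Seq((1, 25876), (3, 7))

  // Re-key: the first field becomes the key, the remaining fields the value.
  val leftKV: Seq[(Int, (Int, Float))] =
    left.map { case (k, a, b) => (k, (a, b)) }

  // Naive inner join by key over the two pair collections.
  val joined: Seq[(Int, ((Int, Float), Int))] =
    for {
      (k1, v1) <- leftKV
      (k2, v2) <- right
      if k1 == k2
    } yield (k1, (v1, v2))
}
```

Only key 1 appears on both sides, so joined contains a single element whose value pairs (957, 299.98f) with 25876. With real RDDs the same map followed by join would give an RDD of the same element type.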

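By the way, join is a member of org.apache.spark.rdd.PairRDDFunctions, so make sure it is available (for example via import org.apache.spark.SparkContext._) in Eclipse or whichever IDE you run the code in.

This article is also on my blog: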

https://tips-to-code.blogspot.com/2018/08/apache-spark-error-resolution-value.html
