Data is not loaded into the table in Scala

I am using the code snippet below to load data into a table, but the data is not being loaded.

import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions._
import java.text.SimpleDateFormat
import java.util.Calendar
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType, FloatType, DoubleType}
import org.apache.spark.sql.functions.rand
import scala.io.Source
val sqlContext = new SQLContext(sc)
// The implicits can only be imported once sqlContext has been defined
import sqlContext.implicits._
val TextFiledata= sc.textFile("wasb://Test.txt")
val schema = StructType(
    Array(
      StructField("ABC", StringType, true),
      StructField("XYZ", StringType, true)
    )
)
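val mapped = TextFiledata
  // split takes a regex, so the pipe is escaped to match the literal delimiter "#|#"
  .map(_.split("#\\|#"))
  // drop the header row
  .filter(r => r(0) != "ABC")
  .map(p => Row(p(0), p(1)))

val DF = sqlContext.createDataFrame(mapped, schema)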
DF.registerTempTable("Table")

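Judging by the line val TextFiledata= sc.textFile("wasb://Test.txt") in your code, I think the file path for HDFS backed by Azure Blob Storage is incorrect. In wasb://Test.txt, Test.txt sits in the position where the container and storage account are expected, so no file path is left.

The WASB URI syntax is: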

wasb[s]://<containername>@<accountname>.blob.core.windows.net/<path>

So you should refer to the file as wasbs:///Test.txt or wasbs://<ContainerName>@<StorageAccountName>.blob.core.windows.net/Test.txt.
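As a minimal sketch of the syntax above, the fully qualified path can be assembled from its parts. The helper name WasbPath and the values mycontainer and myaccount are hypothetical placeholders, not something taken from the question; substitute your own container and storage account.

```scala
// Hypothetical helper: assembles a WASB URI of the form
// wasb[s]://<containername>@<accountname>.blob.core.windows.net/<path>
object WasbPath {
  def apply(container: String, account: String, path: String, secure: Boolean = true): String = {
    val scheme = if (secure) "wasbs" else "wasb"
    // Avoid a double slash when the caller passes a leading "/"
    val cleanPath = path.stripPrefix("/")
    s"${scheme}://${container}@${account}.blob.core.windows.net/${cleanPath}"
  }
}

// Placeholder container and account names (assumptions for illustration only)
val uri = WasbPath("mycontainer", "myaccount", "Test.txt")
println(uri)

// In a Spark shell, where `sc` is available, the result would be used as:
// val TextFiledata = sc.textFile(uri)
```

Building the string in one place makes it harder to drop the container or account part by accident, which is what happened with wasb://Test.txt.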

With the wasb:// URI scheme, Spark accesses data from the Azure Storage Blobs endpoint over unencrypted HTTP. Use wasbs:// instead to make sure the data is accessed over HTTPS.
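Separately from the path, it may be worth double-checking the delimiter handling once the file does load. String.split takes a regular expression, so an unescaped | is read as alternation rather than as a literal character. The sketch below uses a made-up sample line (foo#|#bar is an assumption, not data from the question) to show the difference.

```scala
import java.util.regex.Pattern

// Made-up sample line using "#|#" as the field delimiter
val line = "foo#|#bar"

// Unescaped: the regex means "#" OR "#", so the "|" survives as its own field
val naive = line.split("#|#")

// Quoted: the whole delimiter is matched literally
val literal = line.split(Pattern.quote("#|#"))

println(naive.mkString("[", ", ", "]"))   // prints [foo, |, bar]
println(literal.mkString("[", ", ", "]")) // prints [foo, bar]
```

Writing the pattern as "#\\|#" has the same effect as Pattern.quote here.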
