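Spark SQL: loading data over JDBC with a SQL statement instead of a table name

I think I'm missing something, but I can't figure out what. I want to load data using SQLContext and JDBC with a specific SQL statement, such as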

select top 1000 text from table1 with (nolock)
where threadid in (
  select distinct id from table2 with (nolock)
  where flag=2 and date >= '1/1/2015' and  userid in (1, 2, 3)
)

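Which SQLContext method should I use? The examples I have seen always specify a table name together with lower and upper bounds.

Thanks in advance.

You should pass a valid subquery as the dbtable argument. For example, in Scala: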

// The alias after the closing parenthesis is mandatory*
val query = """(SELECT TOP 1000
  -- and the rest of your query
  -- ...
) AS tmp"""
val url: String = ??? 
val jdbcDF = sqlContext.read.format("jdbc")
  .options(Map("url" -> url, "dbtable" -> query))
  .load()

*Hive Language Manual, SubQueries: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SubQueries
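Applied to the query from the question, the wrapping step might look like the sketch below. `JdbcSubquery` and `asDbTable` are hypothetical names introduced here for illustration and are not part of Spark. The helper only builds the string; it does not validate the SQL.

```scala
object JdbcSubquery {
  // Hypothetical helper: turns a SELECT statement into a value for the JDBC
  // `dbtable` option by wrapping it in parentheses and appending an alias.
  def asDbTable(sql: String, alias: String = "tmp"): String = {
    // Drop surrounding whitespace and a trailing semicolon, which is not
    // valid inside a parenthesized subquery.
    val body = sql.trim.stripSuffix(";").trim
    s"($body) $alias"
  }

  // The query from the question, unchanged.
  val question: String =
    """select top 1000 text from table1 with (nolock)
      |where threadid in (
      |  select distinct id from table2 with (nolock)
      |  where flag=2 and date >= '1/1/2015' and  userid in (1, 2, 3)
      |)""".stripMargin

  // Value to pass as "dbtable" -> dbtable in the options map.
  val dbtable: String = asDbTable(question)
}
```

The resulting `JdbcSubquery.dbtable` can then take the place of `query` in the options map above.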

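import java.sql.DriverManager

// Assumed: the PostgreSQL JDBC driver class, matching the URL below
val driver = "org.postgresql.Driver"
val url = "jdbc:postgresql://localhost/scala_db?user=scala_user"
Class.forName(driver)  // make sure the driver class is loaded
// Optional connectivity check; Spark opens its own connections when loading
val connection = DriverManager.getConnection(url)
val df2 = spark.read
      .format("jdbc")
      .option("url", url)
      .option("dbtable", "(select id,last_name from emps) e")  // subquery with alias e
      .option("user", "scala_user")
      .load()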

The key is "(select id,last_name from emps) e": here you can write a subquery in place of the table name.
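On Spark 2.4 or later, the JDBC source also accepts a `query` option that takes the SELECT statement directly, so no parentheses or alias are needed. The sketch below assumes a `SparkSession` and a suitable JDBC driver on the classpath; `QueryOptionSketch` and `loadQuery` are hypothetical names used only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object QueryOptionSketch {
  // Hypothetical helper: loads the result of `sql` through the JDBC source's
  // `query` option instead of `dbtable`.
  def loadQuery(spark: SparkSession, url: String, sql: String): DataFrame =
    spark.read
      .format("jdbc")
      .option("url", url)
      .option("query", sql) // must not be combined with `dbtable`
      .load()
}
```

A call could look like `QueryOptionSketch.loadQuery(spark, url, "select id,last_name from emps")`. The `query` option cannot be combined with `partitionColumn`, so for partitioned reads the `dbtable` subquery form is still the one to use.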
