I'm trying to insert data using hiveContext:
/* table filedata
CREATE TABLE `filedata`(
`host_id` string,
`reportbatch` string,
`url` string,
`datatype` string,
`data` string,
`created_at` string,
`if_del` boolean)
*/
hiveContext.sql("insert into filedata (host_id, data) values ('a1e1', 'welcome')")
This throws an error, so I also tried using a "select":
hiveContext.sql("select 'a1e1' as host_id, 'welcome' as data").write.mode("append").saveAsTable("filedata")
/*
stack trace
java.lang.ArrayIndexOutOfBoundsException: 2
*/
It only works if I supply every column, like this:
hc.sql("""select 'a1e1' as host_id,
  'xx' as reportbatch,
  'xx' as url,
  'xx' as datatype,
  'welcome' as data,
  '2017' as created_at,
  1 as if_del""").write.mode("append").saveAsTable("filedata")
Is there a way to insert only specified columns? For example, insert only the columns "host_id" and "data".
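As far as I know, Hive does not support inserting values into only some of the columns.
From the documentation:
Each row listed in the VALUES clause is inserted into table tablename.
Values must be provided for every column in the table. The standard SQL syntax that allows the user to insert values into only some columns is not yet supported. To mimic the standard SQL, nulls can be provided for columns the user does not wish to assign a value to.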
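To illustrate that rule, here is a minimal pure-Scala sketch (not part of the original answer) that pads a partial row out to the full column list of `filedata` with `NULL` and renders a Hive `INSERT INTO TABLE ... VALUES` statement. The names `PaddedValues`, `valuesStatement` and `quote` are hypothetical, and the quoting assumes backslash-escaped string literals as used by Hive and Spark SQL.

```scala
// Hypothetical helper: Hive's INSERT ... VALUES needs one value per table
// column, so every column the caller does not supply is filled with NULL.
object PaddedValues {
  // Quote a string literal, backslash-escaping \ first and then '.
  def quote(s: String): String =
    "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'"

  // columns: all table columns in declared order; provided: column -> value.
  def valuesStatement(table: String, columns: Seq[String], provided: Map[String, String]): String = {
    val unknown = provided.keySet -- columns
    require(unknown.isEmpty, s"unknown columns: ${unknown.mkString(", ")}")
    // Walk the table's column order so positions line up with the DDL.
    val row = columns.map(c => provided.get(c).map(quote).getOrElse("NULL"))
    s"INSERT INTO TABLE $table VALUES (${row.mkString(", ")})"
  }

  def main(args: Array[String]): Unit = {
    val filedataColumns =
      Seq("host_id", "reportbatch", "url", "datatype", "data", "created_at", "if_del")
    println(valuesStatement("filedata", filedataColumns,
      Map("host_id" -> "a1e1", "data" -> "welcome")))
    // prints: INSERT INTO TABLE filedata VALUES ('a1e1', NULL, NULL, NULL, 'welcome', NULL, NULL)
  }
}
```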
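So you should try something like the following (filedata has 7 columns, so the select must return 7 values in table order):
val data = sqlc.sql("select 'a1e1', null, null, null, 'welcome', null, null")
data.write.mode("append").insertInto("filedata")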
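Because `insertInto` resolves columns by position rather than by name, the select list must cover every table column in order. A small sketch under stated assumptions: the hypothetical helper `TypedNullSelect.paddedSelect` takes the column names and Hive types from the `filedata` DDL and emits `CAST(NULL AS <type>)` for each missing column, so the NULLs carry the declared type (for example `BOOLEAN` for `if_del`) instead of relying on implicit casting. The aliases are only for readability.

```scala
// Hypothetical helper: build a positional SELECT for insertInto, with a
// typed NULL for every column that is not supplied.
object TypedNullSelect {
  // Quote a string literal, backslash-escaping \ first and then '.
  def quote(s: String): String =
    "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'"

  // columns: (name, Hive type) pairs in the table's declared order.
  def paddedSelect(columns: Seq[(String, String)], provided: Map[String, String]): String = {
    val unknown = provided.keySet -- columns.map(_._1)
    require(unknown.isEmpty, s"unknown columns: ${unknown.mkString(", ")}")
    val exprs = columns.map { case (name, tpe) =>
      provided.get(name) match {
        case Some(v) => s"${quote(v)} AS $name"
        case None    => s"CAST(NULL AS $tpe) AS $name"
      }
    }
    exprs.mkString("SELECT ", ", ", "")
  }

  def main(args: Array[String]): Unit = {
    val filedata = Seq(
      "host_id" -> "STRING", "reportbatch" -> "STRING", "url" -> "STRING",
      "datatype" -> "STRING", "data" -> "STRING", "created_at" -> "STRING",
      "if_del" -> "BOOLEAN")
    println(paddedSelect(filedata, Map("host_id" -> "a1e1", "data" -> "welcome")))
  }
}
```

The resulting string could then be passed to `sqlc.sql(...)` followed by `.write.mode("append").insertInto("filedata")`, as in the answer above; that step needs a running Spark context and is not shown here.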
Reference here.

If you are using a row columnar file format such as ORC, you can do this. See the working example below. The example is in Hive, but it should also work with HiveContext.
hive> use default;
OK
Time taken: 1.735 seconds
hive> create table test_insert (a string, b string, c string, d int) stored as orc;
OK
Time taken: 0.132 seconds
hive> insert into test_insert (a,c) values('x','y');
Query ID = user_20171219190337_b293c372-5225-4084-94a1-dec1df9e930d
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1507021764560_1375895)
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 .......... SUCCEEDED 1 1 0 0 0 0
--------------------------------------------------------------------------------
VERTICES: 01/01 [==========================>>] 100% ELAPSED TIME: 4.06 s
--------------------------------------------------------------------------------
Loading data to table default.test_insert
Table default.test_insert stats: [numFiles=1, numRows=1, totalSize=417, rawDataSize=254]
OK
Time taken: 6.828 seconds
hive> select * from test_insert;
OK
x NULL y NULL
Time taken: 0.142 seconds, Fetched: 1 row(s)
hive>