Adjusting the schema of a CTAS Parquet table in Apache Drill: making elements required instead of optional



I want to use Apache Drill to generate a Parquet file with a very specific schema. I use CTAS to join two tables, for example:

CREATE TABLE synthetic1 AS (
SELECT e1.returneddocids AS returneddocids, e1.pathinfo AS pathinfo, c1.counters AS counters
FROM dfs.`/tmp/tier1.parquet` e1 LEFT JOIN dfs.tmp.shadow3 c1 ON TRUE LIMIT 100
);

The schema of the resulting file looks like this:

message root {
  optional group returneddocids {
    repeated group list {
      optional binary element (UTF8); // need this one as required, not optional
    }
  }
  optional binary pathinfo (UTF8);
  optional group counters {
    repeated group list {
      optional group element {        // need this as required
        optional binary name (UTF8);  // need this as required
        optional int32 value;         // need this as required
      }
    }
  }
}

I would like to know how to adjust the CTAS query so that the elements marked above are written as required instead of optional.

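This is quite complicated. You can use CREATE OR REPLACE SCHEMA to apply constraints. In my case this kind of works (not fully, though it might help someone else with a similar problem):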

ALTER SESSION SET `store.table.use_schema_file` = true;
ALTER SESSION SET `exec.storage.enable_v3_text_reader` = true;
CREATE OR REPLACE SCHEMA (
returneddocids STRUCT<`list` STRUCT<`element` ARRAY<VARCHAR>>> NOT NULL,
pathinfo VARCHAR NOT NULL,
counters STRUCT<`list` STRUCT<`element` ARRAY<VARCHAR>>>
) FOR TABLE synthetic1;
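
To check what Drill actually stored for the table, it may help to print the schema back. The following is a minimal sketch, assuming a Drill version that supports schema provisioning (the same feature that provides CREATE OR REPLACE SCHEMA) and that synthetic1 resolves in the current workspace. Whether the NOT NULL constraints carry through to the Parquet writer is not something this command confirms; it only shows the schema file contents.

```sql
-- Assumption: run in the same session and workspace as the statements above.
-- Prints the schema stored for the table so the NOT NULL columns can be reviewed.
DESCRIBE SCHEMA FOR TABLE synthetic1;
```

If the output does not list the expected NOT NULL columns, the schema file was probably not created where the table lives, so the workspace used in the FOR TABLE clause is worth checking.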
