Pivotal HDB complains "data line too long. likely due to invalid csv data"



We have a small Pivotal Hadoop HAWQ cluster. We created external tables on it that point to files in Hadoop.

Environment:

Product version: (HAWQ 1.3.0.2 build 14421) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2

What we have tried:

When we try to read from the external table, i.e.

test=# select count(*) from EXT_TAB;

we get the following error:

ERROR: data line too long. likely due to invalid csv data (seg0 slice1 SEG0.HOSTNAME.COM:40000 pid=447247)
DETAIL: External table trcd_stg0, line 12059 of pxf://hostname/tmp/def_rcd/?profile=HdfsTextSimple: "2012-08-06 00:00:00.0^2012-08-06 00:00:00.0^6552^2016-01-09 03:15:43.427^0005567^COMPLAINTS ..."

Additional information:

The DDL of the external table is:

CREATE READABLE EXTERNAL TABLE sysprocompanyb.trcd_stg0
(
    "DispDt" DATE,
    "InvoiceDt" DATE,
    "ID" INTEGER,
    time timestamp without time zone,
    "Customer" CHAR(7),
    "CustomerName" CHARACTER VARYING(30),
    "MasterAccount" CHAR(7),
    "MasterAccName" CHAR(30),
    "SalesOrder" CHAR(6),
    "SalesOrderLine" NUMERIC(4, 0),
    "OrderStatus" CHAR(200),
    "MStockCode" CHAR(30),
    "MStockDes" CHARACTER VARYING(500),
    "MWarehouse" CHAR(200),
    "MOrderQty" NUMERIC(10, 3),
    "MShipQty" NUMERIC(10, 3),
    "MBackOrderQty" NUMERIC(10, 3),
    "MUnitCost" NUMERIC(15, 5),
    "MPrice" NUMERIC(15, 5),
    "MProductClass" CHAR(200),
    "Salesperson" CHAR(200),
    "CustomerPoNumber" CHAR(30),
    "OrderDate" DATE,
    "ReqShipDate" DATE,
    "DispatchesMade" CHAR(1),
    "NumDispatches" NUMERIC(4, 0),
    "OrderValue" NUMERIC(26, 8),
    "BOValue" NUMERIC(26, 8),
    "OrdQtyInEaches" NUMERIC(21, 9),
    "BOQtyInEaches" NUMERIC(21, 9),
    "DispQty" NUMERIC(38, 3),
    "DispQtyInEaches" NUMERIC(38, 9),
    "CustomerClass" CHAR(200),
    "MLineShipDate" DATE
)
LOCATION (
    'pxf://HOSTNAME-HA/tmp/def_rcd/?profile=HdfsTextSimple'
)
FORMAT 'CSV' (delimiter '^' null '' escape '"' quote '"')
ENCODING 'UTF8';

Any help would be greatly appreciated.

Based on the source code: https://github.com/apache/incubator-hawq/blob/e48a07b0d8a5c8d41d2d4aaaa70254867b11ee11/src/backend/commands/copy.c

The error is raised when cstate->line_buf.len >= gp_max_csv_line_length is true (see http://hawq.docs.pivotal.io/docs-hawq/guc_config-gp_max_csv_line_length.html).

The default maximum CSV line length is 1048576 bytes (1 MB). Have you checked the length of the lines in your CSV files and tried increasing the value of this setting?
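
As an illustration, here is a minimal sketch of how to compare the actual line lengths against that limit and then raise it. It assumes shell access to the master host, the psql and hdfs CLIs, and that /tmp/def_rcd/ is the HDFS directory from the LOCATION clause; the gpconfig step is an assumption, since depending on the HAWQ release you may instead need to edit postgresql.conf on the master and segments, and this GUC typically requires a cluster restart to take effect:

# show the current limit (defaults to 1048576 bytes)
psql -d test -c "SHOW gp_max_csv_line_length;"

# find the longest line in the files backing the external table
hdfs dfs -cat /tmp/def_rcd/* | awk '{ if (length($0) > max) max = length($0) } END { print "longest line:", max, "bytes" }'

# if legitimate rows really are longer than the limit, raise it (e.g. to 4 MB) and restart the cluster
# (gpconfig availability and syntax may vary by HAWQ version)
gpconfig -c gp_max_csv_line_length -v 4194304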

Also check whether the number of delimited fields on line 12059 matches the number of table columns. If some lines get merged together during parsing, the maximum line length can be exceeded; this usually happens because of bad data. The field count of a suspect line can be checked with: echo "$LINE" | awk -F"^" '{ total = total + NF } END { print total }'
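
Extending that check to the whole data set is a quick way to find merged or broken rows. A sketch, assuming hdfs CLI access and that every well-formed row should split into the 34 columns defined in the DDL above (rows whose quoted fields legitimately contain '^' would also be flagged):

# print every line whose field count is not 34
# (NR counts across all concatenated files, so it may not match the per-file line number PXF reports)
hdfs dfs -cat /tmp/def_rcd/* | awk -F'^' 'NF != 34 { print "line " NR ": " NF " fields" }'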
