Troubles with BlazingText jsonlines Batch Transform



I have a jsonlines file that looks like this:

{"id":123,"source":"this is a text string"}
{"id":456,"source":"this is another text string"}
{"id":789,"source":"yet another string"}

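When I run a BlazingText batch transform job on a file that contains only the source field, it works. When I try to join the input with the output, I get Customer Error: Unable to decode payload: Incorrect data format. (caused by AttributeError).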

Any suggestions?

Code:

bt_transformer = bt_model.transformer(
    instance_count=1,
    instance_type="ml.m4.xlarge",
    assemble_with="Line",  # write one output record per line
    output_path=s3_batch_out_data,
    accept="application/jsonlines",
)
bt_transformer.transform(
    s3_batch_in_data,
    content_type="application/jsonlines",
    split_type="Line",  # send each input line as a separate record
    input_filter="$.source",  # pass only the "source" value to the model
    join_source="Input",  # join each prediction back to its input record
    output_filter="$['id', 'SageMakerOutput']",  # keep only the id and the prediction
)
bt_transformer.wait()
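To see what each setting in this call does to a single record, here is a rough local approximation in plain Python. It is only a sketch under stated assumptions, not SageMaker's implementation: simulate_record and fake_predict are hypothetical names, fake_predict stands in for the BlazingText container, and the join step assumes SageMaker adds the prediction to the original JSON record under the SageMakerOutput key.

```python
import json
from typing import Any, Callable, Dict


def simulate_record(line: str, predict: Callable[[Any], Any]) -> Dict[str, Any]:
    """Hypothetical local approximation of one record passing through the job."""
    record = json.loads(line)
    # input_filter="$.source": the JSONPath selects the bare value,
    # not an object like {"source": ...}
    model_input = record["source"]
    # The model only ever sees the filtered value
    prediction = predict(model_input)
    # join_source="Input" (assumed behavior): the prediction is added to the
    # ORIGINAL, unfiltered record under the "SageMakerOutput" key
    joined = dict(record)
    joined["SageMakerOutput"] = prediction
    # output_filter="$['id', 'SageMakerOutput']": keep only these two keys
    return {key: joined[key] for key in ("id", "SageMakerOutput")}


def fake_predict(payload: Any) -> Dict[str, Any]:
    # Hypothetical stand-in for BlazingText: reports what type of payload arrived
    return {"received_type": type(payload).__name__}


result = simulate_record('{"id":123,"source":"this is a text string"}', fake_predict)
print(json.dumps(result))
```

In this sketch the model receives a plain str rather than a JSON object, while the id still survives through the join. The join works from the unfiltered input, so the input filter is not what preserves the id.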

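Applying "$.source" to {"id":123,"source":"this is a text string"} produces "this is a text string" rather than {"source":"this is a text string"}, which is probably why you are getting the format error. I wonder why you need to filter the JSON input at all: doesn't the algorithm automatically ignore unrecognized JSON fields?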
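One plausible reading of the "(caused by AttributeError)" part can be sketched with a hypothetical decoder. It assumes the filtered value reaches the container as a JSON string and that the container expects one {"source": ...} object per line. decode_jsonlines is an illustrative name, not BlazingText's actual code. Parsing a JSON string yields a Python str, and a str has no .get(), so a dict-style lookup fails with AttributeError rather than a JSON syntax error.

```python
import json
from typing import List


def decode_jsonlines(payload: str) -> List[str]:
    """Hypothetical decoder expecting one {"source": ...} object per line."""
    texts = []
    for line in payload.splitlines():
        if not line.strip():
            continue
        obj = json.loads(line)
        # Works for a dict; a bare JSON string parses to str, which has no .get()
        texts.append(obj.get("source"))
    return texts


full_record = '{"id":123,"source":"this is a text string"}'
# What "$.source" leaves behind, assuming it is serialized as a JSON string
filtered_record = json.dumps("this is a text string")

print(decode_jsonlines(full_record))  # ['this is a text string']
try:
    decode_jsonlines(filtered_record)
except AttributeError as err:
    print("AttributeError:", err)
```

Note that this decoder simply ignores the extra "id" key in the full record. If BlazingText behaves the same way (not confirmed here), dropping input_filter and relying on join_source="Input" plus the output filter would be enough.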
