SageMaker batch transform not loading CSV correctly



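I'm running a batch transform job where we upload the data from a CSV. The CSV is formatted like this:

"joe annes rifle accesories discount"
"cute puppies for sale"
"Two dudes talk about sports"
"Smith & Wesson M&P 500 review"
"Glock vs 1911 handgun"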

My code for creating the batch transform is below:

from sagemaker.huggingface import HuggingFaceModel

# `role` is the SageMaker execution role, defined elsewhere.
elec_model = HuggingFaceModel(
    model_data='s3://some_path/binary-model',
    role=role,
    entry_point='torchserve_.py',
    source_dir='source_dir',
    transformers_version="4.17.0",
    pytorch_version='1.10.2',
    py_version='py38'
)
nl_detector = elec_model.transformer(
    instance_count=1,  # assumed value; transformer() requires it and the original snippet omitted it
    instance_type='ml.g4dn.xlarge',
    strategy="MultiRecord",
    assemble_with="Line",
    output_path="s3://some_path/trash_output"
)
nl_detector.transform(
    "s3://some_bucket/trash",
    content_type="text/csv",
    split_type="Line"
)

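When I run this code, instead of the batch job taking the CSV and breaking up the examples at each line break, which is what

split_type="Line"

tells the algorithm to do, it just ingests all of the sentences in the CSV above and outputs a single probability.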

When I print the input payload, it looks like this. EDIT: all three sentences are printed at the same time.

"joe annes rifle accesories discount"
2022-10-26T21:03:04,265 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - 
"cute puppies for sale"
2022-10-26T21:03:04,265 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - 
"Two dudes talk about sports"
2022-10-26T21:03:04,265 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle

Here each sentence is one inference example, and they are separated by the following line:

2022-10-26T21:03:04,265 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle

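So it appears that SageMaker is separating the inference examples. However, when I try to pass these sentences to a tokenizer, the tokenizer tokenizes them as a single inference example, when they should be three distinct inference examples. I have also tried printing and/or returning the object above at an index, hoping that indexing into it would return one sentence. But it just returns a single character, which seems to confirm that the object shown above with three sentences is being treated as one string rather than three.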
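The symptom described here (indexing returns one character) is what happens whenever the payload is a single Python `str`. The sketch below reproduces it outside SageMaker. The `payload` value is a hypothetical stand-in for the request body, built from the first three sample sentences, and is an assumption about its shape rather than a captured value.

```python
# Hypothetical payload: one str holding several newline-separated records,
# which is how a multi-record text/csv request body would look.
payload = (
    '"joe annes rifle accesories discount"\n'
    '"cute puppies for sale"\n'
    '"Two dudes talk about sports"\n'
)

# Indexing a str yields a single character, not a sentence.
first_item = payload[0]

# Splitting on line breaks recovers one entry per record.
records = [line.strip().strip('"') for line in payload.splitlines() if line.strip()]

print(repr(first_item))
print(len(records))
```

If the real request body behaves the same way, the records have to be split explicitly before tokenization, because the tokenizer treats a single `str` as a single example.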

Edit 1:

Here is the code for my inference logic, torch_serve_.py

import numpy as np
import os
import sys
import logging
import json
import torch
from sagemaker_inference import content_types, decoder
import subprocess
#subprocess.check_call([sys.executable, "-m", "pip", "install", 'transformers'])
from transformers import ElectraTokenizer

def model_fn(model_dir):
    """
    Load the model for inference
    """
    model = torch.load(model_dir + "/arms_ammunition.pth")
    model.eval()
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)

    #use_cuda = torch.cuda.is_available()
    #if use_cuda:
    #    model.cuda()
    ##return model.to(device)
    return model


def predict_fn(input_data, model):
    """
    Apply model to the incoming request
    """
    print('listy!!!!!!!!!:   ', input_data)

    tokenizer = ElectraTokenizer.from_pretrained('google/electra-base-discriminator')
    sm = torch.nn.Softmax(dim=2)

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    #model.to(device)
    #model.eval()
    encoded_inputs = tokenizer(input_data, max_length=220, padding='longest', truncation=True)
    print('encoded_inputs', encoded_inputs)
    tokened_words = encoded_inputs['input_ids']
    #print('tokened_wordssssssss:   ', tokened_words)
    attention_mask = encoded_inputs['attention_mask']
    #print('attention_maskkkkkkkk:   ', attention_mask)

    with torch.no_grad():
        outputs = model(torch.tensor(tokened_words).unsqueeze(0).to(device), torch.tensor(attention_mask).unsqueeze(0).to(device))
        #print('mmooddeell!!!!!!!!!  ', model)
        #probs = sm(outputs.logits)
        probs = torch.nn.functional.softmax(outputs.logits, 1)

    return probs
    #return model(input_data.float()).numpy()[0]


def input_fn(request_body, request_content_type):
    """
    Deserialize and prepare the prediction input
    """
    print("type of jawn!!!!!!!!!!!!!!!!!!!!", type(request_body))
    print('request_content_type', request_content_type)
    print('requ_body', request_body)
    return request_body



def output_fn(prediction_output, accept):
    print("prediction_output.tolist()", prediction_output)
    return prediction_output.tolist()
    #return json.dumps(prediction_output.tolist())

Edit 2: log snippet below

2022-10-27T19:52:00,401 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Model model loaded io_fd=0242a9fffefeff83-0000001c-00000001-1c705d0aba004d53-1d21a7ab
2022-10-27T19:52:00,402 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 4800
2022-10-27T19:52:00,404 [WARN ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerLifeCycle - attachIOStreams() threadName=W-model-1
2022-10-27T19:52:00,406 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - type of jawn!!!!!!!!!!!!!!!!!!!! <class 'str'>
2022-10-27T19:52:00,407 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - request_content_type text/csv
2022-10-27T19:52:00,407 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - requ_body "New Delhi, June 04 (ANI): Food Safety and Standards Association of India (FSSAI) CEO Yudhvir Singh Malik on Thursday said that the reports submitted by Delhi and Kerala are fully authentic while that submitted by Goa was inappropriate. He also said that Goa food officials are asked to submit their proper report in 2-3 days time. Further, Malik said that the food business manufacturer should have their own plans for checking of their products so as to avoid such lapses. He also said that the food safety commissioners are asked to not only focus on Nestle but also to pick up samples from other manufacturers."
2022-10-27T19:52:00,408 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - "MW 328 अश्लील फोटोचा धाक दाखवून विवाहितेवर सलग 2 दिवस अत्याचार | औरंगाबाद |\n\nGAON MAJHA NEWS is one of Most Watched Marathi News channel. \n\nHere you can watch live Marathi news, breaking news, politics news, latest news, entertainment news, tech news, auto news, lifestyle news & more. TODAYS MARATHI NEWS, Trending News,  RECENT NEWS, CURRENT NEWS,   WORLD NEWS, DAILY NEWS, latest news, Marathi news, Latest Marathi News, Latest News in Marathi, Marathi news live, News Marathi, Breaking News in Marathi, Live Marathi News, News in Marathi, ताज्या मराठी बातम्या , Marathi Batmya.\n\n\nSUBSCRIBE TO GAON MAJHA NEWS :- \nFacebook page:-\nhttps://www.facebook.com/gaonmajhanews/\n\nTwitter:-\nhttps://twitter.com/gaonmajhanews1\n\nGaonmajha  Official Website:-\nhttp://www.gaonmajha.in/\n\nGoogle+:-\nhttps://plus.google.com/u/0/\n\nYoutube:-\nhttps://www.youtube.com/channel/UCbVX89IreHNyjcHuDTIqicQ?view_as=subscriber\n\n\nStay tuned for all the breaking news in Marathi  with GAON MAJHA NEWS!"
2022-10-27T19:52:00,409 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - "Subscribe our channel for More Video & Updates: https://go./j07htL\r\n\r\nFollow us on\r\nFacebook : https://www.facebook.com/gstv.news\r\nTwitter : https://twitter.com/gstv_news\r\n#gujaratinews #gstv #gstvnews  #GSTVLIVE #gujaratinewslive #gujaratinewspaper #gujaratinews2020 \r\n#gujaratinewscoronavirustoday #covind19\r\n#GujaratSamacharLiveTV #GujaratSamachar"
2022-10-27T19:52:00,409 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - "Minutes count when you're having a heart attack. But it took the Chicago Fire Department more than 40 minutes to get a 56-year-old man to a hospital for help. CBS 2's Pam Zekman reports."
2022-10-27T19:52:00,409 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - "Margie Pargie & The Yoga Hangout SRQ kicked off the Sexy Transformation Challenge with a visit to local boxing gym, Uppercut Boxing & Fitness. We joined Aaron Jaco, Pro Boxer and owner of the gym for a kickass, heart pumping, beast mode workout. The atmosphere, intense yet playful, kept everyone pushing hard until the last minute. Traditional elements of boxing mixed with plyometrics, weight training, and high intensity cardio blended for a solid 500-1000 calorie burn session. If you are looking for results, a supportive community, and a workout built for the most hardcore, you should visit Aaron! He will show you all the moves to be your sexiest self!\n\nwww.aerialyogaschool.com to enter the Sexy Transformation Challenge \nwww.uppercutboxingsarasota.com for info about taking Aaron Jaco's class"
2022-10-27T19:52:00,409 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - 
2022-10-27T19:52:00,410 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - listy!!!!!!!!!:    "New Delhi, June 04 (ANI): Food Safety and Standards Association of India (FSSAI) CEO Yudhvir Singh Malik on Thursday said that the reports submitted by Delhi and Kerala are fully authentic while that submitted by Goa was inappropriate. He also said that Goa food officials are asked to submit their proper report in 2-3 days time. Further, Malik said that the food business manufacturer should have their own plans for checking of their products so as to avoid such lapses. He also said that the food safety commissioners are asked to not only focus on Nestle but also to pick up samples from other manufacturers."
2022-10-27T19:52:00,410 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - "MW 328 अश्लील फोटोचा धाक दाखवून विवाहितेवर सलग 2 दिवस अत्याचार | औरंगाबाद |\n\nGAON MAJHA NEWS is one of Most Watched Marathi News channel. \n\nHere you can watch live Marathi news, breaking news, politics news, latest news, entertainment news, tech news, auto news, lifestyle news & more. TODAYS MARATHI NEWS, Trending News,  RECENT NEWS, CURRENT NEWS,   WORLD NEWS, DAILY NEWS, latest news, Marathi news, Latest Marathi News, Latest News in Marathi, Marathi news live, News Marathi, Breaking News in Marathi, Live Marathi News, News in Marathi, ताज्या मराठी बातम्या , Marathi Batmya.\n\n\nSUBSCRIBE TO GAON MAJHA NEWS :- \nFacebook page:-\nhttps://www.facebook.com/gaonmajhanews/\n\nTwitter:-\nhttps://twitter.com/gaonmajhanews1\n\nGaonmajha  Official Website:-\nhttp://www.gaonmajha.in/\n\nGoogle+:-\nhttps://plus.google.com/u/0/\n\nYoutube:-\nhttps://www.youtube.com/channel/UCbVX89IreHNyjcHuDTIqicQ?view_as=subscriber\n\n\nStay tuned for all the breaking news in Marathi  with GAON MAJHA NEWS!"
2022-10-27T19:52:00,411 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - "Subscribe our channel for More Video & Updates: https://go/j07htL\r\n\r\nFollow us on\r\nFacebook : https://www.facebook.com/gstv.news\r\nTwitter : https://twitter.com/gstv_news\r\n#gujaratinews #gstv #gstvnews  #GSTVLIVE #gujaratinewslive #gujaratinewspaper #gujaratinews2020 \r\n#gujaratinewscoronavirustoday #covind19\r\n#GujaratSamacharLiveTV #GujaratSamachar"
2022-10-27T19:52:00,412 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - "Minutes count when you're having a heart attack. But it took the Chicago Fire Department more than 40 minutes to get a 56-year-old man to a hospital for help. CBS 2's Pam Zekman reports."
2022-10-27T19:52:00,412 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - "Margie Pargie & The Yoga Hangout SRQ kicked off the Sexy Transformation Challenge with a visit to local boxing gym, Uppercut Boxing & Fitness. We joined Aaron Jaco, Pro Boxer and owner of the gym for a kickass, heart pumping, beast mode workout. The atmosphere, intense yet playful, kept everyone pushing hard until the last minute. Traditional elements of boxing mixed with plyometrics, weight training, and high intensity cardio blended for a solid 500-1000 calorie burn session. If you are looking for results, a supportive community, and a workout built for the most hardcore, you should visit Aaron! He will show you all the moves to be your sexiest self!\n\nwww.aerialyogaschool.com to enter the Sexy Transformation Challenge \nwww.uppercutboxingsarasota.com for info about taking Aaron Jaco's class"
2022-10-27T19:52:00,464 [WARN ] W-model-1-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - 
2022-10-27T19:52:00,470 [WARN ] W-model-1-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Downloading:   0%|          | 0.00/226k [00:00<?, ?B/s]
2022-10-27T19:52:00,470 [WARN ] W-model-1-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Downloading: 100%|██████████| 226k/226k [00:00<00:00, 40.3MB/s]
2022-10-27T19:52:00,561 [WARN ] W-model-1-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - 
2022-10-27T19:52:00,562 [WARN ] W-model-1-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Downloading:   0%|          | 0.00/27.0 [00:00<?, ?B/s]
2022-10-27T19:52:00,562 [WARN ] W-model-1-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Downloading: 100%|██████████| 27.0/27.0 [00:00<00:00, 38.7kB/s]
2022-10-27T19:52:00,603 [WARN ] W-model-1-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - 
2022-10-27T19:52:00,604 [WARN ] W-model-1-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Downloading:   0%|          | 0.00/666 [00:00<?, ?B/s]
2022-10-27T19:52:00,605 [WARN ] W-model-1-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Downloading: 100%|██████████| 666/666 [00:00<00:00, 593kB/s]
2022-10-27T19:52:00,650 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - 
2022-10-27T19:52:00,652 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - encoded_inputs {'input_ids': [101, 1000, 2047, 6768, 1032, 1010, 2238, 5840, 1006, 2019, 2072, 1007, 1024, 2833, 3808, 1998, 4781, 2523, 1997, 2634, 1006, 1042, 11488, 2072, 1007, 5766, 9805, 16425, 21663, 5960, 14360, 2006, 9432, 2056, 2008, 1996, 4311, 7864, 2011, 6768, 1998, 8935, 2024, 3929, 14469, 2096, 2008, 7864, 2011, 15244, 2001, 15884, 1012, 2002, 2036, 2056, 2008, 15244, 2833, 4584, 2024, 2356, 2000, 12040, 2037, 5372, 3189, 1999, 1016, 1011, 1017, 2420, 2051, 1012, 2582, 1032, 1010, 14360, 2056, 2008, 1996, 2833, 2449, 7751, 2323, 2031, 2037, 2219, 3488, 2005, 9361, 1997, 2037, 3688, 2061, 2004, 2000, 4468, 2107, 10876, 2229, 1012, 2002, 2036, 2056, 2008, 1996, 2833, 3808, 12396, 2024, 2356, 2000, 2025, 2069, 3579, 2006, 9089, 2571, 2021, 2036, 2000, 4060, 2039, 8168, 2013, 2060, 8712, 1012, 1000, 1000, 12464, 25256, 1311, 29872, 29870, 29878, 29870, 100, 1326, 29876, 29851, 1325, 29876, 29852, 29871, 29863, 1335, 29877, 29871, 29876, 29875, 29877, 29859, 29871, 29869, 1338, 29870, 29853, 1016, 1325, 29877, 29871, 29874, 1311, 29859, 29868, 29876, 29854, 29876, 29869, 1064, 100, 1064, 1032, 1032, 1050, 1032, 1032, 12835, 7113, 2078, 16686, 3270, 2739, 2003, 2028, 1997, 2087, 3427, 18388, 2739, 3149, 1012, 1032, 1032, 1050, 1032, 1032, 18699, 7869, 2017, 2064, 3422, 2444, 18388, 2739, 1032, 1010, 4911, 2739, 1032, 1010, 4331, 2739, 1032, 1010, 6745, 2739, 102], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
2022-10-27T19:51:58.086:[sagemaker logs]: MaxConcurrentTransforms=1, MaxPayloadInMB=50, BatchStrategy=MULTI_RECORD
2022-10-27T19:52:01,360 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - prediction_output.tolist() tensor([[0.9766, 0.0234]], device='cuda:0')
2022-10-27T19:52:01,360 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Preprocess time - 0.04482269287109375 ms
2022-10-27T19:52:01,361 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Predict time - 951.9212245941162 ms
2022-10-27T19:52:01,361 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Postprocess time - 1.8284320831298828 ms
2022-10-27T19:52:01,362 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 956
2022-10-27T19:52:01,362 [INFO ] W-9000-model ACCESS_LOG - /169.254.255.130:37830 "POST /invocations HTTP/1.1" 200 3223

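In the SageMaker TensorFlow container, I create a numpy array from the input, which then lets me access each record. I suspect you can do something similar in the HuggingFace container.

For example,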

import json

import numpy as np


def read_csv(csv):
    # One row per line, one float per comma-separated field.
    return np.array([[float(j) for j in i.split(",")] for i in csv.splitlines()])

def input_handler(data, context):
    """Pre-process request input before it is sent to TensorFlow Serving REST API
    Args:
        data (obj): the request data stream
        context (Context): an object containing request and configuration details
    Returns:
        (dict): a JSON-serializable dict that contains request body and headers
    """
    if context.request_content_type == "text/csv":
        payload = data.read().decode("utf-8")
        inputs = read_csv(payload)
        request = {"inputs": inputs.tolist()}  # renamed from `input` to avoid shadowing the builtin
        return json.dumps(request)

https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker_batch_transform/custom_tensorflow_inference_script_csv_and_tfrecord/custom_tensorflow_inference_script_csv_and_tfrecord.ipynb
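Carried over to the handlers in the question, the same idea might look like the sketch below. It is untested against the HuggingFace container and rests on several assumptions: that the request body arrives as a `str` or `bytes` holding one quoted text field per line, that a single text column per record is enough, and that the container will accept a newline-joined string from `output_fn`. The handler names and signatures are the ones used in the question; the parsing logic is illustrative only.

```python
import csv
import io
import json


def input_fn(request_body, request_content_type):
    """Turn a multi-record text/csv payload into a list of strings."""
    if request_content_type != "text/csv":
        raise ValueError(f"Unsupported content type: {request_content_type}")
    if isinstance(request_body, (bytes, bytearray)):
        request_body = request_body.decode("utf-8")
    # csv.reader handles the surrounding quotes and any commas inside them.
    reader = csv.reader(io.StringIO(request_body, newline=""))
    # Assumption: each record has exactly one text column.
    return [row[0] for row in reader if row]


def output_fn(prediction_output, accept):
    """Emit one JSON line per record, in input order."""
    if hasattr(prediction_output, "tolist"):
        prediction_output = prediction_output.tolist()
    return "\n".join(json.dumps(row) for row in prediction_output)
```

With a list of strings as input, the tokenizer call in `predict_fn` returns one row of `input_ids` per sentence, so `torch.tensor(...)` is already two-dimensional and the `.unsqueeze(0)` calls would likely need to be dropped. Writing one output line per record is meant to pair with `assemble_with="Line"`.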
