我的ETL进程有问题。我有ETL过程,用python写的,它工作得很好,但操作一个接一个地开始,所以整个过程持续很长时间。我在Apache气流有点新,但我做了一个挖掘,有一个问题与他)我得到一个错误:
File "/usr/lib/python3.8/encodings/utf_16_le.py", line 15, in decode
def decode(input, errors='strict'):
File "/usr/local/lib/python3.8/dist-packages/airflow/models/taskinstance.py", line 1543, in signal_handler
raise AirflowException("Task received SIGTERM signal")
airflow.exceptions.AirflowException: Task received SIGTERM signal
The above exception was the direct cause of the following exception:
airflow.exceptions.AirflowException: decoding with 'utf-16le' codec failed (AirflowException: Task received SIGTERM signal)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 1705, in _execute_context
self.dialect.do_execute(
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/default.py", line 716, in do_execute
cursor.execute(statement, parameters)
SystemError: <class 'pyodbc.Error'> returned a result with an error set
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 896, in _rollback_impl
self.engine.dialect.do_rollback(self.connection)
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/default.py", line 666, in do_rollback
dbapi_connection.rollback()
pyodbc.OperationalError: ('08S01', '[08S01] [Microsoft][ODBC Driver 18 for SQL Server]Communication link failure (0) (SQLEndTran)')
这是我的任务代码。一次最多可以有10个连接:
def update_from_gladiator_ost(market_id):
query = "DELETE from [stage].[dbo].[rests_by_docs_temp] where market_id = %d" % market_id
execute_query_dwh(query)
engine = dwh_conn()
connection = engine.raw_connection()
abc = connection.cursor()
# abc.execute("DELETE from [stage].[dbo].[sell_movement_temp]; DELETE from [stage].[dbo].[rests_by_docs_temp]")
df_op = pd.read_sql(
"SET NOCOUNT ON exec [dbo].[mp_report_finance_agent_enhanced_basis_transport_royalty_NC_ost_by_docs4] @pmarket_id = %d, @pstart_date = '%s', @pend_date = '%s', @pselect = '1'" % (
market_id, z, w), gladiator_conn())
df_op = df_op.fillna(value=0)
for row_count in range(0, df_op.shape[0]):
chunk = df_op.iloc[row_count:row_count + 1, :].values.tolist()
tuple_of_tuples = tuple(tuple(x) for x in chunk)
abc.executemany(
"insert into stage.dbo.rests_by_docs_temp" + " ([date_start],[market_id],[good_id],[agent_id],[doc_id],[tstart_qty],[tstart_amt],[IMP],[doc_name]) values (?,?,?,?,?,?,?,?,?)",
tuple_of_tuples)
abc.commit()
connection.close()
如您所见,我从数据库中获取数据并将其插入我的DWH
下面是我的连接:
def dwh_conn():
mySQL = '192.168.240.1'
myDB = 'DWH'
login = 'sa'
PWD = '....'
Encrypt = 'No'
Certificate = 'Yes'
params = urllib.parse.quote_plus("DRIVER={ODBC Driver 18 for SQL Server};"
"SERVER=" + mySQL + ";"
"SERVER=" + mySQL + ";"
"Port=1433" + ";"
"DATABASE=" + myDB + ";"
"UID=" + login + ";"
"PWD=" + PWD + ";"
"Encrypt=" + Encrypt + ";"
"TrustServerCertificate=" + Certificate + ";")
engine = sa.create_engine('mssql+pyodbc:///?odbc_connect={}?charset=utf8mb4'.format(params), fast_executemany=True)
return engine
def gladiator_conn():
mySQL = '...'
myDB = '...'
login = '...'
PWD = '...'
Encrypt = 'No'
Certificate = 'Yes'
params = urllib.parse.quote_plus("DRIVER={ODBC Driver 18 for SQL Server};"
"SERVER=" + mySQL + ";"
"Port=1433" + ";"
"DATABASE=" + myDB + ";"
"UID=" + login + ";"
"PWD=" + PWD + ";"
"Encrypt=" + Encrypt + ";"
"TrustServerCertificate=" + Certificate + ";")
engine = sa.create_engine('mssql+pyodbc:///?odbc_connect={}?charset=utf8mb4'.format(params), fast_executemany=True)
return engine
我认为unixODBC的问题。因为当我在Windows上用Pycharm编写整个代码时-一切都很好。但是在docker Ubuntu/Airflow上,它有时会失败。我可以重新启动失败的任务,它可以正常运行,但可能再次失败
更新:我想我找到了一个解决办法,但我不能在我的情况下实现它。
def decode_sketchy_utf16(raw_bytes):
s = raw_bytes.decode("utf-16le", "ignore")
try:
n = s.index('u0000')
s = s[:n] # respect null terminator
except ValueError:
pass
return s
# ...
prev_converter = cnxn.get_output_converter(pyodbc.SQL_WVARCHAR)
cnxn.add_output_converter(pyodbc.SQL_WVARCHAR, decode_sketchy_utf16)
col_info = crsr.columns("Clients").fetchall()
cnxn.add_output_converter(pyodbc.SQL_WVARCHAR, prev_converter) # restore previous behaviour
帮助我如何使它在我的代码工作?我应该在哪里实现它?
找到答案了。当我记忆力差的时候,这些问题就会出现。特别是当几个容器打开时,它可能会出现这个错误