在Azure数据工厂的管道中运行U-SQL活动时出错



在ADF:中的管道中运行USQL活动时,我收到以下错误

活动错误:

{"errorId":"E_CSC_USER_SYNTAXERROR","severity":"Error","component":"CSC",
"source":"USER","message":"syntax error.
Final statement did not end with a semicolon","details":"at token 'txt', line 3rnnear the ###:rn**************rnDECLARE @in string = "/demo/SearchLog.txt";nDECLARE @out string = "/scripts/Result.txt";nSearchLogProcessing.txt ### n",
"description":"Invalid syntax found in the script.",
"resolution":"Correct the script syntax, using expected token(s) as a guide.","helpLink":"","filePath":"","lineNumber":3,
"startOffset":109,"endOffset":112}].

这是我试图在管道中执行的输出数据集、管道和USQL脚本的代码。

输出数据集:

{
"name": "OutputDataLakeTable",
"properties": {
"published": false,
"type": "AzureDataLakeStore",
"linkedServiceName": "LinkedServiceDestination",
"typeProperties": {
"folderPath": "scripts/"
},
"availability": {
"frequency": "Hour",
"interval": 1
}
}

管道:

{
"name": "ComputeEventsByRegionPipeline",
"properties": {
"description": "This is a pipeline to compute events for en-gb locale and date less than 2012/02/19.",
"activities": [
{
"type": "DataLakeAnalyticsU-SQL",
"typeProperties": {
"script": "SearchLogProcessing.txt",
"scriptPath": "scripts\",
"degreeOfParallelism": 3,
"priority": 100,
"parameters": {
"in": "/demo/SearchLog.txt",
"out": "/scripts/Result.txt"
}
},
"inputs": [
{
"name": "InputDataLakeTable"
}
],
"outputs": [
{
"name": "OutputDataLakeTable"
}
],
"policy": {
"timeout": "06:00:00",
"concurrency": 1,
"executionPriorityOrder": "NewestFirst",
"retry": 1
},
"scheduler": {
"frequency": "Minute",
"interval": 15
},
"name": "CopybyU-SQL",
"linkedServiceName": "AzureDataLakeAnalyticsLinkedService"
}
],
"start": "2017-01-03T12:01:05.53Z",
"end": "2017-01-03T13:01:05.53Z",
"isPaused": false,
"hubName": "denojaidbfactory_hub",
"pipelineMode": "Scheduled"
}
}

这是我的USQL脚本,我正试图使用"DataLakeAnalyticsU SQL"活动类型来执行它。

@searchlog =
EXTRACT UserId          int,
Start           DateTime,
Region          string,
Query           string,
Duration        int?,
Urls            string,
ClickedUrls     string
FROM @in
USING Extractors.Text(delimiter:'|');
@rs1 =
SELECT Start, Region, Duration
FROM @searchlog
WHERE Region == "kota";

OUTPUT @rs1   
TO @out
USING Outputters.Text(delimiter:'|');

请建议我如何解决这个问题。

您的脚本缺少scriptLinkedService属性。您(目前)还需要将U-SQL脚本放置在Azure Blob存储中才能成功运行。因此,您还需要AzureStorage链接服务,例如:

{
"name": "StorageLinkedService",
"properties": {
"description": "",
"type": "AzureStorage",
"typeProperties": {
"connectionString": "DefaultEndpointsProtocol=https;AccountName=myAzureBlobStorageAccount;AccountKey=**********"
}
}
}

创建此链接服务,将Blob存储名称myAzureBlobStorageAccount替换为您的相关Blob存储帐户,然后将U-SQL脚本(SearchLogProcessing.txt)放在那里的容器中,然后重试。在下面的示例管道中,我的Blob存储中有一个名为adlascripts的容器,脚本就在其中:

确保scriptPath是完整的,正如Alexandre提到的。管道起点:

{
"name": "ComputeEventsByRegionPipeline",
"properties": {
"description": "This is a pipeline to compute events for en-gb locale and date less than 2012/02/19.",
"activities": [
{
"type": "DataLakeAnalyticsU-SQL",
"typeProperties": {
"scriptPath": "adlascripts\SearchLogProcessing.txt",
"scriptLinkedService": "StorageLinkedService",
"degreeOfParallelism": 3,
"priority": 100,
"parameters": {
"in": "/input/SearchLog.tsv",
"out": "/output/Result.tsv"
}
},
...

inputoutput.tsv文件可以位于数据湖中,并使用AzureDataLakeStoreLinkedService链接服务。

我可以看到你正试图从以下位置观看演示:https://learn.microsoft.com/en-us/azure/data-factory/data-factory-usql-activity#script-定义。这不是最直观的演示,似乎存在一些问题,比如StorageLinkedService的定义在哪里?,SearchLogProcessing.txt在哪里?好的,我是通过谷歌搜索找到的,但网页上应该有一个链接。我开始工作了,但感觉有点像《混血王子》中的哈利·波特。

删除U-SQL活动定义中的script属性,并在scriptPath属性中提供脚本的完整路径(包括文件名)。

参考:https://learn.microsoft.com/en-us/azure/data-factory/data-factory-usql-activity

我遇到了一个类似的问题,Azure数据工厂无法识别我的脚本文件。在不必粘贴大量代码的情况下,避免整个问题的一种方法是注册存储过程。你可以这样做:

DROP PROCEDURE IF EXISTS master.dbo.sp_test;
CREATE PROCEDURE master.dbo.sp_test() 
AS  
BEGIN 
@searchlog =
EXTRACT UserId          int,
Start           DateTime,
Region          string,
Query           string,
Duration        int?,
Urls            string,
ClickedUrls     string
FROM @in
USING Extractors.Text(delimiter:'|');
@rs1 =
SELECT Start, Region, Duration
FROM @searchlog
WHERE Region == "kota";

OUTPUT @rs1   
TO @out
USING Outputters.Text(delimiter:'|');
END;

运行后,您可以使用

"script": "master.dbo.sp_test()"

在JSON管道定义中。每当更新U-SQL脚本时,只需重新运行过程的定义即可。这样就不需要将脚本文件复制到Blob存储。

相关内容

  • 没有找到相关文章