Azure Data Factory pipeline start date differs from the scheduled job dates

I'm going crazy over this one. I'm running Azure Data Factory V1 and I need to schedule a copy job weekly from January 3, 2009 through January 31, 2009, so I defined this schedule on the pipeline:

    "start": "2009-01-03T00:00:00Z",
    "end": "2009-01-31T00:00:00Z",
    "isPaused": false,

Monitoring the pipeline, Data Factory scheduled runs on these dates:

12/29/2008
01/05/2009
01/12/2009
01/19/2009
01/26/2009

instead of the schedule I want:

01/03/2009
01/10/2009
01/17/2009
01/24/2009
01/31/2009

Why doesn't the start date defined on the pipeline correspond to the scheduled dates shown in the monitor?

Thanks a lot!

Here is the pipeline JSON:

{
"name": "CopyPipeline-blob2datalake",
"properties": {
    "description": "copy from blob storage to datalake directory structure",
    "activities": [
        {
            "type": "DataLakeAnalyticsU-SQL",
            "typeProperties": {
                "scriptPath": "script/dat230.usql",
                "scriptLinkedService": "AzureStorageLinkedService",
                "degreeOfParallelism": 5,
                "priority": 100,
                "parameters": {
                    "salesfile": "$$Text.Format('/DAT230/{0:yyyy}/{0:MM}/{0:dd}.txt', Date.StartOfDay (SliceStart))",
                    "lineitemsfile": "$$Text.Format('/dat230/dataloads/{0:yyyy}/{0:MM}/{0:dd}/factinventory/fact.csv', Date.StartOfDay (SliceStart))"
                }
            },
            "inputs": [
                {
                    "name": "InputDataset-dat230"
                }
            ],
            "outputs": [
                {
                    "name": "OutputDataset-dat230"
                }
            ],
            "policy": {
                "timeout": "01:00:00",
                "concurrency": 1,
                "retry": 1
            },
            "scheduler": {
                "frequency": "Day",
                "interval": 7
            },
            "name": "DataLakeAnalyticsUSqlActivityTemplate",
            "linkedServiceName": "AzureDataLakeAnalyticsLinkedService"
        }
    ],
    "start": "2009-01-03T00:00:00Z",
    "end": "2009-01-11T00:00:00Z",
    "isPaused": false,
    "hubName": "edxlearningdf_hub",
    "pipelineMode": "Scheduled"
}
}

And here are the datasets:

{
"name": "InputDataset-dat230",
"properties": {
    "structure": [
        {
            "name": "Date",
            "type": "Datetime"
        },
        {
            "name": "StoreID",
            "type": "Int64"
        },
        {
            "name": "StoreName",
            "type": "String"
        },
        {
            "name": "ProductID",
            "type": "Int64"
        },
        {
            "name": "ProductName",
            "type": "String"
        },
        {
            "name": "Color",
            "type": "String"
        },
        {
            "name": "Size",
            "type": "String"
        },
        {
            "name": "Manufacturer",
            "type": "String"
        },
        {
            "name": "OnHandQuantity",
            "type": "Int64"
        },
        {
            "name": "OnOrderQuantity",
            "type": "Int64"
        },
        {
            "name": "SafetyStockQuantity",
            "type": "Int64"
        },
        {
            "name": "UnitCost",
            "type": "Double"
        },
        {
            "name": "DaysInStock",
            "type": "Int64"
        },
        {
            "name": "MinDayInStock",
            "type": "Int64"
        },
        {
            "name": "MaxDayInStock",
            "type": "Int64"
        }
    ],
    "published": false,
    "type": "AzureBlob",
    "linkedServiceName": "Source-BlobStorage-dat230",
    "typeProperties": {
        "fileName": "*.txt.gz",
        "folderPath": "dat230/{year}/{month}/{day}/",
        "format": {
            "type": "TextFormat",
            "columnDelimiter": "t",
            "firstRowAsHeader": true
        },
        "partitionedBy": [
            {
                "name": "year",
                "value": {
                    "type": "DateTime",
                    "date": "WindowStart",
                    "format": "yyyy"
                }
            },
            {
                "name": "month",
                "value": {
                    "type": "DateTime",
                    "date": "WindowStart",
                    "format": "MM"
                }
            },
            {
                "name": "day",
                "value": {
                    "type": "DateTime",
                    "date": "WindowStart",
                    "format": "dd"
                }
            }
        ],
        "compression": {
            "type": "GZip"
        }
    },
    "availability": {
        "frequency": "Day",
        "interval": 7
    },
    "external": true,
    "policy": {}
}
}
{
"name": "OutputDataset-dat230",
"properties": {
    "structure": [
        {
            "name": "Date",
            "type": "Datetime"
        },
        {
            "name": "StoreID",
            "type": "Int64"
        },
        {
            "name": "StoreName",
            "type": "String"
        },
        {
            "name": "ProductID",
            "type": "Int64"
        },
        {
            "name": "ProductName",
            "type": "String"
        },
        {
            "name": "Color",
            "type": "String"
        },
        {
            "name": "Size",
            "type": "String"
        },
        {
            "name": "Manufacturer",
            "type": "String"
        },
        {
            "name": "OnHandQuantity",
            "type": "Int64"
        },
        {
            "name": "OnOrderQuantity",
            "type": "Int64"
        },
        {
            "name": "SafetyStockQuantity",
            "type": "Int64"
        },
        {
            "name": "UnitCost",
            "type": "Double"
        },
        {
            "name": "DaysInStock",
            "type": "Int64"
        },
        {
            "name": "MinDayInStock",
            "type": "Int64"
        },
        {
            "name": "MaxDayInStock",
            "type": "Int64"
        }
    ],
    "published": false,
    "type": "AzureDataLakeStore",
    "linkedServiceName": "Destination-DataLakeStore-dat230",
    "typeProperties": {
        "fileName": "txt.gz",
        "folderPath": "dat230/dataloads/{year}/{month}/{day}/factinventory/",
        "format": {
            "type": "TextFormat",
            "columnDelimiter": "t"
        },
        "partitionedBy": [
            {
                "name": "year",
                "value": {
                    "type": "DateTime",
                    "date": "WindowStart",
                    "format": "yyyy"
                }
            },
            {
                "name": "month",
                "value": {
                    "type": "DateTime",
                    "date": "WindowStart",
                    "format": "MM"
                }
            },
            {
                "name": "day",
                "value": {
                    "type": "DateTime",
                    "date": "WindowStart",
                    "format": "dd"
                }
            }
        ]
    },
    "availability": {
        "frequency": "Day",
        "interval": 7
    },
    "external": false,
    "policy": {}
}
}

You need to look at the time slices of your datasets and activity.

The pipeline schedule (badly named) only defines the start and end of the period in which any activities can provision and run time slices.

ADF v1 doesn't use a recursive schedule the way the SQL Server Agent does. Every execution has to be provisioned at an interval on the timeline (the schedule) you create.

For example, say your pipeline start and end span one year, but your dataset and activity have a frequency of Month and an interval of 1. You'll only ever get 12 executions, whatever happens.
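That monthly case would be expressed in the dataset's availability block like this (a hypothetical fragment, not taken from your JSON):

```json
{
    "availability": {
        "frequency": "Month",
        "interval": 1
    }
}
```

With a pipeline "start"/"end" window spanning one year, this provisions exactly 12 time slices, no matter how often you expected the job to fire.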

Sorry, the concept of time slices is a little hard to explain if you aren't yet familiar with it. Maybe read this post: https://blogs.msdn.microsoft.com/ukdataplatform/2016/05/03/demystifying-activity-scheduling-with-azure-data-factory/

Hope this helps.

Could you share the JSON for your datasets and pipeline? It would be easier to help you with those in hand.

In the meantime, check whether you're using "style": "StartOfInterval" in the activity's scheduler property, and whether you're using an offset.
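In ADF v1 the scheduler (and matching dataset availability) block also accepts an "anchorDateTime" and an "offset". A sketch of what your weekly activity might look like, assuming the slices are currently snapping to a default boundary rather than your pipeline start:

```json
"scheduler": {
    "frequency": "Day",
    "interval": 7,
    "style": "StartOfInterval",
    "anchorDateTime": "2009-01-03T00:00:00Z"
}
```

The "availability" sections of both datasets would need the same anchoring so the activity and dataset slices line up.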

Cheers!
