Using Data Factory to create a pipeline with a copy activity from Azure Blob Storage to Data Lake Store



>I am trying to use Data Factory to create a pipeline with a copy activity from Azure Blob Storage to Data Lake Store.

But when I run the pipeline, it shows the status Failed with the following error:

Copy activity encountered a user error at Source side: ErrorCode=UserErrorSourceBlobNotExist,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=The required Blob is missing. ContainerName: https://**********, ContainerExist: true, BlobPrefix: , BlobCount: 0.,Source=Microsoft.DataTransfer.ClientLibrary,'.
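The diagnostic fields in that message are easier to read once they are pulled apart. The following is a minimal sketch, assuming the English wording of the error shown in the code (reconstructed from the post) and a hypothetical helper named `parse_copy_error`; it is not part of any Data Factory tooling.

```python
import re

# Error text from the failed copy activity (container URL masked as in the post).
# The exact English wording is an assumption reconstructed from the post.
ERROR = (
    "Copy activity encountered a user error at Source side: "
    "ErrorCode=UserErrorSourceBlobNotExist,"
    "'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,"
    "Message=The required Blob is missing. ContainerName: https://**********, "
    "ContainerExist: true, BlobPrefix: , BlobCount: 0.,"
    "Source=Microsoft.DataTransfer.ClientLibrary,'."
)


def parse_copy_error(message):
    """Extract the error code and the 'Key: value' diagnostics from the message."""
    fields = {}
    code = re.search(r"ErrorCode=(\w+)", message)
    if code:
        fields["ErrorCode"] = code.group(1)
    for key in ("ContainerName", "ContainerExist", "BlobPrefix", "BlobCount"):
        # Capture everything up to the next comma, dropping a trailing period.
        match = re.search(re.escape(key) + r": ([^,]*?)\.?,", message)
        if match:
            fields[key] = match.group(1).strip()
    return fields


fields = parse_copy_error(ERROR)
for name, value in fields.items():
    print(f"{name} = {value!r}")
```

Read this way, the message says the container was found (`ContainerExist` is `true`) but no blob matched the source path (`BlobCount` is `0`, with an empty `BlobPrefix`).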

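I followed the official Azure tutorial for using Data Factory with a copy activity from Azure Blob Storage to Azure Data Lake Store, and it works correctly on my side. The error reports BlobCount: 0 with ContainerExist: true, meaning the container was found but no blob matched the source path, so check that the folderPath of the input dataset points at blobs that actually exist. The pipeline can be created with the Azure portal, Visual Studio, or PowerShell, and the tutorial walks through it step by step. The tutorial also provides the following code.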

  • A linked service of type AzureStorage.
{
  "name": "StorageLinkedService",
  "properties": {
    "type": "AzureStorage",
    "typeProperties": {
      "connectionString": "DefaultEndpointsProtocol=https;AccountName=<accountname>;AccountKey=<accountkey>"
    }
  }
}
  • A linked service of type AzureDataLakeStore.
{
  "name": "AzureDataLakeStoreLinkedService",
  "properties": {
    "type": "AzureDataLakeStore",
    "typeProperties": {
      "dataLakeStoreUri": "https://<accountname>.azuredatalakestore.net/webhdfs/v1",
      "servicePrincipalId": "<service principal id>",
      "servicePrincipalKey": "<service principal key>",
      "tenant": "<tenant info, e.g. microsoft.onmicrosoft.com>",
      "subscriptionId": "<subscription of ADLS>",
      "resourceGroupName": "<resource group of ADLS>"
    }
  }
}
  • An input dataset of type AzureBlob.
{
  "name": "AzureBlobInput",
  "properties": {
    "type": "AzureBlob",
    "linkedServiceName": "StorageLinkedService",
    "typeProperties": {
      "folderPath": "mycontainer/myfolder/yearno={Year}/monthno={Month}/dayno={Day}",
      "partitionedBy": [
        {
          "name": "Year",
          "value": {
            "type": "DateTime",
            "date": "SliceStart",
            "format": "yyyy"
          }
        },
        {
          "name": "Month",
          "value": {
            "type": "DateTime",
            "date": "SliceStart",
            "format": "MM"
          }
        },
        {
          "name": "Day",
          "value": {
            "type": "DateTime",
            "date": "SliceStart",
            "format": "dd"
          }
        },
        {
          "name": "Hour",
          "value": {
            "type": "DateTime",
            "date": "SliceStart",
            "format": "HH"
          }
        }
      ]
    },
    "external": true,
    "availability": {
      "frequency": "Hour",
      "interval": 1
    },
    "policy": {
      "externalData": {
        "retryInterval": "00:01:00",
        "retryTimeout": "00:10:00",
        "maximumRetry": 3
      }
    }
  }
}
  • An output dataset of type AzureDataLakeStore.
{
  "name": "AzureDataLakeStoreOutput",
  "properties": {
    "type": "AzureDataLakeStore",
    "linkedServiceName": "AzureDataLakeStoreLinkedService",
    "typeProperties": {
      "folderPath": "datalake/output/"
    },
    "availability": {
      "frequency": "Hour",
      "interval": 1
    }
  }
}
  • A pipeline with a copy activity that uses BlobSource and AzureDataLakeStoreSink.
{
  "name": "SamplePipeline",
  "properties": {
    "start": "2014-06-01T18:00:00",
    "end": "2014-06-01T19:00:00",
    "description": "pipeline with copy activity",
    "activities": [
      {
        "name": "AzureBlobtoDataLake",
        "description": "Copy Activity",
        "type": "Copy",
        "inputs": [
          {
            "name": "AzureBlobInput"
          }
        ],
        "outputs": [
          {
            "name": "AzureDataLakeStoreOutput"
          }
        ],
        "typeProperties": {
          "source": {
            "type": "BlobSource"
          },
          "sink": {
            "type": "AzureDataLakeStoreSink"
          }
        },
        "scheduler": {
          "frequency": "Hour",
          "interval": 1
        },
        "policy": {
          "concurrency": 1,
          "executionPriorityOrder": "OldestFirst",
          "retry": 0,
          "timeout": "01:00:00"
        }
      }
    ]
  }
}
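
With a partitioned input dataset like the one above, the source folder depends on the slice start time, so a "blob is missing" failure often comes down to the resolved path not matching where the blobs really are. The following is a rough sketch of which folders the sample definitions would resolve to. It only approximates the substitution for illustration and is not Data Factory's actual implementation; the helper names `hourly_slices` and `resolve_folder_path` and the format-token mapping are assumptions made for this example.

```python
from datetime import datetime, timedelta

# .NET-style format tokens from the dataset's partitionedBy section, mapped to
# strftime equivalents (assumption: only the four tokens used above are needed).
DOTNET_TO_STRFTIME = {"yyyy": "%Y", "MM": "%m", "dd": "%d", "HH": "%H"}

# Values copied from the AzureBlobInput dataset and SamplePipeline definitions.
FOLDER_PATH = "mycontainer/myfolder/yearno={Year}/monthno={Month}/dayno={Day}"
PARTITIONED_BY = {"Year": "yyyy", "Month": "MM", "Day": "dd", "Hour": "HH"}
PIPELINE_START = datetime(2014, 6, 1, 18, 0, 0)
PIPELINE_END = datetime(2014, 6, 1, 19, 0, 0)


def hourly_slices(start, end):
    """Yield the start time of each one-hour slice in [start, end)."""
    current = start
    while current < end:
        yield current
        current += timedelta(hours=1)


def resolve_folder_path(template, partitioned_by, slice_start):
    """Fill the {Name} placeholders in the template from the slice start time."""
    values = {
        name: slice_start.strftime(DOTNET_TO_STRFTIME[fmt])
        for name, fmt in partitioned_by.items()
    }
    # Variables the template does not use (here: Hour) are simply ignored.
    return template.format(**values)


paths = [
    resolve_folder_path(FOLDER_PATH, PARTITIONED_BY, slice_start)
    for slice_start in hourly_slices(PIPELINE_START, PIPELINE_END)
]
for path in paths:
    print(path)
```

For the sample start and end times there is a single hourly slice, and the resolved folder is mycontainer/myfolder/yearno=2014/monthno=06/dayno=01. If the blobs are stored anywhere else, for example directly in the container root, the copy activity will find nothing to read at that path.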
