Updating a Databricks workspace Repo from a GitHub Action via the Databricks CLI



I'm trying to automatically pull the latest version of a GitHub repo into my Databricks workspace on every new push to the repo. Everything works until the Databricks CLI prompts for the host URL, after which it fails with "Error: Process completed with exit code 1." I assume the problem is that my token and host credentials, stored as repository secrets, are not being loaded into the environment correctly. According to Databricks, "CLI 0.8.0 and above supports the following environment variables: DATABRICKS_HOST, DATABRICKS_USERNAME, DATABRICKS_PASSWORD, DATABRICKS_TOKEN". I have added DATABRICKS_HOST and DATABRICKS_TOKEN as repository secrets, so I'm not sure what I'm doing wrong.

on:
  push:
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: setup python
        uses: actions/setup-python@v2
        with:
          python-version: 3.8 # install the python version needed
      - name: execute py
        env:
          DATABRICKS_HOST: $(DATABRICKS_HOST)
          DATABRICKS_TOKEN: $(DATABRICKS_TOKEN)
        run: |
          python -m pip install --upgrade databricks-cli
          databricks configure --token
          databricks repos update --repo-id REPOID-ENTERED --branch "Development"

The error:

Successfully built databricks-cli
Installing collected packages: tabulate, certifi, urllib3, six, pyjwt, oauthlib, idna, click, charset-normalizer, requests, databricks-cli
Successfully installed certifi-2021.10.8 charset-normalizer-2.0.12 click-8.1.3 databricks-cli-0.16.6 idna-3.3 oauthlib-3.2.0 pyjwt-2.4.0 requests-2.27.1 six-1.16.0 tabulate-0.8.9 urllib3-1.26.9
WARNING: You are using pip version 22.0.4; however, version 22.1 is available.
You should consider upgrading via the '/opt/hostedtoolcache/Python/3.8.12/x64/bin/python -m pip install --upgrade pip' command.
Databricks Host (should begin with https://): 
Aborted!
Error: Process completed with exit code 1.

Remove databricks configure --token from your commands -- it isn't needed. In that case the Databricks CLI will use the environment variables directly. See here for a working pipeline for Azure DevOps.
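A minimal corrected step might look like the sketch below. Note that GitHub Actions references repository secrets as ${{ secrets.NAME }}; the $(NAME) syntax used in the question is Azure DevOps macro syntax and would not be expanded by GitHub Actions. The repo id is a placeholder.

```yaml
      - name: update repo
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
        run: |
          python -m pip install --upgrade databricks-cli
          # no "databricks configure --token" -- the CLI reads the env vars above
          databricks repos update --repo-id REPOID-ENTERED --branch "Development"
```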

I think calling the API directly, without the client, works best. Below is code from Azure DevOps; it can also be used with GitHub Actions.

from adal import AuthenticationContext

user_parameters = {
    "tenant": "$(SP_TENANT_ID)",
    "client_id": "$(SP-CLIENT-ID)",
    "redirect_uri": "http://localhost",
    "client_secret": "$(SP-CLIENT-SECRET)"
}

authority_host_url = "https://login.microsoftonline.com/"
azure_databricks_resource_id = "put_here"
authority_url = authority_host_url + user_parameters['tenant']

# supply the refresh_token (whose default lifetime is 90 days or longer)
def refresh_access_token(refresh_token):
    context = AuthenticationContext(authority_url)
    token_response = context.acquire_token_with_refresh_token(
        refresh_token,
        user_parameters['client_id'],
        azure_databricks_resource_id,
        user_parameters['client_secret'])

    # the new 'refreshToken' and 'accessToken' will be returned
    return (token_response['refreshToken'], token_response['accessToken'])

(refresh_token, access_token) = refresh_access_token("$(AAD-REFRESH-TOKEN)")
print('##vso[task.setvariable variable=ACCESS_TOKEN;]%s' % (access_token))
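The ##vso logging command above is Azure DevOps specific. In GitHub Actions a rough equivalent is to append a key=value line to the file named by $GITHUB_OUTPUT; the sketch below uses a fixed demo path and a placeholder token value so it can run anywhere.

```shell
# Sketch: expose the refreshed token to later workflow steps.
OUT="/tmp/gh_output_demo"     # stand-in for "$GITHUB_OUTPUT" on a real runner
ACCESS_TOKEN="demo-token"     # placeholder for the token refreshed above
echo "access_token=${ACCESS_TOKEN}" > "$OUT"
cat "$OUT"
```

A later step could then read the value as ${{ steps.<step-id>.outputs.access_token }}.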
- bash: |
    # Update the repo to the given branch
    echo 'Patching Repo $(DB_WORKSPACE_HOST)/$(REPO_ID)'
    echo 'https://$(DB_WORKSPACE_HOST)/api/2.0/repos/$(REPO_ID) $(Build.SourceBranchName)'

    curl -n -X PATCH -o "/tmp/db_patch-out.json" 'https://$(DB_WORKSPACE_HOST)/api/2.0/repos/$(REPO_ID)' \
      -H 'Authorization: Bearer $(ACCESS_TOKEN)' \
      -d '{"branch": "$(Build.SourceBranchName)"}'
    cat "/tmp/db_patch-out.json"
    # fail the task if the API response contains an error_code
    grep -v error_code "/tmp/db_patch-out.json"
  displayName: 'Update Databricks Repo'
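The curl call above can equally be issued from Python. A small sketch of how the Repos API 2.0 request is assembled; the host and repo id here are illustrative placeholders, and the result would be passed to e.g. requests.patch() with an Authorization: Bearer header.

```python
import json

def build_repos_update(host, repo_id, branch):
    """Build the URL and JSON body for PATCH /api/2.0/repos/{repo_id},
    mirroring the curl call in the pipeline task above."""
    url = f"https://{host}/api/2.0/repos/{repo_id}"
    body = json.dumps({"branch": branch})
    return url, body

url, body = build_repos_update("adb-123.azuredatabricks.net", 42, "Development")
print(url)   # https://adb-123.azuredatabricks.net/api/2.0/repos/42
print(body)  # {"branch": "Development"}
```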

This works if Databricks has network connectivity to your git provider. If you have ADF on the same network but no such connectivity, you can 1) stand up an API gateway to secure and bridge the call across networks, or 2) trigger ADF asynchronously and have it call Databricks by dropping a file into Azure Storage (https://learn.microsoft.com/en-us/azure/data-factory/how-to-create-event-trigger?tabs=data-factory), or by sending an email or some other event trigger.

While the above works when there are real IP address restrictions, my problem with the call turned out to be just that the CA certificate was not being validated correctly. You can override this locally with pip-system-certs, or by exporting the certificate from your browser and specifying the pem file.
