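I am trying to use Google Cloud's sample code for DLP (Data Loss Prevention).
The code is easy to find in the official GCP documentation (I made no changes other than supplying my own credentials):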
def deidentify_with_mask(
    project, input_str, info_types, replacement_str="REPLACEMENT_STR",
):
    """Uses the Data Loss Prevention API to deidentify sensitive data in a
    string by replacing matched input values with a value you specify.
    Args:
        project: The Google Cloud project id to use as a parent resource.
        input_str: The string to deidentify (will be treated as text).
        info_types: A list of strings representing info types to look for.
        replacement_str: The string to replace all values that match given
            info types.
    Returns:
        None; the response from the API is printed to the terminal.
    """
    import google.cloud.dlp

    # Instantiate a client.
    # `credentials` is my own credentials object, defined outside this function.
    dlp = google.cloud.dlp_v2.DlpServiceClient(credentials=credentials)

    # Convert the project id into a full resource id.
    parent = f"projects/{project}"

    # Construct inspect configuration dictionary
    inspect_config = {"info_types": [{"name": info_type} for info_type in info_types]}

    # Construct deidentify configuration dictionary
    deidentify_config = {
        "info_type_transformations": {
            "transformations": [
                {
                    "primitive_transformation": {
                        "replace_config": {
                            "new_value": {"string_value": replacement_str}
                        }
                    }
                }
            ]
        }
    }

    # Construct item
    item = {"value": input_str}

    # Call the API
    response = dlp.deidentify_content(
        request={
            "parent": parent,
            "deidentify_config": deidentify_config,
            "inspect_config": inspect_config,
            "item": item,
        }
    )

    # Print out the results.
    print(response.item.value)
I get the following error:
[2020-09-10 00:18:25,312] {{base_task_runner.py:101}} INFO - Job 3: Subtask task File "pandas/_libs/lib.pyx", line 2228, in pandas._libs.lib.map_infer
[2020-09-10 00:18:25,312] {{base_task_runner.py:101}} INFO - Job 3: Subtask task File "/usr/local/airflow/src/task/src_task.py", line 133, in <lambda>
[2020-09-10 00:18:25,312] {{base_task_runner.py:101}} INFO - Job 3: Subtask task info_types=info_types))
[2020-09-10 00:18:25,312] {{base_task_runner.py:101}} INFO - Job 3: Subtask task File "/usr/local/airflow/src/task/src_task.py", line 89, in deidentify_with_mask
[2020-09-10 00:18:25,312] {{base_task_runner.py:101}} INFO - Job 3: Subtask task parent = dlp.project_path(project)
[2020-09-10 00:18:25,312] {{base_task_runner.py:101}} INFO - Job 3: Subtask task AttributeError: 'DlpServiceClient' object has no attribute 'project_path'
[2020-09-10 00:18:26,263] {{logging_mixin.py:95}} INFO - [2020-09-10 00:18:26,261] {{jobs.py:2627}} INFO - Task exited with return code 1
I don't understand why I get this error: the code works when I run it locally, but fails in Airflow.
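This issue is now resolved!
The cause was a version incompatibility: the code running in Airflow calls dlp.project_path (see the traceback), and the google-cloud-dlp version installed there did not provide that method.
pip install google-cloud-dlp==1.0.0
Pinning the package to that version with the command above fixed the problem for me.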
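For anyone hitting the same mismatch between a local machine and Airflow, here is a minimal sketch of how the two environments could be compared, and how the parent path could be built without depending on project_path. The helper names installed_version and project_parent are hypothetical, not part of google-cloud-dlp, and the sketch assumes that a client exposing project_path returns the same "projects/<id>" string that the newer sample builds by hand.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def project_parent(client, project):
    """Build the parent resource name for `project`.

    Hypothetical helper: uses client.project_path when the installed client
    still has it (as the traceback implies for the older API), and otherwise
    falls back to formatting the string manually, as the newer sample does.
    """
    helper = getattr(client, "project_path", None)
    if callable(helper):
        return helper(project)
    return f"projects/{project}"


if __name__ == "__main__":
    # Run this both locally and inside the Airflow worker and compare.
    print("google-cloud-dlp:", installed_version("google-cloud-dlp"))
```

If the two environments print different versions, pinning the package in the Airflow image (as in the pip command above) or switching the code to the manually formatted parent string should make them behave the same way.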