GCP Dataflow, Kafka, and a missing SSL certificate



I am trying to use GCP Dataflow to load data from Kafka into BigQuery. My Dataflow template is based on the Python SDK 2.42, Container Registry, and apache_beam.io.kafka.

Here is my pipeline:

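import sys

import apache_beam as beam
from apache_beam import Pipeline
from apache_beam.io.kafka import ReadFromKafka
from apache_beam.options.pipeline_options import GoogleCloudOptions, PipelineOptions


def run(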
        bq_dataset,
        bq_table_name,
        project,
        pipeline_options
        ):
    with Pipeline(options=pipeline_options) as pipeline:
        kafka = pipeline | ReadFromKafka(
            consumer_config={
                'bootstrap.servers': 'remote.kafka.aws',
                'security.protocol': "SSL",
                'ssl.truststore.location': "/usr/lib/jvm/java-11-openjdk-amd64/lib/security/cacerts",
                'ssl.truststore.password': "changeit",
                'ssl.keystore.location': "/opt/apache/beam/kafka.keystore.jks",
                'ssl.keystore.password': "kafka",
                "ssl.key.password": "kafka",
                "ssl.client.auth": "required"
            },
            topics=["mytopic"]
        )
        kafka | beam.io.WriteToBigQuery(bq_table_name, bq_dataset, project)

if __name__ == "__main__":
    logger = get_logger('beam-kafka')  # get_logger is a helper that is not shown in this post
    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--bq_dataset',
        type=str,
        default='',
        help='BigQuery Dataset to write tables to. '
             'If set, export data to a BigQuery table instead of just logging. '
             'Must already exist.')
    parser.add_argument(
        '--bq_table_name',
        default='',
        help='The BigQuery table name. Should not already exist.')
    known_args, pipeline_args = parser.parse_known_args()
    pipeline_options = PipelineOptions(
        pipeline_args, save_main_session=True, streaming=True)
    project = pipeline_options.view_as(GoogleCloudOptions).project
    if project is None:
        parser.print_usage()
        print(sys.argv[0] + ': error: argument --project is required')
        sys.exit(1)
    run(
        known_args.bq_dataset,
        known_args.bq_table_name,
        project,
        pipeline_options
    )

Here is how I launch the pipeline:

python stream_kafka.py \
    --bq_dataset=test_ds \
    --bq_table_name=test_topic_data \
    --project=xxxx \
    --region=us-east4 \
    --runner=DataflowRunner \
    --experiments=use_runner_v2 \
    --sdk_container_image=$IMAGE \
    --job_name="test_kafka" \
    --no_use_public_ips \
    --disk_size_gb=100

All the certificates are added in my Dockerfile:

COPY --chmod=0755 truststore.der /etc/ssl/certs/truststore.der
COPY --chmod=0755 kafka.keystore.p12   /opt/apache/beam/kafka.keystore.p12
RUN keytool -import -trustcacerts -file truststore.der -keystore $JAVA_HOME/lib/security/cacerts -alias kafka \
        -deststorepass changeit -noprompt
RUN keytool -importkeystore -srckeystore kafka.keystore.p12 \
                        -srcstorepass kafka \
                        -srcstoretype pkcs12 \
                        -destkeystore /opt/apache/beam/kafka.keystore.jks \
                        -deststorepass kafka \
                        -keypass kafka \
                        -deststoretype jks

The problem is that when I try to run the Dataflow job, it cannot find kafka.keystore.jks:

org.apache.kafka.common.network.SslChannelBuilder.configure(SslChannelBuilder.java:69)
... 43 more
Caused by: org.apache.kafka.common.KafkaException: Failed to load SSL keystore /opt/apache/beam/kafka.keystore.jks of type JKS
    org.apache.kafka.common.security.ssl.SslEngineBuilder$SecurityStore.load(SslEngineBuilder.java:292)
    org.apache.kafka.common.security.ssl.SslEngineBuilder.createSSLContext(SslEngineBuilder.java:144)
... 46 more
Caused by: java.nio.file.NoSuchFileException: /opt/apache/beam/kafka.keystore.jks
    java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)

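I found the solution. The certificates have to be installed in the Java SDK container, not the Python one: ReadFromKafka is a cross-language transform, so the Kafka consumer actually runs in the Java SDK harness, and that is where it looks for the keystore. So I built another Docker image, this time based on the Java SDK: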

FROM openjdk:11
COPY --from=apache/beam_java11_sdk:2.42.0 /opt/apache/beam /opt/apache/beam
COPY ./ca.txt /usr/src/ca.txt
COPY ./cert.txt /usr/src/cert.txt
COPY ./key.txt /usr/src/key.txt
ENV CA_CERTS="/usr/local/openjdk-11/lib/security/cacerts" 
ENV ROOT_FILE=/usr/src/ca.txt
ENV CERT_FILE=/usr/src/cert.txt
ENV KEY_FILE=/usr/src/key.txt
COPY ./entrypoint.sh /scripts/entrypoint.sh
RUN chmod +x /scripts/entrypoint.sh
ENTRYPOINT [ "/scripts/entrypoint.sh" ]
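
The entrypoint.sh script referenced by this Dockerfile is not included in the post. As a rough sketch of the conversion it would need to perform, the Python below only lays out one plausible sequence of openssl and keytool commands, driven by the environment variables the Dockerfile sets. The function names, the "kafka" alias, and the keystore path and password arguments are assumptions for illustration; the original script is a shell script and may differ.

```python
import subprocess


def build_conversion_commands(env, keystore_path, store_password):
    """Return argv lists that turn the PEM files into Java keystores.

    `env` is expected to hold CERT_FILE, KEY_FILE, ROOT_FILE and CA_CERTS,
    as set in the Dockerfile above. Everything else is an assumption.
    """
    p12_path = keystore_path + ".p12"
    return [
        # 1. Bundle the client certificate and private key into PKCS#12.
        ["openssl", "pkcs12", "-export",
         "-in", env["CERT_FILE"], "-inkey", env["KEY_FILE"],
         "-name", "kafka", "-out", p12_path,
         "-passout", "pass:" + store_password],
        # 2. Convert the PKCS#12 bundle into the JKS keystore Kafka will load.
        ["keytool", "-importkeystore", "-noprompt",
         "-srckeystore", p12_path, "-srcstoretype", "pkcs12",
         "-srcstorepass", store_password,
         "-destkeystore", keystore_path, "-deststoretype", "jks",
         "-deststorepass", store_password],
        # 3. Trust the broker's CA in the JVM truststore
        #    ("changeit" is the default cacerts password used earlier in the post).
        ["keytool", "-importcert", "-trustcacerts", "-noprompt",
         "-alias", "kafka", "-file", env["ROOT_FILE"],
         "-keystore", env["CA_CERTS"], "-storepass", "changeit"],
    ]


def run_conversion(commands):
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

Calling run_conversion(build_conversion_commands(os.environ, "/opt/apache/beam/kafka.keystore.jks", "kafka")) would produce the keystore at the path the consumer config expects. After the conversion, an entrypoint like this would presumably hand control to the Beam boot binary copied into /opt/apache/beam so that the harness starts normally.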

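After that, I implemented the conversion of my certificates to the Java format (JKS) in the entrypoint.sh file. When launching the Dataflow job, I pass one additional parameter to override the Java (harness) image: --sdk_harness_container_image_overrides=".*java.*,${IMAGE_JAVA}"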
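
To make the two-image setup concrete, here is a minimal sketch of assembling the launch flags in Python instead of on the command line. The helper name dataflow_args and the image URLs in the usage comment are hypothetical; the flags themselves are the ones used in this post.

```python
def dataflow_args(project, region, python_image, java_image):
    """Build Dataflow flags that use one custom image per SDK harness."""
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        "--experiments=use_runner_v2",
        # Custom image for the Python SDK harness.
        f"--sdk_container_image={python_image}",
        # Regex/replacement pair: any harness image whose name matches
        # ".*java.*" is replaced by the custom Java image with the keystores.
        f"--sdk_harness_container_image_overrides=.*java.*,{java_image}",
    ]


# Usage (image names are placeholders):
# args = dataflow_args("xxxx", "us-east4",
#                      "gcr.io/xxxx/beam-python:latest",
#                      "gcr.io/xxxx/beam-java-kafka:latest")
# pipeline_options = PipelineOptions(args, save_main_session=True, streaming=True)
```

The keystore and truststore paths in the ReadFromKafka consumer config then have to match the locations inside the Java image, since that is the container in which the Kafka consumer opens them.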

Hope this helps.
