person 表有一个描述列,其中包含 html 格式的文本。
查询应返回每个人的 id 和说明,删除所有 html 格式。
select id, REGEXP_EXTRACT(description, r"REGEX_GOES_HERE") from person;
其中一个描述的示例:
<p class="position-body__description" data-section="pastPositions">• Lorem Ipsum dorem posum <br>This is my test html regex.<br> • Provided general HR support of multiple team members’ development, technical and certification training.<br>• Demonstrated experience in an administrative support role managing multiple requests simultaneously.</p>
预期产出:
1, Lorem Ipsum dorem posum这是我的测试 html 正则表达式。 • 为多个团队成员的发展、技术和认证培训提供一般人力资源支持。• 具有同时管理多个请求的行政支持角色的经验。
下面是 BigQuery Standard SQL
#standardSQL
SELECT id,
REGEXP_REPLACE(description, r'<.*?>', '') description_without_formatting
FROM `project.dataset.person`
您可以使用问题中的虚拟数据进行测试,玩上面,如下例所示
#standardSQL
WITH `project.dataset.person` AS (
SELECT 1 id, '<p class="position-body__description" data-section="pastPositions">• Lorem Ipsum dorem posum <br>This is my test html regex.<br> • Provided general HR support of multiple team members’ development, technical and certification training.<br>• Demonstrated experience in an administrative support role managing multiple requests simultaneously.</p>' description
)
SELECT id,
REGEXP_REPLACE(description, r'<.*?>', '') description_without_formatting
FROM `project.dataset.person`
有结果
Row id description_without_formatting
1 1 • Lorem Ipsum dorem posum This is my test html regex. • Provided general HR support of multiple team members’ development, technical and certification training.• Demonstrated experience in an administrative support role managing multiple requests simultaneously.