Get the page ID of a Confluence page from its public URL

How can I get the Confluence page_id when I am given a page_url? For example:

If this is the display URL: https://confluence.som.yale.edu/display/SC/Finding+the+Page+ID+of+a+Confluence+Page

I want to use the Confluence REST API to get its page_id.

More details here

Are you using atlassian-python-api?

In that case, you can parse your URL to get the Confluence space key (SC) and the page title (Finding the Page ID of a Confluence Page), then call confluence.get_page_id(space, title).

from atlassian import Confluence

page_url = "https://confluence.som.yale.edu/display/SC/Finding+the+Page+ID+of+a+Confluence+Page"
# user and pwd hold your Confluence credentials
confluence = Confluence(
    url='https://confluence.som.yale.edu/',
    username=user,
    password=pwd)

# The last two path segments are the space key and the page title
space, title = page_url.split("/")[-2:]
title = title.replace("+", " ")
page_id = confluence.get_page_id(space, title)
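Since the question asks about the REST API itself, here is a rough sketch of the same space-and-title lookup without the client library. It assumes the content search endpoint /rest/api/content accepts spaceKey and title query parameters and returns a JSON body with a results list, as on Confluence Server. The helper names and the sample response are made up for illustration, and the actual HTTP request (with authentication) is left out.

```python
import json
from urllib.parse import urlencode


def build_content_lookup_url(base_url, space_key, title):
    """Build the URL of a content search by space key and title."""
    query = urlencode({"spaceKey": space_key, "title": title})
    return f"{base_url.rstrip('/')}/rest/api/content?{query}"


def extract_page_id(response_text):
    """Return the id of the first result, or None if nothing matched."""
    results = json.loads(response_text).get("results", [])
    return results[0]["id"] if results else None


lookup_url = build_content_lookup_url(
    "https://confluence.som.yale.edu/",
    "SC",
    "Finding the Page ID of a Confluence Page",
)

# Hypothetical, trimmed-down response body, used only for illustration.
sample_response = '{"results": [{"id": "1234567890", "type": "page"}], "size": 1}'

print(lookup_url)
print(extract_page_id(sample_response))
```

You would send a GET request to lookup_url with your credentials and pass the response text to extract_page_id.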

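Note that when your title contains a special character (+ or ü, ä, ...), your page URL will already contain the id, like this: https://confluence.som.yale.edu/pages/viewpage.action?pageId=1234567890, so you may want to check for that first.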
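A minimal sketch of that check, using only the standard library. The function name page_id_from_query is hypothetical; it simply reads the pageId query parameter if the URL has one.

```python
from urllib.parse import urlparse, parse_qs


def page_id_from_query(url):
    """Return the pageId query parameter as a string, or None if absent."""
    values = parse_qs(urlparse(url).query).get("pageId")
    return values[0] if values else None


viewpage_url = "https://confluence.som.yale.edu/pages/viewpage.action?pageId=1234567890"
display_url = "https://confluence.som.yale.edu/display/SC/Finding+the+Page+ID+of+a+Confluence+Page"

print(page_id_from_query(viewpage_url))
print(page_id_from_query(display_url))
```

If this returns None, fall back to the space key and title lookup.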

Edit: here is a version of your function:

from atlassian import Confluence
import re
import urllib.parse

# regex pattern to match pageId if already in url
page_id_in_url_pattern = re.compile(r"\?pageId=(\d+)")

def get_page_id_from_url(confluence, url):
    page_url = urllib.parse.unquote(url)  # unquoting url to deal with special characters like '%'
    space, title = page_url.split("/")[-2:]
    match = page_id_in_url_pattern.search(title)
    if match:
        # The id is already part of the URL
        return match.group(1)
    else:
        title = title.replace("+", " ")
        return confluence.get_page_id(space, title)

if __name__ == "__main__":
    from getpass import getpass
    user = input('Login: ')
    pwd = getpass('Password: ')
    page_url = "https://confluence.som.yale.edu/display/SC/Finding+the+Page+ID+of+a+Confluence+Page"
    confluence = Confluence(
        url='https://confluence.som.yale.edu/',
        username=user,
        password=pwd)
    print(get_page_id_from_url(confluence, page_url))

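Unfortunately, the Atlassian Python client is quite limited (for example, it cannot resolve the tiny URLs from a page's share link). If you get an API key, you could hypothetically make some REST calls to download the header of an arbitrary page and get the pageId from it.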
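As a rough illustration of reading the pageId out of a downloaded page, the sketch below scans HTML for a meta tag named ajs-page-id. That tag name is an assumption about how your Confluence instance renders pages, so check the page source first. The class and function names are hypothetical, the sample HTML is invented, and fetching the page (with authentication) is not shown.

```python
from html.parser import HTMLParser


class PageIdMetaParser(HTMLParser):
    """Collect the content of <meta name="ajs-page-id" ...> if present."""

    def __init__(self):
        super().__init__()
        self.page_id = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        # Assumption: the page id is exposed under this meta name.
        if attr_map.get("name") == "ajs-page-id":
            self.page_id = attr_map.get("content")


def page_id_from_html(html_text):
    """Return the page id found in the HTML, or None if the tag is missing."""
    parser = PageIdMetaParser()
    parser.feed(html_text)
    return parser.page_id


# Invented sample markup, used only for illustration.
sample_html = (
    '<html><head><title>Demo</title>'
    '<meta name="ajs-page-id" content="1234567890">'
    '</head><body></body></html>'
)
print(page_id_from_html(sample_html))
```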

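If you are using the Python client, however, you can only get the pageId from a URL if 1) it is already in the URL, or 2) the URL contains a properly formatted space key and title.

Building on the function provided by Tranbi, here is an improved function that tries to extract the pageId from a URL. Note, however, that it does not work for all URLs.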

from atlassian import Confluence
import re
import urllib.parse

CONFLUENCE_HOSTNAME = 'confluence.som.yale.edu'
PAGEID_RE = re.compile(r"pageId=(\d+)")
SPACEKEY_RE = re.compile(r"spaceKey=([a-zA-Z0-9~]+)")
TITLE_RE = re.compile(r"title=([^#&=]+)")

def get_pageid_from_url(client, raw_url):
    scheme, netloc, path, params, query, fragment = urllib.parse.urlparse(raw_url)
    if netloc != CONFLUENCE_HOSTNAME:
        raise ValueError(f"Only Confluence URLs are supported in this script. You supplied a URL with netloc={netloc}")
    # Special handling for login redirect URLs
    fix_title = False
    if path == '/login.action':
        fix_title = True
        pretty_url = urllib.parse.unquote(raw_url)  # unquoting url to deal with special characters like '%'
        scheme, netloc, path, params, query, fragment = urllib.parse.urlparse(pretty_url)
    # Get the pageId directly from the URL if available
    pageid_match = re.search(PAGEID_RE, query)
    if pageid_match:
        return pageid_match.group(1)
    # Otherwise, get the spaceKey and title from the URL, and then make a separate call to the API to get the pageId
    if path.startswith('/display/'):
        path_pieces = path.split('/')
        assert len(path_pieces) == 4, f"Expected a path of the form /display/<space>/<title>, but found {path}"
        _, _, space, title = path_pieces
        # Fix title; order of operations matters!
        title = title.replace("+", " ")
        title = urllib.parse.unquote(title, encoding='utf-8', errors='replace')
        return client.get_page_id(space, title)
    spacekey_match = re.search(SPACEKEY_RE, query)
    title_match = re.search(TITLE_RE, query)
    if spacekey_match and title_match:
        space = spacekey_match.group(1)
        # Fix title; order of operations matters!
        title = title_match.group(1)
        title = title.replace("+", " ")
        if fix_title:
            title = title.split(' - ')[0]
        title = urllib.parse.unquote(title, encoding='utf-8', errors='replace')
        return client.get_page_id(space, title)
    # Unfortunately this URL style is not supported by the Python Atlassian client :(
    raise ValueError(f"Cannot parse (pageId) or (spaceKey and title) from URL: {raw_url}")
