Get metadata of the first N commits in a remote Git repository



Using the following GitHub API, it is possible to get the metadata of the commits in a repository, ordered from the newest to the oldest.

https://api.github.com/repos/git/git/commits

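Is there any way to get similar metadata in the opposite chronological order, i.e. starting with the oldest commits in the repository?

Note: I want to get this metadata without having to download the full repository.

Thanks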

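This is possible with a workaround that uses the GraphQL API. The method is essentially the same as the one for getting the first commit in a repository:

Get the last commit and return totalCount and endCursor: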

{
  repository(name: "linux", owner: "torvalds") {
    ref(qualifiedName: "master") {
      target {
        ... on Commit {
          history(first: 1) {
            nodes {
              message
              committedDate
              authoredDate
              oid
              author {
                email
                name
              }
            }
            totalCount
            pageInfo {
              endCursor
            }
          }
        }
      }
    }
  }
}

It returns something like this for totalCount and for the pageInfo object that holds the cursor:

"totalCount": 950329,
"pageInfo": {
  "endCursor": "b961f8dc8976c091180839f4483d67b7c2ca2578 0"
}

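I don't have any source for the format of the cursor string "b961f8dc8976c091180839f4483d67b7c2ca2578 0", but I have tested it with several other repositories that have more than 1000 commits, and it always seems to be formatted like this:

<static hash> <incremented_number>

To iterate from the first commit to the newest, start at page 1 and set the number in the cursor to totalCount - 1 - <number_perpage>*<page>. The query then returns the <number_perpage> commits that come right after that position.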
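The cursor arithmetic above can be sketched as a small helper. This is only an illustration: `build_cursor` is a hypothetical name, not part of any GitHub library, and it relies on the undocumented `<static hash> <incremented_number>` cursor format observed above, which GitHub may change.

```python
def build_cursor(end_cursor: str, total_count: int, per_page: int, page: int) -> str:
    """Build the `after` cursor for a 1-based page counted from the oldest commit.

    Assumes the undocumented "<static hash> <incremented_number>" cursor format.
    """
    # Keep the hash part of the cursor and discard the old position.
    commit_hash, _ = end_cursor.rsplit(" ", 1)
    # Position just before the block of commits we want to fetch.
    offset = total_count - 1 - per_page * page
    if offset < 0:
        # The page would start at or before the newest commit.
        raise ValueError("page reaches the newest commit; query without a cursor instead")
    return f"{commit_hash} {offset}"


# Values taken from the sample response above, 20 commits per page, page 1.
cursor = build_cursor("b961f8dc8976c091180839f4483d67b7c2ca2578 0", 950329, 20, 1)
print(cursor)  # b961f8dc8976c091180839f4483d67b7c2ca2578 950308
```

With 950329 commits in total, page 1 of 20 commits starts after position 950308.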

For example, to get the first 20 commits of the linux repository:

{
  repository(name: "linux", owner: "torvalds") {
    ref(qualifiedName: "master") {
      target {
        ... on Commit {
          history(first: 20, after: "fc4f28bb3daf3265d6bc5f73b497306985bb23ab 950308") {
            nodes {
              message
              committedDate
              authoredDate
              oid
              author {
                email
                name
              }
            }
            totalCount
            pageInfo {
              endCursor
            }
          }
        }
      }
    }
  }
}

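Note that the total commit count of this repository changes over time, so you need to fetch the total count right before running the query.

Here is a Python example that iterates over the first 300 commits of the Linux repository (starting from the oldest):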

import requests

token = "YOUR_ACCESS_TOKEN"
name = "linux"
owner = "torvalds"
branch = "master"
iteration = 3   # number of pages to fetch
per_page = 100  # commits per page
commits = []

# "first" and "after" are filled in with % formatting;
# name, owner and branch are passed as GraphQL variables.
query = """
query ($name: String!, $owner: String!, $branch: String!) {
  repository(name: $name, owner: $owner) {
    ref(qualifiedName: $branch) {
      target {
        ... on Commit {
          history(first: %s, after: %s) {
            nodes {
              message
              committedDate
              authoredDate
              oid
              author {
                email
                name
              }
            }
            totalCount
            pageInfo {
              endCursor
            }
          }
        }
      }
    }
  }
}
"""


def getHistory(cursor):
    # cursor is inserted verbatim: either null or a quoted string literal
    r = requests.post(
        "https://api.github.com/graphql",
        headers={"Authorization": f"Bearer {token}"},
        json={
            "query": query % (per_page, cursor),
            "variables": {"name": name, "owner": owner, "branch": branch},
        },
    )
    r.raise_for_status()
    return r.json()["data"]["repository"]["ref"]["target"]["history"]


# in the first request, cursor is null
history = getHistory("null")
totalCount = history["totalCount"]
if totalCount > 1:
    cursor = history["pageInfo"]["endCursor"].split(" ")
    for i in range(1, iteration + 1):
        cursor[1] = str(totalCount - 1 - i * per_page)
        # wrap the cursor in double quotes so it is a valid GraphQL string
        history = getHistory('"' + " ".join(cursor) + '"')
        # each page comes back newest first, so reverse it before appending
        commits += history["nodes"][::-1]
else:
    commits = history["nodes"]
print(commits)
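The script above assumes the repository has at least iteration * per_page commits. Otherwise the computed cursor number goes negative. A possible way to handle that case is to plan the pages up front and fetch the final, partial page without a cursor. The sketch below is only an illustration: `plan_pages` is a hypothetical helper, and it relies on the same undocumented cursor behaviour as the rest of this answer.

```python
from typing import Iterator, Optional, Tuple


def plan_pages(total_count: int, per_page: int) -> Iterator[Tuple[Optional[int], int]]:
    """Yield (cursor_number, first) pairs, from the oldest page to the newest.

    cursor_number is the number to put in the cursor string, or None when
    the page has to be requested without a cursor (after: null).
    """
    if per_page <= 0:
        raise ValueError("per_page must be positive")
    page = 1
    while total_count > 0:
        # Position (counted from the newest commit) of the first commit on this page.
        start = total_count - per_page * page
        if start > 0:
            yield start - 1, per_page
        else:
            # Last page: it begins at the newest commit and may be shorter.
            yield None, per_page + start
            return
        page += 1


print(list(plan_pages(250, 100)))  # [(149, 100), (49, 100), (None, 50)]
```

Each pair maps to one request: use the number in the cursor string (or send null when it is None) and pass the second value as `first`.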
