Robustly retrieving SHA and line content with git blame (Python 3)



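I am contributing to a package (Python >= 3.5) that uses git blame to retrieve information about files. I am replacing the GitPython dependency with custom code that supports only the small subset of functionality we actually need (and provides the data in the form we actually need).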

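I found that git blame -lts comes closest to what I need, namely retrieving the commit SHA and the content of every line in a file. This gives me output like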

```
82a3e5021b7131e31fc5b110194a77ebee907955 books/main/docs/index.md  5) Softwareplattform [ILIAS](https://www.ilias.de/), die an zahlreichen
```

I have processed this with

```python
import re

line_pattern = re.compile(r'(.*?)\s.*\s*\d\)(\s*.*)')
for line in cmd.stdout():
    m = line_pattern.match(line)
    if m:
        sha = m.group(1)
        content = m.group(2).strip()
```

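This works fine. However, the package's maintainer rightly warned: "This may introduce hard-to-debug errors for specific groups of users. It would probably require extensive unit testing across several operating systems and Git versions."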
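To make that concern concrete: the greedy `.*` in the pattern backtracks to the *last* "digit followed by `)`" on the line, so any content that itself contains something like `1)` is silently truncated. A minimal demonstration, assuming the pattern reads `r'(.*?)\s.*\s*\d\)(\s*.*)'` and using a made-up blame line:

```python
import re

# Pattern from the question, written as a raw string (assumed form).
line_pattern = re.compile(r'(.*?)\s.*\s*\d\)(\s*.*)')

sha = "82a3e5021b7131e31fc5b110194a77ebee907955"
# Hypothetical `git blame -lts` line whose content itself contains "1)".
line = sha + " books/main/docs/index.md  5) Steps: 1) install"

m = line_pattern.match(line)
print(m.group(1) == sha)    # True: the SHA is still extracted
print(m.group(2).strip())   # install  ("Steps: 1)" has been swallowed)
```

The final `.strip()` also removes leading indentation from the content, which matters for source files where indentation is significant.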

I took that approach because I found the output of git blame --porcelain somewhat tedious to parse:

```
30ed8daf1c48e4a7302de23b6ed262ab13122d31 1 1 1
author XY
author-mail <XY>
author-time 1580742131
author-tz +0100
committer XY
committer-mail <XY>
committer-time 1580742131
committer-tz +0100
summary Stub-Outline-Dateien
filename home/docs/README.md
hero: abcdefghijklmnopqrstuvwxyz
82a3e5021b7131e31fc5b110194a77ebee907955 18 18
82a3e5021b7131e31fc5b110194a77ebee907955 19 19
---
82a3e5021b7131e31fc5b110194a77ebee907955 20 20
...
```
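Git's documentation of the porcelain format says that each blamed line is preceded by a header whose first line starts with the commit's 40-character SHA-1, that commit metadata is printed at least once per commit, and that the line's content comes after that header, prefixed by a TAB (so that more header fields can be added later). Under those rules, extracting just (SHA, content) pairs needs very little state. A minimal sketch, assuming a SHA-1 repository and output that has already been split into lines; `blame_lines` is a hypothetical helper name:

```python
import re
from typing import Iterable, Iterator, Tuple

# Header line: "<sha> <orig-line> <final-line>[ <group-size>]" (SHA-1 assumed).
HEADER = re.compile(r'^([0-9a-f]{40}) (\d+) (\d+)(?: (\d+))?$')


def blame_lines(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (sha, content) for each source line of --porcelain output."""
    sha = None
    for line in lines:
        if line.startswith('\t'):
            # Content line: everything after the single leading TAB is the
            # original line, including its own leading whitespace.
            yield sha, line[1:]
            continue
        m = HEADER.match(line)
        if m:
            sha = m.group(1)
        # Every other line is metadata ("author ...", "summary ...") and
        # is ignored by this helper.


# Abbreviated sample modelled on the output above (TABs restored).
sample = [
    "30ed8daf1c48e4a7302de23b6ed262ab13122d31 1 1 1",
    "author XY",
    "summary Stub-Outline-Dateien",
    "filename home/docs/README.md",
    "\thero: abcdefghijklmnopqrstuvwxyz",
    "82a3e5021b7131e31fc5b110194a77ebee907955 18 18",
    "\t",
]
for sha, content in blame_lines(sample):
    print(sha[:7], repr(content))
# 30ed8da 'hero: abcdefghijklmnopqrstuvwxyz'
# 82a3e50 ''
```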

I don't like the housekeeping involved in iterating over such a list of strings.
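If remembering metadata per SHA is the main irritation, `git blame --line-porcelain` may help: it uses the same format but repeats the commit information for every line, so each record can be parsed in isolation, at the cost of much larger output. A sketch under that assumption; `line_porcelain_records` is a hypothetical helper name:

```python
from typing import Dict, Iterable, Iterator


def line_porcelain_records(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one self-contained dict per source line of --line-porcelain output."""
    record = {}
    for line in lines:
        if line.startswith('\t'):
            # The TAB-prefixed content line terminates the record.
            record['content'] = line[1:]
            yield record
            record = {}  # start a fresh dict; the yielded one is not reused
        elif not record:
            # First line of a record is the header "<sha> <orig> <final>[ <n>]".
            record['sha'] = line.split(' ', 1)[0]
        else:
            # "key value" metadata; flag-style keys such as "boundary" get "".
            key, _, value = line.partition(' ')
            record[key] = value


sample = [
    "30ed8daf1c48e4a7302de23b6ed262ab13122d31 1 1 2",
    "author XY",
    "author-mail <XY>",
    "summary Stub-Outline-Dateien",
    "filename home/docs/README.md",
    "\thero: abcdefghijklmnopqrstuvwxyz",
    "30ed8daf1c48e4a7302de23b6ed262ab13122d31 2 2",
    "author XY",
    "author-mail <XY>",
    "summary Stub-Outline-Dateien",
    "filename home/docs/README.md",
    "\t",
]
records = list(line_porcelain_records(sample))
print(len(records), records[1]['author'], repr(records[1]['content']))
# 2 XY ''
```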

My questions are:

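1) Should I rather use the --porcelain output, since it is explicitly intended for machine consumption?
2) Can I expect this format to be robust across Git versions and operating systems? Can I assume that a line starting with a TAB character is a content line, that it is the last line of the output for a source line, and that everything after that TAB is the original line content?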
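Part of the cross-platform risk behind the second question lies in how the output is decoded and split rather than in the format itself. A sketch, assuming Python 3.5 compatibility (hence `stdout=subprocess.PIPE` instead of `capture_output`), that splits on LF only and decodes each line leniently; `run_blame` and `split_blame_output` are hypothetical names and not part of the plugin's `GitCommand`:

```python
import subprocess
from typing import List


def split_blame_output(raw: bytes) -> List[str]:
    """Split raw `git blame --porcelain` output into text lines."""
    # Split on LF only: bytes.splitlines() would also break on a lone CR,
    # and str.splitlines() additionally on form feeds and Unicode line
    # separators, all of which can occur inside a source line.
    lines = raw.split(b'\n')
    if lines and lines[-1] == b'':
        lines.pop()  # drop the empty piece after the final newline
    # Decode line by line; bytes that are not valid UTF-8 become U+FFFD
    # instead of raising UnicodeDecodeError.
    return [line.decode('utf-8', errors='replace') for line in lines]


def run_blame(path: str) -> List[str]:
    """Run git blame on `path` (needs git on PATH, run inside the repository)."""
    proc = subprocess.run(
        ['git', 'blame', '--porcelain', '--', path],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return split_blame_output(proc.stdout)


raw = (b"30ed8daf1c48e4a7302de23b6ed262ab13122d31 1 1 1\n"
       b"\tpage one\x0cpage two\n")
print(len(split_blame_output(raw)))    # 2
print(len(raw.decode().splitlines()))  # 3: the form feed split the content
```

A trailing `\r` from files with CRLF line endings, if present in the output, is kept as part of the content here; whether to strip it is left to the caller.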

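Not knowing whether this is the best solution, I tried it without waiting for answers here, assuming that the answer to both of my questions is "yes".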

The following code can be seen in context here: https://github.com/uliska/mkdocs-git-authors-plugin/blob/6f5822c641452cea3edb82c2bbb9ed63bd254d2e/mkdocs_git_authors_plugin/repo.py#L466-L565

```python
import re


def _process_git_blame(self):
    """
    Execute git blame and parse the results.

    This retrieves all data we need, also for the Commit object.
    Each line will be associated with a Commit object and counted
    to its author's "account".
    Whether empty lines are counted is determined by the
    count_empty_lines configuration option.

    git blame --porcelain will produce output like the following
    for each line in a file:

    When a commit is first seen in that file:

        30ed8daf1c48e4a7302de23b6ed262ab13122d31 1 2 1
        author John Doe
        author-mail <j.doe@example.com>
        author-time 1580742131
        author-tz +0100
        committer John Doe
        committer-mail <j.doe@example.com>
        committer-time 1580742131
        summary Fancy commit message title
        filename home/docs/README.md
        line content (indicated by TAB. May be empty after that)

    When a commit has already been seen *in that file*:

        82a3e5021b7131e31fc5b110194a77ebee907955 4 5
        line content

    In this case the metadata is not repeated, but it is guaranteed that
    a Commit object with that SHA has already been created so we don't
    need that information anymore.

    When a line has not been committed yet:

        0000000000000000000000000000000000000000 1 1 1
        author Not Committed Yet
        author-mail <not.committed.yet>
        author-time 1583342617
        author-tz +0100
        committer Not Committed Yet
        committer-mail <not.committed.yet>
        committer-time 1583342617
        committer-tz +0100
        summary Version of books/main/docs/index.md from books/main/docs/index.md
        previous 1f0c3455841488fe0f010e5f56226026b5c5d0b3 books/main/docs/index.md
        filename books/main/docs/index.md
        uncommitted line content

    In this case exactly one Commit object with the special SHA and fake
    author will be created and counted.

    Args:
        ---
    Returns:
        --- (this method works through side effects)
    """
    re_sha = re.compile(r'^\w{40}')
    cmd = GitCommand('blame', ['--porcelain', str(self._path)])
    cmd.run()
    commit_data = {}
    for line in cmd.stdout():
        key = line.split(' ')[0]
        m = re_sha.match(key)
        if m:
            # Header line: start collecting data for a new source line
            commit_data = {
                'sha': key
            }
        elif key in [
            'author',
            'author-mail',
            'author-time',
            'author-tz',
            'summary'
        ]:
            commit_data[key] = line[len(key)+1:]
        elif line.startswith('\t'):
            # assign the line to a commit
            # and create the Commit object if necessary
            commit = self.repo().get_commit(
                commit_data.get('sha'),
                # The following values are guaranteed to be present
                # when a commit is seen for the first time,
                # so they can be used for creating a Commit object.
                author_name=commit_data.get('author'),
                author_email=commit_data.get('author-mail'),
                author_time=commit_data.get('author-time'),
                author_tz=commit_data.get('author-tz'),
                summary=commit_data.get('summary')
            )
            # A lone TAB means the source line is empty
            if len(line) > 1 or self.repo().config('count_empty_lines'):
                author = commit.author()
                if author not in self._authors:
                    self._authors.append(author)
                author.add_lines(self, commit)
                self.add_total_lines()
                self.repo().add_total_lines()
```
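To unit-test the parsing separately from the plugin's Commit and Author classes, the counting logic of the method can be reduced to a pure function over the porcelain lines. This is a hypothetical simplification, not the plugin's code: author names are remembered per SHA in a plain dict (because "author" is printed only the first time a SHA appears) and tallied in a `collections.Counter`, mirroring the `count_empty_lines` option and the all-zero SHA used for uncommitted lines.

```python
import re
from collections import Counter
from typing import Iterable

# Header: "<sha> <orig-line> <final-line>[ <group-size>]" (SHA-1 assumed).
HEADER = re.compile(r'^([0-9a-f]{40}) \d+ \d+(?: \d+)?$')


def count_lines_per_author(lines: Iterable[str],
                           count_empty_lines: bool = False) -> Counter:
    """Tally blamed lines per author name from --porcelain output."""
    author_by_sha = {}  # metadata is only printed the first time a SHA appears
    counts = Counter()
    sha = None
    for line in lines:
        if line.startswith('\t'):
            # Content line: a lone TAB means the source line is empty.
            if len(line) > 1 or count_empty_lines:
                counts[author_by_sha[sha]] += 1
            continue
        m = HEADER.match(line)
        if m:
            sha = m.group(1)
        elif line.startswith('author '):
            author_by_sha[sha] = line[len('author '):]
    return counts


sample = [
    "30ed8daf1c48e4a7302de23b6ed262ab13122d31 1 1 2",
    "author John Doe",
    "summary Fancy commit message title",
    "filename home/docs/README.md",
    "\t# Title",
    "30ed8daf1c48e4a7302de23b6ed262ab13122d31 2 2",
    "\t",
    "0000000000000000000000000000000000000000 3 3 1",
    "author Not Committed Yet",
    "filename home/docs/README.md",
    "\tnew line",
]
counts = count_lines_per_author(sample)
print(counts['John Doe'], counts['Not Committed Yet'])  # 1 1
counts = count_lines_per_author(sample, count_empty_lines=True)
print(counts['John Doe'], counts['Not Committed Yet'])  # 2 1
```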
