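Out of curiosity, I want to convert a video duration from YouTube's ISO 8601 format to seconds. To future-proof my solution, I picked a very long video to test it against.
The API returns this for its duration: "duration": "P1W2DT6H21M32S"
I tried parsing this duration with dateutil, as suggested in stackoverflow.com/questions/969285.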
import dateutil.parser
duration = dateutil.parser.parse('P1W2DT6H21M32S')
This throws an exception:
TypeError: unsupported operand type(s) for +=: 'NoneType' and 'int'
What am I missing?
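The dateutil module (a third-party package, not part of the standard library) only supports parsing ISO 8601 dates, not ISO 8601 durations. For durations you can use the "isodate" library (on PyPI: https://pypi.python.org/pypi/isodate; install it via pip or easy_install). This library fully supports ISO 8601 durations and converts them to datetime.timedelta objects. So once you've imported the library, it's as simple as: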
import isodate
dur = isodate.parse_duration('P1W2DT6H21M32S')
print(dur.total_seconds())
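As a sanity check that needs no third-party package, the same duration can be built by hand with the standard library's datetime.timedelta. This is only a sketch: the field values are copied manually from 'P1W2DT6H21M32S', nothing is parsed, and the variable names are my own.

```python
from datetime import timedelta

# Fields copied by hand from 'P1W2DT6H21M32S'; nothing is parsed here
expected = timedelta(weeks=1, days=2, hours=6, minutes=21, seconds=32)
expected_seconds = expected.total_seconds()
print(expected_seconds)  # 800492.0
```

Any parser for this duration should arrive at the same total.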
Works on Python 2.7+. Adapted from a JavaScript one-liner for a YouTube v3 question.
import re

def YTDurationToSeconds(duration):
    # Only handles durations of the form PT#H#M#S (no weeks or days)
    match = re.match(r'PT(\d+H)?(\d+M)?(\d+S)?', duration).groups()
    hours = _js_parseInt(match[0]) if match[0] else 0
    minutes = _js_parseInt(match[1]) if match[1] else 0
    seconds = _js_parseInt(match[2]) if match[2] else 0
    return hours * 3600 + minutes * 60 + seconds

# js-like parseInt
# https://gist.github.com/douglasmiranda/2174255
def _js_parseInt(string):
    return int(''.join([x for x in string if x.isdigit()]))

# example output
YTDurationToSeconds(u'PT15M33S')
# 933
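This handles the ISO 8601 duration format to the extent YouTube uses it, up to hours.
Here's my answer, which takes 9000's regex solution (thank you, amazing mastery of regex!) and finishes the job for the original poster's YouTube use case, i.e. converting hours, minutes and seconds to seconds. I used .groups() instead of .groupdict(), followed by a couple of carefully constructed list comprehensions.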
import re

def yt_time(duration="P1W2DT6H21M32S"):
    """
    Converts YouTube duration (ISO 8601)
    into seconds (hours, minutes and seconds only)
    see http://en.wikipedia.org/wiki/ISO_8601#Durations
    """
    ISO_8601 = re.compile(
        r'P'                          # designates a period
        r'(?:(?P<years>\d+)Y)?'       # years
        r'(?:(?P<months>\d+)M)?'      # months
        r'(?:(?P<weeks>\d+)W)?'       # weeks
        r'(?:(?P<days>\d+)D)?'        # days
        r'(?:T'                       # time part must begin with a T
        r'(?:(?P<hours>\d+)H)?'       # hours
        r'(?:(?P<minutes>\d+)M)?'     # minutes
        r'(?:(?P<seconds>\d+)S)?'     # seconds
        r')?')                        # end of time part
    # Convert regex matches into a short list of time units
    units = list(ISO_8601.match(duration).groups()[-3:])
    # Put list in ascending order & replace None with 0
    units = list(reversed([int(x) if x is not None else 0 for x in units]))
    # Do the maths; enumerate gives each unit its own position
    # (units.index(x) returns the first match, which breaks on repeated values)
    return sum(x * 60**i for i, x in enumerate(units))
Sorry for not posting this higher up; I'm still new here and don't have enough reputation points to add a comment.
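Isn't the video 1 week, 2 days, 6 hours, 21 minutes and 32 seconds long?
YouTube shows it as 222 hours, 21 minutes and 17 seconds; 1 * 7 * 24 + 2 * 24 + 6 = 222. I don't know where the difference between 17 and 32 seconds comes from; it may be a rounding error.
As I see it, writing a parser for this isn't hard. Unfortunately, dateutil doesn't seem to parse time intervals, only datetime points.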
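The hour figure above can be checked by converting the total back into hours, minutes and seconds with divmod. A minimal sketch, assuming the fields of P1W2DT6H21M32S are taken at face value (the variable names are illustrative):

```python
# Total seconds for 1 week, 2 days, 6 hours, 21 minutes, 32 seconds
total = 1 * 604800 + 2 * 86400 + 6 * 3600 + 21 * 60 + 32

hours, remainder = divmod(total, 3600)
minutes, seconds = divmod(remainder, 60)
print(hours, minutes, seconds)  # 222 21 32
```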
Update:
I see there's a package for this, but just as an example of regexp's power, brevity and incomprehensible syntax, here's a parser for you:
import re

# see http://en.wikipedia.org/wiki/ISO_8601#Durations
ISO_8601_period_rx = re.compile(
    r'P'                          # designates a period
    r'(?:(?P<years>\d+)Y)?'       # years
    r'(?:(?P<months>\d+)M)?'      # months
    r'(?:(?P<weeks>\d+)W)?'       # weeks
    r'(?:(?P<days>\d+)D)?'        # days
    r'(?:T'                       # time part must begin with a T
    r'(?:(?P<hours>\d+)H)?'       # hours
    r'(?:(?P<minutes>\d+)M)?'     # minutes
    r'(?:(?P<seconds>\d+)S)?'     # seconds
    r')?'                         # end of time part
)

from pprint import pprint
pprint(ISO_8601_period_rx.match('P1W2DT6H21M32S').groupdict())
# {'days': '2',
#  'hours': '6',
#  'minutes': '21',
#  'months': None,
#  'seconds': '32',
#  'weeks': '1',
#  'years': None}
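I deliberately did not compute the exact number of seconds from this data. It looks trivial (see above), but it really isn't. For example, a distance of 2 months starting from January 1st is 59 days (31+28) or 60 days (31+29), depending on the year, while starting from March 1st it is always 61 days. A proper calendar implementation should take all of this into account; for computing the length of a YouTube clip, that would be overkill.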
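The month-length point can be checked with the standard library. A small sketch; the helper name is made up, and 2023 and 2024 are just illustrative picks for a common year and a leap year:

```python
from datetime import date

def days_in_two_months(year, month):
    """Days from the 1st of `month` to the 1st of the month two months later.

    Illustrative helper; only valid for month <= 10 (no year rollover).
    """
    return (date(year, month + 2, 1) - date(year, month, 1)).days

print(days_in_two_months(2023, 1))  # 59 (31 + 28, common year)
print(days_in_two_months(2024, 1))  # 60 (31 + 29, leap year)
print(days_in_two_months(2023, 3))  # 61 (31 + 30)
print(days_in_two_months(2024, 3))  # 61
```

So "2 months" has no fixed length in seconds unless a start date is known.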
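This parses the input string one character at a time. If the character is a digit, it is simply appended (string concatenation, not arithmetic addition) to the value currently being parsed. If it is one of 'wdhms', the current value is assigned to the appropriate variable (week, day, hour, minute, second), and the value is then reset, ready to accept the next one. Finally, the seconds from the 5 parsed values are summed.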
def ytDurationToSeconds(duration):  # e.g. P1W2DT6H21M32S
    week = 0
    day = 0
    hour = 0
    minute = 0  # renamed from `min` to avoid shadowing the builtin
    sec = 0
    duration = duration.lower()
    value = ''
    for c in duration:
        if c.isdigit():
            value += c
            continue
        elif c == 'p':
            pass
        elif c == 't':
            pass
        elif c == 'w':
            week = int(value) * 604800
        elif c == 'd':
            day = int(value) * 86400
        elif c == 'h':
            hour = int(value) * 3600
        elif c == 'm':
            minute = int(value) * 60
        elif c == 's':
            sec = int(value)
        value = ''
    return week + day + hour + minute + sec
So this is what I came up with: a custom parser to interpret the time:
def durationToSeconds(duration):
    """
    duration - ISO 8601 time format
    examples :
        'P1W2DT6H21M32S' - 1 week, 2 days, 6 hours, 21 mins, 32 secs,
        'PT7M15S' - 7 mins, 15 secs
    """
    # partition() tolerates a missing 'T' (time is then an empty string)
    period, _, time = duration.partition('T')
    period = period.replace('P', '')
    timeD = {}
    # weeks & days (split on the designator so multi-digit values work)
    if len(period.split('W')) > 1:
        timeD['weeks'] = int(period.split('W')[0])
        period = period.split('W')[1]
    if len(period.split('D')) > 1:
        timeD['days'] = int(period.split('D')[0])
    # hours, minutes & seconds
    if len(time.split('H')) > 1:
        timeD['hours'] = int(time.split('H')[0])
        time = time.split('H')[1]
    if len(time.split('M')) > 1:
        timeD['minutes'] = int(time.split('M')[0])
        time = time.split('M')[1]
    if len(time.split('S')) > 1:
        timeD['seconds'] = int(time.split('S')[0])
    # convert to seconds (parentheses allow the expression to span lines)
    timeS = (timeD.get('weeks', 0) * (7*24*60*60) +
             timeD.get('days', 0) * (24*60*60) +
             timeD.get('hours', 0) * (60*60) +
             timeD.get('minutes', 0) * (60) +
             timeD.get('seconds', 0))
    return timeS
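Now, it may be super uncool and so on, but it works, so I'm sharing it because I care about you people.
Extending 9000's answer: apparently YouTube's format uses weeks, not months, which means the total number of seconds can be computed easily.
Named groups aren't used here because I originally needed this to work with PySpark.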
from operator import mul
from itertools import accumulate
import re
from typing import Pattern, List, Tuple

SECONDS_PER_SECOND: int = 1
SECONDS_PER_MINUTE: int = 60
MINUTES_PER_HOUR: int = 60
HOURS_PER_DAY: int = 24
DAYS_PER_WEEK: int = 7
WEEKS_PER_YEAR: int = 52

ISO8601_PATTERN: Pattern = re.compile(
    r"P(?:(\d+)Y)?(?:(\d+)W)?(?:(\d+)D)?"
    r"T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?"
)

def extract_total_seconds_from_ISO8601(iso8601_duration: str) -> int:
    """Compute duration in seconds from a Youtube ISO8601 duration format. """
    MULTIPLIERS: Tuple[int, ...] = (
        SECONDS_PER_SECOND, SECONDS_PER_MINUTE, MINUTES_PER_HOUR,
        HOURS_PER_DAY, DAYS_PER_WEEK, WEEKS_PER_YEAR
    )
    groups: List[int] = [int(g) if g is not None else 0 for g in
                         ISO8601_PATTERN.match(iso8601_duration).groups()]
    return sum(g * multiplier for g, multiplier in
               zip(reversed(groups), accumulate(MULTIPLIERS, mul)))
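To see what accumulate(MULTIPLIERS, mul) produces, here is a small standalone sketch. The multiplier values are copied from the constants above; the variable names are mine.

```python
from itertools import accumulate
from operator import mul

# seconds/second, seconds/minute, minutes/hour, hours/day, days/week, weeks/year
multipliers = (1, 60, 60, 24, 7, 52)

# Running product: seconds per second, minute, hour, day, week and 52-week year
seconds_per_unit = list(accumulate(multipliers, mul))
print(seconds_per_unit)
# [1, 60, 3600, 86400, 604800, 31449600]
```

Note that a 52-week year is 364 days, so the last entry is only an approximation of a calendar year.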
Extending StanleyZheng's answer: the _js_parseInt function isn't needed.
import re

def YTDurationToSeconds(duration):
    # Inner groups capture only the digits, so int() can be applied directly
    match = re.match(r'PT((\d+)H)?((\d+)M)?((\d+)S)?', duration).groups()
    hours = int(match[1]) if match[1] else 0
    minutes = int(match[3]) if match[3] else 0
    seconds = int(match[5]) if match[5] else 0
    return hours * 3600 + minutes * 60 + seconds

# example output
YTDurationToSeconds('PT15M33S')
# 933