Some of our clients submit timestamps like ٢٠١٥-١٠-٠٣ ١٩:٠١:٤٣, which Google Translate renders as "03/10/2015 19:01:43". Link here.

How can I achieve the same thing in Python?

There is also the unidecode library from https://pypi.python.org/pypi/Unidecode.

In Python 2:

>>> from unidecode import unidecode
>>> unidecode(u"۰۱۲۳۴۵۶۷۸۹")
'0123456789'

In Python 3:

>>> from unidecode import unidecode
>>> unidecode("۰۱۲۳۴۵۶۷۸۹")
'0123456789'

To convert the time string to a datetime object (Python 3):

>>> import re
>>> from datetime import datetime
>>> datetime(*map(int, re.findall(r'\d+', ' ٢٠١٥-١٠-٠٣ ١٩:٠١:٤٣')))
datetime.datetime(2015, 10, 3, 19, 1, 43)
>>> str(_)
'2015-10-03 19:01:43'

If you only need the numbers:

>>> list(map(int, re.findall(r'\d+', ' ٢٠١٥-١٠-٠٣ ١٩:٠١:٤٣')))
[2015, 10, 3, 19, 1, 43]
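
The two snippets above can be wrapped in a small helper. This is only a sketch, assuming Python 3 (where \d matches any Unicode decimal digit and int() accepts such digits) and assuming the input always carries six numeric fields in year, month, day, hour, minute, second order. The name parse_timestamp is made up for illustration.

```python
import re
from datetime import datetime


def parse_timestamp(text):
    """Parse a 'Y-M-D h:m:s' timestamp written with any Unicode decimal digits."""
    # In Python 3, \d matches Unicode decimal digits and int() converts them.
    fields = [int(group) for group in re.findall(r'\d+', text)]
    if len(fields) != 6:
        # Assumption: exactly year, month, day, hour, minute, second.
        raise ValueError('expected 6 numeric fields, got %d' % len(fields))
    return datetime(*fields)


# Arabic-Indic digits (U+0660 to U+0669)
print(parse_timestamp('\u0662\u0660\u0661\u0665-\u0661\u0660-\u0660\u0663 '
                      '\u0661\u0669:\u0660\u0661:\u0664\u0663'))
# Extended Arabic-Indic digits (U+06F0 to U+06F9)
print(parse_timestamp('\u06f2\u06f0\u06f1\u06f5-\u06f1\u06f0-\u06f1\u06f8 '
                      '\u06f0\u06f8:\u06f2\u06f2:\u06f1\u06f1'))
```

Because it relies on \d rather than a fixed code point offset, the same helper should handle both digit blocks without extra work.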

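My solution fails on a different timestamp: u'۲۰۱۵-۱۰-۱۸ ۰۸:۲۲:۱۱'. That string uses Extended Arabic-Indic digits, which start at code point 1776 (U+06F0) and fall outside the range checked below. Go with J.F. Sebastian's or jimhark's solution.

Use ord to get the Unicode code point. The digits start at 1632 (which is 0).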

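d = u'٢٠١٥-١٠-٠٣ ١٩:٠١:٤٣'
s = []
for c in d:
    o = ord(c)
    # Show each character, its code point, and its offset from 1632 (U+0660).
    print('%s -> %s, %s - 1632 = %s' % (c, o, o, o - 1632))
    if 1631 < o < 1642:
        # Arabic-Indic digit: replace it with the matching ASCII digit.
        s.append(str(o - 1632))
        continue
    # Anything else (separators, spaces) passes through unchanged.
    s.append(c)
print(''.join(s))
# or as a one-liner:
print(''.join([str(ord(c) - 1632) if 1631 < ord(c) < 1642 else c for c in d]))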

Here is the output of the for loop:

٢ -> 1634, 1634 - 1632 = 2
٠ -> 1632, 1632 - 1632 = 0
١ -> 1633, 1633 - 1632 = 1
٥ -> 1637, 1637 - 1632 = 5
- -> 45, 45 - 1632 = -1587
١ -> 1633, 1633 - 1632 = 1
٠ -> 1632, 1632 - 1632 = 0
- -> 45, 45 - 1632 = -1587
٠ -> 1632, 1632 - 1632 = 0
٣ -> 1635, 1635 - 1632 = 3
  -> 32, 32 - 1632 = -1600
١ -> 1633, 1633 - 1632 = 1
٩ -> 1641, 1641 - 1632 = 9
: -> 58, 58 - 1632 = -1574
٠ -> 1632, 1632 - 1632 = 0
١ -> 1633, 1633 - 1632 = 1
: -> 58, 58 - 1632 = -1574
٤ -> 1636, 1636 - 1632 = 4
٣ -> 1635, 1635 - 1632 = 3
2015-10-03 19:01:43
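
The loop above only covers the block that starts at 1632 (U+0660), so a timestamp written with Extended Arabic-Indic digits (U+06F0 to U+06F9), like the failing one mentioned earlier, is left untouched. One possible way to cover both blocks is a translation table. This is a sketch for Python 3, and the names DIGIT_TABLE and to_ascii_digits are made up for illustration.

```python
# Map both Arabic-Indic (U+0660-U+0669) and Extended Arabic-Indic
# (U+06F0-U+06F9) digits to the ASCII digits '0'-'9'.
DIGIT_TABLE = {}
for zero in (0x0660, 0x06F0):          # code point of each block's zero
    for value in range(10):
        DIGIT_TABLE[zero + value] = str(value)


def to_ascii_digits(text):
    """Replace digits from the two blocks above; leave everything else as is."""
    # str.translate accepts a dict that maps code points to replacement strings.
    return text.translate(DIGIT_TABLE)


print(to_ascii_digits('\u0662\u0660\u0661\u0665-\u0661\u0660-\u0660\u0663'))
print(to_ascii_digits('\u06f2\u06f0\u06f1\u06f5-\u06f1\u06f0-\u06f1\u06f8'))
```

A table keeps the per-character work inside str.translate, and supporting another digit block would only mean adding its zero code point to the tuple.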

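Although inspired by some of the other answers (thanks @kev), I took a different approach.

(D'oh! I just noticed that @kev is also the one who asked the question.)

You asked specifically about Arabic characters, but it is simpler to handle all Unicode digits.

Note: I process the same date string, but I specify the Unicode characters with Unicode escape sequences because that is easier on my system.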

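import unicodedata
# The same date string as in the question, written with Unicode escape sequences.
unicodeDate = u'\u0662\u0660\u0661\u0665-\u0661\u0660-\u0660\u0663 \u0661\u0669:\u0660\u0661:\u0664\u0663'
# Python 2 code; on Python 3, use str() in place of unicode().
converted = u''.join([unicode(unicodedata.decimal(c, c)) for c in unicodeDate])
print(converted)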

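The second argument to unicodedata.decimal is the default value returned when the first argument does not map to a Unicode decimal digit. Passing the same character for both arguments means that any Unicode decimal digit is converted to its ASCII equivalent and every other character passes through unchanged.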
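
For Python 3, the same idea might look like the sketch below. The function name normalize_decimals is made up for illustration; only unicodedata.decimal comes from the standard library.

```python
import unicodedata


def normalize_decimals(text):
    """Convert every Unicode decimal digit in text to its ASCII equivalent."""
    # decimal(c, c) returns an int for decimal digits and c itself otherwise,
    # so str() yields either the ASCII digit or the unchanged character.
    return ''.join(str(unicodedata.decimal(c, c)) for c in text)


unicode_date = ('\u0662\u0660\u0661\u0665-\u0661\u0660-\u0660\u0663 '
                '\u0661\u0669:\u0660\u0661:\u0664\u0663')
print(normalize_decimals(unicode_date))  # 2015-10-03 19:01:43
```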

My original answer

converted = ''.join([str(unicodedata.digit(c, c)) for c in unicodeDate])

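@J.F. Sebastian left a helpful comment pointing out that the code above does not handle superscripts correctly, for example u'\u00b2'. The same group also contains the superscripts u'\u00b3' and u'\u00b9'. I found that this also affects some code points in the following blocks:

Superscripts and Subscripts (2070–209F)
Enclosed Alphanumerics (2460–24FF)
Dingbats (2700–27BF)

Apparently unicodedata.digit() tries to extract a digit from a decorated number, which is probably not what you want here. unicodedata.decimal, on the other hand, seems to do exactly what is needed (assuming you do not want decorated digits converted).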
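
A quick way to see the difference is to call both functions on a plain digit and on two decorated ones. This is a sketch for Python 3; passing None as the default is my choice here, so that a missing mapping shows up clearly.

```python
import unicodedata

# Arabic-Indic two, superscript two, circled digit one
samples = ['\u0662', '\u00b2', '\u2460']

for ch in samples:
    as_digit = unicodedata.digit(ch, None)      # also accepts decorated digits
    as_decimal = unicodedata.decimal(ch, None)  # plain decimal digits only
    print('U+%04X digit=%r decimal=%r' % (ord(ch), as_digit, as_decimal))

# Prints:
# U+0662 digit=2 decimal=2
# U+00B2 digit=2 decimal=None
# U+2460 digit=1 decimal=None
```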

Converting Arabic characters (Eastern Arabic numerals) to Arabic numerals in Python