Extracting IBANs from text using Python



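I want to extract IBAN numbers from text with Python. The challenge is that an IBAN can be written in several ways, with spaces between groups of digits, and I am finding it hard to turn that into a useful regex pattern.

I wrote a demo version that tries to match all German and Austrian IBAN numbers in a text.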

^DE([0-9a-zA-Z]\s?){20}$
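I have seen similar questions on Stack Overflow. However, having to handle the different ways of writing an IBAN while also extracting the numbers from running text makes my problem hard to solve.

I hope you can help!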

Country    ISO country code    Check #    Bank #    Account #
Germany    2a                  2n         8n        10n
Austria    2a                  2n         5n        11n

(a = letters, n = digits)

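In general, to match German and Austrian IBAN codes, you can use

codes = re.findall(r'\b(DE(?:\s*[0-9]){20}|AT(?:\s*[0-9]){18})\b(?!\s*[0-9])', text)

Details:

  • \b - a word boundary
  • (DE(?:\s*[0-9]){20}|AT(?:\s*[0-9]){18}) - Group 1: DE followed by 20 digits, each optionally preceded by any amount of whitespace, or AT followed by 18 digits separated in the same way
  • \b(?!\s*[0-9]) - a word boundary that is not immediately followed by zero or more whitespace characters and an ASCII digit.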

See the regex demo.
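As a minimal sketch of how that pattern might be used end to end: the helper name `extract_ibans` and the sample sentence are my own assumptions, and the whitespace is stripped from each match afterwards so that all hits come out in the compact form.

```python
import re

# Pattern from the answer above, with the backslashes written out.
IBAN_RE = re.compile(r'\b(DE(?:\s*[0-9]){20}|AT(?:\s*[0-9]){18})\b(?!\s*[0-9])')

def extract_ibans(text):
    """Return DE/AT IBANs found in text, with internal whitespace removed."""
    # findall returns the content of group 1 for every match
    return [re.sub(r'\s+', '', match) for match in IBAN_RE.findall(text)]

# Hypothetical sample text containing one spaced and one compact IBAN
sample = (
    "Please pay to DE89 3704 0044 0532 0130 00 by Friday, "
    "or to AT611904300234573201 if you prefer."
)
print(extract_ibans(sample))
# ['DE89370400440532013000', 'AT611904300234573201']
```

A code followed by further digits, such as "DE89 3704 0044 0532 0130 00 12", is rejected by the trailing lookahead rather than being cut off after 20 digits.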

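For the data you showed in the question, which contains incorrect IBAN codes, you can use

\b(?:DE|AT)(?:\s?[0-9a-zA-Z]){18}(?:(?:\s?[0-9a-zA-Z]){2})?\b

See the regex demo. Details:

  • \b - a word boundary
  • (?:DE|AT) - DE or AT
  • (?:\s?[0-9a-zA-Z]){18} - eighteen occurrences of an optional whitespace character followed by an alphanumeric character
  • (?:(?:\s?[0-9a-zA-Z]){2})? - an optional occurrence of two sequences of an optional whitespace character and an alphanumeric character
  • \b - a word boundary.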
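A short sketch of this lenient pattern in use; the helper name `extract_candidates` and the sample text (including the deliberate typo, a letter O in place of a zero) are assumptions for illustration only.

```python
import re

# Lenient pattern from the answer above: letters are allowed after the country code.
LENIENT_RE = re.compile(
    r'\b(?:DE|AT)(?:\s?[0-9a-zA-Z]){18}(?:(?:\s?[0-9a-zA-Z]){2})?\b'
)

def extract_candidates(text):
    """Return lenient DE/AT matches with internal whitespace removed."""
    return [re.sub(r'\s+', '', m.group()) for m in LENIENT_RE.finditer(text)]

# Hypothetical sample: the Austrian code contains the letter O instead of a zero
sample = "Old: AT61 1904 3002 3457 32O1, new: DE89 3704 0044 0532 0130 00."
print(extract_candidates(sample))
# ['AT6119043002345732O1', 'DE89370400440532013000']
```

Because letters are accepted, an 18-character code followed by a space and a two-letter word (for example "AT611904300234573201 if") can absorb that word, so it is probably wise to treat the results as candidates and validate them afterwards.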

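Assuming this validation is used in a class where self.input holds the input string, you can use the following code. If you only want to validate German and Austrian IBANs, though, I suggest removing all other countries from the dictionary: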

import string

# Maps each country code to [IBAN length, country name]
country_dic = {
    "AL": [28, "Albania"],
    "AD": [24, "Andorra"],
    "AT": [20, "Austria"],
    "BE": [16, "Belgium"],
    "BA": [20, "Bosnia"],
    "BG": [22, "Bulgaria"],
    "HR": [21, "Croatia"],
    "CY": [28, "Cyprus"],
    "CZ": [24, "Czech Republic"],
    "DK": [18, "Denmark"],
    "EE": [20, "Estonia"],
    "FO": [18, "Faroe Islands"],
    "FI": [18, "Finland"],
    "FR": [27, "France"],
    "DE": [22, "Germany"],
    "GI": [23, "Gibraltar"],
    "GR": [27, "Greece"],
    "GL": [18, "Greenland"],
    "HU": [28, "Hungary"],
    "IS": [26, "Iceland"],
    "IE": [22, "Ireland"],
    "IL": [23, "Israel"],
    "IT": [27, "Italy"],
    "LV": [21, "Latvia"],
    "LI": [21, "Liechtenstein"],
    "LT": [20, "Lithuania"],
    "LU": [20, "Luxembourg"],
    "MK": [19, "Macedonia"],
    "MT": [31, "Malta"],
    "MU": [30, "Mauritius"],
    "MC": [27, "Monaco"],
    "ME": [22, "Montenegro"],
    "NL": [18, "Netherlands"],
    "NO": [15, "Norway"],
    "PL": [28, "Poland"],
    "PT": [25, "Portugal"],
    "RO": [24, "Romania"],
    "SM": [27, "San Marino"],
    "SA": [24, "Saudi Arabia"],
    "RS": [22, "Serbia"],
    "SK": [24, "Slovakia"],
    "SI": [19, "Slovenia"],
    "ES": [24, "Spain"],
    "SE": [24, "Sweden"],
    "CH": [21, "Switzerland"],
    "TR": [26, "Turkey"],
    "TN": [24, "Tunisia"],
    "GB": [22, "United Kingdom"],
}


class IbanCounter:
    # Hypothetical wrapper class: the original only shows the method below
    # and assumes that self.input holds the text to scan.
    def __init__(self, text):
        self.input = text

    def eval_iban(self):
        # Counts the valid IBANs among the whitespace-separated words of self.input.
        # An IBAN written with internal spaces is split into several words
        # and is therefore not counted.
        if not self.input:
            return 0
        # Maps 0-9 to "0"-"9" and A-Z to "10"-"35" for the mod-97 check
        letter_dic = {ord(d): str(i) for i, d in enumerate(
            string.digits + string.ascii_uppercase)}
        hits = 0
        for word in self.input.upper().split():
            iban = word.strip()
            correct_length = country_dic.get(iban[:2])
            if correct_length is None or len(iban) != correct_length[0]:
                continue  # unknown country code or wrong length for that country
            if any(ord(c) not in letter_dic for c in iban):
                continue  # punctuation or other non-alphanumeric characters
            # Move the first four characters to the end, convert letters to numbers,
            # and accept the IBAN if the result leaves a remainder of 1 modulo 97
            if int((iban[4:] + iban[:4]).translate(letter_dic)) % 97 == 1:
                hits += 1
        return hits
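The two ideas in this thread can be combined: extract candidates with a regex that tolerates whitespace, then keep only those that pass the length and mod-97 checks. The following is a rough sketch under my own assumptions (only German and Austrian IBANs matter; the names `IBAN_LENGTHS`, `is_valid_iban` and `find_valid_ibans` are hypothetical), not a replacement for a maintained IBAN library.

```python
import re
import string

# Assumption: only German and Austrian IBANs are of interest here.
IBAN_LENGTHS = {"DE": 22, "AT": 20}

# Candidate pattern: country code, then digits optionally separated by whitespace.
CANDIDATE_RE = re.compile(r'\b(DE(?:\s*[0-9]){20}|AT(?:\s*[0-9]){18})\b(?!\s*[0-9])')

# Maps "0".."9" to "0".."9" and "A".."Z" to "10".."35".
LETTER_TO_NUMBER = {
    ord(ch): str(i) for i, ch in enumerate(string.digits + string.ascii_uppercase)
}

def is_valid_iban(iban):
    """Check length and mod-97 checksum of a compact (no spaces) IBAN."""
    if len(iban) != IBAN_LENGTHS.get(iban[:2]):
        return False  # unknown country code or wrong length
    if any(ord(ch) not in LETTER_TO_NUMBER for ch in iban):
        return False  # lowercase letters, punctuation, and so on
    rearranged = iban[4:] + iban[:4]
    return int(rearranged.translate(LETTER_TO_NUMBER)) % 97 == 1

def find_valid_ibans(text):
    """Extract DE/AT candidates from text and keep those with a valid checksum."""
    candidates = (re.sub(r'\s+', '', m) for m in CANDIDATE_RE.findall(text))
    return [c for c in candidates if is_valid_iban(c)]

# Hypothetical sample: the second code has a wrong final digit
sample = "A: DE89 3704 0044 0532 0130 00, B: DE89 3704 0044 0532 0130 01"
print(find_valid_ibans(sample))
# ['DE89370400440532013000']
```

Unlike splitting the text on whitespace, this approach also counts IBANs that are written in groups of four.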

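(?<=(?i)iban.)CH\w{19}

If "iban" is not in the string ==> CH\w{19}

Note that Python 3.11 and later reject an inline (?i) that is not at the start of the pattern; the scoped form (?<=(?i:iban).)CH\w{19} works there.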
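A small sketch of the two Swiss variants side by side, assuming the intent is "use the labelled pattern when the text marks the number with the word iban, otherwise fall back to the plain one". The scoped flag `(?i:iban)` is my substitution for the inline `(?i)`, and the sample text is made up.

```python
import re

# Matches CH + 19 word characters only when preceded by "iban" (any case)
# and exactly one further character, such as a colon or a space.
LABELLED_CH_RE = re.compile(r'(?<=(?i:iban).)CH\w{19}')

# Fallback without the label requirement.
PLAIN_CH_RE = re.compile(r'CH\w{19}')

# Hypothetical sample: one labelled and one unlabelled Swiss IBAN
text = "Iban:CH9300762011623852957 and ref CH5604835012345678009"

print(LABELLED_CH_RE.findall(text))
# ['CH9300762011623852957']
print(PLAIN_CH_RE.findall(text))
# ['CH9300762011623852957', 'CH5604835012345678009']
```

The lookbehind has a fixed width of five characters, so "IBAN: CH..." with both a colon and a space between the label and the number would not match the labelled pattern.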
