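I want to extract IBAN numbers from text with Python. The challenge is that an IBAN can be written in many ways, with spaces between groups of digits, and I find it difficult to turn that into a useful regex pattern.
I have written a demo version that tries to match all German and Austrian IBAN numbers in a text.
^DE([0-9a-zA-Z]\s?){20}$
I have seen similar questions on Stack Overflow. However, the combination of the different ways of writing an IBAN and extracting the numbers from text makes my problem hard to solve.
Hope you can help me!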
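In general, to match German and Austrian IBAN codes, you can use
codes = re.findall(r'\b(DE(?:\s*[0-9]){20}|AT(?:\s*[0-9]){18})\b(?!\s*[0-9])', text)
Details:
- \b - a word boundary
- (DE(?:\s*[0-9]){20}|AT(?:\s*[0-9]){18}) - Group 1: DE followed by 20 digits, or AT followed by 18 digits, where each digit may be preceded by any amount of whitespace
- \b(?!\s*[0-9]) - a word boundary that is not immediately followed by zero or more whitespace characters and an ASCII digit.
See the regex demo.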
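A minimal sketch of how the pattern above might be used to pull the codes out of free text and strip the internal whitespace. The helper name `extract_ibans` and the sample text are made up for illustration; the two IBANs are commonly published example numbers, not data from the question.

```python
import re

# Pattern from the answer above: DE + 20 digits or AT + 18 digits,
# with optional whitespace before each digit.
IBAN_RE = re.compile(r'\b(DE(?:\s*[0-9]){20}|AT(?:\s*[0-9]){18})\b(?!\s*[0-9])')


def extract_ibans(text):
    """Return every match with its internal whitespace removed (hypothetical helper)."""
    return [re.sub(r'\s+', '', match) for match in IBAN_RE.findall(text)]


text = ("Please pay to DE89 3704 0044 0532 0130 00 by Friday, "
        "or to AT611904300234573201 if you prefer.")
print(extract_ibans(text))
# ['DE89370400440532013000', 'AT611904300234573201']
```

The trailing lookahead is what rejects a candidate that is followed by further digits, so a number that is too long is skipped rather than truncated.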
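For the data shown in your question, which contains incorrect IBAN codes, you can use
\b(?:DE|AT)(?:\s?[0-9a-zA-Z]){18}(?:(?:\s?[0-9a-zA-Z]){2})?\b
See the regex demo. Details:
- \b - a word boundary
- (?:DE|AT) - DE or AT
- (?:\s?[0-9a-zA-Z]){18} - eighteen occurrences of an optional whitespace character followed by an alphanumeric character
- (?:(?:\s?[0-9a-zA-Z]){2})? - an optional occurrence of two sequences of an optional whitespace character and an alphanumeric character
- \b - a word boundary.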
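As a rough illustration of how this looser pattern behaves, the sketch below runs it on made-up strings (they are not the data from the question). The names `LOOSE_IBAN_RE`, `sample`, `found`, and `swallowed` are only for this example.

```python
import re

# Looser pattern from the answer above: letters are allowed and the
# last two characters are optional.
LOOSE_IBAN_RE = re.compile(
    r'\b(?:DE|AT)(?:\s?[0-9a-zA-Z]){18}(?:(?:\s?[0-9a-zA-Z]){2})?\b')

# Made-up sample: one code containing letters, one plain Austrian-length code.
sample = "Ref DE00 12AB 3456 7890 1234 56, then AT00 1234 5678 9012 3456."
found = [re.sub(r'\s', '', m) for m in LOOSE_IBAN_RE.findall(sample)]
print(found)
# ['DE0012AB34567890123456', 'AT001234567890123456']

# Because letters are accepted, a short word after an 18-character code
# can be absorbed by the optional two-character tail.
swallowed = LOOSE_IBAN_RE.findall("AT611904300234573201 if needed")
print(swallowed)
# ['AT611904300234573201 if']
```

The second call suggests that matches from this pattern are best treated as candidates and checked further, for example by length per country code, before they are used.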
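Assuming you use this validation inside a class with self.input as the input string, use the following code. If you only want to validate German and Austrian IBANs, I suggest removing all other countries from the dictionary:
import string

country_dic = {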
"AL": [28, "Albania"],
"AD": [24, "Andorra"],
"AT": [20, "Austria"],
"BE": [16, "Belgium"],
"BA": [20, "Bosnia"],
"BG": [22, "Bulgaria"],
"HR": [21, "Croatia"],
"CY": [28, "Cyprus"],
"CZ": [24, "Czech Republic"],
"DK": [18, "Denmark"],
"EE": [20, "Estonia"],
"FO": [18, "Faroe Islands"],
"FI": [18, "Finland"],
"FR": [27, "France"],
"DE": [22, "Germany"],
"GI": [23, "Gibraltar"],
"GR": [27, "Greece"],
"GL": [18, "Greenland"],
"HU": [28, "Hungary"],
"IS": [26, "Iceland"],
"IE": [22, "Ireland"],
"IL": [23, "Israel"],
"IT": [27, "Italy"],
"LV": [21, "Latvia"],
"LI": [21, "Liechtenstein"],
"LT": [20, "Lithuania"],
"LU": [20, "Luxembourg"],
"MK": [19, "Macedonia"],
"MT": [31, "Malta"],
"MU": [30, "Mauritius"],
"MC": [27, "Monaco"],
"ME": [22, "Montenegro"],
"NL": [18, "Netherlands"],
"NO": [15, "Norway"],
"PL": [28, "Poland"],
"PT": [25, "Portugal"],
"RO": [24, "Romania"],
"SM": [27, "San Marino"],
"SA": [24, "Saudi Arabia"],
"RS": [22, "Serbia"],
"SK": [24, "Slovakia"],
"SI": [19, "Slovenia"],
"ES": [24, "Spain"],
"SE": [24, "Sweden"],
"CH": [21, "Switzerland"],
"TR": [26, "Turkey"],
"TN": [24, "Tunisia"],
"GB": [22, "United Kingdom"]
} # dictionary with IBAN-length per country-code
def eval_iban(self):
    # Counts how many valid IBANs are found in the input string
    try:
        if self.input:
            # Maps 0-9 to "0"-"9" and A-Z to "10"-"35" for the mod-97 check
            letter_dic = {ord(d): str(i) for i, d in enumerate(
                string.digits + string.ascii_uppercase)}
            hits = 0
            for word in self.input.upper().split():
                iban = word.strip()
                correct_length = country_dic.get(iban[:2])
                if correct_length is None:
                    continue  # unknown country code: skip this word instead of aborting
                if len(iban) == correct_length[0]:  # IBAN length must match the country code
                    if int((iban[4:] + iban[:4]).translate(letter_dic)) % 97 == 1:
                        # Move the first four characters to the end, convert letters
                        # to numbers, and check that the remainder mod 97 is 1;
                        # this validates the IBAN
                        hits += 1
            return hits
        return 0
    except Exception:
        # logging.exception('Could not evaluate IBAN')
        return 0
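The method above splits the input on whitespace, so it only counts IBANs written without spaces. One way the regex extraction and the mod-97 check could be combined in a standalone form is sketched below. The names `IBAN_LENGTHS`, `CANDIDATE_RE`, `is_valid_iban`, and `find_valid_ibans` are hypothetical, and the dictionary is reduced to the two countries discussed here, with the same lengths as in country_dic.

```python
import re
import string

# Lengths for the two countries in question (assumed subset of country_dic).
IBAN_LENGTHS = {"DE": 22, "AT": 20}

# Maps '0'-'9' to "0"-"9" and 'A'-'Z' to "10"-"35" for the mod-97 check.
LETTER_TO_NUMBER = {ord(c): str(i)
                    for i, c in enumerate(string.digits + string.ascii_uppercase)}

# Candidate pattern: DE + 20 digits or AT + 18 digits, whitespace allowed.
CANDIDATE_RE = re.compile(
    r'\b(DE(?:\s*[0-9]){20}|AT(?:\s*[0-9]){18})\b(?!\s*[0-9])')


def is_valid_iban(iban):
    """Check length and mod-97 remainder of a single IBAN string."""
    iban = re.sub(r'\s+', '', iban).upper()
    if IBAN_LENGTHS.get(iban[:2]) != len(iban):
        return False
    if not (iban.isascii() and iban.isalnum()):
        return False
    # Move the first four characters to the end, then convert letters to numbers.
    rearranged = iban[4:] + iban[:4]
    return int(rearranged.translate(LETTER_TO_NUMBER)) % 97 == 1


def find_valid_ibans(text):
    """Extract candidates with the regex and keep those that pass the check."""
    candidates = (re.sub(r'\s+', '', m) for m in CANDIDATE_RE.findall(text))
    return [c for c in candidates if is_valid_iban(c)]


print(find_valid_ibans(
    "Pay DE89 3704 0044 0532 0130 00 or DE89 3704 0044 0532 0130 01."))
# ['DE89370400440532013000']
```

In this sketch the regex only narrows the text down to plausible candidates; the checksum decides which of them are kept, so a mistyped digit is filtered out.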
(?<=(?i)iban.)CH\w{19}
If "iban" is not in the string, use CH\w{19} instead.