How to extract n-digit numbers from text containing special characters



I have a text full of regex special characters, and I want to extract the numbers that have exactly 4 digits:

mytext = """A text including special characters like 1000+(100)=1100 """
numbers = []
seperators = [
    '(', ')', '[', ']', '{', '}', ';', ':', '=', '+', '-', '/', '*', '&', '%',
    '$', '@', '#', '^', '~', '`', '"', '>', '|', '\\', '?', '.', '<', "'"]

How can I extract the numbers using the split function?

# TypeError: str.split() accepts a single separator string, not a list
for word in mytext.split(seperators):
    if word.isdigit():
        numbers.append(int(word))

#print(numbers)
for mynumbers in numbers:
    if mynumbers > 999 and 10000 > mynumbers:  # for 4 digits
        print(mynumbers)

# this should print all the 4-digit numbers
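str.split() takes a single separator string, so it cannot split on a list of characters. A minimal sketch that keeps the split-based idea but uses re.split() instead, assuming the separators are combined into one regex character class; re.escape() keeps characters such as ], ^, - and \ literal inside the class:

```python
import re

mytext = """A text including special characters like 1000+(100)=1100 """
seperators = ['(', ')', '[', ']', '{', '}', ';', ':', '=', '+', '-', '/', '*',
              '&', '%', '$', '@', '#', '^', '~', '`', '"', '>', '|', '\\',
              '?', '.', '<', "'"]

# Build one character class from every separator (escaped so each is literal),
# plus \s so that ordinary whitespace also splits words.
pattern = "[" + "".join(re.escape(sep) for sep in seperators) + r"\s]+"

# Keep only the pieces made entirely of digits, then filter to 4-digit values.
numbers = [int(word) for word in re.split(pattern, mytext) if word.isdigit()]
four_digit = [n for n in numbers if 999 < n < 10000]
print(four_digit)  # [1000, 1100]
```

The trailing space in mytext produces an empty string at the end of the split, which isdigit() rejects, so no extra check is needed.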
import re

text = "A text including special characters like 1000+(100)=1100 "
# \b is a word boundary, so \d{4} only matches standalone runs of exactly 4 digits
numbers = [int(number) for number in re.findall(r'\b\d{4}\b', text)]
print(numbers)
# Outputs [1000, 1100]
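The same pattern generalizes to any digit count. A small sketch, assuming Python 3.6+ for f-strings; find_n_digit_numbers is a hypothetical helper name, not something from the answer:

```python
import re


def find_n_digit_numbers(text, n):
    """Return every standalone run of exactly n digits in text, as ints."""
    # In the rf-string, {{ and }} are literal braces, so n=4 builds r"\b\d{4}\b".
    # re.ASCII restricts \d to 0-9 and \b to ASCII word characters.
    pattern = rf"\b\d{{{n}}}\b"
    return [int(m) for m in re.findall(pattern, text, flags=re.ASCII)]


text = "A text including special characters like 1000+(100)=1100 "
print(find_n_digit_numbers(text, 4))  # [1000, 1100]
print(find_n_digit_numbers(text, 3))  # [100]
```

Tokens such as 1960s are not matched, because s is a word character and leaves no \b after the digits.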
mytext = """Alain Fabien Maurice Marcel Delon (French: [alɛ̃ dəlɔ̃]; born 8 November 1935) is a French actor and businessman. He is known as
one of Europe's most prominent actors and screen sex symbols from the 1960s and 1970s. He achieved critical acclaim for roles in
films such as Rocco and His Brothers (1960), Plein Soleil (1960), L'Eclisse (1962), The Leopard (1963), The Yellow Rolls-
Royce (1965), Lost Command (1966), and Le Samouraï (1967). Over the course of his career Delon worked with many wellknown directors, including Luchino Visconti, Jean-Luc Godard, Jean-Pierre Melville, Michelangelo Antonioni, and Louis Malle. He
acquired Swiss citizenship in 1999"""

numbers = []
# every special character that should act as a word separator
separators = ['(', ')', '[', ']', '{', '}', ';', ':', '=', '+', '-', '/', '*',
              '&', '%', '$', '@', '#', '^', '~', '`', '"', '>', '|', '\\',
              '?', '.', '<', "'"]
mytext2 = mytext
for sep in separators:
    mytext2 = mytext2.replace(sep, ' ')  # turn each special character into a space
#print(mytext2)
for word in mytext2.split():
    if word.isdigit():
        numbers.append(int(word))

#print(numbers)
for mynumbers in numbers:
    if 999 < mynumbers < 10000:  # keep only 4-digit numbers
        print(mynumbers)

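This code prints every 4-digit number in the text. For n-digit numbers, change the bounds of the range check (999 < mynumbers < 10000 becomes 10**(n-1) - 1 < mynumbers < 10**n). If your text contains other special characters, add them to the characters that are replaced with spaces at the start of the code.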
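The long chain of replace() calls can also be written as a single translation table with str.maketrans() and str.translate(). A sketch under these assumptions: extract_n_digit_numbers and SPECIAL_CHARS are hypothetical names, the character set is the one from the answer plus a comma, and the 4-digit range check is swapped for a length check so any n works:

```python
# Characters from the answer's replace() chain, plus ',' (an added assumption).
SPECIAL_CHARS = "()[]{};:=+-/*&%$@#^~`\"<>|\\?.',"


def extract_n_digit_numbers(text, n, special_chars=SPECIAL_CHARS):
    """Return the words of text that are exactly n digits long, as ints."""
    # One translation table maps every special character to a space,
    # doing the work of the long chain of str.replace() calls.
    table = str.maketrans(special_chars, " " * len(special_chars))
    words = text.translate(table).split()
    # isdecimal() (rather than isdigit()) rejects characters such as '²'
    # that isdigit() accepts but int() cannot convert.
    return [int(word) for word in words if word.isdecimal() and len(word) == n]


sample = "Le Samouraï (1967), born 8 November 1935; the 1960s"
print(extract_n_digit_numbers(sample, 4))  # [1967, 1935]
print(extract_n_digit_numbers(sample, 1))  # [8]
```

Unlike the range check, len(word) == n also accepts zero-padded words such as 0042 (returned as 42), which may or may not be what you want.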
