Access中的复杂文本分析



一两年前,我非常熟练地使用ArcGIS中的数据模型来处理属性表。ArcGIS使用Access类型的数据库,用户可以添加、删除和修改字段。在ArcGIS界面中,我可以创建一个新列,并使用Python在该列上运行一个名为"字段计算器"的脚本。例如,下面的算法将"23004街"之类的字符串作为输入,并将其转换为"2304街"。它还进行了一些其他适合该项目的替换。

def calc(f1,f2,f3):
#where f1 is address number, f2 is address name and f3 is address suffix
#Strip trailing and ending spaces from all fields 
    #Define list
    remove_list = ['01ST', '02ND', '03RD', '04TH', '05TH', '06TH', '07TH', '08TH', '09TH']
    #Homogenize single digit address names
    if f2 in remove_list:
        f2 = f2.replace('0','')
    else:
        f2 = f2
    f1 = str(f1).replace('.0','')
    #strip leading and trailing spaces
    f1 = f1.strip()
    f2 = f2.strip()
    f3 = f3.strip()     
    #adapt address number to string format
    #Concatenate full address name
    x = f1+' '+f2+' '+f3
    #Return Full address name
    return x

我希望Access也有类似的灵活性。Access中最接近ArcGIS"字段计算器"的是"表达式生成器"。这个表达式构建器可以进行简单的修改,但对于更复杂的字符串解析算法来说,使用起来相当麻烦。Access中有任何东西为单个字段提供这种脚本吗?我们可以在表达式生成器中使用VBA或其他语言吗?也许只有SQL才是解决方案?

如果您有完整的UI.exe程序,用户定义的函数可以集成在MS Access中。只需在Module对象中编写函数脚本,然后在VBA或SQL:中调用它

Public Function Calc(f1 As String, f2 As String, f3 As String) As String
    ' where f1 is address number, f2 is address name and f3 is address suffix '
    Dim remove_list() As Variant
    Dim r As Variant
    Dim x As String
    ' Define list '
    remove_list = Array("01ST", "02ND", "03RD", "04TH", "05TH", _
                        "06TH", "07TH", "08TH", "09TH")
    ' Homogenize single digit address names '
    For Each r In remove_list
        If f2 Like "*r*" Then
            f2 = Replace(f2, r, Right(r, 3))
        Else
            f2 = f2
        End If
    Next r
    f1 = Replace(f1, ".0", "")
    ' strip leading and trailing spaces '
    f1 = Trim(f1)
    f2 = Trim(f2)
    f3 = Trim(f3)
    ' Concatenate full address name '
    x = f1 & " " & f2 & " " & f3
    ' Return Full address name '
    Calc = x
End Function

VBA

Public Sub CleanAddress()
    Dim fullAddress As String
    Dim db As Database, rst As Recordset
    Set db = CurrentDb
    Set rst = db.OpenRecordset("TableName")
    If rst.RecordCount = 0 Then Exit Sub
    Do While Not rst.EOF
        fullAddress = Calc(rst!f1, rst!f2, rst!f3)
        rst.MoveNext
    Loop
    rst.Close
    Set rst = Nothing
    Set db = Nothing
End Sub

SQL

SELECT t.*, Calc(t.f1, t.f2, t.f3) As fullAddress
FROM TableName t

由于函数并不太复杂,您可以使用嵌套的字符串函数在SQL中处理所有这些:

SQL

SELECT t.*,
    Trim(t.f1, Replace(t.f1, '.0', '')) & 
    Replace(
        Replace(
            Replace(
                Replace(
                    Replace(
                        Replace(
                            Replace(
                                Replace(
                                    Replace(Trim(t.f2), '01ST', '1ST'), 
                                '02ND', '2ND'), 
                            '03RD', '3RD'),
                        '04TH', '4TH'),
                    '05TH', '5TH'),
               '06TH', '6TH'),
            '07TH', '7TH'),
        '08TH', '8TH'),
    '09TH', '9TH') &
    Trim(t.f3)  As FullAddress
FROM TableName As t

输入类似"23004th street"的字符串,并将其转换为"2304th街道"

该示例可以通过使用正则表达式的自定义VBA函数在Access中实现。

以下是在Access Immediate窗口中测试的一个功能:

? Field_Calculator("230 04th street")
230 4th street

请注意,该模式将不匹配文本段,如"34th"《045th》"04abc"。因此,这些将不受函数的影响:

? Field_Calculator("230 34th street")
230 34th street
? Field_Calculator("230 045th street")
230 045th street
? Field_Calculator("230 04abc street")
230 04abc street

这是一个函数,编写时使用后期绑定:

Public Function Field_Calculator(ByVal pInput As String) As String
    Const cstrPattern As String = "b(0)(dw{2})b"
    Dim objRegExp As Object
    Set objRegExp = CreateObject("VBScript.RegExp")
    With objRegExp
        .Pattern = cstrPattern
        .IgnoreCase = True
        Field_Calculator = .Replace(pInput, "$2")
    End With
End Function

如果您喜欢早期绑定,请设置对"Microsoft VBScript正则表达式"的引用,并进行以下更改:

'Dim objRegExp As Object
'Set objRegExp = CreateObject("VBScript.RegExp")
Dim objRegExp As RegExp
Set objRegExp = New RegExp

该函数可以从另一个VBA过程中调用。当从Access会话中运行时,它也可以在Access SQL中使用:

UPDATE YourTable
SET [street_address] = Field_Calculator([street_address]);

但是,使用自定义VBA函数的查询不能在Access会话之外运行。

最新更新