如何从文本文件中提取特定的单词到xls电子表格

我是新的VBA。在把我的问题贴在这里之前，我花了将近3天的时间上网。

我有300多个文本文件(使用OCR从PDF转换的文本)，从文本文件。我需要得到包含"字母"one_answers"数字"的所有单词(例如KT315A, KT-315-a等)以及源参考(txt文件名)。

我需要的是

1。添加"智能过滤器"，只复制包含
的单词"字母"one_answers"数字"

将复制的数据粘贴到A列

在B列添加参考文件名

我发现下面的代码可以复制所有数据从文本文件到excel电子表格。

文本文件看起来像

"线从252年ddddd - 552 a , ,,, @,@, rrrr, 22岁,…"Kt3443, fff…等"

XLS的最终结果应该是

A | B

252A-552A | file1

kt3443 | file1

 Option Explicit

Const sPath = "C:outp" 'remember end backslash
Const delim = "," 'comma delimited text file - EDIT
'Const delim = vbTab  'for TAB delimited text files

Sub ImportMultipleTextFiles()
   Dim wb As Workbook
   Dim sFile As String
   Dim inputRow As Long
   RefreshSheet
   On Error Resume Next
   sFile = Dir(sPath & "*.txt")
   Do Until sFile = ""
      inputRow = Sheets("Temp").Range("A" & Rows.Count).End(xlUp).Row + 1
      'open the text file
'format=6 denotes a text file
      Set wb = Workbooks.Open(Filename:=sPath & sFile, _
         Format:=6, _
         Delimiter:=delim)
      'copy and paste
      wb.Sheets(1).Range("A1").CurrentRegion.Copy _
         Destination:=ThisWorkbook.Sheets("Temp").Range("A" & inputRow)
      wb.Close SaveChanges:=False
      'get next text file
      sFile = Dir()
   Loop
   Set wb = Nothing
End Sub

Sub RefreshSheet()
   'delete old sheet and add a new one
   On Error Resume Next
   Application.DisplayAlerts = False
      Sheets("Temp").Delete
   Application.DisplayAlerts = True
   Worksheets.Add
   ActiveSheet.Name = "Temp"
   On Error GoTo 0
End Sub

谢谢!

从您的示例中确切地判断一个单词的构成有点困难。它显然可以包含字母和数字以外的字符(例如破折号)，但有些项目前面有点，因此它不能被定义为由space分隔。

我定义了一个"word"作为字符串

以字母或数字开头，以字母或数字结尾
由字母和数字组成
- 还可以包含除逗号以外的任何其他非空格字符

要做到这一点，我首先用空格替换所有的逗号，然后应用一个适当的正则表达式。但是，这可能会接受不需要的字符串，因此您可能需要更具体地定义什么是单词。

此外，通过使用FileSystemObject，我们可以一次处理一行，而不是将整个文件读取到Excel工作簿中，而无需将300个文件读取到Excel中。基本文件夹是由VBA代码中的常量设置的。

但是还有其他方法可以做到这一点。

请确保为早期绑定设置引用，如代码中所述:

Option Explicit
'Set References to:
'  Microsoft Scripting Runtime
'  Microsoft VBscript Regular Expressions 5.5
Sub SearchMultipleTextFiles()
    Dim FSO As FileSystemObject
    Dim TS As TextStream, FO As Folder, FI As File, FIs As Files
    Dim RE As RegExp, MC As MatchCollection, M As Match
    Dim WS As Worksheet, RW As Long
    Const sPath As String = "C:UsersRonDesktop"
Set FSO = New FileSystemObject
Set FO = FSO.GetFolder(sPath)
Set WS = ActiveSheet
    WS.Columns.Clear
Set RE = New RegExp
With RE
    .Global = True
    .Pattern = "(?:d(?=S*[a-z])|[a-z](?=S*d))+S*[a-zd]"
    .IgnoreCase = True
End With
For Each FI In FO.Files
    If FI.Name Like "*.txt" Then
        Set TS = FI.OpenAsTextStream(ForReading)
        Do Until TS.AtEndOfStream
            'Change .ReadLine to .ReadAll *might* make this run faster
            ' but would need to be tested.
            Set MC = RE.Execute(Replace(TS.ReadLine, ",", " "))
            If MC.Count > 0 Then
                For Each M In MC
                    RW = RW + 1
                    WS.Cells(RW, 1) = M
                    WS.Cells(RW, 2) = FI.Name
                Next M
            End If
        Loop
    End If
Next FI
End Sub

相关内容

最新更新

热门标签：