我是新的VBA。在把我的问题贴在这里之前,我花了将近3天的时间上网。
我有300多个文本文件(使用OCR从PDF转换的文本),从文本文件。我需要得到包含"字母"one_answers"数字"的所有单词(例如KT315A, KT-315-a等)以及源参考(txt文件名)。
我需要的是
1。添加"智能过滤器",只复制包含
的单词"字母"one_answers"数字"
-
将复制的数据粘贴到A列
-
在B列添加参考文件名
我发现下面的代码可以复制所有数据从文本文件到excel电子表格。
文本文件看起来像
"线从252年ddddd - 552 a , ,,, @,@, rrrr, 22岁,…"Kt3443, fff…等"
XLS的最终结果应该是
A | B
252A-552A | file1
kt3443 | file1
Option Explicit
Const sPath = "C:outp" 'remember end backslash
Const delim = "," 'comma delimited text file - EDIT
'Const delim = vbTab 'for TAB delimited text files
Sub ImportMultipleTextFiles()
Dim wb As Workbook
Dim sFile As String
Dim inputRow As Long
RefreshSheet
On Error Resume Next
sFile = Dir(sPath & "*.txt")
Do Until sFile = ""
inputRow = Sheets("Temp").Range("A" & Rows.Count).End(xlUp).Row + 1
'open the text file
'format=6 denotes a text file
Set wb = Workbooks.Open(Filename:=sPath & sFile, _
Format:=6, _
Delimiter:=delim)
'copy and paste
wb.Sheets(1).Range("A1").CurrentRegion.Copy _
Destination:=ThisWorkbook.Sheets("Temp").Range("A" & inputRow)
wb.Close SaveChanges:=False
'get next text file
sFile = Dir()
Loop
Set wb = Nothing
End Sub
Sub RefreshSheet()
'delete old sheet and add a new one
On Error Resume Next
Application.DisplayAlerts = False
Sheets("Temp").Delete
Application.DisplayAlerts = True
Worksheets.Add
ActiveSheet.Name = "Temp"
On Error GoTo 0
End Sub
谢谢!
从您的示例中确切地判断一个单词的构成有点困难。它显然可以包含字母和数字以外的字符(例如破折号),但有些项目前面有点,因此它不能被定义为由space
分隔。
我定义了一个"word"作为字符串
- 以字母或数字开头,以字母或数字结尾
- 由字母和数字组成
- 还可以包含除逗号以外的任何其他非空格字符
此外,通过使用FileSystemObject
,我们可以一次处理一行,而不是将整个文件读取到Excel工作簿中,而无需将300个文件读取到Excel中。基本文件夹是由VBA代码中的常量设置的。
但是还有其他方法可以做到这一点。
请确保为早期绑定设置引用,如代码中所述:
Option Explicit
'Set References to:
' Microsoft Scripting Runtime
' Microsoft VBscript Regular Expressions 5.5
Sub SearchMultipleTextFiles()
Dim FSO As FileSystemObject
Dim TS As TextStream, FO As Folder, FI As File, FIs As Files
Dim RE As RegExp, MC As MatchCollection, M As Match
Dim WS As Worksheet, RW As Long
Const sPath As String = "C:UsersRonDesktop"
Set FSO = New FileSystemObject
Set FO = FSO.GetFolder(sPath)
Set WS = ActiveSheet
WS.Columns.Clear
Set RE = New RegExp
With RE
.Global = True
.Pattern = "(?:d(?=S*[a-z])|[a-z](?=S*d))+S*[a-zd]"
.IgnoreCase = True
End With
For Each FI In FO.Files
If FI.Name Like "*.txt" Then
Set TS = FI.OpenAsTextStream(ForReading)
Do Until TS.AtEndOfStream
'Change .ReadLine to .ReadAll *might* make this run faster
' but would need to be tested.
Set MC = RE.Execute(Replace(TS.ReadLine, ",", " "))
If MC.Count > 0 Then
For Each M In MC
RW = RW + 1
WS.Cells(RW, 1) = M
WS.Cells(RW, 2) = FI.Name
Next M
End If
Loop
End If
Next FI
End Sub