使用 SQL 查询导入具有重复列名称的 CSV



我有 3 个 csv 文件位于不同的网络文件夹中。网络文件夹/子文件夹中可能包含空格。我想联接这 3 个 csv 文件以创建包含所需列的单个 ADO 记录集。

Test1.csv(我已经从所有csv中排除了不必要的列(

T1Id  | Gpos | lbl
-----------------------
1001  |  0   | Innovate
1002  |  1   | Buys
1003  |  2   | Sales
1004  |  3   | Forecasts
1005  |  4   | Usage
1006  |  5   | Forum

Test2.csv:(我已经从所有csv中排除了不必要的列(

T2Id |  T1Id |  Apos | tval
-----------------------------------
382  |  1001 |  1   | my life my rules.
203  |  1001 |  2   | earth wind rain and fire.
658  |  1002 |  1   | wealth power blood desire.
200  |  1003 |  1   | one good to live for.
301  |  1003 |  2   | before we die.
439  |  1004 |  1   | one taste to glory
795  |  1004 |  2   | one mouthful of sky.
494  |  1004 |  3   | some other text.

Test3.csv:(我已经从所有csv中排除了不必要的列(

(blank)  Aggregate (blank) Aggregate  149_SG_Bryl_Cream   891_SG_Myo__Sky_Blue_Dress
------------------------------------------------------------------------------
X0.1     0.422300          0.424658    0.458014          0.434639
X0.2     0.318628          0.345475    0.334548          0.333675
X0.3     0.274694          0.274643    0.243424          0.286865
X0.4     0.294568          0.346758    0.276552          0.366648
X1.1     0.565734          0.293436    0.283564          0.235366
X1.2     0.286657          0.755456    0.283233          0.310544
X2.1     0.234643          0.245459    0.245434          0.343423
X2.2     0.343645          0.455659    0.343282          0.334343
X2.3     0.234643          0.245459    0.245434          0.343423

如您所见,Test3.csv 有 4 个问题:

  1. 它具有带有空白标题的列,但包含数据,例如X0.1.
  2. 它具有带有空白标题的列,但带有no data.
  3. 它有 2 个同名的Aggregate列。
  4. 它有SG列(许多(,我需要提取only the part starting from 'SG'.

要求

  1. Test1&Test2CSV 需要加入T1Id
  2. 只需要保留这些文件的lbl列和tval列。
  3. AposGpos将用于连接从文件中的第一个空白标题列(包含X0.1等值(创建的 2 列Test3.csv

法典:

Sub Doit(cFiles As Collection)

Application.DisplayAlerts = False
Application.ScreenUpdating = False

Dim strSQL1$, TempSG$,sFullDirectory$

sFullDirectory = `\xxx.xxx.xxxclient name`

strFF1 = "Test1.csv"
strFF2 = "Test2.csv"
strFF3 = "Test3.csv"

' Test1.csv Path: `\xxx.xxx.xxxclient nameXYZsub-folder1 name`
strF1 = cFiles(strFF1) 
' Test1.csv Path: `\xxx.xxx.xxxclient nameABCsub-folder2 name`
strF2 = cFiles(strFF2)
' Test1.csv Path: `\xxx.xxx.xxxclient nameDEFGHIsub-folder1 name`
strF3 = cFiles(strFF3)
Set oCon = CreateObject("ADODB.Connection")
Set oRs = CreateObject("ADODB.Recordset")
strCon = "Driver=Microsoft Access Text Driver (*.txt, *.csv);Dbq=" & sFullDirectory & ";Extensions=asc,csv,tab,txt;HDR=Yes;"

' Select TOP 1 row with headers from `Test3.csv`
strSQL = "SELECT TOP 1 * FROM " & strF3 & strFF3
oCon.Open strCon
Set oRs = oCon.Execute(strSQL)

i = 1
strSQL = "SELECT "
For Each Fld In oRs.Fields
Select Case True
Case Is = Fld.Name = "NoName"  'Blank header columns
If Fld.Value <> vbNullString Then
strSQL = strSQL & " CLng(Replace(Left(" & Fld.Name & ", InStr(" & Fld.Name & ", '.') - 1), 'X', ''))" & " AS [gval],"
strSQL = strSQL & " CLng(Right(" & Fld.Name & ", Len(" & Fld.Name & ") - InStr(" & Fld.Name & ", '.')))" & " AS [pos],"
Else
' Do nothing here
End If
Case Is = Fld.Name = "Aggregate"
strSQL = strSQL & " CDbl([" & Fld.Name & "]) AS [Aggregate " & i & "],"
i = i + 1
Case Is = InStr(1, Fld.Name, "SG") > 0
TempSG = Trim(Mid(Fld.Name, InStr(1, Fld.Name, "SG"), Len(Fld.Name)))
strSQL = strSQL & " CDbl([" & Fld.Name & "]) AS [" & TempSG & "], "
End Select
Next Fld

If Right(Trim(strSQL), 1) = "," Then strSQL = Left(Trim(strSQL), Len(Trim(strSQL)) - 1)
strSQL = strSQL & " FROM " & strF3 & strFF3
strSQL = strSQL & " WHERE ((NoName) <> 'Base Sizes')"

oRs.Close

' This `strSQL1` will be used to join `strSQL`.
strSQL1 = "SELECT  G.[lbl], A.[tval], Q.*"
strSQL1 = strSQL1 & " FROM "
strSQL1 = strSQL1 & " (SELECT G.[Gpos], A.[Apos], G.[lbl], A.[tval] FROM " & strF1 & strFF1 & "  G," & strF2 & strFF2 & "  A WHERE G.[T1Id] = A.[T1Id])  T, (" & strSQL & ")  Q "
strSQL1 = strSQL1 & " WHERE (CLng(T.[G].[Gpos]) = CLng(Q.[gval])) AND (CLng(T.[A].[Apos]) = CLng(Q.[pos]))"
strSQL1 = strSQL1 & " ORDER BY CLng(Q.[gval]), CDbl(Q.[Aggregate 1]) DESC, G.[lbl];"

'CREATE RECORDSET FROM SQL STRING
Set oRs = oCon.Execute(strSQL1)

ExitSub:
oRs.Close
oCon.Close
Set oRs = Nothing
Set oCon = Nothing

Application.ScreenUpdating = True
Application.DisplayAlerts = True

Exit Sub
ErrorHandler:
MsgBox "Error No: " & Err.Number & vbCrLf & "Description: " & Err.Description, vbCritical + vbOKOnly, "An Error occurred!"
Err.Clear
On Error GoTo 0
Resume ExitSub
End Sub

拆分测试3后.csv表如下所示:

gval   pos    (blank)  Aggregate (blank) Aggregate  149_SG_Bryl_Cream   891_SG_Myo__Sky_Blue_Dress
------------------------------------------------------------------------------
0      1     0.422300          0.424658    0.458014          0.434639
0      2     0.318628          0.345475    0.334548          0.333675
0      3     0.274694          0.274643    0.243424          0.286865
0      4     0.294568          0.346758    0.276552          0.366648
1      1     0.565734          0.293436    0.283564          0.235366
1      2     0.286657          0.755456    0.283233          0.310544
2      1     0.234643          0.245459    0.245434          0.343423
2      2     0.343645          0.455659    0.343282          0.334343
2      3     0.234643          0.245459    0.245434          0.343423

最终表:(简短示例(

lbl       txval                     gval    pos  Aggregate 1       Aggregate 2   SG_Bryl_Cream    SG_Myo__Sky_Blue_Dress
-------------------------------------------------------------------------------------------------------------------
Innovate  My life my rules.          0      1    0.422300          0.424658    0.458014          0.434639
Innovate  earth wind rain and fire.  0      2    0.318628          0.345475    0.334548          0.333675
Buys      my life my rules.          1      1    0.565734          0.293436    0.283564          0.235366
Buys      earth wind rain and fire.  1      2    0.286657          0.755456    0.283233          0.310544
Sales     my life my rules.          2      1    0.234643          0.245459    0.245434          0.343423
Sales     earth wind rain and fire.  2      2    0.343645          0.455659    0.343282          0.334343
Sales     Some other text.           2      3    0.234643          0.245459    0.245434          0.343423
...

问题

  1. 有没有办法通过SQL获取重复列Aggregate列?
  2. 有没有办法从Q中仅选择SG列。Test3.csv通过 SQL ?

例如

strSQL1 = "SELECT  G.[lbl], A.[tval], Q.* "

相反:

strSQL1 = "SELECT  G.[lbl], A.[tval], Q.* LIKE 'SG' "

考虑通过使用所需的列别名直接查询 CSV 文件来运行纯 SQL:

SELECT t.Food, t.Bev, t.Meds, t.[Average], t.Midpoint, t.Average AS [OtherAverage]
FROM [text;database=C:FolderToCSV With Spaces].[my File.csv] AS t;

此外,查询可以集成到操作查询中:

生成表查询

SELECT t.Food, t.Bev, t.Meds, t.[Average], t.Midpoint, t.Average AS [OtherAverage]
INTO [myNewtable]
FROM [text;database=C:FolderToCSV With Spaces].[my File.csv] AS t;

追加查询

INSERT INTO myFinalTable (Food, Bev, Meds, Average, Midpoint, OtherAverage)
SELECT t.Food, t.Bev, t.Meds, t.Average, t.Midpoint, t.Average AS [OtherAverage]
FROM [text;database=C:FolderToCSV With Spaces].[my File.csv] AS t;

若要在 VBA 中使用 ADO 运行,请将 Jet/ACE SQL 引擎与 Excel 或 Access ODBC 驱动程序一起使用,其中工作簿或数据库文件源无关紧要,因为您远程连接到 CSV:

Set conn = CreateObject("ADODB.Connection")
Set rst = CreateObject("ADODB.Recordset")
' EXCEL DRIVER
conn.Open "Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};" _
& "DBQ=" & ThisWorkbook.FullName & ";"
rst.Open "SELECT t.Food, t.Bev, t.Meds, t.[Average], t.Midpoint, t.Average AS [OtherAverage] " _
& " FROM [text;database=C:FolderToCSV With Spaces].[my File.csv] AS t", conn 
' ACCESS DRIVER
conn.Open "Driver={Microsoft Access Driver (*.mdb, *.accdb)};" _
& "DBQ=C:PathToAnyDatabase.accdb"
rst.Open "SELECT t.Food, t.Bev, t.Meds, t.[Average], t.Midpoint, t.Average AS [OtherAverage] " _
& " FROM [text;database=C:FolderToCSV With Spaces].[my File.csv] AS t", conn 

使用 Access ODBC 文本驱动程序

Set oCon = CreateObject("ADODB.Connection")
Set oRs = CreateObject("ADODB.Recordset")
sFullDirectory = "C:FolderToCSV With Spaces"
strCon = "Driver=Microsoft Access Text Driver (*.txt, *.csv);" _
& "Dbq=" & sFullDirectory & ";Extensions=asc,csv,tab,txt;HDR=Yes;"
strSQL = "SELECT t.Food, t.Bev, t.Meds, t.[Average], t.Midpoint, t.Average AS [OtherAverage] " _
& " FROM [my File.csv] AS t"
oCon.Open strCon
Set oRs = oCon.Execute(strSQL)

最新更新