如何在数据表c#中按需加载excel行



我有一个要求,在中我必须从Microsoft excel表中填写dataTable。

工作表可能有很多数据,因此要求在数据表上迭代foreach循环时,应该根据需要填充该表,该数据表应该包含Microsoft excel工作表中的数据。

这意味着,如果工作表中有1000000条记录,则数据表应根据循环中foreach当前项的当前位置,以100为一批的方式获取数据。

任何指示或建议都将不胜感激。

我建议您使用OpenXML来解析和读取文件中的excel数据。这也将允许您读出工作簿中的特定章节/区域。

您将在以下链接中找到更多信息和示例:Microsoft文档-分析和阅读大型电子表格文档(Open XML SDK(

这将比使用官方的microsoft office excel interop更高效、更容易开发。

**我不在装有Visual stuido的电脑附近,因此此代码未经测试,可能会出现语法错误,直到我稍后可以对其进行测试。

它仍然会给你需要做什么的主要想法。

private void ExcelDataPages(int firstRecord, int numberOfRecords)
{

Excel.Application dataApp = new Excel.Application(); 
Excel.Workbook dataWorkbook = new Excel.Workbook();
int x = 0;

dataWorkbook.DisplayAlerts = false;
dataWorkbook.Visible = false;
dataWorkbook.AutomationSecurity = Microsoft.Office.Core.MsoAutomationSecurity.msoAutomationSecurityLow;
dataWorkbook = dataApp.Open(@"C:TestYourWorkbook.xlsx");

try
{
Excel.Worksheet dataSheet = dataWorkbook.Sheet("Name of Sheet");

while (x < numberOfRecords)
{
Range currentRange = dataSheet.Rows[firstRecord + x]; //For all columns in row 

foreach (Range r in currentRange.Cells) //currentRange represents all the columns in the row
{
// do what you need to with the Data here.
}
x++;
}
}
catch (Exception ex)
{
//Enter in Error handling
}
dataWorkbook.Close(false); //Depending on how quick you will access the next batch of data, you may not want to close the Workbook, reducing load time each time.  This may also mean you need to move the open of the workbook to a higher level in your class, or if this is the main process of the app, make it static, stopping the garbage collector from destroying the connection.
dataApp.Quit();
}

尝试以下操作--它使用NuGet包DocumentFormat.OpenXml代码来自Using OpenXmlReader。但是,我修改了它,将数据添加到DataTable中。由于您要多次读取同一Excel文件中的数据,因此使用SpreadSheetDocument实例打开Excel文件一次并在完成后处理它会更快。由于在应用程序退出之前需要处理SpreedSheetDocument的实例,因此使用IDisposable

上面写着";ToDo";,您需要将创建DataTable列的代码替换为自己的代码,以便为项目创建正确的列。

我用一个包含大约15000行的Excel文件测试了下面的代码。当一次读取100行时,第一次读取花费大约500ms-800ms,而随后的读取花费大约100ms-400ms。

创建一个类(名称:HelperOpenXml(

HelperOpenXml.cs

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;
using System.Data;
using System.Diagnostics;
namespace ExcelReadSpecifiedRowsUsingOpenXml
{
public class HelperOpenXml : IDisposable
{
public string Filename { get; private set; } = string.Empty;
public int RowCount { get; private set; } = 0;
private SpreadsheetDocument spreadsheetDocument = null;
private DataTable dt = null;

public HelperOpenXml(string filename)
{
this.Filename = filename;
}
public void Dispose()
{
if (spreadsheetDocument != null)
{
try
{
spreadsheetDocument.Dispose();
dt.Clear();
}
catch(Exception ex)
{
throw ex;
}
}
}
public DataTable GetRowsSax(int startRow, int endRow, bool firstRowIsHeader = false)
{
int startIndex = startRow;
int endIndex = endRow;
if (firstRowIsHeader)
{
//if first row is header, increment by 1
startIndex = startRow + 1;
endIndex = endRow + 1;
}
if (spreadsheetDocument == null)
{
//create new instance
spreadsheetDocument = SpreadsheetDocument.Open(Filename, false);
//create new instance
dt = new DataTable();
//ToDo: replace 'dt.Columns.Add(...)' below with your code to create the DataTable columns
//add columns to DataTable
dt.Columns.Add("A");
dt.Columns.Add("B");
dt.Columns.Add("C");
dt.Columns.Add("D");
dt.Columns.Add("E");
dt.Columns.Add("F");
dt.Columns.Add("G");
dt.Columns.Add("H");
dt.Columns.Add("I");
dt.Columns.Add("J");
dt.Columns.Add("K");
}
else
{
//remove existing data from DataTable
dt.Rows.Clear(); 
}
WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
int numWorkSheetParts = 0;
foreach (WorksheetPart worksheetPart in workbookPart.WorksheetParts)
{
using (OpenXmlReader reader = OpenXmlReader.Create(worksheetPart))
{
int rowIndex = 0;
//use the reader to read the XML
while (reader.Read())
{
if (reader.ElementType == typeof(Row))
{
reader.ReadFirstChild();
List<string> cValues = new List<string>();
int colIndex = 0;
do
{
//only get data from desired rows
if ((rowIndex > 0 && rowIndex >= startIndex && rowIndex <= endIndex) ||
(rowIndex == 0 && !firstRowIsHeader && rowIndex >= startIndex && rowIndex <= endIndex))
{
if (reader.ElementType == typeof(Cell))
{
Cell c = (Cell)reader.LoadCurrentElement();
string cellRef = c.CellReference; //ex: A1, B1, ..., A2, B2
string cellValue = string.Empty;
//string/text data is stored in SharedString
if (c.DataType != null && c.DataType == CellValues.SharedString)
{
SharedStringItem ssi = workbookPart.SharedStringTablePart.SharedStringTable.Elements<SharedStringItem>().ElementAt(int.Parse(c.CellValue.InnerText));
cellValue = ssi.Text.Text;
}
else
{
cellValue = c.CellValue.InnerText;
}
//Debug.WriteLine("{0}: {1} ", c.CellReference, cellValue);
//add value to List which is used to add a row to the DataTable
cValues.Add(cellValue);
}
}
colIndex += 1; //increment
} while (reader.ReadNextSibling());
if (cValues.Count > 0)
{
//if List contains data, use it to add row to DataTable
dt.Rows.Add(cValues.ToArray()); 
}
rowIndex += 1; //increment
if (rowIndex > endIndex)
{
break; //exit loop
}
}
}
}
numWorkSheetParts += 1; //increment
}
DisplayDataTableData(dt); //display data in DataTable
return dt;
}

private void DisplayDataTableData(DataTable dt)
{
foreach (DataColumn dc in dt.Columns)
{
Debug.WriteLine("colName: " + dc.ColumnName);
}
foreach (DataRow r in dt.Rows)
{
Debug.WriteLine(r[0].ToString() + " " + r[1].ToString());
}
}
}
}

用法

private string excelFilename = @"C:TempTest.xlsx";
private HelperOpenXml helperOpenXml = null;
...
private void GetData(int startIndex, int endIndex, bool firstRowIsHeader)
{
helperOpenXml.GetRowsSax(startIndex, endIndex, firstRowIsHeader);
}

注意:请确保在应用程序退出之前调用Dispose()(例如:helperOpenXml.Dispose();(。

更新

OpenXML将日期存储为自1900年1月1日以来的天数。对于1900年1月1日之前的日期,它们存储在SharedString中。有关更多信息,请参阅使用打开的xml sdk 从xlsx读取日期

下面是一个代码片段:

Cell c = (Cell)reader.LoadCurrentElement();
...
string cellValue = string.Empty
...
cellValue = c.CellValue.InnerText;
double dateCellValue = 0;
Double.TryParse(cellValue, out dateCellValue);
DateTime dt = DateTime.FromOADate(dateCellValue);
cellValue = dt.ToString("yyyy/MM/dd");

另一个简单的替代方案是:查看NUGET包ExcelDataReader,并提供有关https://github.com/ExcelDataReader/ExcelDataReader

用法示例:

[Fact] 
void Test_ExcelDataReader() 
{

System.Text.Encoding.RegisterProvider(System.Text.CodePagesEncodingProvider.Instance);
var scriptPath = Path.GetDirectoryName(Util.CurrentQueryPath); // LinqPad script path
var filePath = $@"{scriptPath}TestExcel.xlsx";
using (var stream = File.Open(filePath, FileMode.Open, FileAccess.Read))
{
// Auto-detect format, supports:
//  - Binary Excel files (2.0-2003 format; *.xls)
//  - OpenXml Excel files (2007 format; *.xlsx, *.xlsb)
using (var reader = ExcelDataReader.ExcelReaderFactory.CreateReader(stream))
{
var result = reader.AsDataSet();
// The result of each spreadsheet is in result.Tables
var t0 = result.Tables[0];
Assert.True(t0.Rows[0][0].Dump("R0C0").ToString()=="Hello", "Expected 'Hello'");
Assert.True(t0.Rows[0][1].Dump("R0C1").ToString()=="World!", "Expected 'World!'");          
} // using
} // using
} // fact

在你开始阅读之前,你需要设置和编码提供者如下:

System.Text.Encoding.RegisterProvider(
System.Text.CodePagesEncodingProvider.Instance);

单元格的寻址方式如下:

var t0 = result.Tables[0]; // table 0 is the first worksheet
var cell = t0.Rows[0][0];  // on table t0, read cell row 0 column 0

您可以很容易地在for循环中循环行和列,如下所示:

for (int r = 0; r < t0.Rows.Count; r++)
{
var row = t0.Rows[r];
var columns = row.ItemArray;
for (int c = 0; c < columns.Length; c++)
{
var cell = columns[c];
cell.Dump();
}
}

我将此代码与EPPlus DLL一起使用,不要忘记添加引用。但应检查是否符合您的要求。

public DataTable ReadExcelDatatable(bool hasHeader = true)
{
using (var pck = new OfficeOpenXml.ExcelPackage())
{
using (var stream = File.OpenRead(this._fullPath))
{
pck.Load(stream);
}
var ws = pck.Workbook.Worksheets.First();
DataTable tbl = new DataTable();
int i = 1;
foreach (var firstRowCell in ws.Cells[1, 1, 1, ws.Dimension.End.Column])
{
//table head
tbl.Columns.Add(hasHeader ? firstRowCell.Text : string.Format("Column {0}", firstRowCell.Start.Column));
tbl.Columns.Add(_tableHead[i]);
i++;
}
var startRow = hasHeader ? 2 : 1;

for (int rowNum = startRow; rowNum <= ws.Dimension.End.Row; rowNum++)
{
var wsRow = ws.Cells[rowNum, 1, rowNum, ws.Dimension.End.Column];
DataRow row = tbl.Rows.Add();
foreach (var cell in wsRow)
{
row[cell.Start.Column - 1] = cell.Text;
}
}
return tbl;
}
}

我会给你一个不同的答案。如果将一百万行加载到DataTable中的性能不好,请使用驱动程序加载数据:如何有效地打开一个巨大的excel文件

DataSet excelDataSet = new DataSet();
string filePath = @"c:tempBigBook.xlsx";
// For .XLSXs we use =Microsoft.ACE.OLEDB.12.0;, for .XLS we'd use Microsoft.Jet.OLEDB.4.0; with  "';Extended Properties="Excel 8.0;HDR=YES;"";
string connectionString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source='" + filePath + "';Extended Properties="Excel 12.0;HDR=YES;"";
using (OleDbConnection conn = new OleDbConnection(connectionString))
{
conn.Open();
OleDbDataAdapter objDA = new System.Data.OleDb.OleDbDataAdapter
("select * from [Sheet1$]", conn);
objDA.Fill(excelDataSet);
//dataGridView1.DataSource = excelDataSet.Tables[0];
}

接下来使用DataView筛选数据集的DataTable。使用DataView的RowFilter属性,可以根据行的列值指定行的子集。

DataView prodView = new DataView(excelDataSet.Tables[0],  
"UnitsInStock <= ReorderLevel",  
"SupplierID, ProductName",  
DataViewRowState.CurrentRows); 

参考编号:https://www.c-sharpcorner.com/article/dataview-in-C-Sharp/

或者您可以直接使用DataTables的DefaultView RowFilter:

excelDataSet.Tables[0].DefaultView.RowFilter = "Amount >= 5000 and Amount <= 5999 and Name = 'StackOverflow'";