How to remove empty cells and rows from a file using ExcelDataReader?



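I'm reading an Excel file with ExcelDataReader, but the file I'm processing has 16k columns and 1,200 records. Only 52 of those columns are filled and the rest are empty, so I want to remove the empty columns along with the empty rows.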

I tried:

var dataset = reader.AsDataSet(new ExcelDataSetConfiguration() {
    ConfigureDataTable = _ => new ExcelDataTableConfiguration() {
        FilterRow = rowReader => rowReader.Depth != 0
    }
});

Any help would be appreciated, thanks.

AFAIK, there isn't any simple way to filter empty columns or rows. My best try is:

// use ExcelDataTableConfiguration.FilterRow to filter empty rows
FilterRow = rowReader =>
{
    var hasData = false;
    for (var i = 0; i < rowReader.FieldCount; i++)
    {
        if (rowReader[i] == null || string.IsNullOrEmpty(rowReader[i].ToString()))
        {
            continue;
        }
        hasData = true;
        break;
    }
    return hasData;
},
// use ExcelDataTableConfiguration.FilterColumn to filter empty columns
FilterColumn = (rowReader, colIndex) =>
{
    var hasData = false;
    rowReader.Reset();
    // this skips the first row, as it holds the column names
    rowReader.Read();
    while (rowReader.Read())
    {
        if (rowReader[colIndex] == null ||
            string.IsNullOrEmpty(rowReader[colIndex].ToString()))
        {
            continue;
        }
        hasData = true;
        break;
    }
    // the code below does the trick: put the reader back where it was
    rowReader.Reset();
    rowReader.Read();
    return hasData;
}

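Using the IExcelDataReader arguments inside FilterRow or FilterColumn is evil!
I think the main reader and each rowReader refer to one and the same object, so you should always be careful about how you use them. As you can see, I added a trick to put the reader back into a usable state after using it. - HTH ;)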

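With a little inspiration from ShA.t's answer, I was able to make the header row and the columns be read from exactly where they start in the Excel document. As long as you know where the rows start, you only need to filter the empty columns. Here is some sample code: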

using System.IO;
using ExcelDataReader;

public class TrimmedTableSample
{
    #region Properties
    int HeaderRowIndex { get; set; }
    #endregion

    #region Methods
    public void Read(string documentPath)
    {
        using (var stream = File.Open(documentPath, FileMode.Open, FileAccess.Read))
        using (var reader = ExcelReaderFactory.CreateReader(stream))
        {
            var dataSet = reader.AsDataSet(new ExcelDataSetConfiguration()
            {
                UseColumnDataType = true,
                ConfigureDataTable = (tableReader) => new ExcelDataTableConfiguration()
                {
                    EmptyColumnNamePrefix = "Column ",
                    UseHeaderRow = true,
                    // The lambda parameter is named rowReader so that it does not clash with the outer reader.
                    ReadHeaderRow = (rowReader) =>
                    {
                        bool empty = true;
                        HeaderRowIndex = 0;
                        while (empty)
                        {
                            for (var i = 0; i < rowReader.FieldCount && empty; i++)
                                empty = string.IsNullOrWhiteSpace(rowReader.GetString(i));
                            if (empty)
                            {
                                empty = rowReader.Read(); // Only continue if more content is available
                                HeaderRowIndex++; // Keep track of the first row position.
                            }
                        }
                    },
                    FilterColumn = (rowReader, index) =>
                    {
                        bool hasData = false;
                        string sheet = rowReader.Name;
                        // Start reading the table from the beginning
                        rowReader.Reset();
                        // EDIT: Move to our current Excel sheet
                        while (rowReader.Name != sheet)
                            if (!rowReader.NextResult())
                                break;

                        // Move to the first row with content
                        int rowIndex = 0;
                        while (rowIndex < HeaderRowIndex)
                        {
                            rowReader.Read();
                            rowIndex++;
                        }
                        while (rowReader.Read())
                        {
                            // Skip this row if the current column is empty in it
                            if (rowReader[index] == null || string.IsNullOrEmpty(rowReader[index].ToString()))
                                continue;
                            hasData = true;
                            break;
                        }
                        // Start over again (this allows the reader to read the rest of the content itself)
                        rowReader.Reset();
                        // EDIT: Move to our current Excel sheet
                        while (rowReader.Name != sheet)
                            if (!rowReader.NextResult())
                                break;
                        rowReader.Read();
                        // Move to the first row with content
                        rowIndex = 0;
                        while (rowIndex < HeaderRowIndex)
                        {
                            rowReader.Read();
                            rowIndex++;
                        }
                        // Return true to keep this column, false to ignore it.
                        return hasData;
                    }
                }
            });
        }
    }
    #endregion
}

Good luck!

I use this solution, which works by calling the DataRow.Delete() method:

foreach (var row in rows)
{
    var emptyColumnsCount = 0;
    for (var i = 0; i < columns.Count; i++)
    {
        if (!row.IsNull(columns[i]))
        {
            // Do some logic to read row.
        }
        else
        {
            emptyColumnsCount++;
        }
    }
    if (emptyColumnsCount == columns.Count)
    {
        row.Delete();
    }
    else
    {
        // Do some logic to use row
    }
}
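
For comparison, the same post-processing idea can be extended to empty columns as well as empty rows. This is only a minimal sketch and is not taken from any of the answers above: it assumes the sheet has already been loaded into a DataTable (for example via AsDataSet), and the names DataTableTrimmer and RemoveEmptyRowsAndColumns are hypothetical. It uses only System.Data, and it removes rows with Rows.RemoveAt, which drops them from the table immediately, rather than with DataRow.Delete().

```csharp
using System;
using System.Data;
using System.Linq;

// Hypothetical helper, not part of ExcelDataReader.
public static class DataTableTrimmer
{
    // Treats null, DBNull and empty/whitespace-only strings as "empty".
    private static bool IsEmpty(object value) =>
        value == null || value == DBNull.Value || string.IsNullOrWhiteSpace(value.ToString());

    public static void RemoveEmptyRowsAndColumns(DataTable table)
    {
        // Walk backwards so that removing an item does not shift
        // the indexes that are still to be visited.
        for (var r = table.Rows.Count - 1; r >= 0; r--)
        {
            if (table.Rows[r].ItemArray.All(IsEmpty))
                table.Rows.RemoveAt(r);
        }

        // A column is dropped when every remaining row is empty in it.
        // Note: if no rows remain, every column counts as empty and is removed.
        for (var c = table.Columns.Count - 1; c >= 0; c--)
        {
            var column = table.Columns[c];
            if (table.Rows.Cast<DataRow>().All(row => IsEmpty(row[column])))
                table.Columns.RemoveAt(c);
        }
    }
}
```

Calling DataTableTrimmer.RemoveEmptyRowsAndColumns(dataset.Tables[0]) after AsDataSet would trim the first sheet in place. The trade-off is that all 16k columns are loaded into memory first, so for very wide sheets the FilterColumn approach shown earlier is likely the lighter option.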
