Converting an HTML table to CSV with HTML Agility Pack



I want to extract all the elements from the table with id="statsTable" and get all of its data, so that I can then write it to a CSV file.

Here is what I have written so far:

using System;
using System.IO;
using System.Linq;
using System.Net;
using HtmlAgilityPack;

// Create a request for the URL.
WebRequest request = WebRequest.Create("http://www.pgatour.com/stats/stat.120.html");
Console.WriteLine("Requesting data from: http://www.pgatour.com/stats/stat.120.html");
// If required by the server, set the credentials.
request.Credentials = CredentialCache.DefaultCredentials;
WebResponse response = request.GetResponse();
using (Stream stream = response.GetResponseStream())
{
    StreamReader reader = new StreamReader(stream);
    // Read the whole response body (the HTML) into a string
    String responseString = reader.ReadToEnd();
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(responseString);
    var desktopFolder = Environment.GetFolderPath(Environment.SpecialFolder.DesktopDirectory);
    var fullFileName = Path.Combine(desktopFolder, "GolfStats.csv");
    using (var PlayerFile = new StreamWriter(fullFileName))
    {
        PlayerFile.WriteLine("Data downloaded: " + DateTime.Now);
        var myTable = doc.DocumentNode
                        .Descendants("table")
                        .Where(table => table.Attributes.Contains("id"))
                        .SingleOrDefault(table => table.Attributes["id"].Value == "statsTable");
        var myTableValues = myTable.Descendants("td");
        foreach (var tdV in myTableValues)
        {
            PlayerFile.WriteLine(tdV.InnerText);
            Console.WriteLine(tdV.InnerText);
        }
        PlayerFile.Flush();
    }
}

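The problem is that my CSV just lists all the data in a single column, and it also picks up an advertisement placed inside the table (see the URL in the WebRequest). If you could help me output the data in a tabular format, that would be great!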

Your code writes a new line for every table cell. To change it so that each table row gets its own line, replace

var myTableValues = myTable.Descendants("td");
foreach (var tdV in myTableValues)
{
    PlayerFile.WriteLine(tdV.InnerText);
    Console.WriteLine(tdV.InnerText);
}

with

var myTableRows = myTable.Descendants("tr").Where(tr => tr.Attributes.Contains("id"));
foreach (var tr in myTableRows)
{
    string line = string.Join(";", tr.Descendants("td").Select(td => td.InnerText));
    PlayerFile.WriteLine(line);
    Console.WriteLine(line);
}

The `.Where(tr => tr.Attributes.Contains("id"))` filter removes the advertisement, because the table row containing the ad has no id, whereas every player row does.
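One thing the `string.Join(";", ...)` line does not handle: `InnerText` can still contain HTML entities such as `&amp;` or `&nbsp;`, and a cell whose text contains the separator or a double quote will shift the columns when the file is opened in a spreadsheet. The following is a minimal sketch, assuming RFC 4180-style quoting is what you want; `CsvHelper`, `EscapeField` and `ToCsvLine` are hypothetical names (not part of HTML Agility Pack), and entity decoding uses the standard library's `System.Net.WebUtility.HtmlDecode`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;

// Hypothetical helper (not part of HtmlAgilityPack): turns the cell texts
// of one table row into a single CSV line.
public static class CsvHelper
{
    // Decode HTML entities (e.g. "&amp;"), trim surrounding whitespace, and
    // wrap the field in quotes if it contains the separator, a double quote,
    // or a line break.
    public static string EscapeField(string rawText, char separator = ',')
    {
        string text = WebUtility.HtmlDecode(rawText ?? string.Empty).Trim();
        bool needsQuotes = text.IndexOf(separator) >= 0
                           || text.IndexOf('"') >= 0
                           || text.IndexOf('\n') >= 0
                           || text.IndexOf('\r') >= 0;
        // Embedded double quotes are escaped by doubling them (RFC 4180).
        return needsQuotes ? "\"" + text.Replace("\"", "\"\"") + "\"" : text;
    }

    // Escape each already-extracted cell text and join them with the separator.
    public static string ToCsvLine(IEnumerable<string> cells, char separator = ',')
    {
        return string.Join(separator.ToString(), cells.Select(c => EscapeField(c, separator)));
    }
}
```

Inside the `foreach` from the answer, the line would then become `string line = CsvHelper.ToCsvLine(tr.Descendants("td").Select(td => td.InnerText), ';');`, keeping the semicolon separator.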
