C# - Reading Html?

I'm developing a program in C# and need some help. I'm trying to create an array or list of the items displayed on a certain website. What I'm trying to do is read the ancho

Solution 1:

As others have said, HtmlAgilityPack is the best choice for HTML parsing. Also be sure to download HAP Explorer from the HtmlAgilityPack site and use it to test your selections. This SelectNodes call gets every anchor that has an id starting with menu-item-:

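  // Requires: using System; using HtmlAgilityPack;
  HtmlDocument doc = new HtmlDocument();
  doc.Load(htmlFile);
  // SelectNodes returns null (not an empty collection) when nothing matches.
  var myNodes = doc.DocumentNode.SelectNodes("//a[starts-with(@id,'menu-item-')]");
  if (myNodes != null)
  {
    foreach (HtmlNode node in myNodes)
    {
      Console.WriteLine(node.Id);
    }
  }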

Solution 2:

If the HTML is valid XML, you can load it with the XmlDocument class and then access the pieces you want using XPath expressions, or you can use an XmlReader as Adriano suggests (a bit more work).
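As a rough illustration of the XmlDocument route, here is a minimal sketch. The markup, the class name and the attribute names (title, href) are made up for the example, and it only works because the sample string is well-formed XML:

```csharp
using System;
using System.Xml;

class XhtmlAnchors
{
    static void Main()
    {
        // Hypothetical, well-formed sample markup; real pages are often not valid XML.
        string xhtml =
            "<ul>" +
            "<li><a id=\"menu-item-1\" title=\"Home\" href=\"/home\">Home</a></li>" +
            "<li><a id=\"menu-item-2\" title=\"About\" href=\"/about\">About</a></li>" +
            "</ul>";

        var doc = new XmlDocument();
        doc.LoadXml(xhtml); // throws XmlException if the markup is not well-formed

        // XmlDocument.SelectNodes returns an empty list (not null) when nothing matches.
        XmlNodeList anchors = doc.SelectNodes("//a[starts-with(@id,'menu-item-')]");
        foreach (XmlNode a in anchors)
        {
            // Both attributes exist in the sample; a missing one would be null here.
            Console.WriteLine(a.Attributes["title"].Value + " -> " + a.Attributes["href"].Value);
        }
    }
}
```

If the page declares the XHTML namespace on its root element, the XPath would also need a prefix registered through XmlNamespaceManager, which this sketch leaves out.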

If the HTML is not valid XML, I'd suggest using one of the existing HTML parsers - see for example this - that worked OK for us.

Solution 3:

You can also use the HtmlAgilityPack.
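For instance, a sketch along these lines could collect the title and href of each anchor into a list. The sample markup, the class name and the tuple list are assumptions for illustration; only LoadHtml, SelectNodes and GetAttributeValue come from HtmlAgilityPack:

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

class AnchorList
{
    static void Main()
    {
        // Hypothetical sample markup standing in for the downloaded page.
        string html =
            "<ul>" +
            "<li><a id=\"menu-item-1\" title=\"Home\" href=\"/home\">Home</a></li>" +
            "<li><a id=\"menu-item-2\" title=\"About\" href=\"/about\">About</a></li>" +
            "</ul>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parses markup from a string instead of a file

        var items = new List<Tuple<string, string>>();
        var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors != null) // SelectNodes returns null when nothing matches
        {
            foreach (HtmlNode a in anchors)
            {
                // GetAttributeValue falls back to the supplied default if the attribute is missing.
                items.Add(Tuple.Create(a.GetAttributeValue("title", ""),
                                       a.GetAttributeValue("href", "")));
            }
        }

        foreach (var item in items)
        {
            Console.WriteLine(item.Item1 + " -> " + item.Item2);
        }
    }
}
```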

Solution 4:

I think this case is simple enough to use a regular expression, like <a.*title="([^"]*)".*href="([^"]*)":

string strRegex = @"<a.*title=""([^""]*)"".*href=""([^""]*)""";
RegexOptions myRegexOptions = RegexOptions.None;
Regex myRegex = new Regex(strRegex, myRegexOptions);

string strTargetString = ...;

foreach (Match myMatch in myRegex.Matches(strTargetString))
{
  if (myMatch.Success)
  {
    // Use the groups matched
  }
}
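To show what "use the groups matched" might look like, here is a small self-contained sketch. The input string and class name are hypothetical, and the pattern is a variation on the one above rather than the answer's exact regex: it swaps the greedy .* for [^>]*? so that a match cannot run across several tags on the same line. It still assumes title comes before href inside each tag.

```csharp
using System;
using System.Text.RegularExpressions;

class RegexAnchors
{
    static void Main()
    {
        // Hypothetical input with two anchors on one line.
        string input =
            "<a title=\"Home\" href=\"/home\">Home</a> <a title=\"About\" href=\"/about\">About</a>";

        // [^>]*? keeps each match inside a single tag, unlike a greedy .*
        var regex = new Regex(@"<a\b[^>]*?title=""([^""]*)""[^>]*?href=""([^""]*)""");

        foreach (Match m in regex.Matches(input))
        {
            // Groups[1] is the title capture, Groups[2] is the href capture.
            Console.WriteLine(m.Groups[1].Value + " -> " + m.Groups[2].Value);
        }
    }
}
```

With the greedy pattern, the same one-line input yields a single match that pairs the last title with the last href, so the lazy, tag-bounded form is the safer choice when several anchors share a line.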
