Get The Available XPaths Of An HTML Page?

I've taken and adapted this code that retrieves the XPath expressions of an XML document. I would like to do the same with an HTML page, retrieving its available XPaths (

Solution 1:

As far as I can see, HtmlAgilityPack has a class structure very similar to XmlDocument's, so I believe you can easily adapt the current solution to work with HtmlDocument, something like this:

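Imports System.Collections.Generic
Imports HtmlAgilityPack

' Returns every distinct element path (e.g. /html/body/div) in the document.
' Paths carry no positional index, so repeated siblings share one entry.
Public Function GetXPaths(ByVal Document As HtmlDocument) As List(Of String)
    Dim XPathList As New List(Of String)
    Dim XPath As String = String.Empty
    ' Start from the top-level element nodes (normally just <html>).
    For Each Child As HtmlNode In Document.DocumentNode.ChildNodes
        If Child.NodeType = HtmlNodeType.Element Then
            GetXPaths(Child, XPathList, XPath)
        End If
    Next ' child
    Return XPathList
End Function

' Appends this node's name to the parent path, records it once,
' then recurses into the element children (text and comment nodes are skipped).
Private Sub GetXPaths(ByVal Node As HtmlNode,
                      ByVal XPathList As List(Of String),
                      Optional ByVal XPath As String = Nothing)
    XPath &= "/" & Node.Name
    If Not XPathList.Contains(XPath) Then
        XPathList.Add(XPath)
    End If
    For Each Child As HtmlNode In Node.ChildNodes
        If Child.NodeType = HtmlNodeType.Element Then
            GetXPaths(Child, XPathList, XPath)
        End If
    Next ' child
End Sub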

This worked fine when tested against HTML that is XML-compliant, but I can't guarantee how well it will work against malformed HTML documents.