Get the Available XPaths of an HTML Page?
I've taken and adapted this code for retrieving the XPath expressions of an XML document. I would like to do the same with an HTML page, to retrieve its available XPaths (
Solution 1:
As far as I can see, HtmlAgilityPack has a class structure very similar to that of XmlDocument, so I believe you can easily adapt the current solution to work with HtmlDocument, something like this:
Imports HtmlAgilityPack

' Returns the distinct element paths (e.g. /html/body/div) found in the document.
Public Function GetXPaths(ByVal Document As HtmlDocument) As List(Of String)
    Dim XPathList As New List(Of String)
    Dim XPath As String = String.Empty
    For Each Child As HtmlNode In Document.DocumentNode.ChildNodes
        If Child.NodeType = HtmlNodeType.Element Then
            GetXPaths(Child, XPathList, XPath)
        End If
    Next ' child
    Return XPathList
End Function

' Appends this node's name to the path so far, records it, then recurses into child elements.
Private Sub GetXPaths(ByVal Node As HtmlNode,
                      ByRef XPathList As List(Of String),
                      Optional ByVal XPath As String = Nothing)
    XPath &= "/" & Node.Name
    If Not XPathList.Contains(XPath) Then
        XPathList.Add(XPath)
    End If
    For Each Child As HtmlNode In Node.ChildNodes
        If Child.NodeType = HtmlNodeType.Element Then
            GetXPaths(Child, XPathList, XPath)
        End If
    Next ' child
End Sub
This worked fine when I tested it with XML-compliant HTML, but I can't guarantee how well it will work against malformed HTML documents.
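If you want a rough idea of how a particular page behaves, one option is to look at what the parser reports before relying on the collected paths. The following is only a sketch under a few assumptions: the HtmlAgilityPack NuGet package is referenced, and the module name XPathDemo and the sample markup are made up for illustration. It does not call the function above. Instead it reads the library's ParseErrors collection and the XPath property that each HtmlNode exposes, which gives positional expressions rather than distinct name-only paths.

```vb
Imports System
Imports HtmlAgilityPack

Module XPathDemo
    Sub Main()
        ' Hypothetical sample markup with an unclosed <p> tag;
        ' whether the parser flags it is not guaranteed.
        Dim html As String = "<html><body><div><p>one<p>two</div></body></html>"

        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' ParseErrors lists whatever problems the parser noticed while loading.
        For Each parseError As HtmlParseError In doc.ParseErrors
            Console.WriteLine("Line {0}: {1}", parseError.Line, parseError.Reason)
        Next

        ' HtmlNode.XPath returns a positional XPath expression for the node.
        For Each node As HtmlNode In doc.DocumentNode.Descendants()
            If node.NodeType = HtmlNodeType.Element Then
                Console.WriteLine(node.XPath)
            End If
        Next
    End Sub
End Module
```

If the error list is empty for your input, the paths are more likely to match what you see in the source. If it is not, compare a few of the printed expressions against the page by hand before trusting the rest.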