
PHP DOMDocument / XPath: Get HTML Text and Surrounding Tags

I am looking for this functionality. Given this HTML page:

Hello, world!

I want to get an array that onl

Solution 1:

You can iterate over the parentNodes of the DOMText nodes:

// $html is assumed to hold the page markup from the question.
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$textNodes = array();
foreach ($xpath->query('/html/body//text()') as $i => $textNode) {
    $textNodes[$i] = array(
        'text' => $textNode->nodeValue,
        'parents' => array()
    );
    // Walk up from the containing element; stop at the document node,
    // which has no parentNode.
    for (
        $currentNode = $textNode->parentNode;
        $currentNode->parentNode;
        $currentNode = $currentNode->parentNode
    ) {
        $textNodes[$i]['parents'][] = $currentNode->nodeName;
    }
}
print_r($textNodes);


Note that loadHTML adds implied elements, e.g. html and head, which you have to take into account when writing XPath expressions. Also note that any whitespace used for formatting becomes a DOMText node, so you will likely get more nodes than you expect. If you only want to query for non-empty DOMText nodes, use

/html/body//text()[normalize-space(.) != ""]
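
To make the two notes above concrete, here is a minimal sketch. The markup in $html is a hypothetical fragment chosen for illustration, not the page from the question. It has no html or body element of its own, yet the /html/body path still matches because loadHTML supplies them. How many whitespace-only nodes the unfiltered query returns can vary with the libxml version, so the sketch only relies on the filtered result.

```php
<?php
// Hypothetical sample fragment (assumption): no <html> or <body> in the input.
$html = "<div>\n  <p>Hello, <b>world</b>!</p>\n</div>";

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// The parser wrapped the fragment, so the root element is <html>.
echo $dom->documentElement->nodeName, "\n";

// All text nodes under body; this may include formatting whitespace.
$all = $xpath->query('/html/body//text()');

// Only text nodes with something other than whitespace in them.
$nonEmpty = $xpath->query('/html/body//text()[normalize-space(.) != ""]');

$texts = array();
foreach ($nonEmpty as $textNode) {
    $texts[] = $textNode->nodeValue;
}
print_r($texts);
```

The filter only decides which nodes are selected. The nodeValue of each selected node is returned unchanged, so leading or trailing spaces inside a non-empty text node are kept.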


Solution 2:

In your sample code, $res=$xpath->query("//body//*/text()") is a DOMNodeList of DOMText nodes. For each DOMText, you can access the containing element via the parentNode property.
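
A short sketch of that idea, assuming a hypothetical piece of sample markup in $html (the original page is not shown here). The $pairs array is a name introduced only for this illustration.

```php
<?php
// Hypothetical sample markup (assumption), used only for illustration.
$html = '<html><body><p>Hello, <b>world</b>!</p></body></html>';

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Same query as in the question's sample code.
$res = $xpath->query('//body//*/text()');

$pairs = array();
foreach ($res as $textNode) {
    // parentNode of a DOMText is the element that directly contains it.
    $pairs[] = array($textNode->parentNode->nodeName, $textNode->nodeValue);
}
print_r($pairs);
```

This gives only the immediate container of each text node. To collect the whole chain of ancestors, keep following parentNode until it is null, as Solution 1 does.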
