Kotchasan Framework Documentation

05 Feb 2026 07:48

DOMParser

\Kotchasan\DOMParser parses HTML strings into a DOM structure.

Usage

use Kotchasan\DOMParser;

// Parse from string
$html = '<div class="content"><p>Hello World</p></div>';
$parser = new DOMParser($html);

// Get all nodes
$nodes = $parser->nodes();

// Export back to HTML
$output = $parser->toHTML();

Constructor

// The second argument is optional and defaults to 'utf-8'
$parser = new DOMParser($html, 'utf-8');

Parameter	Type	Description
`$html`	string	HTML code to parse
`$charset`	string	Encoding (default: utf-8)

Methods

load()

Static method that loads the HTML at a URL and returns a DOMParser instance for it.

$parser = DOMParser::load('https://example.com/page.html');

nodes()

Returns the parsed nodes as an array. Nested elements are reached through each node's childNodes.

$nodes = $parser->nodes();

foreach ($nodes as $node) {
    echo $node->nodeName, "\n";  // DIV, P, SPAN, etc.
}

toHTML()

Exports the parsed document back to an HTML string.

$html = $parser->toHTML();

DOMNode Properties

Each node has the following properties:

Property	Type	Description
`nodeName`	string	Tag name in uppercase, e.g. DIV, P; an empty string for text nodes
`nodeValue`	string/null	Text held by a text node
`attributes`	array	HTML attributes
`parentNode`	DOMNode/null	Parent node
`childNodes`	array	Child nodes
`previousSibling`	DOMNode/null	Previous sibling
`nextSibling`	DOMNode/null	Next sibling
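
The parentNode and sibling links allow navigation upward and sideways as well as downward. The following is a minimal sketch, assuming nodes expose exactly the properties in the table above. `makeNode()` and `nodePath()` are hypothetical helpers, and the stdClass objects are stand-ins (not the framework's DOMNode class) so that the snippet runs on its own:

```php
<?php
// Hypothetical stand-in for a parsed node: a plain object carrying the
// properties listed in the table above (not the framework's DOMNode).
function makeNode(string $name, ?object $parent = null, ?string $value = null): object
{
    $node = new stdClass();
    $node->nodeName = strtoupper($name);
    $node->nodeValue = $value;
    $node->attributes = [];
    $node->parentNode = $parent;
    $node->childNodes = [];
    $node->previousSibling = null;
    $node->nextSibling = null;
    if ($parent !== null) {
        // Link the new node to the parent's current last child, if any
        $last = end($parent->childNodes);
        if ($last !== false) {
            $last->nextSibling = $node;
            $node->previousSibling = $last;
        }
        $parent->childNodes[] = $node;
    }
    return $node;
}

// Hypothetical helper: follow parentNode links up to the root and return
// the tag names from the root down to the given node.
function nodePath(object $node): string
{
    $names = [];
    for ($current = $node; $current !== null; $current = $current->parentNode) {
        array_unshift($names, $current->nodeName);
    }
    return implode(' > ', $names);
}

$article = makeNode('article');
$h1 = makeNode('h1', $article);
$p = makeNode('p', $article);

echo nodePath($p), "\n";                   // ARTICLE > P
echo $p->previousSibling->nodeName, "\n";  // H1
```

With real parser output, only `nodePath()` would be needed; `makeNode()` exists purely to build test data.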

Usage Examples

Parse and Display Structure

$html = '<article>
    <h1>Title</h1>
    <p class="intro">Introduction paragraph</p>
    <p>Another paragraph</p>
</article>';

$parser = new DOMParser($html);
$nodes = $parser->nodes();

function printNode($node, $level = 0) {
    $indent = str_repeat('  ', $level);

    if ($node->nodeName === '') {
        echo $indent . "TEXT: " . trim($node->nodeValue) . "\n";
    } else {
        echo $indent . $node->nodeName;
        if (!empty($node->attributes)) {
            echo " [" . implode(', ', array_keys($node->attributes)) . "]";
        }
        echo "\n";

        foreach ($node->childNodes as $child) {
            printNode($child, $level + 1);
        }
    }
}

foreach ($nodes as $node) {
    printNode($node);
}

// Output:
// ARTICLE
//   H1
//     TEXT: Title
//   P [CLASS]
//     TEXT: Introduction paragraph
//   P
//     TEXT: Another paragraph

Load from URL and Analyze

$parser = DOMParser::load('https://example.com');
$nodes = $parser->nodes();

// Count links
$linkCount = 0;
function countLinks($node, &$count) {
    if ($node->nodeName === 'A') {
        $count++;
    }
    foreach ($node->childNodes as $child) {
        countLinks($child, $count);
    }
}

foreach ($nodes as $node) {
    countLinks($node, $linkCount);
}

echo "Found $linkCount links";
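
The same recursive walk generalizes from counting to collecting. A sketch, assuming only the nodeName and childNodes properties documented above; `collectByTag()` is a hypothetical helper rather than a framework method, and the stdClass tree stands in for the array returned by `$parser->nodes()`:

```php
<?php
// Hypothetical helper: depth-first search that returns every node whose
// nodeName matches $tag. Node names are uppercase, so $tag is uppercased.
function collectByTag(array $nodes, string $tag): array
{
    $tag = strtoupper($tag);
    $found = [];
    foreach ($nodes as $node) {
        if ($node->nodeName === $tag) {
            $found[] = $node;
        }
        // Descend into children and append their matches in document order
        foreach (collectByTag($node->childNodes, $tag) as $match) {
            $found[] = $match;
        }
    }
    return $found;
}

// Stand-in tree for <div><a></a><p><a></a></p></div>
$tree = [
    (object) ['nodeName' => 'DIV', 'childNodes' => [
        (object) ['nodeName' => 'A', 'childNodes' => []],
        (object) ['nodeName' => 'P', 'childNodes' => [
            (object) ['nodeName' => 'A', 'childNodes' => []],
        ]],
    ]],
];

echo count(collectByTag($tree, 'a')), "\n";  // 2
```

Returning the nodes instead of a count avoids the by-reference counter and leaves the matches available for further inspection, such as reading their attributes.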

Extract Text Content

$html = '<div><p>Hello</p><p>World</p></div>';
$parser = new DOMParser($html);

foreach ($parser->nodes() as $node) {
    // DOMNode::nodeText() returns "Hello\nWorld" for this input
    echo $node->nodeText();
}
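
When a different separator is needed, the text can also be gathered by hand from nodeValue and childNodes. This sketch assumes, as in the structure example above, that text nodes have an empty nodeName. `collectText()` is a hypothetical helper and the stdClass objects are stand-ins for parsed nodes:

```php
<?php
// Hypothetical helper: return the trimmed text of every text node below
// $node, in document order, skipping whitespace-only text.
function collectText(object $node): array
{
    if ($node->nodeName === '') {
        $text = trim((string) $node->nodeValue);
        return $text === '' ? [] : [$text];
    }
    $parts = [];
    foreach ($node->childNodes as $child) {
        $parts = array_merge($parts, collectText($child));
    }
    return $parts;
}

// Stand-in text node factory (assumption: text nodes have no children)
$text = static fn (string $value): object => (object) [
    'nodeName' => '', 'nodeValue' => $value, 'childNodes' => [],
];

// Stand-in tree for <div><p>Hello</p><p>World</p></div>
$div = (object) ['nodeName' => 'DIV', 'nodeValue' => null, 'childNodes' => [
    (object) ['nodeName' => 'P', 'nodeValue' => null, 'childNodes' => [$text('Hello')]],
    (object) ['nodeName' => 'P', 'nodeValue' => null, 'childNodes' => [$text('World')]],
]];

echo implode(' ', collectText($div)), "\n";  // Hello World
```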

Check Class

$html = '<div class="container main-content">...</div>';
$parser = new DOMParser($html);
$nodes = $parser->nodes();

$node = $nodes[0];
if ($node->hasClass('container')) {
    echo "Found class 'container'";
}
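
A class attribute holds a space-separated list of names, so a class check has to compare whole tokens rather than substrings. The sketch below illustrates that rule in plain PHP. `classListContains()` is a hypothetical helper and is not claimed to be how `hasClass()` is implemented:

```php
<?php
// Hypothetical helper: true when $class is one of the whitespace-separated
// tokens in a class attribute value.
function classListContains(string $classAttribute, string $class): bool
{
    $tokens = preg_split('/\s+/', trim($classAttribute), -1, PREG_SPLIT_NO_EMPTY);
    return in_array($class, $tokens, true);
}

var_dump(classListContains('container main-content', 'container'));  // bool(true)
var_dump(classListContains('container main-content', 'main'));       // bool(false)
```

The second call shows why a plain substring search would be wrong: 'main' occurs inside 'main-content' but is not itself one of the class names.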

HTML Cleanup

DOMParser automatically cleans HTML:

Removes <script> and <style> tags
Removes HTML comments
Removes <!DOCTYPE>, <link>, <meta> tags
Removes unnecessary whitespace

$html = '<!DOCTYPE html>
<html>
<head>
    <script>alert("test")</script>
    <style>body{}</style>
</head>
<body>
    <p>Content</p>
    <!-- comment -->
</body>
</html>';

$parser = new DOMParser($html);
echo $parser->toHTML();
// Output: <P>Content</P>
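
The cleanup rules listed above can be approximated with a few regular expressions. This is a rough sketch of the idea only: `stripNoise()` is a hypothetical function, the patterns are assumptions, and the parser's actual cleanup may differ. Unlike the output shown above, this sketch leaves the remaining tags (including `<body>`) in place and does not uppercase tag names:

```php
<?php
// Hypothetical approximation of the cleanup rules; not the framework's code.
function stripNoise(string $html): string
{
    $patterns = [
        '#<(script|style)\b[^>]*>.*?</\1>#is',  // script and style blocks
        '#<!--.*?-->#s',                         // HTML comments
        '#<(?:!DOCTYPE|link|meta)\b[^>]*>#i',    // doctype, link and meta tags
    ];
    // Patterns are applied in order, each on the result of the previous one
    $html = preg_replace($patterns, '', $html);
    // Drop whitespace that sits between two tags, then trim both ends
    $html = preg_replace('#>\s+<#', '><', $html);
    return trim($html);
}

$html = '<!DOCTYPE html><body> <script>alert(1)</script> <p>Content</p> <!-- note --> </body>';

echo stripNoise($html), "\n";  // <body><p>Content</p></body>
```

Regular expressions are adequate for stripping whole tags like these from reasonably well-formed input, but they are not a general HTML parser, so malformed markup or nested comments may not be handled.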

Related Classes

DOMNode - DOM node representation
Html - HTML generation
Text - Text utilities
