Kotchasan Framework Documentation
DOMParser
\Kotchasan\DOMParser parses HTML strings into a DOM structure.
Usage
use Kotchasan\DOMParser;
// Parse from string
$html = '<div class="content"><p>Hello World</p></div>';
$parser = new DOMParser($html);
// Get all nodes
$nodes = $parser->nodes();
// Export back to HTML
$output = $parser->toHTML();
Constructor
$parser = new DOMParser($html, $charset = 'utf-8');
| Parameter | Type | Description |
|---|---|---|
| $html | string | HTML code to parse |
| $charset | string | Encoding (default: utf-8) |
Methods
load()
Static method that loads HTML from a URL and returns a parser for it.
$parser = DOMParser::load('https://example.com/page.html');
nodes()
Retrieve an array of all nodes.
$nodes = $parser->nodes();
foreach ($nodes as $node) {
    echo $node->nodeName; // DIV, P, SPAN, etc.
}
toHTML()
Export parsed HTML back as a string.
$html = $parser->toHTML();
DOMNode Properties
Each node has the following properties:
| Property | Type | Description |
|---|---|---|
| nodeName | string | Tag name (uppercase), e.g., DIV, P |
| nodeValue | string/null | Value of a text node |
| attributes | array | HTML attributes |
| parentNode | DOMNode/null | Parent node |
| childNodes | array | Child nodes |
| previousSibling | DOMNode/null | Previous sibling |
| nextSibling | DOMNode/null | Next sibling |
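The properties above are enough to write small tree helpers. The following is a minimal sketch of a depth-first search by tag name. `makeNode()` and `findByTag()` are hypothetical names introduced for this illustration and are not part of Kotchasan. The `stdClass` objects only stand in for parsed nodes so the snippet runs on its own, and the uppercase `CLASS` attribute key is an assumption. In real code you would pass in the nodes returned by `$parser->nodes()` instead.

```php
<?php
// Hypothetical stand-in for a parsed node; it carries only the documented properties.
function makeNode(string $name, array $attributes = [], array $children = [], ?string $value = null): stdClass
{
    $node = new stdClass();
    $node->nodeName = $name;
    $node->nodeValue = $value;
    $node->attributes = $attributes;
    $node->parentNode = null;
    $node->childNodes = $children;
    foreach ($children as $child) {
        $child->parentNode = $node; // objects are handles, so this updates the child itself
    }
    return $node;
}

// Depth-first search: collect every node whose nodeName matches $tag.
// nodeName is documented as uppercase, so the requested tag is uppercased first.
function findByTag($node, string $tag, array &$found = []): array
{
    if ($node->nodeName === strtoupper($tag)) {
        $found[] = $node;
    }
    foreach ($node->childNodes as $child) {
        findByTag($child, $tag, $found);
    }
    return $found;
}

// Stub tree roughly equivalent to <div class="content"><p>Hello</p><p>World</p></div>
$tree = makeNode('DIV', ['CLASS' => 'content'], [
    makeNode('P', [], [makeNode('', [], [], 'Hello')]),
    makeNode('P', [], [makeNode('', [], [], 'World')]),
]);

$paragraphs = findByTag($tree, 'p');
echo count($paragraphs), "\n";                     // 2
echo $paragraphs[0]->parentNode->nodeName, "\n";   // DIV
```

Because the helper reads only `nodeName` and `childNodes`, it should work unchanged on any object that exposes those two properties.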
Usage Examples
Parse and Display Structure
$html = '<article>
<h1>Title</h1>
<p class="intro">Introduction paragraph</p>
<p>Another paragraph</p>
</article>';
$parser = new DOMParser($html);
$nodes = $parser->nodes();
function printNode($node, $level = 0) {
    $indent = str_repeat(' ', $level);
    if ($node->nodeName === '') {
        // Text nodes have an empty nodeName
        echo $indent . "TEXT: " . trim($node->nodeValue) . "\n";
    } else {
        echo $indent . $node->nodeName;
        if (!empty($node->attributes)) {
            echo " [" . implode(', ', array_keys($node->attributes)) . "]";
        }
        echo "\n";
        foreach ($node->childNodes as $child) {
            printNode($child, $level + 1);
        }
    }
}
foreach ($nodes as $node) {
    printNode($node);
}
// Output:
// ARTICLE
// H1
// TEXT: Title
// P [CLASS]
// TEXT: Introduction paragraph
// P
// TEXT: Another paragraph
Load from URL and Analyze
$parser = DOMParser::load('https://example.com');
$nodes = $parser->nodes();
// Count links
$linkCount = 0;
function countLinks($node, &$count) {
    if ($node->nodeName === 'A') {
        $count++;
    }
    foreach ($node->childNodes as $child) {
        countLinks($child, $count);
    }
}
foreach ($nodes as $node) {
    countLinks($node, $linkCount);
}
echo "Found $linkCount links";
Extract Text Content
$html = '<div><p>Hello</p><p>World</p></div>';
$parser = new DOMParser($html);
foreach ($parser->nodes() as $node) {
    echo $node->nodeText(); // "Hello\nWorld" (using DOMNode::nodeText())
}
Check Class
$html = '<div class="container main-content">...</div>';
$parser = new DOMParser($html);
$nodes = $parser->nodes();
$node = $nodes[0];
if ($node->hasClass('container')) {
    echo "Found class 'container'";
}
HTML Cleanup
DOMParser automatically cleans HTML:
- Removes <script> and <style> tags
- Removes HTML comments
- Removes <!DOCTYPE>, <link>, <meta> tags
- Removes unnecessary whitespace
$html = '<!DOCTYPE html>
<html>
<head>
<script>alert("test")</script>
<style>body{}</style>
</head>
<body>
<p>Content</p>
<!-- comment -->
</body>
</html>';
$parser = new DOMParser($html);
echo $parser->toHTML();
// Output: <P>Content</P>
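To make the cleanup rules concrete, here is a rough sketch that approximates them with regular expressions. It is not Kotchasan's implementation, and `sketchCleanup()` is a hypothetical name used only for this illustration. It also does not uppercase tag names or drop wrapper elements the way the output above suggests the parser does. It only shows what each listed rule removes.

```php
<?php
// Illustrative only: approximates the documented cleanup rules with regular expressions.
function sketchCleanup(string $html): string
{
    $patterns = [
        '#<script\b[^>]*>.*?</script>#is', // <script> blocks, including their contents
        '#<style\b[^>]*>.*?</style>#is',   // <style> blocks, including their contents
        '#<!--.*?-->#s',                   // HTML comments
        '#<!DOCTYPE[^>]*>#i',              // doctype declaration
        '#<(?:link|meta)\b[^>]*>#i',       // <link> and <meta> tags
    ];
    $html = preg_replace($patterns, '', $html);
    // Drop whitespace that sits between two tags, then trim the ends
    $html = preg_replace('#>\s+<#', '><', $html);
    return trim($html);
}

$input = "<!DOCTYPE html>\n<div>\n<script>alert(1)</script>\n<p>Content</p>\n<!-- note -->\n</div>";
echo sketchCleanup($input), "\n"; // <div><p>Content</p></div>
```

Regular expressions are fragile on malformed markup, so treat this as a way to reason about the rules, not as a replacement for the parser's own cleanup.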