PHP HTML DOM Parser

The HTML DOM defines a standard way of accessing and manipulating HTML documents.
What we are going to do is read a document using PHP Simple HTML DOM Parser and then manipulate it according to our needs.

DOWNLOAD
Download the script from here and extract it.

Now we need to create a PHP file to start using the script.
First, include simple_html_dom.php in the PHP file.

PHP Code:
<?php
include('simple_html_dom.php');
?>

Now we need to access a page and manipulate it according to our needs.
We will first get the contents of the page.

PHP Code:
<?php
include('simple_html_dom.php');
$html = file_get_html('http://yoururlhere');
?>

Now your whole page is stored in $html. You can test it by echoing it.
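Here is a small sketch of that echo test. To keep it runnable offline it uses str_get_html() on an inline string instead of a real URL; the markup is made up for illustration. The false check is an assumption about your copy of the library: recent releases return false when the page is empty or too large, while older ones may always return an object.

```php
<?php
include('simple_html_dom.php');

// Inline markup stands in for a real page (hypothetical content).
$html = str_get_html('<html><body><p>Hello</p></body></html>');

// Assumption: recent versions return false when nothing could be parsed.
if ($html === false) {
    exit('Could not load the document');
}

// Echoing the object prints the parsed document as an HTML string.
echo $html;
?>
```

The same check works with file_get_html('http://yoururlhere') once you point it at a real page.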

To access elements with this script, we call find() and walk through the results with a foreach loop.

1. Accessing links

PHP Code:
<?php
include('simple_html_dom.php');
$html = file_get_html('http://yoururlhere');
foreach ($html->find('a') as $element) {
    // innertext is the property that holds the content inside the tag
    echo $element->href . ' - ' . $element->innertext;
}
?>

As you may have noticed in the script above, we used the find() function to find elements, and that is what it is for. You may pass other tags such as img, h1, h2 or title to find them in a document.

$element->href gets the URL the link points to, that is, its href attribute.

Note: You may replace href with any attribute you wish to get.

$element->innertext gets the content between the <a> and </a> tags, including any nested markup.
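To see the attribute and content properties side by side, here is a minimal sketch. The markup is made up, and plaintext is the property the library's manual lists for the tag-stripped text; check it against your version.

```php
<?php
include('simple_html_dom.php');

// Hypothetical markup, used only for illustration.
$html = str_get_html('<a href="page.html" title="Next page"><b>Next</b></a>');

foreach ($html->find('a') as $element) {
    echo $element->href . "\n";      // the href attribute
    echo $element->title . "\n";     // any other attribute is read the same way
    echo $element->innertext . "\n"; // content between <a> and </a>, tags included
    echo $element->plaintext . "\n"; // the same content with the tags stripped
}
?>
```

Use innertext when you want to keep nested tags and plaintext when you only want the visible text.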

2. Accessing images

PHP Code:
<?php
include('simple_html_dom.php');
$html = file_get_html('http://yoururlhere');
foreach($html->find('img') as $element)
echo $element->src;
?>

As in the links example, find('img') finds all the image tags and $element->src gets the src attribute of each one. Likewise, you may get any attribute you want.

3. Getting elements with a particular id or class

PHP Code:
<?php
include('simple_html_dom.php');
$html = file_get_html('http://yoururlhere');
foreach($html->find('#xyz') as $element)
echo $element;
?>

find('#xyz') finds all the elements with id xyz. You can then use them any way you want. Use .xyz if you want to find elements with class xyz.
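A quick sketch of both forms, using made-up markup. It also uses the optional second argument of find(), which returns a single element at that index (or null when nothing matches) instead of an array. That is handy for ids, which should be unique.

```php
<?php
include('simple_html_dom.php');

// Hypothetical markup, used only for illustration.
$html = str_get_html(
    '<div id="xyz">one</div><p class="xyz">two</p><h2 class="xyz">three</h2>'
);

// .xyz matches every element with class xyz, whatever the tag.
foreach ($html->find('.xyz') as $element) {
    echo $element->tag . ': ' . $element->plaintext . "\n";
}

// Passing index 0 returns the first match itself rather than an array.
$first = $html->find('#xyz', 0);
if ($first !== null) {
    echo $first->plaintext . "\n";
}
?>
```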

4. Getting a div with a particular class

PHP Code:
<?php
include('simple_html_dom.php');
$html = file_get_html('http://yoururlhere');
foreach($html->find('div.xyz') as $element)
echo $element;
?>

find('div.xyz') finds all the div tags with class xyz.

5. Getting all elements that have a certain attribute

PHP Code:
<?php
include('simple_html_dom.php');
$html = file_get_html('http://yoururlhere');
foreach($html->find('*[xyz]') as $element)
echo $element;
?>

find('*[xyz]') does the task. The * wildcard matches every element, and [xyz] keeps only the elements that have an xyz attribute.

6. Getting a particular element with a certain attribute

PHP Code:
<?php
include('simple_html_dom.php');
$html = file_get_html('http://yoururlhere');
foreach($html->find('abc[xyz]') as $element)
echo $element;
?>

find('abc[xyz]'): here abc is the element you want and xyz is the attribute it must have.

7. Getting a particular element with a certain attribute value

PHP Code:
<?php
include('simple_html_dom.php');
$html = file_get_html('http://yoururlhere');
foreach($html->find('abc[xyz=pqr]') as $element)
echo $element;
?>

Everything stays the same, but instead of just the attribute we write attribute=value. Here pqr stands for any attribute value.

For other selectors, see this usage table from the official website:
[Image: i92CD.jpg]
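In case the image does not load, here is a rough sketch of some attribute filters the official manual describes, as far as I remember them, so double-check against the manual. The markup and the selector list are my own illustration, not a copy of the table.

```php
<?php
include('simple_html_dom.php');

// Hypothetical markup, used only for illustration.
$html = str_get_html(
    '<a href="http://example.com/a.png">one</a>' .
    '<a href="local/b.jpg" rel="nofollow">two</a>'
);

$selectors = array(
    'a[rel]',           // elements that have the attribute
    'a[!rel]',          // elements that do not have the attribute
    'a[rel=nofollow]',  // attribute equals the value
    'a[href^=http]',    // attribute starts with the value
    'a[href$=.png]',    // attribute ends with the value
    'a[href*=example]', // attribute contains the value
);

// Print how many elements each selector matches.
foreach ($selectors as $selector) {
    echo $selector . ' matched ' . count($html->find($selector)) . "\n";
}
?>
```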
