HTML Parser - Blogs

Are you looking for an HTML parser?

Want to traverse through HTML DOM elements?

Want to read the properties of an HTML element and their values?

Want to add properties to HTML elements dynamically?

Want to modify an HTML form at runtime?

jQuery is one of the best ways to do all of this on the client side. But what about doing the same on the server side?

Here is one of the best, fastest and most reliable solutions: Html Agility Pack

Html Agility Pack is an open source .NET library whose API is very similar to the XmlDocument class. All you need to do is pass an XPath expression to Html Agility Pack. The library was originally hosted on CodePlex; it is now distributed as the HtmlAgilityPack NuGet package.

Sample HTML

The examples below are based on my sample HTML form, shown in the screenshot above.

Loading HTML Form:

The HtmlDocument class is used to load the HTML form, as shown below:

using System.IO;
using HtmlAgilityPack;

//Load HTML Document into DOM
HtmlDocument htmDoc = new HtmlDocument();
string html = File.ReadAllText("E:\\TestSri.html");
htmDoc.LoadHtml(html);

Read HTML Node:

To read any HTML node/tag, pass the XPath of that node to SelectSingleNode. You get the node back as an HtmlNode object. The HtmlNode class offers the core features of the parser: traversing child nodes, reading the attributes of an HTML tag, and so on.

//Read HTML Node
//Get Title of HTML Document
HtmlNode titleNode = htmDoc.DocumentNode.SelectSingleNode("//title");
string title = titleNode.InnerText;

Here htmDoc.DocumentNode gives the root node of the HTML document.


Read HTML Nodes with same type:

To read several HTML nodes of the same type, you again supply an XPath expression, as in the example above. In this case use SelectNodes instead of SelectSingleNode; it returns a collection of HtmlNode objects (an HtmlNodeCollection).

//Read HTML Nodes
//Get all SPAN tags
HtmlNodeCollection spanCollection = htmDoc.DocumentNode.SelectNodes("//span");

//Get all Anchor tags
HtmlNodeCollection anchorCollection = htmDoc.DocumentNode.SelectNodes("//a");

//Get all DIV tags
HtmlNodeCollection divCollection = htmDoc.DocumentNode.SelectNodes("//div");
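
One thing to watch for: in Html Agility Pack, SelectNodes returns null (not an empty collection) when nothing matches the XPath, so looping over the result directly can throw a NullReferenceException. Here is a minimal sketch of a guard. The NodeCounter class and CountMatches method are hypothetical names I made up for illustration; they are not part of the library.

```csharp
using HtmlAgilityPack;

// Hypothetical helper, for illustration only.
public static class NodeCounter
{
    // Counts the nodes matching an XPath expression,
    // treating a null result (no match) as zero.
    public static int CountMatches(string html, string xpath)
    {
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when no node matches
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        return nodes == null ? 0 : nodes.Count;
    }
}
```

The same null check applies before any foreach over a collection returned by SelectNodes.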

Find Node:

You can get any node from the HTML form by its id. For that, use the GetElementbyId() method of the HtmlDocument class.

//Find Node
//Get the tag whose id is dvLogin
HtmlNode ndLogin = htmDoc.GetElementbyId("dvLogin");
//Get the tag whose id is guestName and read its text
HtmlNode ndGuestName = htmDoc.GetElementbyId("guestName");
string guestName = ndGuestName.InnerText;

Read Attributes:

To read the attributes of an HTML node, first find the node you want. The HtmlNode class has an Attributes collection; index it by attribute name to get an HtmlAttribute object. The HtmlAttribute class has Name and Value properties that give you the data you need.

//Read Attributes
//Find URL of forgot password
HtmlNodeCollection anchorsCollection = htmDoc.DocumentNode.SelectNodes("//a");
string forgotURL = string.Empty;

foreach (HtmlNode node in anchorsCollection)
{
    if (node.InnerText.ToLower().Contains("forgot"))
    {
        HtmlAttribute attrib = node.Attributes["href"];
        forgotURL = attrib.Value;
    }
}

Add Attributes:

To add attributes to an HTML node, first find the node. Once you have the HtmlNode object, there are two ways to add an attribute: create an HtmlAttribute object and add it, or call the Add() method on the Attributes collection of the HtmlNode class with a name and a value.

//Add Attributes
//Add the target property to forgot password anchor tag
HtmlNodeCollection anchorsCollection1 = htmDoc.DocumentNode.SelectNodes("//a");

foreach (HtmlNode node in anchorsCollection1)
   if (node.InnerText.ToLower().Contains("forgot"))
     node.Attributes.Add("target", "_blank");
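
The example above uses the Add() method with a name and a value. The other way, going through an HtmlAttribute object, could look roughly like the sketch below. The AttributeAdder class and its AddTarget method are hypothetical names for illustration, and I take the first anchor in the document to keep it short.

```csharp
using HtmlAgilityPack;

// Hypothetical helper, for illustration only.
public static class AttributeAdder
{
    // Adds target="_blank" to the first anchor tag and returns the modified HTML.
    public static string AddTarget(string html)
    {
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode anchor = doc.DocumentNode.SelectSingleNode("//a");
        if (anchor != null)
        {
            // Create the attribute object first, then attach it to the node
            HtmlAttribute target = doc.CreateAttribute("target", "_blank");
            anchor.Attributes.Add(target);
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

If the attribute may already exist on the node, SetAttributeValue("target", "_blank") on the HtmlNode is another option: it updates the existing attribute or creates it when missing.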


Traverse through HTML Document/Nodes:

As I described above, the HtmlNode class provides the core features of the parser. Its ChildNodes property gives you an HtmlNodeCollection, which you can walk through node by node.

//Traverse HTML Nodes
//Go through the dvLogin (div) tag and get the user name from the loginName field
//Single() is a LINQ extension method and needs: using System.Linq;
HtmlNode ndDivLogin = htmDoc.GetElementbyId("dvLogin");
HtmlNode ndUsername = ndDivLogin.ChildNodes.Single(node => node.Id == "loginName");
string userName = ndUsername.Attributes["value"].Value;
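
Note that ChildNodes holds only the direct children of a node, and Single() throws when there is no match (or more than one). If the loginName field sits deeper in the markup, a search over all descendants is more forgiving. Here is a minimal sketch, assuming the same dvLogin and loginName ids as above; the LoginReader class and ReadUserName method are hypothetical names for illustration.

```csharp
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, for illustration only.
public static class LoginReader
{
    // Returns the value of the loginName input inside dvLogin,
    // or an empty string when either element is missing.
    public static string ReadUserName(string html)
    {
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode login = doc.GetElementbyId("dvLogin");
        if (login == null)
        {
            return string.Empty;
        }

        // Descendants() walks the whole subtree, not just direct children
        HtmlNode input = login.Descendants("input")
                              .FirstOrDefault(node => node.Id == "loginName");

        return input == null
            ? string.Empty
            : input.GetAttributeValue("value", string.Empty);
    }
}
```

GetAttributeValue with a default also avoids a NullReferenceException when the value attribute is absent.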

Save Modified HTML:

After modifying an HTML form inside the parser, you can save it wherever you want.

//Save modified HTML
htmDoc.Save("E:\\Test Sri1.html");

Beyond these, you can perform many other useful operations with this parser, such as inserting tags, removing tags and applying styles.
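
To give a rough idea of those operations, here is a minimal sketch that inserts a tag, removes a tag and sets an inline style. It assumes the dvLogin and guestName ids from the earlier examples; the NodeEditor class, the Edit method, the paragraph text and the style value are all made up for illustration.

```csharp
using HtmlAgilityPack;

// Hypothetical helper, for illustration only.
public static class NodeEditor
{
    // Inserts a paragraph into dvLogin, removes guestName,
    // styles dvLogin, and returns the modified HTML.
    public static string Edit(string html)
    {
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode login = doc.GetElementbyId("dvLogin");
        if (login == null)
        {
            return html; // nothing to edit
        }

        // Insert a tag: build a node from an HTML fragment and append it
        HtmlNode note = HtmlNode.CreateNode("<p>Welcome back</p>");
        login.AppendChild(note);

        // Remove a tag: detach the node from its parent
        HtmlNode guest = doc.GetElementbyId("guestName");
        if (guest != null)
        {
            guest.Remove();
        }

        // Apply a style: set (or overwrite) the style attribute
        login.SetAttributeValue("style", "border:1px solid #ccc");

        return doc.DocumentNode.OuterHtml;
    }
}
```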

Other well-known parsers: MsHtml

Hope this helps anyone looking for a server-side HTML parser!


