Web Scraping

Web scraping is a technique for programmatically extracting data from websites. It can be done in almost any programming language.

Html Agility Pack (HAP) is a widely used .NET library for web scraping. It is an HTML parser written in C# that reads and writes the DOM and supports plain XPath or XSLT. A .NET Core version is also available.

Web scraping basically works like this:

  • Fetch the HTML page
  • Parse the HTML
  • Extract the data that matches the class names, elements, or other selectors you've specified
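
The parse and extract steps above can be sketched with Html Agility Pack as follows. This is a minimal illustration, not a definitive implementation: the class `ScrapeSketch`, the helper `ExtractPostTitles`, the sample markup, and the `post` class name are all hypothetical, and an inline string stands in for the fetched page so the sketch runs without network access. It assumes the HtmlAgilityPack NuGet package is installed.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is referenced

public static class ScrapeSketch
{
    // Hypothetical helper: returns the text of each <h2> that is a direct
    // child of a <div> whose class attribute is exactly "post".
    public static List<string> ExtractPostTitles(string html)
    {
        // Step 2: parse the HTML string into a DOM.
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Step 3: select the matching nodes with XPath.
        // SelectNodes returns null (not an empty collection) when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes("//div[@class='post']/h2");

        var titles = new List<string>();
        if (nodes == null)
        {
            return titles;
        }

        foreach (var node in nodes)
        {
            // Decode HTML entities such as &amp; and trim surrounding whitespace.
            titles.Add(HtmlEntity.DeEntitize(node.InnerText).Trim());
        }
        return titles;
    }

    public static void Main()
    {
        // Step 1 stand-in: made-up markup in place of a page fetched from the web.
        var html =
            "<html><body>" +
            "<div class=\"post\"><h2>First post</h2></div>" +
            "<div class=\"post\"><h2>Second post</h2></div>" +
            "<div class=\"sidebar\"><h2>Ignored</h2></div>" +
            "</body></html>";

        foreach (var title in ExtractPostTitles(html))
        {
            Console.WriteLine(title);
        }
    }
}
```

Run against the sample markup, this should print the two post titles and skip the sidebar heading. The XPath test `@class='post'` only matches an exact attribute value; an element with several classes would need something like `contains(@class, 'post')` instead.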

Load HTML From the Web

The HtmlWeb.Load method gets an HTML document from an internet resource.

Example

The following example loads an HTML document from the web and prints the name and markup of its title node.

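using System;
using HtmlAgilityPack;

var url = @"http://html-agility-pack.net/";
HtmlWeb web = new HtmlWeb();
// Download the page and parse it into an HtmlDocument.
var htmlDoc = web.Load(url);
// Select the <title> element inside <head> with XPath.
var node = htmlDoc.DocumentNode.SelectSingleNode("//head/title");
Console.WriteLine("Node Name: " + node.Name + "\n" + node.OuterHtml);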

Source: Load from Web Method in Html Agility Pack

For an end-to-end sample, see Easily Do Web Scraping In .NET Core 6.0
