by Roman
Web Scraping
Web scraping is a technique for extracting data from a website, and it can be done in any programming language.
Html Agility Pack (HAP) is a widely used .NET library for web scraping. Its HTML parser is written in C#, reads and writes the DOM, and supports plain XPath or XSLT. A .NET Core version has also been added.
Web scraping basically works like this:
- Fetch the HTML page
- Analyze the HTML
- Get the data that matches class names, divs, or whatever you've specified
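The steps above can be sketched in C# with Html Agility Pack. This is a minimal illustration rather than a definitive implementation: the `ScrapeSketch` class, the `ExtractTexts` helper, the sample markup, and the `product` class name are all hypothetical, and an inline HTML string stands in for the fetch step so the sketch runs without network access. It assumes the HtmlAgilityPack NuGet package is installed.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class ScrapeSketch
{
    // Steps 2 and 3: parse the HTML, then collect the trimmed text
    // of every node that matches the given XPath expression.
    public static List<string> ExtractTexts(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var results = new List<string>();
        var nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null)
        {
            // SelectNodes returns null (not an empty list) when nothing matches.
            return results;
        }

        foreach (var node in nodes)
        {
            results.Add(node.InnerText.Trim());
        }
        return results;
    }

    public static void Main()
    {
        // Step 1 stand-in: hypothetical markup instead of a fetched page.
        const string html =
            "<html><body>" +
            "<div class='product'>Keyboard</div>" +
            "<div class='product'> Mouse </div>" +
            "<p class='note'>Ignored</p>" +
            "</body></html>";

        foreach (var name in ExtractTexts(html, "//div[@class='product']"))
        {
            Console.WriteLine(name);
        }
    }
}
```

Keeping the parsing in a helper that takes a string makes it easy to try XPath expressions against saved markup before pointing the code at a live page.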
Load Html From Web
The HtmlWeb.Load method gets an HTML document from an internet resource.
Example
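The following example loads an HTML document from the web and prints the name and outer HTML of its title node.
    using System;
    using HtmlAgilityPack;

    // URL of the page to fetch
    var url = @"http://html-agility-pack.net/";
    HtmlWeb web = new HtmlWeb();
    var htmlDoc = web.Load(url);
    // Select the <title> element with an XPath expression
    var node = htmlDoc.DocumentNode.SelectSingleNode("//head/title");
    Console.WriteLine("Node Name: " + node.Name + "\n" + node.OuterHtml);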
Source: Load from Web Method in Html Agility Pack
For an end-to-end sample see Easily Do Web Scraping In .NET Core 6.0