Skip to main content

2 posts tagged with ".NET"

View All Tags

· 8 min read
Oleg Kulyk

How to parse HTML in .NET

HTML parsing is a vital part of web scraping, as it allows convert web page content to meaningful and structured data. Still, as HTML is a tree-structured format, it requires a proper tool for parsing, as it can't be property traversed using Regex.

This article will reveal the most popular .NET libraries for HTML parsing with their strong and weak parts.

· 5 min read
Oleg Kulyk

HTML Parsing Libraries - C#

Web sites are written using HTML, which means that each web page is a structured document. Sometimes the goal is to obtain some data from them and preserve the structure while we’re at it. Websites don’t always provide their data in comfortable formats such as CSV or JSON, so only the way to deal with it is to parse the HTML page.