FizzlerEx is a jQuery/CSS3 selector implementation for .NET, based on HtmlAgilityPack and the original Fizzler project.

 

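using System;
using HtmlAgilityPack;
using Fizzler.Systems.HtmlAgilityPack;

// Download and parse the page
var web = new HtmlWeb();
var document = web.Load("http://example.com/page.html");
var page = document.DocumentNode;

// Extract the fields of every <div class="item">
foreach (var item in page.QuerySelectorAll("div.item"))
{
    // First <h3> that does not have the class "share"
    var title = item.QuerySelector("h3:not(.share)").InnerText;
    // Third matched <span> (:eq is zero-based)
    var date = DateTime.Parse(item.QuerySelector("span:eq(2)").InnerText);
    // First <span> that contains a <b>
    var description = item.QuerySelector("span:has(b)").InnerHtml;
}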
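
The sample above assumes that every selector finds a match. In the original Fizzler, QuerySelector returns null when nothing matches, and the sketch below assumes FizzlerEx behaves the same way. The class SafeQuery and its two methods are hypothetical helpers, not part of the library.

```csharp
using System;
using HtmlAgilityPack;
using Fizzler.Systems.HtmlAgilityPack;

public static class SafeQuery
{
    // Hypothetical helper: trimmed inner text of the first match,
    // or null when the selector matches nothing.
    public static string TextOrNull(HtmlNode context, string selector)
    {
        var node = context.QuerySelector(selector);
        if (node == null)
        {
            return null;
        }
        return node.InnerText.Trim();
    }

    // Hypothetical helper: text of the first match parsed as a date,
    // or null when the node is missing or its text is not a valid date.
    public static DateTime? DateOrNull(HtmlNode context, string selector)
    {
        var text = TextOrNull(context, selector);
        if (text == null)
        {
            return null;
        }

        DateTime parsed;
        if (DateTime.TryParse(text, out parsed))
        {
            return parsed;
        }
        return null;
    }
}
```

Inside the loop, SafeQuery.TextOrNull(item, "h3:not(.share)") could then replace the direct .InnerText access, so that an item without a matching heading yields null instead of throwing.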

 

Supported selectors

FizzlerEx additions (CSS/jQuery)

Selector Description
[attr!='value'] Elements whose attribute is not equal to value (or that lack the attribute)
:has(b) Elements that contain an element matching the sub-expression
:not(.class) Elements that do not match the specified sub-expression
:contains('text') Elements whose InnerText contains the specified text
:eq(n) Selects the n-th matched element (zero-based)
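
As a rough illustration of the selectors in this table, the sketch below runs each of them against a small inline document. The markup, the class AdditionsDemo and its methods are made up for this example; only LoadHtml, QuerySelectorAll and InnerText come from HtmlAgilityPack and Fizzler.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;
using Fizzler.Systems.HtmlAgilityPack;

public static class AdditionsDemo
{
    // Hypothetical sample markup, used only for illustration.
    public const string Html =
        "<ul>" +
        "<li title='new'><b>Alpha</b> first</li>" +
        "<li title='old' class='done'>Beta second</li>" +
        "<li>Gamma third</li>" +
        "</ul>";

    // Parses the sample markup and returns its root node.
    public static HtmlNode Load()
    {
        var document = new HtmlDocument();
        document.LoadHtml(Html);
        return document.DocumentNode;
    }

    // Runs one selector against the sample and returns the text of each match.
    public static List<string> Texts(string selector)
    {
        return Load().QuerySelectorAll(selector)
                     .Select(node => node.InnerText)
                     .ToList();
    }

    // Prints the matches of every selector from the table above.
    public static void PrintAll()
    {
        var selectors = new[]
        {
            "li[title!='new']",     // title differs from 'new', or no title at all
            "li:has(b)",            // items containing a <b>
            "li:not(.done)",        // items without the class "done"
            "li:contains('Beta')",  // items whose text contains "Beta"
            "li:eq(1)"              // second matched item (zero-based index)
        };

        foreach (var selector in selectors)
        {
            Console.WriteLine(selector + " -> " + string.Join(" | ", Texts(selector)));
        }
    }
}
```

Calling AdditionsDemo.PrintAll() prints one line per selector, which makes it easy to compare the actual matches with the descriptions in the table.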

FizzlerEx additions (non-standard)

Selector Description
b:select-parent Selects the parent(s) of the matched node(s)
div[attr=''] Elements without the specified attribute (or with empty value)
div[attr%='[0-9]*'] Elements whose attr attribute matches the specified regex
span:matches('ab?') Elements whose inner text matches the specified regex
/div Performs the initial selection at the top level of the search context instead of among its descendant nodes.
For example, node.QuerySelector("/:select-parent") == node.ParentNode.
Without the slash, the result would be the parent of the first descendant of node, which is probably not what you want.
body:split-after(hr) Groups the children of <body> into a pseudo-element every time a <hr> is found.
Each <hr> will be the first child of its own group.
Nodes before the first <hr> will be ignored.
Note that the sub-selector (hr) must only match direct children of the context node.
To enforce this, you may want to use body:split-after(/* > hr), which relies on the leading slash described for the previous selector.
body:split-before(hr) Similar to the previous one, except that every <hr> will be the last of its own group.
Nodes after the last <hr> will be ignored.
body:split-between(hr) Similar to the previous one, except that only content between two <hr>s will be included. <hr>s themselves won't be part of the groups.
body:split-all(hr) Similar to the previous one, except that content before the first <hr> and after the last <hr> will be included too.
.main:before(hr) Selects the children of .main preceding the first <hr> child, and groups them into a single pseudo-element (<hr> is excluded).
.main:after(h1) Selects the children of .main following the first <h1> child, and groups them into a single pseudo-element (<h1> is excluded).
.main:between(h1; hr) Selects the children of .main between the first <h1> child and the first following <hr> (possibly the same element), grouping them into a single pseudo-element. <h1> and <hr> are not part of the group. Note the semicolon ( ; ) used to separate the two parameters.
:last Selects the last matched element
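
The sketch below tries the leading slash and the grouping selectors from this table on a small inline document. It assumes that each pseudo-element is returned as an HtmlNode whose ChildNodes are the grouped children; the page does not spell this out, so treat it as a guess. The markup, the class NonStandardDemo and its methods are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;
using Fizzler.Systems.HtmlAgilityPack;

public static class NonStandardDemo
{
    // Hypothetical markup: a container whose children are separated by <hr>.
    public const string Html =
        "<div class='main'>" +
        "<h1>Title</h1><p>Intro</p>" +
        "<hr><p>One</p><p>Two</p>" +
        "<hr><p>Three</p>" +
        "</div>";

    // Parses the sample markup and returns its root node.
    public static HtmlNode Load()
    {
        var document = new HtmlDocument();
        document.LoadHtml(Html);
        return document.DocumentNode;
    }

    // The leading slash starts the selection at the context node itself,
    // so per the table this should be the same node as node.ParentNode.
    public static HtmlNode ParentOf(HtmlNode node)
    {
        return node.QuerySelector("/:select-parent");
    }

    // Runs a grouping selector (split-*, before, after, between)
    // and returns the resulting pseudo-elements.
    public static List<HtmlNode> Groups(HtmlNode root, string selector)
    {
        return root.QuerySelectorAll(selector).ToList();
    }

    // Prints how many groups a selector produced and the tag names in each.
    // Assumption: the grouped children are exposed through ChildNodes.
    public static void Describe(HtmlNode root, string selector)
    {
        var groups = Groups(root, selector);
        Console.WriteLine("{0}: {1} group(s)", selector, groups.Count);
        foreach (var group in groups)
        {
            var names = group.ChildNodes.Select(child => child.Name);
            Console.WriteLine("  " + string.Join(", ", names));
        }
    }
}
```

For instance, calling NonStandardDemo.Describe(root, ".main:split-after(hr)") and then the same call with ".main:split-between(hr)", ".main:before(hr)" or ".main:between(h1; hr)" shows how the grouping changes from one selector to the next.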

Standard selectors (from original Fizzler project)

Selector Description
* All elements
div Elements with the specified tag name
#id Elements with the specified id
.class Elements with the specified class
[attr] Elements with the specified attribute defined
[attr='value'] Elements with the specified attribute name and value
[attr~='word'] Attribute includes the specified word (whitespace-separated)
[attr|='prefix'] Attribute is either equal to 'prefix' or starts with 'prefix' followed by a hyphen (-).
[attr^='prefix'] Attribute starts with 'prefix'
[attr$='suffix'] Attribute ends with 'suffix'
[attr*='search'] Attribute contains 'search'
:first-child Elements that are the first child of their parent
:last-child Elements that are the last child of their parent
:nth-child(n) Elements that are the n-th child of their parent (1-based)
:nth-last-child(n) Elements that are the n-th child of their parent, counting from the last child (1-based)
:only-child Elements that are the only child of their parent
:empty Elements that have no children
div > p Selects p elements that are direct children of a div
div p Selects p elements that are descendants of a div
prev + next Selects all elements matching "next" that are immediately preceded by a sibling matching "prev"
prev ~ siblings Selects all elements that follow the "prev" element, have the same parent, and match the "siblings" selector.
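
To see how the combinators and positional pseudo-classes above differ, the following sketch evaluates a few of them against a small inline document. The markup, the class StandardDemo and its methods are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;
using Fizzler.Systems.HtmlAgilityPack;

public static class StandardDemo
{
    // Hypothetical sample markup, used only for illustration.
    public const string Html =
        "<div id='menu'>" +
        "<h2>Menu</h2>" +
        "<p class='intro'>Start</p>" +
        "<p>Soup</p>" +
        "<ul><li>Tea</li><li>Coffee</li></ul>" +
        "</div>";

    // Runs one selector against the sample and returns the text of each match.
    public static List<string> Texts(string selector)
    {
        var document = new HtmlDocument();
        document.LoadHtml(Html);
        return document.DocumentNode
                       .QuerySelectorAll(selector)
                       .Select(node => node.InnerText)
                       .ToList();
    }

    // Prints the matches of a handful of standard selectors.
    public static void PrintAll()
    {
        var selectors = new[]
        {
            "#menu > p",        // <p> elements that are direct children of #menu
            "#menu li",         // <li> elements anywhere below #menu
            "h2 + p",           // the <p> immediately following the <h2>
            "h2 ~ p",           // every <p> sibling that follows the <h2>
            "li:nth-child(2)",  // <li> that is the second child of its parent
            "li:last-child"     // <li> that is the last child of its parent
        };

        foreach (var selector in selectors)
        {
            Console.WriteLine(selector + " -> " + string.Join(" | ", Texts(selector)));
        }
    }
}
```

Calling StandardDemo.PrintAll() prints the matches for each selector, so the child and descendant combinators, and the adjacent and general sibling combinators, can be compared side by side.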

Last edited May 13, 2012 at 9:19 AM by antiufo, version 13