How to create your own ad filters
Introduction
A filter is a set of filtration rules applied to specific content (banners, popups, etc). Adguard is supplied with a standard filter created by our company. It’s being constantly updated by our experts and we hope that it meets the needs of most of our users. Each Adguard application occasionally updates the standard filters to the latest available version.
At the same time, Adguard allows you to create your own filter using the same filtration rules that the standard filters are based on.
Main rules
The simplest type of rules is called “Main rules”. They are used to block requests to specific URLs. Let’s take a look at a simple example that uses the address of a banner you want to block as a rule. Let it be http://example.com/ads/banner.gif. When the browser sends a request to this address, Adguard will intercept it and return an empty response. However, in most situations, banner addresses change from time to time. For instance, an address may look like http://example.com/ads/banner123.gif, where 123 is a random number. In this particular case, blocking by the exact URL won’t work and you’ll need to use a more general rule, like http://example.com/ads/banner*.gif or maybe even http://example.com/ads/*. Here, “*” is a wildcard that stands for an arbitrary sequence of characters.
Note: avoid blocking too much. For instance, http://example.com/* will block all the pages of example.com.
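The wildcard matching described above can be sketched in Python. This is only an illustration of the idea, not Adguard's actual implementation, and the function names rule_to_regex and is_blocked are hypothetical. The sketch assumes that a rule may match anywhere in the address and that letter case is ignored, which is consistent with the default behavior described later in this article.

```python
import re

def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a basic rule into a regular expression ('*' = any characters)."""
    # Escape every literal piece, then join the pieces with ".*" for each wildcard.
    literal_parts = [re.escape(part) for part in rule.split("*")]
    return re.compile(".*".join(literal_parts), re.IGNORECASE)

def is_blocked(url: str, rules: list[str]) -> bool:
    """Return True if at least one rule matches somewhere in the URL."""
    return any(rule_to_regex(rule).search(url) for rule in rules)

rules = ["http://example.com/ads/banner*.gif"]
print(is_blocked("http://example.com/ads/banner123.gif", rules))  # True
print(is_blocked("http://example.com/images/logo.gif", rules))    # False
```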
Exception rules
Sometimes, you will notice that some of your rules that have been working just fine so far, are now blocking things that should not be blocked. You don’t want to delete the rule, but you also want to stop the occasional blocking of meaningful content.
This is when you can use exception rules – they let you define when a specific rule should not be applied. For instance, if the rule adv blocks http://example.com/advice.html, you can add an exception rule of the following form: @@advice. Exceptions are written in the same way as all other rules, so you can use wildcard characters and additional configuration characters (more information about them is provided in the “Advanced functions” section).
To let Adguard know that a rule is an exception, start it with @@.
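A minimal Python sketch of how exceptions interact with blocking rules is given below. It assumes that any matching exception wins over any matching blocking rule, and it uses plain substring matching without wildcards to keep the example short. The function name should_block is hypothetical.

```python
def should_block(url: str, rules: list[str]) -> bool:
    """Decide whether a URL is blocked, giving exception rules (@@) priority."""
    address = url.lower()
    # Separate exception rules from ordinary blocking rules.
    exceptions = [rule[2:].lower() for rule in rules if rule.startswith("@@")]
    blocking = [rule.lower() for rule in rules if not rule.startswith("@@")]
    # Assumption: a matching exception always overrides blocking rules.
    if any(pattern in address for pattern in exceptions):
        return False
    return any(pattern in address for pattern in blocking)

rules = ["adv", "@@advice"]
print(should_block("http://example.com/adv/banner.gif", rules))  # True
print(should_block("http://example.com/advice.html", rules))     # False
```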
Start/end matching
By default, Adguard processes each rule as if it had a wildcard character * at the beginning and at the end. For instance, there is no difference between the *ad* and ad rules. Usually, this is not a problem, but sometimes you may want an exact match at the beginning or the end of an address. For instance, you may want to block all Flash content, but if you add an swf rule, the http://example.com/swf/index.html address will also be blocked.
Solution: add a | character to the rule to indicate that the address must begin or end at that position. For example, swf| will block http://example.com/annoyingflash.swf, but will not affect http://example.com/swf/index.html. Similarly, |http://baddomain.example/ will block http://baddomain.example/banner.gif, but not http://gooddomain.example/analyze?http://baddomain.example.
Sometimes, you need to block http://domain.com as well as http://www.domain.com or http://adv.domain.com. This can be accomplished by adding two “|” characters to the beginning of the rule, which mark the beginning of a domain name: ||domain.com/banner.gif will block banner.gif at all these addresses, but will not affect http://gooddomain.com/banner.gif or http://gooddomain.com/analyze?http://domain.com/banner.gif.
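The anchors described above map naturally onto regular expressions. The Python sketch below is an illustration under stated assumptions, not Adguard's real matching code: the function name anchored_rule_to_regex is hypothetical, and the "||" prefix is approximated as "a scheme, followed by optional subdomains".

```python
import re

def anchored_rule_to_regex(rule: str) -> re.Pattern:
    """Translate a rule with optional |, || and * markers into a regex."""
    prefix, suffix = "", ""
    if rule.startswith("||"):
        # Assumption: "||" means any scheme plus zero or more subdomains.
        prefix = r"^[a-z][a-z0-9+.-]*://([^/?#]*\.)?"
        rule = rule[2:]
    elif rule.startswith("|"):
        prefix = "^"  # the rule must match at the start of the address
        rule = rule[1:]
    if rule.endswith("|"):
        suffix = "$"  # the rule must match at the end of the address
        rule = rule[:-1]
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(prefix + body + suffix, re.IGNORECASE)

def matches(rule: str, url: str) -> bool:
    return anchored_rule_to_regex(rule).search(url) is not None

print(matches("swf|", "http://example.com/annoyingflash.swf"))  # True
print(matches("swf|", "http://example.com/swf/index.html"))     # False
print(matches("||domain.com/banner.gif", "http://www.domain.com/banner.gif"))   # True
print(matches("||domain.com/banner.gif", "http://gooddomain.com/banner.gif"))   # False
```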
Delimiters
You may occasionally need to use delimiters in your rules. For example, you can write a rule that blocks http://domain.com/, but not http://domain.com.ua/. Here you can use “^” as a placeholder for a single delimiter: http://domain.com^
A delimiter is any character other than a letter, a digit or one of the characters _ - . % (the end of an address also counts as a delimiter). In the example below, the delimiters are the characters :, /, ?, = and &:
http://domain.com/foo.bar?a=12&b=%D1%82%D0%B5%D1%81%D1%82
This address can be blocked by the following rules: ^domain.com^ or ^%D1%82%D0%B5%D1%81%D1%82^ or ^foo.bar^.
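The delimiter placeholder can be sketched as a regular expression as well. The following Python example is a simplified illustration: the function name rule_with_delimiters_to_regex is hypothetical, and it assumes that the non-delimiter punctuation characters are _ - . % (the convention used by Adblock-style filter syntax).

```python
import re

# Assumption: a delimiter is anything except letters, digits and _ - . %,
# or the end of the address.
DELIMITER = r"(?:[^A-Za-z0-9_\-.%]|$)"

def rule_with_delimiters_to_regex(rule: str) -> re.Pattern:
    """Translate a rule containing ^ (delimiter) and * (wildcard) into a regex."""
    pieces = []
    for char in rule:
        if char == "^":
            pieces.append(DELIMITER)
        elif char == "*":
            pieces.append(".*")
        else:
            pieces.append(re.escape(char))
    return re.compile("".join(pieces), re.IGNORECASE)

url = "http://domain.com/foo.bar?a=12&b=%D1%82%D0%B5%D1%81%D1%82"
for rule in ["^domain.com^", "^%D1%82%D0%B5%D1%81%D1%82^", "^foo.bar^"]:
    print(rule, bool(rule_with_delimiters_to_regex(rule).search(url)))  # True for each rule
```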
Comments
Any rule starting with an exclamation mark (!) is a comment (marked in grey on the list of rules). Adguard will ignore this rule, so you are free to write anything you want here. For instance, you may want to comment on the previous rule to describe what it does.
Advanced functions
The functions described in this section are mostly intended for advanced users. They extend the capabilities of the “main rules”, but you must have an understanding of the way your browser works in order to use them.
You can change the behavior of a “main” rule using additional parameters. The list of these parameters is appended to the end of the rule after a dollar sign ($) and its elements are delimited with commas. Example:
||domain.com$match-case,third-party
In this case, ||domain.com is a rule, while match-case and third-party are its parameters.
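Splitting a rule into its pattern and parameters can be sketched in Python as follows. This is a simplified illustration with a hypothetical function name, parse_rule. It assumes that the first dollar sign starts the parameter list, so it does not handle addresses that themselves contain a dollar sign. A parameter prefixed with ~ (used for inverted parameters such as ~third-party) is stored as False.

```python
def parse_rule(rule: str):
    """Split 'pattern$param1,param2=value' into (pattern, parameters)."""
    pattern, _, parameter_text = rule.partition("$")
    parameters = {}
    for item in parameter_text.split(","):
        if not item:
            continue  # the rule has no parameters
        name, has_value, value = item.partition("=")
        if has_value:
            parameters[name] = value      # e.g. domain=example.com
        elif name.startswith("~"):
            parameters[name[1:]] = False  # e.g. ~third-party
        else:
            parameters[name] = True       # e.g. match-case
    return pattern, parameters

print(parse_rule("||domain.com$match-case,third-party"))
# ('||domain.com', {'match-case': True, 'third-party': True})
```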
Parameter types:
- domain – restricts the rule to a list of domains. The domain=example.com parameter means that the rule is applied only to the pages of "example.com". You can specify multiple domains using “|” as a delimiter: with domain=example.com|example.net, the rule is applied only to pages located on the "example.com" or "example.net" domains. If a domain name starts with a “~” sign, the rule is not applied to the pages of this domain. For instance, domain=~example.com means that the rule is applied to all domains except "example.com", and domain=example.com|~foo.example.com restricts the rule to "example.com" but excludes the "foo.example.com" subdomain.
- third-party – restricts the rule to third-party or first-party requests. If the third-party parameter is used, the rule is applied only to requests coming from external sources. Similarly, ~third-party restricts the rule to requests from the same source that the page comes from. Let’s use an example. The ||domain.com$third-party rule is applied on all sites except domain.com itself. If we rewrite it as ||domain.com$~third-party, it will be applied only on domain.com, but will not work on other sites.
- match-case – defines a rule applied only to addresses with exact letter case matches. For example, */BannerAd.gif$match-case will block http://example.com/BannerAd.gif, but not http://example.com/bannerad.gif. By default, the letter case is not matched.
- elemhide – it makes sense to use this parameter for exceptions only. It prohibits element hiding rules on pages affected by the current rule. Element hiding rules will be described below.
- content – it makes sense to use this parameter for exceptions only. It prohibits HTML filtration rules on pages affected by the current rule. HTML filtration rules will be described below.
- document – it makes sense to use this parameter for exceptions only. It prohibits both element hiding and HTML filtration rules.
- jscript – it makes sense to use this parameter for exceptions only. It prohibits the filtration of JavaScript files loaded to pages affected by the current rule.
- jsinject – it makes sense to use this parameter for exceptions only. It prohibits the injection of JavaScript code into web pages. JavaScript code is added for blocking banners by size and for the proper operation of Adguard Assistant.
- urlblock – it makes sense to use this parameter for exceptions only. It prohibits the blocking of requests from pages affected by the current rule.
If protection is enabled and a web page loads incorrectly or something on it does not work, try creating an exception rule for it that looks like @@||domain.com$content,elemhide,jsinject,urlblock. If the problem is caused by content filtration, it should go away.
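The behavior of the domain parameter from the list above can be illustrated with a short Python sketch. The function names are hypothetical, and the sketch assumes that a listed domain also covers its subdomains, which is what the "foo.example.com" exclusion example suggests.

```python
def host_is_covered(page_host: str, domain: str) -> bool:
    """True if the host is the domain itself or one of its subdomains."""
    return page_host == domain or page_host.endswith("." + domain)

def domain_parameter_applies(value: str, page_host: str) -> bool:
    """Evaluate a domain= value such as 'example.com|~foo.example.com'."""
    entries = [entry for entry in value.split("|") if entry]
    included = [entry for entry in entries if not entry.startswith("~")]
    excluded = [entry[1:] for entry in entries if entry.startswith("~")]
    # An excluded domain always switches the rule off.
    if any(host_is_covered(page_host, domain) for domain in excluded):
        return False
    # With no positive entries, the rule applies everywhere else.
    return not included or any(host_is_covered(page_host, d) for d in included)

print(domain_parameter_applies("example.com|~foo.example.com", "example.com"))      # True
print(domain_parameter_applies("example.com|~foo.example.com", "foo.example.com"))  # False
print(domain_parameter_applies("~example.com", "example.org"))                      # True
```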
Element hiding rules
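Sometimes, you will come across ads that cannot be removed by blocking requests because they are added to the body of a web page as text. If you view the source code of such a page, you will find something like this (the ad text itself is omitted here):
<div class="textad"> ... </div> <div id="sponsorad"> ... </div> <textad> ... </textad>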
The first ad element is placed within a div element of the "textad" class. The following rule will hide it: ##div.textad. In this example, "##" is the sign of a hiding rule, while the rest is a selector defining the element to be hidden. Elements can be similarly hidden by their id attribute – for instance, ##div#sponsorad will hide the second slogan. You don’t really need to specify the name of the element – the ###sponsorad rule will also work. You can also hide elements by their names. For instance, ##textad will hide the third slogan.
Note: element hiding rules are completely different from regular rules. For instance, the familiar wildcard characters are not supported here: in a selector they have a different meaning and are applied differently.
IMPORTANT: you need basic HTML and CSS knowledge to use element hiding rules. Essentially, hiding rules are CSS selectors. Adguard adds its own styles to web pages to implement element hiding rules: a {display:none!important} style is assigned to each CSS selector.
Domain restrictions
As a rule, you want to hide specific ads on specific sites and don’t want these rules to work on the pages of other sites. For instance, ##.sponsor may hide useful code on some sites. But if you change it to look like example.com##.sponsor, it will be used on http://example.com/ and http://something.example.com/, but won’t work on http://example.org/.
You can also specify multiple domains – just list them using commas as delimiters: domain1.example,domain2.example,domain3.example##.sponsor.
If a domain name is preceded by a "~" character, the rule will not be applied to the pages of this domain. Examples: ~example.com##.sponsor will be applied to the pages of all sites except for “example.com”. example.com,~foo.example.com##.sponsor allows you to apply the rule to "example.com", but excludes the "foo.example.com" subdomain.
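To make the mechanics of element hiding more concrete, here is a small Python sketch that turns a list of hiding rules into the stylesheet for one page. It is an illustration only: the function names are hypothetical, and it assumes that a domain in a rule also covers its subdomains and that every selected element simply receives the {display:none!important} style mentioned earlier.

```python
def parse_hiding_rule(rule: str):
    """Split 'domains##selector' into (list of domains, CSS selector)."""
    domains_text, _, selector = rule.partition("##")
    return [d for d in domains_text.split(",") if d], selector

def rule_applies(domains: list[str], page_host: str) -> bool:
    """Check the domain restrictions of a hiding rule for one page host."""
    def covers(domain: str) -> bool:
        return page_host == domain or page_host.endswith("." + domain)
    included = [d for d in domains if not d.startswith("~")]
    excluded = [d[1:] for d in domains if d.startswith("~")]
    if any(covers(d) for d in excluded):
        return False
    return not included or any(covers(d) for d in included)

def build_stylesheet(rules: list[str], page_host: str) -> str:
    """Return the CSS that hides every element selected for this host."""
    lines = []
    for rule in rules:
        domains, selector = parse_hiding_rule(rule)
        if rule_applies(domains, page_host):
            lines.append(f"{selector} {{display:none!important}}")
    return "\n".join(lines)

rules = ["##div.textad", "example.com,~foo.example.com##.sponsor"]
print(build_stylesheet(rules, "example.com"))      # hides div.textad and .sponsor
print(build_stylesheet(rules, "foo.example.com"))  # hides div.textad only
```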
HTML filtration rules
In most situations, the rules described above will suffice. However, you may occasionally need to change the HTML code of web pages to filter out ads. This is accomplished with the help of HTML filtration rules. They allow you to specify what HTML elements should be removed from a page before it is sent to the browser.
Let’s take a look at this code sample.
<script type="text/javascript"> document.write('<div>Buy pizza <a href="http://bestpizzaeverad.com">Here!</a></div>'); </script>
In this case, the ad slogan does not appear on the page immediately, but only after the page is loaded. The slogan is shown using Javascript. We can’t do anything here using the first two types of rules, but we can use HTML filtration rules to solve our problem. To get rid of this ad, we need to remove the entire script element from the code. However, not all script elements can be cut out, since the site itself may need them. Let’s use the following rule:
$$script[type="text/javascript"][tag-content="bestpizzaeverad.com"]
Here is what this rule does: it removes all script elements with the type attribute equal to text/javascript and containing the text bestpizzaeverad.com.
However, this is a fairly general rule, so if you start applying it to all sites, it may create all sorts of problems. To avoid this, you can restrict the rule to specific sites only. Domain-based restriction is set in the same way we set it for element hiding rules. This is why, for instance, example.com,~foo.example.com$$script[type="text/javascript"][tag-content="bestpizzaeverad.com"] will only be applied to example.com, but not to its subdomain, foo.example.com.
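The structure of an HTML filtration rule can be sketched in Python as shown below. This is a simplified illustration, not Adguard's parser: the function names parse_html_rule and element_matches are hypothetical, ordinary attributes are compared for equality, tag-content is treated as "the element's HTML contains this text", and no other special attributes are handled.

```python
import re

def parse_html_rule(rule: str):
    """Split 'domains$$tag[attr="value"]...' into (domains, tag, attributes)."""
    domains_text, _, body = rule.partition("$$")
    tag = body.split("[", 1)[0]
    attributes = dict(re.findall(r'\[([\w-]+)="([^"]*)"\]', body))
    domains = [d for d in domains_text.split(",") if d]
    return domains, tag, attributes

def element_matches(tag, attributes, element_tag, element_attrs, element_html):
    """Check one element (tag, attribute dict, inner HTML) against a parsed rule."""
    if tag != element_tag:
        return False
    for name, value in attributes.items():
        if name == "tag-content":
            if value not in element_html:   # substring test on the content
                return False
        elif element_attrs.get(name) != value:  # exact attribute comparison
            return False
    return True

rule = 'example.com,~foo.example.com$$script[type="text/javascript"][tag-content="bestpizzaeverad.com"]'
domains, tag, attributes = parse_html_rule(rule)
print(domains)     # ['example.com', '~foo.example.com']
print(tag)         # script
print(attributes)  # {'type': 'text/javascript', 'tag-content': 'bestpizzaeverad.com'}
```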
Additional attributes for HTML filtration
HTML filtration rules can be extended with the help of additional attributes. Below is a list of these attributes.
- loaded-script – allows the user to apply the rule to loaded scripts, and not script elements only. It may come in handy in some situations, since ad scripts are stored on sites in an encrypted form. An example of such a rule: $$script[tag-content="crypted script"][loaded-script="true"]. It’s important to know that rules applied to loaded scripts should contain no other attributes except the following ones: tag-content, loaded-script, max-length, min-length.
- max-length – sets the maximum length of HTML content. If this parameter is defined and the content exceeds this limit, the rule is not applied to the element.
- min-length – sets the minimal length of HTML content. If this parameter is defined and the content length is below this limit, the rule is not applied to the element.
- parent-elements – a very important attribute that seriously affects the way this rule works. This attribute tells the program to remove the set parent element instead of the one it found. Let’s take a look at an example:
<table style="background: url('http://domain.com/banner.gif')"> <tr> <td> <a href="http://bestpizzaeverad.com">Buy my pizza FAST</a> </td> </tr> </table>
The problem with this piece of HTML code is that we can’t just remove the ad link from here. The banner itself is shown using a parent table (it’s used as its background). And that’s where we can use the parent-elements attribute. Let’s use the following rule to block the entire table: $$a[href="bestpizzaeverad.com"][parent-elements="table"]. When Adguard finds a link to “bestpizzaeverad.com” on the page, it will try to find the closest parent table element and remove it instead of removing the link. You can specify several parent elements to look for (use commas as delimiters). The closest one will be blocked.
- parent-search-level – sets the maximum depth of the search for a parent element. The default value is 3. The limit prevents Adguard from removing more elements than necessary if the page is modified, so do not set this parameter too high.
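The interaction between parent-elements and parent-search-level can be sketched in Python. The sketch below is an assumption about how such a search might work, not a description of Adguard's internals: the function name find_parent_to_remove is hypothetical, ancestors are given as a list of tag names starting with the direct parent, and the search level is assumed to count ancestors starting from that direct parent. What happens when no parent is found is left open, so the function simply returns None.

```python
def find_parent_to_remove(ancestors, parent_elements, search_level=3):
    """Return the index of the closest matching ancestor, or None.

    ancestors       -- tag names from the direct parent upwards
    parent_elements -- comma-separated value of the parent-elements attribute
    search_level    -- how many ancestors to inspect (assumed meaning of
                       parent-search-level; default 3 as stated in the text)
    """
    wanted = {name.strip() for name in parent_elements.split(",")}
    for index, tag in enumerate(ancestors[:search_level]):
        if tag in wanted:
            return index  # the closest matching parent wins
    return None

# The link from the table example: <table> -> <tr> -> <td> -> <a>
ancestors_of_link = ["td", "tr", "table"]
print(find_parent_to_remove(ancestors_of_link, "table"))                  # 2
print(find_parent_to_remove(ancestors_of_link, "table", search_level=2))  # None
print(find_parent_to_remove(ancestors_of_link, "table,tr"))               # 1
```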
We wish you luck in creating your own ad filters. If you want your filter rules to help other Adguard users, we invite you to take part in the development of the Adguard Experimental Filter.