To build the tracker blocking functionality in Better Blocker, we crawl the most popular sites on the Web using Better Inspector.
We examine what they’re loading to identify and block third-party trackers.
We save the information about the sites we’ve visited in a standard format called HAR (HTTP Archive format). Our archive is available for use by researchers (or anyone else who is interested) under a Creative Commons ShareAlike License.
The archive (~6,800 sites, ~400MB) is available in DAT format at:
(You can also browse this site using the same DAT URL on the peer-to-peer Web via Beaker Browser.)
If you’re having trouble with the DAT Desktop application or if you’re comfortable using the Terminal, you can also use the DAT CLI to get and stay synchronised with the archive (requires Node.js):
npm install -g dat
dat clone dat://archive.better.fyi better-http-archive
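
Once you have the archive, you can explore a site's HAR file with a few lines of code. The following is a minimal sketch in Python, not the way Better Inspector itself works. It relies only on the standard HAR layout (requests listed under log.entries, each with a request.url). The function names, the sample data, and the rule for deciding what counts as third-party (any host that is not the site's own host or one of its subdomains) are assumptions made for illustration.

```python
import json
from urllib.parse import urlparse


def load_har(path):
    """Read a HAR file (HAR is plain JSON) into a Python dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def third_party_hosts(har, first_party_host):
    """Return the sorted hosts requested in `har` that are neither
    `first_party_host` nor one of its subdomains.

    Assumption: this is a naive host comparison. It does not consult the
    Public Suffix List, so it is only a rough approximation.
    """
    hosts = set()
    for entry in har["log"]["entries"]:
        host = urlparse(entry["request"]["url"]).hostname
        if host is None:
            continue  # e.g. data: URLs have no hostname
        if host == first_party_host or host.endswith("." + first_party_host):
            continue  # first-party request
        hosts.add(host)
    return sorted(hosts)


# Hand-made sample for illustration only; it is not taken from the archive.
sample_har = {
    "log": {
        "version": "1.2",
        "entries": [
            {"request": {"url": "https://example.com/index.html"}},
            {"request": {"url": "https://cdn.example.com/app.js"}},
            {"request": {"url": "https://tracker.example.net/pixel.gif"}},
        ],
    }
}

print(third_party_hosts(sample_har, "example.com"))
```

To run this against a real file, pass the result of load_har(path) instead of sample_har. The path and the first-party host are yours to supply, since how the files are organised inside the archive is not described here.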