1. Crawling
We launch a headless Chromium browser (via Playwright) with a desktop User-Agent and a 1366×800 viewport, load the URL you provided, and wait up to six seconds for the page to reach the networkidle state (no network connections for at least 500 ms).
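The crawl step can be sketched with Playwright's Python sync API. This is a minimal sketch, not the production crawler. The User-Agent string is a placeholder. Loading on "load" and then waiting for networkidle with a 6-second timeout is one assumed way to implement "up to six seconds". The names crawl and RecordedRequest are hypothetical.

```python
"""Minimal crawl sketch using Playwright's sync API (pip install playwright)."""
from dataclasses import dataclass

# Viewport and timeout come from the methodology text; the UA string is a placeholder.
VIEWPORT = {"width": 1366, "height": 800}
DESKTOP_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0 Safari/537.36"
)
NETWORKIDLE_TIMEOUT_MS = 6_000


@dataclass
class RecordedRequest:
    url: str
    resource_type: str


def crawl(url: str) -> list[RecordedRequest]:
    # Imported lazily so the constants above are usable without Playwright installed.
    from playwright.sync_api import TimeoutError as PlaywrightTimeoutError
    from playwright.sync_api import sync_playwright

    recorded: list[RecordedRequest] = []
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context(user_agent=DESKTOP_UA, viewport=VIEWPORT)
        page = context.new_page()
        # Record every request the page issues, including subresources.
        page.on("request", lambda req: recorded.append(RecordedRequest(req.url, req.resource_type)))
        page.goto(url, wait_until="load")
        try:
            # Give late scripts up to six seconds to settle; busy pages may never go idle.
            page.wait_for_load_state("networkidle", timeout=NETWORKIDLE_TIMEOUT_MS)
        except PlaywrightTimeoutError:
            pass
        browser.close()
    return recorded
```

A timeout here is not treated as an error: pages with long-polling or analytics heartbeats often never reach networkidle, and the scan keeps whatever was recorded within the window.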
To improve coverage we also try to scan one additional internal page. We pick the most “interesting” same-domain link we can find, typically checkout, contact, pricing, signup or login, because those pages often load payment, support and conversion-tracking scripts that the front page does not.
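Choosing the extra page can be sketched as a keyword ranking over the links found on the entry page. This sketch makes several assumptions. The keyword list comes from the text, but its priority order is assumed. Substring matching on the URL path is a simplification. The check compares hosts, which is stricter than the same-domain rule. pick_internal_link is a hypothetical name.

```python
from typing import Optional
from urllib.parse import urljoin, urlparse

# Keywords from the methodology text; their priority order here is an assumption.
INTERESTING_KEYWORDS = ("checkout", "contact", "pricing", "signup", "login")


def pick_internal_link(page_url: str, hrefs: list[str]) -> Optional[str]:
    """Return the highest-priority same-host link, or None if nothing qualifies."""
    base_host = urlparse(page_url).hostname
    best: Optional[tuple[int, str]] = None
    for href in hrefs:
        absolute = urljoin(page_url, href)
        parsed = urlparse(absolute)
        # Keep only http(s) links on the same host (stricter than same-domain).
        if parsed.scheme not in ("http", "https") or parsed.hostname != base_host:
            continue
        if absolute.split("#")[0] == page_url.split("#")[0]:
            continue  # a link back to the page we already scanned
        path = parsed.path.lower()
        for rank, keyword in enumerate(INTERESTING_KEYWORDS):
            if keyword in path:
                # Lower rank means higher priority; keep the first link at each rank.
                if best is None or rank < best[0]:
                    best = (rank, absolute)
                break
    return best[1] if best else None
```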
Free scans load each page once and record what the browser does without any interaction: no logins, no form submissions, no consent-banner clicks. Paid plans automatically run consent-aware scanning: we re-open the site in a clean browser context, click the “Accept all” button on the consent banner, then revisit the entry page plus up to four of the baseline subpages in the same context, so the consent cookies and localStorage flags persist. The report shows a side-by-side delta of the vendors, cookies and third-party domains that load only after consent.
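The post-consent delta amounts to a set difference per category between the baseline scan and the consent-accepted scan. This is a minimal sketch under stated assumptions. Cookies are identified by name alone, and vendors by their directory name. ScanResult, consent_delta and post_consent_urls are hypothetical names.

```python
from dataclasses import dataclass, field

MAX_POST_CONSENT_SUBPAGES = 4  # "up to four of the baseline subpages"


@dataclass
class ScanResult:
    vendors: set[str] = field(default_factory=set)
    cookies: set[str] = field(default_factory=set)  # cookie names (assumed identity)
    third_party_domains: set[str] = field(default_factory=set)


def post_consent_urls(entry_url: str, baseline_subpages: list[str]) -> list[str]:
    """Entry page first, then at most four subpages, all visited in one context."""
    return [entry_url] + baseline_subpages[:MAX_POST_CONSENT_SUBPAGES]


def consent_delta(baseline: ScanResult, post_consent: ScanResult) -> ScanResult:
    """Items seen only after consent was given."""
    return ScanResult(
        vendors=post_consent.vendors - baseline.vendors,
        cookies=post_consent.cookies - baseline.cookies,
        third_party_domains=post_consent.third_party_domains - baseline.third_party_domains,
    )
```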
The consent clicker searches both the main page and likely CMP iframes (Sourcepoint, OneTrust, Cookiebot, Didomi, TrustArc, Quantcast, Usercentrics, Iubenda, Klaro, Osano, Borlabs, Complianz) using a curated list of vendor selectors and accept-all button labels in ~15 languages. When deterministic selectors miss, an optional LLM fallback (gpt-4o-mini, cached per host) identifies the accept button from a pruned list of visible elements. Against an 18-site smoke test of major European publishers, the current hit rate is around 89%.
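The label-matching half of the consent clicker can be sketched as a normalized exact-match lookup over visible button texts. This sketch does not cover the CMP-specific selectors, the iframe search or the LLM fallback. The label set is a small hypothetical sample rather than the curated list, and find_accept_button is an illustrative name.

```python
import re
import unicodedata
from typing import Optional

# Hypothetical sample of accept-all labels; the curated list covers ~15 languages.
ACCEPT_ALL_LABELS = {
    "accept all",          # English
    "accept all cookies",
    "alle akzeptieren",    # German
    "tout accepter",       # French
    "accetta tutto",       # Italian
    "aceptar todo",        # Spanish
    "alles accepteren",    # Dutch
}


def normalize(text: str) -> str:
    # Fold Unicode compatibility forms, case and runs of whitespace.
    text = unicodedata.normalize("NFKC", text).casefold()
    return re.sub(r"\s+", " ", text).strip()


def find_accept_button(visible_labels: list[str]) -> Optional[int]:
    """Index of the first element whose label is an exact accept-all phrase."""
    for i, label in enumerate(visible_labels):
        if normalize(label) in ACCEPT_ALL_LABELS:
            return i
    return None  # deterministic matching missed; a fallback would run here
```

Exact matching after normalization is deliberately strict. A substring test for "accept" would also fire on buttons like "Accept selected", which do not accept all purposes.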
2. Recording requests
For every network request the browser makes we record the URL, the host and the resource type. A request is classified as third-party when its registrable domain is different from the registrable domain of the scanned site.
To determine the registrable domain we use a small heuristic: take the last two labels of the hostname, or the last three when it ends in a common two-part TLD (.co.uk, .com.au, .co.jp, …).
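The registrable-domain heuristic and the third-party test can be sketched as follows. TWO_PART_SUFFIXES is a hypothetical sample of the two-part TLD list, and the function names are illustrative.

```python
from urllib.parse import urlparse

# Hypothetical subset of common two-part TLDs; the real list is longer.
TWO_PART_SUFFIXES = {"co.uk", "org.uk", "com.au", "co.jp", "com.br"}


def registrable_domain(host: str) -> str:
    """Last two labels, or last three when the host ends in a two-part TLD."""
    labels = host.lower().rstrip(".").split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in TWO_PART_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


def is_third_party(request_url: str, site_url: str) -> bool:
    """True when the request's registrable domain differs from the site's."""
    req_host = urlparse(request_url).hostname or ""
    site_host = urlparse(site_url).hostname or ""
    return registrable_domain(req_host) != registrable_domain(site_host)
```

The heuristic is simpler than a full Public Suffix List lookup, and it misses multi-tenant suffixes. For example, alice.github.io and bob.github.io both reduce to github.io, so each would count as first-party to the other.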
3. Vendor matching
We maintain a curated database of 607 third-party vendors, which you can browse in the vendor directory. Each vendor entry contains one or more domain patterns.
A request matches a vendor when:
- its hostname equals the pattern, or
- its hostname ends with a dot followed by the pattern (suffix match, so cdn.example.com matches example.com), or
- for the few non-domain patterns we use, the full URL contains the pattern.
When a request matches multiple patterns we pick the most specific (longest) one. Domains that don’t match any vendor are listed as unmatched.
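The matching rules, including the longest-pattern tie-break, can be sketched like this. The Pattern type, the vendor names and the example patterns in the test are hypothetical. The text does not describe how the real database flags non-domain patterns.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class Pattern:
    vendor: str
    value: str
    is_domain: bool = True  # False for the few URL-substring patterns


def match_vendor(url: str, patterns: list[Pattern]) -> Optional[str]:
    """Vendor of the longest matching pattern, or None (listed as unmatched)."""
    host = (urlparse(url).hostname or "").lower()
    best: Optional[Pattern] = None
    for p in patterns:
        if p.is_domain:
            # Exact host, or a suffix preceded by a dot.
            hit = host == p.value or host.endswith("." + p.value)
        else:
            hit = p.value in url
        # Prefer the most specific (longest) matching pattern.
        if hit and (best is None or len(p.value) > len(best.value)):
            best = p
    return best.vendor if best else None
```

Requiring a dot before the suffix keeps notacme.com from matching acme.com.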
4. Region classification
Each vendor is classified by ownership region, based on where the parent company is incorporated, not where data physically resides. A US-owned vendor with EU data centres is still classified as US, because US government access powers (FISA Section 702, the CLOUD Act) follow the company’s jurisdiction rather than the data’s location.
5. The EU Independence Score
The score is an experimental signal, not a compliance rating. It starts at 100 and three penalty components are subtracted:
Score = 100 − P_vendor − P_mix − P_unknown
P_mix: when non-EU vendors (US, China, Global) make up a large share of the classified vendor stack, an additional penalty of up to 20 points applies. The penalty is scaled by sample size so that a single finding on a short scan does not over-fire.
Example: 3 US vendors and 1 EU vendor give a 75% non-EU share, so P_mix ≈ 15 (roughly 0.75 × the 20-point cap).
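Only the P_mix term is described in enough detail to sketch, so this is a hedged illustration rather than the scoring code. The 20-point cap comes from the text. The 50% threshold for "a large share" and the linear sample-size damping that reaches full weight at four vendors are assumptions, chosen so the worked example comes out at 15. P_vendor and P_unknown are passed in as inputs because their formulas are not given, and the clamp to [0, 100] is also assumed.

```python
# Only the 20-point cap comes from the text; the other parameters are hypothetical.
P_MIX_MAX = 20.0
NON_EU_SHARE_THRESHOLD = 0.5  # assumed meaning of "a large share"
FULL_CONFIDENCE_SAMPLE = 4    # assumed vendor count at which damping stops

NON_EU_REGIONS = {"US", "China", "Global"}


def p_mix(regions: list[str]) -> float:
    """Stack-mix penalty from the ownership regions of classified vendors."""
    if not regions:
        return 0.0
    share = sum(1 for r in regions if r in NON_EU_REGIONS) / len(regions)
    if share <= NON_EU_SHARE_THRESHOLD:
        return 0.0
    # Damp the penalty on small samples so one vendor cannot max it out.
    sample_factor = min(1.0, len(regions) / FULL_CONFIDENCE_SAMPLE)
    return P_MIX_MAX * share * sample_factor


def score(p_vendor: float, p_mix_value: float, p_unknown: float) -> float:
    """Score = 100 - P_vendor - P_mix - P_unknown, clamped to [0, 100] (assumed)."""
    return max(0.0, min(100.0, 100.0 - p_vendor - p_mix_value - p_unknown))
```

With these assumptions a single US vendor yields P_mix = 5 rather than 20, which is the kind of damping the sample-size rule describes.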
Label guardrails
Labels are not derived purely from the numeric score. Hard rules prevent misleading labels even when the score is numerically high. For example, the “Mostly EU independent” label is blocked whenever non-EU vendors outnumber EU/EEA vendors, regardless of the score.
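A guardrail can be expressed as a post-processing step on a score-derived label. In this sketch the score cut-offs (80 and 50) and the labels "Mixed" and "Largely non-EU" are hypothetical. Only “Mostly EU independent” and its blocking rule come from the text. Downgrading a blocked label to the next band is also an assumption.

```python
# Hypothetical cut-offs and label names; only "Mostly EU independent" and its
# blocking rule come from the methodology text.
SCORE_BANDS = [
    (80.0, "Mostly EU independent"),
    (50.0, "Mixed"),
    (0.0, "Largely non-EU"),
]


def score_label(score: float) -> str:
    """First band whose cut-off the score reaches."""
    for cutoff, name in SCORE_BANDS:
        if score >= cutoff:
            return name
    return SCORE_BANDS[-1][1]  # negative scores fall into the lowest band


def guarded_label(score: float, eu_vendors: int, non_eu_vendors: int) -> str:
    label = score_label(score)
    # Guardrail: a high score alone cannot earn the top label when non-EU
    # vendors outnumber EU/EEA vendors; downgrade instead (assumed behaviour).
    if label == "Mostly EU independent" and non_eu_vendors > eu_vendors:
        label = "Mixed"
    return label
```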
The score card in your report shows a full breakdown so you can see exactly how each component contributed, along with a confidence indicator.
6. What we don't do
- We don’t determine GDPR or DSA compliance.
- We don’t click consent banners on free scans (paid plans run an automatic post-consent pass).
- We don’t crawl the entire site by default (front page + one internal link on free, up to 5 pages on Pro, up to 20 on Agency).
- We don’t log in to authenticated areas or fill out forms.
- We don’t store IP addresses in plaintext; they are kept only as salted hashes, with a salt that changes daily.
7. Limitations
Geographic bias also matters: some vendors serve different scripts based on the visitor’s country. We currently scan from a European IP, so results approximate what European visitors see.
The methodology evolves as we improve coverage. If you find a wrong classification, please let us know.