Privacy-First Hotel Capture Measurement Plan

Section 01

VPN Detection and True Geolocation Recovery

Detecting whether a visitor is masking location via VPN, proxy, or Tor is foundational for non-local attendee classification. Techniques range from simple IP lookups to deeper network forensics, each with tradeoffs in accuracy, latency, and privacy.

Signal	Figure	What it measures
Commercial VPN detection	95–99%	Accuracy for known VPN providers via IP reputation databases
Tor exit nodes	~100%	Detection rate via public exit node lists
Residential proxies	<60%	The critical blind spot for all detection services
VPN usage growth	+41%	Increase in global VPN adoption 2023 to 2025

IP Reputation and Data Centre Detection

The most reliable VPN detection method is querying continuously updated IP classification databases that categorise addresses as residential, business, hosting/data centre, VPN, proxy, or Tor exit node. Providers like IPQualityScore, MaxMind, Spur.us, IPinfo, and IP2Location maintain these datasets by enumerating commercial VPN endpoints, mapping cloud-provider ASN ranges (AWS AS16509, DigitalOcean AS14061, OVH AS16276), and ingesting abuse intelligence.

IP-to-ASN mapping extends this by classifying each IP's Autonomous System. Hosting-provider ASNs strongly indicate data-centre traffic where VPN endpoints usually run, and can be evaluated at the CDN edge with sub-millisecond latency using self-hosted MMDB data.
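
A minimal sketch of the ASN check, assuming the ASN has already been resolved for the connecting IP (for example from a self-hosted MMDB lookup). The three ASNs are the ones named above; `HOSTING_ASNS`, `classify_asn`, and `is_probable_vpn_endpoint` are illustrative names, and a real list would be far larger and refreshed regularly.

```python
# Hosting-provider ASNs named in the text. This short list is an assumption
# for illustration; production data would come from a maintained dataset.
HOSTING_ASNS = {
    16509: "AWS",
    14061: "DigitalOcean",
    16276: "OVH",
}


def classify_asn(asn: int) -> str:
    """Return 'hosting' for a known data-centre ASN, otherwise 'unknown'."""
    return "hosting" if asn in HOSTING_ASNS else "unknown"


def is_probable_vpn_endpoint(asn: int) -> bool:
    """Treat hosting ASNs as a strong hint, not proof, of VPN mediation."""
    # Corporate proxies and cloud-hosted browsers also originate from
    # hosting ranges, so this flag should feed a score rather than a block.
    return classify_asn(asn) == "hosting"
```

Because the lookup is a dictionary membership test, it fits the edge-evaluation pattern described above; the expensive part is keeping the ASN list current.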

MTU/TTL Fingerprinting

VPN encapsulation leaves measurable artefacts at the network layer. Tunnelling reduces the effective MTU below the standard 1500-byte Ethernet MTU, and the TCP MSS advertised in the client's SYN exposes this. Default TTL values also vary by OS, so a client that claims to be Windows but shows Linux-like TTL behaviour can indicate proxy or VPN mediation.

Network-Layer Fingerprints

SNITCH research (NDSS MADWeb 2025) demonstrated 89.1% accuracy in detecting VPN-tunnelled connections by comparing observed round-trip times against expected RTTs for claimed IP geolocation, with strong precision and recall at scale.

VPN Protocol	Typical MTU (bytes)	MSS (bytes)	Detection Reliability
WireGuard (IPv4)	~1440	~1400	High
OpenVPN (UDP)	~1400–1409	~1360–1369	High
IPsec/IKEv2	~1380–1438	~1340–1398	Medium
No VPN (Ethernet)	1500	1460	Baseline
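
The MSS ranges in the table above can be turned into a simple lookup. This is a sketch under stated assumptions: the ranges are copied from the table (which are approximate), IPv4 with 40 bytes of IP and TCP headers is assumed, and the TTL helper relies on the common initial values of 64 (Linux, macOS), 128 (Windows), and 255 (network equipment). Function names are illustrative.

```python
from typing import List

# (label, lowest MSS, highest MSS), taken from the table above.
MSS_PROFILES = [
    ("No VPN (Ethernet)", 1460, 1460),
    ("WireGuard (IPv4)", 1400, 1400),
    ("OpenVPN (UDP)", 1360, 1369),
    ("IPsec/IKEv2", 1340, 1398),
]


def mtu_from_mss(mss: int) -> int:
    """Infer path MTU for IPv4: MSS plus 20-byte IP and 20-byte TCP headers."""
    return mss + 40


def candidate_tunnels(mss: int) -> List[str]:
    """Return every profile whose MSS range contains the observed value.

    Ranges overlap (IPsec spans the OpenVPN range), so more than one
    candidate can come back; an empty list means no profile matched.
    """
    return [name for name, low, high in MSS_PROFILES if low <= mss <= high]


def initial_ttl(observed_ttl: int) -> int:
    """Round an observed TTL up to the nearest common initial TTL."""
    for initial in (64, 128, 255):
        if observed_ttl <= initial:
            return initial
    raise ValueError("TTL must be in the range 0-255")


def ttl_matches_claimed_os(observed_ttl: int, claimed_os: str) -> bool:
    """Check the inferred initial TTL against the OS claimed in the User-Agent."""
    expected = {"windows": 128, "linux": 64, "macos": 64}
    return expected.get(claimed_os.lower()) == initial_ttl(observed_ttl)
```

An observed MSS of 1365 returns both the OpenVPN and IPsec candidates, which is why the table rates IPsec as only medium reliability.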

WebRTC and DNS Leaks

WebRTC Leak Detection

Creating an RTCPeerConnection with a STUN server and inspecting ICE candidates can reveal mismatches between connection IP and candidate addresses. Browser mitigations are reducing this signal over time, but it remains a useful secondary check.

DNS Leak Detection

Resolver-level mismatch analysis can indicate that DNS traffic is bypassing the VPN path. DNS-over-HTTPS adoption has reduced practical leak rates, so this is best used as enrichment rather than a primary classifier.

Timezone and Locale Mismatches

Comparing IP-derived timezone against browser-reported timezone via Intl.DateTimeFormat().resolvedOptions().timeZone is simple and universally supported, requiring no permission.

Combined with navigator.language and Accept-Language header analysis, this passive approach narrows likely location with high confidence and no permission prompts.
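
A server-side sketch of the passive comparison, assuming the browser timezone has been posted back by the page and the IP-derived timezone and country come from a geolocation lookup. The function names and the two flag labels are illustrative. Timezone names are compared for strict equality here; comparing UTC offsets instead would be more lenient.

```python
from typing import List, Optional


def parse_accept_language(header: str) -> List[str]:
    """Return language tags ordered by descending q-value (default q is 1.0)."""
    weighted = []
    for position, part in enumerate(header.split(",")):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # malformed weight: rank it last
        # Negative quality sorts highest first; position keeps ties stable.
        weighted.append((-quality, position, tag.strip()))
    return [tag for _, _, tag in sorted(weighted)]


def region_hint(tags: List[str]) -> Optional[str]:
    """Return the first two-letter region subtag, e.g. 'GB' from 'en-GB'."""
    for tag in tags:
        for subtag in tag.split("-")[1:]:
            if len(subtag) == 2 and subtag.isalpha():
                return subtag.upper()
    return None


def passive_mismatches(ip_timezone: str, browser_timezone: str,
                       ip_country: str, accept_language: str) -> List[str]:
    """List which passive signals disagree with the IP-derived location."""
    flags = []
    if ip_timezone != browser_timezone:
        flags.append("timezone")
    region = region_hint(parse_accept_language(accept_language))
    if region is not None and region != ip_country.upper():
        flags.append("language_region")
    return flags
```

A visitor whose IP resolves to London but whose browser reports `America/New_York` and `en-US` would raise both flags, which is a stronger hint than either alone.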

HotelMap Implementation Note

Timezone comparison is identified as the single most valuable passive signal for non-local detection. It requires no consent and works across all major browsers.

True Location Recovery Behind VPN

Technique	Resolution	Accuracy	Permission	Spoofability
HTML5 Geolocation API	Street-level (1–100m)	Very High	Required	Low
WiFi positioning	10–50m urban	Very High	Required	Low
Browser timezone	Country/region	High	No	Medium
Language/locale	Country	Good	No	Medium
Accept-Language header	Country	Good	No	Low
WebRTC IP leak	City	Declining	No	N/A

Commercial VPN Detection Services

Provider	VPN Detection	City Accuracy	Starting Price	Strength
IPQualityScore	99.9% claimed	~85%	Free (200/day)	Fraud scoring, residential proxy detection
Spur.us	60M+ anonymous IPs	~80%	Free (1M/month)	VPN provider attribution by name
MaxMind GeoIP2	Good (not specialized)	80–90%	~$30/month	Industry standard, self-hosted MMDB
IPinfo	5 boolean privacy flags	~85%	$49/month	Daily updates via Probe Network
IP2Location	Good	~75%	$99/year	Best value, broad coverage

Recommended Stack

MaxMind GeoIP2 + Spur.us or IPQS

$100–500/month for moderate traffic volumes. Combines industry-standard geolocation accuracy with specialised VPN provider attribution.

Section 02

Cookieless User Identification

Beyond cookies, a rich set of identifiers exists to match repeat visitors. These range from client-side browser fingerprints to server-side protocol signals and first-party identity anchors from registration.

Browser Fingerprinting Techniques

Technique	Entropy / Uniqueness	Method
Canvas Fingerprinting	8–10 bits	Draws text/shapes on invisible canvas, hashes pixel output via toDataURL(). GPU hardware and driver versions produce unique outputs.
WebGL Fingerprinting	~99% unique	Reads UNMASKED_VENDOR_WEBGL and UNMASKED_RENDERER_WEBGL strings.
AudioContext Fingerprinting	3–5 bits	Generates signals via OscillatorNode, processes through DynamicsCompressorNode.
Font Enumeration	13–15+ bits	JavaScript-based measurement against fallback fonts to detect installed typefaces.

Server-Side Fingerprinting

Server-side techniques are the most durable identification layer because browser extensions cannot directly spoof lower-layer protocol characteristics.

JA4 TLS Fingerprinting

JA4 TLS fingerprinting (developed by FoxIO in 2023) sorts cipher suites and extensions before hashing, making it resilient to TLS extension randomisation in Chrome 110+ and Firefox 114+.
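
The sort-before-hash idea can be shown in a few lines. This is a simplified illustration only and does not reproduce the JA4 format: it takes the cipher-suite and extension code points from a ClientHello as plain integers, sorts each list, and hashes the result, so a client that shuffles its extension order still produces the same value. The function name and output layout are assumptions.

```python
import hashlib
from typing import List


def order_insensitive_fingerprint(ciphers: List[int], extensions: List[int]) -> str:
    """Hash sorted cipher and extension code points into a stable identifier."""

    def digest(values: List[int]) -> str:
        # Sorting removes the effect of per-connection order randomisation.
        canonical = ",".join(f"{value:04x}" for value in sorted(values))
        return hashlib.sha256(canonical.encode("ascii")).hexdigest()[:12]

    return f"{digest(ciphers)}_{digest(extensions)}"
```

Two handshakes that offer the same ciphers and extensions in a different order map to the same string, while adding or removing an extension changes it.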

HTTP/2 SETTINGS Frame Fingerprinting

HTTP/2 SETTINGS fingerprints use hardcoded browser values: Chrome uses INITIAL_WINDOW_SIZE: 6291456 and MAX_CONCURRENT_STREAMS: 1000, while Firefox uses INITIAL_WINDOW_SIZE: 131072 and omits MAX_CONCURRENT_STREAMS.
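
As a sketch, the values quoted above can be matched against the SETTINGS frame a server receives. The profile table contains only the two browsers and two parameters named in the text; the setting names are used as plain strings, and `guess_browser_from_settings` is an illustrative helper, not part of any HTTP/2 library.

```python
from typing import Dict, Optional

# Values quoted in the text; other SETTINGS parameters are ignored here.
KNOWN_H2_PROFILES = {
    "chrome": {"INITIAL_WINDOW_SIZE": 6291456, "MAX_CONCURRENT_STREAMS": 1000},
    "firefox": {"INITIAL_WINDOW_SIZE": 131072},
}
COMPARED_KEYS = ("INITIAL_WINDOW_SIZE", "MAX_CONCURRENT_STREAMS")


def guess_browser_from_settings(settings: Dict[str, int]) -> Optional[str]:
    """Return the browser whose profile matches exactly, or None."""
    relevant = {key: value for key, value in settings.items() if key in COMPARED_KEYS}
    for browser, profile in KNOWN_H2_PROFILES.items():
        # Exact equality means an omitted parameter must also be omitted.
        if relevant == profile:
            return browser
    return None
```

Exact matching matters because the absence of `MAX_CONCURRENT_STREAMS` is itself part of the Firefox profile described above.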

Cross-Layer Consistency Checking

The strongest pattern is cross-layer inconsistency: if User-Agent, TCP fingerprint, TLS fingerprint, and HTTP/2 behaviour disagree, interception or spoofing is likely.
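
One way to express the consistency check, assuming each layer has already been reduced to a browser or OS family label (or `None` when that layer produced no verdict). The layer names and helper functions are illustrative.

```python
from typing import Dict, List, Optional


def layers_by_family(observations: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Group layers by the family each one points to, skipping unknowns."""
    groups: Dict[str, List[str]] = {}
    for layer, family in observations.items():
        if family is not None:
            groups.setdefault(family, []).append(layer)
    return groups


def is_inconsistent(observations: Dict[str, Optional[str]]) -> bool:
    """True when at least two layers point to different families."""
    return len(layers_by_family(observations)) > 1
```

For example, `{"user_agent": "chrome", "tls": "chrome", "http2": "firefox"}` is flagged, while a record with only one known layer is not.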

First-Party Data Approaches

Login-based identity is the gold standard for deterministic matching. Event registration serves as the identity anchor with a persistent user ID.

Email Hashing

Email hashing via SHA-256 of lowercased, trimmed email enables cross-platform matching without sharing raw PII. This is the standard used by LiveRamp, The Trade Desk (UID2.0), and Facebook Custom Audiences.
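
The normalise-then-hash step is short enough to show in full. This sketch follows the description above (trim, lowercase, SHA-256) and returns the hex digest; `hash_email` is an illustrative name, and individual platforms may require extra normalisation rules that are not covered here.

```python
import hashlib


def hash_email(email: str) -> str:
    """Return the SHA-256 hex digest of a trimmed, lowercased email address."""
    normalised = email.strip().lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()
```

Normalising first is what makes `" Ada@Example.COM "` and `"ada@example.com"` hash to the same value across systems.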

Storage Mechanisms

localStorage, IndexedDB, and Service Worker state are useful first-party persistence layers, but are increasingly partitioned or time-limited by modern browser privacy controls.

CNAME Cloaking

CNAME cloaking maps first-party subdomains to tracker infrastructure. Defences vary by browser, and this approach is increasingly constrained by anti-tracking systems.

Fingerprint Uniqueness and Stability

EFF's Panopticlick found 83.6% of fingerprints unique among privacy enthusiasts (470K users), while “Hiding in the Crowd” found only 33.6% unique among 2M general-population users, and ~18.5% on mobile devices.

Fingerprint stability averages approximately 1.8 weeks per browser (PETS 2020 longitudinal study). Eckersley's fingerprint evolution algorithm correctly linked evolved fingerprints in 99.1% of cases with 0.87% false positive rate.

Commercial Fingerprinting Solutions

Solution	Accuracy	Scale	Starting Price	Best For
Fingerprint Pro	99.5% (claimed)	4B+ devices, 50M+ daily events	$99/month (20K calls)	Web fingerprinting, incognito detection
ThreatMetrix	Enterprise-grade	78B+ data records	Enterprise pricing	Full fraud prevention, behavioral biometrics
Arkose Labs	125+ risk signals	Enterprise	Enterprise pricing	Progressive fingerprinting, challenge-response

Recommended

Fingerprint Pro at $99/month

Best balance of accuracy, integration ease, and cost for HotelMap's scale. Offers best-in-class incognito detection and straightforward SDK integration.

Section 03

The Browser Privacy Landscape in 2026

Chrome and Privacy Sandbox

Google reversed on third-party cookie deprecation in July 2024, offering a user-choice model instead. Users manage preferences through Chrome settings where third-party cookies remain enabled by default.

On 17 October 2025, Google retired most Privacy Sandbox advertising APIs: Topics API, Protected Audience API (FLEDGE), Attribution Reporting API, IP Protection, and Private Aggregation — citing low adoption. Three technologies survived: CHIPS, FedCM, and Private State Tokens.

Safari 26 (September 2025)

Advanced Fingerprinting Protection (AFP) blocks known fingerprinting scripts and Google Tag Manager in Private Browsing. Third-party cookies have been blocked since 2019. JavaScript-set first-party cookies and script-writable storage are capped at 7 days without user interaction.

iCloud Private Relay uses a two-hop relay: Apple sees the user's IP but not the destination; Cloudflare/Akamai sees the destination but not the IP. It covers Safari traffic, DNS queries, and unencrypted HTTP traffic from apps, but not other app traffic.

Firefox 145 (November 2025)

Canvas readback returns randomized data. Font enumeration is blocked in favour of standard OS fonts. Hardware details are normalized, and WebGL/Audio API outputs are randomized. These protections are active in ETP Strict mode and Private Browsing. The combined effect reduced uniquely identifiable users by nearly 50% (from 65% to ~35% in testing).

What Still Works Across All Browsers in 2026

Technique	Chrome	Safari	Firefox	Brave
Canvas fingerprinting	Works	AFP blocks	Noise	Randomized
WebGL renderer string	Full detail	Masked	Grouped	Randomized
AudioContext	Works	Noise (Private)	Randomized (Strict)	Varies
Font enumeration	Works	Limited	Blocked (Strict)	Randomized
Third-party cookies	Default on	Blocked	Partitioned	Blocked
localStorage / IndexedDB	Partitioned	7-day ITP	Partitioned	Partitioned
JA4 TLS fingerprint	Server-side	Server-side	Server-side	Server-side
HTTP/2 SETTINGS	Server-side	Server-side	Server-side	Server-side
Timezone / locale	Works	Works	Works	Works

Key Pattern

Server-side signals (JA4, HTTP/2 SETTINGS, TCP/IP fingerprinting) remain reliable across all browsers. Client-side fingerprinting is most effective on Chrome (66.8% global market share) and significantly degraded on Safari, Firefox, and Brave.

Section 04

Event Attendee Tracking for HotelMap

Detecting Non-Local Registrants

The key question is whether a registrant lives far enough from the venue to require accommodation. The strongest implementation combines multiple weighted signals:

Registration address and zip code (weight 0.5)

Gold standard when available. Geocoding APIs convert address data to coordinates for Haversine distance scoring with very high reliability.

IP geolocation at registration (weight 0.3)

Fast passive signal for initial classification. Country accuracy is high; city-level precision varies by provider, mobile network, and VPN usage.

Browser timezone comparison (weight 0.2)

Strong tiebreaker signal for non-local inference using browser timezone versus venue timezone. Works without permissions and across major browsers.
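
A sketch of how the three weights could be combined. The weights are the ones listed above; everything else is an assumption: each signal is expressed as a vote between 0.0 (local) and 1.0 (non-local), and when a signal is missing the remaining weights are renormalised so the score stays on the same scale.

```python
from typing import Dict, Optional

WEIGHTS = {"address": 0.5, "ip_geo": 0.3, "timezone": 0.2}


def non_local_score(votes: Dict[str, Optional[float]]) -> Optional[float]:
    """Weighted average of available votes, or None when no signal is present."""
    available = {
        name: vote
        for name, vote in votes.items()
        if name in WEIGHTS and vote is not None
    }
    if not available:
        return None
    total_weight = sum(WEIGHTS[name] for name in available)
    weighted_sum = sum(WEIGHTS[name] * vote for name, vote in available.items())
    return weighted_sum / total_weight
```

With no address on file, an IP vote of 1.0 and a timezone vote of 0.0 give 0.3 / 0.5 = 0.6, so the IP signal dominates but does not decide alone.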

Distance Band	Classification
0–50 miles (0–80 km)	Local
50–150 miles (80–240 km)	Likely non-local
150+ miles (240+ km)	Definitely non-local

Recommended thresholds vary by event type. For multi-day conferences, 50 miles / 80 km is often the practical cutoff. A tiered model is most actionable: 0–50 miles local, 50–150 likely non-local, 150+ definitely non-local.
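
The distance scoring and banding can be sketched with the standard Haversine formula. The band edges come from the table above; assigning the boundary values (exactly 50 or 150 miles) to the farther band is an assumption, as is the mean Earth radius of 3958.8 miles.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean radius; an approximation


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    delta_phi = phi2 - phi1
    delta_lambda = math.radians(lon2 - lon1)
    a = (math.sin(delta_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def distance_band(miles: float) -> str:
    """Map a distance to the three tiers used in the table above."""
    if miles < 50:
        return "Local"
    if miles < 150:
        return "Likely non-local"
    return "Definitely non-local"
```

A registrant geocoded to central Paris for a venue in central London comes out a little over 200 miles away and falls in the top band.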

Cross-Session Identity Maintenance

Tracking registrants across the weeks between registration and the event requires a layered approach that does not rely on third-party cookies. A five-stage pipeline works best: registration anchor, URL handoff, email token re-identification, first-party cookie fallback, then probabilistic fingerprinting as a last resort.

Registration

Capture reg_id and email_hash

URL Handoff

Pass reg_id, email_hash, and event_id

Email Re-ID

Tokenized link match

Cookie Fallback

hm_visitor_id session merge

Fingerprint

Probabilistic last resort

Server-side identity resolution — priority order from deterministic to probabilistic
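
The priority order can be sketched as a chain of lookups tried in turn. The request is modelled as a plain dictionary, and the three lookup tables stand in for server-side stores; `reg_id` and `hm_visitor_id` come from the pipeline above, while the `token` and `fingerprint` keys and both function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def make_resolvers(tokens: Dict[str, str], cookies: Dict[str, str],
                   fingerprints: Dict[str, str]) -> List[tuple]:
    """Build lookups in priority order, deterministic methods first."""
    return [
        # In practice the reg_id from the URL should be validated server-side.
        ("url_handoff", lambda req: req.get("reg_id")),
        ("email_token", lambda req: tokens.get(req.get("token"))),
        ("first_party_cookie", lambda req: cookies.get(req.get("hm_visitor_id"))),
        ("fingerprint", lambda req: fingerprints.get(req.get("fingerprint"))),
    ]


def resolve_identity(request: dict, resolvers: List[tuple]) -> Optional[Tuple[str, str]]:
    """Return (method, registrant_id) from the first resolver that matches."""
    for method, resolver in resolvers:
        registrant_id = resolver(request)
        if registrant_id is not None:
            return method, registrant_id
    return None
```

Returning the method alongside the ID lets downstream reporting separate deterministic matches from probabilistic ones.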

URL Parameter Passing

Critical for the registration-to-hotel-booking handoff. When an attendee clicks from the registration confirmation to the hotel booking page, appending ?reg_id=XXX&email_hash=YYY&event_id=ZZZ deterministically links sessions. This is the most reliable cross-session bridge because it requires no cookie persistence.
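
A minimal sketch of building and reading the handoff link with the standard library. The parameter names are the ones given above; the base URL in the usage note is the booking URL mentioned elsewhere in this plan, and the helper names are illustrative. The builder assumes the base URL has no fragment.

```python
from typing import Dict
from urllib.parse import parse_qs, urlencode, urlsplit

HANDOFF_KEYS = ("reg_id", "email_hash", "event_id")


def build_handoff_url(base_url: str, reg_id: str, email_hash: str, event_id: str) -> str:
    """Append the three handoff parameters, URL-encoding their values."""
    query = urlencode({"reg_id": reg_id, "email_hash": email_hash, "event_id": event_id})
    separator = "&" if urlsplit(base_url).query else "?"
    return f"{base_url}{separator}{query}"


def read_handoff_params(url: str) -> Dict[str, str]:
    """Extract whichever handoff parameters are present on an incoming URL."""
    params = parse_qs(urlsplit(url).query)
    return {key: params[key][0] for key in HANDOFF_KEYS if key in params}
```

Calling `build_handoff_url("https://hotelmap.com/book", "R1", "abc", "E9")` yields a link the booking page can parse with `read_handoff_params` on arrival.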

Email-Based Re-Identification

Every post-registration email contains a personalised link with a unique token (hotelmap.com/book?token=abc123) that maps to the registrant's server-side profile. Each email click re-establishes identity with zero cookie dependency. This is also the most privacy-compliant cross-session method.
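
The token mapping could look like the sketch below. An in-memory dictionary stands in for the server-side profile store, and `TokenStore` with its methods is a hypothetical name; expiry, single-use tokens, and rate limiting are left out for brevity.

```python
import secrets
from typing import Dict, Optional


class TokenStore:
    """Map unguessable link tokens to registrant IDs (in-memory stand-in)."""

    def __init__(self) -> None:
        self._registrant_by_token: Dict[str, str] = {}

    def issue_link(self, registrant_id: str,
                   base_url: str = "https://hotelmap.com/book") -> str:
        """Create a fresh random token and return the personalised link."""
        token = secrets.token_urlsafe(16)  # 16 random bytes, URL-safe text
        self._registrant_by_token[token] = registrant_id
        return f"{base_url}?token={token}"

    def resolve(self, token: str) -> Optional[str]:
        """Return the registrant ID for a token, or None if unknown."""
        return self._registrant_by_token.get(token)
```

Because the token is random rather than derived from the email address, the link itself carries no personal data.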

Server-Side Identity Graph

Stitches all signals: {registrant_id, email_hash, cookie_id, events[], sessions[]}. When any identifier matches on a subsequent visit, sessions merge. Redis (or equivalent) provides real-time lookups.
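
A sketch of the merge rule, using an in-memory dictionary where the text suggests Redis or an equivalent. Each profile holds a set of `(kind, value)` identifiers and a list of sessions; the class and method names are illustrative, and the `events[]` list is omitted to keep the example short.

```python
from typing import Dict, Tuple


class IdentityGraph:
    """Merge sessions into one profile whenever any identifier matches."""

    def __init__(self) -> None:
        self._profile_by_identifier: Dict[Tuple[str, str], dict] = {}

    def observe(self, session_id: str, identifiers: Dict[str, str]) -> dict:
        """Record a session and return the (possibly merged) profile."""
        keys = list(identifiers.items())
        matched = []
        for key in keys:
            profile = self._profile_by_identifier.get(key)
            if profile is not None and all(profile is not seen for seen in matched):
                matched.append(profile)
        if matched:
            target = matched[0]
            # Fold any other matched profiles into the first one.
            for other in matched[1:]:
                target["identifiers"] |= other["identifiers"]
                target["sessions"] += other["sessions"]
        else:
            target = {"identifiers": set(), "sessions": []}
        target["identifiers"].update(keys)
        target["sessions"].append(session_id)
        # Re-point every identifier at the surviving profile.
        for key in target["identifiers"]:
            self._profile_by_identifier[key] = target
        return target
```

A visit that carries both a known cookie and a known email hash joins two previously separate profiles, which is the stitching behaviour described above.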

Capture Rate Benchmarks and Optimisation

Capture rate is the share of non-local attendees who book through the official housing system versus booking direct, via OTAs, or via other channels. Kalibri Labs/PCMA Foundation research analysed 2M+ records across major event markets:

Metric	Value
Official system bookings	48% of non-local attendees
In-block, wrong channel	23% stayed at block hotels but booked elsewhere
Price perception gap	39% believed official rates more expensive
Cart abandonment email open rate	70%
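
The capture-rate arithmetic is simple enough to state directly. This sketch assumes the 48% and 23% figures share the same base (non-local attendees), which the table implies but does not state; `recoverable_ceiling` is an illustrative name for the share that would be captured if every in-block, wrong-channel booking moved to the official system.

```python
def capture_rate(official_bookings: int, non_local_attendees: int) -> float:
    """Share of non-local attendees who booked through the official system."""
    if non_local_attendees <= 0:
        raise ValueError("non_local_attendees must be positive")
    return official_bookings / non_local_attendees


def recoverable_ceiling(official_share: float, in_block_wrong_channel_share: float) -> float:
    """Capture rate if all in-block, wrong-channel bookings were recovered."""
    return official_share + in_block_wrong_channel_share
```

With the benchmark figures, 480 official bookings out of 1,000 non-local attendees is a 48% capture rate, and recovering the in-block leakage would lift the ceiling to about 71%.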

The headline constraint is perception: many attendees believe direct booking is cheaper even when event rates are stronger. Transparent comparisons and clear value framing are usually more impactful than adding tracking complexity.

Price Perception Problem

The biggest lever for improving capture rate is communication, not tracking. Presenting transparent rate comparisons addresses the perception that independent booking is cheaper. Attendees decide within the first few seconds on a booking page.

Retargeting Sequence for Unbooked Attendees

Email retargeting follows a tested cadence that materially outperforms general campaign traffic. Hotel cart abandonment emails achieve 70% open rates, 10% click-through rates, and $3.65 revenue per recipient. Retargeted ads deliver 180% higher click-through rates and 300% higher conversion rates versus first-time visitors.

Immediately post-registration — Confirmation + hotel CTA

Highest-conversion moment. Integrate booking as next step in the flow.

24–48 hours later — 'Complete your stay' reminder

Feature top 3 hotels by proximity and value.

1 week post-registration — Rate urgency

Real-time availability messaging: 'rooms filling up.'

2 weeks before event — 'Last chance for group rate'

Countdown timer and social proof.

1 week before event — Final reminder

Remaining inventory urgency messaging.

Day of cutoff — Urgency close

'Book today or lose the group rate.'
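
The cadence above can be turned into concrete send times. This sketch makes several assumptions: the 24–48 hour reminder is sent at 24 hours, steps that would fall before registration or after the rate cutoff are dropped, and the step labels are hypothetical identifiers rather than campaign names from the source.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def retargeting_schedule(registered_at: datetime, event_start: datetime,
                         rate_cutoff: datetime) -> List[Tuple[str, datetime]]:
    """Return (step, send_time) pairs in chronological order."""
    steps = [
        ("confirmation_hotel_cta", registered_at),
        ("complete_your_stay", registered_at + timedelta(hours=24)),
        ("rate_urgency", registered_at + timedelta(weeks=1)),
        ("last_chance_group_rate", event_start - timedelta(weeks=2)),
        ("final_reminder", event_start - timedelta(weeks=1)),
        ("cutoff_day_close", rate_cutoff),
    ]
    # Late registrants skip steps already in the past; nothing is sent
    # once the group rate has expired.
    valid = [(name, when) for name, when in steps
             if registered_at <= when <= rate_cutoff]
    return sorted(valid, key=lambda step: step[1])
```

Someone registering nine days before the event receives only the confirmation, the 24-hour reminder, the final reminder, and the cutoff-day message.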

Full-Funnel Analytics Architecture

The tracking architecture implements server-side event stitching using reg_id as the primary key across all systems. Four events form the complete funnel from registration to booking, with derived metrics calculated at reporting time.

GA4 event schema — reg_id stitches all events server-side

Section 05

Legal Compliance Architecture

IP addresses are personal data under GDPR (confirmed by CJEU rulings). EDPB Guidelines 2/2023 (October 2024) clarified that ePrivacy Article 5(3) applies to device fingerprinting, tracking pixels, IP-only tracking, and URL tracking, meaning many fingerprinting patterns require consent unless strictly necessary.

Tier 1: No Additional Consent Needed

Legitimate interest for service delivery and fraud prevention covers the following techniques:

IP reputation database lookups

Legitimate interest for service delivery and fraud prevention.

ASN and data centre classification

Passive server-side classification with no device storage access.

Timezone and language mismatch detection

Simple passive signal with no permission requirement.

HTTP header analysis and server-side TLS/HTTP/2 fingerprinting

Passive, server-side techniques not accessing device storage.

Tier 2: Consent Recommended or Required

Terminal equipment access requires consent in most EU/UK contexts:

WebRTC leak detection

Accesses terminal equipment; consent recommended under ePrivacy.

DNS leak detection

May involve terminal-equipment access depending on implementation details.

Canvas and WebGL fingerprinting

UK ICO explicitly labelled fingerprinting for advertising as irresponsible (December 2024).

Device fingerprinting for cross-session tracking

Consent required in most EU/UK jurisdictions.

Tier 3: Explicit Consent Always Required

HTML5 Geolocation API

Browser-enforced permission popup; always requires explicit, informed consent.

GPS access

Explicit consent required at OS level.

Persistent cross-site tracking

Explicit consent always required regardless of jurisdiction.
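
The three tiers can be enforced in code with a small gate. This is a sketch, not legal guidance: the technique keys are shorthand for the items listed above, the consent levels `"none"`, `"general"`, and `"explicit"` are hypothetical, Tier 2 is treated conservatively as consent-required, and unknown techniques are denied by default.

```python
TIER_BY_TECHNIQUE = {
    "ip_reputation_lookup": 1,
    "asn_classification": 1,
    "timezone_language_mismatch": 1,
    "server_side_tls_http2_fingerprint": 1,
    "webrtc_leak_detection": 2,
    "dns_leak_detection": 2,
    "canvas_webgl_fingerprint": 2,
    "device_fingerprint_cross_session": 2,
    "html5_geolocation": 3,
    "gps_access": 3,
    "persistent_cross_site_tracking": 3,
}


def is_permitted(technique: str, consent: str) -> bool:
    """Decide whether a technique may run at the visitor's consent level."""
    tier = TIER_BY_TECHNIQUE.get(technique)
    if tier is None:
        return False  # unlisted techniques are denied until classified
    if tier == 1:
        return True
    if tier == 2:
        return consent in ("general", "explicit")
    return consent == "explicit"
```

Keeping the mapping in one table means a change in regulatory guidance becomes a one-line edit rather than a code search.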

Data Controller Relationships

Role	Entity	Responsibility
Data Controller	Event organizer	Determines purpose and means of processing. Registration forms must include clear privacy notices with separate opt-in checkboxes; pre-checked boxes are invalid under GDPR.
Data Processor	HotelMap	Processes under a Data Processing Agreement (DPA).
Separate Data Controller	Hotels	Act independently post-booking for their own processing purposes.

Data Minimisation Strategy

Progressive reduction: raw IP → city-level location → distance calculation → binary is_non_local flag. The raw IP need not be stored beyond initial processing. Email hashing (SHA-256) provides a cross-system identifier. Retention is limited to 6–12 months after the event, with automated DSAR handling and deletion cascades.
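
The reduction chain can be sketched as a single function that returns only the final flag. The geolocation lookup and distance function are passed in as parameters so the example stays self-contained; both, along with the 50-mile default threshold taken from the distance bands, are assumptions for illustration.

```python
from typing import Callable, Dict, Tuple

Coordinates = Tuple[float, float]


def minimise(ip: str, venue: Coordinates,
             locate_city: Callable[[str], Coordinates],
             distance_miles: Callable[[float, float, float, float], float],
             threshold_miles: float = 50.0) -> Dict[str, bool]:
    """Reduce an IP address to a single is_non_local flag.

    The IP, the city coordinates, and the distance exist only as local
    variables and are not part of the returned record.
    """
    city_lat, city_lon = locate_city(ip)
    miles = distance_miles(city_lat, city_lon, venue[0], venue[1])
    return {"is_non_local": miles >= threshold_miles}
```

Only the returned dictionary would be persisted, so a later data subject request has one boolean to locate and delete instead of a location history.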

Primary Recommendation

Build on email-tokenized links, not fingerprinting

Email-tokenized links and URL parameter passing are more reliable than fingerprinting and, provided the registration privacy notice covers them, need no consent beyond registration itself. Server-set first-party cookies provide a fallback; device fingerprinting serves as a probabilistic last resort, used only with appropriate consent. The system operates within GDPR/ePrivacy compliance using a tiered consent model.

Section 06