DataSitr — Trust & Evidence

Riyadh primary + Dammam drill standby

Customer traffic flipped from legacy edge to Alibaba ACK Riyadh on 4 May 2026; GCP Dammam drill standby passed a scoped exercise on 16 May 2026.

DNS A records for datasitr.com and api.datasitr.com cut over to ACK ingress 8.213.49.193 at 2026-05-04T01:12:50Z. A 4-hour soak passed without regression. A separate disposable drill hostname validated DNS-level switching to GCP Dammam ingress 34.110.171.105, TLS continuity on standby.gcp.datasitr.com, GKE workload health, evidence capture, and rollback. The Dammam footprint is cost-controlled drill infrastructure; cross-cloud DB replication and auth failover are not yet active.
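For reviewers who want to re-check the cutover from their own network, the sketch below compares resolved A records against the ingress address quoted above. It is illustrative only: `EXPECTED`, `resolve_a`, and `check_cutover` are hypothetical names, not DataSitr tooling, and live results depend on the local resolver and DNS caching.

```python
import socket
from typing import Callable, Dict, Set

# Expected A records, taken from the paragraph above.
EXPECTED = {
    "datasitr.com": "8.213.49.193",
    "api.datasitr.com": "8.213.49.193",
}


def resolve_a(hostname: str) -> Set[str]:
    """Return the IPv4 addresses the local resolver reports for hostname."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return {info[4][0] for info in infos}


def check_cutover(
    expected: Dict[str, str],
    resolver: Callable[[str], Set[str]] = resolve_a,
) -> Dict[str, bool]:
    """Map each hostname to True when its expected IP is among its A records."""
    return {host: ip in resolver(host) for host, ip in expected.items()}
```

Calling `check_cutover(EXPECTED)` performs live lookups; passing a stub resolver keeps the check offline and repeatable.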

Control traceability

177 controls are tracked, with 148 code-test, 16 dated-evidence, 13 external-fact, and 0 pending substantiation classifications.

The public trust report was generated from the matrix on 2026-05-23T16:36:32Z and intentionally excludes file paths, line numbers, and reviewer-only mappings. "0 pending" covers substantiation-class assignment; the matrix separately flags 5 controls with active coverage gaps (2 PDPL) under controls_with_coverage_gap in control_matrix.json.
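The class counts can be reconciled against the tracked total with simple arithmetic. The sketch below uses only the figures quoted above; the dictionary layout and the function name are assumptions for illustration, not the schema of control_matrix.json.

```python
# Figures quoted in the report text above.
TRACKED_CONTROLS = 177
CLASS_COUNTS = {
    "code-test": 148,
    "dated-evidence": 16,
    "external-fact": 13,
    "pending": 0,
}
# Coverage gaps are flagged separately from class assignment, so they are
# neither added to nor subtracted from the class counts.
COVERAGE_GAPS = 5


def classes_reconcile(total: int, class_counts: dict) -> bool:
    """True when the per-class counts add up to the tracked total."""
    return sum(class_counts.values()) == total


print(classes_reconcile(TRACKED_CONTROLS, CLASS_COUNTS))
```

A control can therefore count toward a substantiation class and still appear among the 5 flagged coverage gaps; the two figures answer different questions.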

KMS + detector v8 scorecard

Alibaba KMS startup bootstrap remains live; the public detector v8 scorecard is ready with all 8 release gates passing.

Keep key-custody claims bounded to startup bootstrap on the serving ACK image. Detector v8 is live on the May 20 ACK runtime baseline; detector readiness is still the published in-repo scorecard, not an external audit or production-wide coverage guarantee.

Boundary

“Proven” means measured or drilled on the live pilot.

If a capability has not been exercised on the live pilot or cannot be reproduced from retained evidence, it stays outside the proof language.

Verification

Public artifacts are sanitized; full mappings are request-gated.

Use the public trust report, control-matrix summary, detector scorecard JSON, and benchmark JSON for buyer-safe review. Qualified reviewers can request the signed bundle for control-level implementation, test, and evidence mappings.

Not implied

Trust evidence stays tied to the surfaces actually exercised.

Where the evidence is narrower, the wording stays narrower: full-vault verification, HSM-backed custody, immutable-retention controls, and unplanned full-region failure tolerance remain explicit separate steps.

02

Live pilot evidence, dated

Start here for dated proof of a specific surface. Each row names the retained evidence and the citation to check. These rows are not an external audit, regulator approval, or HSM-custody claim.

Capability

Evidence

April 21 guarded rollout baseline

Current route proof. The May 4 customer-route cutover baseline is the current public ACK/API route proof; the 2026-05-16 scoped Dammam drill-standby exercise adds DNS/GKE/TLS evidence only. Citation: evidence/ha/alibaba-live-2026-05-04T01:17:03Z/ and evidence/multi-region-drill/multi-region-warm-standby-20260516T220433Z/.

Control matrix and public trust report

Matrix totals. The public report summarizes 177 controls: 148 code-test, 16 dated-evidence, 13 external-fact, and 0 pending, with reviewer-only mappings removed. Citation: docs/generated/control_matrix.json and /trust-report.

PDPL citation integrity

Citation source. The in-repo SDAIA-published PDPL English text backs automated article-reference checks across code and docs, not an external legal opinion. Citation: scripts/validate_pdpl_citations.py.

Detector benchmark artifacts

Curated detector benchmark. The public detector v8 scorecard generated on 2026-05-18 is ready with all 8 gates passing, the public precision/recall artifact passes its curated snapshot, and the PII benchmark reports 31.48 ms p95 for the 1K-character English case. Citation: /benchmark, public-site/resources/detector_release_scorecard_latest.json, public-site/resources/detector_precision_recall_latest.json, and public-site/resources/pii_benchmark_latest.json.

Billing integrity

Billing chain. Billing events use SHA-256 hash-chain continuity and HMAC authentication for newer records, while the 10-year retention gate refuses in-retention deletion. Citation: docs/billing-integrity.md and tests/test_billing_integrity.py.
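To make the hash-chain idea concrete, here is a minimal sketch of SHA-256 chain continuity with an optional HMAC tag per record. It is not the implementation described in docs/billing-integrity.md: the record layout, the all-zero genesis value, the canonical JSON encoding, and every function name are assumptions chosen for illustration.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # ASSUMPTION: placeholder "previous hash" for the first record


def event_hash(prev_hash: str, payload: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + body).hexdigest()


def append_event(chain: list, payload: dict, key: bytes = None) -> None:
    """Append a record; newer records also carry an HMAC tag when a key is given."""
    prev = chain[-1]["hash"] if chain else GENESIS
    digest = event_hash(prev, payload)
    tag = hmac.new(key, digest.encode("ascii"), hashlib.sha256).hexdigest() if key else None
    chain.append({"payload": payload, "prev": prev, "hash": digest, "hmac": tag})


def verify_chain(chain: list, key: bytes) -> bool:
    """Recompute every link; check the HMAC only on records that carry one."""
    prev = GENESIS
    for rec in chain:
        digest = event_hash(prev, rec["payload"])
        if rec["prev"] != prev or rec["hash"] != digest:
            return False
        if rec["hmac"] is not None:
            expected = hmac.new(key, digest.encode("ascii"), hashlib.sha256).hexdigest()
            if not hmac.compare_digest(rec["hmac"], expected):
                return False
        prev = digest
    return True
```

Because each hash covers the previous one, editing or removing any earlier record changes every later hash, which is what makes silent edits detectable.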

PII detection

Saudi PII coverage. English, Arabic, and Saudi-specific recognizers cover National ID, IBAN, phone, and related patterns; the repository-side v8 Arabic NER bundle adds FAC-label coverage while measured results stay on the benchmark page. Citation: saudivault/saudi_patterns.py, docs/detector-v8-release-notes-2026-05-19.md, and /benchmark.
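As one concrete instance of a Saudi-structural pattern, an IBAN recognizer is commonly a shape match followed by the standard IBAN mod-97 checksum. The sketch below assumes the Saudi IBAN layout (SA, two check digits, a two-digit bank code, 18 alphanumeric characters; 24 characters in total) and is not the code in saudivault/saudi_patterns.py. `SA_IBAN_RE`, `iban_checksum_ok`, and `find_saudi_ibans` are hypothetical names.

```python
import re

# ASSUMPTION: compact (unspaced, upper-case) Saudi IBANs only.
SA_IBAN_RE = re.compile(r"\bSA\d{4}[A-Z0-9]{18}\b")


def iban_checksum_ok(iban: str) -> bool:
    """Standard IBAN check: move the first four characters to the end,
    map letters to numbers (A=10 ... Z=35), and require value mod 97 == 1."""
    compact = iban.replace(" ", "").upper()
    rearranged = compact[4:] + compact[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def find_saudi_ibans(text: str) -> list:
    """Return shape matches that also pass the checksum."""
    return [m.group(0) for m in SA_IBAN_RE.finditer(text)
            if iban_checksum_ok(m.group(0))]
```

The checksum step is what keeps random 24-character strings from being reported as IBANs; a shape match alone would over-trigger.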

Three-lane privacy routing

Policy routing. Green tokenizes before external routing, Amber pseudonymizes in-Kingdom, and Red keeps raw processing in-Kingdom according to tenant policy. Citation: saudivault/policy.py.
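The three lanes can be pictured as a small lookup from a tenant's policy to a handling rule. This is a sketch under stated assumptions, not the logic in saudivault/policy.py: the `Handling` fields, the per-data-class policy dictionary, the fail-closed default to Red, and the reading that Amber and Red content stays in-Kingdom are all illustrative choices.

```python
from dataclasses import dataclass
from enum import Enum


class Lane(Enum):
    GREEN = "green"
    AMBER = "amber"
    RED = "red"


@dataclass(frozen=True)
class Handling:
    transform: str         # what is applied to detected PII
    in_kingdom_only: bool  # ASSUMPTION: whether processing must stay in-Kingdom


HANDLING = {
    Lane.GREEN: Handling("tokenize", in_kingdom_only=False),
    Lane.AMBER: Handling("pseudonymize", in_kingdom_only=True),
    Lane.RED: Handling("none", in_kingdom_only=True),  # raw processing
}


def handling_for(tenant_policy: dict, data_class: str) -> Handling:
    """Look up the lane a tenant assigned to a data class.

    ASSUMPTION: unknown data classes fall back to Red (fail closed).
    """
    lane = Lane(tenant_policy.get(data_class, "red"))
    return HANDLING[lane]
```

For example, `handling_for({"support_ticket": "green"}, "support_ticket")` yields the tokenize rule, while an unlisted data class falls back to the Red rule.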

Encrypted vault

Vault encryption. Stored token originals use AES-256-GCM with per-tenant derived keys, and transit uses TLS 1.2/1.3. Citation: saudivault/vault.py and nginx/datasitr.conf.

Guarded deploy with rollback

Rollback drill. A forced public-smoke failure restored the prior deploy hash and health, with the proof note retained in-repo and raw drill logs retained separately. Citation: docs/rollback-drill-evidence-20260326.md.

Signed evidence export

Signed export. Sequenced processing records can export signed verification material for reviewer use; final immutable-retention posture remains separate. Citation: saudivault/compliance.py.

Off-host encrypted backup

Backup evidence. Dated pilot notes cover encrypted upload, download, and restore-drill verification. Treat freshness as an operator-verified date, not a standing guarantee. Citation: docs/backup-hardening-summary-20260330.md and docs/off-host-backup-evidence-20260324.md.

Isolated restore recovery

Isolated restore. The March 28 drill restored an encrypted backup and completed fresh logins in isolation, proving single-stack recoverability only. Citation: docs/backup-hardening-summary-20260330.md.

Monitoring and alerting

Monitoring proof. The pilot has dated operator evidence for health checks, metrics, log retention, and alert delivery, with freshness treated as a dated check. Citation: evidence/ha/alibaba-live-2026-05-04T01:17:03Z/ and docs/ha-evidence-gate.md.

Shared-state scaling evidence

Scaled route evidence. Alibaba ACK has dated customer-route HA proof, and GCP Dammam has scoped DNS/GKE/TLS drill proof only. Citation: evidence/ha/alibaba-live-2026-05-04T01:17:03Z/ and evidence/multi-region-drill/.

Auth survivability

Auth-path drill. The March 29 drill showed fresh login and authenticated processing during an intentional auth-path outage; the Dammam drill covers scoped standby routing only. Citation: docs/customer-security-one-pager.md.

Public restored-state cutover

Restored-state check. The March 29 rerun verified that the oldest and newest restored vault rows decrypt successfully under the restored environment; it does not prove that every vault row decrypts or imply full-vault verification. Citation: evidence/restore-drill-20260504T124300Z/ and docs/customer-security-one-pager.md.

Alternate public path

Alternate path. The March 29 drill showed the public path served through an alternate host under operator control; this row remains historical continuity evidence. Citation: docs/evaluator-packet-worker2.md and docs/customer-security-one-pager.md.

03

Regulatory standing, in writing

Each row is the exact phrasing the issuing authority has put in writing. We deliberately distinguish "registered" from "licensed" and "application in progress" from "awarded" — because the Saudi regulators do.

Registration

Status

National PDP Register #3260005651

Active. The owner is the registered Data Protection Officer for مؤسسة داتا ستر / Data Sitr Establishment under the Saudi Personal Data Protection Law (PDPL).

NDGP Data Services Provider Registration LR-25-000018

Registered as a data services and products provider on the National Data Governance Platform (NDGP); status "Complete" on the dashboard. NDMO has clarified in writing (2026-04-27) that this registration does NOT constitute a license — the licensing application window will open in an upcoming phase.

SDAIA AI Service Provider Accreditation AE-26-000237

Application In Progress with the Saudi Data and AI Authority (filed 2026-04-03). The accreditation has not been awarded; we will update this row when SDAIA issues the decision.

Commercial Registration 4030483372

Active under the Ministry of Commerce since 2022-08-31. Entity type: Establishment. Registered under the current name مؤسسة داتا ستر / Data Sitr Establishment.

Unified National Number 7030618388

700-prefix unified national number for the same Saudi establishment. It is not the Commercial Registration number.

Enforcement environment: SDAIA confirmed 48 PDPL enforcement decisions in January 2026 covering unlawful processing, weak security controls, and unconsented marketing — administrative fines up to SAR 5 million, doubled for repeat violations, with intentional sensitive-data violations carrying up to two years' imprisonment.

04

Available for buyer review

Assurance surfaces a buyer or security team can inspect immediately — no widening of the claims boundary required.

Control Traceability Matrix

177 controls are mapped into substantiation classes that buyers can inspect safely.

The public trust report shows the aggregate proof counts, while the full Ed25519-signed reviewer bundle remains available to qualified reviewers on request.

Public Trust Report

A sanitized report now summarizes what the matrix proves without leaking file paths or line numbers.

Open the report at /trust-report or consume /resources/trust-report.json for automated review; the totals match the generated control-matrix JSON.

PDPL Citation Integrity

The authoritative SDAIA-published PDPL English text is included in-repo as the citation source of truth.

A per-citation validator at scripts/validate_pdpl_citations.py enables automated audit of article references across the codebase.

Live Key Custody

The May 4 customer-route cutover baseline (current) continues to bootstrap its startup master key through Alibaba KMS.

Keep that claim bounded to startup bootstrap on the serving ACK image. Tenant BYOK and HSM custody remain outside the current live boundary.

Edge WAF

Cloud Armor blocks OWASP Top 10 attack patterns at the Dammam drill-standby ingress during exercises.

Pre-configured rule sets for SQL injection, cross-site scripting, local and remote file inclusion, and remote code execution are attached to the GKE Ingress backend. Blocked-request counts are visible to the operator on the Cloud Monitoring dashboard.

Uptime monitoring

Cloud Monitoring uptime checks poll the Dammam drill-standby /healthz endpoint while the drill footprint is enabled.

An alert policy notifies the operator by email if the pass rate falls below 50% over a three-minute window. The current public status remains summarized at /status.
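The alert condition above reduces to a pass-rate check over a sliding window. The sketch below restates that rule in plain Python for readers who want to reason about it; it is not the Cloud Monitoring policy definition, and the sample format and the no-data behaviour are assumptions.

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(minutes=3)  # three-minute window, from the text above
THRESHOLD = 0.5                # alert when the pass rate falls below 50%


def should_alert(samples, now, window=WINDOW, threshold=THRESHOLD):
    """samples: iterable of (timestamp, passed) pairs from /healthz polls.

    Only polls inside the window ending at `now` are counted.
    """
    recent = [passed for ts, passed in samples if now - window <= ts <= now]
    if not recent:
        return False  # ASSUMPTION: no data raises no alert in this sketch
    return sum(recent) / len(recent) < threshold
```

A pass rate of exactly 50% does not alert under this reading, since the text says "falls below".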

Vulnerability scan program

OWASP ZAP, nuclei, and Trivy run on a documented cadence with registered-DPO review of findings.

The scan workflow lives at .github/workflows/security-scan.yml; the program is documented at docs/security/vulnerability-scan-program.md. Findings are stored under evidence/security-scans/ and reviewed by the registered DPO before any change of public posture.

Security questionnaire library

A pre-compiled response library covers the categories enterprise procurement and vendor security teams ask about.

Library at docs/security/questionnaire-response-library.md spans company background, compliance posture, encryption and key management, access controls, incident response, subprocessors, data-subject rights, cross-border transfer, business continuity, audit access, and AI-specific routing details. Available to qualified buyers on request alongside the signed reviewer bundle.

Academy UX redesign

The Academy dashboard redesign is a guarded-deploy UI improvement; it does not change processing API behavior.

The changelog records the cleaner training navigation, companion PDPL course surface, and accessibility polish. The change is dashboard presentation only: no routing, tokenization, provider, or lawful-basis API semantics change.

HA evidence freshness gate

CI refuses to ship if the high-availability drill evidence is older than 168 hours.

Pinned drill artifacts in the deploy workflow enforce that the multi-region drill attestation is fresh. Stale evidence fails the gate and blocks the next deploy until a new drill is captured and signed with the published Ed25519 key.
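The freshness rule itself is a timestamp comparison. The sketch below shows one way to express it, assuming the capture time is parsed from a compact UTC stamp like the one in the drill directory name cited earlier; `parse_drill_stamp` and `evidence_is_fresh` are hypothetical helpers, not the deploy workflow's actual step, and signature verification is out of scope here.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(hours=168)  # threshold quoted in the text above


def parse_drill_stamp(stamp: str) -> datetime:
    """Parse a compact UTC stamp such as '20260516T220433Z'."""
    return datetime.strptime(stamp, "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)


def evidence_is_fresh(captured_at: datetime, now: datetime,
                      max_age: timedelta = MAX_AGE) -> bool:
    """True while the evidence is no older than max_age."""
    return now - captured_at <= max_age
```

A CI step built on this idea would exit non-zero when `evidence_is_fresh` returns False, which is what blocks the deploy until a new drill is captured.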

Public reviewer artifacts — public trust report, control matrix summary, compliance reviewer pack, benchmark JSON artifacts, and the compliance summary page
Dashboard compliance tab — processing records, DPIA, audit summary, evidence pack, and compliance bundle with copy/download for procurement review
Dedicated regulator portal — read-only regulator access during evaluation by request, with cross-tenant processing records, SDAIA-shaped report builders, scoped signed-package generation for handoff artifacts, and a separate regulator access log

05

Where the boundaries live

One published list, one page, one source of truth. Procurement, security, and legal reviewers all read the same constraints from /compliance — by design.

The constraints are centralized so every reviewer sees the same wording. If the proof is narrow, the claim stays narrow.

Open the published constraints · Open reviewer pack · Open public trust report

06

Reproducible detector benchmarks

scripts/benchmark_pii.py generates every benchmark below from the in-repo detector against public or in-repo corpora. The detector v8 scorecard generated on 2026-05-18 reports ready with all 8 release gates passing; numbers below remain dated artifacts, not external-audit claims.

Corpus | Metric | Value
Wojood public Arabic NER (test split, 357 sentences, 17 entity types) | Headline-PII micro F1 / precision / recall | F1 0.9297 · P 0.9554 · R 0.9054
Saudi code-switched (Arabic ↔ English) corpus | Recall (research-corpora gate) | 0.9884
Adversarial attacks v1 (800 cases: OCR decay, ZWJ injection, code-embedded, transposition, homoglyph, label attack) | must_detect recall / phantom-FP rate | 1.0000 / 0.0500
Arabic literary negative corpus (205 cases of Classical Arabic poetry / scripture / scholarly text) | Clean rate (must NOT trigger structural PII) | 1.0000 (205/205)
Frozen quality suite (12 internal eval packs, ~1,350 cases) | Required-suites pass rate | 12/12 pass
English-1K p95 latency | Detection latency over 30 iterations | ~57 ms (target ≤ 75 ms)
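For the latency row, a p95 over 30 iterations can be computed in several ways. The sketch below uses the nearest-rank definition and a simple timing loop; both are assumptions about method, since the percentile convention used by scripts/benchmark_pii.py is not stated here, and `detect` stands in for whatever detector callable is being measured.

```python
import math
import time


def p95_nearest_rank(samples_ms):
    """Nearest-rank 95th percentile: the smallest sample with at least
    95% of all samples at or below it."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    rank = math.ceil(95 * len(ordered) / 100)
    return ordered[rank - 1]


def time_detector(detect, text, iterations=30):
    """Call detect(text) repeatedly and return per-call latency in ms."""
    timings = []
    for _ in range(iterations):
        start = time.perf_counter()
        detect(text)
        timings.append((time.perf_counter() - start) * 1000.0)
    return timings
```

With 30 samples the nearest-rank p95 is the 29th-smallest timing, so a single slow outlier does not move the figure but two do.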

Per-type breakdown

Pipeline F1 by entity type on the same Wojood test split. WojoodNER 2024 winners report 0.91-0.92 fine-grained F1; we are within the academic SOTA range on the coarse-grained taxonomy.

Entity type | F1
Organization | 0.96
Date / Time | 0.96
URL | 0.96
Geopolitical entity | 0.93
Profession | 0.92
Person | 0.90
Location | 0.86
Facility (12 gold spans only) | 0.59

Vertical line marks the WojoodNER 2024 academic SOTA threshold (0.91). By the rounded values shown, five of eight types meet or exceed it, with Person just below at 0.90; the Facility (FAC) score is sample-size noise (12 gold spans), and v8 adds FAC labels in the local Arabic NER bundle. Methodology: model + detector evolution documented in docs/ner-v3-card.md through docs/ner-v7-card.md and docs/detector-v8-release-notes-2026-05-19.md (v3 → v8).

To verify: clone the repository, run scripts/benchmark_pii.py and scripts/eval_wojood.py, and compare against the artifacts in docs/generated/. The full benchmark history is in docs/generated/pii_benchmark_history/.

Comparison to vanilla open-source

Same Wojood test split (357 sentences), same character-overlap scoring, three detectors:

Detector | F1 | Precision | Recall
DataSitr v7b (full pipeline) | 0.9014 | 0.8992 | 0.9037
Vanilla Microsoft Presidio | 0.2150 | 0.2583 | 0.1841
Regex-only Saudi-structural baseline | 0.0000 | 0.0000 | 0.0000

Uplift: DataSitr vs vanilla Presidio = +0.69 F1, +72 percentage points of recall on the same Arabic NER test. Regex-only is 0.00 because Wojood is Arabic prose without Saudi structural identifiers. The comparison artifact (with timestamps and verification commands) is at docs/generated/comparison_to_industry_2026-04-28.json.
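The published F1 values and the uplift figures can be re-derived from the precision and recall columns. The sketch below uses only numbers quoted in the comparison above; the function name is illustrative, and F1 is the standard harmonic mean of precision and recall.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision / recall pairs from the comparison above.
datasitr = f1(0.8992, 0.9037)
presidio = f1(0.2583, 0.1841)
regex_only = f1(0.0, 0.0)

f1_uplift = datasitr - presidio               # difference in F1
recall_uplift_pp = (0.9037 - 0.1841) * 100    # difference in percentage points

print(round(datasitr, 4), round(presidio, 4), regex_only)
print(round(f1_uplift, 2), round(recall_uplift_pp))
```

The recomputed values agree with the published columns to four decimal places, and the differences round to the +0.69 F1 and +72 percentage-point figures quoted above.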

07

Test coverage snapshot

Backend, dashboard, and production-build coverage. The verified snapshot is kept as a dated internal evidence note — the covered surfaces below stay current.

Surface

Current result

Backend tests

See current dated snapshot

Dashboard unit/integration

See current dated snapshot

Dashboard production build

Passing

The covered surfaces include PII detection, tokenization, vault encryption, pipeline orchestration, admin authorization, webhook delivery, monitor health, deploy/backup/restore scripts, and dashboard UI.

08

How to verify

Seven actions a buyer can take today, in order. Every link points to a published artifact or a live product surface — no NDAs, no screenshots.

Request a pilot API key and test detection and routing with your own representative data
Open the public trust report and compare /resources/trust-report.json totals against /resources/control_matrix.json
Review the compliance bundle in the dashboard (copy or download as JSON for internal review), or download branded evaluation PDFs from the resources page
Check the evidence pack sections for integrity, external evidence, and policy snapshot status
Request regulator-portal access when the evaluation requires cross-tenant evidence, SDAIA-shaped report builders, or scoped signed-package generation
Verify scoped signed packages using the published verification details rather than relying on screenshots or forwarded files alone
Ask about any published constraint on the compliance page — questions go to dpo@datasitr.com
