DataSitr is a Saudi-hosted AI privacy gateway with public evidence for its current operating boundary. Customer traffic runs on Alibaba ACK Riyadh, the 4-hour May 4 soak passed, and the GCP Dammam drill-standby exercise proved DNS/GKE/TLS routing only. Cross-cloud DB replication, auth failover, regulator approval, HSM custody, and full-region tolerance are not yet claimed.
If a capability has not been exercised on the live pilot or cannot be reproduced from retained evidence, it stays outside the proof language.
Use the public trust report, control-matrix summary, detector scorecard JSON, and benchmark JSON for buyer-safe review. Qualified reviewers can request the signed bundle for control-level implementation, test, and evidence mappings.
Where the evidence is narrower, the wording stays narrower: full-vault verification, HSM-backed custody, immutable-retention controls, and unplanned full-region failure tolerance remain explicit separate steps.
Start here for dated proof of a specific surface. Each row names the retained evidence and the citation to check. These rows are not an external audit, regulator approval, or HSM-custody claim.
Current route proof. The May 4 customer-route cutover baseline is the current public ACK/API route proof; the 2026-05-16 scoped Dammam drill-standby exercise adds DNS/GKE/TLS evidence only. Citation: evidence/ha/alibaba-live-2026-05-04T01:17:03Z/ and evidence/multi-region-drill/multi-region-warm-standby-20260516T220433Z/.
Matrix totals. The public report summarizes 177 controls: 148 code-test, 16 dated-evidence, 13 external-fact, and 0 pending, with reviewer-only mappings removed. Citation: docs/generated/control_matrix.json and /trust-report.
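The matrix totals above can be cross-checked mechanically. This is a minimal sketch that assumes a hypothetical summary shape; the real schema of docs/generated/control_matrix.json is not reproduced here and may differ.

```python
# Hypothetical summary shape for illustration; the field names are assumptions,
# not the actual schema of docs/generated/control_matrix.json.
summary = {
    "total": 177,
    "by_class": {
        "code-test": 148,
        "dated-evidence": 16,
        "external-fact": 13,
        "pending": 0,
    },
}


def totals_consistent(summary: dict) -> bool:
    """Return True when the per-class counts add up to the stated total."""
    return sum(summary["by_class"].values()) == summary["total"]
```

A reviewer could run the same check against the published JSON after mapping its real field names onto this shape.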
Citation source. Automated article-reference checks across code and docs run against the in-repo copy of the SDAIA-published PDPL English text; they are a consistency check, not an external legal opinion. Citation: scripts/validate_pdpl_citations.py.
Curated detector benchmark. The public detector v8 scorecard generated on 2026-05-18 is ready with all 8 gates passing, the public precision/recall artifact passes its curated snapshot, and the PII benchmark reports 31.48 ms p95 for the 1K-character English case. Citation: /benchmark, public-site/resources/detector_release_scorecard_latest.json, public-site/resources/detector_precision_recall_latest.json, and public-site/resources/pii_benchmark_latest.json.
Billing chain. Billing events use SHA-256 hash-chain continuity and HMAC authentication for newer records, while the 10-year retention gate refuses in-retention deletion. Citation: docs/billing-integrity.md and tests/test_billing_integrity.py.
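As an illustration of how hash-chain continuity and HMAC authentication can be combined, here is a minimal sketch. The record layout, the genesis value, and the function names are assumptions made for this example; they do not describe the implementation documented in docs/billing-integrity.md.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # assumed placeholder for the first record's prev_hash


def record_hash(prev_hash: str, payload: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(prev_hash.encode() + body).hexdigest()


def append_record(chain: list, payload: dict, key: bytes) -> None:
    """Append a record linked to the previous one and tag it with an HMAC."""
    prev = chain[-1]["hash"] if chain else GENESIS
    digest = record_hash(prev, payload)
    tag = hmac.new(key, digest.encode(), hashlib.sha256).hexdigest()
    chain.append({"payload": payload, "prev_hash": prev, "hash": digest, "hmac": tag})


def verify_chain(chain: list, key: bytes) -> bool:
    """Recompute every link; check the HMAC only where a record carries one."""
    prev = GENESIS
    for rec in chain:
        digest = record_hash(prev, rec["payload"])
        if rec["prev_hash"] != prev or rec["hash"] != digest:
            return False
        tag = rec.get("hmac")  # assumption: older records may have no tag
        if tag is not None:
            expected = hmac.new(key, digest.encode(), hashlib.sha256).hexdigest()
            if not hmac.compare_digest(expected, tag):
                return False
        prev = digest
    return True
```

Editing any earlier payload changes its hash, which breaks the link stored in every later record, so tampering is detectable without trusting the record that was changed.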
Saudi PII coverage. English, Arabic, and Saudi-specific recognizers cover National ID, IBAN, phone, and related patterns; the repository-side v8 Arabic NER bundle adds FAC-label coverage while measured results stay on the benchmark page. Citation: saudivault/saudi_patterns.py, docs/detector-v8-release-notes-2026-05-19.md, and /benchmark.
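To make the idea of a structural recognizer concrete, the sketch below pairs simple patterns with the standard IBAN mod-97 checksum. The patterns and names are illustrative assumptions, not the contents of saudivault/saudi_patterns.py, and they are deliberately narrower than a production recognizer.

```python
import re

# Illustrative patterns only (assumptions for this sketch).
SAUDI_IBAN = re.compile(r"\bSA\d{22}\b")             # SA + 2 check digits + 20 digits
SAUDI_NATIONAL_ID = re.compile(r"\b[12]\d{9}\b")     # 10 digits starting with 1 or 2
SAUDI_MOBILE = re.compile(r"(?:\+9665|\b05)\d{8}\b")  # +9665XXXXXXXX or 05XXXXXXXX


def iban_checksum_ok(iban: str) -> bool:
    """Validate a Saudi IBAN with the standard mod-97 check."""
    s = iban.replace(" ", "").upper()
    if not SAUDI_IBAN.fullmatch(s):
        return False
    rearranged = s[4:] + s[:4]  # move country code and check digits to the end
    digits = "".join(str(int(ch, 36)) for ch in rearranged)  # A=10 ... Z=35
    return int(digits) % 97 == 1


def find_saudi_pii(text: str) -> list:
    """Return (label, match) pairs for every pattern hit in the text."""
    patterns = (
        ("IBAN", SAUDI_IBAN),
        ("NATIONAL_ID", SAUDI_NATIONAL_ID),
        ("MOBILE", SAUDI_MOBILE),
    )
    return [(label, m.group()) for label, p in patterns for m in p.finditer(text)]
```

Running the checksum after the pattern match is one way to keep random 24-character strings from being reported as IBANs.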
Policy routing. Green tokenizes before external routing, Amber pseudonymizes in-Kingdom, and Red keeps raw processing in-Kingdom according to tenant policy. Citation: saudivault/policy.py.
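The three tiers can be pictured as a small lookup table. This is a hedged sketch: the enum, the table, and the fail-closed default for unknown tiers are assumptions for illustration and are not taken from saudivault/policy.py.

```python
from enum import Enum


class Tier(Enum):
    GREEN = "green"
    AMBER = "amber"
    RED = "red"


# Hypothetical action table mirroring the tier descriptions above.
ROUTING = {
    Tier.GREEN: ("tokenize", "external"),
    Tier.AMBER: ("pseudonymize", "in_kingdom"),
    Tier.RED: ("raw", "in_kingdom"),
}


def parse_tier(name: str) -> Tier:
    """Map a tenant policy string to a tier; unknown values fall back to RED.

    The fallback is a design choice of this sketch, not a documented behavior.
    """
    try:
        return Tier(name.strip().lower())
    except ValueError:
        return Tier.RED


def route(tier: Tier) -> dict:
    """Return the transform and destination for a tier."""
    transform, destination = ROUTING[tier]
    return {"transform": transform, "destination": destination}
```

Defaulting unknown input to the most restrictive tier keeps a misconfigured tenant policy from sending data out of the Kingdom by accident.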
Vault encryption. Stored token originals use AES-256-GCM with per-tenant derived keys, and transit uses TLS 1.2/1.3. Citation: saudivault/vault.py and nginx/datasitr.conf.
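Per-tenant key derivation can be sketched with HKDF (RFC 5869) using only the standard library. The source does not name the derivation function, so HKDF-SHA-256, the salt value, and the function names below are assumptions; the AES-256-GCM step itself is omitted because it needs a third-party library.

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF with HMAC-SHA-256: extract a pseudorandom key, then expand it."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()  # extract
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def tenant_key(master_key: bytes, tenant_id: str) -> bytes:
    """Derive a 32-byte (AES-256 sized) key bound to one tenant.

    The salt label is a hypothetical constant chosen for this sketch.
    """
    return hkdf_sha256(master_key, salt=b"vault-v1", info=tenant_id.encode(), length=32)
```

Binding the tenant identifier into the `info` input means two tenants never share a key even though both keys come from the same master secret.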
Rollback drill. A forced public-smoke failure restored the prior deploy hash and health, with the proof note retained in-repo and raw drill logs retained separately. Citation: docs/rollback-drill-evidence-20260326.md.
Signed export. Sequenced processing records can export signed verification material for reviewer use; final immutable-retention posture remains separate. Citation: saudivault/compliance.py.
Backup evidence. Dated pilot notes cover encrypted upload, download, and restore-drill verification. Treat freshness as an operator-verified date, not a standing guarantee. Citation: docs/backup-hardening-summary-20260330.md and docs/off-host-backup-evidence-20260324.md.
Isolated restore. The March 28 drill restored an encrypted backup and completed fresh logins in isolation, proving single-stack recoverability only. Citation: docs/backup-hardening-summary-20260330.md.
Monitoring proof. The pilot has dated operator evidence for health checks, metrics, log retention, and alert delivery, with freshness treated as a dated check. Citation: evidence/ha/alibaba-live-2026-05-04T01:17:03Z/ and docs/ha-evidence-gate.md.
Scaled route evidence. Alibaba ACK has dated customer-route HA proof, and GCP Dammam has scoped DNS/GKE/TLS drill proof only. Citation: evidence/ha/alibaba-live-2026-05-04T01:17:03Z/ and evidence/multi-region-drill/.
Auth-path drill. The March 29 drill showed fresh login and authenticated processing during an intentional auth-path outage; the Dammam drill covers scoped standby routing only. Citation: docs/customer-security-one-pager.md.
Restored-state check. The March 29 rerun verified that the oldest and newest restored vault rows decrypt successfully under the restored environment; it does not prove that every vault row decrypts or imply full-vault verification. Citation: evidence/restore-drill-20260504T124300Z/ and docs/customer-security-one-pager.md.
Alternate path. The March 29 drill showed the public path served through an alternate host under operator control; this row remains historical continuity evidence. Citation: docs/evaluator-packet-worker2.md and docs/customer-security-one-pager.md.
Each row is the exact phrasing the issuing authority has put in writing. We deliberately distinguish "registered" from "licensed" and "application in progress" from "awarded" — because the Saudi regulators do.
Active. The owner is the registered Data Protection Officer for مؤسسة داتا ستر / Data Sitr Establishment under the Saudi Personal Data Protection Law (PDPL).
Registered as a data services and products provider on the National Data Governance Platform (NDGP); status "Complete" on the dashboard. NDMO has clarified in writing (2026-04-27) that this registration does NOT constitute a license — the licensing application window will open in an upcoming phase.
Application In Progress with the Saudi Data and AI Authority (filed 2026-04-03). The accreditation has not been awarded; we will update this row when SDAIA issues the decision.
Active under the Ministry of Commerce since 2022-08-31. Entity type: Establishment. Registered under the current name مؤسسة داتا ستر / Data Sitr Establishment.
700-prefix unified national number for the same Saudi establishment. It is not the Commercial Registration number.
Enforcement environment: SDAIA confirmed 48 PDPL enforcement decisions in January 2026 covering unlawful processing, weak security controls, and unconsented marketing — administrative fines up to SAR 5 million, doubled for repeat violations, with intentional sensitive-data violations carrying up to two years' imprisonment.
Assurance surfaces a buyer or security team can inspect immediately — no widening of the claims boundary required.
The public trust report shows the aggregate proof counts, while the full Ed25519-signed reviewer bundle remains available to qualified reviewers on request.
Open the report at /trust-report or consume /resources/trust-report.json for automated review; the totals match the generated control-matrix JSON.
A per-citation validator at scripts/validate_pdpl_citations.py enables automated audit of article references across the codebase.
Keep that claim bounded to startup bootstrap on the serving ACK image. Tenant BYOK and HSM custody remain outside the current live boundary.
Pre-configured rule sets for SQL injection, cross-site scripting, local and remote file inclusion, and remote code execution are attached to the GKE Ingress backend. Blocked-request counts are visible to the operator on the Cloud Monitoring dashboard.
An alert policy notifies the operator by email if the pass rate falls below 50% over a three-minute window. The current public status remains summarized at /status.
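The alert condition can be read as a windowed pass-rate test. The sketch below assumes check results arrive as (timestamp, passed) pairs and that an empty window does not alert; both are assumptions of this example, not documented behavior of the monitoring setup.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=3)  # three-minute window from the text above
THRESHOLD = 0.5                # alert when the pass rate falls below 50%


def should_alert(checks: list, now: datetime) -> bool:
    """Return True when the pass rate inside the window is below the threshold.

    `checks` is a list of (timestamp, passed) tuples.
    """
    recent = [passed for ts, passed in checks if now - WINDOW <= ts <= now]
    if not recent:
        return False  # assumption: no samples in the window means no alert
    return sum(recent) / len(recent) < THRESHOLD
```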
The scan workflow lives at .github/workflows/security-scan.yml; the program is documented at docs/security/vulnerability-scan-program.md. Findings are stored under evidence/security-scans/ and reviewed by the registered DPO before any change of public posture.
The library at docs/security/questionnaire-response-library.md spans company background, compliance posture, encryption and key management, access controls, incident response, subprocessors, data-subject rights, cross-border transfer, business continuity, audit access, and AI-specific routing details. It is available to qualified buyers on request alongside the signed reviewer bundle.
The changelog records the cleaner training navigation, companion PDPL course surface, and accessibility polish. The change is dashboard presentation only: no routing, tokenization, provider, or lawful-basis API semantics change.
The deploy workflow pins the drill artifacts and checks that the multi-region drill attestation is fresh. Stale evidence fails the gate and blocks the next deploy until a new drill is captured and signed with the published Ed25519 key.
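A freshness gate of this kind can be sketched by reading the timestamp embedded in the evidence directory name. The maximum age is not stated on this page, so `max_age` is a caller-supplied assumption, and the Ed25519 signature check is left out of this sketch.

```python
import re
from datetime import datetime, timedelta, timezone

# Matches a trailing stamp such as 20260516T220433Z in an evidence path.
STAMP = re.compile(r"(\d{8}T\d{6}Z)$")


def drill_timestamp(path: str) -> datetime:
    """Parse the UTC timestamp at the end of an evidence directory name."""
    match = STAMP.search(path.rstrip("/"))
    if match is None:
        raise ValueError(f"no timestamp found in {path!r}")
    parsed = datetime.strptime(match.group(1), "%Y%m%dT%H%M%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def is_fresh(path: str, now: datetime, max_age: timedelta) -> bool:
    """Return True when the drill is not from the future and not older than max_age."""
    age = now - drill_timestamp(path)
    return timedelta(0) <= age <= max_age
```

A deploy step could call `is_fresh` and exit non-zero on `False`, which is one way to make stale evidence block the pipeline.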
One published list, one page, one source of truth. Procurement, security, and legal reviewers all read the same constraints from /compliance — by design.
The constraints are centralized so every reviewer sees the same wording. If the proof is narrow, the claim stays narrow.
scripts/benchmark_pii.py generates every benchmark below from the in-repo detector against public or in-repo corpora. The detector v8 scorecard generated on 2026-05-18 reports ready with all 8 release gates passing; numbers below remain dated artifacts, not external-audit claims.
Headline-PII micro F1 / precision / recall: F1 0.9297 · P 0.9554 · R 0.9054
Recall (research-corpora gate): 0.9884
must_detect recall / phantom-FP rate: 1.0000 / 0.0500
Clean rate (must NOT trigger structural PII): 1.0000 (205/205)
Required-suites pass rate: 12/12 pass
Detection latency over 30 iterations: ~57 ms (target ≤ 75 ms)
Pipeline F1 by entity type on the same Wojood test split. WojoodNER 2024 winners report 0.91-0.92 fine-grained F1; we are within the academic SOTA range on the coarse-grained taxonomy.
Vertical line marks the WojoodNER 2024 academic SOTA threshold (0.91). Six of eight types meet or exceed it; FAC is sample-size noise (12 gold spans), and v8 adds FAC labels in the local Arabic NER bundle. Methodology: model + detector evolution documented in docs/ner-v3-card.md through docs/ner-v7-card.md and docs/detector-v8-release-notes-2026-05-19.md (v3 → v8).
To verify: clone the repository, run scripts/benchmark_pii.py and scripts/eval_wojood.py, and compare against the artifacts in docs/generated/. The full benchmark history is in docs/generated/pii_benchmark_history/.
Same Wojood test split (357 sentences), same character-overlap scoring, three detectors:
DataSitr: F1 0.9014 · P 0.8992 · R 0.9037
Vanilla Presidio: F1 0.2150 · P 0.2583 · R 0.1841
Regex-only: F1 0.0000 · P 0.0000 · R 0.0000
Uplift: DataSitr vs vanilla Presidio = +0.69 F1, +72 percentage points of recall on the same Arabic NER test. Regex-only is 0.00 because Wojood is Arabic prose without Saudi structural identifiers. The comparison artifact (with timestamps and verification commands) is at docs/generated/comparison_to_industry_2026-04-28.json.
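The uplift figures can be reproduced from the published numbers. This short check assumes the three values listed per detector are F1, precision, and recall in that order, which is consistent with the harmonic-mean arithmetic below.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs taken from the comparison above.
DATASITR = (0.8992, 0.9037)
PRESIDIO = (0.2583, 0.1841)
REGEX_ONLY = (0.0, 0.0)

f1_uplift = f1(*DATASITR) - f1(*PRESIDIO)               # difference in F1
recall_uplift_pp = (DATASITR[1] - PRESIDIO[1]) * 100    # percentage points of recall
```

Rounded to two decimals and to whole points respectively, these reproduce the stated +0.69 F1 and +72 percentage points of recall.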
Backend, dashboard, and production-build coverage. The verified snapshot is kept as a dated internal evidence note — the covered surfaces below stay current.
Backend: see current dated snapshot
Dashboard: see current dated snapshot
Production build: passing
The covered surfaces include PII detection, tokenization, vault encryption, pipeline orchestration, admin authorization, webhook delivery, monitor health, deploy/backup/restore scripts, and dashboard UI.
Seven actions a buyer can take today, in order. Every link points to a published artifact or a live product surface — no NDAs, no screenshots.
dpo@datasitr.com