Model card · Provotics

Research and educational use only. Provenance-1 is not a medical device. It has not been reviewed or cleared by any regulator, is not CLIA or CAP validated, and must not be used to make decisions about a real patient. Every output is an illustrative artifact of a model trained on retrospective public cohorts, not a medical finding. A confident site prediction is not a cancer diagnosis.

Overview

Provenance-1 reads one bulk tumor RNA-seq expression profile and estimates the body site the tumor came from, across 25 anatomical sites. It is built to be honest about uncertainty: every call carries a calibrated probability, a candidate set that lets the model abstain when the evidence is ambiguous, and an out-of-distribution check that flags inputs unlike anything it was trained on.

At a glance

Model: Provenance-1 (whole-body tumor site classifier)
Task: Predict 1 of 25 anatomical sites of origin from tumor RNA-seq
Input: One bulk RNA-seq expression profile (gene-level)
Output: A calibrated site of origin, a 90% conformal candidate set, and a novelty flag
Gene panel: A fixed 3,882-gene panel
Training data: 17,410 retrospective public tumor profiles (see Training data)
Held-out accuracy: macro-F1 0.908 on real patients; 89.5% on 381 independent tumors
Status: Research phase, invite-only access
Use: Research and educational only, not a medical device

Reflects the model deployed as of June 2026. Numbers trace to the internal model card; see the Validation page for methodology.

Intended use, and what is out of scope

What it is for

Research and education: exploring what a tumor's transcriptome reveals about its tissue of origin, studying calibrated uncertainty, and generating hypotheses on cohorts you already have. It reads expression values only, never identifiable patient data.

Out of scope

Not for clinical, diagnostic, prognostic, or treatment-selection use. It predicts an anatomical site, not a histological diagnosis, stage, or grade. Inputs from other assays (single-cell, microarray, targeted panels), other normalizations, or non-tumor tissue are out of distribution and should not be trusted.

How it reads a profile

A profile is harmonized to a common reference so different sequencing pipelines are comparable, mapped onto the fixed 3,882-gene panel, and scored by a calibrated ensemble. The raw scores are then turned into a probability you can act on, a conformal candidate set, and a novelty check. The methodology is documented in Docs; the internals are private.
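The candidate set mentioned above can be illustrated with split conformal prediction, one standard way to get a set with a coverage target. This is a minimal sketch, not Provenance-1's private implementation: the function names, the score (one minus the probability of the true site), and the rule that an empty set means "abstain" are all assumptions made for illustration.

```python
import math


def conformal_threshold(true_site_probs, alpha=0.10):
    """Score threshold from a labelled calibration split (hypothetical helper).

    true_site_probs: probability the model gave to the correct site for each
    calibration profile. alpha=0.10 targets 90% coverage.
    """
    scores = sorted(1.0 - p for p in true_site_probs)
    n = len(scores)
    # Finite-sample corrected rank, 1-indexed: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        return 1.0  # too few calibration points: every site stays in the set
    return scores[rank - 1]


def candidate_set(probs, q_hat):
    """Sites whose score 1 - p is within the threshold, most probable first."""
    kept = [site for site, p in probs.items() if 1.0 - p <= q_hat]
    return sorted(kept, key=lambda site: probs[site], reverse=True)


def describe_outcome(sites):
    """Assumed mapping from set size to outcome; the card does not define it."""
    if not sites:
        return "abstain"
    return "single call" if len(sites) == 1 else "candidate set"


# Toy calibration values, invented for this example.
q_hat = conformal_threshold([0.95, 0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55, 0.3])
ambiguous = {"Stomach": 0.48, "Esophagus": 0.41, "Colorectal": 0.11}
sites = candidate_set(ambiguous, q_hat)
print(sites, describe_outcome(sites))
```

The coverage guarantee of this construction relies on the calibration profiles and the new profile being exchangeable, which is why a threshold fitted on one sequencing pipeline says little about another.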

Evaluation

We lead with the number measured on real held-out patients, not the prettiest one. A higher figure exists on a mixed test set that includes easier external samples; we do not quote it as the headline.

0.908 macro-F1 on real held-out patients (balanced accuracy 0.911), scored evenly across all 25 sites

89.5% on 381 fully independent tumors from 7 studies, different patients, centres, and sequencing pipelines

0.091 → 0.011 calibration error after temperature scaling, about an 8x reduction, so a stated confidence means what it says

90% conformal coverage target; about 90% of cases resolve to a single confident call, the rest return a candidate set or abstain
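To make the calibration line concrete, here is a minimal sketch of temperature scaling and of expected calibration error (ECE). It rests on assumptions: the card does not say how the temperature is fitted or how the calibration error is binned, so the grid search, the ten equal-width bins, and every function name here are illustrative choices. The toy logits are invented and will not reproduce the 0.091 and 0.011 figures.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of logits divided by a temperature (T > 1 softens confidence)."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mean_nll(logit_rows, labels, temperature):
    """Average negative log-likelihood of the true labels."""
    losses = [-math.log(softmax(row, temperature)[y])
              for row, y in zip(logit_rows, labels)]
    return sum(losses) / len(losses)


def fit_temperature(logit_rows, labels):
    """Pick T on a validation split by grid search over 0.5 to 5.0.

    A stand-in for the gradient-based fit that is more common in practice.
    """
    grid = [0.5 + 0.05 * i for i in range(91)]
    return min(grid, key=lambda t: mean_nll(logit_rows, labels, t))


def expected_calibration_error(confidences, correct, n_bins=10):
    """Size-weighted gap between mean confidence and accuracy per bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        bins[min(int(conf * n_bins), n_bins - 1)].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if members:
            mean_conf = sum(c for c, _ in members) / len(members)
            accuracy = sum(1 for _, ok in members if ok) / len(members)
            ece += len(members) / total * abs(mean_conf - accuracy)
    return ece


def top_confidences(logit_rows, labels, temperature):
    """Top-class confidence and correctness for each row."""
    confs, hits = [], []
    for row, y in zip(logit_rows, labels):
        probs = softmax(row, temperature)
        confs.append(max(probs))
        hits.append(probs.index(max(probs)) == y)
    return confs, hits


# Invented, overconfident toy model: always sure of class 0, right 7 times in 10.
rows = [[5.0, 0.0, 0.0]] * 10
labels = [0] * 7 + [1] * 3
t_best = fit_temperature(rows, labels)
before = expected_calibration_error(*top_confidences(rows, labels, 1.0))
after = expected_calibration_error(*top_confidences(rows, labels, t_best))
print(f"T={t_best:.2f}  ECE before={before:.3f}  after={after:.3f}")
```

Temperature scaling divides every logit by the same constant, so it changes the stated confidence without changing which site ranks first. Accuracy and macro-F1 are unaffected by it.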

Training data

A pool of 17,410 retrospective tumor profiles from open genomic cohorts: primarily GDC (including TCGA), augmented with deduplicated pediatric samples from Treehouse, and with independent cohorts from cBioPortal added so the model generalizes across sequencing platforms. All cohorts are public and retrospective, and contain expression values, not protected health information. The exact source breakdown within the pool is not enumerated in a single artifact, and the pool figure should not be read as a single held-out test set.

Limitations

These are stated with confidence because the validation was deliberately adversarial. Read them as part of the model, not a disclaimer.

It does not recognize real mesothelioma

On real mesothelioma cases the model scores 0% recall and confidently misroutes them. An earlier per-site figure for Pleura & Mediastinum turned out to reflect one external batch's signature, not the biology, and we corrected it. Treat any Pleura & Mediastinum output as unreliable.

Rare sites are data-starved (a data ceiling, not a model ceiling)

Pleura & Mediastinum, Thymus, Esophagus, Skin, and Eye have very few examples (on the order of seventeen each in the relevant held-out evaluation), so their per-site metrics are statistically fragile. The examples available for these sites are close to the entire public supply of their kind, so the data is exhausted at source and stronger models do not move the number.
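How fragile a metric on about seventeen examples is can be shown with a Wilson score interval for a proportion. This is a sketch under stated assumptions: the count of 15 correct out of 17 is hypothetical (the card publishes no per-site counts), and the function name is ours.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion; z=1.96 is roughly 95%."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Hypothetical rare site: 15 of 17 held-out cases correct.
low, high = wilson_interval(15, 17)
print(f"15/17 correct: 95% interval {low:.2f} to {high:.2f}")

# Same hit rate with 100 times the data, for comparison.
low_big, high_big = wilson_interval(1500, 1700)
print(f"1500/1700 correct: 95% interval {low_big:.2f} to {high_big:.2f}")
```

With seventeen cases the interval spans roughly thirty percentage points, so one or two cases flipping moves the per-site figure more than most modelling changes would.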

Cross-platform inputs are harder

A single tumor sequenced on a different pipeline is the hardest case. Rather than guess, the model abstains on most of those and commits only when it is confident. Inputs that skip the harmonization step will be misclassified.

The uncertainty guarantees are in-distribution only

Calibration and conformal coverage are measured on a held-out split from the same distribution. They do not hold under platform or batch shift. The novelty and input-validity checks exist precisely because of this, and are themselves reference-only, not a validated clinical detector.

No fairness audit, no clinical validation

All evaluation is retrospective on public cohorts. There is no prospective study, no independent clinical-site validation, and no subgroup-equity audit. The training cohorts are not characterized here for demographic balance, and performance on underrepresented groups is unmeasured and may be worse.

Scope: the 25 sites

Adrenal Gland
Bladder & Urinary
Blood & Bone Marrow
Brain & CNS
Breast
Cervix
Colorectal
Esophagus
Eye
Head & Neck
Kidney
Liver & Biliary
Lung
Lymph Node
Ovary
Pancreas
Pleura & Mediastinum
Prostate
Skin
Soft Tissue & Bone
Stomach
Testis
Thymus
Thyroid
Uterus

Access

Provenance-1 is invite-only during the research phase and granted under a confidentiality agreement. See how it is validated, read the safety page, then apply for access.