

Reading a tumor's origin from the transcriptome.

22 June 2026

A cancer is defined by where it started, and that origin shapes the entire treatment plan. In metastases, and especially in cancers of unknown primary, the origin can be hidden. The reassuring fact is that the answer is rarely gone entirely: it is written in the tumor's gene expression. Prostate cells keep expressing prostate programs even after they spread; lung keeps looking like lung.

The signal is real, and buried

Each RNA-seq profile is roughly 18,000 numbers, one per gene. A handful are famous markers: KLK3 for prostate, NKX2-1 for lung, ESR1 for breast. But the robust signal lives in the joint pattern across thousands of genes, tangled up with noise and with correlation between genes. Reading it reliably is the whole problem.
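To make the difference between one marker and the joint pattern concrete, here is a minimal Python sketch under stated assumptions. The seven-gene panel, the per-tissue centroids, and the sample are all made up for illustration (a real profile has ~18,000 genes), and correlation to the nearest centroid is a deliberately simple stand-in, not Provotics' classifier. The names `GENES`, `CENTROIDS`, `single_marker_call`, and `pattern_call` are hypothetical. The toy sample has KLK3 missing while other prostate genes remain.

```python
import math

# Hypothetical toy panel; a real profile has ~18,000 genes.
GENES = ["KLK3", "AR", "NKX3-1", "NKX2-1", "SFTPC", "ESR1", "GATA3"]

# Made-up per-tissue centroids in log2(TPM + 1) space, for illustration only.
CENTROIDS = {
    "prostate": [10, 7, 6, 1, 1, 2, 2],
    "lung":     [1, 2, 1, 8, 10, 2, 2],
    "breast":   [1, 1, 1, 1, 1, 8, 9],
}

def log_transform(tpm):
    """Map a {gene: TPM} dict to a log2(TPM + 1) vector ordered by GENES."""
    return [math.log2(tpm.get(g, 0.0) + 1.0) for g in GENES]

def pearson(a, b):
    """Pearson correlation between two equal-length vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa > 0 and sb > 0 else 0.0

def single_marker_call(tpm, threshold=5.0):
    """Naive rule: call prostate only if KLK3 alone is high."""
    return "prostate" if math.log2(tpm.get("KLK3", 0.0) + 1.0) > threshold else None

def pattern_call(tpm):
    """Assign the tissue whose centroid best correlates with the whole profile."""
    x = log_transform(tpm)
    scores = {t: pearson(x, c) for t, c in CENTROIDS.items()}
    return max(scores, key=scores.get), scores

# Made-up sample: KLK3 has dropped out, other prostate genes are still high.
sample = {"KLK3": 0, "AR": 127, "NKX3-1": 63, "NKX2-1": 1,
          "SFTPC": 0, "ESR1": 3, "GATA3": 3}
print(single_marker_call(sample))   # None: the famous marker alone misses it
label, scores = pattern_call(sample)
print(label)                        # prostate
```

The single-marker rule returns nothing, while the correlation over the whole panel still lands on prostate; with thousands of genes instead of seven, losing any one marker matters even less.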

Why an ensemble, and why calibration

No single classifier is best across 25 very different tissues, so Provotics combines several and calibrates the result. Calibration is the unglamorous step that makes the output usable: it turns a raw score into a probability that means what it says (among calls made at 80%, roughly 80% should be correct), so a clinician or researcher can weigh an 80% call differently from a 55% one.
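The ensemble-plus-calibration step can be sketched as follows, assuming soft voting (averaging each member's class probabilities) for the ensemble and temperature scaling for calibration. Both are common choices, not Provotics' documented method, and every function name and number here is hypothetical. The temperature `T` is fit on held-out data by minimizing negative log-likelihood (a grid search stands in for a proper optimizer), and expected calibration error (ECE) checks whether stated confidence matches observed accuracy.

```python
import math

def softmax(scores, temperature=1.0):
    """Numerically stable softmax of scores divided by a temperature."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_probs(member_probs):
    """Soft voting: average the class-probability vectors of all members."""
    n_classes = len(member_probs[0])
    return [sum(p[k] for p in member_probs) / len(member_probs) for k in range(n_classes)]

def calibrate(probs, temperature):
    """Temperature scaling applied to the log of the ensemble's probabilities."""
    return softmax([math.log(max(p, 1e-12)) for p in probs], temperature)

def mean_nll(rows, labels, temperature):
    """Average negative log-likelihood of the true labels after calibration."""
    return -sum(math.log(max(calibrate(p, temperature)[y], 1e-12))
                for p, y in zip(rows, labels)) / len(labels)

def fit_temperature(rows, labels):
    """Grid search for T in [0.25, 5.0] on a held-out set."""
    grid = [0.25 + 0.05 * i for i in range(96)]
    return min(grid, key=lambda t: mean_nll(rows, labels, t))

def expected_calibration_error(rows, labels, n_bins=10):
    """Bin by confidence; average |confidence - accuracy| weighted by bin size."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(rows, labels):
        conf = max(p)
        b = min(int(conf * n_bins), n_bins - 1)
        bins[b].append((conf, p.index(conf) == y))
    ece = 0.0
    for items in bins:
        if items:
            conf = sum(c for c, _ in items) / len(items)
            acc = sum(ok for _, ok in items) / len(items)
            ece += len(items) / len(labels) * abs(conf - acc)
    return ece

# Made-up example: three members agree the top class is 0 at about 90%.
avg = ensemble_probs([[0.9, 0.06, 0.04], [0.9, 0.04, 0.06], [0.9, 0.05, 0.05]])
held_out = [avg] * 4
labels = [0, 0, 1, 2]          # only half of those 90% calls are right
T = fit_temperature(held_out, labels)
calibrated = [calibrate(p, T) for p in held_out]
print(round(T, 2), round(max(calibrated[0]), 2))
print(expected_calibration_error(held_out, labels),
      expected_calibration_error(calibrated, labels))
```

In this toy held-out set the ensemble says 90% but is right half the time. The fitted temperature (a little above 4) pulls the top probability down to roughly 50% without changing which tissue is ranked first, and ECE drops from 0.4 to near zero. Temperature scaling is attractive here because a single parameter cannot reorder the classes; it only rescales confidence.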

Knowing when not to answer

The most important behaviour is restraint. When a profile sits outside the training distribution, the right output is not a guess but a flag. Provotics abstains on those profiles, which keeps a confident-but-wrong answer from reaching the report. The genes behind each call are surfaced too, so when the drivers line up with known biology, you can see that the model is reading signal rather than memorizing.
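A minimal sketch of that restraint, assuming two simple gates: an out-of-distribution check (Euclidean distance in z-score space to the nearest training centroid, with the cutoff taken from a high quantile of training distances) and a minimum calibrated confidence. It also shows one way to surface the genes behind a call, assuming a linear per-class score whose per-gene terms `w * z` can be ranked. The gene panel, statistics, thresholds, weights, and function names (`decide`, `top_drivers`, and so on) are hypothetical; this note does not describe Provotics' actual detector or attribution method, and Mahalanobis distance or model-agnostic attributions would be natural substitutes.

```python
import math

GENES = ["KLK3", "NKX2-1", "ESR1"]              # hypothetical 3-gene panel
MEANS, SDS = [5.0, 5.0, 5.0], [2.0, 2.0, 2.0]  # made-up per-gene training stats

# Made-up tissue centroids in standardized (z-score) space.
CENTROIDS_Z = {
    "prostate": [2.0, -1.0, -1.0],
    "lung":     [-1.0, 2.0, -1.0],
    "breast":   [-1.0, -1.0, 2.0],
}

def zscore(x):
    """Standardize a log-expression vector with the training means and SDs."""
    return [(v - m) / s for v, m, s in zip(x, MEANS, SDS)]

def nearest_distance(z):
    """Euclidean distance from a standardized profile to the closest centroid."""
    return min(math.dist(z, c) for c in CENTROIDS_Z.values())

def fit_ood_threshold(train_z, quantile=0.99):
    """Cutoff = a high quantile of training samples' nearest-centroid distances."""
    d = sorted(nearest_distance(z) for z in train_z)
    return d[min(int(quantile * len(d)), len(d) - 1)]

def decide(z, probs, threshold, min_conf=0.6):
    """Return a tissue label, or an abstention with its reason."""
    if nearest_distance(z) > threshold:
        return "ABSTAIN: outside training distribution"
    label = max(probs, key=probs.get)
    if probs[label] < min_conf:
        return "ABSTAIN: low confidence"
    return label

def top_drivers(z, weights, k=2):
    """For a linear class score sum(w_g * z_g), rank genes by signed contribution."""
    contrib = [(g, w * v) for g, w, v in zip(GENES, weights, z)]
    return sorted(contrib, key=lambda t: t[1], reverse=True)[:k]

train_z = [[2.1, -1.0, -0.9], [1.8, -1.2, -1.0], [-1.0, 2.2, -1.0],
           [-0.8, 1.9, -1.1], [-1.0, -1.0, 2.0], [-1.1, -0.9, 2.3]]
threshold = fit_ood_threshold(train_z)

z = zscore([8.8, 3.0, 2.8])   # close to the made-up prostate centroid
print(decide(z, {"prostate": 0.85, "lung": 0.10, "breast": 0.05}, threshold))
print(decide([3.0, 3.0, -2.0], {"prostate": 0.50, "lung": 0.45, "breast": 0.05}, threshold))
print(decide(z, {"prostate": 0.50, "lung": 0.30, "breast": 0.20}, threshold))
print(top_drivers(z, weights=[1.5, -0.5, -0.5]))  # hypothetical prostate weights
```

The first query is called prostate; the odd profile with both KLK3 and NKX2-1 high is flagged as out of distribution rather than forced into either tissue; the third abstains on low confidence. In the driver list KLK3 tops the prostate call, which is the kind of agreement with known biology worth checking for.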
