{ "cells": [ { "cell_type": "markdown", "id": "5a234ce5-cbe3-4431-87be-7a1d7490c6fd", "metadata": {}, "source": [ "# Run CalicoST on prostate cancer dataset" ] }, { "cell_type": "markdown", "id": "297f7e52-56c0-4e8d-92fa-40b961acdfbd", "metadata": {}, "source": [ "## Obtain the data" ] }, { "cell_type": "markdown", "id": "28f5b861-7bf8-4e99-8cae-d00c06db7d14", "metadata": {}, "source": [ "We obtained spatially resolved transcriptomics data for a prostate cross section studied by [Erickson et al.](https://www.nature.com/articles/s41586-022-05023-2) from EGA using accession [EGAD00001008644](https://ega-archive.org/datasets/EGAD00001008644). We ran spaceranger to get the BAM files and applied CalicoST to study the CNAs and spatial evolution of cancer across multiple spatial regions.\n" ] }, { "cell_type": "markdown", "id": "a1ea5b15", "metadata": {}, "source": [ "## Compute allele counts by preprocessing: genotyping and reference-based phasing" ] }, { "cell_type": "markdown", "id": "f75e4a79", "metadata": {}, "source": [ "**Step 1: Download SNP and phasing panels**\n", "\n", "Download the following files to your machine.\n", "* [SNP panel](https://sourceforge.net/projects/cellsnp/files/SNPlist/genome1K.phase3.SNP_AF5e4.chr1toX.hg38.vcf.gz) - 0.5 GB in size. You can also choose other SNP panels from the [cellsnp-lite webpage](https://cellsnp-lite.readthedocs.io/en/latest/main/data.html#data-list-of-common-snps).\n", "* [Phasing panel](http://pklab.med.harvard.edu/teng/data/1000G_hg38.zip) - 9.0 GB in size. Unzip the panel after downloading." ] }, { "cell_type": "markdown", "id": "b94f9dec", "metadata": {}, "source": [ "**Step 2: Make a table for BAM files and slice IDs to jointly genotype and phase**\n", "\n", "To jointly genotype and phase across multiple slices, CalicoST requires a `bamlist.tsv` file that specifies the BAM file paths, slice IDs, and spaceranger output directories.
It **must** be tab-delimited, **without** a header, and contain the following columns **in order**; otherwise the pipeline will report an error.\n", "\n", "| BAM file location | slice ID | spaceranger output directory location |\n", "| ----------------- | -------- | ------------------------------------- |\n", "\n", "Below is the `bamlist.tsv` file we used for the prostate cancer data.\n", "
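
Separately from the file listing itself, the format rules stated above (tab-delimited, no header, three columns in order) can be checked before launching the pipeline. The following is a minimal sketch and not part of CalicoST: `parse_bamlist` is a hypothetical helper, the paths are placeholders rather than the paths used for this dataset, and the `.bam` suffix check and the unique slice ID check are assumptions added for illustration.

```python
def parse_bamlist(text):
    """Parse bamlist.tsv content into (bam_path, slice_id, spaceranger_dir) tuples.

    Raises ValueError if a line does not have exactly three tab-separated fields.
    The ".bam" suffix and unique slice ID checks are illustrative assumptions.
    """
    entries = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(
                f"line {lineno}: expected 3 tab-separated columns, got {len(fields)}"
            )
        bam, slice_id, outdir = (field.strip() for field in fields)
        # Assumption: the first column ends in ".bam"; this also catches a header row.
        if not bam.endswith(".bam"):
            raise ValueError(f"line {lineno}: first column is not a BAM path: {bam!r}")
        entries.append((bam, slice_id, outdir))

    slice_ids = [entry[1] for entry in entries]
    # Assumption: each slice should appear once.
    if len(slice_ids) != len(set(slice_ids)):
        raise ValueError("slice IDs must be unique")
    return entries


# Placeholder content; H12 and H25 are the slice IDs seen in the output tables.
example = (
    "/path/to/H12/outs/possorted_genome_bam.bam\tH12\t/path/to/H12/outs\n"
    "/path/to/H25/outs/possorted_genome_bam.bam\tH25\t/path/to/H25/outs\n"
)
entries = parse_bamlist(example)
for bam, slice_id, outdir in entries:
    print(slice_id, bam, outdir)
```

A space-delimited line or a header row raises `ValueError` here, which surfaces the problem before any genotyping job is submitted.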

| BARCODES | Tumor |
|---|---|
| AAACAAGTATCTCCCA-1_H12 | 0.050000 |
| AAACAGGGTCTATATT-1_H12 | NaN |
| AAACATTTCCCGGATT-1_H12 | 0.050000 |
| AAACCGGGTAGGTACC-1_H12 | 0.825997 |
| AAACCGTTCGTCCAGG-1_H12 | 0.940555 |
| ... | ... |
| TTGTTCAGTGTGCTAC-1_H25 | 0.050000 |
| TTGTTGTGTGTCAAGA-1_H25 | 0.188937 |
| TTGTTTCACATCCAGG-1_H25 | 0.956014 |
| TTGTTTCATTAGTCTA-1_H25 | 0.838851 |
| TTGTTTCCATACAACT-1_H25 | 0.364009 |

13344 rows × 1 columns


| BARCODES | clone_label | tumor_proportion |
|---|---|---|
| AAACAAGTATCTCCCA-1_H12 | 3 | 0.050000 |
| AAACAGGGTCTATATT-1_H12 | 3 | NaN |
| AAACATTTCCCGGATT-1_H12 | 3 | 0.050000 |
| AAACCGGGTAGGTACC-1_H12 | 3 | 0.825997 |
| AAACCGTTCGTCCAGG-1_H12 | 3 | 0.940555 |
| ... | ... | ... |
| TTGTTCAGTGTGCTAC-1_H25 | 2 | 0.050000 |
| TTGTTGTGTGTCAAGA-1_H25 | 2 | 0.188937 |
| TTGTTTCACATCCAGG-1_H25 | 1 | 0.956014 |
| TTGTTTCATTAGTCTA-1_H25 | 1 | 0.838851 |
| TTGTTTCCATACAACT-1_H25 | 2 | 0.364009 |

13344 rows × 2 columns

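
A table like the one above is easier to read once it is summarized per clone. The following minimal sketch uses only the ten rows shown in the preview (not the full 13344 spots) and skips `NaN` tumor proportions when averaging; `summarize_by_clone` is a hypothetical helper name, not a CalicoST function.

```python
import math
from collections import defaultdict

# Rows copied from the preview above: (barcode, clone_label, tumor_proportion).
rows = [
    ("AAACAAGTATCTCCCA-1_H12", 3, 0.050000),
    ("AAACAGGGTCTATATT-1_H12", 3, math.nan),
    ("AAACATTTCCCGGATT-1_H12", 3, 0.050000),
    ("AAACCGGGTAGGTACC-1_H12", 3, 0.825997),
    ("AAACCGTTCGTCCAGG-1_H12", 3, 0.940555),
    ("TTGTTCAGTGTGCTAC-1_H25", 2, 0.050000),
    ("TTGTTGTGTGTCAAGA-1_H25", 2, 0.188937),
    ("TTGTTTCACATCCAGG-1_H25", 1, 0.956014),
    ("TTGTTTCATTAGTCTA-1_H25", 1, 0.838851),
    ("TTGTTTCCATACAACT-1_H25", 2, 0.364009),
]


def summarize_by_clone(rows):
    """Return {clone_label: (n_spots, n_missing, mean_tumor_proportion)}."""
    grouped = defaultdict(list)
    for _barcode, clone, proportion in rows:
        grouped[clone].append(proportion)

    summary = {}
    for clone, proportions in sorted(grouped.items()):
        # Exclude NaN entries from the mean, but report how many there were.
        valid = [p for p in proportions if not math.isnan(p)]
        mean = sum(valid) / len(valid) if valid else math.nan
        summary[clone] = (len(proportions), len(proportions) - len(valid), mean)
    return summary


summary = summarize_by_clone(rows)
for clone, (n_spots, n_missing, mean) in summary.items():
    print(f"clone {clone}: {n_spots} spots, {n_missing} missing, mean {mean:.3f}")
```

Because the preview shows only the first and last few rows, these numbers say nothing about the full dataset; the same function can be applied to all rows once they are loaded.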

| | CHR | START | END | clone1 A | clone1 B | clone2 A | clone2 B | clone3 A | clone3 B | clone4 A | clone4 B | clone5 A | clone5 B |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 89295 | 1419136 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 1 | 1 | 1434861 | 1440568 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 2 | 1 | 1449689 | 1496123 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 3 | 1 | 1512162 | 1721078 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 4 | 1 | 1724838 | 2308568 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2255 | 22 | 46360834 | 48898361 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 2256 | 22 | 49773283 | 49963978 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 2257 | 22 | 50089879 | 50180213 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 2258 | 22 | 50185915 | 50292030 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 2259 | 22 | 50309030 | 50783625 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |

2260 rows × 13 columns

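
Assuming the `cloneK A` and `cloneK B` columns hold the copy numbers of the two alleles (an interpretation not spelled out in the table itself), each segment can be labeled with a coarse CNA state per clone. The sketch below is illustrative only: `cna_state` and its category names are hypothetical, and every example segment except the first is made up to show the non-neutral cases, since all previewed rows are `(1, 1)`.

```python
def cna_state(a_copies, b_copies):
    """Label an allele-specific copy-number pair with a coarse CNA state.

    Assumes a diploid baseline of one A copy and one B copy.
    """
    total = a_copies + b_copies
    if a_copies == 1 and b_copies == 1:
        return "neutral"
    if total == 2:
        # Two copies in total but from a single allele, e.g. (2, 0) or (0, 2).
        return "copy-neutral LOH"
    if total > 2:
        return "amplification"
    return "deletion"


# (CHR, START, END, A copies, B copies). Only the first entry comes from the
# table above; the remaining three are hypothetical values for illustration.
segments = [
    (1, 89295, 1419136, 1, 1),
    (8, 1000000, 2000000, 2, 1),  # hypothetical
    (8, 3000000, 4000000, 1, 0),  # hypothetical
    (17, 5000000, 6000000, 2, 0),  # hypothetical
]

states = [cna_state(a, b) for _chr, _start, _end, a, b in segments]
for (chrom, start, end, a, b), state in zip(segments, states):
    print(f"chr{chrom}:{start}-{end} A={a} B={b} -> {state}")
```

Applying the same function to each clone's pair of columns gives a per-clone state for every segment, which makes segments that differ between clones easy to pick out.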

| | gene | clone1 A | clone1 B | clone2 A | clone2 B | clone3 A | clone3 B | clone4 A | clone4 B | clone5 A | clone5 B |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AL627309.1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 1 | AL627309.5 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 2 | LINC01409 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 3 | LINC01128 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 4 | LINC00115 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 15035 | CHKB-DT | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 15036 | MAPK8IP2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 15037 | ARSA | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 15038 | SHANK3 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 15039 | RABL2B | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |

15040 rows × 11 columns


| barcode | in_tissue | array_row | array_col | pxl_row_in_fullres | pxl_col_in_fullres | slice_id | clone_label | tumor_proportion |
|---|---|---|---|---|---|---|---|---|
| TCCTTCAGTGGTCGAA-1_H12 | 1 | 15 | 69 | 3366 | 6308 | H12 | clone3 | 0.050000 |
| GCGTCGAAATGTCGGT-1_H12 | 1 | 17 | 65 | 3618 | 6018 | H12 | clone3 | 0.132391 |
| AACTGATATTAGGCCT-1_H12 | 1 | 16 | 66 | 3492 | 6090 | H12 | clone3 | 0.050000 |
| CGAGCTGGGCTTTAGG-1_H12 | 1 | 17 | 67 | 3618 | 6163 | H12 | clone3 | 0.050000 |
| GGGTGTTTCAGCTATG-1_H12 | 1 | 16 | 68 | 3492 | 6236 | H12 | clone3 | 0.050000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| ATGGCCCGAAAGGTTA-1_H25 | 1 | 76 | 120 | 13480 | 12001 | H25 | clone2 | 1.000000 |
| CGTAATATGGCCCTTG-1_H25 | 1 | 77 | 121 | 13632 | 12089 | H25 | clone2 | 1.000000 |
| AGAGTCTTAATGAAAG-1_H25 | 1 | 76 | 122 | 13480 | 12176 | H25 | clone2 | 1.000000 |
| ATTGAATTCCCTGTAG-1_H25 | 1 | 76 | 124 | 13479 | 12351 | H25 | clone2 | 1.000000 |
| TTGAAGTGCATCTACA-1_H25 | 1 | 77 | 127 | 13630 | 12615 | H25 | clone2 | NaN |

13344 rows × 8 columns

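
In the rows shown above, each barcode ends with an underscore followed by the value in the `slice_id` column (for example `..._H12`). Assuming that convention holds for every spot, the slice can be recovered from the barcode alone, which is a cheap consistency check after merging tables. `split_barcode` is a hypothetical helper written for this sketch.

```python
def split_barcode(barcode):
    """Split 'RAWBARCODE-1_SLICE' into (raw_barcode, slice_id).

    Assumes the slice ID is everything after the last underscore.
    """
    raw, sep, slice_id = barcode.rpartition("_")
    if not sep or not raw or not slice_id:
        raise ValueError(f"barcode has no '_<slice_id>' suffix: {barcode!r}")
    return raw, slice_id


# (barcode, slice_id) pairs copied from the preview above.
spots = [
    ("TCCTTCAGTGGTCGAA-1_H12", "H12"),
    ("GCGTCGAAATGTCGGT-1_H12", "H12"),
    ("ATGGCCCGAAAGGTTA-1_H25", "H25"),
    ("TTGAAGTGCATCTACA-1_H25", "H25"),
]

# Collect any spot whose barcode suffix disagrees with its slice_id column.
mismatches = [
    barcode for barcode, slice_id in spots if split_barcode(barcode)[1] != slice_id
]
print("mismatched barcodes:", mismatches)
```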

| barcode | in_tissue | array_row | array_col | pxl_row_in_fullres | pxl_col_in_fullres | slice_id | clone_label | tumor_proportion | final_x | final_y |
|---|---|---|---|---|---|---|---|---|---|---|
| TCCTTCAGTGGTCGAA-1_H12 | 1 | 15 | 69 | 3366 | 6308 | H12 | clone3 | 0.050000 | 238.417652 | 74.069483 |
| GCGTCGAAATGTCGGT-1_H12 | 1 | 17 | 65 | 3618 | 6018 | H12 | clone3 | 0.132391 | 241.246079 | 69.170504 |
| AACTGATATTAGGCCT-1_H12 | 1 | 16 | 66 | 3492 | 6090 | H12 | clone3 | 0.050000 | 239.573047 | 70.654068 |
| CGAGCTGGGCTTTAGG-1_H12 | 1 | 17 | 67 | 3618 | 6163 | H12 | clone3 | 0.050000 | 241.763717 | 71.102355 |
| GGGTGTTTCAGCTATG-1_H12 | 1 | 16 | 68 | 3492 | 6236 | H12 | clone3 | 0.050000 | 240.090685 | 72.585919 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| ATGGCCCGAAAGGTTA-1_H25 | 1 | 76 | 120 | 13480 | 12001 | H25 | clone2 | 1.000000 | 700.743520 | 183.090182 |
| CGTAATATGGCCCTTG-1_H25 | 1 | 77 | 121 | 13632 | 12089 | H25 | clone2 | 1.000000 | 701.914026 | 181.184948 |
| AGAGTCTTAATGAAAG-1_H25 | 1 | 76 | 122 | 13480 | 12176 | H25 | clone2 | 1.000000 | 702.735909 | 183.264493 |
| ATTGAATTCCCTGTAG-1_H25 | 1 | 76 | 124 | 13479 | 12351 | H25 | clone2 | 1.000000 | 704.728299 | 183.438805 |
| TTGAAGTGCATCTACA-1_H25 | 1 | 77 | 127 | 13630 | 12615 | H25 | clone2 | NaN | 707.891194 | 181.707882 |

13344 rows × 10 columns