Run CalicoST on a simulated data

Download the data

We applied CalicoST on a small simulated data provided by examples/simulated_example.tar.gz from the github, which contains the following files/directories:

simulated_example
- outs: simulated transcript count matrix and spatial coordinates
- snpinfo: parsed allele count matrix

Untar the data by

tar -xzvf <CalicoST git-cloned directory>/examples/simulated_example.tar.gz

Run CalicoST to infer CNAs and cancer clones assuming spots are purely tumor or purely normal

To run CalicoST, we first copy configuration_cna file provided by CalicoST github to the example directory.

cd simulated_example
cp <CalicoST git-cloned directory>/configuration_cna ./

Then we modify the following paths in the copied configuration_cna file:

spaceranger_dir is path to the outs directory of the downloaded data.
snp_dir is the path to the snp_info directory of the downloaded data.
output_dir is the output directory for CalicoST to write the inferred clones and CNAs. It must be an existing directory.

We keep the default values for other parameters in configuration_cna file, while refer to this parameter specification for more details if parameter tuning is needed for other samples.

Now we use CalicoST to infer clones and allele-specific CNAs by running the following command in terminal

OMP_NUM_THREADS=1 python <CalicoST git-cloned directory>/src/calicost/calicost_main.py -c configuration_cna

It takes about 2h to run on this simulated data. When finished, the CalicoST output directory <output_dir> will contain the following files:

<output_dir>/clone3_rectangle0_w1.0/clone_labels.tsv store the inferred cancer clones;
<output_dir>/clone3_rectangle0_w1.0/cnv_seglevel.tsv store the inferred allele-specific copy number profile per genomic segment;
<output_dir>/clone3_rectangle0_w1.0/cnv_genelevel.tsv store the inferred allele-specific copy numbers projected to expressed genes;
<output_dir>/clone3_rectangle0_w1.0/cnv_diploid*, calicost/clone3_rectangle0_w1.0/cnv_triploid*, calicost/clone3_rectangle0_w1.0/cnv_tetraploid* store an additional version of integer allele-specific copy numbers when enforcing the ploidy to be diploid, triploid, and tetraploid. Experienced users can decide which ploidy to use based on prior knowledge or based on the rdr-baf plots.
<output_dir>/clone3_rectangle0_w1.0/plots/ store the plots corresponding to the spatial organization of inferred cancer clones and allele-specific copy numbers along the genome for each clone.

Load the results of CalicoST

Load the inferred cancer clones <output_dir>/clone3_rectangle0_w1.0/clone_labels.tsv by pandas.

import numpy as np
import pandas as pd

output_dir = "."
df_clones = pd.read_csv(f"{output_dir}/clone3_rectangle0_w1.0/clone_labels.tsv", header=0, index_col=0, sep='\t')
df_clones

	clone_label
BARCODES
spot_0	2
spot_1	2
spot_2	2
spot_3	2
spot_4	2
...	...
spot_1795	1
spot_1796	1
spot_1797	1
spot_1798	1
spot_1799	1

1800 rows × 1 columns

Load the inferred allele-specific copy numbers for each genomic bin <output_dir>/clone3_rectangle0_w1.0/cnv_seglevel.tsv.

df_cna = pd.read_csv(f"{output_dir}/clone3_rectangle0_w1.0/cnv_seglevel.tsv", header=0, index_col=0, sep='\t')
df_cna

	START	END	clone0 A	clone0 B	clone1 A	clone1 B	clone2 A	clone2 B
CHR
1	1001138	1616548	1	1	1	1	1	1
1	1635227	2384877	1	1	1	1	1	1
1	2391775	6101016	1	1	1	1	1	1
1	6185020	6653223	1	1	1	1	1	1
1	6785454	7780639	1	1	1	1	1	1
...	...	...	...	...	...	...	...	...
22	43528744	45187923	1	1	1	1	1	1
22	45190338	45828198	1	1	1	1	1	1
22	46053869	46687116	1	1	1	1	1	1
22	46762617	50199494	1	1	1	1	1	1
22	50200979	50783663	1	1	1	1	1	1

1299 rows × 8 columns

Load the inferred allele-specific copy numbers for each gene <output_dir>/clone3_rectangle0_w1.0/cnv_genelevel.tsv.

df_cna = pd.read_csv(f"{output_dir}/clone3_rectangle0_w1.0/cnv_genelevel.tsv", header=0, index_col=0, sep='\t')
df_cna

	clone0 A	clone0 B	clone1 A	clone1 B	clone2 A	clone2 B
gene
ISG15	1	1	1	1	1	1
C1orf159	1	1	1	1	1	1
SDF4	1	1	1	1	1	1
UBE2J2	1	1	1	1	1	1
INTS11	1	1	1	1	1	1
...	...	...	...	...	...	...
CPT1B	1	1	1	1	1	1
CHKB	1	1	1	1	1	1
CHKB-DT	1	1	1	1	1	1
SHANK3	1	1	1	1	1	1
RABL2B	1	1	1	1	1	1

9000 rows × 6 columns

The plots generated by CalicoST are in PDF format and can be directly viewed. Below, we load the PDF plots in this notebook for easy visualization.

Firstly, <output_dir>/clone3_rectangle0_w1.0/plots/clone_spatial.pdf shows the inferred cancer clone in space.

from wand.image import Image as WImage
img = WImage(filename=f"{output_dir}/clone3_rectangle0_w1.0/plots/clone_spatial.pdf", resolution=100)
img

../../_images/8b307f1ad83e78040eb7a9c9a35b1e1fa1374491c041cb2e890ba2fff03b13b7.png

Secondly, <output_dir>/clone3_rectangle0_w1.0/plots/acn_genome.pdf shows the allele-specific copy numbers per clone along the genome. The color scheme follows

# allele-specific copy numbers of each clone (the color scheme is the same as Fig2c
img = WImage(filename=f"{output_dir}/clone3_rectangle0_w1.0/plots/acn_genome.pdf", resolution=120)
img

../../_images/a3b6f0898fdf04e1c7abd7e12ca115cea981f00bd3ebf0b05314d201446036d1.png

Thirdly, <output_dir>/clone3_rectangle0_w1.0/plots/rdr_baf_defaultcolor.pdf shows RDR-BAF along the genome for each clone. Here, each color indicates a HMM state, while different colors may correspond to the same allele-specific copy numbers.

# RDR-BAF plot along the genome for each clone
img = WImage(filename=f"{output_dir}/clone3_rectangle0_w1.0/plots/rdr_baf_defaultcolor.pdf", resolution=120)
img

../../_images/6dc9bac2d62c38f0ca938094fedeb678686b77ba138dd25527f4a16e69137668.png