Run CalicoST on simulated data

Download the data

We apply CalicoST to a small simulated dataset, provided as examples/simulated_example.tar.gz in the CalicoST GitHub repository, which contains the following files and directories:

  • simulated_example

    • outs: simulated transcript count matrix and spatial coordinates

    • snpinfo: parsed allele count matrix

Extract the archive with:

tar -xzvf <CalicoST git-cloned directory>/examples/simulated_example.tar.gz
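To confirm that the archive extracted as expected, a quick check along these lines may help. This is a minimal sketch: the helper name missing_subdirs is hypothetical, and the directory names are taken from the listing above.

```python
from pathlib import Path


def missing_subdirs(root, expected=("outs", "snpinfo")):
    """Return the expected subdirectories that are absent under `root`."""
    root = Path(root)
    return [name for name in expected if not (root / name).is_dir()]


# An empty list means both directories are present.
print(missing_subdirs("simulated_example"))
```

Run it from the directory in which the tar command was executed.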

Run CalicoST to infer CNAs and cancer clones assuming spots are purely tumor or purely normal

To run CalicoST, we first copy the configuration_cna file provided in the CalicoST GitHub repository to the example directory.

cd simulated_example
cp <CalicoST git-cloned directory>/configuration_cna ./

Then we modify the following paths in the copied configuration_cna file:

  • spaceranger_dir is the path to the outs directory of the downloaded data.

  • snp_dir is the path to the snpinfo directory of the downloaded data.

  • output_dir is the output directory for CalicoST to write the inferred clones and CNAs. It must be an existing directory.
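Because output_dir must already exist, it can be created ahead of the run. The following is a minimal sketch; the helper name ensure_output_dir and the directory name in the usage comment are hypothetical.

```python
from pathlib import Path


def ensure_output_dir(path):
    """Create `path` (and any missing parents) if it does not exist yet."""
    out = Path(path)
    out.mkdir(parents=True, exist_ok=True)
    return out.resolve()


# Hypothetical usage; the returned absolute path can be copied into
# the output_dir entry of configuration_cna:
# print(ensure_output_dir("calicost_output"))
```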

We keep the default values for the other parameters in the configuration_cna file; refer to the parameter specification for details if other samples require parameter tuning.

Now we use CalicoST to infer clones and allele-specific CNAs by running the following command in a terminal:

OMP_NUM_THREADS=1 python <CalicoST git-cloned directory>/src/calicost/calicost_main.py -c configuration_cna

The run takes about 2 hours on this simulated data. When it finishes, the CalicoST output directory <output_dir> will contain the following files:

  • <output_dir>/clone3_rectangle0_w1.0/clone_labels.tsv stores the inferred cancer clones;

  • <output_dir>/clone3_rectangle0_w1.0/cnv_seglevel.tsv stores the inferred allele-specific copy number profile per genomic segment;

  • <output_dir>/clone3_rectangle0_w1.0/cnv_genelevel.tsv stores the inferred allele-specific copy numbers projected onto expressed genes;

  • <output_dir>/clone3_rectangle0_w1.0/cnv_diploid*, <output_dir>/clone3_rectangle0_w1.0/cnv_triploid*, and <output_dir>/clone3_rectangle0_w1.0/cnv_tetraploid* store additional versions of the integer allele-specific copy numbers obtained when the ploidy is constrained to be diploid, triploid, or tetraploid, respectively. Experienced users can decide which ploidy to use based on prior knowledge or on the RDR-BAF plots.

  • <output_dir>/clone3_rectangle0_w1.0/plots/ stores the plots showing the spatial organization of the inferred cancer clones and the allele-specific copy numbers along the genome for each clone.
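Before loading the results, it can be useful to check that the files listed above were actually written. This is a minimal sketch: the helper name missing_outputs is hypothetical, and the file names and the run subdirectory clone3_rectangle0_w1.0 are taken from the list above.

```python
from pathlib import Path

# Entries expected inside <output_dir>/clone3_rectangle0_w1.0/
EXPECTED = ["clone_labels.tsv", "cnv_seglevel.tsv", "cnv_genelevel.tsv", "plots"]


def missing_outputs(output_dir, run_name="clone3_rectangle0_w1.0"):
    """Return the expected output entries that are absent from the run directory."""
    run_dir = Path(output_dir) / run_name
    return [name for name in EXPECTED if not (run_dir / name).exists()]


# An empty list means every expected entry was found.
print(missing_outputs("."))
```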

Load the results of CalicoST

Load the inferred cancer clones from <output_dir>/clone3_rectangle0_w1.0/clone_labels.tsv with pandas.

import numpy as np
import pandas as pd

output_dir = "."
df_clones = pd.read_csv(f"{output_dir}/clone3_rectangle0_w1.0/clone_labels.tsv", header=0, index_col=0, sep='\t')
df_clones
           clone_label
BARCODES
spot_0               2
spot_1               2
spot_2               2
spot_3               2
spot_4               2
...                ...
spot_1795            1
spot_1796            1
spot_1797            1
spot_1798            1
spot_1799            1

1800 rows × 1 columns
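A natural first look at the clone labels is the number of spots assigned to each clone. The sketch below uses a small hand-made table with the same layout as the output above (a BARCODES index and a clone_label column); the helper name clone_sizes and the toy values are illustrative only.

```python
import pandas as pd


def clone_sizes(df_clones):
    """Count the spots assigned to each clone label, sorted by label."""
    return df_clones["clone_label"].value_counts().sort_index()


# Toy data mimicking the layout of clone_labels.tsv (not real results).
toy = pd.DataFrame(
    {"clone_label": [2, 2, 1, 1, 1]},
    index=pd.Index([f"spot_{i}" for i in range(5)], name="BARCODES"),
)
print(clone_sizes(toy))
```

With the real table, the same call is clone_sizes(df_clones).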

Load the inferred allele-specific copy numbers for each genomic segment from <output_dir>/clone3_rectangle0_w1.0/cnv_seglevel.tsv.

df_cna = pd.read_csv(f"{output_dir}/clone3_rectangle0_w1.0/cnv_seglevel.tsv", header=0, index_col=0, sep='\t')
df_cna
        START       END  clone0 A  clone0 B  clone1 A  clone1 B  clone2 A  clone2 B
CHR
1     1001138   1616548         1         1         1         1         1         1
1     1635227   2384877         1         1         1         1         1         1
1     2391775   6101016         1         1         1         1         1         1
1     6185020   6653223         1         1         1         1         1         1
1     6785454   7780639         1         1         1         1         1         1
...       ...       ...       ...       ...       ...       ...       ...       ...
22   43528744  45187923         1         1         1         1         1         1
22   45190338  45828198         1         1         1         1         1         1
22   46053869  46687116         1         1         1         1         1         1
22   46762617  50199494         1         1         1         1         1         1
22   50200979  50783663         1         1         1         1         1         1

1299 rows × 8 columns
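Most rows shown above are (1, 1) in every clone, so it may help to filter for segments where at least one clone departs from that state. The sketch below assumes that the copy-number columns are named "<clone> A" and "<clone> B", as the header above suggests, and that (1, 1) is the normal state; the helper name aberrant_segments and the toy values are illustrative only.

```python
import pandas as pd


def aberrant_segments(df_cna):
    """Keep segments where any clone has an A or B copy number other than 1."""
    cn_cols = [c for c in df_cna.columns if c.endswith((" A", " B"))]
    # Use a NumPy mask so that the repeated CHR index values cause no alignment issues.
    is_aberrant = (df_cna[cn_cols] != 1).any(axis=1).to_numpy()
    return df_cna.loc[is_aberrant]


# Toy data mimicking the layout of cnv_seglevel.tsv (not real results).
toy = pd.DataFrame(
    {
        "START": [100, 500, 900],
        "END": [400, 800, 1200],
        "clone0 A": [1, 1, 1],
        "clone0 B": [1, 1, 1],
        "clone1 A": [1, 2, 1],
        "clone1 B": [1, 1, 0],
    },
    index=pd.Index([1, 1, 2], name="CHR"),
)
print(aberrant_segments(toy))
```

In the toy table, the second segment carries a gain of the A allele in clone1 and the third a loss of the B allele, so both are kept.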

Load the inferred allele-specific copy numbers for each gene from <output_dir>/clone3_rectangle0_w1.0/cnv_genelevel.tsv.

df_cna = pd.read_csv(f"{output_dir}/clone3_rectangle0_w1.0/cnv_genelevel.tsv", header=0, index_col=0, sep='\t')
df_cna
          clone0 A  clone0 B  clone1 A  clone1 B  clone2 A  clone2 B
gene
ISG15            1         1         1         1         1         1
C1orf159         1         1         1         1         1         1
SDF4             1         1         1         1         1         1
UBE2J2           1         1         1         1         1         1
INTS11           1         1         1         1         1         1
...            ...       ...       ...       ...       ...       ...
CPT1B            1         1         1         1         1         1
CHKB             1         1         1         1         1         1
CHKB-DT          1         1         1         1         1         1
SHANK3           1         1         1         1         1         1
RABL2B           1         1         1         1         1         1

9000 rows × 6 columns
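For downstream analyses, the total copy number per gene (the A and B alleles summed) is often more convenient than the two allele-specific columns. The sketch below assumes the "<clone> A" / "<clone> B" column naming shown above; the helper name total_copy_number and the toy values are illustrative only.

```python
import pandas as pd


def total_copy_number(df_gene):
    """Sum the A and B allele copy numbers of each clone, gene by gene."""
    clones = sorted({col.rsplit(" ", 1)[0] for col in df_gene.columns})
    return pd.DataFrame(
        {clone: df_gene[f"{clone} A"] + df_gene[f"{clone} B"] for clone in clones}
    )


# Toy data mimicking the layout of cnv_genelevel.tsv (not real results).
toy = pd.DataFrame(
    {
        "clone0 A": [1, 1],
        "clone0 B": [1, 1],
        "clone1 A": [2, 1],
        "clone1 B": [1, 0],
    },
    index=pd.Index(["ISG15", "SDF4"], name="gene"),
)
print(total_copy_number(toy))
```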

The plots generated by CalicoST are in PDF format and can be directly viewed. Below, we load the PDF plots in this notebook for easy visualization.

Firstly, <output_dir>/clone3_rectangle0_w1.0/plots/clone_spatial.pdf shows the inferred cancer clones in space.

from wand.image import Image as WImage
img = WImage(filename=f"{output_dir}/clone3_rectangle0_w1.0/plots/clone_spatial.pdf", resolution=100)
img
[Figure: clone_spatial.pdf, inferred cancer clones in space]

Secondly, <output_dir>/clone3_rectangle0_w1.0/plots/acn_genome.pdf shows the allele-specific copy numbers per clone along the genome. The color scheme follows Fig. 2c.

# allele-specific copy numbers of each clone (the color scheme is the same as in Fig. 2c)
img = WImage(filename=f"{output_dir}/clone3_rectangle0_w1.0/plots/acn_genome.pdf", resolution=120)
img
[Figure: acn_genome.pdf, allele-specific copy numbers per clone along the genome]

Thirdly, <output_dir>/clone3_rectangle0_w1.0/plots/rdr_baf_defaultcolor.pdf shows the RDR and BAF along the genome for each clone. Here, each color indicates an HMM state; different colors may correspond to the same allele-specific copy numbers.

# RDR-BAF plot along the genome for each clone
img = WImage(filename=f"{output_dir}/clone3_rectangle0_w1.0/plots/rdr_baf_defaultcolor.pdf", resolution=120)
img
[Figure: rdr_baf_defaultcolor.pdf, RDR-BAF along the genome for each clone]