Installation

Minimum installation

First setup a conda environment from the environment.yml file:

git clone https://github.com/raphael-group/CalicoST.git
cd CalicoST
conda env create -f environment.yml --name calicost_env

Then, install CalicoST using pip by

conda activate calicost_env
pip install -e .

Setting up the conda environments takes around 15 minutes on an HPC head node.

Additional installation for SNP parsing

CalicoST requires allele count matrices for reference-phased A and B alleles for inferring allele-specific CNAs, and provides a snakemake pipeline for obtaining the required matrices from a BAM file. Run the following commands in CalicoST directory for installing additional package, [Eagle2](https://alkesgroup.broadinstitute.org/Eagle/), for snakemake preprocessing pipeline.

mkdir external
wget --directory-prefix=external https://storage.googleapis.com/broad-alkesgroup-public/Eagle/downloads/Eagle_v2.4.1.tar.gz
tar -xzf external/Eagle_v2.4.1.tar.gz -C external

Additional installation for reconstructing phylogeny

Based on the inferred cancer clones and allele-specific CNAs by CalicoST, we apply Startle to reconstruct a phylogenetic tree along the clones. Install Startle by

git clone --recurse-submodules https://github.com/raphael-group/startle.git
cd startle
mkdir build; cd build
cmake -DLIBLEMON_ROOT=<lemon path>\
        -DCPLEX_INC_DIR=<cplex include path>\
        -DCPLEX_LIB_DIR=<cplex lib path>\
        -DCONCERT_INC_DIR=<concert include path>\
        -DCONCERT_LIB_DIR=<concert lib path>\
        ..
make

Prepare reference files for SNP parsing

We followed the recommended pipeline by Numbat <https://kharchenkolab.github.io/numbat/>`_ for parsing SNP information from BAM file(s): first genotyping using the BAM file by cellsnp-lite (included in the conda environment) and reference-based phasing by Eagle2. Download the following panels for genotyping and reference-based phasing.