Run CalicoST on prostate cancer dataset

Download the data

We applied CalicoST on five slices in a prostate cross section from Erickon et al. to study the cancer clones and spatial tumor evolution.

We provided the transcript count matrices from spaceranger output and the allele count matrices in examples/prostate_example.tar.gz, so that users can download and start running from there.

After downloading and untarring file, you will see the following files/directories

  • prostate_example

    • bamfile_list.tsv: A table of SRT slices with the paths to jointly identify clones and CNAs.

    • data: Spacerange directories of the SRT slices.

    • snpinfo: Allele count matrices from preprocessing Snakemake pipeline.

    • configuration_purity: Parameter configuration of CalicoST for inferring tumor proportion per spot.

    • configuration_cna_multi: Parameter configuration of CalicoST for inferring clones and CNAs.

    • estimate_tumor_prop: An empty directory to store future results for estimating tumor proportion.

    • calicost: An empty directory to store future results for inferred clones and CNAs.

Download and untar the data by

tar -xzvf <CalicoST git-cloned directory>/examples/prostate_example.tar.gz

Estimate tumor proportion per spot

CalicoST can estimate tumor proportions based on the BAF signals. To use CalicoST for inferring tumor proportion, first replace “<Replace with CalicoST directory>” in the configuration_purity file with the cloned CalicoST directory.

Then, run the following command in terminal

cd prostate_example
OMP_NUM_THREADS=1 python <CalicoST git-cloned directory>/src/calicost/estimate_tumor_proportion.py -c configuration_purity

This command takes about 1.5 hours to run and will generate an output file of estimate_tumor_prop/loh_estimator_tumor_prop.tsv.

Load and visualize inferred tumor proportions

We load and visualize the estimated tumor proportions as follows.

import scanpy as sc
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
import copy
# Modify the example_directory to be the path of the downloaded and untarred data.
example_directory = "./"

tumor_proportions = pd.read_csv(f'{example_directory}/estimate_tumor_prop/loh_estimator_tumor_prop.tsv', header=0, index_col=0, sep='\t')
tumor_proportions
Tumor
BARCODES
AAACAAGTATCTCCCA-1_H12 0.050000
AAACAGGGTCTATATT-1_H12 NaN
AAACATTTCCCGGATT-1_H12 0.050000
AAACCGGGTAGGTACC-1_H12 0.825997
AAACCGTTCGTCCAGG-1_H12 0.940555
... ...
TTGTTCAGTGTGCTAC-1_H25 0.050000
TTGTTGTGTGTCAAGA-1_H25 0.188937
TTGTTTCACATCCAGG-1_H25 0.956014
TTGTTTCATTAGTCTA-1_H25 0.838851
TTGTTTCCATACAACT-1_H25 0.364009

13344 rows × 1 columns

slice_ids = ['H12', 'H14', 'H15', 'H21', 'H25']
directory_name = ['P1_H1_2_visium', 'P1_H1_4_visium', 'P1_H1_5_visium', 'P1_H2_1_visium', 'P1_H2_5_visium']

for i,s in enumerate(slice_ids):
    # load spatial locations
    # note that scanpy is incompatible with the latest tissue_positions.csv file, we directly load the positions as pandas data frame
    df = pd.read_csv(f'{example_directory}/data/{directory_name[i]}/spatial/tissue_positions.csv', header=0, index_col=0, sep=',')
    print(df.head())
    # combine the position data frame with the tumor proportion dataframe
    slice_tumor_proportions = tumor_proportions[tumor_proportions.index.str.endswith(s)]
    df = df.join( slice_tumor_proportions.rename(index=lambda x:x.split("_")[0]) )

    # plot
    fig, axes = plt.subplots(1, 1, facecolor='white')
    sns.scatterplot(x=df.array_row, y=df.array_col, hue=df.Tumor, palette='magma_r', linewidth=0, s=10, legend=False, ax=axes)
    norm = plt.Normalize(np.nanmin(df.Tumor.values), np.nanmax(df.Tumor.values))
    axes.figure.colorbar( plt.cm.ScalarMappable(cmap='magma_r', norm=norm), ax=axes )
    axes.set_title(s)
    plt.show()
                    in_tissue  array_row  array_col  pxl_row_in_fullres  \
barcode                                                                   
ACGCCTGACACGCGCT-1          0          0          0                1466   
TACCGATCCAACACTT-1          0          1          1                1592   
ATTAAAGCGGACGAGC-1          0          0          2                1466   
GATAAGGGACGATTAG-1          0          1          3                1592   
GTGCAAATCACCAATA-1          0          0          4                1466   

                    pxl_col_in_fullres  
barcode                                 
ACGCCTGACACGCGCT-1                1298  
TACCGATCCAACACTT-1                1370  
ATTAAAGCGGACGAGC-1                1443  
GATAAGGGACGATTAG-1                1515  
GTGCAAATCACCAATA-1                1588  
../../_images/0bd44a02cfc38fc7624c8cd8f7ee53d12b4772411848057cc1597dfed2655c8b.png
                    in_tissue  array_row  array_col  pxl_row_in_fullres  \
barcode                                                                   
ACGCCTGACACGCGCT-1          0          0          0                1831   
TACCGATCCAACACTT-1          0          1          1                1983   
ATTAAAGCGGACGAGC-1          0          0          2                1831   
GATAAGGGACGATTAG-1          0          1          3                1983   
GTGCAAATCACCAATA-1          0          0          4                1831   

                    pxl_col_in_fullres  
barcode                                 
ACGCCTGACACGCGCT-1                1544  
TACCGATCCAACACTT-1                1631  
ATTAAAGCGGACGAGC-1                1718  
GATAAGGGACGATTAG-1                1805  
GTGCAAATCACCAATA-1                1892  
../../_images/82cfd15c9b52cac247723460a9cdbfed9b10afc7dfe1965e4d99594f5956af96.png
                    in_tissue  array_row  array_col  pxl_row_in_fullres  \
barcode                                                                   
ACGCCTGACACGCGCT-1          0          0          0                1593   
TACCGATCCAACACTT-1          0          1          1                1720   
ATTAAAGCGGACGAGC-1          0          0          2                1593   
GATAAGGGACGATTAG-1          0          1          3                1719   
GTGCAAATCACCAATA-1          0          0          4                1593   

                    pxl_col_in_fullres  
barcode                                 
ACGCCTGACACGCGCT-1                1172  
TACCGATCCAACACTT-1                1245  
ATTAAAGCGGACGAGC-1                1317  
GATAAGGGACGATTAG-1                1390  
GTGCAAATCACCAATA-1                1462  
../../_images/0625892712a0b5609691beace831441d37c7ae8303380451b5948db9ca4f792e.png
                    in_tissue  array_row  array_col  pxl_row_in_fullres  \
barcode                                                                   
ACGCCTGACACGCGCT-1          0          0          0                1830   
TACCGATCCAACACTT-1          0          1          1                1982   
ATTAAAGCGGACGAGC-1          0          0          2                1831   
GATAAGGGACGATTAG-1          0          1          3                1982   
GTGCAAATCACCAATA-1          0          0          4                1831   

                    pxl_col_in_fullres  
barcode                                 
ACGCCTGACACGCGCT-1                1523  
TACCGATCCAACACTT-1                1610  
ATTAAAGCGGACGAGC-1                1697  
GATAAGGGACGATTAG-1                1784  
GTGCAAATCACCAATA-1                1871  
../../_images/ae058a782d54eedc0affcc2f6aaaa43bb55bf3db006f58727abb04ae9a958b1a.png
                    in_tissue  array_row  array_col  pxl_row_in_fullres  \
barcode                                                                   
ACGCCTGACACGCGCT-1          0          0          0                1948   
TACCGATCCAACACTT-1          0          1          1                2100   
ATTAAAGCGGACGAGC-1          0          0          2                1948   
GATAAGGGACGATTAG-1          0          1          3                2100   
GTGCAAATCACCAATA-1          0          0          4                1947   

                    pxl_col_in_fullres  
barcode                                 
ACGCCTGACACGCGCT-1                1441  
TACCGATCCAACACTT-1                1529  
ATTAAAGCGGACGAGC-1                1616  
GATAAGGGACGATTAG-1                1705  
GTGCAAATCACCAATA-1                1791  
../../_images/aaa6c0342c37d83cf860fa701a266d5502c094ae2bd4be51c349dcc97226ed73.png

Run CalicoST to infer CNAs and cancer clones based on estimated tumor proportions

To infer cancer clones and CNAs by CalicoST, first replace “<Replace with CalicoST directory>” in the configuration_cna_multi file with the cloned CalicoST directory. Note that the parameter configuration takes in the estimated tumor proportion in the previous section by specifying “tumorprop_file : estimate_tumor_prop/loh_estimator_tumor_prop.tsv”.

Then, run the following command in terminal:

OMP_NUM_THREADS=1 python <CalicoST directory>/src/calicost/calicost_main.py -c configuration_cna_multi

This command takes about 8h to finish.

Load CalicoST-generated result tables

<output_dir>/clone5_rectangle0_w1.0/clone_labels.tsv stores the inferred cancer clones for each spot. Note that spots with a low tumor purity will also be assigned to a cancer clone, despite that the inferred cancer clone label is not very meaningful for those nearly normal spots.

df = pd.read_csv(f"{example_directory}/calicost/clone5_rectangle0_w1.0/clone_labels.tsv", header=0, index_col=0, sep='\t')
df
clone_label tumor_proportion
BARCODES
AAACAAGTATCTCCCA-1_H12 3 0.050000
AAACAGGGTCTATATT-1_H12 3 NaN
AAACATTTCCCGGATT-1_H12 3 0.050000
AAACCGGGTAGGTACC-1_H12 3 0.825997
AAACCGTTCGTCCAGG-1_H12 3 0.940555
... ... ...
TTGTTCAGTGTGCTAC-1_H25 2 0.050000
TTGTTGTGTGTCAAGA-1_H25 2 0.188937
TTGTTTCACATCCAGG-1_H25 1 0.956014
TTGTTTCATTAGTCTA-1_H25 1 0.838851
TTGTTTCCATACAACT-1_H25 2 0.364009

13344 rows × 2 columns

<output_dir>/clone5_rectangle0_w1.0/cnv_seglevel.tsv stores the allele-specific copy numbers of each genome bin within each clone. Each row is a genome bin, the columns containing the chromosome, start, and end position of the genome bin, and the inferred the A and B allele copy number within each cancer clone.

df = pd.read_csv(f"{example_directory}/calicost/clone5_rectangle0_w1.0/cnv_seglevel.tsv", header=0, index_col=None, sep='\t')
df
CHR START END clone1 A clone1 B clone2 A clone2 B clone3 A clone3 B clone4 A clone4 B clone5 A clone5 B
0 1 89295 1419136 1 1 1 1 1 1 1 1 1 1
1 1 1434861 1440568 1 1 1 1 1 1 1 1 1 1
2 1 1449689 1496123 1 1 1 1 1 1 1 1 1 1
3 1 1512162 1721078 1 1 1 1 1 1 1 1 1 1
4 1 1724838 2308568 1 1 1 1 1 1 1 1 1 1
... ... ... ... ... ... ... ... ... ... ... ... ... ...
2255 22 46360834 48898361 1 1 1 1 1 1 1 1 1 1
2256 22 49773283 49963978 1 1 1 1 1 1 1 1 1 1
2257 22 50089879 50180213 1 1 1 1 1 1 1 1 1 1
2258 22 50185915 50292030 1 1 1 1 1 1 1 1 1 1
2259 22 50309030 50783625 1 1 1 1 1 1 1 1 1 1

2260 rows × 13 columns

<output_dir>/clone5_rectangle0_w1.0/cnv_genelevel.tsv stores the allele-specific copy number for each gene. The copy number per gene is derived from projecting the allele-specific copy numbers of each genome bin to the spanned genes that have enough expression to be retained after CalicoST gene filtering step.

df = pd.read_csv(f"{example_directory}/calicost/clone5_rectangle0_w1.0/cnv_genelevel.tsv", header=0, index_col=None, sep='\t')
df
gene clone1 A clone1 B clone2 A clone2 B clone3 A clone3 B clone4 A clone4 B clone5 A clone5 B
0 AL627309.1 1 1 1 1 1 1 1 1 1 1
1 AL627309.5 1 1 1 1 1 1 1 1 1 1
2 LINC01409 1 1 1 1 1 1 1 1 1 1
3 LINC01128 1 1 1 1 1 1 1 1 1 1
4 LINC00115 1 1 1 1 1 1 1 1 1 1
... ... ... ... ... ... ... ... ... ... ... ...
15035 CHKB-DT 1 1 1 1 1 1 1 1 1 1
15036 MAPK8IP2 1 1 1 1 1 1 1 1 1 1
15037 ARSA 1 1 1 1 1 1 1 1 1 1
15038 SHANK3 1 1 1 1 1 1 1 1 1 1
15039 RABL2B 1 1 1 1 1 1 1 1 1 1

15040 rows × 11 columns

Visualize CalicoST-generated plots

Once CalicoST is finished running, it generates the plots of spatial organization of clones and allele-specific copy numbers. The plots are in PDF format and can be directly viewed. Below, we load the PDF plots in this notebook for easy visualization.

from wand.image import Image as WImage
img = WImage(filename=f"{example_directory}/calicost/clone5_rectangle0_w1.0/plots/clone_spatial.pdf", resolution=100)
img
../../_images/300a43b307d52f773d27147c7ec06e7898e1004f4b727a371401bf964e88a6b7.png
# allele-specific copy numbers of each clone (the color scheme is the same as Fig2c
img = WImage(filename=f"{example_directory}/calicost/clone5_rectangle0_w1.0/plots/acn_genome.pdf", resolution=120)
img
../../_images/8b4794f8f69a9cfbd49d40145aa65ea58c015ae95b62497a7d331796e53dd6df.png
# RDR-BAF plot along the genome for each clone
img = WImage(filename=f"{example_directory}/calicost/clone5_rectangle0_w1.0/plots/rdr_baf_defaultcolor.pdf", resolution=120)
img
../../_images/0b2e506d3c4598b29d614c877f935eac521b4f376f1a85ad5ffe8dae1df8127e.png

Reconstruct tumor phylogeny and phylogeography

We use an existing phylogeny reconstruction method, Startle by Sashittal et al, to infer a phylogeny of CalicoST-inferred cancer clones. To reconstruct a phylogeny based CalicoST results of the first random initialization, run the following command in shell:

mkdir calicost/phylogeny_clone5_rectangle0_w1.0
python <CalicoST code directory>/src/calicost/phylogeny_startle.py -c calicost/clone5_rectangle0_w1.0 -s <startle executable path> -o calicost/phylogeny_clone5_rectangle0_w1.0/

The above run of Startle will produce a plain-text file calicost/phylogeny_clone5_rectangle0_w1.0/loh_tree.newick that encodes phylogeny tree with leaf nodes as CalicoST-inferred clones. We load the tree file as follows.

with open(f"{example_directory}/calicost/phylogeny_clone5_rectangle0_w1.0/loh_tree.newick", 'r') as fp:
    print( fp.readlines() )
['((clone1:0,clone2:3):4,(clone3:1,(clone4:3,clone5:3):9):2);']

Now we project the phylogenetic tree in space to get a phylogeography. Before getting the phylogeography, we note that we currently don’t have the relative positioning among the five slices yet. We manually place the five slices according to Fig 1b in the original publication by Erickon et al., and transform the x/y coordinate in the tissue_positions.csv file according to the new positioning.

# load coordinates and inferred cancer clones
coords = []
for i,s in enumerate(slice_ids):
    # load spatial locations
    # note that scanpy is incompatible with the latest tissue_positions.csv file, we directly load the positions as pandas data frame
    df = pd.read_csv(f'{example_directory}/data/{directory_name[i]}/spatial/tissue_positions.csv', header=0, index_col=0, sep=',')
    df.index = df.index + "_" + s
    df['slice_id'] = s
    coords.append( df )

coords = pd.concat(coords)

# combine with the cancer clone table
df = pd.read_csv(f"{example_directory}/calicost/clone5_rectangle0_w1.0/clone_labels.tsv", header=0, index_col=0, sep='\t')
df.clone_label = 'clone' + df.clone_label.astype(str)
coords = coords.join(df)

# remove spots that are not assigned to clones by CalicoST (filtered out due to low UMI count or SNP-covering UMI count)
coords = coords[coords.clone_label.notnull()]
coords
in_tissue array_row array_col pxl_row_in_fullres pxl_col_in_fullres slice_id clone_label tumor_proportion
barcode
TCCTTCAGTGGTCGAA-1_H12 1 15 69 3366 6308 H12 clone3 0.050000
GCGTCGAAATGTCGGT-1_H12 1 17 65 3618 6018 H12 clone3 0.132391
AACTGATATTAGGCCT-1_H12 1 16 66 3492 6090 H12 clone3 0.050000
CGAGCTGGGCTTTAGG-1_H12 1 17 67 3618 6163 H12 clone3 0.050000
GGGTGTTTCAGCTATG-1_H12 1 16 68 3492 6236 H12 clone3 0.050000
... ... ... ... ... ... ... ... ...
ATGGCCCGAAAGGTTA-1_H25 1 76 120 13480 12001 H25 clone2 1.000000
CGTAATATGGCCCTTG-1_H25 1 77 121 13632 12089 H25 clone2 1.000000
AGAGTCTTAATGAAAG-1_H25 1 76 122 13480 12176 H25 clone2 1.000000
ATTGAATTCCCTGTAG-1_H25 1 76 124 13479 12351 H25 clone2 1.000000
TTGAAGTGCATCTACA-1_H25 1 77 127 13630 12615 H25 clone2 NaN

13344 rows × 8 columns

def flip_axis(coords, axis):
    max_x = np.max(coords[:,axis])
    min_x = np.min(coords[:,axis])
    tmp_coords = copy.copy(coords)
    tmp_coords[:,axis] = min_x + max_x - coords[:,axis]
    return tmp_coords


def rotate_by_angle(coords, angle):
    theta = angle / 180 * np.pi
    R = np.array([ [np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]] )
    mean_coords = np.mean(coords, axis=0)
    return (coords - mean_coords.reshape(1,-1)) @ R + mean_coords.reshape(1,-1)
adjusted_coords = copy.copy(coords[['array_row', 'array_col']].values)
# scale y coordinate so that the hexagon is not squeezed on one direction
adjusted_coords[:,0] = adjusted_coords[:,0] * np.sqrt(3)
# shift x and y coordinate to start from 0 for each slice
for s,sname in enumerate(slice_ids):
    index = np.where(coords.slice_id.values == sname)[0]
    adjusted_coords[index,0] -= np.min(adjusted_coords[index,0])
    adjusted_coords[index,1] -= np.min(adjusted_coords[index,1])
    

# position in number of cubes
cube_length = min( np.max(adjusted_coords[:,0]), np.max(adjusted_coords[:,1]) )
sample_cube_pos = np.array([ [2,0], #H12
                             [4, 0.2], #H14
                             [5,0.5], #H15
                             [0,1], #H21
                             [5,1.5] ]) #H25

swap_x_y = [False, True, True, False, True]
rotation_angle = [15,-5,-5,0,-5] # H12, H14, H15, H21, H25

full_adj_coords = np.zeros(adjusted_coords.shape)
for s,sname in enumerate(slice_ids):
    index = np.where(coords.slice_id.values == sname)[0]
    if swap_x_y[s]:
        tmp_coords = np.vstack([adjusted_coords[index,1],adjusted_coords[index,0]]).T
        if sname != "H25":
            tmp_coords = flip_axis(tmp_coords, axis=0 )
        tmp_coords = flip_axis(tmp_coords, axis=1)
        full_adj_coords[index,:] = tmp_coords + cube_length * sample_cube_pos[s]
    else:
        full_adj_coords[index,:] = adjusted_coords[index,:] + cube_length * sample_cube_pos[s]
        
    full_adj_coords[index,:] = rotate_by_angle(full_adj_coords[index,:], rotation_angle[s])

coords['final_x'] = full_adj_coords[:,0]
coords['final_y'] = full_adj_coords[:,1]
coords
in_tissue array_row array_col pxl_row_in_fullres pxl_col_in_fullres slice_id clone_label tumor_proportion final_x final_y
barcode
TCCTTCAGTGGTCGAA-1_H12 1 15 69 3366 6308 H12 clone3 0.050000 238.417652 74.069483
GCGTCGAAATGTCGGT-1_H12 1 17 65 3618 6018 H12 clone3 0.132391 241.246079 69.170504
AACTGATATTAGGCCT-1_H12 1 16 66 3492 6090 H12 clone3 0.050000 239.573047 70.654068
CGAGCTGGGCTTTAGG-1_H12 1 17 67 3618 6163 H12 clone3 0.050000 241.763717 71.102355
GGGTGTTTCAGCTATG-1_H12 1 16 68 3492 6236 H12 clone3 0.050000 240.090685 72.585919
... ... ... ... ... ... ... ... ... ... ...
ATGGCCCGAAAGGTTA-1_H25 1 76 120 13480 12001 H25 clone2 1.000000 700.743520 183.090182
CGTAATATGGCCCTTG-1_H25 1 77 121 13632 12089 H25 clone2 1.000000 701.914026 181.184948
AGAGTCTTAATGAAAG-1_H25 1 76 122 13480 12176 H25 clone2 1.000000 702.735909 183.264493
ATTGAATTCCCTGTAG-1_H25 1 76 124 13479 12351 H25 clone2 1.000000 704.728299 183.438805
TTGAAGTGCATCTACA-1_H25 1 77 127 13630 12615 H25 clone2 NaN 707.891194 181.707882

13344 rows × 10 columns

import calicost.utils_plotting

fig = calicost.utils_plotting.plot_individual_spots_in_space(full_adj_coords, coords.clone_label, coords.tumor_proportion, base_width=10, base_height=5)
plt.gca().invert_yaxis()
fig.show()
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1404: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  quantile_colors = this_full_cmap(np.array([0, np.min(copy_single_tumor_prop[idx]), np.max(copy_single_tumor_prop[idx]), 1]))
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1407: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  seaborn.scatterplot(x=shifted_coords[idx,0], y=-shifted_coords[idx,1], s=10, hue=copy_single_tumor_prop[idx], palette=this_cmap, linewidth=0, legend=None, ax=axes)
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1404: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  quantile_colors = this_full_cmap(np.array([0, np.min(copy_single_tumor_prop[idx]), np.max(copy_single_tumor_prop[idx]), 1]))
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1407: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  seaborn.scatterplot(x=shifted_coords[idx,0], y=-shifted_coords[idx,1], s=10, hue=copy_single_tumor_prop[idx], palette=this_cmap, linewidth=0, legend=None, ax=axes)
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1404: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  quantile_colors = this_full_cmap(np.array([0, np.min(copy_single_tumor_prop[idx]), np.max(copy_single_tumor_prop[idx]), 1]))
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1407: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  seaborn.scatterplot(x=shifted_coords[idx,0], y=-shifted_coords[idx,1], s=10, hue=copy_single_tumor_prop[idx], palette=this_cmap, linewidth=0, legend=None, ax=axes)
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1404: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  quantile_colors = this_full_cmap(np.array([0, np.min(copy_single_tumor_prop[idx]), np.max(copy_single_tumor_prop[idx]), 1]))
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1407: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  seaborn.scatterplot(x=shifted_coords[idx,0], y=-shifted_coords[idx,1], s=10, hue=copy_single_tumor_prop[idx], palette=this_cmap, linewidth=0, legend=None, ax=axes)
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1404: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  quantile_colors = this_full_cmap(np.array([0, np.min(copy_single_tumor_prop[idx]), np.max(copy_single_tumor_prop[idx]), 1]))
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1407: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  seaborn.scatterplot(x=shifted_coords[idx,0], y=-shifted_coords[idx,1], s=10, hue=copy_single_tumor_prop[idx], palette=this_cmap, linewidth=0, legend=None, ax=axes)
../../_images/75c2e34f2a0311be4074370c55cf8be6491663d4dd440bd8af4aa7788b3d1d9b.png

Now we project the phylogeny to the space of coords[[‘final_x’, ‘final_y’]] and infer ancestor locations using a Gaussian diffusion model.

import calicost.phylogeography

newick_file = f"{example_directory}/calicost/phylogeny_clone5_rectangle0_w1.0/loh_tree.newick"
t = calicost.phylogeography.project_phylogeneny_space(newick_file, coords[['final_x', 'final_y']].values, coords.clone_label.values, 
                              single_tumor_prop=coords.tumor_proportion.values, sample_list=slice_ids, sample_ids=coords.slice_id.values)

print( t )
root node is ancestor1_2_3_4_5
a list of leaf nodes: ['clone1', 'clone2', 'clone3', 'clone4', 'clone5']
a list of internal nodes: ['ancestor1_2_3_4_5', 'ancestor1_2', 'ancestor3_4_5', 'ancestor4_5']

      /-clone1
   /-|
  |   \-clone2
--|
  |   /-clone3
   \-|
     |   /-clone4
      \-|
         \-clone5
# plot clones in space with the phylogeography

fig = calicost.utils_plotting.plot_individual_spots_in_space(full_adj_coords, coords.clone_label, coords.tumor_proportion, base_width=10, base_height=5)
axes = plt.gca()

# clone centers + ancestors
for node in t.traverse():
    axes.scatter( node.x, -node.y, marker="D", linewidth=2, edgecolor='black', facecolor="None", s=50)

# edges
for node in t.iter_leaves():
    while not node.is_root():
        p = node.up
        if np.abs(node.x - p.x) + np.abs(node.y - p.y) > 1:
            axes.annotate("", xy=(node.x, -node.y), xytext=(p.x, -p.y), arrowprops=dict(mutation_scale=15, lw=1, arrowstyle="->", color="black"))
        node = p
        
axes.invert_yaxis()
fig.show()
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1404: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  quantile_colors = this_full_cmap(np.array([0, np.min(copy_single_tumor_prop[idx]), np.max(copy_single_tumor_prop[idx]), 1]))
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1407: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  seaborn.scatterplot(x=shifted_coords[idx,0], y=-shifted_coords[idx,1], s=10, hue=copy_single_tumor_prop[idx], palette=this_cmap, linewidth=0, legend=None, ax=axes)
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1404: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  quantile_colors = this_full_cmap(np.array([0, np.min(copy_single_tumor_prop[idx]), np.max(copy_single_tumor_prop[idx]), 1]))
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1407: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  seaborn.scatterplot(x=shifted_coords[idx,0], y=-shifted_coords[idx,1], s=10, hue=copy_single_tumor_prop[idx], palette=this_cmap, linewidth=0, legend=None, ax=axes)
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1404: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  quantile_colors = this_full_cmap(np.array([0, np.min(copy_single_tumor_prop[idx]), np.max(copy_single_tumor_prop[idx]), 1]))
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1407: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  seaborn.scatterplot(x=shifted_coords[idx,0], y=-shifted_coords[idx,1], s=10, hue=copy_single_tumor_prop[idx], palette=this_cmap, linewidth=0, legend=None, ax=axes)
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1404: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  quantile_colors = this_full_cmap(np.array([0, np.min(copy_single_tumor_prop[idx]), np.max(copy_single_tumor_prop[idx]), 1]))
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1407: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  seaborn.scatterplot(x=shifted_coords[idx,0], y=-shifted_coords[idx,1], s=10, hue=copy_single_tumor_prop[idx], palette=this_cmap, linewidth=0, legend=None, ax=axes)
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1404: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  quantile_colors = this_full_cmap(np.array([0, np.min(copy_single_tumor_prop[idx]), np.max(copy_single_tumor_prop[idx]), 1]))
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1407: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  seaborn.scatterplot(x=shifted_coords[idx,0], y=-shifted_coords[idx,1], s=10, hue=copy_single_tumor_prop[idx], palette=this_cmap, linewidth=0, legend=None, ax=axes)
../../_images/c9df9acd75aeb9bbe6bcd090f96dbd719d4995331db150fc1b2f1c6114c90153.png