Run CalicoST on prostate cancer dataset

Download the data

We applied CalicoST on five slices in a prostate cross section from Erickon et al. to study the cancer clones and spatial tumor evolution.

We provided the transcript count matrices from spaceranger output and the allele count matrices in examples/prostate_example.tar.gz, so that users can download and start running from there.

After downloading and untarring file, you will see the following files/directories

prostate_example
- bamfile_list.tsv: A table of SRT slices with the paths to jointly identify clones and CNAs.
- data: Spacerange directories of the SRT slices.
- snpinfo: Allele count matrices from preprocessing Snakemake pipeline.
- configuration_purity: Parameter configuration of CalicoST for inferring tumor proportion per spot.
- configuration_cna_multi: Parameter configuration of CalicoST for inferring clones and CNAs.
- estimate_tumor_prop: An empty directory to store future results for estimating tumor proportion.
- calicost: An empty directory to store future results for inferred clones and CNAs.

Download and untar the data by

tar -xzvf <CalicoST git-cloned directory>/examples/prostate_example.tar.gz

Estimate tumor proportion per spot

CalicoST can estimate tumor proportions based on the BAF signals. To use CalicoST for inferring tumor proportion, first replace “<Replace with CalicoST directory>” in the configuration_purity file with the cloned CalicoST directory.

Then, run the following command in terminal

cd prostate_example
OMP_NUM_THREADS=1 python <CalicoST git-cloned directory>/src/calicost/estimate_tumor_proportion.py -c configuration_purity

This command takes about 1.5 hours to run and will generate an output file of estimate_tumor_prop/loh_estimator_tumor_prop.tsv.

Load and visualize inferred tumor proportions

We load and visualize the estimated tumor proportions as follows.

import scanpy as sc
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
import copy

# Modify the example_directory to be the path of the downloaded and untarred data.
example_directory = "./"

tumor_proportions = pd.read_csv(f'{example_directory}/estimate_tumor_prop/loh_estimator_tumor_prop.tsv', header=0, index_col=0, sep='\t')
tumor_proportions

	Tumor
BARCODES
AAACAAGTATCTCCCA-1_H12	0.050000
AAACAGGGTCTATATT-1_H12	NaN
AAACATTTCCCGGATT-1_H12	0.050000
AAACCGGGTAGGTACC-1_H12	0.825997
AAACCGTTCGTCCAGG-1_H12	0.940555
...	...
TTGTTCAGTGTGCTAC-1_H25	0.050000
TTGTTGTGTGTCAAGA-1_H25	0.188937
TTGTTTCACATCCAGG-1_H25	0.956014
TTGTTTCATTAGTCTA-1_H25	0.838851
TTGTTTCCATACAACT-1_H25	0.364009

13344 rows × 1 columns

slice_ids = ['H12', 'H14', 'H15', 'H21', 'H25']
directory_name = ['P1_H1_2_visium', 'P1_H1_4_visium', 'P1_H1_5_visium', 'P1_H2_1_visium', 'P1_H2_5_visium']

for i,s in enumerate(slice_ids):
    # load spatial locations
    # note that scanpy is incompatible with the latest tissue_positions.csv file, we directly load the positions as pandas data frame
    df = pd.read_csv(f'{example_directory}/data/{directory_name[i]}/spatial/tissue_positions.csv', header=0, index_col=0, sep=',')
    print(df.head())
    # combine the position data frame with the tumor proportion dataframe
    slice_tumor_proportions = tumor_proportions[tumor_proportions.index.str.endswith(s)]
    df = df.join( slice_tumor_proportions.rename(index=lambda x:x.split("_")[0]) )

    # plot
    fig, axes = plt.subplots(1, 1, facecolor='white')
    sns.scatterplot(x=df.array_row, y=df.array_col, hue=df.Tumor, palette='magma_r', linewidth=0, s=10, legend=False, ax=axes)
    norm = plt.Normalize(np.nanmin(df.Tumor.values), np.nanmax(df.Tumor.values))
    axes.figure.colorbar( plt.cm.ScalarMappable(cmap='magma_r', norm=norm), ax=axes )
    axes.set_title(s)
    plt.show()

                    in_tissue  array_row  array_col  pxl_row_in_fullres  \
barcode                                                                   
ACGCCTGACACGCGCT-1          0          0          0                1466   
TACCGATCCAACACTT-1          0          1          1                1592   
ATTAAAGCGGACGAGC-1          0          0          2                1466   
GATAAGGGACGATTAG-1          0          1          3                1592   
GTGCAAATCACCAATA-1          0          0          4                1466   

                    pxl_col_in_fullres  
barcode                                 
ACGCCTGACACGCGCT-1                1298  
TACCGATCCAACACTT-1                1370  
ATTAAAGCGGACGAGC-1                1443  
GATAAGGGACGATTAG-1                1515  
GTGCAAATCACCAATA-1                1588  

../../_images/0bd44a02cfc38fc7624c8cd8f7ee53d12b4772411848057cc1597dfed2655c8b.png

                    in_tissue  array_row  array_col  pxl_row_in_fullres  \
barcode                                                                   
ACGCCTGACACGCGCT-1          0          0          0                1831   
TACCGATCCAACACTT-1          0          1          1                1983   
ATTAAAGCGGACGAGC-1          0          0          2                1831   
GATAAGGGACGATTAG-1          0          1          3                1983   
GTGCAAATCACCAATA-1          0          0          4                1831   

                    pxl_col_in_fullres  
barcode                                 
ACGCCTGACACGCGCT-1                1544  
TACCGATCCAACACTT-1                1631  
ATTAAAGCGGACGAGC-1                1718  
GATAAGGGACGATTAG-1                1805  
GTGCAAATCACCAATA-1                1892  

../../_images/82cfd15c9b52cac247723460a9cdbfed9b10afc7dfe1965e4d99594f5956af96.png

                    in_tissue  array_row  array_col  pxl_row_in_fullres  \
barcode                                                                   
ACGCCTGACACGCGCT-1          0          0          0                1593   
TACCGATCCAACACTT-1          0          1          1                1720   
ATTAAAGCGGACGAGC-1          0          0          2                1593   
GATAAGGGACGATTAG-1          0          1          3                1719   
GTGCAAATCACCAATA-1          0          0          4                1593   

                    pxl_col_in_fullres  
barcode                                 
ACGCCTGACACGCGCT-1                1172  
TACCGATCCAACACTT-1                1245  
ATTAAAGCGGACGAGC-1                1317  
GATAAGGGACGATTAG-1                1390  
GTGCAAATCACCAATA-1                1462  

../../_images/0625892712a0b5609691beace831441d37c7ae8303380451b5948db9ca4f792e.png

                    in_tissue  array_row  array_col  pxl_row_in_fullres  \
barcode                                                                   
ACGCCTGACACGCGCT-1          0          0          0                1830   
TACCGATCCAACACTT-1          0          1          1                1982   
ATTAAAGCGGACGAGC-1          0          0          2                1831   
GATAAGGGACGATTAG-1          0          1          3                1982   
GTGCAAATCACCAATA-1          0          0          4                1831   

                    pxl_col_in_fullres  
barcode                                 
ACGCCTGACACGCGCT-1                1523  
TACCGATCCAACACTT-1                1610  
ATTAAAGCGGACGAGC-1                1697  
GATAAGGGACGATTAG-1                1784  
GTGCAAATCACCAATA-1                1871  

../../_images/ae058a782d54eedc0affcc2f6aaaa43bb55bf3db006f58727abb04ae9a958b1a.png

                    in_tissue  array_row  array_col  pxl_row_in_fullres  \
barcode                                                                   
ACGCCTGACACGCGCT-1          0          0          0                1948   
TACCGATCCAACACTT-1          0          1          1                2100   
ATTAAAGCGGACGAGC-1          0          0          2                1948   
GATAAGGGACGATTAG-1          0          1          3                2100   
GTGCAAATCACCAATA-1          0          0          4                1947   

                    pxl_col_in_fullres  
barcode                                 
ACGCCTGACACGCGCT-1                1441  
TACCGATCCAACACTT-1                1529  
ATTAAAGCGGACGAGC-1                1616  
GATAAGGGACGATTAG-1                1705  
GTGCAAATCACCAATA-1                1791  

../../_images/aaa6c0342c37d83cf860fa701a266d5502c094ae2bd4be51c349dcc97226ed73.png

Run CalicoST to infer CNAs and cancer clones based on estimated tumor proportions

To infer cancer clones and CNAs by CalicoST, first replace “<Replace with CalicoST directory>” in the configuration_cna_multi file with the cloned CalicoST directory. Note that the parameter configuration takes in the estimated tumor proportion in the previous section by specifying “tumorprop_file : estimate_tumor_prop/loh_estimator_tumor_prop.tsv”.

Then, run the following command in terminal:

OMP_NUM_THREADS=1 python <CalicoST directory>/src/calicost/calicost_main.py -c configuration_cna_multi

This command takes about 8h to finish.

Load CalicoST-generated result tables

<output_dir>/clone5_rectangle0_w1.0/clone_labels.tsv stores the inferred cancer clones for each spot. Note that spots with a low tumor purity will also be assigned to a cancer clone, despite that the inferred cancer clone label is not very meaningful for those nearly normal spots.

df = pd.read_csv(f"{example_directory}/calicost/clone5_rectangle0_w1.0/clone_labels.tsv", header=0, index_col=0, sep='\t')
df

	clone_label	tumor_proportion
BARCODES
AAACAAGTATCTCCCA-1_H12	3	0.050000
AAACAGGGTCTATATT-1_H12	3	NaN
AAACATTTCCCGGATT-1_H12	3	0.050000
AAACCGGGTAGGTACC-1_H12	3	0.825997
AAACCGTTCGTCCAGG-1_H12	3	0.940555
...	...	...
TTGTTCAGTGTGCTAC-1_H25	2	0.050000
TTGTTGTGTGTCAAGA-1_H25	2	0.188937
TTGTTTCACATCCAGG-1_H25	1	0.956014
TTGTTTCATTAGTCTA-1_H25	1	0.838851
TTGTTTCCATACAACT-1_H25	2	0.364009

13344 rows × 2 columns

<output_dir>/clone5_rectangle0_w1.0/cnv_seglevel.tsv stores the allele-specific copy numbers of each genome bin within each clone. Each row is a genome bin, the columns containing the chromosome, start, and end position of the genome bin, and the inferred the A and B allele copy number within each cancer clone.

df = pd.read_csv(f"{example_directory}/calicost/clone5_rectangle0_w1.0/cnv_seglevel.tsv", header=0, index_col=None, sep='\t')
df

	CHR	START	END	clone1 A	clone1 B	clone2 A	clone2 B	clone3 A	clone3 B	clone4 A	clone4 B	clone5 A	clone5 B
0	1	89295	1419136	1	1	1	1	1	1	1	1	1	1
1	1	1434861	1440568	1	1	1	1	1	1	1	1	1	1
2	1	1449689	1496123	1	1	1	1	1	1	1	1	1	1
3	1	1512162	1721078	1	1	1	1	1	1	1	1	1	1
4	1	1724838	2308568	1	1	1	1	1	1	1	1	1	1
...	...	...	...	...	...	...	...	...	...	...	...	...	...
2255	22	46360834	48898361	1	1	1	1	1	1	1	1	1	1
2256	22	49773283	49963978	1	1	1	1	1	1	1	1	1	1
2257	22	50089879	50180213	1	1	1	1	1	1	1	1	1	1
2258	22	50185915	50292030	1	1	1	1	1	1	1	1	1	1
2259	22	50309030	50783625	1	1	1	1	1	1	1	1	1	1

2260 rows × 13 columns

<output_dir>/clone5_rectangle0_w1.0/cnv_genelevel.tsv stores the allele-specific copy number for each gene. The copy number per gene is derived from projecting the allele-specific copy numbers of each genome bin to the spanned genes that have enough expression to be retained after CalicoST gene filtering step.

df = pd.read_csv(f"{example_directory}/calicost/clone5_rectangle0_w1.0/cnv_genelevel.tsv", header=0, index_col=None, sep='\t')
df

	gene	clone1 A	clone1 B	clone2 A	clone2 B	clone3 A	clone3 B	clone4 A	clone4 B	clone5 A	clone5 B
0	AL627309.1	1	1	1	1	1	1	1	1	1	1
1	AL627309.5	1	1	1	1	1	1	1	1	1	1
2	LINC01409	1	1	1	1	1	1	1	1	1	1
3	LINC01128	1	1	1	1	1	1	1	1	1	1
4	LINC00115	1	1	1	1	1	1	1	1	1	1
...	...	...	...	...	...	...	...	...	...	...	...
15035	CHKB-DT	1	1	1	1	1	1	1	1	1	1
15036	MAPK8IP2	1	1	1	1	1	1	1	1	1	1
15037	ARSA	1	1	1	1	1	1	1	1	1	1
15038	SHANK3	1	1	1	1	1	1	1	1	1	1
15039	RABL2B	1	1	1	1	1	1	1	1	1	1

15040 rows × 11 columns

Visualize CalicoST-generated plots

Once CalicoST is finished running, it generates the plots of spatial organization of clones and allele-specific copy numbers. The plots are in PDF format and can be directly viewed. Below, we load the PDF plots in this notebook for easy visualization.

from wand.image import Image as WImage
img = WImage(filename=f"{example_directory}/calicost/clone5_rectangle0_w1.0/plots/clone_spatial.pdf", resolution=100)
img

../../_images/300a43b307d52f773d27147c7ec06e7898e1004f4b727a371401bf964e88a6b7.png

# allele-specific copy numbers of each clone (the color scheme is the same as Fig2c
img = WImage(filename=f"{example_directory}/calicost/clone5_rectangle0_w1.0/plots/acn_genome.pdf", resolution=120)
img

../../_images/8b4794f8f69a9cfbd49d40145aa65ea58c015ae95b62497a7d331796e53dd6df.png

# RDR-BAF plot along the genome for each clone
img = WImage(filename=f"{example_directory}/calicost/clone5_rectangle0_w1.0/plots/rdr_baf_defaultcolor.pdf", resolution=120)
img

../../_images/0b2e506d3c4598b29d614c877f935eac521b4f376f1a85ad5ffe8dae1df8127e.png

Reconstruct tumor phylogeny and phylogeography

We use an existing phylogeny reconstruction method, Startle by Sashittal et al, to infer a phylogeny of CalicoST-inferred cancer clones. To reconstruct a phylogeny based CalicoST results of the first random initialization, run the following command in shell:

mkdir calicost/phylogeny_clone5_rectangle0_w1.0
python <CalicoST code directory>/src/calicost/phylogeny_startle.py -c calicost/clone5_rectangle0_w1.0 -s <startle executable path> -o calicost/phylogeny_clone5_rectangle0_w1.0/

The above run of Startle will produce a plain-text file calicost/phylogeny_clone5_rectangle0_w1.0/loh_tree.newick that encodes phylogeny tree with leaf nodes as CalicoST-inferred clones. We load the tree file as follows.

with open(f"{example_directory}/calicost/phylogeny_clone5_rectangle0_w1.0/loh_tree.newick", 'r') as fp:
    print( fp.readlines() )

['((clone1:0,clone2:3):4,(clone3:1,(clone4:3,clone5:3):9):2);']

Now we project the phylogenetic tree in space to get a phylogeography. Before getting the phylogeography, we note that we currently don’t have the relative positioning among the five slices yet. We manually place the five slices according to Fig 1b in the original publication by Erickon et al., and transform the x/y coordinate in the tissue_positions.csv file according to the new positioning.

# load coordinates and inferred cancer clones
coords = []
for i,s in enumerate(slice_ids):
    # load spatial locations
    # note that scanpy is incompatible with the latest tissue_positions.csv file, we directly load the positions as pandas data frame
    df = pd.read_csv(f'{example_directory}/data/{directory_name[i]}/spatial/tissue_positions.csv', header=0, index_col=0, sep=',')
    df.index = df.index + "_" + s
    df['slice_id'] = s
    coords.append( df )

coords = pd.concat(coords)

# combine with the cancer clone table
df = pd.read_csv(f"{example_directory}/calicost/clone5_rectangle0_w1.0/clone_labels.tsv", header=0, index_col=0, sep='\t')
df.clone_label = 'clone' + df.clone_label.astype(str)
coords = coords.join(df)

# remove spots that are not assigned to clones by CalicoST (filtered out due to low UMI count or SNP-covering UMI count)
coords = coords[coords.clone_label.notnull()]
coords

	in_tissue	array_row	array_col	pxl_row_in_fullres	pxl_col_in_fullres	slice_id	clone_label	tumor_proportion
barcode
TCCTTCAGTGGTCGAA-1_H12	1	15	69	3366	6308	H12	clone3	0.050000
GCGTCGAAATGTCGGT-1_H12	1	17	65	3618	6018	H12	clone3	0.132391
AACTGATATTAGGCCT-1_H12	1	16	66	3492	6090	H12	clone3	0.050000
CGAGCTGGGCTTTAGG-1_H12	1	17	67	3618	6163	H12	clone3	0.050000
GGGTGTTTCAGCTATG-1_H12	1	16	68	3492	6236	H12	clone3	0.050000
...	...	...	...	...	...	...	...	...
ATGGCCCGAAAGGTTA-1_H25	1	76	120	13480	12001	H25	clone2	1.000000
CGTAATATGGCCCTTG-1_H25	1	77	121	13632	12089	H25	clone2	1.000000
AGAGTCTTAATGAAAG-1_H25	1	76	122	13480	12176	H25	clone2	1.000000
ATTGAATTCCCTGTAG-1_H25	1	76	124	13479	12351	H25	clone2	1.000000
TTGAAGTGCATCTACA-1_H25	1	77	127	13630	12615	H25	clone2	NaN

13344 rows × 8 columns

def flip_axis(coords, axis):
    max_x = np.max(coords[:,axis])
    min_x = np.min(coords[:,axis])
    tmp_coords = copy.copy(coords)
    tmp_coords[:,axis] = min_x + max_x - coords[:,axis]
    return tmp_coords


def rotate_by_angle(coords, angle):
    theta = angle / 180 * np.pi
    R = np.array([ [np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]] )
    mean_coords = np.mean(coords, axis=0)
    return (coords - mean_coords.reshape(1,-1)) @ R + mean_coords.reshape(1,-1)

adjusted_coords = copy.copy(coords[['array_row', 'array_col']].values)
# scale y coordinate so that the hexagon is not squeezed on one direction
adjusted_coords[:,0] = adjusted_coords[:,0] * np.sqrt(3)
# shift x and y coordinate to start from 0 for each slice
for s,sname in enumerate(slice_ids):
    index = np.where(coords.slice_id.values == sname)[0]
    adjusted_coords[index,0] -= np.min(adjusted_coords[index,0])
    adjusted_coords[index,1] -= np.min(adjusted_coords[index,1])
    

# position in number of cubes
cube_length = min( np.max(adjusted_coords[:,0]), np.max(adjusted_coords[:,1]) )
sample_cube_pos = np.array([ [2,0], #H12
                             [4, 0.2], #H14
                             [5,0.5], #H15
                             [0,1], #H21
                             [5,1.5] ]) #H25

swap_x_y = [False, True, True, False, True]
rotation_angle = [15,-5,-5,0,-5] # H12, H14, H15, H21, H25

full_adj_coords = np.zeros(adjusted_coords.shape)
for s,sname in enumerate(slice_ids):
    index = np.where(coords.slice_id.values == sname)[0]
    if swap_x_y[s]:
        tmp_coords = np.vstack([adjusted_coords[index,1],adjusted_coords[index,0]]).T
        if sname != "H25":
            tmp_coords = flip_axis(tmp_coords, axis=0 )
        tmp_coords = flip_axis(tmp_coords, axis=1)
        full_adj_coords[index,:] = tmp_coords + cube_length * sample_cube_pos[s]
    else:
        full_adj_coords[index,:] = adjusted_coords[index,:] + cube_length * sample_cube_pos[s]
        
    full_adj_coords[index,:] = rotate_by_angle(full_adj_coords[index,:], rotation_angle[s])

coords['final_x'] = full_adj_coords[:,0]
coords['final_y'] = full_adj_coords[:,1]
coords

	in_tissue	array_row	array_col	pxl_row_in_fullres	pxl_col_in_fullres	slice_id	clone_label	tumor_proportion	final_x	final_y
barcode
TCCTTCAGTGGTCGAA-1_H12	1	15	69	3366	6308	H12	clone3	0.050000	238.417652	74.069483
GCGTCGAAATGTCGGT-1_H12	1	17	65	3618	6018	H12	clone3	0.132391	241.246079	69.170504
AACTGATATTAGGCCT-1_H12	1	16	66	3492	6090	H12	clone3	0.050000	239.573047	70.654068
CGAGCTGGGCTTTAGG-1_H12	1	17	67	3618	6163	H12	clone3	0.050000	241.763717	71.102355
GGGTGTTTCAGCTATG-1_H12	1	16	68	3492	6236	H12	clone3	0.050000	240.090685	72.585919
...	...	...	...	...	...	...	...	...	...	...
ATGGCCCGAAAGGTTA-1_H25	1	76	120	13480	12001	H25	clone2	1.000000	700.743520	183.090182
CGTAATATGGCCCTTG-1_H25	1	77	121	13632	12089	H25	clone2	1.000000	701.914026	181.184948
AGAGTCTTAATGAAAG-1_H25	1	76	122	13480	12176	H25	clone2	1.000000	702.735909	183.264493
ATTGAATTCCCTGTAG-1_H25	1	76	124	13479	12351	H25	clone2	1.000000	704.728299	183.438805
TTGAAGTGCATCTACA-1_H25	1	77	127	13630	12615	H25	clone2	NaN	707.891194	181.707882

13344 rows × 10 columns

import calicost.utils_plotting

fig = calicost.utils_plotting.plot_individual_spots_in_space(full_adj_coords, coords.clone_label, coords.tumor_proportion, base_width=10, base_height=5)
plt.gca().invert_yaxis()
fig.show()

/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1404: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  quantile_colors = this_full_cmap(np.array([0, np.min(copy_single_tumor_prop[idx]), np.max(copy_single_tumor_prop[idx]), 1]))
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1407: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  seaborn.scatterplot(x=shifted_coords[idx,0], y=-shifted_coords[idx,1], s=10, hue=copy_single_tumor_prop[idx], palette=this_cmap, linewidth=0, legend=None, ax=axes)
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1404: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  quantile_colors = this_full_cmap(np.array([0, np.min(copy_single_tumor_prop[idx]), np.max(copy_single_tumor_prop[idx]), 1]))
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1407: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  seaborn.scatterplot(x=shifted_coords[idx,0], y=-shifted_coords[idx,1], s=10, hue=copy_single_tumor_prop[idx], palette=this_cmap, linewidth=0, legend=None, ax=axes)
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1404: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  quantile_colors = this_full_cmap(np.array([0, np.min(copy_single_tumor_prop[idx]), np.max(copy_single_tumor_prop[idx]), 1]))
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1407: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  seaborn.scatterplot(x=shifted_coords[idx,0], y=-shifted_coords[idx,1], s=10, hue=copy_single_tumor_prop[idx], palette=this_cmap, linewidth=0, legend=None, ax=axes)
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1404: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  quantile_colors = this_full_cmap(np.array([0, np.min(copy_single_tumor_prop[idx]), np.max(copy_single_tumor_prop[idx]), 1]))
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1407: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  seaborn.scatterplot(x=shifted_coords[idx,0], y=-shifted_coords[idx,1], s=10, hue=copy_single_tumor_prop[idx], palette=this_cmap, linewidth=0, legend=None, ax=axes)
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1404: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  quantile_colors = this_full_cmap(np.array([0, np.min(copy_single_tumor_prop[idx]), np.max(copy_single_tumor_prop[idx]), 1]))
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1407: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  seaborn.scatterplot(x=shifted_coords[idx,0], y=-shifted_coords[idx,1], s=10, hue=copy_single_tumor_prop[idx], palette=this_cmap, linewidth=0, legend=None, ax=axes)

../../_images/75c2e34f2a0311be4074370c55cf8be6491663d4dd440bd8af4aa7788b3d1d9b.png

Now we project the phylogeny to the space of coords[[‘final_x’, ‘final_y’]] and infer ancestor locations using a Gaussian diffusion model.

import calicost.phylogeography

newick_file = f"{example_directory}/calicost/phylogeny_clone5_rectangle0_w1.0/loh_tree.newick"
t = calicost.phylogeography.project_phylogeneny_space(newick_file, coords[['final_x', 'final_y']].values, coords.clone_label.values, 
                              single_tumor_prop=coords.tumor_proportion.values, sample_list=slice_ids, sample_ids=coords.slice_id.values)

print( t )

root node is ancestor1_2_3_4_5
a list of leaf nodes: ['clone1', 'clone2', 'clone3', 'clone4', 'clone5']
a list of internal nodes: ['ancestor1_2_3_4_5', 'ancestor1_2', 'ancestor3_4_5', 'ancestor4_5']

      /-clone1
   /-|
  |   \-clone2
--|
  |   /-clone3
   \-|
     |   /-clone4
      \-|
         \-clone5

# plot clones in space with the phylogeography

fig = calicost.utils_plotting.plot_individual_spots_in_space(full_adj_coords, coords.clone_label, coords.tumor_proportion, base_width=10, base_height=5)
axes = plt.gca()

# clone centers + ancestors
for node in t.traverse():
    axes.scatter( node.x, -node.y, marker="D", linewidth=2, edgecolor='black', facecolor="None", s=50)

# edges
for node in t.iter_leaves():
    while not node.is_root():
        p = node.up
        if np.abs(node.x - p.x) + np.abs(node.y - p.y) > 1:
            axes.annotate("", xy=(node.x, -node.y), xytext=(p.x, -p.y), arrowprops=dict(mutation_scale=15, lw=1, arrowstyle="->", color="black"))
        node = p
        
axes.invert_yaxis()
fig.show()

/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1404: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  quantile_colors = this_full_cmap(np.array([0, np.min(copy_single_tumor_prop[idx]), np.max(copy_single_tumor_prop[idx]), 1]))
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1407: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  seaborn.scatterplot(x=shifted_coords[idx,0], y=-shifted_coords[idx,1], s=10, hue=copy_single_tumor_prop[idx], palette=this_cmap, linewidth=0, legend=None, ax=axes)
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1404: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  quantile_colors = this_full_cmap(np.array([0, np.min(copy_single_tumor_prop[idx]), np.max(copy_single_tumor_prop[idx]), 1]))
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1407: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  seaborn.scatterplot(x=shifted_coords[idx,0], y=-shifted_coords[idx,1], s=10, hue=copy_single_tumor_prop[idx], palette=this_cmap, linewidth=0, legend=None, ax=axes)
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1404: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  quantile_colors = this_full_cmap(np.array([0, np.min(copy_single_tumor_prop[idx]), np.max(copy_single_tumor_prop[idx]), 1]))
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1407: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  seaborn.scatterplot(x=shifted_coords[idx,0], y=-shifted_coords[idx,1], s=10, hue=copy_single_tumor_prop[idx], palette=this_cmap, linewidth=0, legend=None, ax=axes)
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1404: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  quantile_colors = this_full_cmap(np.array([0, np.min(copy_single_tumor_prop[idx]), np.max(copy_single_tumor_prop[idx]), 1]))
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1407: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  seaborn.scatterplot(x=shifted_coords[idx,0], y=-shifted_coords[idx,1], s=10, hue=copy_single_tumor_prop[idx], palette=this_cmap, linewidth=0, legend=None, ax=axes)
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1404: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  quantile_colors = this_full_cmap(np.array([0, np.min(copy_single_tumor_prop[idx]), np.max(copy_single_tumor_prop[idx]), 1]))
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1407: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  seaborn.scatterplot(x=shifted_coords[idx,0], y=-shifted_coords[idx,1], s=10, hue=copy_single_tumor_prop[idx], palette=this_cmap, linewidth=0, legend=None, ax=axes)

../../_images/c9df9acd75aeb9bbe6bcd090f96dbd719d4995331db150fc1b2f1c6114c90153.png