Run CalicoST on prostate cancer dataset
Download the data
We applied CalicoST on five slices in a prostate cross section from Erickon et al. to study the cancer clones and spatial tumor evolution.
We provided the transcript count matrices from spaceranger output and the allele count matrices in examples/prostate_example.tar.gz, so that users can download and start running from there.
After downloading and untarring file, you will see the following files/directories
prostate_example
bamfile_list.tsv: A table of SRT slices with the paths to jointly identify clones and CNAs.
data: Spacerange directories of the SRT slices.
snpinfo: Allele count matrices from preprocessing Snakemake pipeline.
configuration_purity: Parameter configuration of CalicoST for inferring tumor proportion per spot.
configuration_cna_multi: Parameter configuration of CalicoST for inferring clones and CNAs.
estimate_tumor_prop: An empty directory to store future results for estimating tumor proportion.
calicost: An empty directory to store future results for inferred clones and CNAs.
Download and untar the data by
tar -xzvf <CalicoST git-cloned directory>/examples/prostate_example.tar.gz
Estimate tumor proportion per spot
CalicoST can estimate tumor proportions based on the BAF signals. To use CalicoST for inferring tumor proportion, first replace “<Replace with CalicoST directory>” in the configuration_purity
file with the cloned CalicoST directory.
Then, run the following command in terminal
cd prostate_example
OMP_NUM_THREADS=1 python <CalicoST git-cloned directory>/src/calicost/estimate_tumor_proportion.py -c configuration_purity
This command takes about 1.5 hours to run and will generate an output file of estimate_tumor_prop/loh_estimator_tumor_prop.tsv
.
Load and visualize inferred tumor proportions
We load and visualize the estimated tumor proportions as follows.
import scanpy as sc
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
import copy
# Modify the example_directory to be the path of the downloaded and untarred data.
example_directory = "./"
tumor_proportions = pd.read_csv(f'{example_directory}/estimate_tumor_prop/loh_estimator_tumor_prop.tsv', header=0, index_col=0, sep='\t')
tumor_proportions
Tumor | |
---|---|
BARCODES | |
AAACAAGTATCTCCCA-1_H12 | 0.050000 |
AAACAGGGTCTATATT-1_H12 | NaN |
AAACATTTCCCGGATT-1_H12 | 0.050000 |
AAACCGGGTAGGTACC-1_H12 | 0.825997 |
AAACCGTTCGTCCAGG-1_H12 | 0.940555 |
... | ... |
TTGTTCAGTGTGCTAC-1_H25 | 0.050000 |
TTGTTGTGTGTCAAGA-1_H25 | 0.188937 |
TTGTTTCACATCCAGG-1_H25 | 0.956014 |
TTGTTTCATTAGTCTA-1_H25 | 0.838851 |
TTGTTTCCATACAACT-1_H25 | 0.364009 |
13344 rows × 1 columns
slice_ids = ['H12', 'H14', 'H15', 'H21', 'H25']
directory_name = ['P1_H1_2_visium', 'P1_H1_4_visium', 'P1_H1_5_visium', 'P1_H2_1_visium', 'P1_H2_5_visium']
for i,s in enumerate(slice_ids):
# load spatial locations
# note that scanpy is incompatible with the latest tissue_positions.csv file, we directly load the positions as pandas data frame
df = pd.read_csv(f'{example_directory}/data/{directory_name[i]}/spatial/tissue_positions.csv', header=0, index_col=0, sep=',')
print(df.head())
# combine the position data frame with the tumor proportion dataframe
slice_tumor_proportions = tumor_proportions[tumor_proportions.index.str.endswith(s)]
df = df.join( slice_tumor_proportions.rename(index=lambda x:x.split("_")[0]) )
# plot
fig, axes = plt.subplots(1, 1, facecolor='white')
sns.scatterplot(x=df.array_row, y=df.array_col, hue=df.Tumor, palette='magma_r', linewidth=0, s=10, legend=False, ax=axes)
norm = plt.Normalize(np.nanmin(df.Tumor.values), np.nanmax(df.Tumor.values))
axes.figure.colorbar( plt.cm.ScalarMappable(cmap='magma_r', norm=norm), ax=axes )
axes.set_title(s)
plt.show()
in_tissue array_row array_col pxl_row_in_fullres \
barcode
ACGCCTGACACGCGCT-1 0 0 0 1466
TACCGATCCAACACTT-1 0 1 1 1592
ATTAAAGCGGACGAGC-1 0 0 2 1466
GATAAGGGACGATTAG-1 0 1 3 1592
GTGCAAATCACCAATA-1 0 0 4 1466
pxl_col_in_fullres
barcode
ACGCCTGACACGCGCT-1 1298
TACCGATCCAACACTT-1 1370
ATTAAAGCGGACGAGC-1 1443
GATAAGGGACGATTAG-1 1515
GTGCAAATCACCAATA-1 1588
in_tissue array_row array_col pxl_row_in_fullres \
barcode
ACGCCTGACACGCGCT-1 0 0 0 1831
TACCGATCCAACACTT-1 0 1 1 1983
ATTAAAGCGGACGAGC-1 0 0 2 1831
GATAAGGGACGATTAG-1 0 1 3 1983
GTGCAAATCACCAATA-1 0 0 4 1831
pxl_col_in_fullres
barcode
ACGCCTGACACGCGCT-1 1544
TACCGATCCAACACTT-1 1631
ATTAAAGCGGACGAGC-1 1718
GATAAGGGACGATTAG-1 1805
GTGCAAATCACCAATA-1 1892
in_tissue array_row array_col pxl_row_in_fullres \
barcode
ACGCCTGACACGCGCT-1 0 0 0 1593
TACCGATCCAACACTT-1 0 1 1 1720
ATTAAAGCGGACGAGC-1 0 0 2 1593
GATAAGGGACGATTAG-1 0 1 3 1719
GTGCAAATCACCAATA-1 0 0 4 1593
pxl_col_in_fullres
barcode
ACGCCTGACACGCGCT-1 1172
TACCGATCCAACACTT-1 1245
ATTAAAGCGGACGAGC-1 1317
GATAAGGGACGATTAG-1 1390
GTGCAAATCACCAATA-1 1462
in_tissue array_row array_col pxl_row_in_fullres \
barcode
ACGCCTGACACGCGCT-1 0 0 0 1830
TACCGATCCAACACTT-1 0 1 1 1982
ATTAAAGCGGACGAGC-1 0 0 2 1831
GATAAGGGACGATTAG-1 0 1 3 1982
GTGCAAATCACCAATA-1 0 0 4 1831
pxl_col_in_fullres
barcode
ACGCCTGACACGCGCT-1 1523
TACCGATCCAACACTT-1 1610
ATTAAAGCGGACGAGC-1 1697
GATAAGGGACGATTAG-1 1784
GTGCAAATCACCAATA-1 1871
in_tissue array_row array_col pxl_row_in_fullres \
barcode
ACGCCTGACACGCGCT-1 0 0 0 1948
TACCGATCCAACACTT-1 0 1 1 2100
ATTAAAGCGGACGAGC-1 0 0 2 1948
GATAAGGGACGATTAG-1 0 1 3 2100
GTGCAAATCACCAATA-1 0 0 4 1947
pxl_col_in_fullres
barcode
ACGCCTGACACGCGCT-1 1441
TACCGATCCAACACTT-1 1529
ATTAAAGCGGACGAGC-1 1616
GATAAGGGACGATTAG-1 1705
GTGCAAATCACCAATA-1 1791
Run CalicoST to infer CNAs and cancer clones based on estimated tumor proportions
To infer cancer clones and CNAs by CalicoST, first replace “<Replace with CalicoST directory>” in the configuration_cna_multi
file with the cloned CalicoST directory. Note that the parameter configuration takes in the estimated tumor proportion in the previous section by specifying
“tumorprop_file : estimate_tumor_prop/loh_estimator_tumor_prop.tsv”.
Then, run the following command in terminal:
OMP_NUM_THREADS=1 python <CalicoST directory>/src/calicost/calicost_main.py -c configuration_cna_multi
This command takes about 8h to finish.
Load CalicoST-generated result tables
<output_dir>/clone5_rectangle0_w1.0/clone_labels.tsv
stores the inferred cancer clones for each spot. Note that spots with a low tumor purity will also be assigned to a cancer clone, despite that the inferred cancer clone label is not very meaningful for those nearly normal spots.
df = pd.read_csv(f"{example_directory}/calicost/clone5_rectangle0_w1.0/clone_labels.tsv", header=0, index_col=0, sep='\t')
df
clone_label | tumor_proportion | |
---|---|---|
BARCODES | ||
AAACAAGTATCTCCCA-1_H12 | 3 | 0.050000 |
AAACAGGGTCTATATT-1_H12 | 3 | NaN |
AAACATTTCCCGGATT-1_H12 | 3 | 0.050000 |
AAACCGGGTAGGTACC-1_H12 | 3 | 0.825997 |
AAACCGTTCGTCCAGG-1_H12 | 3 | 0.940555 |
... | ... | ... |
TTGTTCAGTGTGCTAC-1_H25 | 2 | 0.050000 |
TTGTTGTGTGTCAAGA-1_H25 | 2 | 0.188937 |
TTGTTTCACATCCAGG-1_H25 | 1 | 0.956014 |
TTGTTTCATTAGTCTA-1_H25 | 1 | 0.838851 |
TTGTTTCCATACAACT-1_H25 | 2 | 0.364009 |
13344 rows × 2 columns
<output_dir>/clone5_rectangle0_w1.0/cnv_seglevel.tsv
stores the allele-specific copy numbers of each genome bin within each clone. Each row is a genome bin, the columns containing the chromosome, start, and end position of the genome bin, and the inferred the A and B allele copy number within each cancer clone.
df = pd.read_csv(f"{example_directory}/calicost/clone5_rectangle0_w1.0/cnv_seglevel.tsv", header=0, index_col=None, sep='\t')
df
CHR | START | END | clone1 A | clone1 B | clone2 A | clone2 B | clone3 A | clone3 B | clone4 A | clone4 B | clone5 A | clone5 B | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 89295 | 1419136 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
1 | 1 | 1434861 | 1440568 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
2 | 1 | 1449689 | 1496123 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
3 | 1 | 1512162 | 1721078 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
4 | 1 | 1724838 | 2308568 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2255 | 22 | 46360834 | 48898361 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
2256 | 22 | 49773283 | 49963978 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
2257 | 22 | 50089879 | 50180213 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
2258 | 22 | 50185915 | 50292030 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
2259 | 22 | 50309030 | 50783625 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
2260 rows × 13 columns
<output_dir>/clone5_rectangle0_w1.0/cnv_genelevel.tsv
stores the allele-specific copy number for each gene. The copy number per gene is derived from projecting the allele-specific copy numbers of each genome bin to the spanned genes that have enough expression to be retained after CalicoST gene filtering step.
df = pd.read_csv(f"{example_directory}/calicost/clone5_rectangle0_w1.0/cnv_genelevel.tsv", header=0, index_col=None, sep='\t')
df
gene | clone1 A | clone1 B | clone2 A | clone2 B | clone3 A | clone3 B | clone4 A | clone4 B | clone5 A | clone5 B | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | AL627309.1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
1 | AL627309.5 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
2 | LINC01409 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
3 | LINC01128 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
4 | LINC00115 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
15035 | CHKB-DT | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
15036 | MAPK8IP2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
15037 | ARSA | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
15038 | SHANK3 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
15039 | RABL2B | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
15040 rows × 11 columns
Visualize CalicoST-generated plots
Once CalicoST is finished running, it generates the plots of spatial organization of clones and allele-specific copy numbers. The plots are in PDF format and can be directly viewed. Below, we load the PDF plots in this notebook for easy visualization.
from wand.image import Image as WImage
img = WImage(filename=f"{example_directory}/calicost/clone5_rectangle0_w1.0/plots/clone_spatial.pdf", resolution=100)
img
# allele-specific copy numbers of each clone (the color scheme is the same as Fig2c
img = WImage(filename=f"{example_directory}/calicost/clone5_rectangle0_w1.0/plots/acn_genome.pdf", resolution=120)
img
# RDR-BAF plot along the genome for each clone
img = WImage(filename=f"{example_directory}/calicost/clone5_rectangle0_w1.0/plots/rdr_baf_defaultcolor.pdf", resolution=120)
img
Reconstruct tumor phylogeny and phylogeography
We use an existing phylogeny reconstruction method, Startle by Sashittal et al, to infer a phylogeny of CalicoST-inferred cancer clones. To reconstruct a phylogeny based CalicoST results of the first random initialization, run the following command in shell:
mkdir calicost/phylogeny_clone5_rectangle0_w1.0
python <CalicoST code directory>/src/calicost/phylogeny_startle.py -c calicost/clone5_rectangle0_w1.0 -s <startle executable path> -o calicost/phylogeny_clone5_rectangle0_w1.0/
The above run of Startle will produce a plain-text file calicost/phylogeny_clone5_rectangle0_w1.0/loh_tree.newick
that encodes phylogeny tree with leaf nodes as CalicoST-inferred clones. We load the tree file as follows.
with open(f"{example_directory}/calicost/phylogeny_clone5_rectangle0_w1.0/loh_tree.newick", 'r') as fp:
print( fp.readlines() )
['((clone1:0,clone2:3):4,(clone3:1,(clone4:3,clone5:3):9):2);']
Now we project the phylogenetic tree in space to get a phylogeography. Before getting the phylogeography, we note that we currently don’t have the relative positioning among the five slices yet. We manually place the five slices according to Fig 1b in the original publication by Erickon et al., and transform the x/y coordinate in the tissue_positions.csv
file according to the new positioning.
# load coordinates and inferred cancer clones
coords = []
for i,s in enumerate(slice_ids):
# load spatial locations
# note that scanpy is incompatible with the latest tissue_positions.csv file, we directly load the positions as pandas data frame
df = pd.read_csv(f'{example_directory}/data/{directory_name[i]}/spatial/tissue_positions.csv', header=0, index_col=0, sep=',')
df.index = df.index + "_" + s
df['slice_id'] = s
coords.append( df )
coords = pd.concat(coords)
# combine with the cancer clone table
df = pd.read_csv(f"{example_directory}/calicost/clone5_rectangle0_w1.0/clone_labels.tsv", header=0, index_col=0, sep='\t')
df.clone_label = 'clone' + df.clone_label.astype(str)
coords = coords.join(df)
# remove spots that are not assigned to clones by CalicoST (filtered out due to low UMI count or SNP-covering UMI count)
coords = coords[coords.clone_label.notnull()]
coords
in_tissue | array_row | array_col | pxl_row_in_fullres | pxl_col_in_fullres | slice_id | clone_label | tumor_proportion | |
---|---|---|---|---|---|---|---|---|
barcode | ||||||||
TCCTTCAGTGGTCGAA-1_H12 | 1 | 15 | 69 | 3366 | 6308 | H12 | clone3 | 0.050000 |
GCGTCGAAATGTCGGT-1_H12 | 1 | 17 | 65 | 3618 | 6018 | H12 | clone3 | 0.132391 |
AACTGATATTAGGCCT-1_H12 | 1 | 16 | 66 | 3492 | 6090 | H12 | clone3 | 0.050000 |
CGAGCTGGGCTTTAGG-1_H12 | 1 | 17 | 67 | 3618 | 6163 | H12 | clone3 | 0.050000 |
GGGTGTTTCAGCTATG-1_H12 | 1 | 16 | 68 | 3492 | 6236 | H12 | clone3 | 0.050000 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
ATGGCCCGAAAGGTTA-1_H25 | 1 | 76 | 120 | 13480 | 12001 | H25 | clone2 | 1.000000 |
CGTAATATGGCCCTTG-1_H25 | 1 | 77 | 121 | 13632 | 12089 | H25 | clone2 | 1.000000 |
AGAGTCTTAATGAAAG-1_H25 | 1 | 76 | 122 | 13480 | 12176 | H25 | clone2 | 1.000000 |
ATTGAATTCCCTGTAG-1_H25 | 1 | 76 | 124 | 13479 | 12351 | H25 | clone2 | 1.000000 |
TTGAAGTGCATCTACA-1_H25 | 1 | 77 | 127 | 13630 | 12615 | H25 | clone2 | NaN |
13344 rows × 8 columns
def flip_axis(coords, axis):
max_x = np.max(coords[:,axis])
min_x = np.min(coords[:,axis])
tmp_coords = copy.copy(coords)
tmp_coords[:,axis] = min_x + max_x - coords[:,axis]
return tmp_coords
def rotate_by_angle(coords, angle):
theta = angle / 180 * np.pi
R = np.array([ [np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]] )
mean_coords = np.mean(coords, axis=0)
return (coords - mean_coords.reshape(1,-1)) @ R + mean_coords.reshape(1,-1)
adjusted_coords = copy.copy(coords[['array_row', 'array_col']].values)
# scale y coordinate so that the hexagon is not squeezed on one direction
adjusted_coords[:,0] = adjusted_coords[:,0] * np.sqrt(3)
# shift x and y coordinate to start from 0 for each slice
for s,sname in enumerate(slice_ids):
index = np.where(coords.slice_id.values == sname)[0]
adjusted_coords[index,0] -= np.min(adjusted_coords[index,0])
adjusted_coords[index,1] -= np.min(adjusted_coords[index,1])
# position in number of cubes
cube_length = min( np.max(adjusted_coords[:,0]), np.max(adjusted_coords[:,1]) )
sample_cube_pos = np.array([ [2,0], #H12
[4, 0.2], #H14
[5,0.5], #H15
[0,1], #H21
[5,1.5] ]) #H25
swap_x_y = [False, True, True, False, True]
rotation_angle = [15,-5,-5,0,-5] # H12, H14, H15, H21, H25
full_adj_coords = np.zeros(adjusted_coords.shape)
for s,sname in enumerate(slice_ids):
index = np.where(coords.slice_id.values == sname)[0]
if swap_x_y[s]:
tmp_coords = np.vstack([adjusted_coords[index,1],adjusted_coords[index,0]]).T
if sname != "H25":
tmp_coords = flip_axis(tmp_coords, axis=0 )
tmp_coords = flip_axis(tmp_coords, axis=1)
full_adj_coords[index,:] = tmp_coords + cube_length * sample_cube_pos[s]
else:
full_adj_coords[index,:] = adjusted_coords[index,:] + cube_length * sample_cube_pos[s]
full_adj_coords[index,:] = rotate_by_angle(full_adj_coords[index,:], rotation_angle[s])
coords['final_x'] = full_adj_coords[:,0]
coords['final_y'] = full_adj_coords[:,1]
coords
in_tissue | array_row | array_col | pxl_row_in_fullres | pxl_col_in_fullres | slice_id | clone_label | tumor_proportion | final_x | final_y | |
---|---|---|---|---|---|---|---|---|---|---|
barcode | ||||||||||
TCCTTCAGTGGTCGAA-1_H12 | 1 | 15 | 69 | 3366 | 6308 | H12 | clone3 | 0.050000 | 238.417652 | 74.069483 |
GCGTCGAAATGTCGGT-1_H12 | 1 | 17 | 65 | 3618 | 6018 | H12 | clone3 | 0.132391 | 241.246079 | 69.170504 |
AACTGATATTAGGCCT-1_H12 | 1 | 16 | 66 | 3492 | 6090 | H12 | clone3 | 0.050000 | 239.573047 | 70.654068 |
CGAGCTGGGCTTTAGG-1_H12 | 1 | 17 | 67 | 3618 | 6163 | H12 | clone3 | 0.050000 | 241.763717 | 71.102355 |
GGGTGTTTCAGCTATG-1_H12 | 1 | 16 | 68 | 3492 | 6236 | H12 | clone3 | 0.050000 | 240.090685 | 72.585919 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
ATGGCCCGAAAGGTTA-1_H25 | 1 | 76 | 120 | 13480 | 12001 | H25 | clone2 | 1.000000 | 700.743520 | 183.090182 |
CGTAATATGGCCCTTG-1_H25 | 1 | 77 | 121 | 13632 | 12089 | H25 | clone2 | 1.000000 | 701.914026 | 181.184948 |
AGAGTCTTAATGAAAG-1_H25 | 1 | 76 | 122 | 13480 | 12176 | H25 | clone2 | 1.000000 | 702.735909 | 183.264493 |
ATTGAATTCCCTGTAG-1_H25 | 1 | 76 | 124 | 13479 | 12351 | H25 | clone2 | 1.000000 | 704.728299 | 183.438805 |
TTGAAGTGCATCTACA-1_H25 | 1 | 77 | 127 | 13630 | 12615 | H25 | clone2 | NaN | 707.891194 | 181.707882 |
13344 rows × 10 columns
import calicost.utils_plotting
fig = calicost.utils_plotting.plot_individual_spots_in_space(full_adj_coords, coords.clone_label, coords.tumor_proportion, base_width=10, base_height=5)
plt.gca().invert_yaxis()
fig.show()
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1404: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
quantile_colors = this_full_cmap(np.array([0, np.min(copy_single_tumor_prop[idx]), np.max(copy_single_tumor_prop[idx]), 1]))
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1407: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
seaborn.scatterplot(x=shifted_coords[idx,0], y=-shifted_coords[idx,1], s=10, hue=copy_single_tumor_prop[idx], palette=this_cmap, linewidth=0, legend=None, ax=axes)
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1404: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
quantile_colors = this_full_cmap(np.array([0, np.min(copy_single_tumor_prop[idx]), np.max(copy_single_tumor_prop[idx]), 1]))
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1407: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
seaborn.scatterplot(x=shifted_coords[idx,0], y=-shifted_coords[idx,1], s=10, hue=copy_single_tumor_prop[idx], palette=this_cmap, linewidth=0, legend=None, ax=axes)
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1404: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
quantile_colors = this_full_cmap(np.array([0, np.min(copy_single_tumor_prop[idx]), np.max(copy_single_tumor_prop[idx]), 1]))
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1407: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
seaborn.scatterplot(x=shifted_coords[idx,0], y=-shifted_coords[idx,1], s=10, hue=copy_single_tumor_prop[idx], palette=this_cmap, linewidth=0, legend=None, ax=axes)
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1404: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
quantile_colors = this_full_cmap(np.array([0, np.min(copy_single_tumor_prop[idx]), np.max(copy_single_tumor_prop[idx]), 1]))
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1407: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
seaborn.scatterplot(x=shifted_coords[idx,0], y=-shifted_coords[idx,1], s=10, hue=copy_single_tumor_prop[idx], palette=this_cmap, linewidth=0, legend=None, ax=axes)
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1404: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
quantile_colors = this_full_cmap(np.array([0, np.min(copy_single_tumor_prop[idx]), np.max(copy_single_tumor_prop[idx]), 1]))
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1407: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
seaborn.scatterplot(x=shifted_coords[idx,0], y=-shifted_coords[idx,1], s=10, hue=copy_single_tumor_prop[idx], palette=this_cmap, linewidth=0, legend=None, ax=axes)
Now we project the phylogeny to the space of coords[[‘final_x’, ‘final_y’]] and infer ancestor locations using a Gaussian diffusion model.
import calicost.phylogeography
newick_file = f"{example_directory}/calicost/phylogeny_clone5_rectangle0_w1.0/loh_tree.newick"
t = calicost.phylogeography.project_phylogeneny_space(newick_file, coords[['final_x', 'final_y']].values, coords.clone_label.values,
single_tumor_prop=coords.tumor_proportion.values, sample_list=slice_ids, sample_ids=coords.slice_id.values)
print( t )
root node is ancestor1_2_3_4_5
a list of leaf nodes: ['clone1', 'clone2', 'clone3', 'clone4', 'clone5']
a list of internal nodes: ['ancestor1_2_3_4_5', 'ancestor1_2', 'ancestor3_4_5', 'ancestor4_5']
/-clone1
/-|
| \-clone2
--|
| /-clone3
\-|
| /-clone4
\-|
\-clone5
# plot clones in space with the phylogeography
fig = calicost.utils_plotting.plot_individual_spots_in_space(full_adj_coords, coords.clone_label, coords.tumor_proportion, base_width=10, base_height=5)
axes = plt.gca()
# clone centers + ancestors
for node in t.traverse():
axes.scatter( node.x, -node.y, marker="D", linewidth=2, edgecolor='black', facecolor="None", s=50)
# edges
for node in t.iter_leaves():
while not node.is_root():
p = node.up
if np.abs(node.x - p.x) + np.abs(node.y - p.y) > 1:
axes.annotate("", xy=(node.x, -node.y), xytext=(p.x, -p.y), arrowprops=dict(mutation_scale=15, lw=1, arrowstyle="->", color="black"))
node = p
axes.invert_yaxis()
fig.show()
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1404: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
quantile_colors = this_full_cmap(np.array([0, np.min(copy_single_tumor_prop[idx]), np.max(copy_single_tumor_prop[idx]), 1]))
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1407: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
seaborn.scatterplot(x=shifted_coords[idx,0], y=-shifted_coords[idx,1], s=10, hue=copy_single_tumor_prop[idx], palette=this_cmap, linewidth=0, legend=None, ax=axes)
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1404: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
quantile_colors = this_full_cmap(np.array([0, np.min(copy_single_tumor_prop[idx]), np.max(copy_single_tumor_prop[idx]), 1]))
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1407: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
seaborn.scatterplot(x=shifted_coords[idx,0], y=-shifted_coords[idx,1], s=10, hue=copy_single_tumor_prop[idx], palette=this_cmap, linewidth=0, legend=None, ax=axes)
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1404: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
quantile_colors = this_full_cmap(np.array([0, np.min(copy_single_tumor_prop[idx]), np.max(copy_single_tumor_prop[idx]), 1]))
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1407: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
seaborn.scatterplot(x=shifted_coords[idx,0], y=-shifted_coords[idx,1], s=10, hue=copy_single_tumor_prop[idx], palette=this_cmap, linewidth=0, legend=None, ax=axes)
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1404: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
quantile_colors = this_full_cmap(np.array([0, np.min(copy_single_tumor_prop[idx]), np.max(copy_single_tumor_prop[idx]), 1]))
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1407: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
seaborn.scatterplot(x=shifted_coords[idx,0], y=-shifted_coords[idx,1], s=10, hue=copy_single_tumor_prop[idx], palette=this_cmap, linewidth=0, legend=None, ax=axes)
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1404: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
quantile_colors = this_full_cmap(np.array([0, np.min(copy_single_tumor_prop[idx]), np.max(copy_single_tumor_prop[idx]), 1]))
/n/fs/ragr-data/users/congma/temp/CalicoST/src/calicost/utils_plotting.py:1407: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
seaborn.scatterplot(x=shifted_coords[idx,0], y=-shifted_coords[idx,1], s=10, hue=copy_single_tumor_prop[idx], palette=this_cmap, linewidth=0, legend=None, ax=axes)