Specification of running parameters of CalicoST
===============================================

Supporting reference files
--------------------------
geneticmap_file: str
    The path to the genetic map file.

hgtable_file: str
    The path to a file listing the locations of genes in the genome. This should be a tab-delimited file with the following columns: gene_name, chrom, cdsStart, cdsEnd.

normalidx_file: str, optional
    The path to the file containing the indices of normal spots in the spatial transcriptomics data. Each line is a single index without header.

tumorprop_file: str, optional
    The path to the inferred tumor proportions per spot. This should be a tab-delimited file with the following column names: barcode, Tumor.

filtergenelist_file: str, optional
    The path to a file listing genes to exclude from CNA inference, based on prior knowledge.

filterregion_file: str, optional
    The path to a BED-format file listing genomic regions to exclude from CNA inference, e.g., HLA regions.
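
The tabular formats described above can be checked before a run. The following is a minimal sketch using only the Python standard library. The helper names (``read_tsv_columns``, ``read_normal_indices``) and the sample file contents are hypothetical and are not part of CalicoST. The sketch also assumes that the tab-delimited files carry a header row with the column names listed above.

```python
import csv
import io


def read_tsv_columns(handle, required):
    """Read a tab-delimited table with a header row and check required columns."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [c for c in required if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


def read_normal_indices(handle):
    """Read one integer index per line; no header is expected."""
    return [int(line) for line in handle if line.strip()]


# Hypothetical file contents, for illustration only.
hgtable_text = "gene_name\tchrom\tcdsStart\tcdsEnd\nGENE_A\tchr1\t1000\t5000\n"
tumorprop_text = "barcode\tTumor\nSPOT_1\t0.8\nSPOT_2\t0.1\n"
normalidx_text = "0\n5\n12\n"

genes = read_tsv_columns(
    io.StringIO(hgtable_text), ["gene_name", "chrom", "cdsStart", "cdsEnd"]
)
tumor = read_tsv_columns(io.StringIO(tumorprop_text), ["barcode", "Tumor"])
normal_idx = read_normal_indices(io.StringIO(normalidx_text))
```

With real files, the ``io.StringIO`` objects would be replaced by ``open(path, newline="")`` handles.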


Phasing parameters
------------------
logphase_shift: float, optional
    Adjustment to the strength of the Markov model self-transition in phasing. The higher the value, the higher the self-transition probability. Default is -2.0.

secondary_min_umi: int, optional
    The minimum UMI count that a genome segment must have in the pseudobulk of spots during genome segmentation. Default is 300.
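
To illustrate what a minimum pseudobulk UMI count per segment means, here is a rough sketch that greedily groups consecutive genomic bins until each group reaches ``secondary_min_umi``. The greedy strategy, the function name ``group_bins_by_min_umi``, and the example counts are assumptions made for illustration. CalicoST's actual segmentation procedure may differ.

```python
def group_bins_by_min_umi(bin_umis, secondary_min_umi=300):
    """Greedily group consecutive bins until each group reaches the UMI minimum."""
    segments, current, total = [], [], 0
    for i, umi in enumerate(bin_umis):
        current.append(i)
        total += umi
        if total >= secondary_min_umi:
            segments.append(current)
            current, total = [], 0
    # Fold leftover bins into the previous segment when one exists;
    # otherwise keep them as a single (under-covered) segment.
    if current:
        if segments:
            segments[-1].extend(current)
        else:
            segments.append(current)
    return segments


# Hypothetical pseudobulk UMI counts for six consecutive bins.
segments = group_bins_by_min_umi([120, 200, 50, 400, 90, 100])
```

Raising ``secondary_min_umi`` in this sketch yields fewer, longer segments, each backed by more UMIs.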


Clone inference parameters
--------------------------
n_clones: int
    The number of clones to infer using only BAF signals. Default is 3.

n_clones_rdr: int, optional
    The number of clones to refine for each BAF-identified clone using RDR and BAF signals. Default is 2.

min_spots_per_clone: int, optional
    The minimum number of spots a clone must contain to be called. Default is 100.

min_avgumi_per_clone: int, optional
    The minimum average UMI count required for a clone. Default is 10.

maxspots_pooling: int, optional
    If the UMI counts per spot are too low, CalicoST will pool this number of adjacent spots to infer the clone assignment at each HMRF step. Default is 7.

nodepotential: str, optional
    One of the following two options: "max" or "weighted_sum". "max" refers to using the MLE decoding of HMM in evaluating the probability of spots being in each clone. "weighted_sum" refers to using the full HMM posterior probabilities to evaluate the probability of spots being in each clone. Default is "weighted_sum".

spatial_weight: float, optional
    The strength of spatial coherence in HMRF. The higher the value, the stronger the spatial coherence. Default is 1.0.

construct_adjacency_method: str, optional
    The method used to construct the adjacency graph for HMRF, either "hexagon" or "KNN". "hexagon" assumes that the spots form a hexagonal grid, as on the Visium platform. "KNN" makes no assumption about the spot layout and uses K-nearest neighbors to construct the adjacency graph. Default is "hexagon".

construct_adjacency_w: float, optional
    If KNN is used to construct the adjacency matrix, CalicoST allows combining spatial similarity with expression similarity. This weight, ranging between 0 and 1, specifies the weight of spatial similarity. Default is 1.0.
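
The role of ``construct_adjacency_w`` in the "KNN" setting can be pictured with a small sketch. It assumes a linear blend ``w * spatial + (1 - w) * expression`` and a spatial similarity of ``1 / (1 + distance)``. Both choices, along with the function name ``knn_adjacency`` and the toy data, are illustrative assumptions rather than CalicoST's implementation.

```python
import math


def knn_adjacency(coords, expr_sim, k=2, w=1.0):
    """Blend spatial and expression similarity, then keep each spot's k best neighbors."""
    n = len(coords)
    adjacency = [[0.0] * n for _ in range(n)]
    for i in range(n):
        scores = []
        for j in range(n):
            if i == j:
                continue
            dist = math.dist(coords[i], coords[j])
            spatial_sim = 1.0 / (1.0 + dist)  # assumed similarity function
            blended = w * spatial_sim + (1.0 - w) * expr_sim[i][j]
            scores.append((blended, j))
        # Keep the k highest-scoring neighbors of spot i.
        for score, j in sorted(scores, reverse=True)[:k]:
            adjacency[i][j] = score
    return adjacency


# Three hypothetical spots on a line, with a made-up expression similarity matrix.
coords = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
expr_sim = [[1.0, 0.1, 0.9], [0.1, 1.0, 0.2], [0.9, 0.2, 1.0]]

adj_spatial = knn_adjacency(coords, expr_sim, k=1, w=1.0)  # spatial only
adj_expr = knn_adjacency(coords, expr_sim, k=1, w=0.0)     # expression only
```

With ``w=1.0`` spot 0 is linked to its nearest spot in space, while with ``w=0.0`` it is linked to the spot with the most similar expression.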


CNA inference parameters
------------------------
n_states: int
    The number of allele-specific copy number states in the HMM for CNA inference.

t: float, optional
    The self-transition probability of the HMM. The higher the value, the higher the probability that adjacent genome segments are in the same CNA state. Default is 1-1e-5.

max_iter: int, optional
    The maximum number of Baum-Welch steps to perform in the HMM. Default is 30.

tol: float, optional
    The convergence threshold to terminate Baum-Welch steps. Default is 1e-4.
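
The interplay of ``n_states``, ``t``, ``max_iter``, and ``tol`` can be sketched as follows. The sketch assumes that the remaining probability mass ``1 - t`` is split evenly among the other states and that convergence is judged by the absolute change in log-likelihood between steps. Both are common conventions but are assumptions here, and the helper names are hypothetical.

```python
def transition_matrix(n_states, t=1 - 1e-5):
    """Put self-transition probability t on the diagonal; split the rest evenly."""
    if n_states < 2:
        raise ValueError("n_states must be at least 2")
    off = (1.0 - t) / (n_states - 1)
    return [[t if i == j else off for j in range(n_states)] for i in range(n_states)]


def run_until_converged(step, max_iter=30, tol=1e-4):
    """Call step() up to max_iter times; stop once the log-likelihood change is below tol."""
    loglik = float("-inf")
    for iteration in range(1, max_iter + 1):
        new_loglik = step()
        if abs(new_loglik - loglik) < tol:
            return new_loglik, iteration
        loglik = new_loglik
    return loglik, max_iter


matrix = transition_matrix(n_states=3)

# Stand-in for a Baum-Welch step: replays a made-up log-likelihood trace.
trace = iter([-100.0, -90.0, -89.5, -89.49995, -89.49994])
final_loglik, n_steps = run_until_converged(lambda: next(trace))
```

In this toy trace the loop stops at the fourth step, well before ``max_iter``, because the improvement falls below ``tol``.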


Merging clones with similar CNAs
--------------------------------
np_threshold: float, optional
    The threshold on the Neyman-Pearson statistic used to decide whether two clones have distinct CNA events. The higher the value, the more readily two clones are merged. Default is 1.0.

np_eventminlen: int, optional
    The minimum number of consecutive genome segments required to be considered a CNA event. Default is 10.
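
The two merging parameters can be read together as a run-length filter. The sketch below assumes one statistic per genome segment, treats values above ``np_threshold`` as evidence that the two clones differ, and keeps only runs of at least ``np_eventminlen`` consecutive segments. Two clones are then merged when no such run exists. The function names and the example values are hypothetical, and the actual test used by CalicoST may be computed differently.

```python
def find_distinct_events(np_stats, np_threshold=1.0, np_eventminlen=10):
    """Return (start, end) ranges of consecutive segments above the threshold.

    The end index is exclusive; runs shorter than np_eventminlen are discarded.
    """
    events, start = [], None
    for i, stat in enumerate(np_stats):
        if stat > np_threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= np_eventminlen:
                events.append((start, i))
            start = None
    # Close a run that extends to the last segment.
    if start is not None and len(np_stats) - start >= np_eventminlen:
        events.append((start, len(np_stats)))
    return events


def should_merge(np_stats, np_threshold=1.0, np_eventminlen=10):
    """Merge two clones when no sufficiently long distinct event is found."""
    return not find_distinct_events(np_stats, np_threshold, np_eventminlen)


# Made-up per-segment statistics comparing two clones.
stats = [0.2] * 5 + [2.5] * 12 + [0.3] * 4 + [3.0] * 3
events = find_distinct_events(stats)
merge_default = should_merge(stats)
merge_strict = should_merge(stats, np_threshold=3.0)
```

Here the 12-segment run is kept as an event while the 3-segment run is dropped, and raising ``np_threshold`` to 3.0 removes all events so the clones would be merged.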