--- title: "CINmetrics" author: "Vishal H. Oza" date: "2-Dec-2020" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{CINmetrics} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` # CINmetrics The goal of CINmetrics package is to provide different methods of calculating Chromosomal Instability (CIN) metrics from the literature that can be applied to any cancer data set including The Cancer Genome Atlas. ```{r setup} library(CINmetrics) ``` The dataset provided with CINmetrics package is masked Copy Number variation data for Breast Cancer for 10 unique samples selected randomly from TCGA. ```{r} dim(maskCNV_BRCA) ``` Alternatively, you can download the entire dataset from TCGA using TCGAbiolinks package ```{r} ## Not run: #library(TCGAbiolinks) #query.maskCNV.hg39.BRCA <- GDCquery(project = "TCGA-BRCA", # data.category = "Copy Number Variation", # data.type = "Masked Copy Number Segment", legacy=FALSE) #GDCdownload(query = query.maskCNV.hg39.BRCA) #maskCNV.BRCA <- GDCprepare(query = query.maskCNV.hg39.BRCA, summarizedExperiment = FALSE) #maskCNV.BRCA <- data.frame(maskCNV.BRCA, stringsAsFactors = FALSE) #tai.test <- tai(cnvData = maskCNV.BRCA) ## End(Not run) ``` ## Total Aberration Index *tai* calculates the Total Aberration Index (TAI; [Baumbusch LO, et. al.](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3553118/)), "a measure of the abundance of genomic size of copy number changes in a tumour". It is defined as a weighted sum of the segment means ($|\bar{y}_{S_i}|$). Biologically, it can also be interpreted as the absolute deviation from the normal copy number state averaged over all genomic locations. $$ Total\ Aberration\ Index = \frac {\sum^{R}_{i = 1} {d_i} \cdot |{\bar{y}_{S_i}}|} {\sum^{R}_{i = 1} {d_i}}\ \ where |\bar{y}_{S_i}| \ge |\log_2 1.7| $$ ```{r} tai.test <- tai(cnvData = maskCNV_BRCA) head(tai.test) ``` ## Modified Total Aberration Index *taiModified* calculates a modified Total Aberration Index using all sample values instead of those in aberrant copy number state, thus does not remove the directionality from the score. $$ Modified\ Total\ Aberration\ Index = \frac {\sum^{R}_{i = 1} {d_i} \cdot {\bar{y}_{S_i}}} {\sum^{R}_{i = 1} {d_i}} $$ ```{r} modified.tai.test <- taiModified(cnvData = maskCNV_BRCA) head(modified.tai.test) ``` ## Copy Number Aberration *cna* calculates the total number of copy number aberrations (CNA; [Davidson JM, et. al.](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0079079)), defined as a segment with copy number outside the pre-defined range of 1.7-2.3 ($(\log_2 1.7 -1) \le \bar{y}_{S_i} \le (\log_2 2.3 -1)$) that is not contiguous with an adjacent independent CNA of identical copy number. For our purposes, we have adapted the range to be $|\bar{y}_{S_i}| \ge |\log_2 1.7|$, which is only slightly larger than the original. This metric is very similar to the number of break points, but it comes with the caveat that adjacent segments need to have a difference in segmentation mean values. $$ Total\ Copy\ Number\ Aberration = \sum^{R}_{i = 1} n_i \ \ where\ \ \begin{align} |\bar{y}_{S_i}| \ge |\log_2{1.7}|, \\ |\bar{y}_{S_{i-1}} - \bar{y}_{S_i}| \ge 0.2, \\ d_i \ge 10 \end{align} $$ ```{r} cna.test <- cna(cnvData = maskCNV_BRCA) head(cna.test) ``` ## Counting Altered Base segments *countingBaseSegments* calculates the number of altered bases defined as the sums of the lengths of segments ($d_i$) with an absolute segment mean ($|\bar{y}_{S_i}|$) of greater than 0.2. Biologically, this value can be thought to quantify numerical chromosomal instability. This is also a simpler representation of how much of the genome has been altered, and it does not run into the issue of sequencing coverage affecting the fraction of the genome altered. $$ Number\ of\ Altered\ Bases = \sum^{R}_{i = 1} d_i\ where\ |\bar{y}_{S_i}| \ge 0.2 $$ ```{r} base.seg.test <- countingBaseSegments(cnvData = maskCNV_BRCA) head(base.seg.test) ``` ## Counting Number of Break Points *countingBreakPoints* calculates the number of break points defined as the number of segments ($n_i$) with an absolute segment mean greater than 0.2. This is then doubled to account for the 5' and 3' break points. Biologically, this value can be thought to quantify structural chromosomal instability. $$ Number\ of \ Break\ Points = \sum^{R}_{i = 1} (n_i \cdot 2)\ where\ |\bar{y}_{S_i}| \ge 0.2 $$ ```{r} break.points.test <- countingBreakPoints(cnvData = maskCNV_BRCA) head(break.points.test) ``` ## Fraction of Genome Altered *fga* calculates the fraction of the genome altered (FGA; [Chin SF, et. al.](https://pubmed.ncbi.nlm.nih.gov/17925008/)), measured by taking the sum of the number of bases altered and dividing it by the genome length covered ($G$). Genome length covered was calculated by summing the lengths of each probe on the Affeymetrix 6.0 array. This calculation **excludes** sex chromosomes. $$ Fraction\ Genome\ Altered = \frac {\sum^{R}_{i = 1} d_i} {G} \ \ where\ |\bar{y}_{S_i}| \ge 0.2 $$ ```{r} fraction.genome.test <- fga(cnvData = maskCNV_BRCA) head(fraction.genome.test) ``` ## CINmetrics *CINmetrics* calculates tai, cna, number of altered base segments, number of break points, and fraction of genome altered and returns them as a single data frame. ```{r} cinmetrics.test <- CINmetrics(cnvData = maskCNV_BRCA) head(cinmetrics.test) ```