Given a gene expression matrix and a binary (0/1) vector indicating the distant metastasis status of samples, hack_cinsarc() classifies samples into one of two risk classes, C1 or C2, using the CINSARC signature as described by Chibon et al. (2010).
Arguments
- expr_data
A normalized gene expression matrix (or data frame) with gene symbols as row names and samples as columns.
- dm_status
A numeric vector indicating whether each sample has (1) or has not (0) developed distant metastasis.
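The Arguments above imply a shape contract: numeric values, gene symbols as row names, and one 0/1 value per sample (column). The sketch below checks that contract before calling hack_cinsarc(). The helper check_cinsarc_inputs() is hypothetical and not part of hacksig. The one-value-per-column requirement is an assumption read from the argument descriptions.

```r
# Hypothetical helper, NOT part of hacksig: checks that inputs match the
# Arguments description before they are passed to hack_cinsarc().
check_cinsarc_inputs <- function(expr_data, dm_status) {
  expr_mat <- as.matrix(expr_data)  # accepts a matrix or a data frame
  if (!is.numeric(expr_mat)) stop("expr_data must contain numeric values")
  if (is.null(rownames(expr_mat))) stop("expr_data needs gene symbols as row names")
  # Assumption: one distant-metastasis value per sample (column)
  if (length(dm_status) != ncol(expr_mat)) stop("dm_status needs one value per column of expr_data")
  if (!all(dm_status %in% c(0, 1))) stop("dm_status must contain only 0 and 1 (no NA)")
  invisible(TRUE)
}

# Toy input with placeholder gene symbols and random values
set.seed(42)
toy_expr <- matrix(rnorm(6), nrow = 3,
                   dimnames = list(c("GENE1", "GENE2", "GENE3"), c("s1", "s2")))
check_cinsarc_inputs(toy_expr, c(0, 1))         # passes silently
try(check_cinsarc_inputs(toy_expr, c(0, 1, 1))) # error: length mismatch
```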
Value
A tibble with one row for each sample in expr_data and two columns: sample_id and cinsarc_class.
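The returned classes are predictions, so a natural next step is to compare them with the observed metastasis status. The sketch below uses a made-up data.frame with the same two columns as a stand-in for the tibble returned by hack_cinsarc(). The values are illustrative only and are not output of the function.

```r
# Toy stand-in for hack_cinsarc() output (made-up values, illustration only);
# the real function returns a tibble with these same two columns.
res <- data.frame(
  sample_id = paste0("sample", 1:6),
  cinsarc_class = c("C2", "C1", "C1", "C2", "C1", "C2"),
  stringsAsFactors = FALSE
)
dm_status <- c(1, 0, 1, 1, 0, 0)

# Cross-tabulate the assigned risk class against the observed metastasis status
tab <- table(dm_status = dm_status, cinsarc_class = res$cinsarc_class)
print(tab)

# Proportion of samples assigned to the high-risk class C2
mean(res$cinsarc_class == "C2")  # 0.5
```

Column access with `$` works the same way on a tibble, so the same calls apply to the real return value.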
Details
CINSARC (Complexity INdex in SARComas) is a prognostic 67-gene signature related to mitosis and control of chromosome integrity. It was developed to improve the prediction of metastatic outcome in soft tissue sarcomas over the FNCLCC (Fédération Nationale des Centres de Lutte Contre le Cancer) grading system.
Algorithm
The CINSARC method implemented in hacksig uses leave-one-out cross-validation (LOOCV) to classify samples into the C1/C2 risk groups (see Lesluyes & Chibon, 2020).
First, the expression of each gene is centered by its mean across samples.
Then, at each LOOCV iteration, one sample is held out and the mean centered
expression of each gene is computed within each metastasis group of the remaining
samples, giving the metastatic and non-metastatic centroids. Next, the distance
between the held-out sample and each centroid is computed as one minus their
Spearman correlation. Finally, if the sample is more correlated with the
non-metastatic centroid, it is assigned to the C1 class (low risk). Conversely, if it
is more correlated with the metastatic centroid, it is assigned to the C2 class (high risk).
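The steps above can be sketched in base R as follows. This is a minimal illustration of LOOCV nearest-centroid classification with a Spearman distance, not the hacksig source code. The function name cinsarc_loocv_sketch() is hypothetical. It assumes the matrix is already restricted to the CINSARC genes, and that ties are sent to C1.

```r
# Minimal base-R sketch of the LOOCV centroid procedure described above.
# NOT the hacksig source code. Assumptions: `expr` is a genes x samples numeric
# matrix already restricted to the CINSARC genes, and `dm` is a 0/1 vector with
# one entry per column of `expr`.
cinsarc_loocv_sketch <- function(expr, dm) {
  expr <- as.matrix(expr)
  stopifnot(is.numeric(expr), length(dm) == ncol(expr), all(dm %in% c(0, 1)))
  # Each group needs >= 2 samples so both centroids exist after leaving one out
  stopifnot(sum(dm == 1) >= 2, sum(dm == 0) >= 2)
  ids <- colnames(expr)
  if (is.null(ids)) ids <- paste0("sample", seq_len(ncol(expr)))

  # Step 1: center each gene (row) by its mean across all samples
  centered <- expr - rowMeans(expr)

  classes <- character(ncol(centered))
  for (i in seq_len(ncol(centered))) {
    # Step 2: centroids from the remaining samples, split by metastasis status
    train <- centered[, -i, drop = FALSE]
    train_dm <- dm[-i]
    centroid_dm <- rowMeans(train[, train_dm == 1, drop = FALSE])
    centroid_nodm <- rowMeans(train[, train_dm == 0, drop = FALSE])
    # Step 3: distance = 1 - Spearman correlation with each centroid
    d_dm <- 1 - cor(centered[, i], centroid_dm, method = "spearman")
    d_nodm <- 1 - cor(centered[, i], centroid_nodm, method = "spearman")
    # Step 4: closer to the metastatic centroid -> C2 (high risk), else C1
    classes[i] <- if (d_dm < d_nodm) "C2" else "C1"
  }
  data.frame(sample_id = ids, cinsarc_class = classes, stringsAsFactors = FALSE)
}

# Usage on random toy data (10 genes, 8 samples; values are made up)
set.seed(1)
toy_expr <- matrix(rnorm(10 * 8), nrow = 10,
                   dimnames = list(paste0("gene", 1:10), paste0("s", 1:8)))
toy_dm <- c(0, 1, 0, 1, 0, 1, 0, 1)
cinsarc_loocv_sketch(toy_expr, toy_dm)
```

Comparing distances is equivalent to comparing correlations here: a smaller value of one minus the correlation means a higher correlation with that centroid.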
References
Chibon, F., Lagarde, P., Salas, S., Pérot, G., Brouste, V., Tirode, F., Lucchesi, C., de Reynies, A., Kauffmann, A., Bui, B., Terrier, P., Bonvalot, S., Le Cesne, A., Vince-Ranchère, D., Blay, J. Y., Collin, F., Guillou, L., Leroux, A., Coindre, J. M., & Aurias, A. (2010). Validated prediction of clinical outcome in sarcomas and multiple types of cancer on the basis of a gene expression signature related to genome complexity. Nature Medicine, 16(7), 781–787. doi:10.1038/nm.2174
Lesluyes, T., & Chibon, F. (2020). A Global and Integrated Analysis of CINSARC-Associated Genetic Defects. Cancer Research, 80(23), 5282–5290. doi:10.1158/0008-5472.CAN-20-0512
Examples
library(hacksig)
# generate random distant metastasis outcome
set.seed(123)
test_dm_status <- sample(c(0, 1), size = ncol(test_expr), replace = TRUE)
hack_cinsarc(test_expr, test_dm_status)
#> # A tibble: 20 × 2
#>    sample_id cinsarc_class
#>    <chr>     <chr>
#>  1 sample1   C2
#>  2 sample2   C1
#>  3 sample3   C2
#>  4 sample4   C1
#>  5 sample5   C2
#>  6 sample6   C1
#>  7 sample7   C1
#>  8 sample8   C1
#>  9 sample9   C1
#> 10 sample10  C2
#> 11 sample11  C1
#> 12 sample12  C1
#> 13 sample13  C1
#> 14 sample14  C2
#> 15 sample15  C1
#> 16 sample16  C2
#> 17 sample17  C1
#> 18 sample18  C2
#> 19 sample19  C1
#> 20 sample20  C2