The PCAPAM50 pipeline consists of two steps. The first creates a gene expression-guided ER-balanced subset and uses it to make intermediate subtype calls. The second uses these intermediate subtype calls to perform a refined intrinsic subtyping, called PCAPAM50. This page focuses on the PCAPAM50 approach. For instructions on the Conventional PAM50 approach, please visit its respective page.
The first step is performed by the makeCalls.PC1ihc function. It processes clinical IHC subtyping data and PAM50 gene expression data to form a gene expression-guided ER-balanced set. This set is created by combining the IHC classification with principal component 1 (PC1) of the expression data, which guides the separation of the samples. The function computes the median of each gene in this ER-balanced set, updates a calibration file, and runs the subtype prediction algorithms to generate intermediate intrinsic subtype calls based on the PAM50 method. It returns the subtyping results along with several diagnostics.
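To make the idea concrete, the following is a minimal sketch of a PC1-guided, ER-balanced median calculation on simulated data. It is not the package's implementation: the cutoff rule (fewest disagreements with IHC), the reading of "balanced" as equal numbers of ER-positive and ER-negative samples, and all object names are assumptions made for illustration.

```r
set.seed(118)
n.pos <- 30; n.neg <- 10; n.genes <- 5

# Simulated expression: ER-positive samples are shifted upward in every gene
toy.mat <- cbind(
  matrix(rnorm(n.genes * n.pos, mean = 2), nrow = n.genes),
  matrix(rnorm(n.genes * n.neg, mean = -2), nrow = n.genes)
)
dimnames(toy.mat) <- list(paste0("gene", 1:n.genes),
                          paste0("P", 1:(n.pos + n.neg)))
er.ihc <- rep(c("pos", "neg"), c(n.pos, n.neg))

# PCA with samples as observations; keep the PC1 score of each sample
pc1 <- prcomp(t(toy.mat), center = TRUE, scale. = FALSE)$x[, 1]
# The sign of a principal component is arbitrary: orient PC1 so ER-positive is high
if (mean(pc1[er.ihc == "pos"]) < mean(pc1[er.ihc == "neg"])) pc1 <- -pc1

# Scan candidate cutoffs and keep the one with the fewest IHC disagreements
cutoffs <- sort(unique(pc1))
n.misclassified <- sapply(cutoffs, function(k) {
  sum((pc1 >= k) != (er.ihc == "pos"))
})
pc1.cutoff <- cutoffs[which.min(n.misclassified)]

# Keep samples whose IHC ER status agrees with their side of the cutoff
agree.pos <- names(pc1)[pc1 >= pc1.cutoff & er.ihc == "pos"]
agree.neg <- names(pc1)[pc1 <  pc1.cutoff & er.ihc == "neg"]

# Balance by downsampling the larger group, then take gene-wise medians
n.keep <- min(length(agree.pos), length(agree.neg))
balanced <- c(sample(agree.pos, n.keep), sample(agree.neg, n.keep))
gene.medians <- apply(toy.mat[, balanced, drop = FALSE], 1, median)
```

In a PAM50-style workflow, medians of this kind are used to center each gene before samples are compared with the subtype centroids, which is why the composition of the subset matters.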
1) Load the Test Data:
The test data is derived from the TCGA breast cancer dataset. The test matrix is an upper-quartile (UQ) normalized log2(x+1) transformed dataset of PAM50 gene expression from RNA-Seq data. It is recommended to perform UQ normalization and log2 transformation on your input matrix to closely align with the scale of PAM50 centroids.
data_path <- system.file("extdata", "Sample_IHC_PAM-Mat.Rdat", package = "PCAPAM50")
load(data_path) # Loads Test.ihc and Test.matrix
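For your own data, the UQ normalization and log2(x+1) transformation recommended above might look roughly like the sketch below. The helper name uq_log2, the exclusion of zeros when computing the upper quartile, and the scaling factor of 1000 are assumptions; conventions differ between pipelines. The sketch also assumes the normalization is applied to a full genes-by-samples matrix before the PAM50 genes are subset.

```r
# Hypothetical helper, not part of PCAPAM50
uq_log2 <- function(counts, scale = 1000) {
  # Upper quartile of the non-zero values in each sample (column)
  uq <- apply(counts, 2, function(x) {
    quantile(x[x > 0], probs = 0.75, names = FALSE)
  })
  # Divide each column by its upper quartile, rescale, then apply log2(x + 1)
  log2(sweep(counts, 2, uq, "/") * scale + 1)
}

# Toy count matrix: 20 genes (rows) by 10 samples (columns)
set.seed(1)
toy <- matrix(rpois(200, lambda = 50), nrow = 20,
              dimnames = list(paste0("gene", 1:20), paste0("S", 1:10)))
toy.norm <- uq_log2(toy)
dim(toy.norm)
```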
2) Prepare the Data:
Ensure the clinical subtype data frame has a “PatientID” column whose values match the column names of the matrix. The IHC subtype column must be named “IHC”. Labels for ER-positive subtypes must start with “L” (for luminal), and labels for ER-negative subtypes must not. In the test data, ER-positive cases are labeled “LA”, “LB1”, and “LB2”, and ER-negative cases are labeled “TN” and “Her2+”.
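Before going further, it can help to verify these requirements explicitly. The sketch below runs a few checks on toy stand-ins for the clinical data frame and the expression matrix; the objects toy.ihc and toy.mat and their values are illustrative, not part of the package.

```r
# Toy stand-ins for Test.ihc and Test.matrix (illustrative values only)
toy.ihc <- data.frame(PatientID = c("P1", "P2", "P3", "P4"),
                      IHC = c("LA", "TN", "LB1", "Her2+"),
                      stringsAsFactors = FALSE)
toy.mat <- matrix(rnorm(8), nrow = 2,
                  dimnames = list(c("ESR1", "ERBB2"), c("P3", "P1", "P4", "P2")))

# Required columns are present and patient IDs are unique
stopifnot(all(c("PatientID", "IHC") %in% names(toy.ihc)))
stopifnot(!anyDuplicated(toy.ihc$PatientID))

# Every patient has a matrix column, and every column has a patient
stopifnot(setequal(toy.ihc$PatientID, colnames(toy.mat)))

# Labels starting with "L" are treated as ER-positive
er.positive <- grepl("^L", toy.ihc$IHC)
table(ER_positive = er.positive)
```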
Derive the ER status from the IHC labels, then sort the samples so that ER-positive cases come first:
Test.ihc$ER_status <- rep("NA", length(Test.ihc$PatientID))
is_luminal <- grepl("^L", Test.ihc$IHC) # logical index; safe even if no label matches
Test.ihc$ER_status[is_luminal] <- "pos"
Test.ihc$ER_status[!is_luminal] <- "neg"
Test.ihc <- Test.ihc[order(Test.ihc$ER_status, decreasing = TRUE), ]
Tabulate ER status against IHC subtype to check the sorted data:
Test.ihc$ER_status <- factor(Test.ihc$ER_status, levels = c("pos", "neg"))
Test.ihc$IHC <- factor(Test.ihc$IHC, levels = c("TN", "Her2+", "LA", "LB1", "LB2"))
table(Test.ihc$ER_status, Test.ihc$IHC)
#       TN Her2+ LA LB1 LB2
#   pos  0     0 19  65  27
#   neg 23     7  0   0   0
Now turn to the matrix. First, reorder the columns of the test matrix to match the sample order in the IHC data frame:
Test.matrix <- Test.matrix[, Test.ihc$PatientID]
Next, check the dimensions of the Test.matrix:
dim(Test.matrix)
# [1] 50 141
The matrix holds expression values for the 50 PAM50 genes (rows) across 141 samples (columns).
Important note: The rows of your own input matrix must correspond to the same 50 gene names used in the test matrix.
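One way to check the gene names is to compare the row names of your matrix against those of the test matrix. The sketch below uses a three-gene stand-in for rownames(Test.matrix) and a hypothetical user matrix my.mat. Whether the package requires a particular row order is not stated here, so the final reordering is a precaution rather than a documented requirement.

```r
# Stand-in for rownames(Test.matrix); three genes instead of 50 for brevity
ref.genes <- c("ESR1", "ERBB2", "MKI67")

# Hypothetical user matrix with the same genes in a different order
my.mat <- matrix(1:6, nrow = 3,
                 dimnames = list(c("MKI67", "ESR1", "ERBB2"), c("S1", "S2")))

# Genes expected but absent, and genes present but not expected
missing.genes <- setdiff(ref.genes, rownames(my.mat))
extra.genes   <- setdiff(rownames(my.mat), ref.genes)
if (length(missing.genes) > 0) {
  stop("Missing genes: ", paste(missing.genes, collapse = ", "))
}

# Drop any extra genes and put the rows in reference order
my.mat <- my.mat[ref.genes, , drop = FALSE]
```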
3) Create the Clinical Subtype Data Frame:
Create a clinical subtype data frame using the provided test files. The inputDir determines the output folder.
df.cln <- data.frame(PatientID = Test.ihc$PatientID, IHC = Test.ihc$IHC, stringsAsFactors = FALSE)
inputDir <- "Call.PC1"
4) Call the Function:
Run the makeCalls.PC1ihc function. Refer to the manual for detailed documentation on usage and arguments. Example run on test data:
res.PC1 <- makeCalls.PC1ihc(df.cln = df.cln, seed = 118, mat = Test.matrix, inputDir = inputDir)
The function returns a list containing:
- Int.sbs - Data frame with integrated subtype and clinical data.
- score.fl - Data frame with scores from subtype predictions.
- mdns.fl - Data frame with median values for each gene in the ER-balanced set.
- SBS.colr - Colors associated with each subtype from the prediction results.
- outList - Detailed results from subtype prediction functions.
- PC1cutoff - Cutoff values for PC1 used in subsetting.
- DF.PC1 - Data frame of initial PCA results merged with clinical data.
The function also writes a plot to the inputDir folder showing the percentage of misclassified IHC cases along the PC1 axis, with a vertical line marking the chosen cutoff.
A heatmap is also generated within the inputDir folder.
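A quick way to review the intermediate calls is to cross-tabulate them against the IHC labels. The sketch below does this on a toy stand-in for res.PC1$Int.sbs so that it runs without the package; the column name Int.SBS.Mdns.PC1ihc is the one used later on this page, while the subtype labels and the six rows are illustrative assumptions.

```r
# Toy stand-in for res.PC1$Int.sbs; the subtype labels are illustrative
toy.sbs <- data.frame(
  PatientID = paste0("P", 1:6),
  IHC = c("LA", "LB1", "LB2", "TN", "TN", "Her2+"),
  Int.SBS.Mdns.PC1ihc = c("LumA", "LumB", "LumB", "Basal", "Basal", "Her2"),
  stringsAsFactors = FALSE
)

# Rows: clinical IHC subtype; columns: intermediate intrinsic subtype call
agreement <- table(IHC = toy.sbs$IHC,
                   Intermediate = toy.sbs$Int.SBS.Mdns.PC1ihc)
print(agreement)
```

With real results, replacing toy.sbs by res.PC1$Int.sbs should give the same kind of table for the 141 test samples.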
The second step is performed by the makeCalls.v1PAM function. It uses the intermediate intrinsic subtype calls from the first step to create an ER-balanced set. Internally, it selects an equal number of Basal and LumA cases to form this subset.
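The balancing idea can be illustrated with a short sketch on toy calls. It assumes that the larger group is randomly downsampled to the size of the smaller one, which is a guess suggested by the seed argument rather than a description of the package's internals; the subtype labels other than Basal and LumA are also illustrative.

```r
set.seed(118)

# Toy intermediate calls, named by patient ID
calls <- c(rep("LumA", 12), rep("LumB", 6), rep("Basal", 4), rep("Her2", 3))
names(calls) <- paste0("P", seq_along(calls))

basal <- names(calls)[calls == "Basal"]
lumA  <- names(calls)[calls == "LumA"]

# Downsample the larger group so both contribute the same number of cases
n.keep <- min(length(basal), length(lumA))
balanced <- c(sample(basal, n.keep), sample(lumA, n.keep))
table(calls[balanced])
```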
1) Call the Function:
Call the function makeCalls.v1PAM() on test data. Refer to the manual for detailed documentation on usage and arguments.
df.pc1pam <- data.frame(PatientID = res.PC1$Int.sbs$PatientID,
                        PAM50 = res.PC1$Int.sbs$Int.SBS.Mdns.PC1ihc,
                        IHC = res.PC1$Int.sbs$IHC, # IHC column is optional
                        stringsAsFactors = FALSE)
inputDir <- "Calls.PCAPAM50"
res.PCAPAM50 <- makeCalls.v1PAM(df.pam = df.pc1pam, seed = 118, mat = Test.matrix, inputDir = inputDir)
The function returns a list containing:
- Int.sbs - Data frame with integrated subtype and clinical data.
- score.fl - Data frame with scores from subtype predictions.
- mdns.fl - Data frame with median values for each gene in the ER-balanced set.
- SBS.colr - Colors associated with each subtype from the prediction results.
- outList - Detailed results from subtype prediction functions.
A heatmap is generated within the inputDir folder.