Comparing Results – PCAPAM50 vs PAM50

Comparing Results

This section demonstrates the improvement in the accuracy of PCAPAM50 subtyping over PAM50 subtyping.


Test Data – PCAPAM50 vs PAM50

For the test data, we compare the accuracy of PCAPAM50 subtyping over PAM50 subtyping by comparing the intrinsic subtype calls to the IHC calls.

Below is the comparison of test data PAM50 calls and IHC subtype calls agreement:

res.PAM50$Int.sbs$IHC = factor(res.PAM50$Int.sbs$IHC, levels = c("TN", "HER2+", "LA", "LB1", "LB2"))
addmargins(table(res.PAM50$Int.sbs$Int.SBS.Mdns.PAM50, res.PAM50$Int.sbs$IHC))
#          TN HER2+  LA LB1 LB2 Sum
#  Basal   20     3   0   4   0  27
#  Her2     2     3   0   0   4   9
#  LumA     0     0  17  33  14  64
#  LumB     1     0   2  26   9  38
#  Normal   0     1   0   2   0   3
#  Sum     23     7  19  65  27 141 

Agreements:

Basal.TN agreement = 20/27
Her2.HER2+ agreement = 3/9
LumA.LA agreement = 17/64
LumB.LB1/LB2 agreement = 35/38

Here is the comparison of test data PCAPAM50 calls and IHC subtype calls agreement:

res.PCAPAM50$Int.sbs$IHC=toupper(res.PCAPAM50$Int.sbs$IHC)
res.PCAPAM50$Int.sbs$IHC = factor(res.PCAPAM50$Int.sbs$IHC,levels = c("TN", "HER2+", "LA", "LB1", "LB2"))
addmargins(table(res.PCAPAM50$Int.sbs$Int.SBS.Mdns.PCAPAM50, res.PCAPAM50$Int.sbs$IHC))
#          TN HER2+  LA LB1 LB2 Sum
#  Basal   20     3   0   4   0  27
#  Her2     2     4   0   0   4  10
#  LumA     0     0  17  27  11  55
#  LumB     1     0   2  33  12  48
#  Normal   0     0   0   1   0   1
#  Sum     23     7  19  65  27 141 

Agreements:

Basal.TN agreement = 20/27
Her2.HER2+ agreement = 4/10
LumA.LA agreement = 17/55
LumB.LB1/LB2 = 45/48

Upon comparison, we can see that PCAPAM50 has an overall agreement of 86/141 = 61%, which is an improvement of 7% over the PAM50 calls. If you closely examine the results, you will notice that the consistency of IHC LB subtype with LumB increased by 28.5%(35 to 45) with PCAPAM50.


TCGA Data – PCAPAM50 vs PAM50

In order to demonstrate improvements and replicate the results from the original paper, we obtained the 712 TCGA breast cancer cases from the original study where we derived IHC calls. This data is provided with the PCAPAM50 package. Follow the examples below to run the analysis. The conventional PAM50 calls obtained in the paper are included in the dataset, so only running PCAPAM50 is necessary to show the improvement.


1) Load TCGA data:

tcga_data_path <- system.file("extdata", "TCGA.712BC_IHC_PAM-Mat.Rdat", package = "PCAPAM50")

load(tcga_data_path) # Loads "TCGA.712BC.IHC"  "TCGA.712BC.matrix"

2) Prepare the data:

TCGA.712BC.IHC$ER_status <- rep("NA", length(TCGA.712BC.IHC$PatientID))

TCGA.712BC.IHC$ER_status[grep("^L",TCGA.712BC.IHC$IHC)] = "pos"

TCGA.712BC.IHC$ER_status[-grep("^L",TCGA.712BC.IHC$IHC)] = "neg"

TCGA.712BC.IHC <- TCGA.712BC.IHC[order(TCGA.712BC.IHC$ER_status, decreasing = TRUE),]

Display the sorted data:

TCGA.712BC.IHC$ER_status=factor(TCGA.712BC.IHC$ER_status, levels=c("pos", "neg"))
TCGA.712BC.IHC$IHC=factor(TCGA.712BC.IHC$IHC, levels=c("TN", "Her2+", "LA", "LB1", "LB2"))

table(TCGA.712BC.IHC$ER_status, TCGA.712BC.IHC$IHC)
#       TN Her2+  LA LB1 LB2
#  pos   0     0 111 325 123
#  neg 116    37   0   0   0 

Let’s examine the matrix. First, sort the test matrix using the IHC dataframe:

TCGA.712BC.matrix <- TCGA.712BC.matrix[, TCGA.712BC.IHC$PatientID]

dim(TCGA.712BC.matrix)
#[1]  50 712

This matrix has the 50 PAM50 genes and 712 sample expression values. This is already an upper-quartile (UQ) normalized log2(x+1) transformed dataset of PAM50 gene expression from RNA-Seq data.


3) Create the Clinical Subtype Data Frame:

df.tcga.cln <- data.frame(PatientID = TCGA.712BC.IHC$PatientID, IHC = TCGA.712BC.IHC$IHC, stringsAsFactors = FALSE)

inputDir <- "Call.PC1.TCGA"

4) Call the Function:

res.PC1 <- makeCalls.PC1ihc(df.cln = df.tcga.cln, seed = 118, mat = TCGA.712BC.matrix, inputDir = inputDir)

5) makeCalls.v1PAM:

df.pc1pam = data.frame(PatientID=res.PC1$Int.sbs$PatientID,
            PAM50=res.PC1$Int.sbs$Int.SBS.Mdns.PC1ihc,
            IHC=res.PC1$Int.sbs$IHC,
            stringsAsFactors=F) ### IHC column is optional

inputDir <- "Calls.PCAPAM50.TCGA" 

TCGA.712BC.matrix = TCGA.712BC.matrix[,df.pc1pam$PatientID]
res.PCAPAM50 <- makeCalls.v1PAM(df.pam = df.pc1pam, seed = 118, mat = TCGA.712BC.matrix, inputDir=inputDir)

6) PCAPAM50 vs PAM50:

PAM50 vs IHC:

addmargins(table(TCGA.712BC.IHC$PAM50_Given.Mdns, TCGA.712BC.IHC$IHC))
#          TN Her2+  LA LB1 LB2 Sum
#  Basal  100     7   1  12   1 121
#  Her2     9    29   0   4  19  61
#  LumA     2     0 100 200  64 366
#  LumB     1     0   5 100  36 142
#  Normal   4     1   5   9   3  22
#  Sum    116    37 111 325 123 712 

Agreements:

Basal.TN agreement = 100/121
Her2.HER2+ agreement = 29/61
LumA.LA agreement = 100/366
LumB.LB1/LB2 agreement = 136/142

PCAPAM50 vs IHC:

res.PCAPAM50$Int.sbs$IHC=toupper(res.PCAPAM50$Int.sbs$IHC)
res.PCAPAM50$Int.sbs$IHC = factor(res.PCAPAM50$Int.sbs$IHC,levels=c("TN", "HER2+", "LA", "LB1", "LB2"))
addmargins(table(res.PCAPAM50$Int.sbs$Int.SBS.Mdns.PCAPAM50, res.PCAPAM50$Int.sbs$IHC))
#          TN HER2+  LA LB1 LB2 Sum
#  Basal   99     7   0  12   1 119
#  Her2    10    30   0   7  24  71
#  LumA     3     0 103 168  43 317
#  LumB     2     0   6 135  55 198
#  Normal   2     0   2   3   0   7
#  Sum    116    37 111 325 123 712 

Agreements:

Basal.TN agreement = 99/119
Her2.HER2+ agreement = 30/71
LumA.LA agreement = 103/317
LumB.LB1/LB2 agreement = 190/198

Upon comparison, we can see that PCAPAM50 has an overall agreement of 422/712 = 59.3%, which is an improvement of 8% over the PAM50 calls. If you closely examine the results, you will notice that the consistency of IHC LB subtype with LumB increased by 39.7%(136 to 190) with PCAPAM50.