PCAPAM50

What is PCAPAM50?

Accurately classifying breast cancer tumors from gene expression data is not a trivial task, and it lacks standard practices. The PAM50 classifier assigns each tumor a subtype based on the correlation distance between its 50-gene expression profile and the luminal A (LA), luminal B (LB), basal-like, HER2-enriched, and normal-like centroids. However, applying the PAM50 algorithm has its challenges. The two main ones are (1) balancing estrogen receptor (ER) status and (2) the gene centering procedure.

The PAM50 classifier works accurately if the original cohort or dataset is ER status-balanced, but most genome-wide studies are not. In such cases, a conventional strategy is to choose an ER status-balanced subset and use the per-gene medians derived from that subset to gene center the entire cohort. In practice, the ER-balanced subset is chosen based on ER status defined by immunohistochemistry (IHC).

IHC-defined ER status, which is based on protein expression, has been reported to be not completely consistent with ER status defined by gene expression. This inconsistency may affect the accuracy of the subsequent gene centering procedure, which aims to minimize the bias in the dynamic range of the expression profile for each sequencing technology. As a result, it may contribute to the discrepancy between IHC and PAM50 subtyping results. Hence, we explored using a gene expression-based ER-balanced subset for gene centering, leveraging principal component analysis (PCA) and iterative PAM50 calls, to avoid introducing protein expression-based data into a gene expression-based subtyping method. The PCAPAM50 R package was created to make this new tumor classification method easy to distribute.
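To make the two steps described above concrete, here is a minimal Python sketch of median gene centering on an ER-balanced subset followed by a nearest-centroid call. It is an illustration only, not the PCAPAM50 R code: the function names, the four-gene toy expression values, and the two toy centroids are all hypothetical, and Pearson correlation is used as an assumed similarity measure. How the balanced subset is chosen (by IHC or by the PCA-based approach) is outside the scope of the sketch; the subset is simply given as input.

```python
from statistics import median


def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def gene_medians(expr, subset):
    """Per-gene median computed only over the samples named in `subset`."""
    n_genes = len(next(iter(expr.values())))
    return [median(expr[s][g] for s in subset) for g in range(n_genes)]


def center_genes(expr, medians):
    """Subtract the subset-derived medians from every sample in the cohort."""
    return {s: [v - m for v, m in zip(vals, medians)] for s, vals in expr.items()}


def nearest_centroid(profile, centroids):
    """Return the name of the centroid most correlated with the profile."""
    return max(centroids, key=lambda name: pearson(profile, centroids[name]))


# Hypothetical toy cohort: 5 samples x 4 genes (the real classifier uses 50 genes).
expr = {
    "s1": [11.0, 9.0, 5.0, 11.0],  # assumed ER-positive
    "s2": [11.2, 8.8, 5.1, 11.1],  # assumed ER-positive
    "s3": [9.0, 7.0, 7.0, 13.0],   # assumed ER-negative
    "s4": [8.9, 7.2, 6.8, 12.9],   # assumed ER-negative
    "s5": [10.8, 9.1, 5.2, 10.9],  # assumed ER-positive, outside the subset
}

# Two ER-positive and two ER-negative samples form the balanced subset.
balanced_subset = ["s1", "s2", "s3", "s4"]

# Invented centroids for illustration; these are NOT the published PAM50 centroids.
toy_centroids = {
    "LumA": [1.0, 1.0, -1.0, -1.0],
    "Basal": [-1.0, -1.0, 1.0, 1.0],
}

medians = gene_medians(expr, balanced_subset)
centered = center_genes(expr, medians)
calls = {s: nearest_centroid(p, toy_centroids) for s, p in centered.items()}
print(calls)
```

Note that the medians come from the balanced subset alone but are subtracted from every sample, including `s5`, which is not in the subset. Changing which samples make up the subset changes the medians and can therefore change the calls, which is why the choice of subset matters.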


Package Structure

Diagram of Package Structure

The PCAPAM50 package consists of five core directories. The diagram above shows the purpose and components of each. Most notably, the package comes with sample data and The Cancer Genome Atlas (TCGA) data to demonstrate running the program. Further information and a step-by-step guide using these datasets can be found on the Instructions page. After testing, you can substitute your own files to make calls on your data.


Function Structure

Diagram of General Function Structure

This is the general structure of most functions in the package. Detailed usage and argument instructions can be found by running ?function_name in R or by viewing the manual. Additionally, examples using these functions can be found on the Instructions page.


Copyright and License

Copyright 2018 Windber Research Institute, Windber, PA 15963. All Rights Reserved.

Contact:
Developer: Praveen K. Raj Kumar [P.RajKumar@wriwindber.org]
Lab director: Hai Hu [H.hu@wriwindber.org]

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

You should have received a copy of the GNU General Public License along with this program. If not, see GNU licenses.


Citation

If you find this tool useful, please cite:
Raj-Kumar PK, Liu J, Hooke JA, Kovatich AJ, Kvecher L, Shriver CD, Hu H. PCA-PAM50 improves consistency between breast cancer intrinsic and clinical subtyping reclassifying a subset of luminal A tumors as luminal B. Sci Rep. 2019 May 28;9(1):7956. doi: 10.1038/s41598-019-44339-4. PMID: 31138829; PMCID: PMC6538748.