Whole Exome Network Analysis Identifies CXCR5-CXCL13 Signaling as a Key Driver in Breast Cancer
I am delighted to submit the concept paper with the title “Whole Exome Network Analysis Identifies CXCR5-CXCL13 Signaling as a Key Driver in Breast Cancer” for consideration under the NIH Research Fellowship Program, Ruth L. Kirschstein National Research Service Award Individual Predoctoral Fellowship. The primary aim of the Kirschstein-NRSA Individual Predoctoral Fellowship is to provide financial support for mentored research training, leading to a doctoral degree in the biomedical, behavioral, or clinical sciences. However, this fellowship program also strives to enhance the diversity of the scientific workforce in the United States by providing opportunities for academic institutions to identify and recruit students from diverse population groups. This fellowship program encourages diverse population groups to seek graduate degrees in health-related research. The long-term goal of the Kirschstein-NRSA Individual Predoctoral Fellowship program is to enhance the number of scientists from diverse population groups and prepare them for research careers in the biomedical, behavioral, and clinical sciences.
I am a fourth-year graduate student at Morehouse School of Medicine (MSM), a Historically Black College or University in the Atlanta University Center. The mission of MSM is to improve the health and well-being of individuals and communities, with an emphasis on people of color. This mission is primarily focused on underserved urban and rural populations in Georgia. MSM also seeks to increase the diversity of the health professional and scientific workforce, an objective it shares with the Ruth L. Kirschstein NRSA Individual Predoctoral Fellowship. I am pursuing a Ph.D. in Biomedical Science and a Master’s degree in Clinical Research (MSCR). The PhD/MSCR program has provided me with a strong foundation in research design, methods, and analytic techniques. My ability to conceptualize and think through research problems has also been enhanced through my participation in this dual degree program. I have gained experience conducting research and have presented my research findings as first author. The PhD/MSCR program has afforded me the opportunity to interact with members of the scientific community at scientific meetings and workshops. Moreover, the dual degree program has provided me with a versatile skill set that I can utilize in the next stage of my research career. Overall, I believe the Ruth L. Kirschstein NRSA Individual Predoctoral Fellowship will provide me with the financial support needed to take my graduate career to the next level.
My dissertation committee consists of 3 experts in Oncology (James Lillard Jr. PhD, MBA, Shailesh Singh, PhD, and Sanjay Jain, MD), 1 expert in Toxicology (Danita Eatman), and 1 Biostatistician (Fengxia Yan, MD). Bioinformatics support will be provided by the bioinformatics core at Morehouse School of Medicine, The Georgia Institute of Technology, and Emory University.
Breast cancer (BrCa) is the second leading cause of cancer-related deaths in American women (American Cancer Society, 2018). Approximately 1 in 8 (12%) women in the United States will develop invasive BrCa during their lifetime (American Cancer Society, 2018). In 2018, approximately 250,000 new cases of invasive BrCa will be diagnosed in women and approximately 40,500 women will die from the disease; a woman's lifetime risk of dying from BrCa is about 1 in 36 (3%) (American Cancer Society, 2018). More recently, incidence rates have been stable in Caucasian women but have increased in African American women. This BrCa health disparity is most notable in Triple Negative Breast Cancer (TNBC). TNBC is characterized by the lack of three molecular markers: estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER-2). It accounts for 10-20% of all BrCas and is an aggressive disease with poor prognosis (Pierobon, 2013). TNBC incidence rates are higher in Caucasian women. However, the number of fatalities associated with TNBC is significantly higher in African American women than in other ethnic groups. If no major changes in prevention or treatment occur, the number of lives lost to TNBC will continue to rise.
More recently, rapid increases in early-stage BrCa incidence have been reported in pre-menopausal women less than 45 years of age. Additionally, many of these young, early-stage breast cancer patients are of ethnic descent. The mechanisms responsible for early-stage BrCa in young women of ethnic descent remain unknown. Furthermore, it remains controversial whether early breast cancer has unique tumor biology, which may be highly influenced by race. Despite new developments in early detection and treatment, approximately 5% of women diagnosed with BrCa in the United States have metastatic disease at the time of first presentation (EBCTCG, 2005). Additionally, another 30% of women with early-stage, non-metastatic BrCa at diagnosis will develop distant metastatic disease that is not curable (EBCTCG, 2005). Additional predictive markers and new drug targets are needed to prolong survival and improve the quality of life for BrCa patients. To develop new, more effective drug targets, it is essential to understand the molecules and mechanisms responsible for the aggressive phenotype of breast cancer.
Currently, there are no specific targeted therapies for TNBC due to the lack of ER, PR, and HER-2 markers. Current chemotherapy consists of a combination of drugs including paclitaxel (TAX), doxorubicin (DOX), and cyclophosphamide (CTX). TAX is a taxane, which disrupts microtubule function, inhibiting the process of cell division (Singh, 2014). It is more commonly prescribed than docetaxel, another member of the taxane family, due to its tolerable toxicity, and is noted as first-line therapy in metastatic disease (Erba, 2010). DOX is an anthracycline, which inhibits DNA and RNA synthesis by intercalating between base pairs of a DNA/RNA strand (El Haibi, 2011). In an effort to combat the acute toxicity associated with DOX, it is often prescribed in combination with TAX (Eralp, 2004). CTX is an alkylating agent that adds alkyl groups to DNA, forming DNA crosslinks that interfere with DNA replication (Singh, 2014). It is conditionally prescribed in combination with TAX and DOX, based on disease progression. The ability of CTX to induce the death of certain T regulatory cells contributes to its efficacy. However, this combination therapy produces undesirable side effects. The response rate in TNBC patients is a mere 12% for a single agent and varies widely, from 27% to 65%, when multiple agents are used (Singh, 2014). Patients eventually relapse as a result of chemoresistance and metastasis, ultimately succumbing to the disease. This brings attention to the cells potentially responsible for drug resistance in the tumor microenvironment.
The tumor microenvironment is composed of tumor cells as well as various types of stromal cells, such as fibroblasts and endothelial cells. Several types of inflammatory cells, including neutrophils, macrophages, and lymphocytes, are recruited to breast tumors and play either a positive or negative role in cancer progression. The infiltration of inflammatory cells is regulated by a variety of biologically active molecules in the tumor microenvironment. Chemokines play a significant role in this process (Singh, 2011). Chemokines are 8-10 kilodalton (kDa) chemotactic cytokines involved in cell trafficking events and normal homeostasis. They are grouped into 4 major subfamilies (C, CC, CXC, and CX3C) based on the pattern of the two NH2-terminal cysteine residues. The extended N-terminus functions to recognize, bind, and activate the receptor. BrCa cells express C-X-C chemokine receptor 5 (CXCR5). C-X-C chemokine ligand 13 (CXCL13) is the sole ligand for CXCR5, which plays a role in cancer progression (Singh, 2009; Singh, 2009; Singh, 2011). Our laboratory was the first to show that CXCR5-CXCL13 signaling mediates prostate cancer metastasis and progression, i.e., growth, migration, invasion, and survival (El Haibi, 2010-2012). We also demonstrated that CXCR5-CXCL13 signaling induces cancer progression signaling pathways: PI3K, AKT, ERK, and Jun (El Haibi, 2010-2011). However, the mechanisms by which CXCR5-CXCL13 signaling promotes breast cancer are unknown. Panse et al. revealed that CXCR5 and CXCL13 are overexpressed in BrCa tissue (Panse, 2008). This study also showed elevated serum levels of CXCL13 in BrCa patients with metastatic disease compared to controls and disease-free patients. Additionally, a recent study provided evidence that co-expression of CXCR5 and CXCL13 correlated significantly with lymph node metastasis and that, independently, CXCL13 had EMT-inducing potential (El Haibi, 2010).
Taken together, these findings suggest the CXCR5-CXCL13 signaling axis contributes to the aggressive phenotype of breast cancer (BrCa).
The central research questions of this project focus on determining the mechanisms and molecules responsible for differences in the tumor biology of young BrCa patients and how these mechanisms and molecules contribute to poor BrCa prognosis. The purpose of this study is to characterize the molecular phenotype of BrCa in the context of C-X-C chemokine receptor 5 (CXCR5), C-X-C chemokine ligand 13 (CXCL13), and associated gene expression.
This study is novel as it uses a bioinformatic approach and gene enrichment analyses to identify the specific molecules and mechanisms responsible for the aggressive phenotype of BrCa, especially in young and early-stage BrCa patients. These two unique populations account for a high proportion of BrCa cases and are also associated with unfavorable prognosis. The results from this study have the potential to benefit these patients by providing a new predictive factor and therapeutic target.
Breast cancer (BrCa) is the most frequently diagnosed cancer and the second leading cause of cancer-related death among women worldwide (American Cancer Society, 2018). Several targeted and adjuvant therapies exist for estrogen receptor (ER) and human epidermal growth factor receptor 2 (HER-2) positive breast cancers. Currently, there are no targeted therapies for Triple Negative Breast Cancer (TNBC), which lacks the three main receptors used to characterize breast cancer subtypes: ER, progesterone receptor (PR), and HER-2. Despite new developments in early detection and treatment, approximately 5% of women diagnosed with BrCa in the US have metastatic disease at the time of first presentation (EBCTCG, 2005). Additionally, another 30% of women with early-stage, non-metastatic BrCa at diagnosis will develop distant metastatic disease that is not curable (EBCTCG, 2005). Additional predictive markers and new drug targets are needed to prolong survival and improve the quality of life for BrCa patients. Panse et al. revealed that CXCR5 and CXCL13 are overexpressed in BrCa tumor tissue and that serum levels of CXCL13 are elevated in metastatic BrCa patients (Panse, 2008). The long-term goal of this study is to further characterize the molecular phenotype of BrCa in the context of CXCR5, CXCL13, and associated gene expression. We hypothesize that the CXCR5-CXCL13 signaling axis contributes to the aggressive phenotype of BrCa. A bioinformatic approach will be used to aid in characterizing this new drug target for BrCa. Data for our patient cohort (1,049 female patients of Caucasian, African American, Latin American, and Asian/Pacific Island descent, age 35-82, diagnosed with ductal or lobular carcinoma) will be obtained from The Cancer Genome Atlas (TCGA). All patients within the cohort have verification of informed consent and IRB approval.
Differential expression analysis with DESeq will be performed to identify genes differentially expressed between primary tumor and matched normal solid tissue groups. Weighted Gene Co-expression Network Analysis (WGCNA) will be performed to identify modules of co-expressed genes, which will be correlated with factors influencing BrCa prognosis, such as age at diagnosis, TNM staging, race, menopausal status, breast cancer subtype, and survival time. Finally, canonical pathway, upstream regulator, and gene interaction analyses will be performed using Ingenuity Pathway Analysis. Our findings suggest that CXCR5, CXCL13, and associated genes driving tertiary lymphoid structure formation are present in BrCa and may serve as a predictive factor and a new therapeutic target.
GOALS AND OBJECTIVES
The long-term goal of this study is to further characterize the molecular phenotype of BrCa in the context of CXCR5, CXCL13, and associated gene expression. We hypothesize the CXCR5-CXCL13 signaling axis contributes to the aggressive phenotype of BrCa. The objective of this study is to identify the molecules that contribute to the aggressive phenotype of BrCa in silico.
Data Collection and Normalization
The data used in this study will be obtained from The Cancer Genome Atlas (TCGA). Clinical and RNA-seq data will be obtained for a total of 1,049 female patients of Caucasian, African American, Latin American, and Asian/Pacific Island descent, age 35-82, diagnosed with ductal or lobular breast carcinoma. All patients within the cohort have verification of informed consent and IRB approval.
Detecting Low Counts, Batch Effect Correction, and Removal of Outliers
A minor limitation of RNA-Seq analysis is the presence of missing expression counts, which alters the distribution of the population. Due to this limitation, normalized counts for all 1,049 patients across 26,000 protein-coding genes will be log2 transformed (expression value + 1) so that expression values more closely approximate a normal distribution. Genes with greater than 50% zero counts will be removed to prevent a skewed distribution, and the remaining genes will be filtered using a standard deviation threshold of 1. We predict that between 5,000 and 8,000 protein-coding genes will be analyzed for batch (center) effect. Batch effect correction will be applied using the ComBat algorithm to adjust for variance among the 52 sequencing centers that contributed to the TCGA BRCA dataset. ComBat is an empirical Bayes method in the Bioconductor sva package. Outlier samples will also be removed from the dataset.
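The filtering steps described above can be illustrated with a minimal sketch. It assumes that "filtered by a standard deviation of 1" means keeping genes whose log2-transformed expression has a sample standard deviation of at least 1; the function name, parameter names, and toy gene labels are hypothetical, and batch correction with ComBat is not shown.

```python
import math
import statistics


def preprocess(counts, max_zero_fraction=0.5, min_sd=1.0):
    """Filter and log2-transform normalized counts.

    counts: dict mapping gene name -> list of normalized counts (one per sample).
    Assumption: genes are kept when the SD of log2(count + 1) is >= min_sd.
    """
    kept = {}
    for gene, values in counts.items():
        zero_fraction = sum(1 for v in values if v == 0) / len(values)
        if zero_fraction > max_zero_fraction:
            continue  # more than 50% zero counts: drop the gene
        logged = [math.log2(v + 1) for v in values]
        if statistics.stdev(logged) >= min_sd:
            kept[gene] = logged
    return kept


# Toy data with hypothetical gene labels (not TCGA values).
toy = {
    "GENE_A": [0, 0, 0, 7],    # 75% zeros, removed
    "GENE_B": [1, 3, 15, 63],  # log2(x + 1) = 1, 2, 4, 6, kept
    "GENE_C": [7, 7, 7, 7],    # no variation across samples, removed
}
filtered = preprocess(toy)
print(sorted(filtered))
```

In this toy example only GENE_B passes both filters. In the actual analysis the same logic would run over the full gene-by-patient matrix before batch correction.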
Detection of Differentially Expressed Genes among Primary Tumor and Matched Normal Samples using DESeq
DESeq is a free R/Bioconductor software package that detects genes that are differentially expressed between two groups. In this study, we will detect genes that are differentially expressed between 113 primary tumor and 113 matched normal samples. Normalized counts for approximately 26,000 genes will be used to determine differential expression.
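To make the tumor versus matched normal comparison concrete, the sketch below computes a mean paired log2 fold change for one gene. This is only an effect-size summary under stated assumptions. It is not the negative binomial model that DESeq fits, and the function name, pseudocount, and example values are hypothetical.

```python
import math


def paired_log2_fold_change(tumor, normal, pseudocount=1.0):
    """Mean log2(tumor / normal) across matched pairs for one gene.

    tumor and normal are lists of normalized counts in matched order.
    The pseudocount avoids division by zero and log of zero.
    """
    if len(tumor) != len(normal):
        raise ValueError("tumor and normal must contain matched pairs")
    ratios = [
        math.log2((t + pseudocount) / (n + pseudocount))
        for t, n in zip(tumor, normal)
    ]
    return sum(ratios) / len(ratios)


# Hypothetical counts for three matched pairs (not TCGA values).
tumor_counts = [15, 31, 7]
normal_counts = [3, 7, 1]
lfc = paired_log2_fold_change(tumor_counts, normal_counts)
print(lfc)
```

A positive value indicates higher expression in tumor than in matched normal tissue. Statistical significance would still come from DESeq itself.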
Identification of Modules Associated with Different Stages of Breast Cancer Primary Tumors using WGCNA
Between 5,000 and 8,000 genes will be entered into Weighted Gene Co-expression Network Analysis (WGCNA). WGCNA is a free software package in R that extracts information on single genes from large-scale gene expression profiles, across all patient samples, and uses this information to construct network modules of co-expressed genes. The co-expression network will be built using a manually selected soft-thresholding power of 6. Modules containing co-expressed genes have the potential to be associated with specific clinical traits. Module-trait association will be measured as a correlation on a scale of 0-1, with significance assessed at an alpha of 0.05. Network calculation will allow for the identification of one or more modules of genes highly co-expressed with C-X-C chemokine receptor 5 (CXCR5) and C-X-C chemokine ligand 13 (CXCL13) and strongly correlated with clinical traits, such as age at diagnosis, TNM staging, race, menopausal status, breast cancer subtype, and survival time.
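The role of the threshold power can be illustrated with a short sketch. It assumes an unsigned network, in which the adjacency between two genes is the absolute Pearson correlation raised to the chosen power (here 6). The helper names and expression values are hypothetical, and module detection itself is not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def soft_threshold_adjacency(expr, power=6):
    """Unsigned adjacency: |cor(gene_i, gene_j)| ** power for each gene pair."""
    genes = list(expr)
    adjacency = {}
    for gene_i in genes:
        for gene_j in genes:
            if gene_i != gene_j:
                r = pearson(expr[gene_i], expr[gene_j])
                adjacency[(gene_i, gene_j)] = abs(r) ** power
    return adjacency


# Hypothetical log2 expression values across four samples.
expr = {
    "CXCR5": [1.0, 2.0, 3.0, 4.0],
    "CXCL13": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with CXCR5
    "GENE_X": [1.0, 3.0, 2.0, 4.0],  # correlation of 0.8 with CXCR5
}
adj = soft_threshold_adjacency(expr, power=6)
print(adj[("CXCR5", "CXCL13")], adj[("CXCR5", "GENE_X")])
```

Raising correlations to a power keeps strong correlations near 1 while pushing moderate ones toward 0 (0.8 becomes roughly 0.26 at a power of 6). This is why the choice of power shapes which genes group into a module.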
Functional Enrichment Analysis of Genes within Each Module using IPA
Following WGCNA, one or more network modules containing genes co-expressed with CXCR5 and CXCL13 will be selected for functional enrichment. Ingenuity Pathway Analysis (IPA) will be used to elucidate the biological roles of the genes within these modules. Modules often contain co-regulated genes with similar biological functions. These modules of co-expressed genes regulate epigenetic features downstream of particular transcription factors. Genes with high connectivity will be pooled together, and IPA will be used to identify the canonical pathways of the selected module hubs.
Upstream Regulator Analysis using IPA
The Upstream Regulator analysis feature in IPA will be used to identify the biological function of gene co-expression module(s) significantly associated with BrCa stage. Co-expressed genes in such a module will most likely be regulated by the same or similar upstream regulators, including transcription factors. We will identify the upstream transcriptional regulators in each module with a p-value of overlap < 0.01, which will give insight into the biological drivers of each module.
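As an illustration of the p-value of overlap, the sketch below assumes it is computed as a right-tailed Fisher's exact test, which is equivalent to the upper tail of a hypergeometric distribution. The function name, argument names, and gene counts are hypothetical, and IPA's curated knowledge base is not reproduced here.

```python
from math import comb


def overlap_p_value(n_background, n_targets, n_module, n_overlap):
    """Probability of observing at least n_overlap regulator targets in a module.

    n_background: number of genes in the reference set
    n_targets:    number of known targets of the upstream regulator
    n_module:     number of genes in the co-expression module
    n_overlap:    number of module genes that are targets of the regulator
    """
    total = comb(n_background, n_module)
    upper = min(n_targets, n_module)
    tail = sum(
        comb(n_targets, k) * comb(n_background - n_targets, n_module - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: 20 background genes, 5 regulator targets,
# a 5-gene module, 4 of which are targets.
p = overlap_p_value(n_background=20, n_targets=5, n_module=5, n_overlap=4)
print(p, p < 0.01)
```

In this toy example the overlap gives p = 76/15504, or about 0.0049, which would pass the < 0.01 cutoff stated above.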
Kaplan-Meier (KM) survival analysis will be performed using GraphPad Prism software. Patient groups with alterations in upstream regulators will be compared to groups with no alterations. We will use the endpoint survival time as the time variable. Furthermore, high and low survival patient groups will be determined using z-score values.
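For reference, the Kaplan-Meier estimate multiplies, at each observed death time, the fraction of at-risk patients who survive that time. The sketch below is a minimal illustration with hypothetical survival times and event indicators. It is not a substitute for the GraphPad Prism analysis and does not include a group comparison test.

```python
def kaplan_meier(times, events):
    """Return the Kaplan-Meier curve as a list of (time, survival probability).

    times:  survival or follow-up time for each patient
    events: 1 if death was observed at that time, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        current_time = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share this time point.
        while i < len(data) and data[i][0] == current_time:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((current_time, survival))
        at_risk -= removed  # deaths and censored patients leave the risk set
    return curve


# Hypothetical follow-up times (e.g., in months) for five patients.
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
print(curve)
```

Running the same function separately on the altered and unaltered groups would yield the two curves to be compared.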