Fanyan Luo1*, Lizhi Lv1, Weijie Ye2 and Rong Liu2
Received: October 01, 2019; Published: October 11, 2019
Corresponding author: Fanyan Luo, Department of Cardiothoracic Surgery, Xiangya Hospital, Central South University, Changsha 410008, PR China
DOI: 10.32474/ACR.2019.02.000139
Background: Accumulating evidence suggests that dysregulated expression of long non-coding RNAs (lncRNAs) may participate in the development of cardiovascular diseases. In this study, we aimed to identify circulating lncRNAs associated with acute myocardial infarction (AMI).
Materials and methods: By repurposing microarray probes from two public datasets (GSE48060 and GSE66360) in the Gene Expression Omnibus database, we conducted an array-based transcriptional analysis of lncRNAs in AMI patients and controls. Data were analyzed with R and Bioconductor.
Results: Six lncRNAs (MIR22HG, RP11-296O14.3, IDI2-AS1, RP11-539L10.2, MIR3945HG, RP11-96D1.11) were differentially expressed in AMI (Bonferroni p value <0.01), and a distinguish score was constructed from the expression data of two lncRNAs (RP11-539L10.2 and MIR22HG). This distinguish score showed predictive power in distinguishing AMI from controls in the training (AUC=0.92) and validating (AUC=0.70) datasets. Functional enrichment analyses revealed potential functional roles of MIR3945HG in immune response.
Conclusion: Taken together, our newly identified circulating lncRNAs may have a potential role in the development of AMI.
Keywords: long non-coding RNA; myocardial infarction; data mining; biomarker; prediction model
List of Abbreviations: LncRNAs: Long Non-coding RNAs; AMI: Acute Myocardial Infarction; CVD: Cardiovascular Disease; MI: Myocardial Infarction; GEO: Gene Expression Omnibus; AUC: Area Under the Curve; PCGs: Protein-Coding Genes; ROC: Receiver Operating Characteristic Curve; GO: Gene Ontology
Cardiovascular disease (CVD) is the leading cause of death worldwide; a 2013 global study indicated that CVD was responsible for 17.3 million deaths globally [1]. It accounts for 31.5% of all deaths and 45% of all non-communicable disease deaths, twice the number caused by cancer [2]. In most regions of the world, the age-standardized rate of myocardial infarction (MI) has decreased over the past two decades, yet the global morbidity of MI has increased by 29 million disability-adjusted life years [3]. Despite significant advances in pharmacotherapy, revascularization strategies and organ transplantation, MI remains the main cause of death in adults above 35 years old in the United States [4]. Assessment of cardiovascular risk factors such as hypertension, diabetes, and smoking plays a vital role in helping doctors predict and prevent disease [5-7]. Further, advances in genomics and proteomics have promoted the development of novel molecular biomarkers with potential clinical value for AMI [1-8]. In recent studies, long non-coding RNAs (lncRNAs) have attracted great interest in the field of cardiovascular disease [9]. LncRNAs, which range from 200 nucleotides (nt) to multiple kilobases (kb), are mRNA-like transcripts that lack protein-coding capacity [10]. LncRNAs play a vital role in biological processes such as epigenetic and post-transcriptional regulation. Dysregulated lncRNAs participate in regulating cardiac development and in the pathogenesis of heart failure [11,12].
For example, the lncRNA ANRIL (also known as CDKN2BAS) is associated with the risk of coronary atherosclerosis [13], peripheral artery disease [14], carotid arteriosclerosis [15], and other vascular diseases. It has been reported that expression levels of lncRNAs are altered in cardiac tissue [16] and blood [17] after AMI. Zhang et al. measured the circulating levels of 15 cardiovascular disease-related lncRNAs and found that circulating lncRNA ZFAS1 and CDR1 are predictive of AMI [18]. By mining previously published gene expression microarray data from databases such as Gene Expression Omnibus (GEO) and ArrayExpress, lncRNA profiles can be obtained, because thousands of lncRNA-specific probes are represented on commonly used microarray platforms such as the Affymetrix Human U133 Plus 2.0 array [19]. In this study, we applied this method to profile lncRNA expression in two cohorts from the GEO database. We investigated the expression of lncRNAs in AMI patients and control subjects. A two-lncRNA signature identified from the GSE66360 training series showed predictive power in distinguishing AMI from controls, and we validated it in the GSE48060 validation series. We also integrated protein-coding mRNA expression data to predict the potential roles of the identified lncRNAs.
Two gene expression profiling datasets were downloaded from the GEO database (http://www.ncbi.nlm.nih.gov/geo/) with accession numbers GSE66360 and GSE48060 [1]. In the training dataset GSE66360, circulating endothelial cells were isolated from patients who had experienced acute myocardial infarction (AMI, n=49) and from healthy controls (n=50), and gene expression patterns were determined with Affymetrix Human U133 Plus 2.0 arrays (Affymetrix). In the validating dataset GSE48060, the study samples consisted of whole blood from 31 first-time AMI patients and 21 controls with a normal echocardiogram. Nucleated cells were fractionated from heparinized blood, and gene expression patterns were determined with the Affymetrix GeneChip Human Genome U133 Plus 2.0 array, which includes 54,675 probe sets.
The probe sets of the Affymetrix Human U133 Plus 2.0 array that did not map to pseudogene transcripts or protein-coding transcripts but were uniquely and perfectly assigned to lncRNA sequences were obtained from Du's study [20] (http://cistrome.org/lncRNA/lncRNA_data_repository.html, file Array.probe.alignment/U133p2.lncRNA.uniq). Every lncRNA was confirmed by at least four probes. To reduce the probability of inaccurate annotations, the lncRNAs obtained from our analysis and the lncRNAs defined in the GENCODE project (http://www.gencodegenes.org/, release 25) [21] were cross-referenced by Ensembl ID. In total, 2653 probes corresponding to 2183 lncRNAs remained. The CEL files were normalized with the MAS5 algorithm using the "affy" R Bioconductor package (http://www.bioconductor.org/packages/release/bioc/html/affy.html). The probe-level expression profiles were then converted into lncRNA-level expression values with the collapseRows function [22]; specifically, when multiple probes were assigned to one lncRNA, the expression level of that lncRNA was calculated as the mean expression level of those probes. Finally, the lncRNA expression levels were standardized to a mean of 0 and an SD of 1.
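The probe-collapsing and standardization steps described above can be sketched as follows. This is an illustrative Python sketch, not the authors' R code: the function names `collapse_probes` and `zscore`, the toy probe and lncRNA identifiers, and the choice of the sample standard deviation computed across samples for each lncRNA are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean, stdev


def collapse_probes(probe_expr, probe_to_lncrna):
    """Average probe-level profiles that map to the same lncRNA.

    probe_expr: dict of probe id -> list of expression values (one per sample).
    probe_to_lncrna: dict of probe id -> lncRNA id (hypothetical annotation).
    """
    grouped = defaultdict(list)
    for probe, values in probe_expr.items():
        grouped[probe_to_lncrna[probe]].append(values)
    # zip(*rows) walks the samples; each lncRNA gets the per-sample probe mean.
    return {
        lncrna: [mean(column) for column in zip(*rows)]
        for lncrna, rows in grouped.items()
    }


def zscore(values):
    """Standardize one lncRNA profile to mean 0 and SD 1.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    m = mean(values)
    sd = stdev(values)
    return [(v - m) / sd for v in values]


# Toy data: two probes for one lncRNA, one probe for another.
probe_expr = {
    "probe_1": [1.0, 2.0, 3.0],
    "probe_2": [3.0, 4.0, 5.0],
    "probe_3": [10.0, 20.0, 30.0],
}
probe_to_lncrna = {"probe_1": "lncA", "probe_2": "lncA", "probe_3": "lncB"}

collapsed = collapse_probes(probe_expr, probe_to_lncrna)
standardized = {name: zscore(profile) for name, profile in collapsed.items()}
```

A profile with zero variance would make `zscore` divide by zero, so such lncRNAs would need to be filtered out beforehand.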
To identify AMI-associated lncRNAs, we used the t test to compare the continuous expression level of each lncRNA between AMI patients and healthy controls. LncRNAs with Bonferroni p values of less than 0.01 were considered statistically significant and associated with AMI. Multivariate logistic regression was performed for the selected lncRNAs, and those with a multivariate model p value of less than 0.05 were retained for the distinguish score calculation. The distinguish score was calculated to evaluate each patient's probability of AMI according to the following formula:
$$\text{Distinguish score} = \sum_{i=1}^{n} \mathrm{Exp}_i \times \mathrm{Coe}_i$$
where n is the number of lncRNAs in the model; Expi stands for the expression level of lncRNAi; and Coei represents the estimated regression coefficient of lncRNAi in the multivariable logistic regression model. Patients with higher distinguish scores are expected to have a higher probability of AMI. The area under the receiver operating characteristic curve (AUC) was used to assess the classification performance of the distinguish scores, that is, their capability to distinguish AMI from normal controls. The AUC value was calculated with the ROCR R package (https://cran.r-project.org/web/packages/ROCR/index.html). All statistical analyses in this study were performed using the R statistical software version 3.3.3 [23] and Bioconductor with related packages.
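To make the AUC criterion concrete, the sketch below computes it directly from its rank interpretation: the probability that a randomly chosen AMI sample receives a higher score than a randomly chosen control, with ties counted as one half. It is a plain Python illustration with a hypothetical function name and toy values, and it does not reproduce the ROCR calls used in the study.

```python
def auc(scores, labels):
    """Area under the ROC curve from pairwise comparisons.

    scores: distinguish scores, one per sample.
    labels: 1 for AMI, 0 for control (an encoding chosen for this sketch).
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("both classes are needed to compute an AUC")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5  # ties contribute half a concordant pair
    return wins / (len(cases) * len(controls))


# Toy example: three of the four case/control pairs are ranked correctly.
toy_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

The double loop is quadratic in the number of samples, which is acceptable for cohorts of this size (99 and 52 samples); a rank-based formula would scale better for larger data.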
A previous study reported that the biological functions of lncRNAs are correlated with those of their co-expressed protein-coding genes (PCGs) [24]. Thus, we tested the correlation between the expression levels of each lncRNA-PCG pair. A PCG was defined as lncRNA-correlated when the correlation coefficient was higher than 0.4 in both datasets. GO biological process (GOTERM_BP_ALL) enrichment analyses of the PCGs co-expressed with AMI-associated lncRNAs were performed to predict the function of these lncRNAs using the DAVID annotation tool (http://david.abcc.ncifcrf.gov/) with the functional annotation clustering option [25]. Enriched Gene Ontology (GO) terms with a Bonferroni p value of <0.05 were considered potential functions of AMI-associated lncRNAs. The significantly enriched GO terms were visualized with the Enrichment Map plugin in Cytoscape [26]. The overall workflow of this study is shown in Figure 1.
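The co-expression filter described above can be sketched as follows: a Pearson correlation is computed between one lncRNA and each PCG, and a PCG is kept only if the coefficient exceeds 0.4 in every dataset. This is an illustrative Python sketch; the function names, the gene labels G1 to G3, and the toy expression values are hypothetical and do not come from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)  # undefined for constant profiles


def coexpressed_pcgs(lncrna_profiles, pcg_profiles, threshold=0.4):
    """Return PCGs whose correlation with the lncRNA exceeds the threshold
    in every dataset.

    lncrna_profiles: one lncRNA expression vector per dataset.
    pcg_profiles: one dict (gene -> expression vector) per dataset.
    """
    shared = set.intersection(*(set(d) for d in pcg_profiles))
    return sorted(
        gene
        for gene in shared
        if all(
            pearson(lnc, pcgs[gene]) > threshold
            for lnc, pcgs in zip(lncrna_profiles, pcg_profiles)
        )
    )


# Toy training and validating datasets with different sample counts.
lnc_train, lnc_valid = [1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0]
pcg_train = {"G1": [2.0, 4.0, 6.0, 8.0], "G2": [4.0, 3.0, 2.0, 1.0], "G3": [1.0, 3.0, 2.0, 4.0]}
pcg_valid = {"G1": [1.0, 2.0, 3.0], "G2": [3.0, 2.0, 1.0], "G3": [3.0, 1.0, 2.0]}

kept = coexpressed_pcgs([lnc_train, lnc_valid], [pcg_train, pcg_valid])
```

In the toy data, G3 passes the threshold in the training set but not in the validating set, so requiring agreement across both datasets removes it.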
Two datasets were downloaded from GEO with the following accession numbers: GSE66360 and GSE48060. A total of 151 samples were analyzed, comprising 99 individuals from GSE66360 (49 AMI cases, 50 controls) and 52 individuals from GSE48060 (31 AMI cases, 21 controls) (Table 1).
Table 1: Patient characteristics of the datasets utilized in our study.
AMI: acute myocardial infarction
GSE66360 (n=99) was selected as the training dataset to determine the association between lncRNAs and AMI. Six lncRNAs (MIR22HG, RP11-296O14.3, IDI2-AS1, RP11-539L10.2, MIR3945HG, RP11-96D1.11) were found to be significantly associated with AMI (Bonferroni p value <0.01, Table 2) by differential expression analysis. Among these six lncRNAs, the expression of RP11-296O14.3, RP11-539L10.2 and RP11-96D1.11 was significantly lower in AMI patients than in controls, whereas the expression of the remaining three lncRNAs (MIR22HG, MIR3945HG, and IDI2-AS1) was significantly higher in AMI patients than in controls. LncRNAs with a p value of less than 0.05 are listed in Table 1.
Table 2: Logistic regression model for myocardial infarction in patients with complete clinical and genomic data in the training dataset (n=99).
*The direction values "up" and "down" indicate that expression of the gene is significantly higher or lower, respectively, in AMI patients than in controls.
By subjecting the lncRNA expression data to a multivariate logistic regression model, we found that two lncRNAs were significantly differentially expressed in MI (multivariate p<0.05, Table 2). We then generated a distinguish score with these two lncRNAs (RP11-539L10.2 and MIR22HG) to distinguish MI patients from controls. The distinguish score formula was developed from the expression levels of the two lncRNAs as follows: distinguish score = (-1.8461 × expression level of RP11-539L10.2) + (1.304 × expression level of MIR22HG) (Table 3). The distribution of the lncRNA distinguish score and the expression signature are shown in Figure 2. Patients with high distinguish scores tended to express high levels of MIR22HG and low levels of RP11-539L10.2 in their circulating cells. In addition, receiver operating characteristic (ROC) curve analysis was conducted to assess the predictive accuracy of the two-lncRNA signature (Figure 3). The signature showed predictive power in distinguishing AMI patients from controls in both the training dataset (AUC=0.92) and the validating dataset (AUC=0.70). Further, we calculated the AUC for each lncRNA individually and found that both lncRNAs showed good distinguishing performance (AUC>0.5; Table 3).
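As a worked illustration of the formula above, the following Python sketch applies the two reported coefficients to standardized expression values. The coefficients are taken from the text; the function name and the example expression values are hypothetical.

```python
# Regression coefficients reported in the text for the two-lncRNA signature.
COEFFICIENTS = {"RP11-539L10.2": -1.8461, "MIR22HG": 1.304}


def distinguish_score(expression, coefficients=COEFFICIENTS):
    """Weighted sum of standardized lncRNA expression levels.

    expression: dict of lncRNA name -> standardized expression level
    for a single sample (illustrative input format).
    """
    return sum(coef * expression[name] for name, coef in coefficients.items())


# Hypothetical sample: RP11-539L10.2 one SD below the mean and MIR22HG one
# SD above, the pattern the text associates with high scores.
ami_like = distinguish_score({"RP11-539L10.2": -1.0, "MIR22HG": 1.0})

# Hypothetical sample with the opposite pattern.
control_like = distinguish_score({"RP11-539L10.2": 1.0, "MIR22HG": -1.0})
```

Because the coefficient of RP11-539L10.2 is negative and that of MIR22HG is positive, low RP11-539L10.2 and high MIR22HG both push the score upward, which matches the expression pattern reported for high-scoring patients.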
Figure 2: LncRNA risk score analysis.
The distribution of the two-lncRNA risk score and the lncRNA expression signature were analyzed in the training and validating datasets. Heatmaps of the lncRNA expression profiles in the training (A) and validating (C) datasets; rows and columns represent lncRNAs and patients, respectively. Distribution of the lncRNA signature risk score in the training (B) and validating (D) datasets.
Figure 3: ROC curves assess the accuracy of our defined signature.
ROC curves assessing the accuracy of the lncRNA signature in the training (red line) and validating (green line) datasets. The true positive rate represents sensitivity, whereas the false positive rate is one minus the specificity.
To infer the potential biological function of the lncRNAs, the co-expression relationships between the six lncRNAs and protein-coding genes (PCGs) were tested in both datasets. Based on the criterion of a Pearson correlation coefficient higher than 0.4, MIR3945HG was shown to be co-expressed with 303 PCGs (Table 2). GO function enrichment analysis of these PCGs was then performed, suggesting that they were significantly enriched in 7 GO terms (Figure 4A), which clustered in inflammatory response, innate immune response, regulation of cytokine secretion, leukocyte migration, immune response, MyD88-dependent toll-like receptor signaling pathway, and defense response to bacterium and chemotaxis (Figure 4, Table 3). Overall, the functional annotation results indicated that MIR3945HG might participate in the development of AMI by interacting with immune response-related PCGs.
Figure 4: GO enrichment analysis for the function of MIR3945HG.
(A) The original significance values output by DAVID for GO biological processes were transformed into '-log (P-value)' for plotting. (B) Functional map of the enriched GO terms; each node indicates an enriched GO term and each edge represents the genes shared between the connected GO terms.
The study of gene function focused mainly on protein-coding genes and miRNAs until the discovery of thousands of functional regulatory lncRNAs [27]. LncRNAs, which show greater tissue- and disease-specific expression than protein-coding genes, are dysregulated in many disease types and are believed to play important roles in regulating several biological processes. It has been reported that dysregulation of lncRNA expression is associated with the risk of some cardiovascular diseases [28]. The functions of lncRNAs in AMI are not yet fully understood [29,30]. In our study, by mining two previously published gene expression microarray datasets from GEO, we obtained the expression pattern of lncRNAs in AMI patients and control subjects. Six lncRNAs were identified as associated with AMI, and a two-lncRNA signature (MIR22HG and RP11-539L10.2) showed power in distinguishing AMI from controls in both the training and the validation series. Although the number of lncRNAs discovered and recorded in biological databases such as Ensembl [31] and GENCODE [30] is increasing, only a few lncRNAs have been fully functionally characterized. In our study, we found that six lncRNAs (MIR22HG, RP11-296O14.3, IDI2-AS1, RP11-539L10.2, MIR3945HG, RP11-96D1.11) were associated with AMI. Voellenkle C et al. reported differential expression of MIR22HG in human umbilical vein endothelial cells between normoxia and hypoxia, and they validated this finding in a mouse model of hindlimb ischemia, suggesting an important role of MIR22HG in vascular physiopathology [30].
MIR3945HG has also been observed to be aberrantly expressed in lung squamous cell carcinoma, and its high expression is associated with longer survival of lung squamous cell carcinoma patients [31]. To gain deeper knowledge of the above-mentioned lncRNAs in AMI, the underlying regulatory mechanisms should be studied further (Table 4).
First, owing to the restricted availability of data, only a fraction (about 3800 of more than 20 thousand) of human lncRNAs were included in our analysis. Second, we tested the associations between lncRNA expression and AMI and identified candidate lncRNAs that might participate in the development of AMI, but the mechanisms are not clear. The functions of the lncRNAs identified in this study need to be explored in further experimental studies. Third, the lncRNA expression data were derived from circulating endothelial cells in the training dataset and from circulating nucleated blood cells in the validating dataset. However, this also shows that the two-lncRNA signature is robust in distinguishing AMI from controls in circulating cells.
Our study presents a two-lncRNA signature significantly related to AMI. This signature might contribute to the diagnosis of AMI. Changes in lncRNA expression levels in circulating cells may reflect the underlying biological mechanisms of AMI. Our findings suggest that expression profiling of the lncRNA complement of the cardiac transcriptome in the systemic circulation might provide a new approach for the early diagnosis and treatment of AMI.