
JOURNAL OF NEUROSCIENCE AND NEUROSURGERY (ISSN:2517-7400)

Gene Network Visualization and Functional Enrichment Analysis in Genes Associated with Amyotrophic Lateral Sclerosis

Konstantina Skolariki1*, Themistoklis Exarchos1, Panagiotis Vlamos1

1Department of Informatics, Ionian University, Corfu, Greece

Citation

Skolariki K, Exarchos T, Vlamos P. Gene Network Visualization and Functional Enrichment Analysis in Genes Associated with Amyotrophic Lateral Sclerosis. J Neurosci Neurosurg. 2020 Jun;3(2):148.

Abstract

Amyotrophic Lateral Sclerosis (ALS) is a fatal, progressive neurodegenerative disease. Several genes are associated with ALS, including SOD1, TARDBP, OPTN, VCP, UBQLN2 and C9orf72. In this study, a variety of tools were used to 1) identify genes linked with ALS (Ensembl), 2) create a gene network (GeneMANIA), 3) perform functional enrichment analysis (Cytoscape), 4) create a heat map and assess co-expression levels between genes (STRING) and 5) detect overlapping genes between several databases (Ensembl, DisGeNet and UniProt). Numerous key genes and pathways that could play a role in the pathogenesis of ALS were identified. However, further investigation is needed to establish the exact mechanism of action of these genes and pathways.

Keywords

Amyotrophic Lateral Sclerosis (ALS); Neurodegenerative diseases; Gene networks; Gene network visualization; Cytoscape

Introduction

Amyotrophic Lateral Sclerosis (ALS) is a fatal neurodegenerative disease, more specifically a Motor Neuron Disease (MND), characterized by muscle weakness, atrophy and loss of upper and lower motor neurons [1]. Although it belongs to the neurodegenerative category, ALS poses distinct difficulties for patients, caregivers and healthcare professionals, since it differs significantly from other neurodegenerative diseases such as Alzheimer’s disease or Parkinson’s disease. These differences mainly concern the age of affected patients, which is usually lower in ALS than in other neurodegenerative diseases, as well as the prognosis and progression of the disease. Since there is no cure, healthcare professionals primarily aim to manage the symptoms of ALS patients.

ALS is a rapidly progressive disease. At onset, the majority of patients (around 40-60%) experience muscle weakness in the upper extremities and approximately 20% in the legs [2]. As the disease advances, symptoms of muscle atrophy become more pronounced [2]. The degree of atrophy can also be used to monitor functional impairment, which can predict survival time and help healthcare professionals provide the best possible therapeutic plan for the patient [3].

ALS is a complex disease that is believed to involve multiple mechanisms contributing to neurodegeneration; however, these mechanisms remain unclear. One proposed mechanism of ALS pathogenesis is as follows: impaired glutamate uptake by astrocytes leads to increased glutamate excitotoxicity. Increased glutamate in the synaptic cleft results in an increased influx of Ca2+ ions into the neurons. These excess Ca2+ ions would normally be taken up by mitochondria; however, due to mitochondrial dysfunction, they remain in the cytoplasm. Subsequently, the activation of Ca2+-dependent enzymatic pathways, which contribute to oxidative stress, results in neurodegeneration [4].

The majority of ALS cases (90-95%) are sporadic, meaning that they are not hereditary. The remaining 5-10% of cases are familial and result from gene mutations. Established risk factors for ALS include: 1. Heredity, 2. Age, 3. Sex, 4. Genetics and 5. Several environmental factors (e.g. smoking). ALS-linked mutations are most commonly found in SOD1, TARDBP, OPTN, VCP, UBQLN2 and C9orf72; they lead to the formation of intracellular aggregates, increase oxidative stress and contribute to the impairment of axonal transport [4]. However, several other genes are also associated with the disease. In the present study, genes associated with ALS were analyzed in terms of functional enrichment and co-expression. Gene co-expression networks are most commonly used to correlate genes with biological processes, molecular functions and cellular components.
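As a rough illustration of what a co-expression network is, the sketch below links two genes when the Pearson correlation of their expression profiles across samples reaches a cut-off. It is a generic hard-threshold construction, not the scoring used by STRING or GeneMANIA; the placeholder gene names, the expression values and the 0.8 threshold are all hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / math.sqrt(var_x * var_y)


def coexpression_edges(profiles, threshold=0.8):
    """Return (gene_a, gene_b, r) for every gene pair with |r| >= threshold.

    `profiles` maps a gene name to its expression values over the same samples.
    The threshold is illustrative, not a value taken from the study.
    """
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if abs(r) >= threshold:
            edges.append((a, b, r))
    return edges


# Hypothetical expression values for three placeholder genes over four samples.
profiles = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.1, 6.0, 8.2],
    "GENE_C": [4.0, 1.0, 3.0, 2.0],
}
for gene_a, gene_b, r in coexpression_edges(profiles):
    print(f"{gene_a} -- {gene_b}: r = {r:.3f}")
```

With these toy values only GENE_A and GENE_B are linked; GENE_C correlates weakly (|r| around 0.4) with both and therefore gets no edge.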

Methods

Gene Identification

In order to identify as many as possible of the genes associated with ALS, the Ensembl database was used. Ensembl is a genome database and browser, developed as a joint scientific project between the European Bioinformatics Institute and the Wellcome Trust Sanger Institute (https://www.ensembl.org/index.html), that retrieves genomic information and offers scientists a centralized resource. The search used the term “Amyotrophic Lateral Sclerosis”, with Homo sapiens as the organism of choice. The results summary listed 1087 loci/genes associated with ALS. After duplicates were removed, 245 unique genes remained. In the Ensembl results table, some genes appeared to be linked with dementia rather than ALS; removing these genes resulted in a final list of 220 ALS-linked genes.
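The duplicate removal and dementia filtering described above can be sketched as two small steps. The sketch assumes the Ensembl search results have been exported as (gene, phenotype) rows; the `LocusHit` record, the text-matching rule used to discard dementia-only genes and the example rows are all hypothetical, since the exact filtering criteria are not stated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocusHit:
    """One row of a hypothetical export of the phenotype search results."""
    gene: str       # gene symbol
    phenotype: str  # phenotype label attached to this row


ALS_TERM = "amyotrophic lateral sclerosis"


def group_by_gene(hits):
    """Collapse duplicate rows into one entry per gene holding all its phenotype labels."""
    by_gene = {}
    for hit in hits:
        by_gene.setdefault(hit.gene, set()).add(hit.phenotype.lower())
    return by_gene


def keep_als_genes(by_gene):
    """Hypothetical filter: drop genes none of whose labels mention ALS (e.g. dementia-only)."""
    return sorted(
        gene for gene, labels in by_gene.items()
        if any(ALS_TERM in label for label in labels)
    )


# Toy rows, not the study's data: SOD1 is duplicated, MAPT only carries a dementia label.
hits = [
    LocusHit("SOD1", "Amyotrophic lateral sclerosis 1"),
    LocusHit("SOD1", "Amyotrophic lateral sclerosis 1"),
    LocusHit("TARDBP", "Amyotrophic lateral sclerosis 10"),
    LocusHit("MAPT", "Frontotemporal dementia"),
]
unique_genes = group_by_gene(hits)
als_genes = keep_als_genes(unique_genes)
print(len(hits), "rows ->", len(unique_genes), "unique genes ->", len(als_genes), "ALS genes")
```

In the study, the corresponding steps took the list from 1087 loci/genes to 245 unique genes and then to 220.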

Visualisation of gene interaction network

The list of 220 ALS-related genes obtained from the Ensembl database was uploaded to the GeneMANIA web tool (http://genemania.org). Given a query, whether a single gene, a list of genes or a network search, GeneMANIA returns the genes and attributes most closely connected to the query across its networks. It recognizes co-expression, co-localization and pathway interactions, as well as genetic and physical interactions, in the gene network [5]. It indexes 2,277 association networks containing 597,392,998 interactions among 163,599 genes from 9 organisms (data obtained from the GeneMANIA website). A gene network visualizes the interactions among a set of genes: each gene is a node, and the edges connecting nodes represent functional associations between the genes. The edges between the nodes represent: 1. Physical interactions in red, 2. Co-expression in pink, 3. Predicted gene associations in orange, 4. Co-localization in purple, 5. Genetic interactions in green, 6. Pathway commonalities in blue and 7. Shared protein domains in yellow. The visualization parameters of the edges in Figure 2 are given in Table 1.
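A minimal sketch of the network structure just described, with genes as nodes and typed, coloured edges. The `GeneNetwork` class, the edge-type keys and the placeholder gene names are hypothetical; GeneMANIA also assigns weights to its networks, which is omitted here.

```python
from collections import defaultdict

# Edge colour for each interaction type, as listed in the text.
EDGE_COLOURS = {
    "physical_interactions": "red",
    "co_expression": "pink",
    "predicted": "orange",
    "co_localization": "purple",
    "genetic_interactions": "green",
    "pathway": "blue",
    "shared_protein_domains": "yellow",
}


class GeneNetwork:
    """Undirected network: genes are nodes, each edge carries one interaction type."""

    def __init__(self):
        self._adj = defaultdict(set)  # gene -> {(neighbour, edge_type), ...}

    def add_edge(self, gene_a, gene_b, edge_type):
        if edge_type not in EDGE_COLOURS:
            raise ValueError(f"unknown edge type: {edge_type}")
        self._adj[gene_a].add((gene_b, edge_type))
        self._adj[gene_b].add((gene_a, edge_type))

    def neighbours(self, gene):
        """Genes linked to `gene` by at least one interaction type."""
        return sorted({other for other, _ in self._adj.get(gene, set())})

    def edge_counts(self):
        """Number of distinct undirected edges per interaction type."""
        counts = defaultdict(int)
        for gene, links in self._adj.items():
            for other, edge_type in links:
                if gene < other:  # each edge is stored twice; count it once (self-loops skipped)
                    counts[edge_type] += 1
        return dict(counts)


# Placeholder genes and edges, for illustration only.
net = GeneNetwork()
net.add_edge("GENE_A", "GENE_B", "co_expression")
net.add_edge("GENE_A", "GENE_B", "physical_interactions")
net.add_edge("GENE_B", "GENE_C", "co_localization")
for edge_type, count in net.edge_counts().items():
    print(edge_type, EDGE_COLOURS[edge_type], count)
```

Two genes can be joined by several edges of different types, which is why each adjacency entry stores (neighbour, type) pairs rather than bare neighbours.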

Overlapping genes

The 220 Ensembl genes were uploaded to two other databases, 1. UniProt and 2. DisGeNet (https://www.disgenet.org/home/), a platform that contains one of the largest publicly available collections of genes and variants associated with human diseases, in order to identify the top 10 overlapping genes. Overlapping genes between Ensembl and UniProt were identified via the STRING database (https://string-db.org/). STRING collects and integrates information from several sources, such as publications and experimental data, and provides known and predicted protein-protein interactions for a variety of organisms [6]. After the 220 genes were uploaded to STRING, it matched several of them to UniProt keywords, in this particular case to ‘Amyotrophic lateral sclerosis’. To identify overlapping genes between Ensembl and DisGeNet, the DisGeNet database was searched for genes related to ‘Amyotrophic lateral sclerosis’. The top 10 overlapping genes were then uploaded to the STRING database and a co-expression analysis was performed.
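The overlap step itself reduces to set intersection. The text does not say how the top 10 genes were ranked, so the sketch below assumes a per-gene score is available (for example a DisGeNet gene-disease association score) and keeps the highest-scoring shared genes; the function names, placeholder genes and scores are hypothetical.

```python
def overlapping_genes(*gene_lists):
    """Genes present in every input list (set intersection)."""
    if not gene_lists:
        return set()
    return set(gene_lists[0]).intersection(*gene_lists[1:])


def top_n(genes, scores, n=10):
    """Rank genes by score (highest first, ties alphabetical) and keep the first n.

    Genes missing from `scores` are ranked as if they scored 0.
    """
    return sorted(genes, key=lambda g: (-scores.get(g, 0.0), g))[:n]


# Placeholder gene lists and scores, for illustration only.
ensembl = ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"]
uniprot_keyword = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
disgenet = ["GENE_A", "GENE_B", "GENE_C", "GENE_F"]
scores = {"GENE_A": 0.5, "GENE_B": 0.9, "GENE_C": 0.7, "GENE_D": 0.3}

ensembl_uniprot = overlapping_genes(ensembl, uniprot_keyword)
ensembl_disgenet = overlapping_genes(ensembl, disgenet)
shared = overlapping_genes(ensembl, uniprot_keyword, disgenet)
print(sorted(ensembl_uniprot), sorted(ensembl_disgenet))
print(top_n(shared, scores, n=2))
```

Whether the study's top 10 came from the pairwise overlaps or from a three-way intersection is not specified; the sketch computes both so either reading can be followed.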

Functional enrichment analysis

Cytoscape (https://cytoscape.org) is an open-source bioinformatics software platform for visualizing gene interaction networks [7]. Additional features are available as plug-ins (apps). The BiNGO (Biological Network Gene Ontology) plug-in was installed through the Cytoscape App Manager. BiNGO is a Java-based tool used to assess overrepresentation or underrepresentation of Gene Ontology (GO) categories. Using the GeneMANIA interaction dataset, BiNGO was configured for GO Biological Process, Molecular Function and Cellular Component analysis in Homo sapiens. The size of each node represents the number of genes annotated to that node. The color scale for the GO Biological Process, Cellular Component and Molecular Function network visualizations is shown in Figure 1. A p-value < 0.05 was set as the threshold for statistical significance. The color of each node represents its corrected p-value: white nodes are not significantly over-represented, and the scale ranges from yellow (p-value at the significance threshold) to dark orange (smaller p-values, i.e. higher significance).
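For intuition about what BiNGO computes, the sketch below implements a hypergeometric over-representation test for a single GO term, Benjamini-Hochberg correction across terms, and a white/yellow-to-dark-orange colour mapping. BiNGO offers both hypergeometric and binomial tests with more than one correction option, and the text does not state which were chosen, so the test, the correction, the five-orders-of-magnitude colour span and all numbers below are assumptions.

```python
import math


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N genes, K annotated to the term, n in the study set)."""
    upper = min(K, n)
    hits = sum(math.comb(K, i) * math.comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / math.comb(N, n)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def node_colour(adj_p, alpha=0.05, orders=5):
    """Hypothetical colour: white if not significant, else yellow (#FFFF00) shading
    to dark orange (#FF8C00) as adj_p falls up to `orders` decades below alpha."""
    if adj_p >= alpha:
        return "#FFFFFF"
    t = 1.0 if adj_p <= 0 else min(math.log10(alpha / adj_p) / orders, 1.0)
    green = round(255 - t * (255 - 140))
    return f"#FF{green:02X}00"


# Illustrative numbers: 20 background genes, 5 annotated to a GO term,
# 4 genes in the study set, 3 of which carry the annotation.
p_term = hypergeom_sf(3, N=20, K=5, n=4)
adjusted = benjamini_hochberg([p_term, 0.20, 0.001])
print([f"{p:.4f}" for p in adjusted], [node_colour(p) for p in adjusted])
```

Here the first and third terms stay below 0.05 after correction; the first (raw p ≈ 0.032, adjusted ≈ 0.048) lands just under the threshold and is drawn almost pure yellow, while the second becomes a white node.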


Figure 1: Colour scale for the GO biological process, molecular function and cellular component networks


Table 1: Visualization parameters for the gene network