Department of Informatics, Ionian University, Corfu, Greece
Corresponding author details:
Konstantina Skolariki
Department of Informatics
Ionian University
Corfu, Greece
Copyright: © 2020 Skolariki K, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution and reproduction in any medium, provided the original author and source are credited.
Amyotrophic Lateral Sclerosis (ALS) is a fatal, progressive neurodegenerative disease.
There are several genes associated with ALS, such as SOD1, TARDBP, OPTN, VCP, UBQLN2
and C9orf72. For this study, a variety of tools were used in order to 1) identify genes linked
with ALS (Ensembl), 2) create a gene network (GeneMANIA), 3) perform functional analysis
(Cytoscape), 4) create a heat map and ascertain co-expression levels between genes
(STRING) and 5) detect overlapping genes between several databases (Ensembl, DisGeNet
and UniProt). Numerous key genes and pathways were identified that could play a role in
the pathogenesis of ALS. However, additional examination is needed in order to establish
the exact mechanism of action of these genes and pathways.
Amyotrophic Lateral Sclerosis (ALS); Neurodegenerative diseases; Gene networks;
Gene network visualization; Cytoscape
Amyotrophic Lateral Sclerosis (ALS) is a fatal neurodegenerative disease, more
specifically a Motor Neuron Disease (MND), characterized by muscle weakness,
atrophy and loss of upper and lower motor neurons [1]. Even though it belongs to the
neurodegenerative category, it poses distinct difficulties to patients, caregivers and
healthcare professionals, as it differs significantly from other neurodegenerative
diseases such as Alzheimer’s disease or Parkinson’s disease. This is mainly due to
the age of the affected patients, which is usually lower in ALS than in other
neurodegenerative diseases, as well as the prognosis and progression of ALS. Since there is
no cure for the disease, healthcare professionals primarily aim to manage the symptoms of
ALS patients. ALS is a rapidly progressive disease. At the onset of the disease, the majority
of the patients (around 40-60%) experience muscle weakness in the upper extremities and
approximately 20% in the legs [2]. As the disease advances, more progressive symptoms
of muscle atrophy appear [2]. The degree of atrophy can also be used
to monitor functional impairment, which can predict survival time and help healthcare
professionals provide the best possible therapeutic plan for the patient [3]. ALS is a very
complex neurodegenerative disease that is believed to involve multifactorial mechanisms
of action that impact neurodegeneration. However, these mechanisms remain unclear.
A proposed mechanism of ALS pathogenesis is as follows: impaired glutamate uptake by
astrocytes results in an increase in glutamate excitotoxicity. Increased levels of the neurotransmitter
glutamate in the synaptic cleft result in an increased inflow of Ca2+ ions into the neurons.
The elevated Ca2+ levels would normally be removed by mitochondria. However, due
to mitochondrial dysfunction, the ions remain in the cytoplasm. Subsequently, the activation
of Ca2+-dependent enzymatic pathways, which contributes to oxidative stress, results in
neurodegeneration [4]. The majority of ALS cases, 90-95%, are sporadic, meaning that
they are not hereditary. The remaining cases, around 5-10%, are familial and the result of gene
mutations. Established risk factors for ALS include: 1. Heredity, 2. Age, 3. Sex, 4. Genetics
and 5. Several environmental factors (e.g. smoking). Gene mutations linked with ALS
are most commonly found in SOD1, TARDBP, OPTN, VCP, UBQLN2 and C9orf72; they form
intracellular aggregates, increase oxidative stress and contribute to the impairment of
axonal transport [4]. However, several other genes are associated with the disease. In the
present study genes associated with ALS were analyzed in terms of functional enrichment
and co-expression. Gene co-expression networks are most commonly used to correlate
genes with biological processes, molecular functions and cellular components.
Gene Identification
In order to identify as many as possible of the genes associated with ALS, the Ensembl database was used. Ensembl is a genome database and a joint scientific project between the European Bioinformatics Institute and the Wellcome Trust Sanger Institute (https://www.ensembl.org/index.html). It is a genome browser that retrieves genomic information, offering scientists a centralized resource. The search used the term “Amyotrophic Lateral Sclerosis” with Homo sapiens as the organism of choice. The results summary provided 1087 loci/genes associated with ALS. After removing duplicates, 245 genes remained. In the Ensembl results table, some genes appeared to be linked with dementia and not ALS. Removing these genes resulted in a final list of 220 genes linked with ALS.
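The two filtering steps described above (removing duplicates, then excluding genes linked only to dementia) can be illustrated with a minimal Python sketch. It is not the procedure used in the study, which worked from the Ensembl results table. The function name `clean_gene_list`, the input list and the placeholder symbol `GENE_X` are assumptions made purely for illustration.

```python
def clean_gene_list(raw_symbols, excluded=()):
    """Return unique gene symbols in first-seen order, minus excluded ones.

    Matching is case-insensitive, but the first-seen spelling is kept so that
    symbols such as C9orf72 retain their original capitalisation.
    """
    excluded_keys = {symbol.strip().upper() for symbol in excluded}
    seen = set()
    cleaned = []
    for symbol in raw_symbols:
        name = symbol.strip()
        key = name.upper()
        if not name or key in seen or key in excluded_keys:
            continue
        seen.add(key)
        cleaned.append(name)
    return cleaned


# Hypothetical search export: repeated hits and one placeholder gene
# (GENE_X) standing in for an entry linked to dementia rather than ALS.
raw_hits = ["SOD1", "TARDBP", "sod1", "OPTN", "GENE_X", " VCP ", "C9orf72", "TARDBP"]
dementia_only = {"GENE_X"}  # assumption: curated by hand from the results table

als_genes = clean_gene_list(raw_hits, excluded=dementia_only)
print(als_genes)  # ['SOD1', 'TARDBP', 'OPTN', 'VCP', 'C9orf72']
```

Keeping the first-seen order makes the cleaned list easy to compare against the original results table.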
Visualisation of gene interaction network
The list of 220 ALS-related genes obtained from the Ensembl database was uploaded to the GeneMANIA web tool (http://genemania.org). Given a query gene list, whether a single-gene query, a multiple-gene query or a network search, GeneMANIA returns the genes most closely connected to the query across its networks and attributes. It recognizes co-expression, co-localization, pathway interactions as well as genetic and physical interactions in the gene network [5]. It indexes 2,277 association networks containing 597,392,998 interactions mapped to 163,599 genes from 9 organisms (data obtained from the GeneMANIA website). A gene network visualizes the interactions between a set of genes: each gene is a node, and the edges connecting the nodes characterize the functional associations between the genes. The edges between the nodes represent: 1. Physical interactions in red, 2. Co-expression in pink, 3. Predicted gene associations in orange, 4. Co-localization in purple, 5. Genetic interactions in green, 6. Pathway commonalities in blue and 7. Shared protein domains in yellow. The visualization parameters of the edges in Figure 2 can be seen in Table 1.
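The node-and-edge structure described above can be sketched as a small typed graph in plain Python. This is a minimal illustration and not GeneMANIA's own data model or API. The colour mapping follows the list in the text, while the helper names (`build_network`, `degree`) and the example edges between placeholder genes are assumptions.

```python
from collections import defaultdict

# Edge colour per interaction type, as listed in the text.
EDGE_COLOURS = {
    "physical interaction": "red",
    "co-expression": "pink",
    "predicted": "orange",
    "co-localization": "purple",
    "genetic interaction": "green",
    "pathway": "blue",
    "shared protein domains": "yellow",
}


def build_network(edges):
    """Build an undirected adjacency map: gene -> [(neighbour, type, colour)]."""
    network = defaultdict(list)
    for gene_a, gene_b, kind in edges:
        colour = EDGE_COLOURS[kind]  # a KeyError flags an unknown interaction type
        network[gene_a].append((gene_b, kind, colour))
        network[gene_b].append((gene_a, kind, colour))
    return dict(network)


def degree(network):
    """Number of edges attached to each gene (node)."""
    return {gene: len(neighbours) for gene, neighbours in network.items()}


# Hypothetical edges between placeholder genes; these are not study results.
example_edges = [
    ("GENE_A", "GENE_B", "physical interaction"),
    ("GENE_A", "GENE_C", "co-expression"),
    ("GENE_B", "GENE_C", "pathway"),
    ("GENE_C", "GENE_D", "co-expression"),
]

network = build_network(example_edges)
degrees = degree(network)
print(degrees)  # {'GENE_A': 2, 'GENE_B': 2, 'GENE_C': 3, 'GENE_D': 1}
```

Storing the interaction type on each edge allows the same pair of genes to be linked by several differently coloured edges, as in the GeneMANIA display.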
Overlapping genes
The 220 Ensembl genes were compared against two other databases: 1. UniProt and 2. DisGeNet (https://www.disgenet.org/home/), a platform that contains one of the largest publicly available collections of genes and variants associated with human diseases, in order to identify the top 10 overlapping genes. The overlapping genes between Ensembl and UniProt were identified via the STRING database (https://string-db.org/). The STRING database collects and integrates information from several sources, such as reference publications and experimental data, and compiles known and predicted protein-protein interactions for a variety of organisms [6]. After the 220 genes were uploaded to STRING, several of them were matched to UniProt keywords, in this case ‘Amyotrophic lateral sclerosis’. To document the overlapping genes between Ensembl and DisGeNet, the DisGeNet database was searched for genes related to ‘Amyotrophic lateral sclerosis’. The top 10 overlapping genes were then uploaded to the STRING database and a co-expression analysis was performed.
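Identifying overlapping genes amounts to intersecting the gene lists. A minimal Python sketch is given here, assuming each database's result has already been exported as a list of gene symbols. The function name `overlap` and the three example lists of placeholder genes are hypothetical, and the sketch does not reproduce how STRING or DisGeNet perform the matching internally.

```python
def overlap(reference, *others):
    """Genes of `reference` that also occur in every other list.

    The comparison ignores case; the order and spelling of `reference`
    are preserved in the result.
    """
    other_sets = [{gene.upper() for gene in other} for other in others]
    return [gene for gene in reference
            if all(gene.upper() in genes for genes in other_sets)]


# Hypothetical exports with placeholder gene names; these are not study data.
ensembl_genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
uniprot_genes = ["GENE_B", "GENE_D", "GENE_E"]
disgenet_genes = ["GENE_A", "GENE_B", "GENE_F"]

print(overlap(ensembl_genes, uniprot_genes))                  # ['GENE_B', 'GENE_D']
print(overlap(ensembl_genes, disgenet_genes))                 # ['GENE_A', 'GENE_B']
print(overlap(ensembl_genes, uniprot_genes, disgenet_genes))  # ['GENE_B']
```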
Functional enrichment analysis
Cytoscape (https://cytoscape.org) is an open-source
bioinformatics software platform used for the visualization of
gene interaction networks [7]. Supplementary features (plug-ins)
are available through the Cytoscape application. The BiNGO
(Biological Network Gene Ontology) plug-in was installed using
the Cytoscape App Manager. BiNGO is a Java-based tool used to
assess overrepresentation or underrepresentation of Gene Ontology
(GO) categories. Using the GeneMANIA interactions dataset, the
BiNGO settings were arranged for GO Biological Process, Molecular
Function and Cellular Component visualization in Homo sapiens.
The size of each node represents the number of genes annotated to
that node. The color scale for the GO Biological Process, Molecular
Function and Cellular Component network visualization is shown in
Figure 1. A p-value < 0.05 was set as the threshold for statistical
significance. The color of a node represents its corrected p-value.
White nodes are not significantly over-represented. The color scale
ranges from yellow (p-value at the significance level) to dark orange
(p-value well below the significance level, i.e. higher significance).
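To make the notion of over-representation and corrected p-values concrete, the following Python sketch computes a one-sided hypergeometric p-value per GO category and applies a Benjamini-Hochberg correction. The text does not state which test or correction was selected in BiNGO, so both choices are assumptions. The function names, the GO term labels and all counts are hypothetical and serve only as illustration.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): chance of at least k annotated genes in a query of n genes
    drawn from a background of N genes, K of which carry the annotation."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical counts: (annotated genes in the query, annotated genes in background)
query_size, background_size = 20, 2000
go_counts = {"GO:term_1": (8, 40), "GO:term_2": (2, 300), "GO:term_3": (5, 100)}

raw = [hypergeom_pvalue(k, query_size, K, background_size)
       for k, K in go_counts.values()]
corrected = benjamini_hochberg(raw)

for term, p_adj in zip(go_counts, corrected):
    status = "over-represented" if p_adj < 0.05 else "not significant"
    print(f"{term}: corrected p = {p_adj:.3g} ({status})")
```

In a BiNGO-style display, terms that pass the 0.05 threshold would be coloured from yellow to dark orange according to the corrected p-value, and the remaining terms would stay white.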
Figure 1: Colour scale for GO biological process, molecular
function and cellular component network
Table 1: Visualization parameters for the gene network