Bioinformatics experiments. Exp 1. Analysis of mechanisms of Cd2+ impact on MAPK-signalling through DUSPs (or “Catch me (Cd2+) if you can”). Part 3. Ontological analysis. B. GeneCodis

in steemstem •  5 years ago  (edited)

DUSP2_cartoon_representation.png

(One of the "heroes" of the 1st experiment, DUSP2: its catalytic domain in cartoon representation, with the conserved (V)-HC-XX-X-XX-R-(S/T) motif highlighted in magenta. I created the image with PyMOL, an open-source tool for molecular visualization and exploration. You can use the image if you want.)

Welcome back to our journey with the MAPK signalling pathway, DUSPs (dual-specificity phosphatases) and Cd2+.

Finding the biological processes the genes (related to “cellular response to cadmium ion”) are involved in

In this part we are going to visit GeneCodis [1, 2, 3, 4].

Select the organism (Homo sapiens in our case).

Choose “GO Biological Processes” in “Select the annotations” option.

GeneCodis_start_page_5.png

Then paste the list of our genes (29 unique genes) into the input field (“Paste your lists of genes” option) and click “Submit”. The page will reload several times.

Note that we got 42 genes in the previous part, but only 29 of them are unique. You would get different results if you submitted all 42 genes to GeneCodis (it doesn't seem to exclude redundant entries).
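If you'd rather not remove the repeats by hand, here is a minimal Python sketch of the deduplication step. The symbols in `raw_genes` are a short illustrative list, not the actual 42 entries from Part 3. A, and `unique_genes` is just a hypothetical helper name.

```python
def unique_genes(symbols):
    """Return gene symbols with duplicates removed, keeping first-seen order."""
    seen = set()
    unique = []
    for symbol in symbols:
        key = symbol.strip()  # ignore stray spaces around a symbol
        if key and key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


# Illustrative input only: a few symbols with repeats, not the real list.
raw_genes = ["MAPK1", "MAPK3", "FOS", "MAPK1", "JUN", " FOS", "MAPK8", "MAPK9", "JUN"]
deduplicated = unique_genes(raw_genes)
print(len(raw_genes), "entries ->", len(deduplicated), "unique genes")
print("\n".join(deduplicated))  # one symbol per line, ready to paste
```

The output is one symbol per line, which is the layout I paste into the “Paste your lists of genes” field.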

Also note that GeneCodis assigns a unique URL to each job. You can see the results discussed in this post at http://genecodis.genyo.es/analysis/job-5920922725697

At the top of the page you’ll see “3 Genes (10.34%)” (highlighted in red). There were no annotations for these 3 genes, so GeneCodis uses the remaining 26 genes to produce the results. Under “Summary of the user provided list of genes” you can see the list of our genes, along with their names and descriptions.

Genecodis_11.png

Now open the “Singular Enrichment Analysis of GO Biological Process” section. In its tag cloud we can already see the “stress-activated MAPK cascade (BP)” and “cellular response to cadmium ion (BP)” processes (terms of the “Biological process” ontology). With “Singular Enrichment Analysis of GO Biological Process” we get 1 biological process per group of genes (as opposed to “Modular Enrichment Analysis (all annotations)”, where we would see multiple processes per gene group).

GeneCodis_22.png

Also you can see a table in the “Interactive table” section.

By clicking on the “NG” column we can sort results based on the “Number of annotated genes in the input list”.

GeneCodis_7.png

Let’s export the results. Above the “Interactive table” section you can see the “Get results in other formats:” option. Click the icon below it to open the page titled “Summary in other formats”, where you can find a graph of the results (at the bottom of the page). Click the “Get the results in TAB delimited text format” link to export the results in TSV format.

GeneCodis_44.png

Now open that file with Notepad, copy the text and paste it into Excel. You’ll see a “Support” column (which, to my knowledge, is equivalent to the “NG” column mentioned above). Sort the results in Excel by the “Support” column values (“Largest to smallest”) using the “Sort & Filter” option.

Delete the "Id", "Items", "List size", "Reference Support", "Reference size" and "Hyp_c" columns, and keep the first 15 results (groups with 9, 8, 7 and 6 genes), so that we get the following …

GeneCodis_results_3.png

(The image above shows the top 15 biological processes that the genes we got with the help of QuickGO are involved in. There are 275 entries in total. The results were obtained with GeneCodis and are presented in Excel. The image was created by me.)

where

the “Items_Details” column contains the biological processes;

the “Support” column contains the number of genes taking part in the specified process;

the “Hyp” column contains the p-values (probability values / significance);

the “Genes” column contains, obviously, the gene names.
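The Excel steps (sort by “Support”, delete six columns, keep the top rows) can also be scripted. Below is a rough Python sketch using only the standard library. The column names are the ones mentioned in this post, but their order in the real export is my assumption. Apart from the “stress-activated MAPK cascade (BP)” name, its Support of 6 and its genes, every value in the sample is a made-up placeholder.

```python
import csv
import io

# Columns removed in the post before the table is shown.
DROPPED = {"Id", "Items", "List size", "Reference Support", "Reference size", "Hyp_c"}


def top_processes(tsv_text, n=15):
    """Sort rows by Support (largest first), drop DROPPED columns, keep n rows."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    # sort() is stable, so rows with equal Support keep their original order.
    rows.sort(key=lambda row: int(row["Support"]), reverse=True)
    return [
        {name: value for name, value in row.items() if name not in DROPPED}
        for row in rows[:n]
    ]


# Placeholder export: IDs, sizes and p-values below are invented for illustration.
HEADER = ["Id", "Items", "Items_Details", "Support", "List size",
          "Reference Support", "Reference size", "Hyp", "Hyp_c", "Genes"]
ROWS = [
    ["1", "GO:XXXXXXX", "placeholder process A (BP)", "2", "26", "50", "20000",
     "1.0E-3", "2.0E-3", "GENE1,GENE2"],
    ["2", "GO:XXXXXXX", "stress-activated MAPK cascade (BP)", "6", "26", "60", "20000",
     "1.0E-9", "3.0E-9", "FOS,MAPK3,MAPK1,JUN,MAPK8,MAPK9"],
    ["3", "GO:XXXXXXX", "placeholder process B (BP)", "3", "26", "40", "20000",
     "1.0E-5", "3.0E-5", "GENE1,GENE2,GENE3"],
]
sample_tsv = "\n".join("\t".join(fields) for fields in [HEADER] + ROWS)

for row in top_processes(sample_tsv, n=2):
    print(row["Support"], row["Items_Details"], row["Genes"], sep="\t")
```

For a real export you would pass the text of the downloaded file to `top_processes` instead of `sample_tsv`.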

(As for the “Hyp_c” column we’ve deleted: to my knowledge, Hyp_c is the p-value corrected with the FDR (false discovery rate) method [5]. I don’t see a big difference between the values in the Hyp_c and Hyp columns in our case, so I deleted the Hyp_c column.)
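To give a feel for what an FDR correction does, here is a small sketch of the Benjamini-Hochberg procedure, which is the most common FDR method. Whether GeneCodis computes Hyp_c in exactly this way is my assumption. The p-values in `raw` are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that each adjusted value is capped
    # by the one ranked above it (this keeps the adjusted values monotone).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]  # made-up p-values, not GeneCodis output
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"raw p = {p:<6} adjusted = {q:.4f}")
```

The adjusted value is never smaller than the raw one, so the correction can only make a result look less significant.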

|| Useful tip
Let’s talk a little bit about p-values (probability values).
The statement that the group of genes on the right (“Genes” column) has nothing to do with the process on the left (“Items_Details” column), i.e. that the overlap is just chance, is our null hypothesis.
The opposite statement is our alternative hypothesis.
The p-value is the probability of seeing at least as many of our genes annotated to that process as we actually see, assuming the null hypothesis is true. It is not the probability that the null hypothesis itself is true. If we reject the null hypothesis whenever the p-value is below a chosen threshold (0.05, for example), that threshold is the probability of making a mistake by rejecting a null hypothesis that is actually true. In our case the p-values are something like 3.26E-14 (very small numbers for all gene groups), so an overlap this large would almost never happen by chance. So it looks like we can be fairly confident in the results we’ve got with the help of GeneCodis [6].
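To make the p-value more concrete: enrichment p-values of this kind are commonly computed with a hypergeometric test, and I'm assuming that this is what the “Hyp” column holds. The sketch below computes the chance of finding at least `support` annotated genes in a random list of `list_size` genes. The argument names mirror the “Support”, “List size”, “Reference Support” and “Reference size” columns. The reference numbers in the example call are invented, and only the 6 and the 26 come from this post.

```python
from math import comb


def hypergeom_enrichment_p(support, list_size, ref_support, ref_size):
    """P(X >= support) when list_size genes are drawn at random from ref_size
    genes, of which ref_support carry the annotation."""
    upper = min(list_size, ref_support)
    total = comb(ref_size, list_size)  # all possible gene lists of this size
    tail = sum(
        comb(ref_support, k) * comb(ref_size - ref_support, list_size - k)
        for k in range(support, upper + 1)
    )
    return tail / total


# 6 of our 26 annotated genes fall in the process; 60 and 20000 are placeholders.
p = hypergeom_enrichment_p(support=6, list_size=26, ref_support=60, ref_size=20000)
print(f"p-value = {p:.3g}")
```

A tiny value here means that a random list of the same size would almost never contain that many genes from the process.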

Results

So, you can see that the “stress-activated MAPK cascade (BP)” process, with 6 genes (FOS, MAPK3, MAPK1, JUN, MAPK8, MAPK9), is among the first 15 results we’ve got with GeneCodis. These genes and this process relate to the “cellular response to cadmium ion” term we searched for with QuickGO. Thus, we can conclude that cadmium influences the MAPK signalling pathway (and all the other biological processes you see in the table above) even without searching for that information the traditional way, in papers/articles/monographs.

A good question here might be, I guess, why we don’t see DUSP genes among the ones we’ve found with the help of QuickGO. After all, our first experiment is titled “Bioinformatics experiments. Exp 1. Analysis of mechanisms of cadmium ions impact on MAPK signalling pathway through the members of dual-specificity phosphatases (DUSP) family (or “Catch me (Cd2+) if you can”). Part …”, so it might seem strange that there are no DUSP genes there. Probably there’s no such information either in the literature or in those databases, though. So our experiment, I think, still makes sense.

Anyway, we are not doing this first experiment to get data that could allow scientists to develop a new strategy for removing Cd2+ from people contaminated with it. We are exploring bioinformatics tools and trying to figure out how to use them to make some little discoveries.

We said at the beginning of this part (Part 3. A) that ontological analysis could also help us to narrow down the list of DUSPs we need to analyse. MAPK1, MAPK3, MAPK8 and MAPK9 are kinases, and the FOS gene product (c-Fos) is a protein that forms a complex with JUN (a transcription factor involved in MAPK signalling) in the nucleus [7, 8].

Well, now we know that MAPK8, MAPK9, MAPK3 and MAPK1 are involved in “cellular response to cadmium ion”. This doesn’t allow us to narrow down the list of DUSPs, because we now know that 2 MAPK signalling pathways (the first one involving MAPK1 and MAPK3, the second one involving MAPK8 and MAPK9) might be influenced by cadmium (see the first post of Part 3).

But the results do allow us to exclude the pathway where p38 (MAPK14) is involved. This might help us to try to predict the cell fate (death, proliferation…) in response to cadmium in the last post of this series, where we are going to discuss the results of the first experiment.
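The reasoning in the last few paragraphs boils down to a set comparison, which can be sketched in Python. The grouping follows this post. The ERK and JNK labels are the usual names for the MAPK1/MAPK3 and MAPK8/MAPK9 pathways and are added by me. Only MAPK14 is listed for p38 because it is the only p38 gene mentioned here.

```python
# Pathway membership as described in the post (ERK/JNK labels added for clarity).
PATHWAYS = {
    "ERK": {"MAPK1", "MAPK3"},
    "JNK": {"MAPK8", "MAPK9"},
    "p38": {"MAPK14"},
}

# Genes GeneCodis reported for "stress-activated MAPK cascade (BP)".
enriched = {"FOS", "MAPK3", "MAPK1", "JUN", "MAPK8", "MAPK9"}


def pathways_hit(genes, pathways=PATHWAYS):
    """Map each pathway name to the sorted list of its kinases found in genes."""
    return {name: sorted(members & genes) for name, members in pathways.items()}


for name, found in pathways_hit(enriched).items():
    status = ", ".join(found) if found else "no kinase in the list"
    print(f"{name}: {status}")
```

An empty result for a pathway only means that none of its kinases showed up in our gene list.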


Now, I tried not to accompany the previous post with screenshots of QuickGO and PubMed.
At that time I had doubts about so-called “Fair use”. But after talking to a lawyer I have no doubts anymore:

In cases where the material is used for education or for review, and does not compete with the original product, you are on safe ground - even if you are making money on it.
source

For more information --> https://www.narrative.org/post/copyrights-is-it-fair-use
and
https://community.narrative.org/support/topic/youtube-videos-not-ours-software-web-service-product-copyrighted-screenshots-fair-use-or-not

Even though I’m not from the USA (“Fair use” is the name used there), similar doctrines are used in other countries.


In the previous post I provided a useful tip on how to get a nicely formatted article/paper reference. I’d like to mention it here as well, with a screenshot, because I think it can really save you time (if you didn’t know about it already).

PubMed_getting_references.png
(PubMed)

To get a nicely formatted article/paper reference (for example, this one) on PubMed (a free portal/search engine for getting abstracts/citations on life sciences/medical topics), choose the "Send to" option, then check "File" (in the "Choose Destination" category), then "Summary (text)" in the "Format" category, and click "Create file".
https://www.nlm.nih.gov/bsd/pubmed.html
my previous post

That’s how I get most references for the “References” section in my posts.
You’ll see in that file something like this

txt_from_PubMed.png

All images (without the license specified) are used under the doctrine known in the USA as "Fair Use" (similar doctrines are used in other countries). For more information visit the US Gov website.

Other posts of this series:

Bioinformatics experiments. Introduction

Bioinformatics experiments. Exp 1. Analysis of mechanisms of cadmium ions impact on MAPK signalling pathway through the members of dual-specificity phosphatases (DUSP) family (or “Catch me (Cd2+) if you can”). Part 1. Theory

Bioinformatics experiments. Exp 1. Analysis of mechanisms of cadmium ions impact on MAPK signalling pathway through the members of dual-specificity phosphatases (DUSP) family (or “Catch me (Cd2+) if you can”). Part 2. MSA (Multiple Sequence Alignment)

Bioinformatics experiments. Exp 1. Analysis of mechanisms of cadmium ions impact on MAPK signalling pathway through the members of dual-specificity phosphatases (DUSP) family (or “Catch me (Cd2+) if you can”). Part 3. Ontological analysis. A

References:

  1. GeneCodis3

  2. Tabas-Madrid D, Nogales-Cadenas R: GeneCodis3: a non-redundant and modular enrichment analysis tool for functional genomics. Nucleic Acids Research 2012; doi: 10.1093/nar/gks402

  3. Nogales-Cadenas R, Carmona-Saez P: GeneCodis: interpreting gene lists through enrichment analysis and integration of diverse biological information. Nucleic Acids Research 2009; doi: 10.1093/nar/gkp416

  4. Carmona-Saez P, Chagoyen M: GENECODIS: A web-based tool for finding significant concurrent annotations in gene lists. Genome Biology 2007 8(1):R3

  5. GeneCodis Help page

  6. Glantz SA: Primer of Biostatistics, Fourth edition. McGraw-Hill Inc., New York, 1997

  7. Proto-oncogene c-Fos

  8. Transcription factor AP-1 / JUN


Just a naive question from a non-specialist. Why would GeneCodis be shut down? Is it too popular or too expensive to maintain?

hi
I found out today that GeneCodis just moved to another domain
-> http://genecodis.genyo.es/ (from http://genecodis.cnb.csic.es/)
It works great.
updated my post with screenshots

According to
https://www.researchgate.net/post/Is_Genecodis_3_out_of_service_now

GeneCodis is down due to server maintainance and the deployment of the next version 4. We hope to release it as soon as possible during this summer.

Cool! Thanks for doing the research for me :D

Nice guideline --> Resteem!

Best

thanks )




Are there other tools comparable to GeneCodis?


hi
I found out today that GeneCodis just moved to another domain
-> http://genecodis.genyo.es/ (from http://genecodis.cnb.csic.es/)
It works great.
updated my post with screenshots

As far as I remember, there were 68 tools for ontological analysis back in 2015-2016.
I guess in 2019 we have even more.
But GeneCodis is one of the best of them

The novelty of GeneCodis3 is that as well as evaluating single annotations it is able to determine combinations of annotations that are significantly associated to the analyzed list.

source