HELP
In the following section we provide a description of some of the parameters CIDER generates. These are there to help give some intuition as to when and where CIDER-derived parameters are useful. In addition, there is an FAQ further down the page. If anything is unclear, please contact us for clarification, and we can update the information accordingly.
Quickfire help
Does CIDER predict disorder?
NO. CIDER does NOT provide any disorder prediction features. There are many IDP webservers and databases which can help with this (notably the D2P2 database). The fraction of disorder promoting residues is SIMPLY the fraction of residues which are *predicted* to be disorder promoting - it does NOT explicitly predict disorder!
Is κ always a relevant parameter?
NO. For sequences with a low fraction of charged residues (FCR), κ is probably a far less useful parameter. Moreover, κ only informs on the mixing of oppositely charged residues. Often that charge mixing is informative regarding protein conformations, but that is not necessarily the case.
Why are you warning me that my protein is highly proline rich?
Sequences with a high proline content may not conform to the conformational classifications dictated by the diagram-of-states. As a result, if you have a highly proline rich sequence, the conformations are likely to be more expanded than the diagram-of-states would predict. For proline rich sequences the order parameter Ω, described below[5], may be relevant. Ω is calculated by CIDER and available via the 'Data' download option after a set of sequences has been analyzed.
What can CIDER give me?
Fundamentally, CIDER provides you with a set of parameters which can help assess, to a basic approximation, the kinds of conformations your IDP may form. It does not explicitly make predictions. The values calculated are exact, but how they translate to structure and function is a much more challenging question.
κ (kappa)
κ is a parameter to describe the extent of charged amino acid mixing in a sequence. For a sequence of fixed composition, as κ goes from 0 to 1, the sequences can be thought of as becoming less well mixed with respect to the positive and negative residues. Crucially, the κ value of a sequence is always normalized by the most segregated sequence for that sequence composition. This has the effect that comparing sequences with different sequence compositions (notably fraction of charged and uncharged residues) is not appropriate. A useful parameter to combine with κ is the fraction of charged residues (FCR). As the fraction of charged residues increases, the relative impact of how those charges are spread across a sequence becomes more significant.
As an example, below we show a reproduction of the 'EK' peptide sequences from the Das & Pappu[2] paper, in which κ is defined. To the right of each sequence we also include the sequence's κ value:
EKEKEKEKEK EKEKEKEKEK EKEKEKEKEK EKEKEKEKEK EKEKEKEKEK | 0.009
KEKKKEKKEE KKEEKEKEKE KEEKKKEEKE KEKEKKKEEK EKEEKKEEEE | 0.0139
EEEKKEKKEE KEEKKEKKEK EEEKKKEKEE KKEEEKKKEK EEEEKKKKEK | 0.0273
EEEEKKKKEE EEKKKKEEEE KKKKEEEEKK KKEEEEKKKK EEEEKKKKEK | 0.0450
EEKKEEEKEK EKEEEEEKKE KKEKKEKKKE EKEKEKKKEK KKKEKEEEKE | 0.0624
KEKKKEKEKK EKKKEEEKKK EEEKEKKKEE KKEKKEKKEE EEEEEKEEKE | 0.0951
EKEKEEKKKE EKKKKEKKEK EEKKEKEKEK KEEEEEEEEE KEKKEKKKKE | 0.1458
EEEEEKKKKK EEEEEKKKKK EEEEEKKKKK EEEEEKKKKK EEEEEKKKKK | 0.1941
EEKEEEEEEK EEEKEEKKEE EKEKKEKKEK EEKKEKKKKK KKKKKKKEEE | 0.2721
EEEEEKEEEE EEEEEEEKEE KEKKKKKKEK KKKKKKEKEK KKKEKKEEKK | 0.3554
EEEEEEEEEE EKEEEEKEEK EEKEKKKKKK KKKKKKKKKK KKEEKKEEKE | 0.5283
KEEEEEEEKE EKEEEEEEEE EKEEEEKEEK KKKKKKKKKK KKKKKKKKKE | 0.6101
KKEKKKEKKE EEEEEEEEEE EEEEEEEEEK EEKKKKKKKK KKKKKKKEKK | 0.6729
EKKKKKKKKK KKKKKKKKKK KKEEEEEEEE EEEEEEEEEE KKEEEEEKEK | 0.7666
KEEEEKEEEE EEEEEEEEEE EEEEEEEKKK KKKKKKKKKK KKKKKKKKKK | 0.8764
EEEEEEEEEE EEEEEEEEEE EEEEEKKKKK KKKKKKKKKK KKKKKKKKKK | 1.00
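Because κ is most informative when the fraction of charged residues (FCR) is high, it can be useful to check FCR before interpreting κ. The following is a minimal sketch, not CIDER's own implementation. The function name `fraction_charged_residues` is hypothetical, and the choice of D, E, K and R as the charged residues (with histidine treated as neutral) is an assumption.

```python
def fraction_charged_residues(sequence, charged="DEKR"):
    """Return the fraction of residues in `sequence` that are charged.

    Assumption: D, E, K and R count as charged; histidine is treated
    as neutral. Whitespace in the input is ignored so that sequences
    written in blocks of ten can be pasted in directly.
    """
    seq = "".join(sequence.split()).upper()
    if not seq:
        raise ValueError("sequence is empty")
    n_charged = sum(1 for aa in seq if aa in charged)
    return n_charged / len(seq)


# Every 'EK' peptide above consists only of E and K, so FCR is 1.0
# for all of them regardless of how the charges are patterned.
print(fraction_charged_residues("EKEKEKEKEK EKEKEKEKEK"))
```

All of the 'EK' sequences share the same composition, which is why their κ values can be compared directly.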
Ω (Omega)
The patterning parameter Ω is somewhat analogous to the parameter κ, in that it describes the patterning of charged and proline residues with respect to all other residues. The parameter is described in detail in our recent JACS paper [5].
Hydropathy
The hydropathy parameter is a re-scaled Kyte-Doolittle[3] hydropathy value which lies between 0 (least hydrophobic) and 9 (most hydrophobic). This value is simply the residue average value for the entire sequence. Note that for the linear hydropathy plots, we use a value which is simply another rescaled Kyte-Doolittle value that lies between 0 and 1, instead of between 0 and 9.
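As a rough illustration of the rescaling, the sketch below shifts the standard Kyte-Doolittle values (which run from -4.5 to 4.5) by +4.5 so that they lie between 0 and 9, then averages over the sequence. The shift by +4.5 and the division by 9 for the 0 to 1 variant are assumptions about how the rescaling is done; the function name `mean_hydropathy` is hypothetical and this is not CIDER's own code.

```python
# Standard Kyte-Doolittle hydropathy values (range -4.5 to 4.5).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(sequence, scale_max=9.0):
    """Residue-averaged hydropathy on a 0..scale_max scale.

    Assumption: the rescaled value is the Kyte-Doolittle value plus 4.5
    (giving 0..9), optionally divided down to 0..1 via scale_max=1.0.
    """
    seq = "".join(sequence.split()).upper()
    if not seq:
        raise ValueError("sequence is empty")
    shifted = [KYTE_DOOLITTLE[aa] + 4.5 for aa in seq]  # each in 0..9
    mean_0_to_9 = sum(shifted) / len(shifted)
    return mean_0_to_9 / 9.0 * scale_max


print(mean_hydropathy("IR"))                 # midpoint of the 0..9 scale
print(mean_hydropathy("IR", scale_max=1.0))  # same value on the 0..1 scale
```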
Disorder promoting
Residues can broadly be categorized into disorder promoting or order promoting, as defined by Dunker & Uversky[1]. For the 'disorder promoting' parameter, we carry out a binary classification of each residue as either order promoting (W, F, Y, I, M, L, V, N, C) or disorder promoting (T, A, G, R, D, H, Q, K, S, E, P). The 'disorder promoting' result in the analysis reflects the fraction of residues in a sequence which fall into the disorder promoting set.
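The binary classification above amounts to a simple count. The sketch below uses the two residue sets exactly as listed; the function name `fraction_disorder_promoting` is hypothetical and this is an illustration rather than CIDER's own code. Note that the result is a composition statistic, not a disorder prediction.

```python
# Residue sets as listed in the text above.
DISORDER_PROMOTING = set("TAGRDHQKSEP")
ORDER_PROMOTING = set("WFYIMLVNC")


def fraction_disorder_promoting(sequence):
    """Return the fraction of residues in the disorder promoting set."""
    seq = "".join(sequence.split()).upper()
    if not seq:
        raise ValueError("sequence is empty")
    unknown = set(seq) - DISORDER_PROMOTING - ORDER_PROMOTING
    if unknown:
        raise ValueError(f"unrecognized residues: {sorted(unknown)}")
    n_disorder = sum(1 for aa in seq if aa in DISORDER_PROMOTING)
    return n_disorder / len(seq)


print(fraction_disorder_promoting("EKWF"))  # two of the four residues are disorder promoting
```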
Regions on the diagram-of-states
This defines the location on the diagram-of-states where your sequence lies. The diagram is divided into the following five regions:
- Weak polyampholytes & polyelectrolytes: These sequences are typically globules and tadpoles
- Janus sequences: Collapsed or expanded sequences, where their behavior may depend on other factors (salt concentration, ligand binding, cis-interactions etc.)
- Strong polyampholytes: Coils, hairpins and chimeras - here the types of structures which form may depend on κ
- Negatively charged strong polyelectrolytes: Swollen coils, due to polyanionic repulsion
- Positively charged strong polyelectrolytes: Swollen coils, due to polycationic repulsion
These classifications provide a rough estimate of the kinds of ensembles adopted by an IDP sequence. If the proline content of a sequence is high, these predictions can become much less accurate, due to proline's inflexibility and propensity to form poly-proline II helices (PPII).
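To make the five regions concrete, the sketch below assigns a sequence to a region from its fraction of positive and negative residues. The numerical boundaries (0.25 and 0.35) are not stated on this page; they are assumptions based on the commonly cited form of the diagram-of-states and should be checked against the original publications. The function name `diagram_region` is hypothetical, K and R are assumed positive, and D and E are assumed negative.

```python
def diagram_region(sequence):
    """Assign a sequence to one of the five diagram-of-states regions.

    ASSUMPTION: the 0.25 / 0.35 thresholds below are illustrative and
    are not taken from this page.
    """
    seq = "".join(sequence.split()).upper()
    if not seq:
        raise ValueError("sequence is empty")
    n = len(seq)
    f_plus = sum(1 for aa in seq if aa in "KR") / n   # fraction positive
    f_minus = sum(1 for aa in seq if aa in "DE") / n  # fraction negative
    fcr = f_plus + f_minus    # fraction of charged residues
    ncpr = f_plus - f_minus   # net charge per residue

    if ncpr > 0.35:
        return "Positively charged strong polyelectrolyte"
    if ncpr < -0.35:
        return "Negatively charged strong polyelectrolyte"
    if fcr < 0.25:
        return "Weak polyampholyte or polyelectrolyte"
    if fcr <= 0.35:
        return "Janus sequence"
    return "Strong polyampholyte"


print(diagram_region("EKEKEKEKEK"))  # fully charged, net neutral
print(diagram_region("KKKKKGGGGG"))  # half the residues are positive
```

As noted above, a high proline content can make any such assignment much less reliable.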
Frequently Asked Questions (FAQs)
- Why can I only analyze 10 sequences at once?
CIDER provides a set of tools that make analyzing an arbitrary sequence of interest straightforward and intuitive. The analysis results can be downloaded as a structured text file, or as a high-resolution (200 DPI) .png file. However, CIDER is not appropriate for running medium or high throughput analysis. For this, we direct you to localCIDER, a Python package which runs on Linux, OSX, and Windows, and can carry out all the analysis CIDER provides and more.
- How long is downloadable content hosted on the webserver?
The downloadable content is hosted for between 20 minutes and one hour, depending on a number of factors. We recommend you download your results as soon as they're ready to avoid any timeout issue.
- Is κ the only parameter I need to explain IDP behavior?
Almost certainly not. κ can be extremely useful for sequences that have a high fraction of charged residues, and for comparing sequences of identical sequence composition. However, it should not be treated as the only parameter that matters when considering IDP ensemble behavior.
- I really wish the server would calculate parameter x...
If there's a feature you think CIDER is missing, please contact us and let us know. CIDER has been explicitly developed with rapid updates in mind, and even if a feature is inappropriate for CIDER, it may be something we can add to localCIDER. Our goal is to make this as useful as possible, so any suggestions are gratefully received.
- I really want to carry out analysis of a larger number of sequences, but I thought a "Python" was a type of snake - could someone in the lab help me?
Probably! Drop me an email at alex.holehouse@wustl.edu and we'll see if we can figure it out!
- What data do you store, and how is it used?
The only data stored in a persistent manner is usage statistics (specifically how many sequences are submitted, how long they are, and the IP address used to submit those sequences). We do this to help protect ourselves against malicious activity, and to help assess what kinds of data people are analyzing. These data cannot be tied to any of the actual sequences submitted - in other words, neither we nor anyone else can determine what sequences people are looking at from the persistent data.
To provide downloadable content (i.e. the plots and the text files containing sequence parameters), we do store this information in a totally anonymized manner for a short period of time (20 minutes to 1 hour, depending on a number of factors). These files are saved with randomized filenames and are only stored to facilitate downloading. They are automatically deleted periodically, and there is no way to map a sequence to the IP address which submitted it.
By not storing any information other than the aforementioned details, we hugely reduce the risk to your data, and avoid situations where we may be responsible for losing passwords, email addresses, etc. If you have any concerns regarding this, please contact us directly.