CAST - FAQ: CAST: ChIP-Seq Analysis System Tool


FAQ

Frequently asked questions
We also invite you to read our Help section. If you still have questions about the tool, please write to us through our feedback page.

Analysis

  1. What input files are supported?
  2. How many samples can be uploaded and analyzed? How many analyses can be run? Are there any time limits?
  3. Can I upload multiple files? And multiple samples?
  4. How long does a typical analysis take?
  5. What do you recommend before running an analysis?
  6. How can I best use the standard pipeline and the MACS software?
  7. How can my parameters influence analysis results?
  8. Why is CAST forcing me to apply a control file for MACS peak calling?
  9. Do I need an input DNA as control?

Archive

  1. What is a study? What's the difference with a project?

Results

  1. How can I quickly learn how to browse results?
  2. Is it possible to download the results of my analysis?
  3. What makes it so difficult to find significant peaks?
  4. What is the FDR column? What is a reasonable threshold?
  5. Why am I getting high FDR for every peak?



Analysis

1. What input files are supported?
The CAST tool supports:
  • short-read data-sets produced by Illumina sequencing platforms (FASTQ)
  • several standard file formats (SRA, BAM)
  • compressed archives (zip [1], tar, gzip, bz or bz2 compression are accepted)

CAST automatically detects the type of uploaded file and chooses the necessary program to decompress it.

[1] Warning: compressed archives created on Mac OS X require the Windows-compatibility flag.
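As an illustration of how automatic detection of this kind can work, here is a minimal sketch that guesses a file's format from its first bytes. It is not CAST's actual implementation: the function name `detect_format` and the table of signatures are assumptions made for the example, and only a few well-known signatures are covered.

```python
# Leading "magic" bytes of a few common compressed formats.
MAGIC_SIGNATURES = [
    (b"\x1f\x8b", "gzip"),      # gzip (BAM files are gzip-compatible too)
    (b"BZh", "bzip2"),          # bzip2
    (b"PK\x03\x04", "zip"),     # zip local file header
]


def detect_format(header: bytes) -> str:
    """Guess the format of a file from its first bytes (hypothetical helper)."""
    for magic, name in MAGIC_SIGNATURES:
        if header.startswith(magic):
            return name
    # POSIX tar archives carry the string "ustar" at byte offset 257.
    if header[257:262] == b"ustar":
        return "tar"
    # Uncompressed FASTQ records start with an '@' header line.
    if header.startswith(b"@"):
        return "fastq"
    return "unknown"
```

In practice one would read the first few hundred bytes of the upload (for example `open(path, "rb").read(512)`) and pass them to `detect_format` before choosing a decompression program.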
2. How many samples can be uploaded and analyzed? How many analyses can be run? Are there any time limits?
An account with User rank can:
  • create up to 2 studies
  • upload up to 12 files
  • build up to 2 analyses
  • run 1 analysis at a time
3. Can I upload multiple files? And multiple samples?
Each sample has to be uploaded as a single file: you can upload multiple files if you have multiple samples.
The general rule is: 1 sample = 1 file, no merge operation will be executed by our system.
4. How long does a typical analysis take?
The amount of time required by an analysis execution is influenced by different factors, such as:
  • The number of files uploaded
  • The sequencing region (genome-wide or targeted)
  • The number of jobs waiting for execution on our servers
However, you can find an estimated execution time on your analysis monitoring page.
5. What do you recommend before running an analysis?
We recommend that you understand and tweak the peak finder parameters for your data set.
Once the reads have been aligned to the reference genome you can run different analyses on the same BAM file and compare the results.
6. How can I best use the standard pipeline and the MACS software?
We recommend that you read Zhang et al. (2008) for a detailed explanation of the MACS peak finding algorithm.
We strongly recommend that you always look at the MACS logfile to see how well MACS did on your data set:
failures to complete the MACS analysis are often related to the experimental data and/or the chosen analysis parameters.
7. How can my parameters influence analysis results?
Setting the right bandwidth and mfold parameters for your data set is important.
If these parameters are set too stringently, MACS is unable to find enough high-quality peaks and will exit with an error.
Moreover, setting the bandwidth as close as possible to the DNA sonication size helps MACS call the right peaks.
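To make the two parameters concrete, the sketch below assembles a MACS command line with explicit bandwidth and mfold values. It assumes the MACS 1.4 command-line interface (`macs14` with `-t`, `-c`, `-f`, `-g`, `-n`, `--bw` and `--mfold`); how CAST invokes MACS internally is not documented here, and the helper name and file names are hypothetical.

```python
from typing import List, Tuple


def build_macs_command(
    treatment: str,
    control: str,
    name: str,
    bandwidth: int = 300,               # ideally close to the sonication size
    mfold: Tuple[int, int] = (10, 30),  # enrichment range used to build the model
    genome: str = "hs",
) -> List[str]:
    """Build an argument list for MACS 1.4 (assumed CLI, hypothetical helper)."""
    low, high = mfold
    if low >= high:
        raise ValueError("mfold lower bound must be smaller than the upper bound")
    return [
        "macs14",
        "-t", treatment,
        "-c", control,
        "-f", "BAM",
        "-g", genome,
        "-n", name,
        "--bw", str(bandwidth),
        "--mfold", f"{low},{high}",
    ]


# Hypothetical file names; a 200 bp bandwidth for a library sonicated to ~200 bp.
command = build_macs_command("chip.bam", "control.bam", "my_run", bandwidth=200)
```

Building the command as a list keeps each parameter visible, which makes it easier to compare runs that differ only in bandwidth or mfold.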
8. Why is CAST forcing me to apply a control file for MACS peak calling?
Because the control is used to calculate enrichment significance, which provides more rigorous filtering of false positives
and more accurate ranking of high-confidence peak calls.
We recommend that you use a real control in your experimental setup.
The experiment and control samples should have a comparable (high) number of reads.
MACS simply linearly scales (normalizes) the number of reads and therefore noise will be scaled in the same way as signal.
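The effect of linear scaling can be seen in a small worked example. This is only a sketch of the idea, assuming the control is scaled to the sequencing depth of the ChIP sample; the function name and the read counts are made up for illustration.

```python
from typing import List, Tuple


def scale_control(
    control_counts: List[int], chip_total: int, control_total: int
) -> Tuple[List[float], float]:
    """Linearly scale per-window control counts to the ChIP library size."""
    factor = chip_total / control_total
    return [count * factor for count in control_counts], factor


# A shallow control (5 million reads) scaled to a 20 million read ChIP sample.
# Windows hold 1, 3 and 0 control reads respectively (illustrative values).
scaled, factor = scale_control([1, 3, 0], chip_total=20_000_000, control_total=5_000_000)
# factor is 4.0 and scaled is [4.0, 12.0, 0.0]: a chance pile-up of 3 reads
# now counts as 12, so noise in a shallow control is amplified with the signal.
```

This is why a control with a read count comparable to the experiment is preferable: the closer the scaling factor is to 1, the less the control's noise is inflated.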
9. Do I need an input DNA as control?
No. MACS can also be applied to identify differential peaks between two conditions by treating one of the samples as the control.
However, the calculated FDR values should be ignored in this case, as peaks from either sample are likely to be biologically meaningful.

Archive

1. What is a study? What's the difference with a project?
According to the EBI/ENA data format standard, a study contains information about a single sequencing project (multiple analyses can be run within a single study). So, in practice, a study contains all the information about a project.

Results

1. How can I quickly learn how to browse results?
Click on Results example (in the top navigation menu) to follow a guided tour of the results pages.
2. Is it possible to download the results of my analysis?
Yes, you have two options:
  • Download the results directly from the analysis monitoring page.
  • Use the DOWNLOAD link on the Results page, after applying any filter.
Warning!
Data browsing and downloading have been optimized with caching and sessions.
While this ensures faster page loading, opening different samples in different tabs may mix up the data.
Please make sure to filter and download one sample at a time.
3. What makes it so difficult to find significant peaks?
Not all genomic regions are equal.
Things like sequencing biases, mapping biases, chromatin and copy number variations, and repeat structures create regional differences in ChIP-Seq data.
However, MACS addresses some of these issues by looking at the background noise in a control sample and at the direct surroundings of a potential peak.
4. What is the FDR column? What is a reasonable threshold?
The FDR column reports the false discovery rate estimated for each peak.
In an ideal experiment, with well-balanced control and samples, peaks with FDR < 1% are unlikely to be false positives.
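Applying such a threshold to a downloaded peak table can be sketched as follows. The example assumes a tab-delimited table with an `FDR(%)` column, as in the MACS 1.4 peaks file; the exact layout of CAST's downloads may differ, and the three rows are invented purely for illustration.

```python
import csv
import io
from typing import Dict, Iterable, List


def filter_peaks_by_fdr(
    lines: Iterable[str],
    max_fdr_percent: float = 1.0,
    fdr_column: str = "FDR(%)",  # assumed column name
) -> List[Dict[str, str]]:
    """Keep only the peaks whose FDR is below the given percentage."""
    # Skip blank lines and '#' comment lines that may precede the header.
    data = (ln for ln in lines if ln.strip() and not ln.startswith("#"))
    reader = csv.DictReader(data, delimiter="\t")
    return [row for row in reader if float(row[fdr_column]) < max_fdr_percent]


# Invented rows, for illustration only.
EXAMPLE = (
    "# example peak table\n"
    "chr\tstart\tend\tFDR(%)\n"
    "chr1\t100\t400\t0.25\n"
    "chr1\t900\t1300\t5.00\n"
    "chr2\t50\t350\t100.00\n"
)
kept = filter_peaks_by_fdr(io.StringIO(EXAMPLE))
# Only the first peak (chr1:100-400, FDR 0.25%) passes the 1% threshold.
```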
5. Why am I getting high FDR for every peak?
MACS computes the FDR based on a theoretical Poisson distribution of the ChIP and control libraries.
In reality, ChIP usually has a much smaller coverage than the control. Therefore, there will be many regions that show enrichment only in the control library. Sometimes these enrichments can be highly significant, depending on the experimental design, antibody efficiency, sequencing machine, etc.
As a result, many significant positive peaks can end up tagged with an FDR of 100%.
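One way to see how enrichment on the control side inflates the FDR is the empirical estimate described by Zhang et al. (2008): ChIP and control are swapped, and at a given p-value cutoff the FDR is the number of peaks called in the control divided by the number called in the ChIP sample. The sketch below assumes that definition and assumes the value is capped at 100%; whether the MACS version used by CAST computes it exactly this way is not stated here, and the peak counts are illustrative.

```python
def empirical_fdr_percent(n_control_peaks: int, n_chip_peaks: int) -> float:
    """Empirical FDR (%) at one p-value cutoff after a ChIP/control swap."""
    if n_chip_peaks <= 0:
        raise ValueError("at least one ChIP peak is required")
    # Assumption: the reported value is capped at 100%.
    return min(100.0, 100.0 * n_control_peaks / n_chip_peaks)


# Balanced experiment: few peaks are found when the control is treated as ChIP.
balanced = empirical_fdr_percent(n_control_peaks=5, n_chip_peaks=500)      # 1.0

# Deeply sequenced control with many enriched regions of its own.
unbalanced = empirical_fdr_percent(n_control_peaks=800, n_chip_peaks=500)  # 100.0
```

In the second case every ChIP peak at that cutoff is reported with an FDR of 100%, regardless of how significant its own p-value is.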
