Difference between revisions of "HLab:Workflows"
Latest revision as of 13:24, 31 October 2016
Hardison lab workflows
Purpose
To make datasets consistent and comparable, the data are processed through uniform workflows.
Galaxy workflows
There are workflows set up in Galaxy for processing histone modifications and ChIP-Seq data. Due to various problems these are not used much anymore.
BioSTAR workflows
BioSTAR is a computational cluster at PSU. You will need a Penn State username, and you must apply for access to the cluster. For general information on the disk directory structure and the resources available, see the User Guide. To access the cluster, type "ssh user@biostar.psu.edu" from the command line on your computer. Once logged in, see ~/biostar/biostarNotes.txt for some notes on commands I find useful.
There are several workflows set up on BioSTAR.
- Transcription Factor Workflow ~/biostar/tfWorkflow
- Histone Workflow ~/biostar/histWorkflow
- RNA-Seq Workflow ~/biostar/rnaWorkflow
- IDR Workflow ~/biostar/idrWorkflow
- DNase Workflow ~/biostar/dnaseWorkflow
- ATAC Workflow ~/biostar/atacWorkflow
In general, to run a workflow, create a directory on the scratch drive and put the input files in it. For all but the IDR workflow, these will be fastq files. Some workflows need additional files, such as chromosome sizes and gene sets; the commonly used ones are under ~/group/genomes. Use the config file to tell the workflow which files to use: copy the example config file from the workflow directory and set the variables as appropriate for your data. Run the workflow by running its submit-jobs script, giving it the full path to your config file. The jobs will be added to the queue, and you will be emailed as they finish. Copy the results back to BX, using an interactive job if the copy is large. Each workflow directory contains a notes.txt file with more help specific to that workflow.
Under the ~/biostar directory there are other workflows that were set up for special cases, and a file with more notes (workflowNotes.txt), which suggests one manner of transferring files from BioSTAR to BX.
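The setup part of this procedure (make a run directory, copy the example config into it, and work out the full path to hand to the submit-jobs script) could be sketched roughly as follows. This is only an illustration: the function name prepare_run, the file name example.config, and the run directory name are placeholders invented here, and the real config and submit script names should be taken from the notes.txt file in each workflow directory.

```shell
# Sketch only: the names used here are placeholders, not documented
# BioSTAR file names. Check notes.txt in the workflow directory first.

# prepare_run WORKFLOW_DIR RUN_DIR EXAMPLE_CONFIG
# Creates RUN_DIR, copies the example config into it as "config",
# and prints the full (absolute) path of the copy on stdout.
prepare_run() {
    local workflow_dir="$1"
    local run_dir="$2"
    local example_config="$3"

    # Refuse to continue if the example config is not where we expect it.
    if [ ! -f "$workflow_dir/$example_config" ]; then
        echo "no example config at $workflow_dir/$example_config" >&2
        return 1
    fi

    mkdir -p "$run_dir" || return 1
    cp "$workflow_dir/$example_config" "$run_dir/config" || return 1

    # The submit-jobs script expects a full path, so resolve the directory.
    local abs_run_dir
    abs_run_dir=$(cd "$run_dir" && pwd) || return 1
    echo "$abs_run_dir/config"
}

# Hypothetical usage on the cluster (paths are assumptions):
#   config=$(prepare_run ~/biostar/tfWorkflow /path/to/scratch/run1 example.config)
#   ...copy the fastq files into the run directory and edit "$config"...
#   then pass "$config" to the workflow's submit-jobs script.
```

The function deliberately stops short of submitting anything, so the config can be edited and checked before jobs go into the queue.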
ENCODE workflows
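Long RNA-seq pipeline: https://github.com/ENCODE-DCC/long-rna-seq-pipeline
ChIP-seq pipeline: https://docs.google.com/document/d/1lG_Rd7fnYgRpSIqrIfuVlAz2dW1VaSQThzk836Db99c/edit#heading=h.9ecc41kilcvq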
Return to HLab:Main