Latest revision as of 14:24, 31 October 2016

Hardison lab workflows

Purpose

To make datasets consistent and comparable, the data are processed through uniform workflows.

Galaxy workflows

There are workflows set up in Galaxy for processing histone modification and ChIP-Seq data. Due to various problems, these are not used much anymore.

BioSTAR workflows

BioSTAR is a computational cluster at PSU. To use it, you need a Penn State username, and you must apply for access to the cluster. For general information on the disk directory structure and the available resources, see the User Guide. To access the cluster, type "ssh user@biostar.psu.edu" on the command line of your computer, replacing user with your Penn State username. Once logged in, see ~/biostar/biostarNotes.txt for some notes on commands I find useful.

There are several workflows set up on BioSTAR.

Transcription Factor Workflow ~/biostar/tfWorkflow
Histone Workflow ~/biostar/histWorkflow
RNA-Seq Workflow ~/biostar/rnaWorkflow
IDR Workflow ~/biostar/idrWorkflow
DNase Workflow ~/biostar/dnaseWorkflow
ATAC Workflow ~/biostar/atacWorkflow
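If you script against these workflows, a small lookup can save typing the directory names. This is a hypothetical bash helper, not part of the BioSTAR setup: the short names tf, hist, rna, idr, dnase and atac are assumptions, and the function only builds the directory paths listed above. It does not check that they exist on your account.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map an assumed short workflow name to the
# workflow directory listed on this page. Paths are built, not checked.
workflow_dir() {
  case "$1" in
    tf|hist|rna|idr|dnase|atac)
      # e.g. "rna" -> $HOME/biostar/rnaWorkflow
      printf '%s/biostar/%sWorkflow\n' "$HOME" "$1"
      ;;
    *)
      echo "unknown workflow: $1 (expected tf, hist, rna, idr, dnase or atac)" >&2
      return 1
      ;;
  esac
}
```

For example, "less "$(workflow_dir rna)/notes.txt"" would open the notes file in the RNA-Seq workflow directory.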

In general, to run a workflow, create a directory on the scratch drive and put the input files in it. For every workflow except IDR, the inputs are fastq files. Some workflows need additional files, such as chromosome sizes and gene sets; the commonly used ones are under ~/group/genomes, and you tell the workflow which file to use in the config file. Copy the example config file from the workflow directory and set its variables as appropriate for your data. Then run the workflow's submit-jobs script, giving it the full path to your config file. The jobs will be added to the queue, and you will be emailed as they finish. If the results are large, copy them back to BX from an interactive job. Each workflow directory has a notes.txt file with more help on the specifics of that workflow.
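The staging steps above can be sketched as a bash function. This is a hypothetical helper, not one of the lab's scripts: it assumes you pass in the scratch directory, the path of the workflow's example config file (its actual file name is not given on this page), and the input files. It creates the directory, copies the inputs and the example config into it, and prints the config's absolute path, ready to edit and then hand to the submit-jobs script. It does not edit the config or submit any jobs.

```bash
#!/usr/bin/env bash
# Hypothetical helper sketching the staging steps for a BioSTAR workflow run.
# The example config path is an argument because its name is not documented here.
stage_run() {
  if [ "$#" -lt 3 ]; then
    echo "usage: stage_run RUN_DIR EXAMPLE_CONFIG INPUT_FILE..." >&2
    return 1
  fi
  local run_dir="$1"         # new directory on the scratch drive
  local example_config="$2"  # example config from the workflow directory
  shift 2                    # remaining args: input files (fastq, except for IDR)

  mkdir -p -- "$run_dir" || return 1
  cp -- "$@" "$run_dir"/ || return 1
  cp -- "$example_config" "$run_dir"/ || return 1

  # Print the absolute path of the copied config, since the
  # submit-jobs script is given the full path to the config file.
  printf '%s/%s\n' "$(cd -- "$run_dir" && pwd)" "${example_config##*/}"
}
```

For example, "cfg=$(stage_run /scratch/myRun EXAMPLE_CONFIG sample_R1.fastq sample_R2.fastq)" stages a run; then set the variables in the file at $cfg and pass $cfg to the workflow's submit-jobs script. The scratch path and file names in this example are placeholders.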

Under the ~/biostar directory there are other workflows that were set up for special cases, and a file with more notes (workflowNotes.txt) that suggests one way of transferring the files from BioSTAR to BX.

ENCODE workflows

Long RNA-seq pipeline
ChIP-seq pipeline

Return to HLab:Main

Difference between revisions of "HLab:Workflows"