Difference between revisions of "SLab:Run Processing"
Revision as of 16:11, 26 March 2010
454
- rigs
- schuster-flx1
- schuster-flx2
- schuster-flx3
- schuster-flx4
on-rig processing
- run directories are stored in /data
- /data/YYYY_MM_DD/R_YYYY_MM_DD_HH_MM_SS_RIGNAME_OPERATOR_RUNNAME
- when a run finishes processing, it calls the /usr/local/rig/bin/postAnalysisScript.sh script
- rsync's run directory to s2:/zfs/md1k-4/data/sequencing/temp/454
- ssh's to c1.persephone to submit job
- depending on run
- calls c1.persephone:/usr/local/bin/submit-signalProcessing.sh
- calls c1.persephone:/usr/local/bin/submit-fullProcessing.sh
- status email is sent to 454pipeline@bx.psu.edu
- our postAnalysisScript.sh is kept in /home/adminrig/postAnalysisScript directory on each rig
- revision controlled using rcs
- % co -l postAnalysisScript.sh
- % vi postAnalysisScript.sh
- % ci -u postAnalysisScript.sh
- Makefile in this directory installs our version into /usr/local/rig/bin
- % make install
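The run directory naming scheme above can be taken apart with plain shell. The following is a minimal sketch, not part of postAnalysisScript.sh: the function name parse_454_run is hypothetical, and it assumes that rig and operator names contain no underscores (everything after the operator is treated as the run name).

```shell
# Split a 454 run directory name of the form
#   R_YYYY_MM_DD_HH_MM_SS_RIGNAME_OPERATOR_RUNNAME
# into its fields. Assumption: rig and operator names contain no
# underscores; anything after the operator is treated as the run name.
parse_454_run() {
    name=$(basename "$1")
    old_ifs=$IFS
    IFS=_
    set -- $name            # split the name on underscores
    IFS=$old_ifs
    if [ "$#" -lt 10 ] || [ "$1" != "R" ]; then
        echo "not a 454 run directory: $name" >&2
        return 1
    fi
    date_dir="$2_$3_$4"
    rig=$8
    operator=$9
    shift 9
    # Re-join the remaining fields so underscores in the run name survive.
    run_name=$1
    shift
    for part in "$@"; do
        run_name="${run_name}_${part}"
    done
    echo "date_dir=$date_dir rig=$rig operator=$operator run=$run_name"
}
```

For example, `parse_454_run /data/2010_03_26/R_2010_03_26_16_11_00_schuster-flx1_jdoe_sample_A` (a made-up run) reports the date directory, the rig, the operator, and the run name on one line.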
signal processing
- signal processing for runs is performed on the persephone cluster
- depending on run
- uses qsub to submit job using c1.persephone:/usr/local/bin/signalProcessing.qsub
- uses qsub to submit job using c1.persephone:/usr/local/bin/fullProcessing.qsub
- status email is sent to 454pipeline@bx.psu.edu
Before exiting, each signal processing job signals that processing is done by touching a file with the same name as the run directory:
/afs/bx.psu.edu/depot/data/schuster_lab/sequencing/temp/454/.processing_finished/RUN_DIR_NAME
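As an illustration of the marker step, a job could end with something like the following. This is a sketch only: the helper name mark_processing_finished and the FINISHED_DIR variable are hypothetical and are not taken from the actual qsub scripts.

```shell
# Hypothetical helper: mark a run as finished by touching a marker file
# named after the run directory. FINISHED_DIR defaults to the marker
# directory named above and can be overridden for testing.
FINISHED_DIR=${FINISHED_DIR:-/afs/bx.psu.edu/depot/data/schuster_lab/sequencing/temp/454/.processing_finished}

mark_processing_finished() {
    run_dir=$1
    # Only the last path component (the run directory name) is used.
    touch "$FINISHED_DIR/$(basename "$run_dir")"
}
```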
staging
A cron job on s2 checks the /zfs/md1k-4/data/sequencing/temp/454/.processing_finished directory once a minute to see if any signal processing jobs have finished. When it finds a finished job, it moves the run to the staging directory:
/afs/bx.psu.edu/depot/data/schuster_lab/sequencing/staging/454/RUN_DIR
Once the run is in the staging directory, the files in the run directory are modified as needed to make sure they have the correct owner, group, and permissions.
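The staging pass described above could look roughly like this. It is a sketch under assumptions, not the cron job that runs on s2: the function name stage_finished_runs is hypothetical, the three directories are passed as arguments so it can be tried on scratch directories, and the chmod mode is only a guess at what "correct permissions" means here. Owner and group changes are left out because the intended values are not stated.

```shell
# Sketch of one staging pass.
#   $1 = directory holding the run directories (temp/454)
#   $2 = marker directory (.processing_finished)
#   $3 = staging directory
stage_finished_runs() {
    temp_dir=$1
    marker_dir=$2
    staging_dir=$3
    for marker in "$marker_dir"/*; do
        [ -e "$marker" ] || continue    # no markers: the glob did not match
        run=$(basename "$marker")
        if [ ! -d "$temp_dir/$run" ]; then
            echo "marker without run directory: $run" >&2
            continue
        fi
        # Move the run, fix permissions, and only then drop the marker.
        # Assumed mode: owner read/write, everyone else read-only.
        mv "$temp_dir/$run" "$staging_dir/$run" &&
            chmod -R u+rwX,go+rX,go-w "$staging_dir/$run" &&
            rm -f "$marker"
    done
}
```

Removing the marker last means a failed move leaves the marker in place, so the next pass retries the run.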
archive
To archive a run, move it into one of the archive folders: md1k-1, md1k-2, or md1k-3 on s3, or md1k-4, md1k-5, or md1k-6 on s2.
The /zfs/md1k-N/archive filesystem is compressed and exported read-only.
s3:/zfs/md1k-{1,2,3}/archive/sequencing/454/YYYY/YYYY_MM_DD/
s2:/zfs/md1k-{4,5,6}/archive/sequencing/454/YYYY/YYYY_MM_DD/
After the run has been archived, the links in the following directory need to be modified to reflect the location of the run.
/afs/bx.psu.edu/depot/data/schuster_lab/sequencing/archive/454
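To make the archive layout concrete, the destination for a 454 run can be derived from the volume number and the run directory name. This is a sketch: the function name archive_dest_454 is hypothetical, and it assumes that the YYYY_MM_DD component of the archive path is the date embedded in the run directory name.

```shell
# Print the archive destination for a 454 run.
#   $1 = md1k volume number (1-3 live on s3, 4-6 on s2)
#   $2 = run directory, named R_YYYY_MM_DD_HH_MM_SS_...
archive_dest_454() {
    volume=$1
    run=$(basename "$2")
    case $volume in
        1|2|3) host=s3 ;;
        4|5|6) host=s2 ;;
        *) echo "unknown volume: md1k-$volume" >&2; return 1 ;;
    esac
    # Fields 2-4 of the run name are YYYY, MM, DD (assumed to be the run date).
    date_dir=$(printf '%s\n' "$run" | cut -d_ -f2-4)
    year=${date_dir%%_*}
    echo "$host:/zfs/md1k-$volume/archive/sequencing/454/$year/$date_dir/"
}
```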
Illumina
- systems
- illumina-ga
on-system processing
Samba is running on s2.persephone so that the Illumina GA can copy its data directly to s2.
- The Illumina GA copies its data to one of three locations (decided by the operator)
- s2.persephone\\illumina-4 which is on s2.persephone:/zfs/md1k-4/data/illumina
- s2.persephone\\illumina-5 which is on s2.persephone:/zfs/md1k-5/data/illumina
- s2.persephone\\illumina-6 which is on s2.persephone:/zfs/md1k-6/data/illumina
- Using the current software, both image analysis and base calling are performed on-system
- SCS2.5/RTA1.5
- SCS2.6/RTA1.6
staging
After we receive an email from the operator informing us that a run has completed, we move it to the staging directory using one of these commands:
For runs that used illumina-4
% mv /zfs/md1k-4/data/illumina/RUN_NAME /zfs/md1k-4/data/sequencing/staging/illumina/RUN_NAME
For runs that used illumina-5
% mv /zfs/md1k-5/data/illumina/RUN_NAME /zfs/md1k-5/data/sequencing/staging/illumina/RUN_NAME
For runs that used illumina-6
% mv /zfs/md1k-6/data/illumina/RUN_NAME /zfs/md1k-6/data/sequencing/staging/illumina/RUN_NAME
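The three commands differ only in the volume number, so they can be generated from the share number. The helper below is a hypothetical sketch (the name illumina_stage_cmd is not an existing script); it prints the command instead of running it so the result can be checked first.

```shell
# Print the staging command for a run that was written to share illumina-N.
#   $1 = share number (4, 5 or 6)
#   $2 = run name
illumina_stage_cmd() {
    share=$1
    run=$2
    case $share in
        4|5|6) ;;
        *) echo "unknown share: illumina-$share" >&2; return 1 ;;
    esac
    # The mv commands above pair share illumina-N with volume md1k-N.
    vol=/zfs/md1k-$share
    echo "mv $vol/data/illumina/$run $vol/data/sequencing/staging/illumina/$run"
}
```

Once the printed command looks right, it can be pasted into a shell on s2.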
We then create a symlink to the run in AFS: (replace md1k-4-data with md1k-5-data or md1k-6-data as necessary)
% cd /afs/.bx.psu.edu/depot/data/schuster_lab/sequencing/staging/illumina
% ln -s /nfs/s2.persephone.bx.psu.edu/md1k-4-data/sequencing/staging/illumina/RUN_NAME
% cd /afs/.bx.psu.edu/depot/data/schuster_lab/sequencing/archive/illumina/flat
% ln -s /nfs/s2.persephone.bx.psu.edu/md1k-4-data/sequencing/staging/illumina/RUN_NAME
% afs-control release data.schuster_lab
signal processing
When runs come off of the Illumina GA, their images have already been processed and bases have already been called. Scripts for processing Illumina runs can be found here:
/afs/bx.psu.edu/depot/data/schuster_lab/sequencing/support/illumina/submit
The firecrest directory is for reprocessing images, the bustard directory is for recalling bases, and the gerald directory is for aligning reads to a reference genome. Each directory has a doit script with a sample invocation. The submit scripts should be run on c1.persephone.bx.psu.edu.
% ssh c1.persephone
c1% cd /afs/bx.psu.edu/depot/data/schuster_lab/sequencing/support/illumina/submit/gerald
c1% ./doit
Before aligning reads to a reference genome (by using GERALD), you need to create an appropriate GERALD configuration file. We've been placing these configuration files inside the run directories in a file called config.txt.
GERALD config file for 100318_HWUSI-EAS610_0009
1278:ELAND_GENOME /afs/bx.psu.edu/depot/data/schuster_lab/sequencing/staging/illumina/reference/mm8
4:ELAND_GENOME /afs/bx.psu.edu/depot/data/schuster_lab/sequencing/staging/illumina/reference/VaMs102
12478:ANALYSIS eland_extended
356:ANALYSIS sequence
USE_BASES Y36
QCAL_SOURCE upstream
ELAND_SET_SIZE 60
EMAIL_LIST illumina-pipeline@bx.psu.edu
EMAIL_SERVER smtp
EMAIL_DOMAIN bx.psu.edu
WEB_DIR_ROOT https://badger.bx.psu.edu/illumina
GERALD config file for 100211_HWUSI-EAS610_0005
USE_BASES Y76,Y76
1235678:ANALYSIS sequence_pair
4:ANALYSIS eland_pair
4:ELAND_GENOME /afs/bx.psu.edu/depot/data/schuster_lab/sequencing/support/illumina/reference/hg18
QCAL_SOURCE upstream
ELAND_SET_SIZE 60
EMAIL_LIST illumina-pipeline@bx.psu.edu
EMAIL_SERVER smtp
EMAIL_DOMAIN bx.psu.edu
WEB_DIR_ROOT https://badger.bx.psu.edu/illumina
For more examples see:
/afs/bx.psu.edu/depot/data/schuster_lab/sequencing/archive/illumina/flat/*/config.txt
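In these configuration files, the digits before the colon list the lanes a setting applies to (1278:ELAND_GENOME applies to lanes 1, 2, 7, and 8); settings without a prefix apply to all lanes. When checking a config.txt it can help to expand the prefixes into one line per lane. The function below is a sketch for that purpose only; gerald_lane_settings is a hypothetical name and is not part of the Illumina pipeline. It expects the configuration with one setting per line.

```shell
# Expand lane-prefixed GERALD settings ("1278:ANALYSIS eland_extended")
# into one "lane key value" line per lane, sorted by lane and key.
# Lines without a lane prefix apply to every lane and are skipped here.
#   $1 = path to a GERALD config.txt
gerald_lane_settings() {
    awk '
        /^[1-8]+:/ {
            split($1, head, ":")        # head[1] = lane digits, head[2] = key
            value = $2
            for (i = 3; i <= NF; i++) value = value " " $i
            n = length(head[1])
            for (i = 1; i <= n; i++)
                print substr(head[1], i, 1), head[2], value
        }
    ' "$1" | sort -k1,1n -k2,2
}
```

Running it on a config.txt makes it easy to spot a lane that has no ANALYSIS line, or an eland analysis without a matching ELAND_GENOME.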
archive
To archive a run, move it into one of the archive folders: md1k-1, md1k-2, or md1k-3 on s3, or md1k-4, md1k-5, or md1k-6 on s2.
The /zfs/md1k-N/archive filesystem is compressed and exported read-only.
s3:/zfs/md1k-{1,2,3}/archive/sequencing/illumina/YYYY/YYYY_MM_DD/
s2:/zfs/md1k-{4,5,6}/archive/sequencing/illumina/YYYY/YYYY_MM_DD/
After the run has been archived, the links in the following directory need to be modified to reflect the location of the run.
/afs/bx.psu.edu/depot/data/schuster_lab/sequencing/archive/illumina
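Updating a link after archiving amounts to replacing the old symlink with one that points at the new location. A minimal sketch follows; relink_run is a hypothetical helper, and the new target path is whatever path the archived run is reachable under, which this page does not spell out. As with the staging links, an AFS release is presumably needed afterwards.

```shell
# Replace the symlink for a run so that it points at a new location.
#   $1 = directory containing the links
#   $2 = run name (the link name)
#   $3 = new link target
relink_run() {
    link_dir=$1
    run=$2
    target=$3
    link=$link_dir/$run
    # Refuse to touch anything that exists but is not a symlink.
    if [ -e "$link" ] && [ ! -L "$link" ]; then
        echo "refusing to replace non-symlink: $link" >&2
        return 1
    fi
    rm -f "$link"
    ln -s "$target" "$link"
}
```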