Difference between revisions of "HLab:Steps"


Revision as of 13:06, 3 December 2012

Bcl - qseq - fastq

1. Get run folder name

Get the run folder name from Cheryl. It is referred to below as RUN_NAME.

2. Create RUN_NAME directory

Create it under /afs/bx.psu.edu/depot/data/hardison_lab/illumina/bcl_to_fastq/job_output (example: mkdir 120111_SN407_0185_BD0F1EABXX).

3. Make a config file (CONFIG_FILE)

In /afs/bx.psu.edu/depot/data/hardison_lab/illumina/bcl_to_fastq/, copy the previous run’s config file and update it for the current run. Wherever the name of the old run appears in the file, change it to the current RUN_NAME. Replace the email address with your own.
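The run-name substitution can be sketched in Python as below. This is only an illustration, not part of the lab's tooling: the function name retarget_config, the placeholder OLD_RUN_NAME, and the KEY=value layout of the sample text are all assumptions, since the real config file format is not shown here. The email address still has to be changed by hand.

```python
def retarget_config(config_text, old_run, new_run):
    """Replace every occurrence of old_run with new_run.

    Returns the updated text and the number of replacements made, so a
    count of zero (wrong old run name) is caught instead of passing silently.
    """
    count = config_text.count(old_run)
    if count == 0:
        raise ValueError(f"run name {old_run!r} not found in config text")
    return config_text.replace(old_run, new_run), count


# Hypothetical config contents; the real file layout may differ.
sample = (
    "RUN_DIRECTORY=/runs/OLD_RUN_NAME\n"
    "FASTQ_DIRECTORY=/out/OLD_RUN_NAME/fastq\n"
)
updated, replaced = retarget_config(
    sample, "OLD_RUN_NAME", "120111_SN407_0185_BD0F1EABXX"
)
print(replaced)
print(updated)
```

Checking the replacement count against what you expect is a cheap way to notice a leftover reference to the old run before submitting jobs.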

4. Submit jobs

In the same directory, i.e. /afs/bx.psu.edu/depot/data/hardison_lab/illumina/bcl_to_fastq/, run the following command:

submit-jobs CONFIG_FILE

This submits jobs to the cluster (persephone). Use qstat to check on progress. You will receive an email at the start and end of each job. The error and output files are written to the directory you created under job_output. If a job errors out, use qdel JOBID to delete it (or qdel -u username to delete all of your jobs). The output qseq and fastq files will be in the location specified in the CONFIG_FILE. During the conversion to fastq, the reads are filtered based on the pass-filter field of the qseq file.
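To make the pass-filter step concrete, here is a minimal Python sketch of a qseq-to-fastq conversion. It is not the script that the cluster jobs actually run. It assumes the standard 11-column tab-separated qseq layout (machine, run, lane, tile, x, y, index, read number, sequence, quality, filter flag), assumes that "filtered" means reads whose flag is not 1 are dropped, and leaves the quality string unconverted. The function name qseq_to_fastq is hypothetical.

```python
def qseq_to_fastq(qseq_lines):
    """Yield one 4-line FASTQ record (as a string) per read that passed the filter."""
    for line in qseq_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 11:
            raise ValueError(f"expected 11 qseq fields, got {len(fields)}")
        machine, run, lane, tile, x, y, index, read_no, seq, qual, passed = fields
        if passed != "1":
            continue  # read failed the filter: skip it
        name = f"{machine}_{run}:{lane}:{tile}:{x}:{y}#{index}/{read_no}"
        # qseq writes uncalled bases as '.', FASTQ conventionally uses 'N'
        yield f"@{name}\n{seq.replace('.', 'N')}\n+\n{qual}\n"


example = [
    "M1\t32\t1\t1101\t10\t20\tACGTAC\t1\tACG.T\thhhhh\t1\n",  # passes
    "M1\t32\t1\t1101\t11\t21\tACGTAC\t1\tTTTTT\tBBBBB\t0\n",  # fails
]
records = list(qseq_to_fastq(example))
print("".join(records), end="")
```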

Run 32 finished in about 2 hours using less than 4G of RAM.

Demultiplexing

  • Run these steps even if none of the lanes are multiplexed: the program also renames the fastq files (by creating symlinks). If a run has a mix of multiplexed and non-multiplexed lanes, each lane is handled as appropriate. The index column in the SampleSheet should be empty for the non-multiplexed lanes.
  • To demultiplex specific lanes or all lanes, create a SampleSheet.csv in the /fastq directory of that run (FASTQ_DIRECTORY in the config file). It must have one line per desired demultiplexed output file, with each line giving the info for that file. This is a CSV version of the Excel file Cheryl sends. (The format is specified in the CASAVA 1.7 user guide: /usr/local/CASAVA-1.7.0/share/CASAVA-1.7.0/docs/cassava/CASAVA1.7_User_Guide_15011196_A.pdf).
  • Make sure the file has Unix line endings. If it was created using Excel on a Mac, it will have Mac line endings. Run Cathy’s line-ends program to change the line endings.
    ~cathy/bin/line-ends
    Usage: /afs/bx.psu.edu/home/cathy/bin/line-ends <target> <filename> > output
    where target = win, mac, or unix
    Example: /afs/bx.psu.edu/home/cathy/bin/line-ends unix run26_SampleSheet.csv > run26_SampleSheet_endunix.csv
  • The “unknown” files contain fastq reads that were not assigned to any specific index because the index sequence had mismatches.
  • The file RUN_NAME/fastq/info.txt lists the original fastq file names and some stats for each lane, including read length, total reads, count of good reads, and the percent of good reads.
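For readers who want to see what the line-ending conversion amounts to, here is a small Python sketch. It is not Cathy's line-ends program, and the function name to_unix_line_endings is made up for illustration. It treats a lone carriage return as a Mac line ending and a carriage return followed by a newline as a Windows one.

```python
def to_unix_line_endings(data: bytes) -> bytes:
    """Convert Windows (CRLF) and old-style Mac (CR) line endings to Unix (LF)."""
    # Handle CRLF first so it does not turn into two newlines.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


mac_csv = b"lane,sample\r1,s1\r"
win_csv = b"lane,sample\r\n1,s1\r\n"
print(to_unix_line_endings(mac_csv))
print(to_unix_line_endings(win_csv))
```

Working on bytes rather than text avoids Python's own newline translation when the file is read and written in binary mode.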
  1. To begin demultiplexing, run screen on mal or another machine of your choice.
  2. cd new/runName/fastq/
  3. ~giardine/illumina/demultiplex.pl RUN_NAME sampleSheet.csv
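The idea of assigning reads to samples, with an "unknown" bin for unmatched indexes, can be illustrated with the Python sketch below. It does not reproduce demultiplex.pl: the function name assign_reads and the data layout are hypothetical, and it uses exact index matching only, whereas the real script's mismatch handling is not described here.

```python
from collections import defaultdict


def assign_reads(reads, lane_indexes):
    """Group reads by sample name.

    reads: iterable of (index_sequence, read_record) pairs for one lane.
    lane_indexes: dict mapping index sequence -> sample name.
    Reads whose index is not in lane_indexes go to the "unknown" bin.
    """
    bins = defaultdict(list)
    for index_seq, record in reads:
        sample = lane_indexes.get(index_seq, "unknown")
        bins[sample].append(record)
    return dict(bins)


lane_indexes = {"ACGTAC": "sampleA", "TTGCAA": "sampleB"}
reads = [
    ("ACGTAC", "read1"),
    ("TTGCAA", "read2"),
    ("ACGTAA", "read3"),  # one mismatch against sampleA's index
    ("ACGTAC", "read4"),
]
bins = assign_reads(reads, lane_indexes)
print(bins)
```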