Difference between revisions of "HLab:Steps"

From CCGB
Line 1: Line 1:
== Bcl – qseq - fastq ==
+
== HiSeq 2000 ==
# Get run folder name from Cheryl. (RUN_NAME)
+
# Make sure sample sheet does not include quotes.
# Create RUN_NAME directory under /afs/bx.psu.edu/depot/data/hardison_lab/illumina/bcl_to_fastq/job_output (example: mkdir 120111_SN407_0185_BD0F1EABXX )
+
# configureBclToFastq.pl --input-dir <BaseCalls_dir> --output-dir <Unaligned> --sample-sheet <BaseCalls_dir>/SampleSheet.csv --no-eamss --mismatches 1 --fastq-cluster-count 0
# Make a config file (CONFIG_FILE) in /afs/bx.psu.edu/depot/data/hardison_lab/illumina/bcl_to_fastq/
+
#* The output-dir should be under the fastq directory, then year, then run folder, then fastq.
#: Copy the previous run’s config file and change the info in it to reflect the updated information. Basically, wherever you spot the run name of some old run in this file, change it to the current RUN_NAME. Give your own email address.
+
# cd output-dir
#In the same directory, i.e.  /afs/bx.psu.edu/depot/data/hardison_lab/illumina/bcl_to_fastq/ run the following command:
+
# On mal: (screen or) nohup make -j 4
#:submit-jobs CONFIG_FILE
+
# Concatenate the read1 and read2 fastq files
 
+
# Put symlinks in production folder pointing to fastq files
This submits jobs to the cluster (persephone). Use qstat to check on progress. It sends email at the start and end of each job. The error and output files are written to the directory you created under job_output. If a job errors out, use qdel JOBID to delete it (or qdel -u username to delete all of your jobs).
+
# Copy data from illumina-9 to rawseq folder
The output qseq and fastq files will be in the location specified in the CONFIG_FILE.  During the conversion to fastq the reads are filtered based on the pass filter field of the qseq file. 
 
 
 
Run 32 finished in about 2 hours using less than 4G of RAM.
 
 
 
==Demultiplexing==
 
* Even if none of the lanes are multiplexed these steps and program will rename the fastq files (by creating symlinks).  If it is a mix it will do each lane as appropriate.  The index column in the SampleSheet should be empty for these lanes.
 
* To demultiplex some specific lanes or all lanes, create a SampleSheet.csv in the /fastq directory of that run (FASTQ_DIRECTORY in config file). It has to have as many lines as the final number of demultiplexed files desired, with one line specifying info for each file. This is a CSV version of excel file Cheryl sends.  (As specified in CASAVA 1.7 users guide - /usr/local/CASAVA-1.7.0/share/CASAVA-1.7.0/docs/cassava/CASAVA1.7_User_Guide_15011196_A.pdf).
 
* Make sure the file has unix line endings. If it was created using Excel on a Mac, it will have MacOSX line endings. Run Cathy’s line-ends program to change the line endings.
 
*:::~cathy/bin/line-ends<br />
 
*:::Usage:  /afs/bx.psu.edu/home/cathy/bin/line-ends <target> <filename> > output <br />
 
*:::where target = win, mac, or unix
 
*:::Example: /afs/bx.psu.edu/home/cathy/bin/line-ends unix run26_SampleSheet.csv > run26_SampleSheet_endunix.csv
 
* The “unknown” files contain fastq reads that were not assigned to any specific index because the index sequence had mismatches.
 
* The file RUN_NAME/fastq/info.txt has the original fastq file names, and some stats for that lane; including read length, total reads, count of good reads, and the percent of good reads.
 
 
 
# To begin demultiplexing, run screen on mal or desired machine.
 
# cd new/runName/fastq/
 
# ~giardine/illumina/demultiplex.pl RUN_NAME sampleSheet.csv
 
 
 
==FTP files to Galaxy==
 
Shell script to ftp all the files at once. This will take a while, so be sure to run it inside screen. Go to Get Data and load the files from the ftp area into your history within 3 days, or they may be deleted. Put your own password and email in place of the ones in the example.
 
 
 
  pagscr
 
  kinit; aklog
 
  export GALAXYPASS=yourpassword
 
 
 
  ftp -ivn main.g2.bx.psu.edu <<END
 
    user giardine@bx.psu.edu $GALAXYPASS
 
    mput *Read*.fq
 
    bye
 
  END
 
 
 
==Groom reads and add to library==
 
# Use the Galaxy tool FASTQ Groomer to groom the fastq reads (score format Sanger for both input and output).
 
# Rename the files to their previous name changing .fq to .groomed so you know which ones are groomed.  Put the file name in the notes box also (.fq). 
 
# Import into the correct library and folder.  Edit the information. Put the old filename in the Message box.  Change the file name to match the pattern
 
#: ddMonYYYY_ln9999_#index_Read#_description_here_groomed_reads
 
# Put groomed read counts in Google docs (if you have write access)
 
  
 +
== NextSeq 500 ==
 +
# Copy data from learfan back to rawseq directory.
 +
# Copy sample sheet to top directory of run folder, name must be SampleSheet.csv
 +
# On mal: (screen or) nohup '''bcl2fastq2''' --runfolder-dir ~/hlab/reorg/rawseq/<year>/<run folder> -p 3 -d 2 --barcode-mismatches 1
 +
#* Needs a minimum of 16G RAM per core (3 x 16 = 48G out of 60G)
 +
# Make symlinks in fastq folder to rawseq Data/Intensities/BaseCalls
 +
# Put symlinks in production folder pointing to fastq files
  
 
==Troubleshooting==
 
# Things to check before you begin
#:*echo $PATH
+
# Make sure the file has unix line endings. If it was created using Excel on a Mac, it will have MacOSX line endings. Run Cathy’s line-ends program to change the line endings.
#::you should get /sge/ in your path
+
#*:~cathy/bin/line-ends<br />
#:*You need access to Persephone users group. The default is you do not have access.
+
#*:Usage: /afs/bx.psu.edu/home/cathy/bin/line-ends <target> <filename> > output <br />
# In case of trouble, look at error logs in job_output/RUN_NAME
+
#*:where target = win, mac, or unix
# If you get this error in the email: "failed to set AFS token"
+
#*:Example: /afs/bx.psu.edu/home/cathy/bin/line-ends unix run26_SampleSheet.csv > run26_SampleSheet_endunix.csv
#: Usually, just logging out and logging in again solves the problem. After logging back in, use qdel -u username to delete all your old jobs. Then resubmit the jobs as before.
 
# If you get a permissions error, most likely you do not have access to the cluster used for the jobs. Email admin.
 

Revision as of 13:46, 8 August 2014

HiSeq 2000

  1. Make sure sample sheet does not include quotes.
  2. configureBclToFastq.pl --input-dir <BaseCalls_dir> --output-dir <Unaligned> --sample-sheet <BaseCalls_dir>/SampleSheet.csv --no-eamss --mismatches 1 --fastq-cluster-count 0
    • The output-dir should be under the fastq directory, then year, then run folder, then fastq.
  3. cd output-dir
  4. On mal: (screen or) nohup make -j 4
  5. Concatenate the read1 and read2 fastq files
  6. Put symlinks in production folder pointing to fastq files
  7. Copy data from illumina-9 to rawseq folder
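
Step 1 can be checked from the shell before running configureBclToFastq.pl. The following is a minimal sketch, not part of the lab's tooling: the function names sheet_has_quotes and strip_quotes are hypothetical, and it assumes the quotes in question are plain double quotes.

```shell
# Hypothetical helpers (not part of CASAVA or the lab scripts).

# Succeed (exit status 0) if the sample sheet contains any double quote.
sheet_has_quotes() {
    grep -q '"' "$1"
}

# Write a copy of the sample sheet ($1) to $2 with all double quotes removed.
strip_quotes() {
    tr -d '"' < "$1" > "$2"
}
```

For example, `sheet_has_quotes SampleSheet.csv && strip_quotes SampleSheet.csv SampleSheet_noquotes.csv` would write a cleaned copy only when quotes are present; the output file name here is illustrative.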

NextSeq 500

  1. Copy data from learfan back to rawseq directory.
  2. Copy sample sheet to top directory of run folder, name must be SampleSheet.csv
  3. On mal: (screen or) nohup bcl2fastq2 --runfolder-dir ~/hlab/reorg/rawseq/<year>/<run folder> -p 3 -d 2 --barcode-mismatches 1
    • Needs a minimum of 16G RAM per core (3 x 16 = 48G out of 60G)
  4. Make symlinks in fastq folder to rawseq Data/Intensities/BaseCalls
  5. Put symlinks in production folder pointing to fastq files
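
The symlink steps might be scripted along these lines. This is a sketch under assumptions: link_fastqs is a hypothetical helper, and it assumes the fastq files end in .fastq.gz and sit directly in the source directory. Adjust the pattern to match the actual bcl2fastq2 output layout.

```shell
# Hypothetical helper: symlink every *.fastq.gz in a source directory
# into a destination directory.
#   $1 = source directory (e.g. the run's Data/Intensities/BaseCalls)
#   $2 = destination directory (e.g. the fastq or production folder)
link_fastqs() {
    # Resolve the source to an absolute path so the links do not break
    # when they are followed from a different working directory.
    src=$(cd "$1" && pwd) || return 1
    dest=$2
    mkdir -p "$dest" || return 1
    for f in "$src"/*.fastq.gz; do
        # Skip the unexpanded pattern when no files match.
        [ -e "$f" ] || continue
        ln -sf "$f" "$dest/"
    done
}
```

The same function could serve both steps 4 and 5, called once with the BaseCalls directory as the source and once with the fastq folder as the source.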

Troubleshooting

  1. Things to check before you begin
  2. Make sure the file has unix line endings. If it was created using Excel on a Mac, it will have MacOSX line endings. Run Cathy’s line-ends program to change the line endings.
    • ~cathy/bin/line-ends
      Usage: /afs/bx.psu.edu/home/cathy/bin/line-ends <target> <filename> > output
      where target = win, mac, or unix
      Example: /afs/bx.psu.edu/home/cathy/bin/line-ends unix run26_SampleSheet.csv > run26_SampleSheet_endunix.csv
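
Before running line-ends, it can help to confirm whether a file actually needs converting. A minimal sketch, assuming that any carriage return (CR) character in the sample sheet indicates non-unix line endings; has_carriage_returns is a hypothetical helper name, and the conversion itself is still done with line-ends.

```shell
# Hypothetical helper: succeed (exit status 0) if the file contains
# at least one carriage return character, fail otherwise.
has_carriage_returns() {
    # printf produces a literal CR; command substitution keeps it
    # because only trailing newlines are stripped.
    cr=$(printf '\r')
    grep -q "$cr" "$1"
}
```

For example, `has_carriage_returns run26_SampleSheet.csv && echo "needs conversion"` prints the message only when the file contains carriage returns.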