Difference between revisions of "SLab:File System Layout"

From CCGB
(Sequencing Data)
 
(11 intermediate revisions by 2 users not shown)
Line 1: Line 1:
= /afs/bx.psu.edu/depot/data/schuster_lab =
+
= Schuster Lab Root Directory =
 +
* /afs/bx.psu.edu/depot/data/schuster_lab
  
 
The Schuster Lab's data can be found in the /afs/bx.psu.edu/depot/data/schuster_lab directory.
 
  
 +
The volume mounted at this location is RO-replicated. The main purpose of this root data.schuster_lab volume is to contain mountpoints to other volumes, as well as the symlink farm under ''sequencing/''
  
= Sequencing Data =
+
== scratch/ ==
 +
Individual global scratch volumes. On request only (admin-at-bx.psu.edu).
 +
 
 +
== sequencing/ ==
  
 
* /afs/bx.psu.edu/depot/data/schuster_lab/sequencing
 
Line 10: Line 15:
 
Access to all of the Schuster Lab's sequencing data is through the sequencing directory.  This directory contains archived run directories as well as directories for runs that are currently in progress.
 
  
== archive ==
+
Most of the top-level run directories under sequencing/ are symlinks managed by the new symlink management system: http://github.com/phalenor/ssdfs
 
 
* /afs/bx.psu.edu/depot/data/schuster_lab/sequencing/archive/
 
** 454/
 
*** by_date/
 
*** flat/
 
*** by_project/
 
** illumina/
 
*** by_date/
 
*** flat/
 
*** by_project/
 
 
 
The sequencing archive directory contains a read-only archive of all sequencing runs and contains a sub directory for each sequencing platform.  Within each sequencing platform directory, runs are organized by the date the run was started.
 
 
 
Directory for the 454 R_2010_01_22_16_12_03_FLX11080447_John_ChineseChestnut_Run9 run.
 
<pre>
 
/afs/bx.psu.edu/depot/data/schuster_lab/sequencing/archive/454/by_date/2010/2010_01_22/R_2010_01_22_16_12_03_FLX11080447_John_ChineseChestnut_Run9
 
</pre>
 
 
 
Directory for the Illumina 100201_HWUSI-EAS610_0006 run.
 
<pre>
 
/afs/bx.psu.edu/depot/data/schuster_lab/sequencing/archive/illumina/by_date/2010/2010_02_01/100201_HWUSI-EAS610_0006
 
</pre>
 
 
 
 
 
In addition, each sequencing platform directory contains a <code>flat/</code> sub directory that contains symbolic links to every run for that platform.
 
  
<code>flat/</code> directory for the 454 R_2010_01_22_16_12_03_FLX11080447_John_ChineseChestnut_Run9 run.
+
SSDFS is not currently capable of delegating commands to normal users, and requires some finesse when executing certain commands due to NFS permissions. Email admin-at-bx.psu.edu if something under sequencing/ doesn't look right.
<pre>
 
/afs/bx.psu.edu/depot/data/schuster_lab/sequencing/archive/454/flat/R_2010_01_22_16_12_03_FLX11080447_John_ChineseChestnut_Run9
 
</pre>
 
  
<code>flat/</code> directory for the Illumina 100201_HWUSI-EAS610_0006 run.
+
In short, datasets are organized into "volumes" distributed between 2 servers and multiple NFS filesystems. 454 datasets are organized by one volume per month, and Illumina datasets are one volume per dataset. SSDFS fully abstracts the actual location of the volume (server/filesystem). If you need direct access to a volume, it will be under ''/afs/bx.psu.edu/depot/data/schuster_lab/.ssdfs/vol/by-name/<name>''
<pre>
 
/afs/bx.psu.edu/depot/data/schuster_lab/sequencing/archive/illumina/flat/100201_HWUSI-EAS610_0006
 
</pre>
 
  
 +
Layout:
 +
* /afs/bx.psu.edu/depot/data/schuster_lab/sequencing/
 +
** 454/ (all 454 runs)
 +
*** $YEAR/
 +
**** $YEAR/$YEAR-$MONTH/
 +
*** incoming/
 +
** illumina/ (all illumina runs)
 +
*** $YEAR/
 +
**** $YEAR/$RUN_NAME/
 +
*** Instruments/
 +
*** HWUSI-EAS610/
 +
*** incoming/
 +
**** illumina-4 -> /nfs/s2.persephone.bx.psu.edu/md1k-4-data/illumina/
 +
**** illumina-5 -> /nfs/s2.persephone.bx.psu.edu/md1k-5-data/illumina/
 +
**** illumina-6 -> /nfs/s2.persephone.bx.psu.edu/md1k-6-data/illumina/
 +
** scripts/ (Various shell scripts, qsub scripts, etc)
 +
*** 454/
 +
** support/
 +
*** 454/
 +
*** illumina/
  
The <code>by_project/</code> directories can be used to organize all runs for a specific project
+
== support/ ==
 +
* /afs/bx.psu.edu/depot/data/schuster_lab/sequencing/support/
  
For the chestnut project:
+
The sequencing support directory contains manuals, software, and reference genomes used during sequencing and sequence processing.
<pre>
 
/afs/bx.psu.edu/depot/data/schuster_lab/sequencing/archive/illumina/by_project/chestnut/runs/100211_HWUSI-EAS610_0005
 
</pre>
 
  
== bin ==
+
= projects/ =  
 +
* /afs/bx.psu.edu/depot/data/schuster_lab/projects
  
== staging ==
+
The projects directory contains directories for project-specific data, including the Tasmanian Devil, KB1, and Woolly Mammoth projects.
  
== support ==
+
= users/ =
 +
* /afs/bx.psu.edu/depot/data/schuster_lab/users
  
== temp ==
+
The users directory contains directories for people within the Schuster Lab.

Latest revision as of 16:47, 4 October 2010

Schuster Lab Root Directory

  • /afs/bx.psu.edu/depot/data/schuster_lab

The Schuster Lab's data can be found in the /afs/bx.psu.edu/depot/data/schuster_lab directory.

The volume mounted at this location is RO-replicated (replicated read-only). The main purpose of this root data.schuster_lab volume is to hold mountpoints for other volumes, as well as the symlink farm under sequencing/.

scratch/

Individual global scratch volumes, available on request only (email admin-at-bx.psu.edu).

sequencing/

  • /afs/bx.psu.edu/depot/data/schuster_lab/sequencing

Access to all of the Schuster Lab's sequencing data is through the sequencing directory. This directory contains archived run directories as well as directories for runs that are currently in progress.

Most of the top-level run directories under sequencing/ are symlinks managed by SSDFS, the new symlink management system: http://github.com/phalenor/ssdfs

SSDFS cannot currently delegate commands to normal users, and some commands require finesse to execute because of NFS permissions. Email admin-at-bx.psu.edu if something under sequencing/ doesn't look right.
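
Because the run directories are symlinks, one common way for something to "not look right" is a link whose target has gone missing. The following is a minimal sketch for spotting such links before emailing the admins. It is not part of SSDFS; the function name find_dangling_symlinks is hypothetical, and it assumes a POSIX system where the directory is readable.

```python
import os
import tempfile


def find_dangling_symlinks(directory):
    """Return the sorted names of symlinks in `directory` whose targets are missing.

    Hypothetical helper for illustration only; it is not an SSDFS command.
    """
    dangling = []
    with os.scandir(directory) as entries:
        for entry in entries:
            # os.path.exists() follows symlinks, so it is False for a broken link.
            if entry.is_symlink() and not os.path.exists(entry.path):
                dangling.append(entry.name)
    return sorted(dangling)


if __name__ == "__main__":
    # Demonstration on a throwaway directory rather than the real AFS tree.
    with tempfile.TemporaryDirectory() as tmp:
        os.mkdir(os.path.join(tmp, "real_run"))
        os.symlink(os.path.join(tmp, "real_run"), os.path.join(tmp, "good_link"))
        os.symlink(os.path.join(tmp, "missing_run"), os.path.join(tmp, "bad_link"))
        print(find_dangling_symlinks(tmp))
```

Pointing the function at a directory under /afs/bx.psu.edu/depot/data/schuster_lab/sequencing/ would list any links there that no longer resolve, which is useful detail to include in a report.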

In short, datasets are organized into "volumes" distributed across two servers and multiple NFS filesystems. 454 datasets get one volume per month, and Illumina datasets get one volume per dataset. SSDFS fully abstracts the actual location of a volume (server/filesystem). If you need direct access to a volume, it will be under /afs/bx.psu.edu/depot/data/schuster_lab/.ssdfs/vol/by-name/<name>
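
As a small illustration of the direct-access path described above, the sketch below builds the by-name path for a given volume. The helper volume_path and the example name "example-volume" are hypothetical; how volumes are actually named is not documented here, so substitute a real volume name.

```python
from pathlib import PurePosixPath

# Lab root as given on this page.
LAB_ROOT = PurePosixPath("/afs/bx.psu.edu/depot/data/schuster_lab")


def volume_path(name):
    """Build the direct-access path for a volume under .ssdfs/vol/by-name/.

    `name` must be a single path component; the naming scheme itself is an
    assumption left to the caller.
    """
    if not name or "/" in name or name in (".", ".."):
        raise ValueError(f"not a plain volume name: {name!r}")
    return LAB_ROOT / ".ssdfs" / "vol" / "by-name" / name


if __name__ == "__main__":
    # "example-volume" is a placeholder, not a real volume.
    print(volume_path("example-volume"))
```

The function only constructs the path; it does not check that the volume exists.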

Layout:

  • /afs/bx.psu.edu/depot/data/schuster_lab/sequencing/
    • 454/ (all 454 runs)
      • $YEAR/
        • $YEAR/$YEAR-$MONTH/
      • incoming/
    • illumina/ (all Illumina runs)
      • $YEAR/
        • $YEAR/$RUN_NAME/
      • Instruments/
      • HWUSI-EAS610/
      • incoming/
        • illumina-4 -> /nfs/s2.persephone.bx.psu.edu/md1k-4-data/illumina/
        • illumina-5 -> /nfs/s2.persephone.bx.psu.edu/md1k-5-data/illumina/
        • illumina-6 -> /nfs/s2.persephone.bx.psu.edu/md1k-6-data/illumina/
    • scripts/ (various shell scripts, qsub scripts, etc.)
      • 454/
    • support/
      • 454/
      • illumina/
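
The layout above can be turned into paths programmatically. This is a rough sketch under stated assumptions: the month directory is taken to sit directly under 454/$YEAR/, $MONTH is assumed to be a zero-padded two-digit number, and the function names dir_454 and dir_illumina are hypothetical. Check an actual directory listing before relying on these formats.

```python
from pathlib import PurePosixPath

# Sequencing root as given in the layout above.
SEQUENCING = PurePosixPath("/afs/bx.psu.edu/depot/data/schuster_lab/sequencing")


def dir_454(year, month):
    """Path of the per-month 454 directory: 454/$YEAR/$YEAR-$MONTH.

    Assumes $MONTH is zero-padded to two digits (not confirmed by this page).
    """
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range: {month}")
    return SEQUENCING / "454" / f"{year:04d}" / f"{year:04d}-{month:02d}"


def dir_illumina(year, run_name):
    """Path of a single Illumina run directory: illumina/$YEAR/$RUN_NAME."""
    return SEQUENCING / "illumina" / f"{year:04d}" / run_name


if __name__ == "__main__":
    print(dir_454(2010, 1))
    # Illustrative run name; whether this run lives under 2010/ is an assumption.
    print(dir_illumina(2010, "100201_HWUSI-EAS610_0006"))
```

Since SSDFS manages these entries as symlinks, the resulting paths are the stable names to use in scripts, regardless of which server or filesystem holds the data.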

support/

  • /afs/bx.psu.edu/depot/data/schuster_lab/sequencing/support/

The sequencing support directory contains manuals, software, and reference genomes used during sequencing and sequence processing.

projects/

  • /afs/bx.psu.edu/depot/data/schuster_lab/projects

The projects directory contains directories for project-specific data, including the Tasmanian Devil, KB1, and Woolly Mammoth projects.

users/

  • /afs/bx.psu.edu/depot/data/schuster_lab/users

The users directory contains directories for people within the Schuster Lab.