Difference between revisions of "SLab:File System Layout"

From CCGB
(Users Directory)
 
(One intermediate revision by the same user not shown)
Line 4: Line 4:
 
The Schuster Lab's data can be found in the /afs/bx.psu.edu/depot/data/schuster_lab directory.
 
  
= Sequencing Directory =
+
The volume mounted at this location is RO-replicated. The main purpose of this root data.schuster_lab volume is to contain mount points to other volumes, as well as the symlink farm under ''sequencing/''.
 +
 
 +
== scratch/ ==
 +
Individual global scratch volumes. On request only (admin-at-bx.psu.edu).
 +
 
 +
== sequencing/ ==
  
 
* /afs/bx.psu.edu/depot/data/schuster_lab/sequencing
 
Line 10: Line 15:
 
Access to all of the Schuster Lab's sequencing data is through the sequencing directory.  This directory contains archived run directories as well as directories for runs that are currently in progress.
 
  
== archive ==
+
Most of the top-level run directories under sequencing/ are symlinks managed by the new symlink management system: http://github.com/phalenor/ssdfs
 
 
* /afs/bx.psu.edu/depot/data/schuster_lab/sequencing/archive/
 
** 454/
 
*** by_date/
 
*** flat/
 
*** by_project/
 
** illumina/
 
*** by_date/
 
*** flat/
 
*** by_project/
 
 
 
The sequencing archive directory contains a read-only archive of all sequencing runs, with a subdirectory for each sequencing platform.  Within each sequencing platform directory, runs are organized by the date the run was started.
 
 
 
Directory for the 454 R_2010_01_22_16_12_03_FLX11080447_John_ChineseChestnut_Run9 run.
 
<pre>
 
/afs/bx.psu.edu/depot/data/schuster_lab/sequencing/archive/454/by_date/2010/2010_01_22/R_2010_01_22_16_12_03_FLX11080447_John_ChineseChestnut_Run9
 
</pre>
 
 
 
Directory for the Illumina 100201_HWUSI-EAS610_0006 run.
 
<pre>
 
/afs/bx.psu.edu/depot/data/schuster_lab/sequencing/archive/illumina/by_date/2010/2010_02_01/100201_HWUSI-EAS610_0006
 
</pre>
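
The two by_date examples above suggest that the archive path can be derived from the run name alone. The following is a minimal sketch of that mapping, not the lab's actual tooling: the function name <code>by_date_path</code> and both regular expressions are hypothetical, inferred only from the two run names shown, and Illumina two-digit years are assumed to fall in 2000 or later.

```python
import re
from pathlib import PurePosixPath

ARCHIVE_ROOT = PurePosixPath(
    "/afs/bx.psu.edu/depot/data/schuster_lab/sequencing/archive"
)

# Run-name patterns inferred from the two examples above (an assumption).
_RUN_454 = re.compile(r"^R_(\d{4})_(\d{2})_(\d{2})_")   # R_YYYY_MM_DD_...
_RUN_ILLUMINA = re.compile(r"^(\d{2})(\d{2})(\d{2})_")  # YYMMDD_...


def by_date_path(run_name: str) -> PurePosixPath:
    """Return the by_date/ archive path for a 454 or Illumina run name."""
    m = _RUN_454.match(run_name)
    if m:
        platform = "454"
        year, month, day = m.groups()
    else:
        m = _RUN_ILLUMINA.match(run_name)
        if m is None:
            raise ValueError(f"unrecognized run name: {run_name!r}")
        platform = "illumina"
        yy, month, day = m.groups()
        year = "20" + yy  # assumption: all runs are from 2000 or later
    date_dir = f"{year}_{month}_{day}"
    return ARCHIVE_ROOT / platform / "by_date" / year / date_dir / run_name


if __name__ == "__main__":
    print(by_date_path("100201_HWUSI-EAS610_0006"))
```

Run names that match neither pattern raise <code>ValueError</code> rather than guessing a platform.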
 
 
 
 
 
In addition, each sequencing platform directory contains a <code>flat/</code> subdirectory that holds symbolic links to every run for that platform.
 
 
 
<code>flat/</code> directory for the 454 R_2010_01_22_16_12_03_FLX11080447_John_ChineseChestnut_Run9 run.
 
<pre>
 
/afs/bx.psu.edu/depot/data/schuster_lab/sequencing/archive/454/flat/R_2010_01_22_16_12_03_FLX11080447_John_ChineseChestnut_Run9
 
</pre>
 
 
 
<code>flat/</code> directory for the Illumina 100201_HWUSI-EAS610_0006 run.
 
<pre>
 
/afs/bx.psu.edu/depot/data/schuster_lab/sequencing/archive/illumina/flat/100201_HWUSI-EAS610_0006
 
</pre>
 
 
 
 
 
The <code>by_project/</code> directories can be used to organize all runs for a specific project.
 
  
For the chestnut project:
+
SSDFS is not currently capable of delegating commands to normal users, and requires some finesse when executing certain commands due to NFS permissions. Email admin-at-bx.psu.edu if something under sequencing/ doesn't look right.
<pre>
 
/afs/bx.psu.edu/depot/data/schuster_lab/sequencing/archive/illumina/by_project/chestnut/runs/100211_HWUSI-EAS610_0005
 
</pre>
 
  
== staging and temp ==
+
In short, datasets are organized into "volumes" distributed across two servers and multiple NFS filesystems. 454 datasets get one volume per month, and Illumina datasets get one volume per dataset. SSDFS fully abstracts the actual location of the volume (server/filesystem). If you need direct access to a volume, it will be under ''/afs/bx.psu.edu/depot/data/schuster_lab/.ssdfs/vol/by-name/<name>''
* /afs/bx.psu.edu/depot/data/schuster_lab/sequencing/staging/
 
** 454/
 
** illumina/
 
* /afs/bx.psu.edu/depot/data/schuster_lab/sequencing/temp/
 
** 454/
 
** illumina/
 
  
The sequencing staging and sequencing temp directories are used to hold sequencing runs that are still being processed (either the run is not finished, or signal processing is being done).
+
Layout:
 +
* /afs/bx.psu.edu/depot/data/schuster_lab/sequencing/
 +
** 454/ (all 454 runs)
 +
*** $YEAR/
 +
**** $YEAR/$YEAR-$MONTH/
 +
*** incoming/
 +
** illumina/ (all illumina runs)
 +
*** $YEAR/
 +
**** $YEAR/$RUN_NAME/
 +
*** Instruments/
 +
*** HWUSI-EAS610/
 +
*** incoming/
 +
**** illumina-4 -> /nfs/s2.persephone.bx.psu.edu/md1k-4-data/illumina/
 +
**** illumina-5 -> /nfs/s2.persephone.bx.psu.edu/md1k-5-data/illumina/
 +
**** illumina-6 -> /nfs/s2.persephone.bx.psu.edu/md1k-6-data/illumina/
 +
** scripts/ (various shell scripts, qsub scripts, etc.)
 +
*** 454/
 +
** support/
 +
*** 454/
 +
*** illumina/
  
== support ==
+
== support/ ==
 
* /afs/bx.psu.edu/depot/data/schuster_lab/sequencing/support/
 
** 454/
 
** illumina/
 
  
 
The sequencing support directory contains manuals, software, and reference genomes used during sequencing and sequence processing.
 
  
= Projects Directory =  
+
= projects/ =  
 
* /afs/bx.psu.edu/depot/data/schuster_lab/projects
 
  
 
The projects directory contains directories for project-specific data, including the Tasmanian Devil, KB1, and Woolly Mammoth projects.
  
= Users Directory =
+
= users/ =
 
* /afs/bx.psu.edu/depot/data/schuster_lab/users
 
  
 
The users directory contains directories for people within the Schuster Lab.
 

Latest revision as of 17:47, 4 October 2010

Schuster Lab Root Directory

  • /afs/bx.psu.edu/depot/data/schuster_lab

The Schuster Lab's data can be found in the /afs/bx.psu.edu/depot/data/schuster_lab directory.

The volume mounted at this location is RO-replicated. The main purpose of this root data.schuster_lab volume is to contain mount points to other volumes, as well as the symlink farm under sequencing/.

scratch/

Individual global scratch volumes. On request only (admin-at-bx.psu.edu).

sequencing/

  • /afs/bx.psu.edu/depot/data/schuster_lab/sequencing

Access to all of the Schuster Lab's sequencing data is through the sequencing directory. This directory contains archived run directories as well as directories for runs that are currently in progress.

Most of the top-level run directories under sequencing/ are symlinks managed by the new symlink management system: http://github.com/phalenor/ssdfs

SSDFS is not currently capable of delegating commands to normal users, and requires some finesse when executing certain commands due to NFS permissions. Email admin-at-bx.psu.edu if something under sequencing/ doesn't look right.

In short, datasets are organized into "volumes" distributed across two servers and multiple NFS filesystems. 454 datasets get one volume per month, and Illumina datasets get one volume per dataset. SSDFS fully abstracts the actual location of the volume (server/filesystem). If you need direct access to a volume, it will be under /afs/bx.psu.edu/depot/data/schuster_lab/.ssdfs/vol/by-name/<name>
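
Because the run directories are symlinks, it can be useful to see where one actually points, or to build the direct by-name path for a volume. The sketch below uses only the Python standard library and does not call SSDFS itself; the helper names <code>volume_path</code> and <code>resolve_run</code> are hypothetical, and real volume names are not documented here, so any name passed in is the caller's assumption.

```python
import os
from pathlib import PurePosixPath

LAB_ROOT = PurePosixPath("/afs/bx.psu.edu/depot/data/schuster_lab")


def volume_path(name: str) -> PurePosixPath:
    """Direct path to a volume, following the by-name pattern described above."""
    if not name or "/" in name:
        raise ValueError(f"invalid volume name: {name!r}")
    return LAB_ROOT / ".ssdfs" / "vol" / "by-name" / name


def resolve_run(run_dir: str) -> str:
    """Follow every symlink in run_dir and return the real location."""
    return os.path.realpath(run_dir)
```

Calling <code>resolve_run</code> on a run directory under sequencing/ shows which server and filesystem currently hold the data, without relying on that location staying fixed.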

Layout:

  • /afs/bx.psu.edu/depot/data/schuster_lab/sequencing/
    • 454/ (all 454 runs)
      • $YEAR/
        • $YEAR/$YEAR-$MONTH/
      • incoming/
    • illumina/ (all illumina runs)
      • $YEAR/
        • $YEAR/$RUN_NAME/
      • Instruments/
      • HWUSI-EAS610/
      • incoming/
        • illumina-4 -> /nfs/s2.persephone.bx.psu.edu/md1k-4-data/illumina/
        • illumina-5 -> /nfs/s2.persephone.bx.psu.edu/md1k-5-data/illumina/
        • illumina-6 -> /nfs/s2.persephone.bx.psu.edu/md1k-6-data/illumina/
    • scripts/ (various shell scripts, qsub scripts, etc.)
      • 454/
    • support/
      • 454/
      • illumina/
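
As an illustration of the layout above, the following sketch builds the expected directory for a 454 month and for an Illumina run. The function names are hypothetical, and the zero-padded two-digit month is an assumption; the layout only specifies $YEAR-$MONTH.

```python
from pathlib import PurePosixPath

SEQUENCING = PurePosixPath("/afs/bx.psu.edu/depot/data/schuster_lab/sequencing")


def month_dir_454(year: int, month: int) -> PurePosixPath:
    """Return 454/$YEAR/$YEAR-$MONTH for the given year and month."""
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range: {month}")
    # Assumption: months are zero-padded to two digits (e.g. 2010-01).
    return SEQUENCING / "454" / str(year) / f"{year}-{month:02d}"


def run_dir_illumina(year: int, run_name: str) -> PurePosixPath:
    """Return illumina/$YEAR/$RUN_NAME for the given year and run name."""
    if not run_name or "/" in run_name:
        raise ValueError(f"invalid run name: {run_name!r}")
    return SEQUENCING / "illumina" / str(year) / run_name


if __name__ == "__main__":
    print(month_dir_454(2010, 1))
    print(run_dir_illumina(2010, "100201_HWUSI-EAS610_0006"))
```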

support/

  • /afs/bx.psu.edu/depot/data/schuster_lab/sequencing/support/

The sequencing support directory contains manuals, software, and reference genomes used during sequencing and sequence processing.

projects/

  • /afs/bx.psu.edu/depot/data/schuster_lab/projects

The projects directory contains directories for project-specific data, including the Tasmanian Devil, KB1, and Woolly Mammoth projects.

users/

  • /afs/bx.psu.edu/depot/data/schuster_lab/users

The users directory contains directories for people within the Schuster Lab.