computing:department:unix:file_storage

Differences

This shows you the differences between two versions of the page.

computing:department:unix:file_storage [2013/05/09 22:09] allan
computing:department:unix:file_storage [2016/02/02 13:52] (current) – [sshfs] allan
Line 1: Line 1:
-<box 20% right right-clear red|More Information> +
-{{indexmenu>:computing:department:unix}} +
-</box>+
 ====== Data storage on Unix ======
  
-===== Home directory storage and backups =====+===== Home directories =====
  
 Your home directory on the Unix cluster has a usage quota, to avoid too much space being taken by individual users. The standard quota is normally somewhere around 10GB. You can see your current usage and quota by visiting [[https://www.physics.umn.edu/resources/myphys/|MyPhys]].
Line 10: Line 8:
 If you run into your quota, and after reviewing your file usage find that you still need more space, you can reply to the warning email to request a change from us. Please understand, however, that the amount of space available is limited. Your home directory is not intended for large research data sets, for which separate project-specific storage should be used.
  
-Your physics email is stored on a separate system, and is not part of the home directory quota.+**Home directories should not be used for data-intensive computations, such as input or output for condor batch jobs.**
  
-The home directories are backed up nightly. You can retrieve recently-deleted files from [[:computing:department:unix:backup|backup]] yourself.+The home directories are backed up nightly, and in addition filesystem "snapshots" are created every few hours (currently at 09:00, 12:00, 15:00, 18:00).
  
-===== Project Data directories =====+===== Shared project areas =====
  
-Other file systems are provided for research or project-specific data, under the **/data** hierarchy. This storage space is **purchased by the research group**. It can take the form of simple single drives in linux workstations, part of the shared research RAID pool, or dedicated RAID systems for large-scale storage needs.+If you need a shared area for a group or project where multiple people can develop or store code, we can create such an area and back it up for you. Such areas are made available under the ''/local'' hierarchy and are intended for relatively small-scale usage, such as shared program areas; large data sets should be stored in the research data areas.
  
-These file systems are usually named either after the research group, or with the name of the computer which hosts it, and contain further directories organized by user or by project. These areas should be used for large data sets and storage for local processes. Note that these areas are provided by the //automounter// - they are not activated until they are first accessed, so they won't necessarily appear in the output of commands like ''df''.+===== Research data storage =====
  
-**Warning** these areas are generally **not backed up**, other than by special request. If you have a large amount of data which requires backup, you should talk to us about the available options.+Other file systems are provided for research or project-specific data, under the **/data** hierarchy. This storage space is **purchased by the research group**. It can take the form of simple single drives in linux workstations (though this is discouraged), part of a shared research storage pool, or dedicated systems for large-scale storage needs.
  
-===== "Scratch" storage =====+These file systems are usually named either after the research group (for a fileserver volume), or with the name of the workstation which hosts it, and contain further directories organized by user or by project. These areas should be used for large data sets and storage for local processes. Note that these areas are provided by the //automounter// - they are not activated until they are first accessed, so they won't necessarily appear in the output of commands like ''df''.
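
As a rough illustration of the automounter behaviour described above, the following shell sketch accesses a directory first and only then asks ''df'' about it. The function name ''show_mount'' and the example path ''/data/mygroup'' are hypothetical and are not taken from this page; substitute the name of your own group's area.

```shell
# Hypothetical helper: access a path so the automounter activates it,
# then report the filesystem that backs it.
show_mount() {
    path=$1
    # Listing the directory is the "first access" that triggers the mount.
    ls "$path" >/dev/null 2>&1 || {
        echo "cannot access $path" >&2
        return 1
    }
    # -P requests the portable POSIX output format.
    df -P "$path"
}

# Example (assumed path; substitute your group's area):
#   show_mount /data/mygroup
```

Running plain ''df'' before the first access may not list the area at all, which is why the helper lists the directory before querying it.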
  
-You can find shared temporary filesystems under **/scratch**, which may be used for scratch space for local processes. Don't place any files you may want to keep long-term here - files which have not been accessed for 30 days or more may be purged from this area, or when a workstation is updated.+===== Backups =====
  
-Your scratch directory on the local system is always named **/scratch/local///username//**, and can be accessed using the environment variable ''SCRATCHDIR''. You can access non-local scratch directories using the path **/scratch///hostname/////username//**.+<note warning> 
 +  * Unix **home directories** are backed up daily. 
 +    * Older files may need to be recovered from snapshots or from tape. Complete tape backups are currently made monthly and retained for 6 months.
  
-If running jobs under Condor, the environment variable ''CONDOR_SCRATCH_DIR'' gives the name of a directory where the job may place temporary data files.+  * **Research data areas are not backed up** 
 +    * Data areas on our ZFS storage have nightly //snapshots// which are kept for 2 days - these can help you recover from accidentally deleted files. 
 +    * Linux RAID storage does not have snapshots. 
 +    * If you have critical research data which requires backup, this can be arranged for moderate data sizes (eg < 10TB). Please talk to us about the options. 
 +    * Local data drives in workstations are **never** backed up. 
 +</note> 
 + 
 +===== Local "scratch" storage ===== 
 + 
 +"Scratch" space is space that is not backed up, and generally only used for temporary storage. 
 + 
 +Your scratch directory on the local system is always named **/scratch/local///username//**, and can be accessed using the environment variable ''$SCRATCHDIR''. Since it's directly connected to the computer you're using, access to data on it is generally faster than to network storage (such as your home directory). This can make it a good choice for processing bulky data. However, don't place any files you may want to keep long-term here - files which have not been accessed for 30 days or more may be purged from this area, or when a workstation is updated. 
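
A minimal sketch of how the local scratch area might be used from a shell script, assuming ''$SCRATCHDIR'' is set as described above and falling back to the documented ''/scratch/local///username//'' path otherwise. The helper names ''scratch_base'' and ''make_workdir'', and the file name ''results.dat'', are hypothetical.

```shell
# Print the scratch base: $SCRATCHDIR if set, otherwise the documented
# /scratch/local/<username> path.
scratch_base() {
    printf '%s\n' "${SCRATCHDIR:-/scratch/local/${USER:-$(id -un)}}"
}

# Hypothetical helper: create a private working directory under a base
# directory and print its path.
make_workdir() {
    mkdir -p "$1" || return 1
    mktemp -d "$1/work.XXXXXX"
}

# Example usage (results.dat is a placeholder name):
#   work=$(make_workdir "$(scratch_base)")
#   cd "$work"            # process bulky data here
#   cp results.dat ~/     # copy anything worth keeping off scratch
#   rm -rf "$work"        # clean up; scratch may be purged anyway
```

Copying results out at the end matters because scratch is not backed up and files there may be purged.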
 + 
 +<note>When running jobs under Condor, the environment variable ''CONDOR_SCRATCH_DIR'' gives the name of a directory where the job may place temporary data files.</note>
  
 ===== Temporary directories =====
Line 36: Line 49:
  
  
-===== Remote access to file systems using sshfs ===== +===== Remote access to data ===== 
-New with scientific linux 5.x you can mount any filesystem you have ssh access to as a filesystem using fuse (the userspace filesystem driver). You can use this to access files from other systems outside of Tate Lab.+ 
 +The security model of Unix NFS is too weak to permit exporting it to any systems which are not part of the Physics cluster. Some alternative methods of transferring data include...
 + 
 +==== globus ==== 
 + 
 +Please see [[:computing:department:data:moving:globus]] 
 + 
 +==== sshfs ==== 
 + 
 +You can use //fuse// (the userspace filesystem driver) to mount any remote filesystem which you have ssh access to. Although this is not a high-performance solution, you can use this to access files from non-Physics linux systems. For example:
  
   mkdir ~/mnt  #create a place to put it, can be called anything you want.
computing/department/unix/file_storage.1368155360.txt.gz · Last modified: 2013/05/09 22:09 by allan