Firewalled Environment Storage Overview

From UCSC Genomics Institute Computing Infrastructure Information

 
Storage

Our servers mount two types of shared storage: home directories and group storage directories. Both are mounted over the network on all shared compute servers and the phoenix cluster, so any server you log in to will have these filesystems available:

Filesystem Specifications

  Filesystem          /private/home                     /private/groups
  Default Soft Quota  30 GB                             15 TB
  Default Hard Quota  31 GB                             15 TB
  Total Capacity      19 TB                             800 TB
  Access Speed        Very Fast (NVMe Flash Media)      Very Fast (NVMe Flash Media)
  Intended Use        This space should be used for     This space should be used for
                      login scripts, small bits of      large computational/shared
                      code or software repos, etc.      data, large software
                      No large data should be           installations and the like.
                      stored here.
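
On any of the compute servers you can confirm that both filesystems are mounted with a quick df (a generic check; the exact sizes and mount details will vary by host):

$ df -h /private/home /private/groups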

Home Directories (/private/home/username)

Your home directory is located at /private/home/username and has a 30 GB soft quota and a 31 GB hard quota. It is meant for small scripts, login data, or a git repo. Please do not store large data there or run large compute jobs against data in your home directory.
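
To see roughly how much of that home quota you are currently using, standard tools such as du work (a generic example; this reports space used, not the quota limits themselves):

$ du -sh /private/home/$USER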

Groups Directories (/private/groups/groupname)

The group storage directories are created per PI, and each group directory has a default 15 TB hard quota. For example, if David Haussler is the PI you report to directly, then the directory would be /private/groups/hausslerlab. Request access to that group directory and you will then be able to write to it. Each group directory is shared by the lab it belongs to, so be mindful of everyone's data usage and share the 15 TB available per group accordingly.

On the compute servers you can check your group's current quota usage with the 'getfattr' command. You can only check the quota of a group you are part of (you would be a member of the UNIX group of the same name). To check the quota usage of /private/groups/hausslerlab, for example, you would run:

$ getfattr -n ceph.dir.rbytes /private/groups/hausslerlab

getfattr: Removing leading '/' from absolute path names
# file: private/groups/hausslerlab
ceph.dir.rbytes="6522955553147"

That number is in bytes, so divide by 1,000,000,000,000 (10^12) to convert to terabytes: about 6.52 TB in this example. That is how much data is currently being used.
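
If you would rather not do the division by hand, the same attribute can be piped through numfmt for a human-readable figure (a sketch that assumes GNU coreutils' numfmt is available on the server):

$ getfattr --only-values -n ceph.dir.rbytes /private/groups/hausslerlab | numfmt --to=si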

To check the max quota limit, use this command:

$ getfattr -n ceph.quota.max_bytes /private/groups/hausslerlab

getfattr: Removing leading '/' from absolute path names
# file: private/groups/hausslerlab
ceph.quota.max_bytes="15000000000000"

And 15000000000000 divided by 1,000,000,000,000 is 15 TB.
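
Putting the two attributes together, a small shell snippet can report how close a group is to its limit (a sketch; it assumes you can read the attributes as described above, and the path is illustrative):

dir=/private/groups/hausslerlab
used=$(getfattr --only-values -n ceph.dir.rbytes "$dir")
limit=$(getfattr --only-values -n ceph.quota.max_bytes "$dir")
# Integer percentage of the hard quota currently in use.
echo "$dir: $(( 100 * used / limit ))% of quota used"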

Storage Quota Alerting

If you or others in your lab would like an automated alert when your /private/groups/labname quota reaches a certain percentage of its limit, we can set that up for you. Just email cluster-admin@soe.ucsc.edu with the following information:

1: Which directory you would like to watch quotas on (e.g. /private/groups/somelab)
2: What % full you would like an email alert at
3: What email addresses you want on the alert list

After setup, our alerting system will email everyone on that list every 4 hours until usage in that directory drops back below the alert threshold you chose. It is a bit noisy by design, which encourages folks to delete data in order to stop the alerts. Once the system notices that quota usage has dropped below the threshold, you will receive one final "OK" notification.

/data/scratch Space on the Servers

Each server will generally have a local /data/scratch filesystem that you can use to store temporary files. BE ADVISED that /data/scratch is not backed up, and the data there could disappear in the event of a disk failure or other problem. Do not store important data there; if it is important, move it somewhere else soon after creation.
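
A typical pattern is to create a per-job directory in scratch, work there, and copy only the results you want to keep back to group storage before cleaning up (a sketch; the job name, output file, and destination are illustrative):

# Stage temporary work on the fast local scratch disk.
workdir=$(mktemp -d /data/scratch/"$USER".myjob.XXXXXX)
cd "$workdir"
# ... run your job here, writing intermediate files into $workdir ...
# Copy anything worth keeping back to backed-up group storage, then clean up.
cp results.bam /private/groups/hausslerlab/
cd / && rm -rf "$workdir"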

Backups

/private/groups is backed up weekly on Friday nights (which usually takes several days to complete). Please note that the following directories in the tree WILL NOT be backed up:

tmp/
temp/
TMP/
TEMP/
cache/
.cache/
scratch/
*.tmp/

So if you have data that you know is not important and should be excluded from the backups, put it in a directory whose name ends in ".tmp", for example:

/private/groups/clusteradmin/mybams.tmp/
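
If you want to see which existing directories under your group share will be skipped by the backups, a find command along these lines will list them (a sketch matching the exclusion patterns above; adjust the path for your own lab):

$ find /private/groups/hausslerlab -type d \
    \( -name tmp -o -name temp -o -name TMP -o -name TEMP \
       -o -name cache -o -name .cache -o -name scratch -o -name '*.tmp' \)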