Computational Genomics Kubernetes Installation

From UCSC Genomics Institute Computing Infrastructure Information

The Computational Genomics Group has a Kubernetes Cluster running on several large instances in AWS. The current cluster makeup includes two worker nodes, each with the following specs:

* 96 CPU cores (3.1 GHz)
* 384 GB RAM
* 3.3 TB Local NVMe Flash Storage
* 25 Gb/s Network Interface 

Getting Authorized to Connect

If you require access to this Kubernetes cluster, contact Benedict Paten to request permission to use it, then forward that permission via email to:

cluster-admin@soe.ucsc.edu

Let us know which group you are with and we can authorize you to use the cluster in the correct namespace.

Authenticating to Kubernetes

We will authorize (authz) you to use the cluster on the server side, but you will also need to authenticate (authn) using your '@ucsc.edu' email address and a unique JSON Web Token (JWT). These credentials are installed in ~/.kube/config on whatever machine you are connecting from to reach the cluster.

To authenticate and get your base Kubernetes configuration, go to the URL below, which will ask you to authenticate with Google. Use your '@ucsc.edu' email address as the login. It will then ask you to authenticate via CruzID Gold if your web browser doesn't already have the authentication token cached:

https://cg-kube-auth.gi.ucsc.edu

Once you authenticate (via username/password and 2-factor auth for CruzID Gold), it will pass you back to the https://cg-kube-auth.gi.ucsc.edu website, which should confirm authentication at the top with the message "Successfully Authenticated". If you see any errors in red but are sure you typed your password and 2-factor auth correctly, click the link above (https://cg-kube-auth.gi.ucsc.edu) and authenticate a second time, which should work. There is a quirk where the web token doesn't always pass back to us correctly on the first try.

Upon success, you will be able to click the blue "Download Config File" button, which downloads your initial Kubernetes config file. Copy this file to your home directory as ~/.kube/config. Follow the directions on the web page to insert your "namespace:" line. We will let you know which namespace to use.
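
For reference, the "namespace:" line goes under the context entry in your kubeconfig. Here is a minimal sketch of just that section; the cluster, user, and context names shown are placeholders and will already be filled in with real values in the file you download:

contexts:
- name: cg-kube                  # context name comes from the downloaded file
  context:
    cluster: cg-kube             # cluster name comes from the downloaded file
    user: yourname@ucsc.edu      # your authenticated user entry
    namespace: your-namespace    # add the namespace we assign you here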

Testing Connectivity

Once your ~/.kube/config file is set up correctly, you should be able to connect to the cluster. All our shared servers here at the Genomics Institute have the 'kubectl' command installed on them, but if you are coming from somewhere else, just make sure the "kubectl" utility is installed on that machine.
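
If you need to install kubectl yourself, one common approach (taken from the upstream Kubernetes install documentation; swap linux/amd64 for your platform) is:

$ curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
$ chmod +x kubectl && sudo mv kubectl /usr/local/bin/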

A quick test should go as follows:

$ kubectl get nodes
NAME          STATUS   ROLES    AGE   VERSION
k1.kube       Ready    <none>   13h   v1.15.3
k2.kube       Ready    <none>   13h   v1.15.3
master.kube   Ready    master   13h   v1.15.3
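
If that works, you can also confirm that your namespace authorization is set up. This example assumes your assigned namespace is literally named 'your-namespace', so substitute the real one:

$ kubectl -n your-namespace get pods

An empty result is normal if you have not run anything yet; a "Forbidden" error from the server likely means the namespace in ~/.kube/config does not match the one we authorized you for.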

Running Pods and Jobs

When running jobs and pods on Kubernetes, you will always want to specify "requests" and "limits" on resources; otherwise your pods will get stuck with the default limits, which are tiny (to protect against runaway pods).

Here is a good example of a job file that specifies limits:

job.yml

apiVersion: batch/v1
kind: Job
metadata:
  name: $USER-$TS
spec:
  backoffLimit: 0
  ttlSecondsAfterFinished: 30
  template:
    spec:
      containers:
      - name: magic
        image: robcurrie/ubuntu
        imagePullPolicy: Always
        resources:
          requests:
            cpu: "1"
            memory: "2G"
            ephemeral-storage: "2G"
          limits:
            cpu: "2"
            memory: "3G"
            ephemeral-storage: "3G"
        command: ["/bin/bash", "-c"]
        args: ['for i in {1..100}; do echo "$i: $(date)"; sleep 1; done']
      restartPolicy: Never
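
The $USER and $TS placeholders in the job name must be replaced with real values before the file is submitted, since kubectl does not expand environment variables itself. A minimal sketch of one way to do this, assuming the envsubst utility is available (this page does not prescribe a particular substitution method):

$ export TS=$(date +%s)                     # unique suffix so repeated submissions don't collide
$ envsubst < job.yml | kubectl create -f -  # fill in $USER/$TS, then submit the job
$ kubectl get jobs                          # watch for COMPLETIONS to reach 1/1
$ kubectl logs job/$USER-$TS                # view the echoed output from the pod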

View the Cluster's Current Activity

You can view the cluster's current resource consumption with our Ganglia cluster monitoring tool:

https://ganglia.gi.ucsc.edu/

That website requires a username and password:

username: genecats
password: KiloKluster

That's mostly for keeping the script kiddies and bots from banging on it.

Once you get in, you should see a drop-down menu near the top left of the screen near "Genomics Institute Grid". From the drop-down menu, select "CG Kubernetes Cluster". It will take you to a page detailing the current resource usage and activity on the nodes. This can be useful for seeing if anyone else is using the whole cluster, or just for getting an idea of how many resources are available for your batch of jobs.