Farm April 2023 - 22.04 Upgrade Notes

The operating system that powers the Farm cluster has been upgraded to Ubuntu 22.04. There are a number of changes that users should be aware of. 

1. SSH to compute nodes is no longer permitted. You can get an interactive shell on a node in either of two ways:

  • To get a new job with a shell:
    • srun --partition=PartitionName --time=5:00:00 --ntasks=1 --cpus-per-task=4 --mem=1G --pty /bin/bash -l
  • To get a shell within an existing job:
    • srun --jobid=your-running-job_ID_here --pty /bin/bash -l

2. Previously installed modules now carry the prefix "deprecated/" and may or may not work correctly. The most heavily used modules have been reinstalled with a new system, so look for those first with module avail -l. If you need software installed, please email the Farm Help Desk at farm-hpc@ucdavis.edu. Please keep in mind that the request queue will be longer than usual while we smooth out issues resulting from the upgrade.

3. For job submission using Slurm, modules are now purged on nodes before the sbatch script runs. All required modules must be loaded in the sbatch file or manually re-loaded in an srun session.
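
Because modules are purged before the batch script starts, each script needs its own module load lines. The following is a minimal sketch, not an official template: it writes an example batch script to example_job.sh so the contents can be inspected before submission. The partition name, resource values, module name (examplemodule), and program name are all placeholders, not real Farm values.

```shell
# Write a hypothetical batch script; every name below is a placeholder.
cat > example_job.sh <<'EOF'
#!/bin/bash -l
#SBATCH --partition=PartitionName
#SBATCH --time=1:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=1G

# Modules are purged before this script runs, so load everything here.
module load examplemodule   # placeholder module name

./my_program                # placeholder for your actual command
EOF
```

Submit the result with sbatch example_job.sh. In an interactive srun shell, the same module load commands would be run by hand.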

4. Memory limits for jobs will now be enforced. For the past year, a bug in Slurm was preventing it from killing jobs when they used more than their requested memory. If your job exits with a JobState of OUT_OF_MEMORY or a Reason of OutOfMemory, or you see oom-kill event(s) in StepId=NNNNNNNN.batch, some of your processes may have been killed by the cgroup out-of-memory handler. The fix is to request more RAM with the --mem= argument and re-submit your job.
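
To decide how much more RAM to request, it can help to compare what a job asked for with what it actually used. The sketch below assumes Slurm accounting is available through sacct. The helper names job_mem_report and mem_with_headroom are hypothetical, and the 25% headroom is an arbitrary starting point rather than a Farm recommendation.

```shell
# Show requested memory (ReqMem) next to peak usage (MaxRSS) for a job.
# Requires Slurm accounting; pass the job ID as the first argument.
job_mem_report() {
    sacct -j "$1" --format=JobID,JobName,State,ReqMem,MaxRSS,Elapsed
}

# Hypothetical helper: take a peak usage in megabytes, add 25% headroom
# (an assumed margin), and print a matching --mem= argument.
mem_with_headroom() {
    peak_mb=$1
    echo "--mem=$(( peak_mb + peak_mb / 4 ))M"
}
```

For example, mem_with_headroom 4000 prints --mem=5000M, which can be passed to sbatch or srun when re-submitting. Note that sacct may report MaxRSS in kilobytes, so convert the value before using it here.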

5. The use of the /scratch/ partition on the login node has been deprecated. If you need to store data in /scratch/, run that work on compute nodes, all of which have /scratch/ partitions.
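
Inside a job on a compute node, one way to use /scratch/ is to give each job its own working directory. This is a sketch under assumptions: the helper name make_scratch_dir and the /scratch/username/jobid layout are illustrative only, not a documented Farm convention. SLURM_JOB_ID is the environment variable Slurm sets inside a job.

```shell
# Hypothetical helper: create a per-user, per-job directory under a
# scratch base (default /scratch) and print its path.
make_scratch_dir() {
    base=${1:-/scratch}
    dir="$base/${USER:-unknown}/${SLURM_JOB_ID:-manual}"
    mkdir -p "$dir" && echo "$dir"
}
```

A batch script could call workdir=$(make_scratch_dir), copy its input there, run the computation, copy results back to permanent storage, and then remove the directory.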

6. New accounts are now created via the High Performance Personnel Onboarding (HiPPO) web portal at https://hippo.ucdavis.edu/Farm/. HiPPO does not currently include accounts created before 4/14/23. If your sponsor is not listed, please open a support ticket with the Farm Help Desk at farm-hpc@ucdavis.edu requesting that your sponsor be added. Please include their name, email address, and the cluster you are trying to access.

7. Slurm jobs were reset as a result of the upgrade and will need to be re-submitted.

8. As part of the transition to the HPC Core Facility, farm.cse.ucdavis.edu has been renamed to farm.hpc.ucdavis.edu. The old DNS entry will remain usable during the transition.

9. The Farm head/login node now thinks of itself as farm.farm.hpc.ucdavis.edu. This is intentional. 

10. If you do not see your home directory or group/PI directory, the most likely cause is a mis-conversion by our automation system. Please open a ticket. 

 

Please let us know if you have questions by emailing farm-hpc@ucdavis.edu.