- Know the resources needed for your analysis
To balance the load on the file servers, we would like to know whose simulations perform many disk reads and writes. Reading and writing many small files reduces performance and affects the proper functioning of the cluster. We have seen cases where the cluster was not accessible because of excessive disk reads and writes. We have therefore removed file-serving responsibility from the master node and moved it to dedicated file servers.
If a simulation performs many disk reads and writes, the problem can often be solved by using the local scratch storage on the compute nodes. We can discuss solutions to these kinds of problems with you.
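As an illustration of the scratch-storage approach, here is a minimal sketch of a helper that copies the input to node-local scratch in one pass, runs the command there, and copies the results back in one pass. The function name run_on_scratch and the SCRATCH_BASE variable are hypothetical, and the /tmp default is an assumption: this document does not state the real scratch path, so ask us before using it.

```shell
# Sketch only: run an I/O-heavy command inside node-local scratch.
# SCRATCH_BASE is an assumed variable; /tmp is a placeholder default,
# not the confirmed scratch location on the compute nodes.
run_on_scratch() {
    local input_dir="$1"    # directory with input files (on the file server)
    local output_dir="$2"   # where the results should end up (on the file server)
    shift 2                 # the remaining arguments are the command to run

    local scratch_base="${SCRATCH_BASE:-/tmp}"
    local workdir
    workdir=$(mktemp -d "$scratch_base/${USER:-job}.XXXXXX") || return 1

    # One bulk copy in, instead of many small reads over the network.
    cp -r "$input_dir"/. "$workdir"/ || { rm -rf "$workdir"; return 1; }

    # Run the command inside scratch so its small reads and writes stay local.
    ( cd "$workdir" && "$@" )
    local status=$?

    # One bulk copy out (inputs included), then clean up the scratch directory.
    if mkdir -p "$output_dir" && cp -r "$workdir"/. "$output_dir"/; then
        rm -rf "$workdir"
    else
        echo "copy failed; results left in $workdir" >&2
        return 1
    fi
    return "$status"
}
```

Inside a job script this could be called as run_on_scratch "$HOME/project/input" "$HOME/project/results" ./my_simulation, where the paths and the program name are placeholders for your own.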
- Try to estimate how long your analysis will run
This is especially important when submitting jobs to shared resources and queues. We want everybody to have an equal chance to run programs on shared resources. A long job ties up its resources, which are then unavailable to other users for that time. We therefore strongly suggest not running jobs longer than 10 days on shared queues and resources. Use group-owned resources and queues for longer jobs. If we see systematic abuse of common resources, we will intervene.
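One simple way to get a rough estimate is to time a short test run and scale it up. The sketch below assumes the run time grows linearly with the number of steps, which is not true for every program, so treat the result as a first guess. The function name estimate_hours is hypothetical.

```shell
# Sketch only: extrapolate the run time of a full job from a short test run.
# Assumes run time scales linearly with the number of steps.
estimate_hours() {
    local test_seconds="$1"   # wall-clock seconds the short test took
    local test_steps="$2"     # number of steps in the short test
    local total_steps="$3"    # number of steps in the full job
    awk -v s="$test_seconds" -v n="$test_steps" -v N="$total_steps" \
        'BEGIN { printf "%.1f\n", s * N / n / 3600 }'
}

# To time a test run by hand (the program and its option are placeholders):
#   start=$(date +%s); ./my_simulation --steps 1000; end=$(date +%s)
#   estimate_hours $((end - start)) 1000 5000000
```

For example, estimate_hours 90 1000 5000000 prints 125.0, that is, about 5 days, which is within the 10-day suggestion for shared queues.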
- Number of jobs submitted to the main queue
We have set up the system to share resources fairly, taking into account the amount of hardware owned by each group. However, nothing replaces human common sense. Please use it when you submit jobs: consider the number of jobs you submit, their length, and the resources they require. If you are not sure about the resources required by your simulations, run tests first. Halting the operation of the cluster, whether through ignorance or knowingly, will not be tolerated. If you are not sure, please ask before doing something wrong.
Monitoring the number of jobs submitted to the main queue: use the command qstat -u [your unetname] | wc -l
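Note that wc -l counts every line qstat prints, including its header lines, so the number is slightly higher than the real job count. A minimal sketch of a filter that counts only the lines containing your user name as a separate column is shown below. The function name count_jobs is hypothetical, and the sketch assumes that qstat prints the user name in full; very long names may be truncated in the output and would then not match.

```shell
# Sketch only: count job lines in qstat output read from standard input.
# A line is counted when one of its whitespace-separated fields equals the
# given user name, so header and separator lines are skipped.
count_jobs() {
    awk -v user="$1" '
        { for (i = 1; i <= NF; i++) if ($i == user) { n++; break } }
        END { print n + 0 }
    '
}
```

On the cluster it would be used as qstat -u [your unetname] | count_jobs [your unetname].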
- Number of jobs submitted to privileged queues
There are no special privileges among the users of a research group; they are all equal. We therefore leave the fair sharing of resources within a group to that group. For example, one research project may require more computing resources than another at some point in time, and the roles may reverse later. This is a dynamic process, and it is better to leave these decisions to the groups. We will not take part in dividing resources among group members; this is the internal business of a group. A group can appoint a manager to be in charge of its own queue. Under special circumstances, we may act at a professor's request to suspend or kill jobs belonging to a group user running on special queues.
- Amount of data
Each group user will have an initial storage allocation on the order of 30 GB for their home folder. If you need more than this, we can discuss the details and find a solution. The home folder physically lives on the file servers. Remember, however, that we do not provide any data backup solution. The allocated storage space is meant for data production and analysis, not for permanent storage of important data. We highly recommend backing up your data to a safe place.
Monitoring disk usage: use the command quota -s
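quota -s reports the total but not where the space goes. As a complement, here is a minimal sketch that lists the largest entries directly under a folder, using the standard du, sort and head tools. The function name largest_dirs is hypothetical. Hidden files and folders (names starting with a dot) are not included by the * pattern.

```shell
# Sketch only: list the largest files and folders directly under a directory.
# Sizes are in kilobytes, largest first.
largest_dirs() {
    local dir="${1:-$HOME}"   # directory to inspect (defaults to the home folder)
    local count="${2:-10}"    # how many entries to show
    du -sk "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}
```

Calling largest_dirs with no arguments shows the ten largest entries in your home folder. Because du reads every file under the folder, run it occasionally rather than in a loop, so that it does not add to the load on the file servers.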