Electrical and Computer Engineering Department

ECE PROCESSING GUIDELINES

 

By Trevor Ellermann

 

Introduction:

 

Machines:

            ece2 and ece3

            Timosa and Tiveza

            Shell

            Draco

            Other machines

 

Efficient processing:

            Caching and IO

            Batch processing

            Nice (command)

            Matlab

 

Scratch space:

            /tmp and /var/tmp

            /scratch

            Backups

 

Introduction:

            ECE has limited computing resources, yet the nature of the Electrical and Computer Engineering disciplines creates a great demand for computing within the department. This demand continues to grow and has led to noticeable misuse of the available computing resources.

In particular, because of this misuse, some individuals end up using far more of the computing resources than others. This can cause major problems and interferes with research, graduate, and undergraduate studies. However, I believe that most of this misuse is accidental and stems from a lack of knowledge about how to use the resources fairly and efficiently. I have written these pages to help educate those using the ECE resources and to help eliminate the problems that ECE is experiencing.

If you have any questions about anything contained here, please do not hesitate to email me at trevor@ece.arizona.edu. I do not claim to be an expert on computing efficiency but I do have a fair amount of experience and feel that I have a good understanding of the issues involved. If you find any inaccuracies please be sure to let me know.

 

Machines:

            ece2 and ece3

 

            Both of these machines are dual-processor machines and are intended to be the core application processing machines for the Department. All users are allowed to run up to two resource-intensive jobs at a time on either of these machines for any amount of time. The machines occasionally have to go down for maintenance. Consequently, if you will be running a very long job (over a week), you may want to check with help@ece.arizona.edu to make sure your job will not be killed by a scheduled downtime.

If you are running Matlab, please see the special note about it in the next section.

 

For more information and specs on these machines please visit:

 

http://ece.arizona.edu/csg/systems/ece2.shtml

http://ece.arizona.edu/csg/systems/ece3.shtml

 

            Timosa (208) and Tiveza (204)

 

            These two machines are also dual-processor machines. However, they are a shared resource among all the SunRay terminals in the teaching labs that they service, and they are intended to handle only desktop sessions. Because these are shared desktop resources, users are not allowed to run more than one resource-intensive job at a time. Also, jobs on these machines may not be left running for a long period of time (more than a day). Please use ece2 and ece3 for long-term processing.

If you have a question about whether or not it is OK to run your process on these machines, please do not hesitate to email help@ece.arizona.edu and ask. 

If you are running Matlab, please see the special note about it in the next section.

 

For more information and specs on these machines please visit:

 

http://ece.arizona.edu/csg/systems/tiveza.shtml

http://ece.arizona.edu/csg/systems/timosa.shtml

 

            Shell (shellfish, shelltoe)

 

            Shell is a cluster of two single-processor machines. These machines are intended for fast email, personal web updates, and shell access. No processor-intensive jobs may be run on these machines for more than a few moments. They have lower specs than ece2/ece3 but, because of the low load kept on them, are frequently faster for shell/email access. Processor-intensive jobs, including Matlab, will be killed on these machines.

 

For more information and specs on these machines please visit:

 

http://ece.arizona.edu/csg/systems/shell.shtml

 

            Draco

 

            Draco is a cluster of eight machines that has been built to teach cluster computing. While you can run one processor-intensive job on them for any amount of time, it is not recommended that you do so because of the specs of the machines. Unless you are using MPI, it is recommended that you use ece2/ece3 for computing or shell for email/shell access.

 

If you are running Matlab, please see the special note about it in the next section.

 

For more information and specs on these machines please visit:

 

http://ece.arizona.edu/csg/systems/draco.shtml

 

            Other machines

 

            There are many other computers that exist in the Department, but NONE of them are open to general student use. All SOLARIS machines, other than those specifically mentioned above, are owned by a specific faculty member or associated research group or are administrative machines.

Using them for any reason without permission of the owner is strictly prohibited.

Using a machine that you do not have permission to use will result in account suspension and may even result in administrative discipline. Using other people’s computing resources is akin to stealing and will be dealt with severely. Your account may be disabled without warning if you are caught accessing any computing resource without authorization.

This is becoming a major problem. The Computer Systems Group and the Faculty are serious about stopping such misuse of Departmental computing resources and violations of Departmental policies.

If you have any questions about this issue, please contact help@ece.arizona.edu.


Efficient processing:     

 

            Caching and IO

 

            We have noticed a common trend in the Department of users running many jobs at once. This is a terrible waste of resources and is very inconsiderate of other users on the systems. Below, I briefly discuss the two primary reasons why this wastes computing resources.

            First, the efficiency booster of modern CPUs is the cache. The cache is a small amount of very fast memory kept close to the CPU that stores data commonly needed by a process. In extreme circumstances, a cache may improve the efficiency of a CPU by over 100%. Modern systems do a very good job of predicting and keeping what should be stored in the cache. This caching works partially on a per-process basis. However, only one intensive process may run on a given processor at any given time. When the processor switches to another process, the cached data is of little use to the new process and the cache has to be reloaded. This is normal and necessary, but very slow, so one wants to minimize how often it happens. If you are running, for example, 10 CPU-intensive jobs on one system at a time, cache reloading happens quite frequently. If you run a large enough number of jobs at a time on one processor, you basically negate the usefulness of the cache and, consequently, greatly reduce the efficiency of the processor. In fact, running many CPU-intensive jobs at the same time on a single processor can, in extreme circumstances, take more than twice as long as running them all back-to-back, one at a time. Running jobs back-to-back is commonly called batch processing and is discussed further below.

            Second, waiting for IO (input or output) can greatly reduce the efficiency of a processor. IO goes hand-in-hand with caching. The time a CPU spends waiting for data to complete an IO step to internal (cache, RAM) or external (disk) memory is time not available for computing. As the number of IO-intensive processes on a system increases, the time the CPU spends waiting for IO increases and the IO itself becomes slower. Again, this means that running many jobs at once can be much slower than running them sequentially as a batch job.

            These two issues can greatly reduce the efficiency of a processor. These issues, together with the need to share CPU time fairly with other users, are why each user is allowed only a limited number of processes on a system. To enforce this, users running more than the allowed number of processes will have their most recent processes killed. Repeat offenders may have their accounts suspended and will be referred to an administrative review.

 

            Batch processing

 

            Above I discussed why it is inefficient to run many jobs at the same time on any given system.  This does not mean that you cannot complete all the processing that you need to have done. Batch processing is a solution.

Batch processing is basically running jobs sequentially, i.e., one job after another, until all the jobs are finished. Batch processing can be accomplished simply by writing a shell script with each process on a separate line. A simple shell script might just look like:

 

#!/bin/sh

program datafile1

program datafile2

program datafile3

 

Write these lines to a file, batch.sh, and then give the file execute permission. For example:

 

chmod u+x batch.sh

 

Once this is accomplished, simply run the script by typing

./batch.sh

 

followed by a carriage return.  This simple example of a shell script running a set of programs can be readily extended to more complex batch processing procedures including, for instance,  creating new directories, storing computed data to those directories, etc.
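
As one possible extension of the simple script above, the following sketch gives each job its own results directory and stops at the first failure. The names PROGRAM and WORKDIR and the results layout are assumptions made for illustration, and `wc -c` merely stands in for a real job so that the sketch runs on its own.

```sh
#!/bin/sh
# Batch sketch: run one job at a time, each with its own output directory.
# PROGRAM and WORKDIR are hypothetical names; "wc -c" is a stand-in for a real job.
PROGRAM=${PROGRAM:-"wc -c"}
WORKDIR=${WORKDIR:-$(mktemp -d)}

# Create three small stand-in input files so the sketch is self-contained.
for n in 1 2 3; do
    echo "sample data $n" > "$WORKDIR/datafile$n"
done

for input in "$WORKDIR"/datafile*; do
    name=$(basename "$input")
    outdir="$WORKDIR/results/$name"
    mkdir -p "$outdir"
    # Jobs run strictly one after another; stop at the first failure.
    $PROGRAM "$input" > "$outdir/output.txt" 2> "$outdir/errors.txt" || {
        echo "job failed for $name" >&2
        exit 1
    }
done
echo "all jobs finished; results are in $WORKDIR/results"
```

Because child processes inherit the nice value of their parent, starting the whole script under nice (for example `nice -n 11 ./batch.sh` in sh, ksh, or bash) runs every job in it at that nice.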

 

If you need help with writing batch jobs, please do not hesitate to email help@ece.arizona.edu

 

            Nice (command)

 

            There are two types of system priorities on a SOLARIS system. One is simply called the priority. For example, a process with a priority of 0 will have the highest priority on the system. The system decides the priority of a process based on a number of things. One is a user-defined priority called a nice.

A nice is exactly what it sounds like; it makes users play nice on a system. The lower the nice number of a job the higher the priority. A nice can be set from -20 to 19. All user jobs start with a nice of 0.  A user can raise the nice of a job, thus giving it a lower priority. Only administrators can lower the nice of a process, hence, raise its priority. Giving a process a higher nice does not mean that it will not be processed nor does it necessarily mean that it will take longer to run. Even if a process is niced to 19, it will be able to use 100% of the CPU if all of the CPU is available.

Students are required to run all CPU-intensive jobs on an ECE system at a nice of 11.

Higher priority is given to small shell/email processes to ensure that they run reasonably fast.  To set the nice of your job, start the job with the nice command. For example, to run your job “process” with nice set to +11 from csh or tcsh, you would type

 

nice +11 process

 

In sh, ksh, or bash, the equivalent is

 

nice -n 11 process

 

If you already have a process running, you can change its nice by using the command renice. This is done as follows.

 

renice 11 <pid>

 

where <pid> is the process ID of the job. If you need help finding the process ID of a job, please email help@ece.arizona.edu
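
For a job started in the background from an sh-style shell, the process ID is available in `$!`. The sketch below is only an illustration of that idea: `sleep 30` is a placeholder for a real job, JOB_PID and JOB_NICE are made-up variable names, and the `ps` options used to read back the nice value are an assumption that may not hold on every system.

```sh
#!/bin/sh
# "sleep 30" is a placeholder for a real CPU-intensive job.
sleep 30 &
JOB_PID=$!          # $! holds the process ID of the most recent background job
echo "started job with process ID $JOB_PID"

# Raise the nice of the running job to 11 (same form as the renice line above).
renice 11 "$JOB_PID"

# Report the nice value; ps option support varies, so treat this step as optional.
JOB_NICE=$(ps -o nice= -p "$JOB_PID" 2>/dev/null | tr -d ' ')
echo "job $JOB_PID now has nice ${JOB_NICE:-unknown}"

kill "$JOB_PID"     # stop the placeholder job
```

For jobs that are already running, a listing such as `ps -u yourusername -o pid,nice,comm` should show the process ID and nice of each of your processes, assuming your system's ps accepts these options.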

 

If you do not start a job with a nice of 11, the CSG staff has the authority to renice it. Users who repeatedly fail to renice jobs after being asked to do so may have their accounts suspended and will be referred to an administrative review.

 

 

Matlab

 

            Because Matlab is the most common processor-intensive application used on most ECE systems, I want to clarify a few things about your responsibilities when using it.

First, if you are going to use it for intensive processing, Matlab must be run at a nice of 11. Also, your Matlab processes must follow the guidelines on the maximum number of jobs per system. This means no more than one at a time on timosa, tiveza, or any one of the draco machines, and no more than two at a time on ece2 and ece3.

Second, Matlab may run away when closed improperly. You MUST quit Matlab before logging out of a machine. If you do not, it runs away and consumes a lot of the resources on whatever machine it is running on. As a result, the CSG ends up having to kill a lot of Matlab processes that it believes to be runaways, and wastes time guessing whether a Matlab process is a runaway or not. The CSG identifies a runaway by checking whether the process has a controlling terminal: a process without one is presumed to be a runaway. There are very few special circumstances in which a normal Matlab job does not have a controlling terminal.

If you need to run Matlab without a controlling terminal for any reason, you must email help@ece.arizona.edu; otherwise, the CSG has the authority to kill any Matlab job that it thinks is a runaway.
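
The CSG's own tooling is not described here, but the controlling-terminal check can be illustrated with a rough sketch. It assumes that ps prints "?" in the TTY column for a process without a controlling terminal; the function name list_no_tty_matlab and the sample process list are made up for illustration.

```sh
#!/bin/sh
# list_no_tty_matlab: read "PID TTY COMMAND" lines on stdin and print the PID and
# command of Matlab processes whose TTY column is "?" (no controlling terminal).
list_no_tty_matlab() {
    awk '$2 == "?" && tolower($3) ~ /matlab/ { print $1, $3 }'
}

# Made-up sample in the layout of "ps -o pid= -o tty= -o comm=".
SAMPLE='1201 pts/4 matlab
1377 ? matlab
1402 ? sshd
1519 pts/7 tcsh'

DETACHED=$(printf '%s\n' "$SAMPLE" | list_no_tty_matlab)
echo "$DETACHED"
```

To check your own processes before logging out, the same filter could be fed from something like `ps -u "$(id -un)" -o pid= -o tty= -o comm=`, provided your system's ps supports those options.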

 

 

Scratch space:

 

            /tmp and /var/tmp

 

            These areas on a given system are intended for very short-term use and only for small files. The directory /tmp shares its space with swap, so files left there use up system resources, and it is generally cleared on reboot. It should only be used for very short-term purposes, such as a program writing to a temp file while it is running. Any program that writes to /tmp should clean up the file when the program exits. The directory /var/tmp is on the root file system and gets deleted whenever space is needed on the system. Neither of these directories is intended for long-term scratch space. There really is no good reason for students to use these directories for long periods of time (more than a day).

 

The CSG has the authority to purge these directories without notice as necessary.
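
One common way for a shell script to clean up after itself is an EXIT trap. The following is a minimal sketch of that pattern, not a required method; the "myjob" prefix, the file names, and the variable JOB_TMPDIR are placeholders, and it assumes a mktemp command is available.

```sh
#!/bin/sh
# Private temp directory; the "myjob" prefix is a placeholder name.
JOB_TMPDIR=$(mktemp -d "${TMPDIR:-/tmp}/myjob.XXXXXX") || exit 1

# The parenthesised subshell stands in for one program run.
(
    # Remove the temp directory when this run exits, normally or with an error.
    trap 'rm -rf "$JOB_TMPDIR"' EXIT
    echo "intermediate data" > "$JOB_TMPDIR/work.dat"
    wc -c < "$JOB_TMPDIR/work.dat" > "$JOB_TMPDIR/size.txt"
)

# By this point the trap has already cleaned up.
if [ -e "$JOB_TMPDIR" ]; then
    echo "temp directory still present: $JOB_TMPDIR"
else
    echo "temp directory removed"
fi
```

In a real script the trap would normally be set once near the top, so that the temp files are removed however the script ends.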

 

            /scratch

 

            The directory /scratch is a special space available only on ece3. It is a disk dedicated to scratch space for users. There is no space limitation on it, but we expect users to be reasonable. The CSG does browse through the files on it to make sure it is being used appropriately.  This directory is NOT BACKED UP!  Some guidelines for using /scratch include the following.

All data must be kept in a directory with the same name as your username. This directory must have 700 level permissions. Thus, if you want to use /scratch, you would type the following:

 

mkdir /scratch/myusername

chmod 700 /scratch/myusername

 

            This space is NOT for long term storage. It should only be used for ECE projects and should be promptly deleted after the project is completed. Inappropriate use will result in the immediate loss of access to /scratch. If the CSG notices that there are problems developing with continued inappropriate usage of /scratch, the CSG has the authority to implement quotas for the offenders.
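
The two setup commands above can be wrapped in a small script that also confirms the resulting permissions. This is a sketch under assumptions: SCRATCH_ROOT, MYDIR, and PERMS are made-up variable names, and the fallback to a throwaway directory exists only so the sketch can be tried on a machine that has no /scratch.

```sh
#!/bin/sh
SCRATCH_ROOT=${SCRATCH_ROOT:-/scratch}
# Away from ece3 there is no /scratch; fall back to a throwaway directory
# so the sketch still runs.
if [ ! -d "$SCRATCH_ROOT" ] || [ ! -w "$SCRATCH_ROOT" ]; then
    SCRATCH_ROOT=$(mktemp -d)
fi

# The directory is named after the current username, as the guidelines require.
MYDIR="$SCRATCH_ROOT/$(id -un)"
mkdir -p "$MYDIR"
chmod 700 "$MYDIR"

# Confirm that only the owner has access (expect drwx------).
PERMS=$(ls -ld "$MYDIR" | cut -c1-10)
echo "$MYDIR has permissions $PERMS"
```

When the project is finished, removing the directory (for example `rm -r /scratch/myusername`) returns the space to other users.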

 

            Backups

 

            Unless otherwise communicated, a user’s home directory and mail spool are the only areas on a system that are officially backed up. NOTE: no data outside of your home directory or email is backed up. This means that no scratch space is backed up. These backups are intended to protect us from catastrophic events that may cause a major loss of data. They are NOT intended to protect users from accidentally deleting files or emails, and deleted files or emails may not be restored. Please keep this in mind when deleting files or emails.

 

            If you have questions or comments about anything contained in this document, please do not hesitate to email help@ece.arizona.edu and let us know.

 

Phone:(520)621-2434     Fax:(520)621-8076
Electrical and Computer Engineering Department, The University of Arizona
P.O. Box 210104     1230 E. Speedway     Tucson, AZ 85721-0104