ECE PROCESSING GUIDELINES

By Trevor Ellermann

Contents: ece2, ece3 and Draco; Timosa and Tiveza; shell; Other; Number of jobs and caching; Batch processing; Nice; Matlab; /tmp and /var/tmp; /scratch; Backups

ECE has limited computing resources. On the other hand, because of the nature of the Electrical and Computer Engineering disciplines, there is a great demand for computing resources within the department. This demand continues to grow and has caused noticeable misuse of the available computing resources. In
particular, because of the misuse, some individuals end up using a lot more of
the computing resources than others. This can cause major problems and
interfere with research, graduate, and undergraduate studies. However, I
believe that much of this misuse is accidental and due to a lack of
knowledge about how to use the resources fairly and efficiently. I have written
these pages to help educate those using the ECE resources and to help eliminate
the problems that ECE is experiencing.

If
you have any questions about anything contained here, please do not hesitate to
email me at trevor@ece.arizona.edu. I do not claim to be an expert on computing
efficiency but I do have a fair amount of experience and feel that I have a
good understanding of the issues involved. If you find any inaccuracies please
be sure to let me know.

ece2 and ece3

These machines are both Dual Processor machines and are intended to be the core application processing machines for the Department. All users are allowed to run up to 2 resource intensive jobs at a time on either of these machines for any amount of time. They occasionally have to go down for maintenance. Consequently, if you will be running a very long job (over a week), you may want to check with help@ece.arizona.edu to make sure your job will not be killed by a scheduled downtime.

If you are running Matlab, please see the special note about it in the Matlab section below.

For more information and specs on these machines please visit:
http://ece.arizona.edu/csg/systems/ece2.shtml
http://ece.arizona.edu/csg/systems/ece3.shtml

Timosa(208) and Tiveza(204)

These 2 machines are also Dual
Processor machines. However, they are a shared resource among all the SunRay
terminals in the teaching labs that they service; and they are intended to
handle only desktop sessions. Because these are shared desktop resources, users
are not allowed to run more than 1 resource intensive job at a time. Also, jobs
that are run on these machines may not be left running for a long period of
time (more than a day). Please use ece2 and ece3 for long term processing.
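
If you are not sure whether you still have jobs running on one of these machines, one quick check is to list your own processes along with how long each has been running. The following is only a minimal sketch: it assumes a POSIX-style ps that accepts the -u and -o options, and the function name my_jobs is illustrative, not something provided by the CSG.

```sh
#!/bin/sh
# Assumption: the login name is available in LOGNAME or USER; fall back to id.
me="${LOGNAME:-${USER:-$(id -un)}}"

# List this user's processes with PID, elapsed time, CPU share, and command.
# Elapsed time is printed as [[dd-]hh:]mm:ss.
my_jobs() {
    ps -u "$me" -o pid,etime,pcpu,comm
}

my_jobs
```

An elapsed time that begins with a day count, such as 1-02:30:00, marks a job that has been running for more than a day.
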

If you have a question about whether or not it is OK to run your process on these machines, please do not hesitate to email help@ece.arizona.edu and ask.

If running Matlab, please see the special note about it in the Matlab section below.

For more information and specs on these machines please visit:
http://ece.arizona.edu/csg/systems/tiveza.shtml
http://ece.arizona.edu/csg/systems/timosa.shtml

Shell (shellfish, shelltoe)

Shell is a cluster of 2 single processor machines. These machines are intended for fast email, personal web updates, and shell access. No processor intensive jobs may be run on these machines for more than just a few moments. They have lower specs than ece2/ece3 but, because of the low load kept on them, they are frequently faster for shell/email access. Processor intensive jobs, including Matlab, will be killed on these machines.

For more information and specs on these machines please visit:
http://ece.arizona.edu/csg/systems/shell.shtml

Draco

Draco is a cluster of 8 machines
that has been built to teach cluster computing. While you can run 1 processor
intensive job on them for any amount of time, it is not recommended that you do so
because of the specs on the machines. Unless you are using MPI, it is
recommended that you use either ece2/ece3 for computing or shell for
email/shell access.

If running Matlab, please see the special note about it in the Matlab section below.

For more information and specs on these machines please visit:
http://ece.arizona.edu/csg/systems/draco.shtml

Other machines

There are many other computers that
exist in the Department, but NONE of them are open to general student use. All
SOLARIS machines, other than those specifically mentioned above, are owned by a
specific faculty member or associated research group or are administrative
machines.

Using them for any reason without permission of the owner is strictly prohibited. Using a machine that you do not have permission to use will result in account suspension and may even result in administrative discipline. Using other people’s computing resources is akin to stealing and will be dealt with severely. Your account may be disabled without warning if you are caught accessing any computing resource without authorization. This is becoming a major problem. The Computer Systems Group and the Faculty are serious about stopping such misuses of the Departmental Computing Resources and Policies. If you have any questions about this issue, please contact help@ece.arizona.edu.

Caching and IO

We have noticed a common trend in
the Department of users running many jobs at once. This is a terrible waste of
resources and is very inconsiderate of other users on the systems. Below, I
will discuss briefly 2 primary reasons why this is a waste of computing
resources.

First, a major efficiency booster of
modern CPUs is the cache. The cache is a small amount of very fast memory kept
close to the CPU that stores data commonly needed by a process. In extreme circumstances, a cache may
improve the efficiency of a CPU by over 100%. Modern systems do a very good job
of keeping and predicting what should be stored in cache on a CPU. This caching
is achieved partially on a per process basis. However, only one intensive
process may run on a given processor at any given time. When the processor
switches to a different process, the cache is flushed and has to be reloaded for the new process. This
is normal and necessary, but very slow.
One would want to minimize how often it happens. If you are running, for
example, 10 CPU intensive jobs on one system at a time, cache reloading happens
quite frequently. If you run a large enough number of jobs at a time on one
processor, you basically negate the usefulness of the cache and, consequently,
you can greatly reduce the efficiency of the processor. In fact, running many
CPU intensive jobs at the same time on a single processor can, in extreme
circumstances, take more than twice as long to run in comparison to running
them all back-to-back one at a time. Running jobs back-to-back is commonly
called batch processing and is discussed further below.

Second, waiting for IO (Input or
Output) is a major item that can greatly reduce the efficiency of a processor.
IO goes hand-in-hand with caching. The time a CPU spends waiting for a bit of
data to complete an IO step to internal (cache, RAM) or external (disk) memory
is time not available for computing. As
the number of IO intensive processes on a system increases, the time the CPU
spends waiting for IO increases and, hence, the IO becomes slower. Again,
this means that running many jobs at once can be much slower than running them
sequentially as a batch job.

These 2 issues can greatly reduce
the efficiency of a processor. These issues, together with simply sharing CPU time fairly
with other users, are why one is allowed only a limited number of processes on a
system. To enforce this, users running more than the allowed number of processes
will have the most recent processes killed. Repeat offenders may have
their accounts suspended and will be referred to an administrative review.

Batch processing

Above I discussed why it is inefficient to run many jobs at the same time on any given system. This does not mean that you cannot complete all the processing that you need to have done. Batch processing is a solution. Batch processing is basically running jobs sequentially, i.e., one job after another, until all the jobs are finished. Batch processing can be accomplished simply by writing a shell script with each process on a separate line. A simple shell script might just look like:

    #!/bin/sh
    # Each line runs only after the previous one has finished.
    program datafile1
    program datafile2
    program datafile3

Simply write these lines to a file, batch.sh, and then give this file execute permission. An example of how to give this file execute permission is simply

    chmod u+x batch.sh

Once this is accomplished, simply run the script by typing

    ./batch.sh

followed by a carriage return. This simple example of a shell script running a set of programs can be readily extended to more complex batch processing procedures including, for instance, creating new directories, storing computed data to those directories, etc.

If you
need help with writing batch jobs, please do not hesitate to email
help@ece.arizona.edu.

Nice (command)

There are two types of system priorities on a SOLARIS system. One is simply called the priority. For example, a process with a priority of 0 will have the highest priority on the system. The system decides the priority of a process based on a number of things. One is a user-defined priority called a nice. A nice is exactly what it sounds like; it makes users play nice on a system. The lower the nice number of a job, the higher the priority. A nice can be set from -20 to 19. All user jobs start with a nice of 0. A user can raise the nice of a job, thus giving it a lower priority. Only administrators can lower the nice of a process and, hence, raise its priority. Giving a process a higher nice does not mean that it will not be processed, nor does it necessarily mean that it will take longer to run. Even if a process is niced to 19, it will be able to use 100% of the CPU if all of the CPU is available.

Students are required to run all CPU intensive jobs on an ECE system at a nice of 11. Higher priority is given to small shell/email processes to ensure that they run reasonably fast.

To set the nice of your job, you start the job with the nice command. For example, to run your job "process" with nice set to +11, you would type

    nice -n 11 process

(in csh or tcsh, the built-in nice takes the form nice +11 process).

If you already have a process running, you can change its nice by using the command renice. This is done as follows.

    renice 11 <pid>

where <pid> is the process ID of the job. If you need help finding the process ID of a job, please email help@ece.arizona.edu.

If you do not start a job with a nice of 11, the CSG staff has the authority to renice it. People who repeatedly do not renice jobs after being asked to do so may have their accounts suspended and will be referred to an administrative review.

Matlab

Because it is the most common
processor intensive application used on most ECE systems, I want to clarify a
few things about your responsibility using Matlab.

First, if you are going to use it for intensive processing, Matlab must be run at a nice of 11. Also, your Matlab processes must follow the guidelines on the maximum number of jobs per system. This means no more than one at a time on timosa, tiveza or any one of the draco machines and no more than two at a time on ece2 and ece3.

Second, Matlab may run away when closed improperly. You MUST quit Matlab before logging out of a machine. If you do not, it runs away and consumes a lot of the resources on whatever machine it is running on. As a result, the CSG ends up having to kill a lot of Matlab processes that it believes to be runaways. The CSG wastes its time guessing whether a Matlab process is a runaway or not.
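
Before logging out, you can check for leftover Matlab processes yourself. The sketch below lists your processes with their nice value and controlling terminal; a ? in the terminal column means the process has no controlling terminal. It assumes a POSIX-style ps, and the function names and the search pattern "matlab" are only illustrative (the actual command name on a given system may differ).

```sh
#!/bin/sh
# Assumption: the login name is available in LOGNAME or USER; fall back to id.
me="${LOGNAME:-${USER:-$(id -un)}}"

# Show PID, nice value, controlling terminal, and command for this user.
my_procs_with_tty() {
    ps -u "$me" -o pid,nice,tty,comm
}

# Print only the lines that mention the given pattern (case-insensitive).
# Returns non-zero when nothing matches.
find_procs() {
    my_procs_with_tty | grep -i "$1"
}

find_procs matlab || echo "no matching processes found"
```

The first column of any matching line is the process ID, which is the value that renice expects.
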
The way the CSG can tell if a process is a runaway is if it does not have a controlling terminal. There are very few special circumstances where normal Matlab jobs do not have a controlling terminal. If you need to run it without a controlling terminal for any reason, you must email help@ece.arizona.edu first; otherwise, the CSG has the authority to kill any Matlab job that it thinks is a runaway.

/tmp and /var/tmp

These
areas on a given system are intended for very short-term use and only for small
files. The directory /tmp shares the system's swap space, so files stored there use up resources, and it is generally cleared on reboot. The directory /tmp should only be used for very short-term purposes, such as a program writing out to a temp file for the duration it is running. Any program that writes to /tmp should then clean up the file when the program exits. The directory /var/tmp is on the root file system. It gets deleted whenever space is needed on the system. Neither of these file systems is intended for long-term scratch space. There really is not a good reason for students to use these directories for long periods of time (more than a day). The CSG has the authority to purge these directories without notice as necessary.

/scratch

The directory /scratch is a special
space only available on ece3. It is a disk dedicated to scratch space for
users. There is no space limitation on it, but we expect users to be reasonable.
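
One way to keep an eye on your own usage is to measure it from time to time. The following is a minimal sketch, assuming your files live under /scratch in a directory named after your username, as the guidelines below require; the function names and the 30-day threshold in the usage note are illustrative choices, not CSG policy.

```sh
#!/bin/sh
# Assumption: the login name is available in LOGNAME or USER; fall back to id.
me="${LOGNAME:-${USER:-$(id -un)}}"

# Print the total size of a directory tree in kilobytes.
# With no argument, the directory /scratch/<username> is used.
scratch_usage() {
    dir="${1:-/scratch/$me}"
    du -sk "$dir" | awk '{ print $1 }'
}

# List regular files under directory $1 not modified in more than $2 days.
stale_files() {
    find "$1" -type f -mtime +"$2" -print
}

# Only report when the default directory actually exists on this machine.
if [ -d "/scratch/$me" ]; then
    echo "KB used: $(scratch_usage)"
fi
```

For example, stale_files /scratch/myusername 30 would list files untouched for over 30 days, which are likely candidates for deletion.
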
The CSG does browse through the files on it to make sure it is being used
appropriately. This directory is NOT BACKED UP!

Some guidelines for using /scratch include the following. All data must be kept in a directory with the same name as your username. This directory must have 700 level permissions. Thus, if you want to use /scratch, you would type the following:

    mkdir /scratch/myusername
    chmod 700 /scratch/myusername

This space is NOT for long term
storage. It should only be used for ECE projects and should be promptly deleted
after the project is completed. Inappropriate use will result in the immediate
loss of access to /scratch. If the CSG notices that there are problems
developing with continued inappropriate usage of /scratch, the CSG has the
authority to implement quotas for the offenders.

Backups

Unless otherwise communicated, a
user’s home directory and mail spool are the only areas on a system that are
officially backed up. NOTE: no data outside of your home directory or email
is backed up. This means that none of the scratch spaces are backed up. These
backups are intended to protect us from catastrophic events that may cause a
major loss of data. They are NOT intended to protect users from accidentally
deleting files or emails. Deleted files or emails may not be restored. Please
keep this in mind when deleting files or emails.

If you have questions or comments about anything contained in this document, please do not hesitate to email help@ece.arizona.edu and let us know.

Phone: (520) 621-2434
Fax: (520) 621-8076
Electrical and Computer Engineering Department, The University of Arizona
P.O. Box 210104
1230 E. Speedway
Tucson, AZ 85721-0104