I installed the Torque queuing system on my local machine (quad core, so if I give it 3 slots I can still work normally, with one core reserved for Firefox, a media player, a text editor, a compiler and interactive runs). However, Torque is a total pain, both to set up and to use. So I uninstalled it (saving my config files in case I ever want to revert) and switched to SGE (Sun Grid Engine). Installing is mostly a breeze, but I didn’t document my steps as I went along, so this might not be flawless. I’ll try again later just to check, but it might be a while.
sudo apt-get install gridengine-master gridengine-exec gridengine-client gridengine-qmon gridengine-common
Then you’re going to want to fire up qmon with sudo permissions and create your queue. Go to “queue control”, add a new queue with whatever name suits your fancy, and add your machine as a host. The host name needs to resolve, i.e. ping myhostname knows what you’re talking about. This can be accomplished by adding a line to /etc/hosts, or probably by just using your hostname or 127.0.0.1. I set slots to 4, which means the queue can use up all the cpu power when it needs to. I found I was still able to work normally while the queue was loaded (i.e. cpu usage at 100% on each core for the next 24 hours at least), but I could see setting it to 3 if you have a quad core machine. You’re also going to want to go to “host configuration” and add your machine as a submit host. This should allow you to submit jobs from the local machine to the queue on the local machine. It’s kind of a pseudoqueue, but it gets my work done faster, so I’m not complaining. Leave a comment if you have any problems, I may be able to help you out.
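If you’d rather check things from a terminal, here is a rough sketch of the name-resolution check and the submit-host step. I haven’t verified it against this exact setup. The function names (host_resolves, add_submit_host, submit_test_job) and the job name sge_smoke_test are made up for illustration. qconf -as and qsub are the standard SGE command-line tools, and I’m assuming they behave the same as the corresponding qmon dialogs.

```shell
#!/bin/sh
# Return success if the given name resolves on this machine.
# Uses getent (which consults /etc/hosts and DNS); falls back to ping,
# the check mentioned in the post, if getent is not available.
host_resolves() {
    if command -v getent >/dev/null 2>&1; then
        getent hosts "$1" >/dev/null 2>&1
    else
        ping -c 1 "$1" >/dev/null 2>&1
    fi
}

# Hypothetical helper: register a machine as a submit host from the
# command line instead of qmon's "host configuration" dialog.
add_submit_host() {
    sudo qconf -as "$1"
}

# Hypothetical helper: submit a trivial job (it just prints the host name)
# to confirm that submission from the local machine works.
submit_test_job() {
    echo "hostname" | qsub -N sge_smoke_test -cwd
}

name=$(uname -n)
if host_resolves "$name"; then
    echo "$name resolves"
else
    # The address below is an assumption; use whatever suits your machine.
    echo "$name does not resolve; consider adding a line like '127.0.0.1 $name' to /etc/hosts"
fi

# Once SGE is installed and the queue exists, one might run:
#   add_submit_host "$name"
#   submit_test_job
#   qstat        # the test job should show up here
```

The SGE calls are left commented out so the script is safe to run before the queue exists.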
One thing that you might want to change from the default setup is the schedule interval, under scheduler in qmon. This is how often sge checks if there is an empty slot. If you submit, say, 1000 jobs that take only 1 minute each, that’s 1000 cpu minutes, or over 4 hours on a 4 slot cluster. If the scheduling interval is 15 seconds, your jobs will wait on average 7.5 seconds before starting, making the total time 1000(60+7.5) seconds, or 1125 cpu minutes, which is 12.5% longer. Admittedly it is rare that you submit one minute jobs in such quantities, but I find myself sometimes submitting a large number of short jobs, and given that the cost of increasing the scheduling frequency is so low, I changed it from 15 seconds to 5 seconds. I have since seen a nice uptick in cpu utilization when running batches of short jobs.
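To make the arithmetic easy to redo for other job lengths, here is a small sketch. It assumes, as above, that a job waits on average half a scheduling interval before it starts. The function names are made up for illustration. For 60 second jobs the overhead comes out to 7.5/60 = 12.5% at a 15 second interval and about 4.2% at 5 seconds.

```shell
#!/bin/sh
# Percentage of extra time per job, assuming the average wait before a
# job starts is half the scheduling interval.
# Usage: overhead_pct JOB_SECONDS INTERVAL_SECONDS
overhead_pct() {
    awk -v job="$1" -v interval="$2" \
        'BEGIN { printf "%.1f\n", (interval / 2) / job * 100 }'
}

# Total cpu minutes for a batch, including the average wait per job.
# Usage: total_cpu_minutes NUM_JOBS JOB_SECONDS INTERVAL_SECONDS
total_cpu_minutes() {
    awk -v n="$1" -v job="$2" -v interval="$3" \
        'BEGIN { printf "%.1f\n", n * (job + interval / 2) / 60 }'
}

overhead_pct 60 15            # prints 12.5
overhead_pct 60 5             # prints 4.2
total_cpu_minutes 1000 60 15  # prints 1125.0

# The interval can probably also be inspected and changed without qmon
# (I believe the setting is called schedule_interval, e.g. 0:0:5):
#   qconf -ssconf       # show the scheduler configuration
#   sudo qconf -msconf  # edit it in $EDITOR
```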