PBS scripts and I hate supercomputers

Well, I hate it when they don’t work, which seems like always. Also, make sure to end your PBS script with

wait

It is critical that there is a wait after everything you want to run, and a newline at the end of the file. Otherwise you’ll run into all sorts of trouble and have no idea what’s going on. I’ve now spent an entire day tracking down that problem on two different occasions. How I wish I had remembered it from the first time around.
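For what it’s worth, here is a minimal sketch of what such a script might look like. It assumes the work is launched in the background with &, which is the case where a missing wait lets the script exit while the tasks are still running. The job name, resource request, and the run_task function are hypothetical stand-ins, not anything from my actual jobs.

```shell
#!/bin/bash
#PBS -N wait_demo
#PBS -l nodes=1:ppn=2
#PBS -l walltime=00:05:00

# Hypothetical workload: stands in for a real simulation binary.
run_task() {
    sleep 1
    echo "task $1 finished"
}

# PBS sets PBS_O_WORKDIR to the submission directory;
# fall back to the current directory when the script is run by hand.
cd "${PBS_O_WORKDIR:-.}"

# Scratch directory for the per-task logs (an assumption for this sketch).
outdir=$(mktemp -d)

# Launch both tasks in the background so they run concurrently.
run_task 1 > "$outdir/task1.log" &
run_task 2 > "$outdir/task2.log" &

# Block until every background task has exited. Without this line the
# script would reach its end immediately, and the queuing system would
# clean up the still-running tasks.
wait
```

Running it by hand with bash is a cheap way to confirm that both logs are complete before handing the script to qsub.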

Posted in Tech.

Two things

1st, when submitting PBS jobs using a submission file, always add wait to the end; otherwise the queuing system will kill whatever you still have running. It generally doesn’t give you a very insightful error message, just a seg fault.

2nd,
Old Ted – Everyone has an opinion on how long it takes to recover from a breakup:
Lily: Half the length of the relationship
Marshall: A week for every month you were together
Robin: Exactly 10,000 drinks however long that takes
Barney: You can’t measure something like this in time, there’s a series of steps… from her bed to the front door, bam out of there, next!

Installing SGE on Ubuntu (single machine local install)

I installed the Torque queuing system on my local machine (quad core, so if I give it 3 slots I can still work normally, with one core reserved for Firefox, a media player, a text editor, a compiler and interactive runs). However, Torque is a total pain, both to set up and to use. So I uninstalled it (saving my config files in case I ever want to revert) and switched to the SGE (Sun Grid Engine) queue. Installing it is mostly a breeze, but I didn’t document my steps as I went along, so this might not be flawless. I’ll try again later just to check, but it might be a while.

sudo apt-get install gridengine-master gridengine-exec gridengine-client gridengine-qmon gridengine-common

Then fire up qmon with sudo permissions and create your queue. Go to queue control, add a new queue with whatever name suits your fancy, and add a host, which is your machine. The host name needs to resolve, i.e. ping myhostname has to know what you’re talking about. This can be accomplished by adding a line to /etc/hosts, or probably by just using your hostname or 127.0.0.1.

I set slots to 4, which means the queue can use up all the CPU power when it needs to. I found I was still able to work normally when the queue is loaded (i.e. CPU usage will be at 100% on each core for the next 24 hours at least), but I could see setting it to 3 if you have a quad core machine.

You’re also going to want to go to “host configuration” and add your machine as a submit host. This should allow you to submit jobs from the local machine to the queue on the local machine. It’s kind of a pseudoqueue, but it gets my work done faster, so I’m not complaining. Leave a comment if you have any problems; I may be able to help you out.
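Since the host name has to resolve before the queue will accept it, a quick check along these lines may save some head scratching. This is only a sketch: the resolves function is a hypothetical helper, and it assumes getent is available, as it is on a stock Ubuntu install.

```shell
#!/bin/bash

# Hypothetical helper: succeeds if the given name resolves through the
# system resolver (which consults /etc/hosts as well as DNS).
resolves() {
    getent hosts "$1" > /dev/null
}

# Check the name this machine reports for itself.
host=$(hostname)
if resolves "$host"; then
    echo "$host resolves, OK to use as the queue host"
else
    echo "$host does not resolve, add a line for it to /etc/hosts"
fi
```

If the check fails, adding a line for the machine to /etc/hosts as described above should fix it.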

One thing that you might want to change from the default setup is the schedule interval, under scheduler in qmon. This is how often SGE checks whether there is an empty slot. If you submit, say, 1000 jobs that take only 1 minute each, that’s 1000 CPU minutes, or just over 4 hours on a 4 slot cluster. If the scheduling interval is 15 seconds, your jobs will wait on average 7.5 seconds before starting, making the total 1000(60+7.5) seconds, or 1125 CPU minutes, which is 12.5% longer.

Admittedly, it is rare to submit one minute jobs in such quantities, but I do sometimes find myself submitting a large number of short jobs. Given that the cost of increasing the scheduling frequency is so low, I changed the interval from 15 seconds to 5 seconds. I have since seen a nice uptick in CPU utilization when running batches of short jobs.
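A quick way to check the arithmetic for other intervals and job lengths is sketched below. It uses the same simplifying assumption as above, namely that each job waits half the schedule interval on average before starting; the overhead_pct function name is made up for this example.

```shell
#!/bin/bash

# Hypothetical helper: percentage of extra time added by scheduler latency.
#   $1 = schedule interval in seconds
#   $2 = job run time in seconds
# Assumes the average wait before a job starts is half the interval.
overhead_pct() {
    awk -v interval="$1" -v job="$2" \
        'BEGIN { printf "%.1f\n", (interval / 2) / job * 100 }'
}

overhead_pct 15 60   # default 15 second interval, 1 minute jobs
overhead_pct 5 60    # 5 second interval, 1 minute jobs
```

For one minute jobs this prints 12.5 for the 15 second interval and 4.2 for the 5 second interval, so the shorter interval recovers most of the lost time.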

Posted in Tech.