So I used to think the folks at SourceForge were just overworked and under-appreciated. They've worked really hard recently to show me this is not the case. Sure, it's a "free" service, but it's there to sell their premium services, and they get a lot of exposure being the place to go for open-source development.
They've been having growing pains for a while; a few months ago, they had a major CVS outage. Open-source development around the world ground to a standstill for days while they worked to get the hardware back up. Since then, they've been planning to transition folks to Subversion and to a new CVS infrastructure. Did I mention that in the meantime, anonymous CVS has been frozen and out of date since March? This outage was only "repaired" for developers; user access has been broken all this time.
Since then, we've been planning on moving to our own server (donated by xs4all). As you've seen in previous posts, I've made progress towards that end. In the meantime, our expectation was to get in on the "new CVS" beta. We've been preparing versions of Fink that can "phone home" to figure out what the CVS repository should be, so we'd be prepared for when the switch came.
Two days ago, another CVS outage occurred. After a day of waiting without the site status page being updated, I opened one of what I think were hundreds of open, unanswered bugs about CVS being down, noting that they should at least update the site status so people would stop opening bugs. I went to the #sourceforge channel on IRC and asked about it, and they said there were network driver problems. They got that fixed, and CVS was back up for, literally, 30 seconds. It turns out that machine also had a failed hard drive. After being told it would be back up once the drive was replaced, I wandered off to nap some more. I'd been out sick for 2 days and was really hoping to commit the culmination of 2 weeks of work on KDE fixes for outstanding issues that were (and still are) affecting nearly all Fink KDE users. I had made the last few changes while home sick, and I just wanted to commit them and get back to sleep.
Hours passed, and I finally asked, "I hate to keep bugging you guys, but it seems the site status never gets updated during outages like this. Were there problems putting in new drives? Is there something else keeping our cvs from coming back up?" The response: "I'll update site status tomorrow with details, I don't have full details at the moment, but I do know it will be down over night, at minimum." Understandable; things happen. But the fact that the site status page wasn't the first thing updated as soon as a problem was found shows me how much they care about their users. You can save a lot of ill will by just telling people what's going on.
Today, the site status gets updated:
(2006-05-10 04:43:14 - Project CVS Service)
As of 2006-05-09 the developer CVS server had a disk-failure. As the new CVS infrastructure is in its final phases of rollout, we'll be deploying it, in place of the current infrastructure, by end of week. We'll be sending out an email to project administrators with further details later in the day, regarding how to access the new CVS servers and the changes that occurred with the new infrastructure.
One thing to note: anonymous CVS on the new servers lives at a new CVSROOT. Everyone who wants to access the new CVS repository will need to check out fresh, or manually munge their CVS/Root files. This is exactly the transition we had planned to handle gracefully, and that is now impossible.
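Munging CVS/Root by hand gets tedious fast, because CVS keeps a separate CVS/Root file in every directory of a working copy. Here is a minimal sketch in Python of doing it in bulk. It assumes only the CVSROOT string changed and that the module layout on the new server is identical (if paths moved, the CVS/Repository files would need attention too). The script name and the CVSROOT in the usage example are hypothetical placeholders, not the real address of SourceForge's new servers.

```python
import os
import sys


def rewrite_cvs_roots(checkout_dir: str, new_root: str) -> int:
    """Point every CVS/Root file under checkout_dir at new_root.

    Assumes only the CVSROOT changed; CVS/Repository files are left alone.
    Returns the number of Root files rewritten.
    """
    rewritten = 0
    for dirpath, _dirnames, filenames in os.walk(checkout_dir):
        # Each CVS-managed directory keeps its bookkeeping in a "CVS"
        # subdirectory; the "Root" file inside it holds the CVSROOT string.
        if os.path.basename(dirpath) == "CVS" and "Root" in filenames:
            root_path = os.path.join(dirpath, "Root")
            with open(root_path, "w") as fh:
                fh.write(new_root + "\n")
            rewritten += 1
    return rewritten


if __name__ == "__main__":
    if len(sys.argv) != 3:
        sys.exit(f"usage: {sys.argv[0]} CHECKOUT_DIR NEW_CVSROOT")
    count = rewrite_cvs_roots(sys.argv[1], sys.argv[2])
    print(f"rewrote {count} CVS/Root file(s)")
```

Usage would look something like `python rewrite_cvs_roots.py ~/fink ':pserver:anonymous@cvs.example.org:/cvsroot/fink'`, where the host is a stand-in for whatever the new anonymous CVSROOT turns out to be. A fresh checkout is still the safer option if you have no local changes.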
Additionally, shell access to the project servers (i.e., the way we log in to modify pages on http://fink.sourceforge.net/) is down, so basically:
- we can't get to CVS
- when we do, our users won't be able to get to CVS
- we can't update the old CVS to give users a version of fink that can find the new CVS
- we can't update the web site to tell people what's going on
- once we can, the only option we have for our users is going to require manual intervention of some form, at the very least switching to selfupdate-rsync
This is intensely frustrating. It would be one thing if it were just hardware and server issues. These things happen. But the total lack of disaster planning, and of any management of user resources (and user expectations), is just abysmal. The timing makes it all that much worse, since we were literally prepared to finally move away from SourceForge's spotty service in the next few weeks (or, at the very least, months).
But don't worry, SourceForge, we'll be reducing your server load just as soon as we're able. You can count on a little less load in the future.