Today our site experienced an outage that lasted several hours. Once our staff became aware of the outage, our emergency backup site was activated for community support.
First, I'd like to let you all know that the outage wasn't the result of a hacking attempt or anything so glamorous. Rather, it was the very boring condition of failed hardware.
Specifically, the main power supply in our master database server decided to die. While some attempts were made to resurrect the machine, the power supply refused to perform its primary duty of supplying power. As a result, our database server was deprived of electrons and would not boot.
We have moved our database drives to our replica database server and got it booted back up. All the databases crucial for running our services have checked out okay. If you notice any problems that didn't exist previously, especially database errors, let us know.
I know y'all often go unappreciated for all the great work and time put into running this site. So I want to give y'all my heartfelt thank you for all that y'all do, and for working so hard to get the site back online.
thank you
I tried to search, but it does not work.
Can you check, please?
Nice work
Glad everything is back up and running!
Glad to have it back though :)
I believe our search server was running on the slave SQL server, which is now disabled. I'm not sure if we'll be able to get it to return until we get the new power supply and boot it back up.
No worries Motoko-chan! Do what ya got to do to get back to 100%. Thanks for the updates also! 8) :D :)
EDIT: I hope this didn't hurt CodeFest at all too. They didn't have to take time away from that to deal with this?
No, those that could attend (I was unable to attend due to work) were still able to meet. I did keep them informed of developments by phone.
Quote from: Motoko-chan on February 25, 2009, 01:55:19 AM
No, those that could attend (I was unable to attend due to work) were still able to meet. I did keep them informed of developments by phone.
Great! Looking forward to hearing what they get done.
We all got here tonight -- the last will be here in the early afternoon tomorrow :) Thanks for asking.
We'll try to get the search issue resolved as fast as we can :)
Considering the fact that you actually lost hardware, you guys sure got back online fast I think... :)
Well, we have two database servers - a master and slave used for replication. We pretty much just swapped the hard drives from one to the other. Problem is, the Sphinx search index was on the second server, which is now the one without a power supply.
Quote from: fords8 on February 25, 2009, 02:05:30 AM
Quote from: Motoko-chan on February 25, 2009, 01:55:19 AM
No, those that could attend (I was unable to attend due to work) were still able to meet. I did keep them informed of developments by phone.
Great! Looking forward to hearing what they get done.
Well, thus far we managed to get everyone here who was on an airline flight, and those who got here early have managed not to kill each other. We've had a few discussions about how much we love this project, and how we all want to double our pay (It's a joke since we are all unpaid volunteers!) but alas, we haven't gotten drunk, nor have we had any of the famous orgies that most 'business' meetings tend to manage.
Ok, seriously... metallica will be posting some of the details of our conferences to the Blog as time permits.
Great work getting the site back - as has been mentioned in this thread, you guys should be congratulated on how quickly you got the site back up considering the power supply failure.
Great work!
I'm glad smf is online again... :)
Wow !!! I was actually having symptoms of SMF Withdrawal Disorder...............
Quote from: TW1ST3D on February 25, 2009, 07:30:36 AM
Wow !!! I was actually having symptoms of SMF Withdrawal Disorder...............
You're not the one who spent over 12 hours in travel time to get somewhere :P
/me injects SMF into himself.
Glad to see things back up and running. :)
Traveling does stink. But at least you are with people that like the same thing you do. Now that is some coding power in one room! I wish I was there just to learn some things!
Quote
As a result, our database server was deprived of electrons and would not boot
Lmfao, that's the most nicely phrased explanation I've ever seen for a failing power supply :P
Good job getting it back up :)
Quote from: CoreISP on February 25, 2009, 11:54:46 PM
Quote
As a result, our database server was deprived of electrons and would not boot
Lmfao, that's the most nicely phrased explanation I've ever seen for a failing power supply :P
Good job getting it back up :)
i actually quite enjoyed that line too :D
fortunately.. the power supply failure didn't cause other hardware damage.. (always my greatest fear. power supply dies and takes pc with it..) lol
Any idea when we'll get search back?
It is down at the moment as we moved our backup sql server to be our primary.
Quote from: SleePy on February 26, 2009, 09:46:12 AM
It is down at the moment as we moved our backup sql server to be our primary.
I realize that search is down. My question was about an estimate of when it will be back up.
It all depends on how quickly we can get replacement parts.
Thank you.
Quote from: Motoko-chan on February 24, 2009, 10:55:09 PM
Specifically, the main power supply in our master database server decided to die. While some attempts were made to resurrect the machine, the power supply refused to perform its primary duty of supplying power. As a result, our database server was deprived of electrons and would not boot.
Looks like you needed:
[image: Energizer Advanced Lithium batteries]
:D
This is when dual power supplies w/ a battery backup work wonders. Wouldn't be without them!
Great job guys, keep it up!
Search cannot work currently because the search daemon, Sphinx, and its associated index were located on the DB02 server, which now holds the disks from DB01. Rather than showing the "unable to access search daemon" error, we opted to temporarily provide a Google search field that will search site:simplemachines.org for the search query entered.
As this is using the input field to plug a search into Google, we do realize that many options that are familiar are not there and will not work. However, we feel that this is a better solution than having no search ability at all. This is especially a problem for our team and charter members who have access to areas that Googlebot cannot spider. We apologize for that inconvenience, but again, for support and customization services, we feel that it is best to provide a method of search in the interim that will work on a basic level.
Until we can either get the search daemon and index on the current db server or get a replacement power supply for the remaining DB server, the normal SMF search will be unavailable. There's not currently a lot we can do about it otherwise as the volume of searches done here will likely kill the remaining DB server and the site would be made consistently unavailable if we were to use a normal index.
Thanks to everyone for all the work you guys do and are doing to try and rectify this problem and keeping this great site and software going! :D
If this is a matter of money to get this fixed, I am sure many here would be more than happy to give a few dollars to help out.
Thanks for the updates and all the hard work. VERY appreciated from a small dog website in Toronto! My members all thank you too. They freak when they can't get in...the world is collapsing, Armageddon is here! But all seems good now. World is not collapsing. Armageddon is not nearing, Pug People all over Toronto are happy now. THANKS!!!!!!!!
Quote from: adamnchris on February 28, 2009, 11:43:50 AM
Thanks for the updates and all the hard work. VERY appreciated from a small dog website in Toronto! My members all thank you too. They freak when they can't get in...the world is collapsing, Armageddon is here! But all seems good now. World is not collapsing. Armageddon is not nearing, Pug People all over Toronto are happy now. THANKS!!!!!!!!
Hm,
I don't think that the downtime of this website will have affected your site.
If your site was down at the same time, it's pure coincidence :P
Earlier today this forum crashed my browser. :(
Now the forum is getting slower and slower for me again. Hope everything will be sorted out soon.
We're aware of the problems. As long as our other server is not fixed, the site will be running slower than usual. We should hopefully have everything fixed sometime next week.
Just a note, as the forum is (for the most part) fully indexed by Google while the search is down here, you can use their services to possibly find a result.
http://www.google.com/search?q=search&domains=simplemachines.org&sitesearch=simplemachines.org
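If you want to build that kind of link for your own search terms, here is a rough sketch in Python. It only reuses the parameter names visible in the URL above (`q`, `domains`, `sitesearch`); the function name `google_site_search_url` is made up for illustration, and this is not how the temporary search box on the site is actually implemented.

```python
from urllib.parse import urlencode


def google_site_search_url(query, site="simplemachines.org"):
    """Build a Google search URL restricted to one site.

    Hypothetical helper: it mirrors the parameters seen in the link
    posted above and makes no claim about the site's own search box.
    """
    params = {
        "q": query,          # the search terms typed by the user
        "domains": site,     # domain offered as the search scope
        "sitesearch": site,  # restrict results to this domain
    }
    # urlencode escapes spaces and special characters for use in a URL
    return "http://www.google.com/search?" + urlencode(params)


if __name__ == "__main__":
    print(google_site_search_url("power supply"))
```

Calling it with the single word `search` should give back the same link as the one posted above.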
From all the posts above, it seems to me that the SMF forum is self-hosted, because with a hosting company such a problem would have been fixed within an hour or two because they have more resources.
Quote from: GrannyD on February 28, 2009, 06:46:39 AM
Thanks to everyone for all the work you guys do and are doing to try and rectify this problem and keeping this great site and software going! :D
If this is a matter of money to get this fixed, I am sure many here would be more than happy to give a few dollars to help out.
Granny
If a call is made for that, I'm also willing.
Quote from: Kenny01 on March 01, 2009, 04:15:02 PM
From all the posts above, it seems to me that the SMF forum is self-hosted, because with a hosting company such a problem would have been fixed within an hour or two because they have more resources.
Most of the diagnostics were made with the host we are co-locating with. However, because of time differences and some issues with their support area (they recently switched to a new support system), it took longer than it otherwise might.
Also, almost all the people authorized as contact points were either not able to handle the back and forth (because of the codefest) or were otherwise busy (both the server admin and I have busy day jobs).
Quote from: Kenny01 on March 01, 2009, 04:23:08 PM
Quote from: GrannyD on February 28, 2009, 06:46:39 AM
If this is a matter of money to get this fixed, I am sure many here would be more than happy to give a few dollars to help out.
If a call is made for that, I'm also willing.
It isn't a money issue currently; it's more a matter of having our server admin get a chance to check on the warranty status of the part and to order a new one. We're all very busy individuals in the team (remember, we don't get paid and almost all of us either have full time jobs, classes, or both), but we're trying to go as fast as we can.
Search is now working again.
I'd like to thank everyone for the kind words, understanding and generosity.
As Motoko-chan posted, the issue is that we are all volunteers and sometimes things happen in real life that take precedence over things happening on the internet. It shouldn't be too long until we get back to full strength. In the meantime, we may be a bit slow, but we are up and running again.
Wonderful guys, you're all great.
While you're at it, you might also want to go ahead and fix the remaining bugs in SearchAPI-Sphinx.php running on this site.
http://www.simplemachines.org/community/index.php?topic=127672.msg1939169#post_full_sphinx_fix_list
O:)
It's a step by step procedure.
Well, I'm glad to have the search feature back :)