
Recent Downtime

Started by dschwab9, January 19, 2008, 02:02:07 AM


dschwab9

As you all know, we just had quite a long downtime.  Just wanted to post an update to let you guys know what happened and assure you that we weren't hacked or anything like that.

Things started out as a higher-than-normal CPU load and a slightly sluggish server, and they went south pretty quickly.  I arrived at the datacenter expecting to find that a runaway process or something similar was making the machine unreachable.  Instead, what I found was a server that was hot enough to fry eggs on and a seriously corrupted file system.  Below is, as best I can tell, the chain of events that occurred:

1.  The thermal management system on the motherboard wasn't properly throttling fan speed, possibly for quite a while.  This caused the CPU temperature to hover near the danger zone of approximately 65 degrees C.
2.  CPU load rose higher than normal, possibly caused by heavy traffic and/or a bug in the latest site upgrade.
3.  The increase in CPU load caused a proportional increase in heat being produced.
4.  The CPU temperature rose to over 75 degrees C, well into the critical range.
5.  The CPU became flaky and ultimately brought the entire box down, corrupting nearly every file that was open at the time.

Then, the attempted repair:
1.  Ran fsck on the disks, repaired numerous errors, and began to restore corrupted files.
2.  The site appeared to be relatively stable in maintenance mode.  In preparation for coming out of maintenance mode and calling things "good", we rebooted the box to see if any more errors popped up.  A few more file system errors were found and repaired.
3.  Another reboot was performed to see if the box would come up cleanly.  This time, we were greeted with thousands upon thousands of file system problems, many of which were unrecoverable.
4.  At this point, I made the decision to leave the datacenter, go home, and bring in another server to replace the entire machine.
5.  A new machine was installed and all apps were reconfigured.
6.  A couple of critical directories were not properly backed up, so the old server was disassembled, the drives mounted in another machine, and a reconstruction done on that data.

So, it has been a looonnggggg night.  In order to prevent future problems like this, the fan speed control has been disabled on all of the servers.

shadow82x

Great to see SMF is finally up again. Some hard work there, eh, Derek?

Also, I suppose avatars were cleared?
Colin B
Former Spammer, Customize, & Support Team Member

Aaron

Wow Derek, you've certainly been busy. Great job on bringing us back this fast, considering circumstances. Thanks for all the effort! :)

JayBachatero

Just as a side note: due to the file corruption, some attachments were lost, including mods and themes.  Mod authors, please look over your mods and check whether any files are missing.

If you experience any errors or anything else, just let us know and we will look into it.  We are still working on a few things, like downloads.  The page should be back up later tonight. :)

BTW THANKS DEREK.  YOU THE MAN :)
Follow me on Twitter

"HELP!!! I've fallen and I can't get up"
This moment has been brought to you by LifeAlert

nick09

Well, I'm happy to see the site is back.

EdwinK

Glad to see the site up again ;)
|| foto-site ||

LightningMk6

Well done on getting the site up and running again.

I B D

As mentioned, the Theme Site may be missing files from your themes. We have just checked the themes, and all that is missing is the thumbnail images. If you have a theme on the Theme Site, could you please update it with a new image where possible?  All archive files are still there and are downloading fine.

The Tornado

Well,
I was trying to open the site all day.
Thank God it's here again....

But I saw that you've updated the forums to SMF 2.0 Beta 2!
When will it be available for members to update their forums to this version?
help yourself by helping others

IT Group Syria Official Website

TheWrath!

Hoo-ah for SMF. Good job fixing the problem, guys!

Zetan

Good job on a hard mission. Just a heads up:
the search seems to be broken.

shadow82x

Quote from: The Tornado on January 19, 2008, 02:31:37 AM

But I saw that you've updated the forums to SMF 2.0 Beta 2!
When will it be available for members to update their forums to this version?

There is no set time. It's ready when it's ready. Currently, a testing beta is available to charter members. (http://www.simplemachines.org/charter/)

I B D

Quote from: The Tornado on January 19, 2008, 02:31:37 AM
When will it be available for members to update their forums to this version?

I'm afraid you will have to wait for a newer version of SMF 2 before it is released to the public. Also, until it is released as stable, it should not be used in a live environment.

gffb

Nice to see you back, but when searching I get the following message:

The search API could not be found! Please contact the admin to check they have uploaded the correct files.

Gary

There will be at least one more charter beta before a public release.
Gary M. Gadsdon
Do NOT PM me unless I say so
War of the Simpsons
Bongo Comics Fan Forum
Youtube Let's Plays

^ YT is changing monetisation policy, help reach 1000 sub threshold.

nick09

Quote
When will it be available for members to update their forums to this version?

Months, possibly years.

But the current 2.0 is not recommended for major boards, only for test ones.

Besides, you have to be a charter member to even try the beta.

CiOooo

What a loooong night for you, and a long time for us ;)

Nice to see you back ;)



Cheryl619


Eugeniu

Ah, that sounds terrible :(. Not to mention buying a new server out of nowhere probably wasn't fun either...

Glad to see that SimpleMachines is back online :).

Dude111

Glad the site is back :)

SMF is awesome!!
