Pages: [1] 2 3 4 5 :: one page |
|
Author |
Thread Statistics | Show CCP posts - 13 post(s) |
|

CCP Fallout

|
Posted - 2010.06.30 13:35:00 -
[1]
As you know, CCP moved the Tranquility servers to a much larger and cooler server room and added new switches in the process. The downtime took longer than expected. CCP Yokai's newest dev blog fills us in on the events of the day.
Fallout, Associate Community Manager, CCP hf, EVE Online |
|

XXSketchxx
Gallente Remote Soviet Industries Important Internet Spaceship League
|
Posted - 2010.06.30 13:42:00 -
[2]
First.
|

Chiana Torrou
|
Posted - 2010.06.30 13:46:00 -
[3]
Contrary to many others who post on the forums, I still think you all did a really good job in the face of very difficult circumstances.
Thanks for all the hard work - and the free skill points
|

Brolly
Caldari Icarus' Wings
|
Posted - 2010.06.30 13:46:00 -
[4]
Fantastic stuff, nice to keep in the loop.
I would have been surprised if the game was up in 6 hours, tbh, as we all know how funky computers can be. Great job, though; kudos to all involved with the move.
|

Ban Doga
|
Posted - 2010.06.30 13:48:00 -
[5]
Not sure if I missed anything, but was the only thing that failed the new storage area network for the database? Since that happened while TQ was offline, why was losing transactions something you wanted to avoid? Which transactions could have been lost?
|

Baeryn
22nd Black Rise Defensive Unit
|
Posted - 2010.06.30 13:51:00 -
[6]
Edited by: Baeryn on 30/06/2010 13:51:23
Quote: ...we began recovering the corrupted transaction logs, and replaying them to fill in any missing data...
This happens in almost every MSSQL emergency recovery I've ever been involved in. MySQL, on the other hand, usually goes much more smoothly.
Thanks for busting ass to get it back up for us, though!
Role Playing Games by RolePlayGateway |
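The quoted recovery step (restore the data, then replay transaction logs to fill in what is missing) is the usual roll-forward idea. Below is a toy Python sketch of it, not how SQL Server actually does it: `LogRecord`, the per-record CRC32 checksum and the key/value model are hypothetical simplifications. It starts from a backup taken at some log sequence number (LSN) and applies newer records in order, stopping at the first gap or damaged record.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    lsn: int       # log sequence number; a valid log chain has no gaps
    key: str       # hypothetical row key, e.g. an item or wallet id
    value: int     # value written by the committed transaction
    checksum: int  # CRC32 of the fields above, stored with the record


def _payload(lsn: int, key: str, value: int) -> bytes:
    return f"{lsn}:{key}:{value}".encode()


def make_record(lsn: int, key: str, value: int) -> LogRecord:
    return LogRecord(lsn, key, value, zlib.crc32(_payload(lsn, key, value)))


def is_intact(rec: LogRecord) -> bool:
    return zlib.crc32(_payload(rec.lsn, rec.key, rec.value)) == rec.checksum


def replay(backup: dict[str, int], backup_lsn: int,
           log: list[LogRecord]) -> tuple[dict[str, int], int]:
    """Roll a restored backup forward by replaying newer log records.

    Replay stops at the first gap or corrupted record; anything after
    that point cannot be recovered from this log alone.
    Returns the recovered state and the last LSN that was applied.
    """
    state = dict(backup)
    last = backup_lsn
    for rec in sorted(log, key=lambda r: r.lsn):
        if rec.lsn <= last:
            continue  # already contained in the backup
        if rec.lsn != last + 1 or not is_intact(rec):
            break     # broken chain: stop rolling forward
        state[rec.key] = rec.value
        last = rec.lsn
    return state, last
```

For example, restoring `{"raven": 1}` taken at LSN 1 and replaying intact records 2 and 3 gives the state as of LSN 3; if record 3 is damaged, recovery stops at LSN 2, and whatever came later has to be repaired some other way.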

schwar2ss
Caldari
|
Posted - 2010.06.30 13:52:00 -
[7]
Edited by: schwar2ss on 30/06/2010 13:53:18
Thanks for the feedback. Obviously you didn't feed the DB-hamsters very well. On a serious note: how can a DB be brought offline (and back online later on) in a messy state? These procedures are meant to finish all transactions, save the logs, disconnect all users and detach the file from the DBMS. Did these errors occur during replication of the logs when shutting the DB down?
|

Devin Maximus
|
Posted - 2010.06.30 13:52:00 -
[8]
First of all, thanks for the transparency about the issue. Having worked on providing internet-based services before (on a MUCH smaller scale, ofc) and having been bitten by bad data, I appreciate the time spent making sure my shiny internet ships were all still in their hangar. I'd rather see a day lost than be missing a pretty ship I had just bought.
And to all you whiners out there, why don't you step outside, take in the sky and perhaps go for a hike? Methinks you've been stuck in a pod too long. ;)
|

Tanjia Guileless
|
Posted - 2010.06.30 13:54:00 -
[9]
"What are we doing to prevent this?"
Migrating to a serious database product?
|

FingerThief
Gallente
|
Posted - 2010.06.30 14:01:00 -
[10]
Originally by: Tanjia Guileless "What are we doing to prevent this?"
Migrating to a serious database product?
Define serious, just so I can get a few more laughs out of your post.
Fighting like Don Quixote, one windmill at a time. |
|

Mashie Saldana
Red Federation
|
Posted - 2010.06.30 14:06:00 -
[11]
So what caused the problem on the SAN in the first place: broken hardware or misconfiguration?
|

Grez
M. Corp Daisho Syndicate
|
Posted - 2010.06.30 14:11:00 -
[12]
Edited by: Grez on 30/06/2010 14:12:45
Originally by: Tanjia Guileless "What are we doing to prevent this?"
Migrating to a serious database product?
This has been done to death...
Seems you don't know much about DBs. Oracle would be too slow for what CCP needs, hampering performance. MySQL is not robust enough compared to other DBs and has retention issues. MSSQL, on the other hand, is in all honesty perfect for what CCP needs. Don't even bother mentioning other DBs like DB2, etc. They're not even in the same category as Oracle and MSSQL.
There is a reason CCP chose MSSQL, and I have enough faith in them to believe they have tested all available DBs and know which is best for TQ.
This is probably one of the many posts that will end up quoting yours and laughing, waiting for your 30+ internet years of professional internet lawyer-ism and database management to aid CCP in their not-so-srs bidniz. ---
|

Raquel Smith
Caldari Freedom-Technologies Eych Four Eks Zero Ahr
|
Posted - 2010.06.30 14:19:00 -
[13]
Originally by: CCP Fallout As you know, CCP moved the Tranquility servers to a much larger and cooler server room and added new switches in the process. The downtime took longer than expected. CCP Yokai's newest dev blog fills us in on the events of the day.
I work in IT and COMPLETELY UNDERSTAND unforeseen happenings as a result of maintenance!
I've had a routine security update corrupt an entire LDAP database; it caused a week of instability and hassle.
Thanks for the blog post.
-- Creator of The Ruby API Library |

Batolemaeus
Caldari Vauryndar Dalharil
|
Posted - 2010.06.30 14:20:00 -
[14]
Originally by: Tanjia Guileless "What are we doing to prevent this?"
Migrating to a serious database product?
CouchDB?
Nice blog, btw, and thanks for explaining what went wrong. We're still missing a few pictures and fancy graphs, though. 
|

wapko
The Ankou Systematic-Chaos
|
Posted - 2010.06.30 14:21:00 -
[15]
that burrito analogy.... srsly ...
You did good work. It is much appreciated. Now get some pics and show us the shinies :)
|

Paknac Queltel
Caldari Provisions
|
Posted - 2010.06.30 14:26:00 -
[16]
I, for one, am glad my Raven is still here. 
Originally by: Tanjia Guileless "What are we doing to prevent this?"
Migrating to a serious database product?
To say MSSQL is not a serious database product is somewhat unfair.
Especially since the corruption mentioned here happened below the database engine.
Do you expect a car to keep driving in a straight line after the ground underneath it disappears? - Paknac Queltel
|

Shintai
Gallente Arx Io Orbital Factories Arx Io
|
Posted - 2010.06.30 14:34:00 -
[17]
Originally by: Tanjia Guileless "What are we doing to prevent this?"
Migrating to a serious database product?
Troll detected. Or just someone who has never worked with a DB, or at least not a serious one.
Hey, my DB with 500 entries and nothing else around works all the time. CCP must suck!!
--------------------------------------
Abstraction and Transcendence: Nature, Shintai, and Geometry |

Amy Garzan
Gallente The Warp Rats Intrepid Crossing
|
Posted - 2010.06.30 14:38:00 -
[18]
Pictures?
--------------------------------------------------
101010 The Answer to Life, The Universe, and Everything |

Regat Kozovv
Caldari Alcothology
|
Posted - 2010.06.30 14:42:00 -
[19]
Big Iron is just that. Big. And Hard. "Stuff" happens and you're up till 2AM trying to figure out what happened to your meticulous planning.
I'm sure I speak for many who aren't posting here when I say we appreciate the hard work done.
Also, thanks for the SP reimbursement. I thought it was a perfect way to compensate and a class act.
Thanks for the dev blog, and I hope you guys learned some new stuff from it. 
Originally by: CCP Atropos THIS IS WHY WE CAN'T HAVE NICE THINGS.
|

Dusty Meg
Shock-Wave Industrys Astro Lux Aedificatiae
|
Posted - 2010.06.30 14:47:00 -
[20]
Great job, guys. You chose to go a way no other game has gone and got the problems that come with it. It's still a magical thing that you can keep the TQ server running (somewhat smoothly).
|

Chribba
Otherworld Enterprises Otherworld Empire
|
Posted - 2010.06.30 14:49:00 -
[21]
Edited by: Chribba on 30/06/2010 14:50:07
Expected hardware photos too, left disappointed. But now to read the text... 
Secure 3rd party service | my in-game channel 'Holy Veldspar' |
|

T'ealk O'Neil
|
Posted - 2010.06.30 14:52:00 -
[22]
Would it not be an idea in future, when doing any patching or moving, to take a backup of the database as it stands before starting? That way recovery is simple, rather than trying to repair everything, which takes forever.
|

Alexa Lanxia
|
Posted - 2010.06.30 15:04:00 -
[23]
You asked for questions, so here we go. I've never seen putting "actual load on the storage area network" cause corrupt database tables. You might get routing problems, zoning problems, reservation issues; I've seen many strange problems in the past. But you usually either can access your target LUNs or you can't, so I'm not sure what to make of that. Care to elaborate? Was it human error, or did the actual hardware have a problem?
(What's your switch-vendor anyway, Brocade, Cisco or something more obscure if you don't mind me asking?)
|

Louis deGuerre
Gallente Amicus Morte Shock an Awe
|
Posted - 2010.06.30 15:06:00 -
[24]
Needs more pictures of cabling, hamsters and elephants fighting CCP Soundwave 
Nice blog 
Sol: A microwarp drive? In a battleship? Are you insane? They aren't built for this! Clear Skies - The Movie
|

Mynxee
|
Posted - 2010.06.30 15:06:00 -
[25]
Thanks for that summary of what happened. I imagine things were extremely stressful on many fronts throughout the entire effort.
Life In Low Sec |

Mabrick
Mabrick Mining and Manufacturing
|
Posted - 2010.06.30 15:21:00 -
[26]
Pesky SANs. If the IT god of thunder had meant for contiguous data to be broken up and spread across hell's half acre of magnetic storage, the IT god of thunder would not have created contiguous data! I've always shaken my head at the fact that we create monstrous databases to organize our data and then scatter the bits across so many platters with nothing more to cover our backsides than a few thin mathematical algorithms. The vision of nice, clean, logically related tables spread willy-nilly everywhere just makes me shudder.
CCP did it right. Good job and many thanks! 
|
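Mabrick's "few thin mathematical algorithms" most likely refers to parity. The dev blog as discussed in this thread does not say which RAID level, if any, the TQ SAN uses, so treat this as a generic illustration: a minimal Python sketch of RAID 5-style XOR parity, where the parity block is the XOR of a stripe's data blocks and any one lost block equals the XOR of the surviving blocks. The function names are made up for the example.

```python
from functools import reduce
from operator import xor
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Bytewise XOR of one or more equal-length blocks."""
    if not blocks or len({len(b) for b in blocks}) != 1:
        raise ValueError("need one or more blocks of equal length")
    return bytes(reduce(xor, column) for column in zip(*blocks))


def make_stripe(data_blocks: List[bytes]) -> List[bytes]:
    """Append a parity block, as a RAID 5-style stripe would."""
    return data_blocks + [xor_blocks(data_blocks)]


def rebuild(stripe: List[Optional[bytes]]) -> bytes:
    """Recover the single missing block (marked None) from the survivors."""
    missing = [i for i, b in enumerate(stripe) if b is None]
    if len(missing) != 1:
        raise ValueError("single parity can rebuild exactly one lost block")
    return xor_blocks([b for b in stripe if b is not None])
```

With a single parity block, losing two blocks from the same stripe is unrecoverable, which is why double-parity schemes such as RAID 6 exist.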

Commander Azrael
Red Federation
|
Posted - 2010.06.30 15:22:00 -
[27]
Edited by: Commander Azrael on 30/06/2010 15:25:53
Originally by: T'ealk O'Neil Would it not be an idea in future when doing any patching / moving to take a backup of the database as it stands before starting - that way a recovery is simple, rather than trying to repair everything, which takes forever
Apart from a DB backup being massive, they did back it up. If you read the dev blog, they chose the lengthier option of fixing the corrupted entries instead of rolling back. Which do you prefer: an extended downtime, or logging in to find the ISK from the missions you ran missing and that shiny ship you bought no longer there?
Originally by: Alexa Lanxia (What's your switch-vendor anyway, Brocade, Cisco or something more obscure if you don't mind me asking?)
http://www.eveonline.com/devblog.asp?a=blog&bid=769
Primarily Cisco.
|

T'ealk O'Neil
|
Posted - 2010.06.30 15:26:00 -
[28]
Edited by: T'ealk O'Neil on 30/06/2010 15:26:17
Originally by: Commander Azrael Apart from a DB backup being massive, they did back it up. If you read the dev blog they chose the lengthier option of fixing the corrupted entries instead of rolling back. Which do you prefer? An extended downtime? or logging in to find ISK missing from your missions you ran and that shiny ship you bought no longer there?
I suggest you re-read. They had A backup, but if they had taken a backup as the very first step, before starting any work, no ISK would have been lost, as nobody would have been logged in between those times.
|

Laconis Dax
Children of Armok
|
Posted - 2010.06.30 15:27:00 -
[29]
You're doing fine work, CCP. I know that couldn't have been fun or easy.
And thanks for sharing the details with us. Always nice to get my fix of infrastructure ****. 
|

Lolion Reglo
Death Incarnate INC
|
Posted - 2010.06.30 15:31:00 -
[30]
Well, thank you for telling us what happened that day. I was one of the patient ones who said to do the job right so it doesn't happen again, so I found the XP bonus a nice surprise and a nice gift. It took 2 days off Logistics V for me.
However, I still don't think you guys deserved half the stuff you got on the Facebook page. It's one thing to be harsh towards you, tell you to get the server up NOW and hold you accountable for the service you provide, but an entirely different thing to verbally abuse you guys.
|