
Zathi Shaitan
Illiteracy Combatants
|
Posted - 2010.06.30 23:34:00 -
[61]
MSSQL always was a fail cascade, is still a fail cascade, and will continue being a fail cascade.
---- http://loseloose.com/
http://youryoure.com/
|

Swidgen
|
Posted - 2010.06.30 23:59:00 -
[62]
Originally by: Amida Ta So the big question remains unanswered in the blog:
"after finding the root cause"
So what was the root cause?
LOL, you were expecting actual information? This is one of the least informative dev postings in a long time, and that's quite an accomplishment. |

Itseban Tvi
|
Posted - 2010.07.01 00:17:00 -
[63]
Appreciate all the hard work, and the swift explanation and skill repayment. I seriously winced for you guys when I heard about what was happening. I have been there.
|

Comnitus Ultima
|
Posted - 2010.07.01 00:20:00 -
[64]
Originally by: Knaar I just wanted to say that you guys are doing an awesome job. Despite coming up against Finagle's Law you made the right decisions.
One thing you should do is realize that every whiner is actually a hopelessly addicted customer that needs his/her fix and gets super grumpy without it. We wouldn't be hopelessly addicted unless you all were doing something severely wonderful. So in reality every whine is just an admission of your magnificence in disguise.
Well... yeah, but... eh... but I mean... err... ahh...
Damn, he's right.
|

cBOLTSON
Caldari Shadow Legion. Talos Coalition
|
Posted - 2010.07.01 00:27:00 -
[65]
Edited by: cBOLTSON on 01/07/2010 00:28:53 I guess sometimes **** happens and you have to do the best you can. Thanks for taking time to explain what happened in this blog :)
EDIT: I too would like to know what the root cause of the problem was (-_o)
|

Daan Sai
OHiTech
|
Posted - 2010.07.01 01:33:00 -
[66]
Good recovery under pressure. You can never have too many backups! Hope the root cause wasn't an unplugged node :)
--------------------------------- Internet Submarines is Serious Business ---------------------------------
|

VaL Iscariot
|
Posted - 2010.07.01 01:34:00 -
[67]
To be honest it made me sick to read all the people whining and complaining about how long it was taking. Everyone was given a full week's warning that this downtime was going to happen, and to be prepared for it. Instead, the so-called 'mature' player base of Eve Online was found to be no more than a bunch of World of Warcraft drop-out whiners. It was only a day and a half and people were up in arms about "HOW DARE CCP TRY TO UPGRADE THE GAME I PLAY EVEN THOUGH THEY'RE DOING THE EXACT THING I WHINE ABOUT AND WISH THEY'D DO ON A DAILY BASIS!! I WANT FREE NAO!!1!" The worst part is that CCP obliged them, thus giving the minority that posts on the forums more voice than they deserve.
Next time CCP, dig through the f*cktards and just ignore them. Giving in to them only makes it worse. Also, don't be insulted by them either. They don't know what it takes. (Though I'm sure you're going to find a so-called 'game developer for Call of Duty 4' and a few 'Blizzard programmers' in here too to tell you just how it's done.)
Thanks for all the fish VaL
|

MC187
|
Posted - 2010.07.01 02:22:00 -
[68]
http://i3.photobucket.com/albums/y67/nl37tgt/automotivator.jpg
hehe
|

Syekuda
State Protectorate
|
Posted - 2010.07.01 02:31:00 -
[69]
Originally by: MC187 http://i3.photobucket.com/albums/y67/nl37tgt/automotivator.jpg
hehe

On a serious note, I hate to say this but it must have been a very difficult decision. I think you didn't expect that much hatred. I guess that's proof of love for this game.
Just a small request: we're all (well, some of us are anyway) adults, so please tell us the real time it takes, and don't update the news to say it's going live in the next 30 minutes when it doesn't go live and you're not sure. If it takes another 6 hours, fine; just be straight about it, be honest. We can take it.
|

Lillandra Peregrine
|
Posted - 2010.07.01 03:05:00 -
[70]
nice work ccp. and thanks for explaining what happened, appreciate the transparency. :)
|

Asperath Fernandez
|
Posted - 2010.07.01 03:08:00 -
[71]
Originally by: T'ealk O'Neil Edited by: T'ealk O'Neil on 30/06/2010 15:26:17
Originally by: Commander Azrael Apart from a DB backup being massive, they did back it up. If you read the dev blog they chose the lengthier option of fixing the corrupted entries instead of rolling back. Which do you prefer? An extended downtime? or logging in to find ISK missing from your missions you ran and that shiny ship you bought no longer there?
I suggest you re-read. They had A backup, but if they had taken a backup as the first step before starting any work then no isk would have been lost as nobody would have been logged in between those times.
Not understanding the context of the word "transaction" in this case, ftw. 
|

Hiyoshi Maru
|
Posted - 2010.07.01 03:20:00 -
[72]
I have worked in IT for over 15 years now and seen a lot of projects and rollouts in a lot of places (NZ, Oz, UK, Netherlands, Germany, Singapore to name a few). It is extremely rare for things to go smoothly; there will always be an issue.
It is excellent to see that not only did you take the time to sort the issue correctly rather than hammer a fix in and deal with it later, but you have also publicly stated the essence of what happened. This is extremely rare and very commendable.
Thanks for the hard work you do, the efforts made by all the unseen people in the background who make this work for all of us, and thanks for your honesty. It is this kind of relationship that CCP tries to engender that makes EVE the brilliant game that it is.
|

Rhok Relztem
Caldari CGMA Synergist Syndicate
|
Posted - 2010.07.01 04:53:00 -
[73]
All I have to say is...
CCP and everyone involved from top to bottom - one very big class act.
I sincerely hope some of the other game developers out there are taking notes.
|

Niccolado Starwalker
Gallente Shadow Templars
|
Posted - 2010.07.01 05:41:00 -
[74]
Originally by: CCP Fallout As you know, CCP moved the Tranquility servers to a much larger and cooler server room and added new switches in the process. The downtime took longer than expected. CCP Yokai's newest dev blog fills us in on the events of the day.
As always, good work CCP!
Originally by: Dianabolic Your tears are absolutely divine, like a fine fine wine, rolling down your cheeks until they flow down the river of LOL.
|

Monkey M3n
The Collective Against ALL Authorities
|
Posted - 2010.07.01 05:47:00 -
[75]
stupid IT noobs
tl;dr You suck
|

Nofonno
Amarr Aperture Harmonics K162
|
Posted - 2010.07.01 07:37:00 -
[76]
After several years in EAO (enterprise applications operations) at a major multinational corp, I've also had my share of failed moves and transitions.
Also, I've read, and also composed, many after action reports that smoothed the actual ****-up we made for the customer, so no-one would get too ****ed and we'd play it together for years to come.
I'm a UNIX guy and know squat about the M$ enterprise software range, but I know my share about SANs. This all smells of a human error rather than a technical one (as it usually is in IT) -- something must've happened to the actual disk array drives during the trip; most probably one or more were mishandled and produced unrecoverable data errors.
Who knows... I, as a paying customer, don't mind too much, since all I had is in its place and I'm not in a time pressure. It could've ended much worse, so, kudos to CCP.
Better luck next time  ---
A scientist must be an optimist at heart - to have the strength to rally against a chorus of voices saying "it cannot be done". |

Apaximander
|
Posted - 2010.07.01 07:42:00 -
[77]
MMO players really need to stop whining about outages. It's the nature of a game like this that there will be unexpected downtime; it happens. It's not as if we were locked out for a week or anything, either.
|

sjw7
Caldari CompleXion Industries
|
Posted - 2010.07.01 08:08:00 -
[78]
Like some others who have posted here, I have spent many years working in Enterprise IT, from support to design and implementation of all types of projects similar to the one CCP have been telling us about.
It wasn't clear from the blogs, but it seems that the move to the new room included some new kit as well as the reuse of some of the old kit. This is always a pain, but good planning can minimise configuration issues. The explanation of the database corruption seems odd. A poorly configured SAN will either slow an application down or stop it working completely. The only time I have come across any kind of database problem caused by the SAN was in the Quorum disk of an MS cluster when using storage level replication. It was a very specific event that caused the problem, and the corruption was with the cluster and not the actual data itself. It definitely seems that someone didn't follow the golden rule of system upgrades, which is 'Take a backup first.' As Eve was shut down, a backup would clear the SQL transaction logs and there would be no need to replay them to get data back. Someone should really hold their hand up and say that they messed this bit up rather than blame the hardware.
For those saying that it's just a game and people should quit complaining, I will point out that Eve is a paid-for service. Just because it's a game doesn't change this one bit. If your TV company stopped sending you a signal for a day and a half you would probably complain; it's the same with a whole host of other services you pay for as well.
Also, for those Microsoft haters out there, you need to realise that MySQL is not an alternative to MSSQL in an enterprise environment. Firstly it doesn't scale, and secondly when things have gone horribly wrong you can call Microsoft (as long as you have the support agreement) and you will get the problem fixed. I have had to do this in the past, and after dealing with many vendors' support departments I can assure you that MS is one of the very best when you have a premier support agreement. You will get nothing like the same level of support with MySQL, which is best suited to running forums and small installations. The other big player is Oracle, but once you pick one you don't change as it's a hell of a lot of work.
|

Temai
Gallente
|
Posted - 2010.07.01 08:54:00 -
[79]
Thanks for working so hard to get the servers back up. I work in IT myself, and I know how it feels when a network dies and lots of unhappy people know where the server is and wait for you to arrive to "ask" you to fix it... remind me to recharge my tazer.
On another note, blowing out shoes is proof that you're doing something right. You can't build good stuff without melting something... or setting fire to someone.. ^^
- Temai Row Row Fight The Power - Libera Me
|

Fujiko MaXjolt
Caldari Templar Republic
|
Posted - 2010.07.01 09:23:00 -
[80]
To put things in perspective, "that other" MMO out there had a patchday yesterday that was supposed to be 12 hours, but got extended to 18 hours over 3 times, no explanation/information at all...
My wife was not a happy Panda - at least with EVE we get info and a complete debriefing after the fact ;-)
Oh, and also we got a nice gift :-D
|

Libin Herobi
|
Posted - 2010.07.01 10:00:00 -
[81]
Looking forward to not receiving any answers in this thread.
|
|

CCP Yokai

|
Posted - 2010.07.01 10:11:00 -
[82]
Originally by: Libin Herobi Looking forward to not receiving any answers in this thread.
Patch day today.
I'll start grinding through responses after downtime today.
Thanks!
|
|

Gallosek
|
Posted - 2010.07.01 11:14:00 -
[83]
Originally by: sjw7 Also, for those Microsoft haters out there, you need to realise that MySQL is not an alternative to MSSQL in an enterprise environment. Firstly it doesn't scale, and secondly when things have gone horribly wrong you can call Microsoft (as long as you have the support agreement) and you will get the problem fixed. I have had to do this in the past, and after dealing with many vendors' support departments I can assure you that MS is one of the very best when you have a premier support agreement. You will get nothing like the same level of support with MySQL, which is best suited to running forums and small installations. The other big player is Oracle, but once you pick one you don't change as it's a hell of a lot of work.
I would like to point out that MySQL is owned by Oracle, and was owned by Sun Microsystems before that. Either of them would provide the support you commend Microsoft for, and yes, configured correctly, MySQL scales nicely.
Not that I am suggesting CCP should migrate to MySQL. The pain caused by the incumbent solution (of any description) usually has to be very high to justify the enormous costs of migration (for any job you do, work out what it costs based on the number of hours it takes against your hourly salary; for laughs, see how much income you waste by needing sleep).
|
|

CCP Yokai

|
Posted - 2010.07.01 11:32:00 -
[84]
Originally by: T'ealk O'Neil Edited by: T'ealk O'Neil on 30/06/2010 15:26:17
Originally by: Commander Azrael Apart from a DB backup being massive, they did back it up. If you read the dev blog they chose the lengthier option of fixing the corrupted entries instead of rolling back. Which do you prefer? An extended downtime? or logging in to find ISK missing from your missions you ran and that shiny ship you bought no longer there?
I suggest you re-read. They had A backup, but if they had taken a backup as the first step before starting any work then no isk would have been lost as nobody would have been logged in between those times.
This is correct: we did have our normal backup, which was a few hours out of date. And yes, running the backup after downtime starts is the right way to do it... and it's what we are doing every time now. For today's client patch we did this too. So from this point forward we should have a full copy of the DB at a point where no transactions need to be run.
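To illustrate the ordering described above (this is only a sketch, not CCP's actual procedure): once players are logged out, take the snapshot first, then do the risky work, and fall back to the snapshot if anything fails. The `Database` type, `run_maintenance`, and `corrupting_step` are hypothetical names invented for this example, and an in-memory dict stands in for the real database.

```python
from copy import deepcopy
from typing import Callable, Dict, List

# Hypothetical stand-in for the real DB: character name -> ISK balance.
Database = Dict[str, int]


def run_maintenance(db: Database, steps: List[Callable[[Database], None]]) -> bool:
    """Snapshot first, then run each risky step; restore the snapshot on failure.

    Returns True if every step succeeded, False if the snapshot was restored.
    """
    # Nobody is logged in at this point, so the snapshot is fully up to date:
    # restoring it loses nothing and no transactions need to be replayed.
    snapshot = deepcopy(db)
    try:
        for step in steps:
            step(db)
    except Exception:
        db.clear()
        db.update(snapshot)
        return False
    return True


def corrupting_step(db: Database) -> None:
    """Simulates a move that damages data partway through (assumed failure)."""
    db["pilot_a"] = -1
    raise IOError("storage link error")


balances = {"pilot_a": 1000, "pilot_b": 250}
ok = run_maintenance(balances, [corrupting_step])
```

Because the snapshot is taken after logins stop, a restore brings back exactly the pre-maintenance state rather than one that is a few hours out of date.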
|
|
|

CCP Yokai

|
Posted - 2010.07.01 11:34:00 -
[85]
Originally by: T'ealk O'Neil Would it not be an idea in future when doing any patching / moving to take a backup of the database as it stands before starting - that way a recovery is simple, rather than trying to repair everything, which takes forever
Bonus points for being the first to say it... again. Dead right and the process now on everything even remotely risky.
|
|
|

CCP Yokai

|
Posted - 2010.07.01 11:36:00 -
[86]
Originally by: Camios Edited by: Camios on 30/06/2010 15:59:58 You were grinning, that means that this photo has been taken before the mess.
cool pics btw
This was one of the last prep moves before the TQ move. Mainly just making sure the Ethernet systems were in good order.
|
|
|

CCP Yokai

|
Posted - 2010.07.01 11:45:00 -
[87]
Originally by: Amida Ta So the big question remains unanswered in the blog:
"after finding the root cause"
So what was the root cause?
I waited on posting the exact details until after I had quite a few of our vendor experts chime in to make certain we had the root.
One of the links to our RAM SAN storage corrupted data being written to the storage device.
The exact bit that failed cannot be identified because frankly rather than tinkering with every link, transceiver, and switch port on the route I just nuked it from orbit. We replaced the fiber, moved to a new pair of transceivers, new port on patch panels, and even a new port on the switches.
Once we did this... the errors went away on the SAN and we had storage normalized.
I hope that helps clarify where the root of the issue that caused the corruption was and what we found to be the problem. The nuking from orbit bit is not my preferred method of troubleshooting, but again... given the choice of getting TQ online faster or satisfying my desire for empirical data... I choose TQ.
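As a generic illustration of how corruption on a write path can be detected (not a description of how the SAN or TQ actually does it): store a checksum alongside each block at the sender and recompute it on what the storage side received. The helper names below are hypothetical, and `flip_bit` only simulates a faulty link.

```python
import zlib
from typing import Tuple


def with_checksum(data: bytes) -> Tuple[bytes, int]:
    """Pair a block with its CRC32, computed before it leaves the host."""
    return data, zlib.crc32(data)


def verify(data: bytes, expected_crc: int) -> bool:
    """Recompute the CRC32 on what arrived and compare with the sender's value."""
    return zlib.crc32(data) == expected_crc


def flip_bit(data: bytes, index: int) -> bytes:
    """Simulate a bad link by flipping the lowest bit of one byte (assumed fault)."""
    corrupted = bytearray(data)
    corrupted[index] ^= 0x01
    return bytes(corrupted)


block, crc = with_checksum(b"wallet row: pilot_a, 1000 ISK")
clean_ok = verify(block, crc)
damaged_ok = verify(flip_bit(block, 5), crc)
```

A check like this says that a block was damaged somewhere between sender and storage, but not which fiber, transceiver, or port did it, which is consistent with replacing the whole path instead of isolating the part.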
|
|

Sturmwolke
|
Posted - 2010.07.01 11:56:00 -
[88]
Lol, as suspected. However, I greatly appreciate the candidness in replies from CCP Yokai. If anything, thumbs up for showing accountability. |

Hack Harrison
Caldari
|
Posted - 2010.07.01 12:11:00 -
[89]
Originally by: CCP Yokai
Originally by: Amida Ta So the big question remains unanswered in the blog:
"after finding the root cause"
So what was the root cause?
I waited on posting the exact details until after I had quite a few of our vendor experts chime in to make certain we had the root.
One of the links to our RAM SAN storage corrupted data being written to the storage device.
The exact bit that failed cannot be identified because frankly rather than tinkering with every link, transceiver, and switch port on the route I just nuked it from orbit. We replaced the fiber, moved to a new pair of transceivers, new port on patch panels, and even a new port on the switches.
Once we did this... the errors went away on the SAN and we had storage normalized.
I hope that helps clarify where the root of the issue that caused the corruption was and what we found to be the problem. The nuking from orbit bit is not my preferred method of troubleshooting, but again... given the choice of getting TQ online faster or satisfying my desire for empirical data... I choose TQ.
It's the only way to be sure!!!
|

Moraguth
Amarr Dynaverse Corporation Sodalitas XX
|
Posted - 2010.07.01 12:12:00 -
[90]
Thank You CCP Yokai. I know how the urge to find the exact problem would be almost overwhelming. Thank you for the details too.
Originally by: CCP Yokai
Originally by: Amida Ta So the big question remains unanswered in the blog:
"after finding the root cause"
So what was the root cause?
I waited on posting the exact details until after I had quite a few of our vendor experts chime in to make certain we had the root.
One of the links to our RAM SAN storage corrupted data being written to the storage device.
The exact bit that failed cannot be identified because frankly rather than tinkering with every link, transceiver, and switch port on the route I just nuked it from orbit. We replaced the fiber, moved to a new pair of transceivers, new port on patch panels, and even a new port on the switches.
Once we did this... the errors went away on the SAN and we had storage normalized.
I hope that helps clarify where the root of the issue that caused the corruption was and what we found to be the problem. The nuking from orbit bit is not my preferred method of troubleshooting, but again... given the choice of getting TQ online faster or satisfying my desire for empirical data... I choose TQ.
good game
Hoc filum tradit - This thread delivers.
|
|
|