
CCP Wrangler

|
Posted - 2008.12.28 13:26:00 -
[1]
We're currently experiencing startup issues with Tranquility, please check back here for more information as it becomes available.
Update (14:24): Tranquility is now up and running.
Wrangler Community Manager CCP Hf, EVE Online Email |
|
|

CCP Wrangler

|
Posted - 2008.12.28 14:33:00 -
[2]
Originally by: Kweel Nakashyn
Originally by: CCP Wrangler Update (14:24): Tranquility is now up and running.
What happened ?
We're putting together details on that and will post them as soon as we're done. 
|
|

CCP Wrangler

|
Posted - 2008.12.28 14:56:00 -
[3]
At downtime today, our SQL Server started showing errors while the downtime jobs were running. The cause of that problem is still unknown, but 40 minutes after the first indication of a problem in the logs, the SQL Server failed over. This caused a downtime job that had not completed to abort without finishing. The downtime job in question makes temporary changes to the database that, if interrupted at the wrong time, can cause the SQL Server to go to 100% CPU load and become almost totally unresponsive at startup. Due to the nature of the problem it was difficult, almost impossible, to look at the running state of the server, and because of that it took a long time to find the cause of the issue.
When we were unable to find the cause of the problem, a 10-person task force from CCP was mobilized to their laptops, and more people were put on standby to help solve the issue. After some investigative work, the cause was found: a missing index in the standings system. It was fixed and we were able to start up. The missing index was not immediately obvious because the job logs were not updated as the server failed over.
|
|

CCP Wrangler

|
Posted - 2008.12.28 14:58:00 -
[4]
We have a correction, at least two of us were at our workstations and not on laptops! Prism X felt that this was very valuable information and therefore insisted on this being corrected.
|
|

CCP Prism X
Gallente C C P

|
Posted - 2008.12.28 15:00:00 -
[5]
Originally by: CCP Wrangler We have a correction, at least two of us were at our workstations and not on laptops! Prism X felt that this was very valuable information and therefore insisted on this being corrected.
I'll send you a complete review of necessary changes to conform to proper aesthetics later today. 
~ Prism X EvE Database Developer Relocating your character to a cozy, secure container since 2006. Relocating your cozy, secure container to the EVE cemetery since 2008. |
|
|

CCP Valar

|
Posted - 2008.12.28 15:22:00 -
[6]
Originally by: Roy Batty68 Ooo! Moar database geekyness please!
So you guys reindex the major tables every downtime? 
No, but there is an index on the main standings table that is dropped while NPC to player corp standings are recalculated, as it takes less time to drop the index, do the updates and recreate the index than it takes to run the job while the index is still on the table.
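The drop, update and recreate pattern described above can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` instead of SQL Server, and the `standings` columns, the index name `ix_standings_to` and the `fail_midway` flag are hypothetical names invented for the example. It wraps all three steps in one transaction so that an abort between the drop and the recreate rolls the drop back too. Whether the real downtime job could be wrapped this way is not something this thread addresses.

```python
import sqlite3

def index_names(conn):
    """Names of the indexes currently defined on the standings table."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = 'standings'"
    )
    return {name for (name,) in rows}

def recalc_standings(conn, updates, fail_midway=False):
    """Drop the index, apply the updates, recreate the index, atomically."""
    conn.execute("BEGIN")
    try:
        conn.execute("DROP INDEX ix_standings_to")
        conn.executemany(
            "UPDATE standings SET standing = ? WHERE from_id = ? AND to_id = ?",
            updates,
        )
        if fail_midway:
            # Hypothetical stand-in for the job being aborted part-way through.
            raise RuntimeError("simulated abort before the index is recreated")
        conn.execute("CREATE INDEX ix_standings_to ON standings (to_id)")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise

# isolation_level=None: the sqlite3 module issues no implicit BEGIN/COMMIT,
# so the explicit transaction statements above are the only ones in play.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE standings (from_id INTEGER, to_id INTEGER, standing REAL)")
conn.execute("CREATE INDEX ix_standings_to ON standings (to_id)")
conn.executemany("INSERT INTO standings VALUES (?, ?, ?)", [(1, 100, 0.0), (2, 100, 0.0)])

try:
    recalc_standings(conn, [(5.0, 1, 100)], fail_midway=True)
except RuntimeError:
    pass
survived = index_names(conn)  # the rollback also undid the DROP INDEX
```

The trade-off is that the whole job then sits in a single long transaction, which may or may not be acceptable on a table of that size.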
Most of our major tables are too big to be reindexed during normal downtime. ---- Virtual World Database Administrator Operations department CCP Games |
|
|

CCP Prism X
Gallente C C P

|
Posted - 2008.12.28 16:24:00 -
[7]
Originally by: Irma Bondis
Originally by: Roy Batty68 So you guys reindex the major tables every downtime? 
Stuff <-- See! Quotes don't have to include an insane block of text. Yay for fewer characters stored!
Your tech speak is spot on but your assumption is aaaalmost half-right. The two procedures don't have the step-by-step relation you described, though. One is a downtime job and the other fetches character information on login. That's why the server made it up until people started logging in, at which point the constant full scans caused CPU to skyrocket, important cluster calls were not getting through and nodes started dying.
You are however right in assuming that we could code each and every procedure to check for the existence of the indexes we'd expect it to use, although that would have to cover quite a lot, as SQL Server can sometimes be a big black box of hate and do things that you'd never expect in its query plans. (Personal note to CCP Atlas: See, I wrote "its" rather than "it's". I get the paradigm! /personalJoke). But it's somewhat obvious that that is a lot of redundant overhead we can't really accept in a DB that needs to be as robust as possible. We also shouldn't need to accept it, as we should be able to trust our indexes not to go *poof* on us.
However, it's an unacceptable risk. 135 minutes of extra downtime that could have been avoided is really not acceptable to anyone here in CCP. Sure we are now all aware of this possibility and know how to detect it but it's still (See Atlas! SEE!) an utterly unnecessary risk of increased downtime, even if it saves us some minutes of the daily downtime. So this will most likely change in the near future.
Lesson learned: Automatically dropping indexes ftl. 
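One way to detect this situation without adding checks to every procedure is a single verification step after the downtime jobs and before logins are allowed. The sketch below is an assumption-laden illustration, not CCP's tooling: it uses SQLite via Python's `sqlite3`, and `EXPECTED_INDEXES`, `missing_indexes` and the index name are hypothetical. On SQL Server the equivalent lookup would presumably go against the `sys.indexes` catalog view instead of `sqlite_master`.

```python
import sqlite3

# Hypothetical manifest: table name -> index names that must exist at startup.
EXPECTED_INDEXES = {"standings": {"ix_standings_to"}}

def missing_indexes(conn, expected):
    """Return {table: [missing index names]} for every table that lacks one."""
    missing = {}
    for table, names in expected.items():
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = ?",
            (table,),
        )
        present = {name for (name,) in rows}
        absent = names - present
        if absent:
            missing[table] = sorted(absent)
    return missing

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE standings (from_id INTEGER, to_id INTEGER, standing REAL)")
conn.execute("CREATE INDEX ix_standings_to ON standings (to_id)")
healthy = missing_indexes(conn, EXPECTED_INDEXES)

# Simulate the aborted job: the index was dropped and never recreated.
conn.execute("DROP INDEX ix_standings_to")
broken = missing_indexes(conn, EXPECTED_INDEXES)
```

A startup script could refuse to open the cluster to logins whenever the returned mapping is non-empty.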
|
|

CCP Explorer

|
Posted - 2008.12.28 18:56:00 -
[8]
Originally by: Wy LinChow Please tell us this problem was not as the result of tweaks being made during the holiday.
This was not the result of tweaks being made during the Holidays.
Erlendur S. Thorsteinsson Software Director EVE Online, CCP Games |