
CCP Fallout

|
Posted - 2010.03.02 19:06:00 -
[1]
As many of you have noticed, Tranquility has been less than tranquil of late. CCP Valar fills us in on the progress being made towards keeping Tranquility well-behaved in his newest dev blog.
Fallout Associate Community Manager CCP Hf, EVE Online |
|

Frug
Omega Wing
|
Posted - 2010.03.02 19:17:00 -
[2]
In before "I'm a paying customer and this is unacceptable and I'm canceling all 50 of my accounts FUUUUUU"
- - - - - - - - - Do not use dotted lines - - - - - - If you think I'm awesome say BOOO BOOO!! - Ductoris Neat look what I found - Kreul Whisper/PrismX 4 emperor |

HeliosGal
Caldari
|
Posted - 2010.03.02 19:22:00 -
[3]
Perhaps these bugs have something to do with the bugged anomaly issue that you guys don't seem to have been able to fix, despite saying it's been fixed. It hasn't: I'm still finding sites that refuse to despawn. Is this related? Signature - CCP what this game needs is more variance in PVE aspects and a little bit less PVP focus, more content more varied level 1-4 missions more than just 10 per faction high sec low sec and 00 |

Arcturas Vostro
|
Posted - 2010.03.02 19:25:00 -
[4]
Originally by: HeliosGal Perhaps these bugs have something to do with the bugged anomaly issue that you guys don't seem to have been able to fix, despite saying it's been fixed. It hasn't: I'm still finding sites that refuse to despawn. Is this related?
In before "Maybe it's due to this completely unrelated bug that makes me angry!"...oh, wait...
|

Hud Bannon
|
Posted - 2010.03.02 19:27:00 -
[5]
Ahh...The growing pains of interaction between tons of hugely complex systems. EVE takes on a double entendre when you call it the Sand box. :)
Sort of like people. They surprise you and don't work out like you thought they would when you get enough of them in a room together.
I predict that the Tranquility server will gain sentience in... 3 years. :)
|

Callic Veratar
|
Posted - 2010.03.02 19:28:00 -
[6]
I hope it's not something as "simple" as a stack overflow. Too many connections and disconnections off the TCP stack, without prompt removal from the stack?
(Though, this was probably one of the first things you guys checked...)
|

Silly Petey
Minmatar The Fruit Flys
|
Posted - 2010.03.02 19:31:00 -
[7]
Lol
Ccp- that stuff you sold us keeps breaking Vendor- let me see the logs Ccp -errr they show nothing. Vendor- sorry for your loss. We hope you get back on your feet soon
|

Gawain Hill
|
Posted - 2010.03.02 19:45:00 -
[8]
OK, so pick the day it happens most on, then just do what needs to be done for that one day and let people moan. The game becomes a bit laggy, but if it stops the game falling over then all is well, and we have 1 day of lag instead of a million days of the game falling over.
/me wonders if he's the only one who wouldn't object to that
|

Lobster Man
Metafarmers
|
Posted - 2010.03.02 19:51:00 -
[9]
Edited by: Lobster Man on 02/03/2010 19:52:12
Quote: "...our logs, surprisingly, showed nothing."
Best line in any devblog, ever....confirming everyone's suspicions, once and for all 
|

Regat Kozovv
Caldari Alcothology
|
Posted - 2010.03.02 20:00:00 -
[10]
Thanks for the update Valar. Good luck with the troubleshooting.
|

Mei Tzu
|
Posted - 2010.03.02 20:11:00 -
[11]
Originally by: Silly Petey Lol
Ccp- that stuff you sold us keeps breaking Vendor- let me see the logs Ccp -errr they show nothing. Vendor- sorry for your loss. We hope you get back on your feet soon
You mean it's "working as intended"?
|

Dante Edmundo
|
Posted - 2010.03.02 20:17:00 -
[12]
I thought database software was specifically designed to handle race conditions? What a sucky database.
|

Manfred Rickenbocker
|
Posted - 2010.03.02 20:21:00 -
[13]
Originally by: Hud Bannon Ahh...The growing pains of interaction between tons of hugely complex systems. EVE takes on a double entendre when you call it the Sand box. :)
Every sandbox gets a few turds in it from wandering animals now and then. The best you can do is sift the sand as best you can or call your sand provider for more sand. ------------------------ Peace through superior firepower: a guiding principle for uncertain times. |

Niccolado Starwalker
Gallente Shadow Templars
|
Posted - 2010.03.02 20:29:00 -
[14]
I'm sure you guys will fix it! You always do in the end!
But this line made me laugh, even in all its seriousness:
Originally by: CCP Valar ", opened up a support case with the vendor regarding the incident, since our logs, surprisingly, showed nothing"

Anyway, happy bughunting!
Originally by: Dianabolic Your tears are absolutely divine, like a fine fine wine, rolling down your cheeks until they flow down the river of LOL.
|

Ender Flagrante
Gallente The Scope
|
Posted - 2010.03.02 20:34:00 -
[15]
In before some idiot suggests switching to MySQL.
|
|

Chribba
Otherworld Enterprises Otherworld Empire
|
Posted - 2010.03.02 20:39:00 -
[16]
Switch to Chribsql? Any error will be replaced with a unit of Veldspar. Your choice for the optimal Veldsparish look!
Secure 3rd party service |
|

Fearless M0F0
Blue Republic
|
Posted - 2010.03.02 20:41:00 -
[17]
Just wondering how much money has CCP wasted debugging and developing workarounds so the sql engine they bought does what it is supposed to do out of the box 
Now what would have happened if they instead took all that time and money and migrated to Oracle...
SQL server has become a fine product but it seems Microsoft is way over their heads supporting it for the Eve cluster and are likely using CCP as a guinea pig for stuff Oracle and the other REAL database engines have been handling for years.
EVE II "Dominion" - The Return of teh LAG |

Tsabrock
Gallente Circle of Friends
|
Posted - 2010.03.02 20:46:00 -
[18]
A very interesting post. Upon reading the link for "race condition" I learned that this same phenomenon was what was ultimately responsible for the North American Blackout in 2003 (which I narrowly missed being affected by).
From some of my own programming experience, such problems can be agonizingly difficult to track down. --- I don't read the forums all the time here - if you read something here and want to respond to me directly, EVE-Mail me, and I'll eventually read it. |
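For readers who have not met the term, here is a minimal Python sketch of a generic race condition (a "lost update"). It is not the bug from the dev blog, whose details are not public in this thread; the `increment` and `run` helpers are made up for illustration. Generators stand in for threads so that the interleaving can be chosen by hand and the result is repeatable.

```python
# Deterministic illustration of a lost-update race condition.
# Each worker does a read-modify-write on a shared counter in two steps;
# the generator pauses between the read and the write so the caller can
# pick the interleaving instead of relying on thread timing.

def increment(shared):
    value = shared["count"]      # step 1: read the current value
    yield                        # another worker may run at this point
    shared["count"] = value + 1  # step 2: write back a possibly stale value


def run(schedule):
    """Run two workers, advancing them in the order given by `schedule`."""
    shared = {"count": 0}
    workers = {"A": increment(shared), "B": increment(shared)}
    for name in schedule:
        next(workers[name], None)  # advance that worker by one step
    return shared["count"]


serial = run(["A", "A", "B", "B"])       # A finishes before B starts
interleaved = run(["A", "B", "A", "B"])  # both read before either writes

print(serial, interleaved)  # prints "2 1": one increment is lost
```

With real threads the bad interleaving only shows up occasionally and usually under load, which is part of why such bugs are so hard to reproduce on a test server.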

Chainsaw Plankton
IDLE GUNS IDLE EMPIRE
|
Posted - 2010.03.02 20:57:00 -
[19]
Originally by: Frug In before "I'm a paying customer and this is unacceptable and I'm canceling all 50 of my accounts FUUUUUU"
I canceled all my accounts once, stupid creditcard expiring 
|

CODE RED
Caldari Black Nova Corp IT Alliance
|
Posted - 2010.03.02 20:58:00 -
[20]
If we had a vendor like this I'd have canned them ages ago and castrated their business until they closed shop........then again thats how I deal with things ;p
Good luck guys I know the pain you are in, been there, done that, got the crappy Tshirt ;p _________________________________________ Kryo "CODE RED" Dracon
|

Hienz Doofenshmirtz
|
Posted - 2010.03.02 20:58:00 -
[21]
race conditions are fun. glad you guys are working with the vendor, so they can improve their product in the future.
|

Gimme Sugar
|
Posted - 2010.03.02 21:14:00 -
[22]
Time to switch from SQL Server (Sybase) to Oracle RDBMS!
|

Lord Helghast
|
Posted - 2010.03.02 21:17:00 -
[23]
There's nothing wrong with SQL Server x64. It's wicked, and we use it in production. But everything has bugs: look at MySQL's changelogs between versions, it's not 1 or 2 things that get fixed. Oracle's great as well, but no database platform is perfect; they all have their bugs and snafus.
The fact that the vendor and CCP are working together just goes to show how nice it is to have a good support backbone. Half the reason it isn't fixed so far is that it's a high-end running production environment, so they can't really do much to troubleshoot it. I'm sure if they could get it to happen on Sisi, this would have been fixed in December.
|

Profian
Amarr Blood Bringers
|
Posted - 2010.03.02 21:31:00 -
[24]
Originally by: Chribba Edited by: Chribba on 02/03/2010 20:51:03 Switch to Chribsql? Any error will be replaced with a unit of Veldspar. Your choice for the optimal Veldsparish look!
Cheers Chribba, I read this while drinking and almost choked.....
ChribSQL, I'd buy that 
|

Mara Sci
|
Posted - 2010.03.02 21:46:00 -
[25]
So, just out of curiosity, isn't the entire point of failover so that you can *keep* running in the event of a problem? Failover shouldn't cause the need to reboot. I would say that is just fail and not really failover.
But then what do I know, I only manage small databases... |

Xyfu
Minmatar Brutor tribe
|
Posted - 2010.03.02 21:51:00 -
[26]
OMG LIEK WHAI AREN'T YOO MAKING SHIPS LOOK BETTUR.
(See what I did there? I flipped it. Daaamn.) _____ ^ That is a sig line. It should be there without me having to put one in. |

Ulair Memmet
ORIGIN SYSTEMS Shadows of Light
|
Posted - 2010.03.02 22:03:00 -
[27]
We have every reason to be ****ed about the issues, but what of it.
It's better for all of us if I just cheer you up and say keep it up. You can do it!
|

Jason Edwards
Internet Tough Guy Spreadsheets Online
|
Posted - 2010.03.02 22:13:00 -
[28]
I see a lot of "our vendor"
Microsoft? Cisco?
I'm impressed you guys tracked down a race condition in the TCP stack. That surely came down to a "Once you have eliminated the impossible, whatever remains, however improbable, must be the truth." ------------------------ To make a megathron from scratch, you must first invent the eve universe. ------------------------ Life sucks and then you get podded. |

Daminma2
Perkone
|
Posted - 2010.03.02 22:32:00 -
[29]
Didn't the article say the race condition is suspected to be in the TCP stack? Also, didn't CCP go "stackless" on the TCP stack recently? Why are people blaming SQL?
|

Kinomoto Sakura
FW Scuad E C L I P S E
|
Posted - 2010.03.02 22:51:00 -
[30]
Time to get another vendor IMHO ----------------
- Is that a Shield Booster on your Omen? DesuSigs |

Mithfindel
Aseyakone
|
Posted - 2010.03.02 22:57:00 -
[31]
Edited by: Mithfindel on 02/03/2010 22:57:56
Originally by: Jason Edwards I see alot of "our vendor"
microsoft? cisco?
IIRC, Microsoft SQL Server 2008.
Edit: Time to don the asbestos suit, I guess.
|

Camios
Minmatar Insurgent New Eden Tribe Systematic-Chaos
|
Posted - 2010.03.02 23:01:00 -
[32]
As far as I can remember, CCP relied on Microsoft for MSSQL support. It sounded like a good strategy: Microsoft would take care of their product and help CCP, who would only have to think about their game.
Now it's a bit strange: it's CCP helping Microsoft.
BECAUSE OF MICROSOFT
|

Aalaria Black
Rogen's Heroes
|
Posted - 2010.03.02 23:18:00 -
[33]
A lot of people are pointing fingers at the database software when it could easily be the high availability software (like ServiceGuard on HPUX) that is falsely detecting a condition that causes it to initiate the failover to the other cluster. Maybe a clock got off by a few seconds, maybe a cluster member failed to respond to a health check ping due to network issues ... there could be quite a number of reasons why a monitor decides to kick off a failover sequence.
food for thought.
|

Vile rat
GoonWaffe SOLODRAKBANSOLODRAKBANSO
|
Posted - 2010.03.02 23:24:00 -
[34]
(6:22:33 PM) interval: I wish the vendor would have said that they can't help them, but they are sorry for the inconvenience (6:22:53 PM) vile_rat: no you see ccp are paying customers so
|

Lialem
|
Posted - 2010.03.02 23:26:00 -
[35]
http://www-01.ibm.com/software/data/db2/
|

Estamel Tarchon
|
Posted - 2010.03.02 23:32:00 -
[36]
Edited by: Estamel Tarchon on 02/03/2010 23:32:02
Because I'm a little interested in this (I'm a student): do your SQL errors have anything to do with this? http://en.wikipedia.org/wiki/Isolation_level
|

Daminma2
Perkone
|
Posted - 2010.03.02 23:36:00 -
[37]
Originally by: Estamel Tarchon Edited by: Estamel Tarchon on 02/03/2010 23:32:02 Because im a little interested in this (im a student), does your sql errors have anything to do with this? http://en.wikipedia.org/wiki/Isolation_level
Quote:
Almost all of those were due to a bug in the networking subsystem that causes the SQL Server to fail over.
Doesn't seem like it. It's not even clear if the error is occurring on the database process at all.
|

Kerfira
Audaces Fortuna Iuvat
|
Posted - 2010.03.03 00:07:00 -
[38]
Originally by: Gimme Sugar Time to switch from SQL Server (Sybase) to Oracle RDBMS!
Why the hell would they switch from one monolithic database to another? They all have the exact same weaknesses.
If implementing something like EVE again, they should use a telco grade distributed X.500 directory server. Much faster and almost infinite data expansion capability (just add more server nodes).
Originally by: CCP Wrangler EVE isn't designed to just look like a cold, dark and harsh world, it's designed to be a cold, dark and harsh world.
|

Gerard Deneth
Caldari Pavlov Labs GmBH Independent Faction
|
Posted - 2010.03.03 01:26:00 -
[39]
Is there any possibility we might be looking at some more pervasive error in the system that is both causing this and under heavier load conditions some of the "lag" that's been seen in high number 0.0 combat?
---------------------------- The Game's always changing under your feet; don't start moaning when you get a toe caught in the gears. |

Hack Harrison
Caldari
|
Posted - 2010.03.03 02:35:00 -
[40]
Originally by: Estamel Tarchon Edited by: Estamel Tarchon on 02/03/2010 23:32:02 Because im a little interested in this (im a student), does your sql errors have anything to do with this? http://en.wikipedia.org/wiki/Isolation_level
No - isolation level is something that affects transactions: what data you can see/update while another transaction is being applied, etc. The result is that a transaction cannot complete due to another one, resulting in either the transaction blocking while the other one completes, or a deadlock occurring if the same resources have to be accessed but have been locked in a different order, so neither can complete.
The issue highlighted in the blog has nothing to do with transaction processing and is related to communication issues - too many database connections doing something to trigger the race condition.
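The deadlock case described above (same resources, locked in a different order) can be sketched in Python with two plain locks standing in for two rows or tables. This is only an analogy under stated assumptions: `lock_a`, `lock_b` and `worker` are hypothetical names, and a real database engine would detect the cycle and roll back one transaction rather than rely on a timeout as this sketch does.

```python
import threading

lock_a = threading.Lock()                  # stands in for resource A
lock_b = threading.Lock()                  # stands in for resource B
both_hold_first = threading.Barrier(2)     # both workers hold their first lock
both_tried_second = threading.Barrier(2)   # both workers finished their attempt
results = {}


def worker(name, first, second):
    with first:
        both_hold_first.wait()
        # Without the timeout both workers would wait here forever.
        got_second = second.acquire(timeout=0.2)
        results[name] = got_second
        if got_second:
            second.release()
        # Keep holding `first` until the other worker has given up too,
        # so the outcome does not depend on timing.
        both_tried_second.wait()


# txn1 locks A then B; txn2 locks B then A: the classic ordering conflict.
t1 = threading.Thread(target=worker, args=("txn1", lock_a, lock_b))
t2 = threading.Thread(target=worker, args=("txn2", lock_b, lock_a))
t1.start()
t2.start()
t1.join()
t2.join()

print(results)  # neither worker could take its second lock
```

If both workers took the locks in the same order (A then B), one would simply block until the other finished, which is the ordinary blocking case rather than a deadlock.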
|

Sinnbad Mayhem
|
Posted - 2010.03.03 03:52:00 -
[41]
This is one heck of a setup you have, I am sure you will find a solution. You better, losing T2 ships to lag sucks! 
But I gotta say something about the Race condition answer - BULL****
I had to get that off my chest. Good luck gents.  S&M |

HeliosGal
Caldari
|
Posted - 2010.03.03 06:24:00 -
[42]
its complex but im sure their diagnostic tools can work it out Signature - CCP what this game needs is more variance in PVE aspects and a little bit less PVP focus, more content more varied level 1-4 missions more than just 10 per faction high sec low sec and 00 |

Miraqu
Caldari
|
Posted - 2010.03.03 06:40:00 -
[43]
In the end, CCP will go the way the NYSE and the LSE went with their servers, vendors and used products.
By mid to end 2011 you will have to admit that your vendor microsoft can't deliver and you will finally look for a solution that works AND scales.
|

Charles Javeroux
Gallente INTERSTELLAR CREDIT
|
Posted - 2010.03.03 06:45:00 -
[44]
Good work, CCP!
Originally by: Orek Fear I guess the ultimate solution to inflation in EVE turned out to be an NPC stripper...
|

Frank Lonehorn
Gallente Lonehorn's Astral Mining Group
|
Posted - 2010.03.03 06:58:00 -
[45]
I think it's the solid state drives on the database. And why is there no catch on the thing, so if you get a race, it will catch it for a sec?
|

Jei'son Bladesmith
The Storm Knights The Cool Kids Club
|
Posted - 2010.03.03 08:50:00 -
[46]
I blame rap music
|

Snorre Sturlasson
|
Posted - 2010.03.03 08:55:00 -
[47]
According to the blog, it's not the server itself but the underlying OS called windows. Maybe shifting to windows wasn't that good?
|

Freedom Netas
|
Posted - 2010.03.03 09:52:00 -
[48]
Originally by: Ender Flagrante In before some idiot suggests switching to MySQL.
In after some idiot defends MSSQL.
|

Typhado3
Minmatar
|
Posted - 2010.03.03 10:50:00 -
[49]
Quote: since our logs, surprisingly, showed nothing
boost logs 
good luck, I don't envy your bughunters right now... or ever actually bughunting can be a b**** ------------------------------ God is an afk cloaker |

Grez
Fairlight Corp Rooks and Kings
|
Posted - 2010.03.03 11:17:00 -
[50]
FYI, Oracle and MySQL would be a terrible switch. MSSQL is perfect for what they need it to do, you'd probably see a performance decrease on this level of transactions when switching to Oracle, and MySQL still has data integrity issues. ---
|

Camios
Minmatar Insurgent New Eden Tribe Systematic-Chaos
|
Posted - 2010.03.03 11:35:00 -
[51]
Time to change the cartridge on the Helmholtz coil of your Planck bubble stabilizer, tbh
|

Pilk
Mother Lovers
|
Posted - 2010.03.03 12:57:00 -
[52]
Edited by: Pilk on 03/03/2010 13:00:35
Originally by: Kerfira
Originally by: Gimme Sugar Time to switch from SQL Server (Sybase) to Oracle RDBMS!
Why the hell would they switch from one monolithic database to another? They all have the exact same weaknesses.
If implementing something like EVE again, they should use a telco grade distributed X.500 directory server. Much faster and almost infinite data expansion capability (just add more server nodes).
Cross-node transactions are expensive in that sort of system. To the degree that EVE does them, which is a *lot*, you'd end up right back where you started, with a single, monolithic server, unless you'd rather experience crippling performance problems.
DB2, on the other hand, would be fantastic for this sort of thing (massive, transactional DB full of inter-related data). Freddie Mac uses it to keep track of a few trillion dollars flowing between all of the banks it interacts with, itself, and the Federal Reserve. But I'm sure there's a downside of which I'm not aware; keep in mind, IBM is CCP's hardware vendor, so it's not like they've never been introduced to (and presumably rejected) the capabilities of DB2.
Originally by: Grez FYI, Oracle and MySQL would be a terrible switch. MSSQL is perfect for what they need it to do, you'd probably see a performance decrease on this level of transactions when switching to Oracle, and MySQL still has data integrity issues.
If you are having data integrity issues with MySQL, it's because you're using the wrong DB engine. MyISAM has been obsolete for years. InnoDB is fully-ACID. Just set "default_storage_engine=InnoDB" in your my.cnf.
--P
Kosh: The avalanche has already started. It is too late for the pebbles to vote. Tyrrax's bet status: PAID! |

Alex V0X2
Minmatar
|
Posted - 2010.03.03 14:32:00 -
[53]
We didn't want that server anyway.
|

Pesets
The Hunt Club
|
Posted - 2010.03.03 14:40:00 -
[54]
Originally by: Callic Veratar I hope it's not something as "simple" as a stack overflow. Too many connections and disconnections off the TCP stack, without prompt removal from the stack?
(Though, this was probably one of the first things you guys checked...)

"Stack overflow" refers to overflow of the call stack, a structure that stores the addresses where program should pick up from once the current procedure finishes. Stack overflow happens when too many procedures are called from within each other, and there is no more memory to put the next call's return address on top of the stack.
"TCP/IP stack" is a stack of abstraction layers, of which there is a definite number, and there is no way for it to overflow. "Connection stack" also cannot overflow because connections are not stored in a stack.
On a related note, this forum needs a facepalm smiley.
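The call-stack overflow described above is easy to provoke on purpose. A small Python sketch, with the caveat that CPython does not let the real stack overflow: it enforces a recursion limit and raises `RecursionError` first. The function names `nested_call` and `overflows` are made up for this example.

```python
import sys


def nested_call(depth):
    # Every call pushes another frame (including its return address)
    # onto the call stack, and this function never returns.
    return nested_call(depth + 1)


def overflows():
    """Return True if unbounded recursion exhausts the call stack limit."""
    try:
        nested_call(0)
    except RecursionError:
        return True
    return False


print(sys.getrecursionlimit(), overflows())
```

Nothing here involves network connections, which is the point of the post: the number of open connections has no bearing on how deep the call stack gets.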
|

Ga'len
Hellhounds. HellFleet
|
Posted - 2010.03.03 15:01:00 -
[55]
Originally by: CCP Fallout As many of you have notice, Tranquility has been less than tranquil of late. CCP Valar fills us in on the progress being made towards keeping Tranquility well-behaved in his newest dev blog.
Good work guys, keep these technical updates coming. You have a great number of us techies that work with these technologies on a daily basis. Providing this information helps us to provide suggestions to you and your peers who are reaching out to the general techie community to resolve these issues.
Keep it up, we'll help you work through this stuff!
|

Shepard Book
Imperial Academy
|
Posted - 2010.03.03 15:06:00 -
[56]
Turning off recycling of idle sessions seems promising as a workaround that makes triggering the bug less likely.
I am not sure what you mean by sessions. Uncool if those sessions are customer clients. I have noticed an increasing number of discos during auto pilot in empire over past couple of months. I hope it is something else.
|

Silicon Buddha
Amarr Agony Unleashed Agony Empire
|
Posted - 2010.03.03 15:06:00 -
[57]
There is a better option than traditional SQL clustering with its failover and giveback type scenario (and all the resulting issues).
There is a hardware vendor called Stratus that provides a hardware platform that promises (and delivers) very high availability. Essentially what happens is that there are 2 servers in a single stratus implementation. These servers are kept "lockstep" with each other from a CPU/Memory and Disk perspective so that if there is a failure on one slice of the server that it immediately and without missing a beat, fails over to the other slice.
We have very highly available MS SQL servers at my job which essentially requires 7x24 uptime.
During patch windows (applying patches to the Windows OS), we are able to break the mirroring, patch one side of the server, reboot it and make sure it comes up correctly, and then make the patched side active with no real impact to the systems or users. Essentially all we need to do is a quick restart of the MSSQL service (which the applications hardly notice).
We have this tied to our back-end SAN as well (as who in their reasonable mind would ever use local disk).
Feel free to contact me offline if you'd like more info on Highly available scenarios.
A concerned eve citizen _________________________________________________________ Click here for Fly Reckless Podcast
|

Louis deGuerre
Gallente Amicus Morte Void Alliance
|
Posted - 2010.03.03 15:08:00 -
[58]
Great job of you guys keeping the peons informed. Much appreciated. 
Sol: A microwarp drive? In a battleship? Are you insane? They aren't built for this! Clear Skies - The Movie
|

Garr Anders
Minmatar Thukk U
|
Posted - 2010.03.03 16:06:00 -
[59]
Originally by: Silly Petey Lol
Ccp- that stuff you sold us keeps breaking Vendor- let me see the logs Ccp -errr they show nothing. Vendor- sorry for your loss. We hope you get back on your feet soon
Best response ever ! ----- Garr Anders
"The only winning move is not to play" is about the best damn advice anyone can get regarding arguing over the internet. - referring to the Movie WarGames 1983
|

Exie
Endless Possibilities Inc. Ushra'Khan
|
Posted - 2010.03.03 16:20:00 -
[60]
Edited by: Exie on 03/03/2010 16:20:35
Originally by: Garr Anders
Originally by: Silly Petey Lol
Ccp- that stuff you sold us keeps breaking Vendor- let me see the logs Ccp -errr they show nothing. Vendor- sorry for your loss. We hope you get back on your feet soon
Best response ever !
This
E...
We be Jammin' |

Ehranavaar
|
Posted - 2010.03.03 18:38:00 -
[61]
Originally by: Mei Tzu
Originally by: Silly Petey Lol
Ccp- that stuff you sold us keeps breaking Vendor- let me see the logs Ccp -errr they show nothing. Vendor- sorry for your loss. We hope you get back on your feet soon
You mean it's "working as intended"?
how do you folks at ccp avoid the temptation to smite people like this?
|

Kweel Nakashyn
shadow and cloaking Yggdrasill.
|
Posted - 2010.03.03 18:51:00 -
[62]
If it's a race condition problem, you have multiple solutions, especially using:
- input buffers in RAM: your f(a,/b) transforms into set c = /b; f(a,c).
- or transcode tables, also in RAM, similar to:
switch case a,b
when (a,b) : set c to ...
when (a,/b) : set c to ...
when (/a,b) : set c to ...
when (/a,/b) : set c to ...
end switch case
if c then ... else ...
I'm not sure if there is any other alternative for that problem. RAM. ~ OSEF |

Kweel Nakashyn
shadow and cloaking Yggdrasill.
|
Posted - 2010.03.03 19:10:00 -
[63]
Originally by: Tsabrock From some of my own programming experience, such problems can be agonizingly difficult to track-down.
You know why? Because most programmers can't indent their code and write proper boolean equations using minterms. (I put some _ instead of spaces because this forum doesn't allow multiple spaces)
IF_____a ___AND_b
IF_____NOT_a ___AND_____b
IF_________(a_____OR_____b) ___AND_____(c_____OR_NOT_d) ___AND_NOT_(NOT_e_OR_NOT_f)
Oooooh Holy Batman, I can count the minimal cycles now, and instantly know the result of the equation. ~ OSEF |

Kweel Nakashyn
shadow and cloaking Yggdrasill.
|
Posted - 2010.03.03 19:13:00 -
[64]
Originally by: Grez FYI, Oracle and MySQL would be a terrible switch. MSSQL is perfect for what they need it to do, you'd probably see a performance decrease on this level of transactions when switching to Oracle, and MySQL still has data integrity issues.
DB2 ftw \o/  ~ OSEF |

Paknac Queltel
Caldari Provisions
|
Posted - 2010.03.03 20:41:00 -
[65]
Originally by: Kweel Nakashyn Edited by: Kweel Nakashyn on 03/03/2010 19:02:19 Edited by: Kweel Nakashyn on 03/03/2010 19:00:43 Edited by: Kweel Nakashyn on 03/03/2010 18:59:18 If it's a race condition problem, you have multiple solution especially using : - input buffers in the RAM : Your f(a,/b) transforms into set c = /b; f(a,c);
- Or transcode tables in RAM also : similar to
switch case a,b when (a,b) : set c to ... when (a,/b) : set c to ... when (/a,b) : set c to ... when (/a,/b) : set c to ... end switch case
if c then ... else ...
I'm not sure if there is any other alternative about that problem : RAM.
And yes, you lost one cycle at least (from the not) + cycles from i/o acess to the ram. But you probably better have few lost cycle than ****ty answers.
I'm sorry for ppl around but if this wasn't documented and/or tested by the electronic manufactorer, they are newbies. The programmer *could* have seen it, if he came from electronics and drunk no beers, but as a formre engineer in electronics, this is quite newbish to me. :) Why they didn't hire me in electronics you ask ? I live in ****ing France. After my degree I became a barman :D
WAT.
This crap is happening at least 2 levels of abstraction above where you're thinking it's happening. As such, electronics knowledge is largely irrelevant, and the 'electronic manufactorer' has nothing to do with this. This is all happening in software, not in hardware.
To CCP: If I wake up screaming from a nightmare about debugging this kind of crap, it's all your fault. I hope you can live with nearly making a grown man weep. The horror, the horror... |

Xikorita
Mob Thought Phalanx Alliance
|
Posted - 2010.03.03 22:43:00 -
[66]
"Turning off recycling of idle sessions seems promising as a workaround that makes triggering the bug less likely."
So this is the reason that logged off pilots stay in space? Am I not safe anymore if I log?
|

Night Doc
|
Posted - 2010.03.03 23:09:00 -
[67]
this looks like the nail in the haystack problem
very very hard to find, and a very very simple solution
- Fit EVE to screen |

Joe Censored
Unknown-Entity Black Star Alliance
|
Posted - 2010.03.03 23:22:00 -
[68]
Edited by: Joe Censored on 03/03/2010 23:24:22
Originally by: Mithfindel
Originally by: Jason Edwards I see alot of "our vendor"
microsoft? cisco?
IIRC, Microsoft SQL Server 2008.
Ah this is disappointing. Use of an open source base OS for the SQL server would allow internal CCP kernel devs to quickly spot this type of race condition and implement a fix without the need for consultation with any outside vendor. Proprietary OS's like Windows or Mac OS always mean they have you by the balls when or even if they find your problem important enough for them to fix.
(put several printk's in the TCP stack code at various places, and wait for the race condition and see what was last output... done you found the location of the issue, or at least how I would find it)
|

Xavin Nydek
Ars ex Discordia
|
Posted - 2010.03.03 23:47:00 -
[69]
Originally by: Xikorita "Turning off recycling of idle sessions seems promising as a workaround that makes triggering the bug less likely."
So this is the reason that logged off pilots stay in space? Am I not safe anymore if I log?
No, they are talking about database connection sessions, not game sessions.
I'm always amazed at the number of arrogant people who read about a complex problem like this, then post something like "well, if you idiots would just do this, it wouldn't be a problem." If they have been working with MS for months on this, I can assure you that there's not a simple or easy answer. Neither CCP nor MS are incompetent.
It's also laughable that people are suggesting changing database systems. No thanks, I would rather have new features and fixed bugs than have everything stall for a year while they change the database, then end up with an entirely new set of issues that may or may not be worse than what we have now.
|

Skyrape
|
Posted - 2010.03.03 23:55:00 -
[70]
@CCP that's what you get when you use fail microsoft stuff! better switch to a real DBMS and a REAL server OS, cause honestly, I can't imagine how you guys survived so far, probably just throwing money at it I guess.
And get some REAL technical support as well, cause what you run now is again FAIL
|

Pilk
Mother Lovers
|
Posted - 2010.03.04 01:07:00 -
[71]
Originally by: Joe Censored Edited by: Joe Censored on 03/03/2010 23:24:22
Originally by: Mithfindel
Originally by: Jason Edwards I see alot of "our vendor"
microsoft? cisco?
IIRC, Microsoft SQL Server 2008.
Ah this is disappointing. Use of an open source base OS for the SQL server would allow internal CCP kernel devs to quickly spot this type of race condition and implement a fix without the need for consultation with any outside vendor. Proprietary OS's like Windows or Mac OS always mean they have you by the balls when or even if they find your problem important enough for them to fix.
(put several printk's in the TCP stack code at various places, and wait for the race condition and see what was last output... done you found the location of the issue, or at least how I would find it)
While I agree with you in principle, your point is misdirected. The problem is not in the TCP stack implementation in their OS, but in some combination of the way the DB server and clients cooperate on connection pooling and their monitoring for conditions necessitating failover. The best they could hope for as far as the type of debugging you propose is to run a sniffer on the segment in question, but the problem is that it's not easy to reproduce, so they'd be logging terabytes of data at minimum, if it's even possible for them to sniff and log on that segment in the first place.
Or, to put it in a slightly-simpler way: the problem is at Layer 5 or above (i.e., the app's handling of the TCP sessions), not 3/4.
One suggestion, if it is possible to do network-level logging: even a simple ring buffer approach would give you more than enough data, if it's tied into the failover system. Set it to log the last (amount of RAM minus 1GB or so), and add a trigger to your monitoring/failover system that turns off the logger (or have the logger notice the failover somehow).
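A rough Python sketch of that ring-buffer idea, for anyone who wants to picture it. Everything here is an assumption for illustration: the `RingLogger` class, its `freeze` hook and the string "packets" are hypothetical, and a real capture would store raw frames and size the buffer in bytes rather than in records.

```python
from collections import deque


class RingLogger:
    """Keep only the most recent `capacity` records, then stop on demand."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest entry when full.
        self._buffer = deque(maxlen=capacity)
        self._frozen = False

    def log(self, record):
        if not self._frozen:
            self._buffer.append(record)

    def freeze(self):
        # Would be called by the (hypothetical) failover trigger so the
        # traffic leading up to the incident is not overwritten.
        self._frozen = True

    def dump(self):
        return list(self._buffer)


logger = RingLogger(capacity=3)
for packet in ["p1", "p2", "p3", "p4", "p5"]:
    logger.log(packet)
logger.freeze()      # failover detected
logger.log("p6")     # arrives after the failover and is ignored
snapshot = logger.dump()
print(snapshot)      # ['p3', 'p4', 'p5']
```

The memory cost is fixed no matter how long the system runs between incidents, which is what makes this workable for a bug that shows up once every few weeks.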
--P
Kosh: The avalanche has already started. It is too late for the pebbles to vote. Tyrrax's bet status: PAID! |

Hack Harrison
Caldari
|
Posted - 2010.03.04 01:20:00 -
[72]
ROFL - Why is it that dumb ass users with no database knowledge see the term session and assume it pertains to their GAME session.
For those that don't know - a database session is the name given to a client connecting to the server for a period of time (i.e. a session). As there are often high overheads associated with setting up and closing a session, compared to issuing a transaction (i.e. an update of some data), sessions are often pooled and reused rather than dropped and recreated.
As such, the issue described here is in regards to what they said - TQ rebooting due to failover. This is a DIFFERENT issue to people getting lagged out in a large battle, which is due to the inability of the EVE (distributed) application to scale to battles of that size based on the current code base and server infrastructure.
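For anyone who wants to see session pooling rather than read about it, here is a rough Python sketch. `SessionPool` and `fake_open_session` are made-up names for illustration, and the "session" is a dummy object rather than a real database connection; this is not how SQL Server or the EVE servers actually implement it.

```python
import queue


class SessionPool:
    """Hand out already-open sessions instead of opening one per transaction."""

    def __init__(self, open_session, size):
        self._open_session = open_session
        self._idle = queue.Queue()
        self.opened = 0                      # how many expensive setups we paid for
        for _ in range(size):
            self._idle.put(self._new())

    def _new(self):
        self.opened += 1
        return self._open_session()

    def acquire(self, timeout=None):
        # Blocks until some other caller releases a session.
        return self._idle.get(timeout=timeout)

    def release(self, session):
        self._idle.put(session)


def fake_open_session():
    return object()   # stand-in for an expensive connect + login


pool = SessionPool(fake_open_session, size=2)
for _ in range(100):
    s = pool.acquire()
    # ... issue one transaction on s ...
    pool.release(s)
```

One hundred transactions run, but the setup cost is only paid twice, which is the whole reason the sessions are reused.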
|

Bomberlocks
Minmatar Star Bombers
|
Posted - 2010.03.04 02:00:00 -
[73]
Originally by: CCP Fallout As many of you have noticed, Tranquility has been less than tranquil of late. CCP Valar fills us in on the progress being made towards keeping Tranquility well-behaved in his newest dev blog.
Thanks for the explanation. I appreciate it.
Now, I have a copy of MS Access that you guys are welcome to use until you get the SQL Server problems sorted out.
|

Draco Argen
|
Posted - 2010.03.04 02:28:00 -
[74]
It's only just occurred to me. The sand box is where my cat goes to... well anyway, it's not as fun to play in as it was
Perhaps Eve is suffering the same problem :)
|

Bomberlocks
Minmatar Star Bombers
|
Posted - 2010.03.04 02:48:00 -
[75]
Originally by: CCP Valar We know that problem lies in the TCP stack and likely has something to do with handling of closed or closing sockets... and Turning off recycling of idle sessions seems promising as a workaround that makes triggering the bug less likely
This is very naive and simplistic of me, but if the race condition is caused by the interaction between the way the TCP stack handles opening and closing sockets and db session pooling, and since turning off session pooling seems to solve the problem (but causes another due to increased overhead), doesn't it stand to reason that tuning the size of the session pool might help solve the problem?
If the session pool is tuned proportionally to the number of open socket connections, accepting lag at higher connection numbers with increased reliability, perhaps you could keep the system running, albeit more slowly?
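To make the suggestion concrete, a small Python sketch of sizing a pool in proportion to open connections. Everything here is an assumption for illustration: `pool_size_for` and its knobs (`connections_per_session`, `floor`, `ceiling`) are invented, and the default numbers are arbitrary, not anything CCP has published.

```python
def pool_size_for(open_connections, connections_per_session=10, floor=4, ceiling=64):
    """Scale the pool with load, but clamp it; all parameters are made-up knobs."""
    # Ceiling division: one pooled session per `connections_per_session` clients.
    wanted = -(-open_connections // connections_per_session)
    return max(floor, min(ceiling, wanted))


quiet = pool_size_for(10)       # below the floor, so the floor wins
busy = pool_size_for(500)       # scales with load
swamped = pool_size_for(5000)   # capped; extra callers would wait for a session
```

The cap is where the "accept lag for reliability" trade-off shows up: past it, requests queue for a free session instead of opening more.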
|

Ender Flagrante
Gallente The Scope
|
Posted - 2010.03.04 04:41:00 -
[76]
Originally by: Freedom Netas
Originally by: Ender Flagrante In before some idiot suggests switching to MySQL.
In after some idiot defends MSSQL.
I can only assume that you were referring to someone else since I made no mention of MSSQL.
|

Fade Toblack
|
Posted - 2010.03.04 09:47:00 -
[77]
Originally by: Xavin Nydek MS are incompetent.
There fixed that for you.
CCP have had somebody working full-time for 3 months on this problem? What's the TCO on that DB server now? Reckon you could've fixed it by now yourselves if you had something that came with source code?
Also why has Microsoft suddenly become "the vendor" when talking about problems with their software?
|

Achura chick
|
Posted - 2010.03.04 11:43:00 -
[78]
I know I left a continuum transfunctioner somewhere... you think it would work?
|

Acrid Acid
|
Posted - 2010.03.04 14:55:00 -
[79]
Tha logz, D show Not THING!
...oh wait, I have been beated to it.
|

Garia666
Amarr T.H.U.G L.I.F.E
|
Posted - 2010.03.04 15:14:00 -
[80]
Your logs never show anything... tell us something we don't know. www.garia.net |

Arec Bardwin
|
Posted - 2010.03.04 15:24:00 -
[81]
Originally by: CCP Fallout ... Tranquility has been less than tranquil of late.
People have started to enable sound???? 
|

Hawk TT
Caldari Bulgarian Experienced Crackers Circle-Of-Two
|
Posted - 2010.03.04 17:21:00 -
[82]
Oracle, Oracle, Oracle - LOFL...
To anyone suggesting that Oracle is a good idea - THIS IS THE MOST BUGGY RDBMS EVER CREATED! I have quite extensive first-hand experience with complex and large Oracle deployments (multi TB databases, geographically dispersed RACs etc.). It's a nightmare - thousands of bugs, poor support and patching (unless you are NASA or the Pentagon) etc. I would definitely agree that Oracle has serious advantages over M$ SQL Server (scalability, high-availability, performance etc.), but it is very, very BUGGY. Once you have a stable installation that works for your particular application it is not very wise to patch it. The problem is that EVE is a constantly evolving application and this approach does not really work... ___________________________________ Science & Diplomacy Manager @ BECKS Circle-of-Two |

Brokers Clone
|
Posted - 2010.03.04 19:02:00 -
[83]
Originally by: Night Doc this looks like the nail in the haystack problem
very very hard to find, and a very very simple solution
well, in this case, the nail appears to be non ferrous, so the magnet trick didn't work.....
|

Eint Truzenzuzex
|
Posted - 2010.03.04 22:04:00 -
[84]
Let's face it.
CCP, good job, and go find the bug. I mean, every single winter expansion I can remember had a bug / issue / problem.
Who can remember:
- the sov. 4 related crashes,
- the access violation if a dread goes into siege mode (BoD),
- the forgotten pos.fixit file,
- the show info bug,
- the boot.ini,
- the missing texture files, where some/most ships were simply blue/pink?
I mean, that is the spice in EVE; CCP will fix it even if it takes a few years.
|

Zahira Wrath
|
Posted - 2010.03.04 22:30:00 -
[85]
Edited by: Zahira Wrath on 04/03/2010 22:30:42
Originally by: Joe Censored
Ah this is disappointing. Use of an open source base OS for the SQL server would allow internal CCP kernel devs to quickly spot this type of race condition and implement a fix without the need for consultation with any outside vendor. Proprietary OS's like Windows or Mac OS always mean they have you by the balls when or even if they find your problem important enough for them to fix.
From 1st page:
Originally by: Ender Flagrante In before some idiot suggests switching to MySQL.
To CCP: Good work, thanks for keeping us in the loop.
|

Maestro Del'Tirith
Space Exploration Forward Motion Industries
|
Posted - 2010.03.05 03:14:00 -
[86]
Edited by: Maestro Del'Tirith on 05/03/2010 03:18:51
There's an obvious answer - stop buying databases from OS vendors and go buy a real one that actually scales.
I'm being facetious of course - switching RDBMS at this stage would obviously be utterly ridiculous.
One could only wish that SQL Server had a real solution for scaling via parallel processing. Unfortunately, as much as it is derided above, Oracle is the only DB on the market (well, standard RDBMS...obviously Hadoop style stuff isn't a question) that provides this with RAC. DB2, MySQL, Postgresql et al all have hacked in awful algorithms. But then, for a game like this, I can only imagine how absolutely impossible the Oracle licensing structure would be.
Hope MS can help ya'll out...make sure you are punching hard and get your system into a lab with them. I can only assume the 'replication' option being discussed has to be something like Oracle's Real Application Testing framework. Best of luck. -------------
Looking for a mature group to play with? Recruitment Thread |

Percy Soars
|
Posted - 2010.03.05 10:45:00 -
[87]
At least the intervals between failures are getting longer.
2010.02.28 23:35:00 http://www.eveonline.com/ingameboard.asp?a=topic&threadID=1277148
2010.02.20 13:54:00 http://www.eveonline.com/ingameboard.asp?a=topic&threadID=1272385
2010.02.10 16:07:00 http://www.eveonline.com/ingameboard.asp?a=topic&threadID=1266874
2010.02.09 12:45:00 http://www.eveonline.com/ingameboard.asp?a=topic&threadID=1266165
2010.01.28 19:54:00 Emergency Tranquility Downtime POS Problem http://www.eveonline.com/ingameboard.asp?a=topic&threadID=1259230
2010.01.26 18:33:00 Dominion 1.1.1 Deployment http://www.eveonline.com/ingameboard.asp?a=topic&threadID=1257997
2010.01.24 20:59:00 http://www.eveonline.com/ingameboard.asp?a=topic&threadID=1256814
2010.01.23 13:56:00 http://www.eveonline.com/ingameboard.asp?a=topic&threadID=1255969
2010.01.22 19:43:00 down due to a database issue http://www.eveonline.com/ingameboard.asp?a=topic&threadID=1255488
2010.01.21 13:26:00 Dominion 1.1 Installed http://www.eveonline.com/ingameboard.asp?a=topic&threadID=1254549
2010.01.20 13:22:00 http://www.eveonline.com/ingameboard.asp?a=topic&threadID=1254036
2010.01.14 12:29:00 http://www.eveonline.com/ingameboard.asp?a=topic&threadID=1250631
2010.01.13 09:28:00 http://www.eveonline.com/ingameboard.asp?a=topic&threadID=1250013
2009.12.28 03:08:00 http://www.eveonline.com/ingameboard.asp?a=topic&threadID=1240749
2009.12.18 11:58:00 Extended downtime http://www.eveonline.com/ingameboard.asp?a=topic&threadID=1235167
2009.12.17 15:29:00 SQL issues http://www.eveonline.com/ingameboard.asp?a=topic&threadID=1234721
2009.12.15 11:58:00 Dominion 1.0.3 deployment http://www.eveonline.com/ingameboard.asp?a=topic&threadID=1233357
2009.12.09 17:35:00 network failure on a third party network http://www.eveonline.com/ingameboard.asp?a=topic&threadID=1229802
2009.12.09 02:58:00 hard crash in the SQL Server engine http://www.eveonline.com/ingameboard.asp?a=topic&threadID=1229439
2009.12.05 21:43:00 hard crash in the SQL Server engine http://www.eveonline.com/ingameboard.asp?a=topic&threadID=1227095
2009.12.02 20:43:00 Emergency reboot http://www.eveonline.com/ingameboard.asp?a=topic&threadID=1224770
|

Marchocias
Silent Ninja's
|
Posted - 2010.03.05 18:53:00 -
[88]
If you suggested switching DB, then you're insane... it would introduce a whole raft of different bugs, likely to be just as inconvenient as our current ones. Anyone who suggests switching RDBMS at this stage clearly either:
a) doesn't work in the industry, thus has an irrelevant opinion
b) does work in the industry, but really REALLY shouldn't.
All RDBMSs have advantages and disadvantages. I have worked with both MSSQL and Oracle for a long time... I would say they both have their pros and cons. Anyone who praises one and slams the other is spouting nonsense - it only makes sense within the context of the applications the DB is being used for.
---- I belong to Silent Ninja (Hopefully that should cover it). |

Zenst
Aliastra
|
Posted - 2010.03.05 22:36:00 -
[89]
KARMA 
So it's taken you this long to admit this to us. Read what people are saying: there are a lot of good DBAs who play this game and I'm sure they could help. To have such a problem go on for so long is amazing for a production system, truly. What type of support have you got with your vendor? I would have expected them to at least have had somebody working with you pretty darn solidly to have nailed this. It might be that you have the classic two-vendor problem, with the problem being in the middle of the two products talking. But reading what you have said, I would have expected that type of feedback within a week, not months down the line. You still appear a bit lost as to what the actual problem is, given you're trying kerplunk-type fixes.
Anyhow, keep us updated. The more we know, the more we can feed back and actually think you care, instead of waiting months to tell us you still haven't fixed what you broke.
I must admit, when you initially started using M$'s database I was like eeep; I would have gone DB2 or Oracle, but I guess it's a bit too far down the line in many respects now to change things.
So, get in somebody who knows how to troubleshoot this type of stuff and can argue the toss with vendors who will bounce you about, the database vendor saying it's the network and the network vendor going it's the database.
So what are you calling this issue - eve2k?
If you haven't done a conference call between the network vendor and the database vendor, and added in your hardware vendor, then I suggest you do so and thrash it out, as saying "we plan to try this and that" is like a GM telling me to reinstall my client only to have the same problem. Bottom line: it's your problem. You can blame who you like, but the EVE players have their contracts with you, not some `vendor`.
Perhaps there was a microsoft dba coder who lost a titan and got a poor response from ccp, I would imagine he would be laughing at all this now if he had to deal with your ticket. 
|

Lusulpher
Blackwater Syndicate Systematic-Chaos
|
Posted - 2010.03.06 10:39:00 -
[90]
Originally by: Zenst KARMA 
Perhaps there was a microsoft dba coder who lost a titan and got a poor response from ccp, I would imagine he would be laughing at all this now if he had to deal with your ticket. 
Words right out of my mouth!
The good old reimbursement response, but, TO CCP.
7 |

xXkynraXx
|
Posted - 2010.03.07 08:27:00 -
[91]
man this petition stuff is crap, i paid for game time on the third and actually logged in to my character, but ever since yesterday, every time i try to log in to my alternate characters account, i get this message: "unable to login, account disabled", i just friggin' paid for gametime on 03/03/10?...it still has not been taken care of, and i'm starting to get angry...
|

Darek Castigatus
Immortalis Inc. Shadow Cartel
|
Posted - 2010.03.07 14:57:00 -
[92]
Originally by: xXkynraXx man this petition stuff is crap, i paid for game time on the third and actually logged in to my character, but ever since yesterday, every time i try to log in to my alternate characters account, i get this message: "unable to login, account disabled", i just friggin' paid for gametime on 03/03/10?...it still has not been taken care of, and i'm starting to get angry...
And this has to do with the topic being discussed how exactly??
|

Hack Harrison
Caldari
|
Posted - 2010.03.08 06:30:00 -
[93]
Originally by: xXkynraXx man this petition stuff is crap, i paid for game time on the third and actually logged in to my character, but ever since yesterday, every time i try to log in to my alternate characters account, i get this message: "unable to login, account disabled", i just friggin' paid for gametime on 03/03/10?...it still has not been taken care of, and i'm starting to get angry...
Waaaaahhhhhh
|

Harris Dorn
|
Posted - 2010.03.11 05:09:00 -
[94]
Edited by: Harris Dorn on 11/03/2010 05:09:08
What you need to do is hire a team to set up proper logging. That way you will have logs to show your mysterious vendor, and you could look at the logs when we send petitions, see we are correct 90% of the time, laugh your butts off, then tell us the logs show nothing.
|

Woody Hill
|
Posted - 2010.03.11 11:26:00 -
[95]
Hmmm,
CCP, have you tried turning it off and then turning it back on again.
Also, sometimes my computer slows down when I open Firefox and SQL Management Studio at the same time. So try to stop people looking at YouTube and the like on your SQL server.
No charge for this valuable advice, but if you want to pop a Vindicator in my hangar we will call it quits.
kthxbai
|

Zenst
Aliastra
|
Posted - 2010.03.11 19:52:00 -
[96]
Edited by: Zenst on 11/03/2010 19:52:25
So has your `vendor` given you a response like this yet:
Hi,
I have closed this petition as we can't give you any further advice on a fix for this. I hope you continue to enjoy EVE.
Best regards, GM stupidfraggle The EVE Online Customer Support Team

Seriously WTF, you break something then expect to palm people off with **** like that, you useless bunch of cnuts.
|

Lei Merdeau
Gallente
|
Posted - 2010.03.12 10:19:00 -
[97]
I've been having a ridiculous level of EVE freezing lately while most of the net keeps going. Luckily, I just reset my ADSL2+ modem and unticked the Enforce MTU option (1492 bytes). Being over PPPoE, I understand I should NOT do this. However, EVE seems to be much happier.
Mean anything?
|

Argo Pyxis
|
Posted - 2010.03.25 22:08:00 -
[98]
Originally by: Bomberlocks
Originally by: CCP Valar We know that problem lies in the TCP stack and likely has something to do with handling of closed or closing sockets... and Turning off recycling of idle sessions seems promising as a workaround that makes triggering the bug less likely
This is very naive and simplistic of me, but if the race condition is caused by the interaction between the way the TCP stack handles opening and closing sockets and db session pooling, and since turning off session pooling seems to solve the problem (but causes another due to increased overhead), doesn't it stand to reason that tuning the size of the session pool might help solve the problem?
If the session pool is tuned proportionally to the number of open socket connections, accepting lag at higher connection numbers with increased reliability, perhaps you could keep the system running, albeit more slowly?
While this might work, the performance trade-off would be *drastic*
TCP connection setup and teardown, regardless of OS, is a relatively expensive thing to do (and by expensive, I mean in terms of time, not CPU or other hardware resources)
The concept of Connection Pools is part of any RDBMS system today and is de rigueur when setting up a database and its consumers to perform under high transaction loads... so keeping pooling off is a night and day difference, and not for the better. Other parts of the game app (the app which runs on their servers and converses with the database(s)) might not be geared towards dealing with the increased query time which will result, very likely causing a domino effect which may manifest in more serious problems than just slowness seen by users.
/AP
|

Pricecheck sama
|
Posted - 2010.04.01 10:13:00 -
[99]
http://en.wikipedia.org/wiki/Karnaugh_map
love that wiki. CCP needs this !

|

Beltantis Torrence
Groovy Guns
|
Posted - 2010.04.19 21:41:00 -
[100]
Originally by: Kerfira
Originally by: Gimme Sugar Time to switch from SQL Server (Sybase) to Oracle RDBMS!
Why the hell would they switch from one monolithic database to another? They all have the exact same weaknesses.
If implementing something like EVE again, they should use a telco grade distributed X.500 directory server. Much faster and almost infinite data expansion capability (just add more server nodes).
For Eve? Eve is write heavy. A DS would be a terrible idea.
|

Beltantis Torrence
Groovy Guns
|
Posted - 2010.04.19 21:49:00 -
[101]
Originally by: Pilk
Originally by: Joe Censored Edited by: Joe Censored on 03/03/2010 23:24:22
Originally by: Mithfindel
Originally by: Jason Edwards I see alot of "our vendor"
microsoft? cisco?
IIRC, Microsoft SQL Server 2008.
Ah this is disappointing. Use of an open source base OS for the SQL server would allow internal CCP kernel devs to quickly spot this type of race condition and implement a fix without the need for consultation with any outside vendor. Proprietary OS's like Windows or Mac OS always mean they have you by the balls when or even if they find your problem important enough for them to fix.
(put several printk's in the TCP stack code at various places, and wait for the race condition and see what was last output... done you found the location of the issue, or at least how I would find it)
While I agree with you in principle, your point is misdirected. The problem is not in the TCP stack implementation in their OS, but in some combination of the way the DB server and clients cooperate on connection pooling and their monitoring for conditions necessitating failover. The best they could hope for as far as the type of debugging you propose is to run a sniffer on the segment in question, but the problem is that it's not easy to reproduce, so they'd be logging terabytes of data at minimum, if it's even possible for them to sniff and log on that segment in the first place.
Or, to put it in a slightly-simpler way: the problem is at Layer 5 or above (i.e., the app's handling of the TCP sessions), not 3/4.
One suggestion, if it is possible to do network-level logging: even a simple ring buffer approach would give you more than enough data, if it's tied into the failover system. Set it to log the last (amount of RAM minus 1GB or so), and add a trigger to your monitoring/failover system that turns off the logger (or have the logger notice the failover somehow).
--P
Or better yet just turn off automatic failover in favor of retrying since the failover isn't transparent anyway.
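Roughly what "retry instead of failing over" could look like, as a Python sketch. `TransientError`, `run_with_retry` and `flaky_query` are hypothetical names, the delays are arbitrary, and it assumes the operation is safe to repeat (idempotent, or rolled back on failure), which may well not hold for every query.

```python
import time


class TransientError(Exception):
    """Stand-in for a dropped or reset database session (hypothetical)."""


def run_with_retry(operation, attempts=3, base_delay=0.01, sleep=time.sleep):
    """Retry a failed operation with exponential backoff before giving up."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise              # out of retries: only now escalate
            sleep(base_delay * 2 ** attempt)


calls = []


def flaky_query():
    # Fails twice, then succeeds, to imitate a session that recovers.
    calls.append(1)
    if len(calls) < 3:
        raise TransientError("session reset")
    return "ok"


result = run_with_retry(flaky_query, sleep=lambda _: None)
```

If the last attempt still fails, the exception propagates, and that would be the place to hand over to a manual or automatic failover.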
|