
Hebik Fane
Havoc Inc
|
Posted - 2007.05.25 16:02:00 -
[181]
Everyone stop it! I just finished the 3 day MS 2780 SQL class and my head is going to explode here. Deadlocks, tail of the log, stored procedures... I must now pod you all. 
P.S. For those of you that don't know, there is a reason there is a 3 day class on just maintaining your SQL server: it's not easy, so cut the Devs some slack.
|

Rude Bwoy
|
Posted - 2007.05.25 16:16:00 -
[182]
Why don't they buy DB2 or Oracle, i.e. a REAL db? Who's the rudest of them all! |

Astrohole
|
Posted - 2007.05.25 16:18:00 -
[183]
Ah, just make me a standalone version of Eve and then I can gank myself if it stumbles. |

Defiant952
|
Posted - 2007.05.25 16:19:00 -
[184]
Originally by: winglessangelxxx
Originally by: CCP Valar A quick post mortem after this horrible night.
The issues tonight were caused by the contract system. The original design for contract lookups never intended that you could look up contracts in other regions and because of that and other issues with contracts, the query plans the SQL Server creates for the contracts lookup can go out of whack.
This causes extreme CPU load on our SQL Server and is not easily identified in a database trace due to the long duration of the proc(as in hours) when this happens. When the junior database admin first looked at the problem he did not identify the real cause and thus the first reboot did not fix the problem.
After I got a call shortly after the second reboot I identified the cause and tried applying fixes live. The server was however gone too far to recover so a reboot was initiated at 7:10. The server went down at 7:36. While the server was down, I forced a query plan on the contracts lookup proc that works in all cases and manually updated statistics on the relevant tables.
The SQL Server is now back to its normal self. The development team will hopefully have a proper fix for this ready soon.
And now back to bed 
Explain how a *patch to fix voice services* screwed with the contracts system please.... I'd love to hear this....
They didn't say the patch caused it. What caused it was a bad query on the contract system, based on the global region settings that it was never designed for. Cut them some slack and, once again, as others have said, this is not the same EVE as 4 years ago, so stop that argument already. You all should be praising them for how few unscheduled downtimes you have.
|

Korotani
Caldari Love and Rockets
|
Posted - 2007.05.25 16:28:00 -
[185]
Thanks to whoever stayed up all night long to fix this. People may whine about it, but I doubt they could do it. And to all you computing A-levelers or degree students out there: you think you can fix it? Stop being a self-righteous c*** and apply for a job at CCP, then we can take it out on you when you get it wrong.
|

lam0r
Caldari The Legion.
|
Posted - 2007.05.25 16:31:00 -
[186]
Originally by: Xrious Edited by: Xrious on 25/05/2007 10:15:52 Does anyone find it worrying that a junior DBA appears to have the authority to shut down a server with 30,000 users on? Surely only the person at the top should have that authority? I apologise and know that you need sleep, but employ someone else. That's really what we're paying for. We all abuse M$, but I had a lot fewer problems with SP2 than CCP seems to have with their own systems.
When I was working as a DBA I often had the same power to kick 10-15k people off our system in order to prevent 'collateral damage'. When you have scripts updating numerics, it's hard to 'undo' your mistakes unless you have an audit/transaction log (which would take even more downtime to check and restore data).
As far as MSSQL goes, well, it's a fast and robust DBE that is so easy to administer. I found it easier than my time administering Oracle servers and, as many have said, the difference in performance is really negligible at that scale (I've had workstations running MSSQL kick the crap out of any other Windows-based DBE, so imagine what multi-CPU machines can do).
My curiosity lies in the specs of the DB server and how it is configured (multi-threading? using stored procedures? is it performing unnecessary execution plans?) but that's what our friendly Dev team is for I guess :P
|

Jim McGregor
|
Posted - 2007.05.25 16:32:00 -
[187]
Edited by: Jim McGregor on 25/05/2007 16:32:09
I tried to log on this morning from home to change skills but found that the server wouldn't accept logins. So when I got to work I downloaded the Eve client to my work computer and was about to quickly log in and change skills when my boss showed up at the exact moment Eve started showing the fullscreen intro scene...
Let's just say he wasn't happy. 
--- Eve Wiki | Eve Tribune |

Pang Grohl
Gallente
|
Posted - 2007.05.25 16:37:00 -
[188]
Originally by: CCP Valar A quick post mortem after this horrible night.
The issues tonight were caused by the contract system. The original design for contract lookups never intended that you could look up contracts in other regions and because of that and other issues with contracts, the query plans the SQL Server creates for the contracts lookup can go out of whack.
This causes extreme CPU load on our SQL Server and is not easily identified in a database trace due to the long duration of the proc(as in hours) when this happens. When the junior database admin first looked at the problem he did not identify the real cause and thus the first reboot did not fix the problem.
After I got a call shortly after the second reboot I identified the cause and tried applying fixes live. The server was however gone too far to recover so a reboot was initiated at 7:10. The server went down at 7:36. While the server was down, I forced a query plan on the contracts lookup proc that works in all cases and manually updated statistics on the relevant tables.
The SQL Server is now back to its normal self. The development team will hopefully have a proper fix for this ready soon.
And now back to bed 
I knew allowing global viewing of Contracts was a bad idea.
Anyway, good job on getting things sorted. Chin up to the Junior DBA, his reboot let me get in to set things up for an extra long weekend away from EVE.
Si non adjuvas, noces (If you're not helping, you're hurting) |

Defiant952
|
Posted - 2007.05.25 16:57:00 -
[189]
Edited by: Defiant952 on 25/05/2007 16:55:51 I swear half of you don't read.
Look, if this was the same as it was FOUR years ago it would be fixed, BUT how new is the contract system? That's right, pretty new, especially the global viewing, which is what caused the random problem.
EDIT: Woot top of page 8 :P
|

Dikat
|
Posted - 2007.05.25 16:58:00 -
[190]
Originally by: Angelis666 I Would kill ANYONE to have the nerdy knowledge of some of the people in this thread!
And I'm sure THEY would kill to know the touch of a woman :)
|

Milz0r
|
Posted - 2007.05.25 17:00:00 -
[191]
Yes, i understand that.
But if it took them 4 years to identify a contract problem with viewing all regions...then there is something wrong.
It just seems that they don't really fix anything...they just restart the servers. I lost a 66 million isk blueprint and a Raven to this recent server crash. I sent a petition and I bet they won't do anything about it...even though I am a paying customer of a game that is constantly crashing.

|

torswin
Caldari Capital Productions Inc.
|
Posted - 2007.05.25 17:02:00 -
[192]
I wonder why the Junior Database-guy gets so many flames? He did what he thought was right and when he discovered that his "fix" didn't help, he contacted someone who probably had more experience than he did in analysing the cluster
(hope I got that right, please correct )
and it is inhuman to be perfect 
|

Jim McGregor
|
Posted - 2007.05.25 17:04:00 -
[193]
Originally by: Capt Tripps
Hate to tell ya but in my IT experience starting in 1982, it never ends with computers and servers and software, always another problem. That's just how it is.
That's why we get paid the big bucks to fix it.  --- Eve Wiki | Eve Tribune |

UmnaHun
|
Posted - 2007.05.25 17:06:00 -
[194]
Edited by: UmnaHun on 25/05/2007 17:06:59 Soo many clever people here...
Why do I only see one - and only one - EVE-like game running???
If you are so clever/experienced/talented, do a better one!
lol!
CCP Keep up the good work (and nerf BOB)
|

Megasexmanaut
|
Posted - 2007.05.25 17:10:00 -
[195]
Since Eve is a constantly evolving project, I expect it to have occasional health problems. That is not really a bad thing because it's always getting better. What I like about the devs at CCP is their candor in letting me know what the problems are about and what they are doing to fix them. Having a better understanding of the situation removes my misconceptions and elevates my respect and appreciation for the people at CCP. 
|

lam0r
Caldari The Legion.
|
Posted - 2007.05.25 17:13:00 -
[196]
I would say just keep it clean and the feedback constructive, without us it's sometimes hard to know to what extent a problem exists! Sometimes very important points are missed through the sea of flames :P
|

Garth Vaders
|
Posted - 2007.05.25 17:15:00 -
[197]
Hello to all. This is my first post in these forums. I decided to try out this game recently without much hope for it but got hooked immediately, so I upgraded my subscription to 6 months :) The concept is great and I like almost everything about the game (although I have problems with blueprint manufacturing and other similar "heavy" stuff like that), and I have had great fun so far doing all the rest. I would appreciate any links on how the research and manufacturing fields work, because I find the subjects a bit confusing.
Now to the point. Although the game is great, and I have played many online games, I see that it has severe lag problems. It shouldn't be so laggy with only 30-40 thousand people on. It is sad to say that my greatest fear when I click on a mission is whether I will make it back alive, not because I am afraid of the dangers of the mission ahead of me but because I am afraid of lag.
The other day (not the day of the "problems" this thread is referring to) the server got a lag spike at the exact moment I was desperately warping to another gate to avoid a swarm of pirates that had dropped my shields and half my armor. So I had a lag spike and then got disconnected. And, well, I went mad, because I had succeeded at the mission and the only thing I needed was to be able to warp back safely, and .... lag was making it impossible. Anyway, after several minutes of agony I managed to log back in and found my ship in the safe system I had warped to. Thankfully the jump was made even though I wasn't there. It seems the server managed to execute the warp command just before it booted me.... :)
I say all that to share with you the horror that we gamers feel when such things happen, but I bet you are all familiar with it.
As far as last night goes, my 2 cents are:
1) I really appreciate the friendly way the admins communicated the problem to us :)
2) The fact that you are running a Microsoft server perplexes me. I was almost certain that you would run the game on a Linux or Unix server, because I was thinking, perhaps in an oversimplified way: "Hey, these guys are Scandinavians, they must use Linux" :P So my suggestion on this matter is to try to switch to Unix or Linux if you can; it will save a lot of lag. Get some Linux expert (you are Scandinavians, I am sure you can find many Linux experts up there in the north hehe)
3) Someone mentioned that these forums share the same resources with the server hosting the game, so it's common sense to suggest removing the website from that server and setting it up on a server different from the one running the game.
4) To remedy the lag that a universe so big hosted on one server has, I would suggest the following solution: "Divide the universe between 2 servers, so server A will have half the universe and server B the other half. Also make a server C to contain all the "transactions info"; in effect, put the whole "market" thing on that separate server C and have it communicate the "market" data to the other two. The universe will still remain the same ONE it is. You can convert certain jump gates into transitions from server A to server B, so when a ship jumps it gets transitioned to the other server." This way lag will be diminished a lot, I think.
5) You have to understand that since this game is GREAT it will keep on growing, BUT if these additional new players (like me) bring lag with them, then the whole world will suffer from lag, actually making other players leave. (Imagine trying to hold sand in your arms: at some point, the more sand you grab, the more slips away, which is counterproductive of course.) So what will happen is that you will gain players every day while at the same time losing others who won't be able to put up with the lag the new players bring. :( So if you don't find a way to remedy the lag, the total number of players will stop increasing, meaning less profit for your company.
Bottom line: Fix the lag before it is too late
|

Lowa
Gallente North Star Networks Cruel Intentions
|
Posted - 2007.05.25 17:17:00 -
[198]
Nina Mires, my dear, they HAVE split the DBs up.  The SSD systems are a prime example, running one single function (or DB table). The rest is split across hardware too afaik; it may look like a single instance but it isn't.
How that is tied together I'm still figuring out but understanding the environment takes time. 
Originally by: Dikat
Originally by: Angelis666 I Would kill ANYONE to have the nerdy knowledge of some of the people in this thread!
And I'm sure THEY would kill to know the touch of a woman :)
That reply deserves the "touche!" remark tbh!  But, in case you didn't know, nerds are the new black. And we are in hot demand! Honest to [whoever you believe in], it's like a revelation when the ladies realize that finger/hand dexterity (sp) is a HUGE "I Win Button"!. 
However, taking a shower now and then helps of course.
/Lowa <-- Showered and ready. I'm just gonna log in for a while... 
What if the truth was something else? |

Ulitio
|
Posted - 2007.05.25 17:24:00 -
[199]
This is not to be meant as criticism.
This whole episode makes me wonder whether the current software engineering practice of using off-the-shelf products for purposes like this really works. What I mean is that MS SQL Server (and most RDBMS products) are designed as business products. They are not really designed to be used in real-time applications (and while MMOGs aren't RT apps, they are closer to RT than a pure business model). Now, I know what the vendors say, but what matters is how the product was designed, not what some marketing bots say it can do. So I have to wonder whether this sort of performance problem would have occurred if the storage mechanism had been specially designed. (And yes, I am well aware of the implications; I have spent the past 4 years or so adapting OtS products to specific purposes with good success. However, my experience is that when you bring someone new in, even if they have experience with the OtS product, they still get confused easily by the mods to it.)
Food for thought...
|

Siege
Minmatar Siegecraft Bounty Hunting
|
Posted - 2007.05.25 17:29:00 -
[200]
Originally by: Milz0r Yes, i understand that.
But if it took them 4 years to identify a contract problem with viewing all regions...then there is something wrong.

Errrr.... Contracts themselves have only been around for about 6 months now, and globally viewing them has been around even less time than that. Maybe, what, 2 months?
So, explain to me again how they could have found the problem 4 years ago, on a two month old system.
|

Hondo Kimotoro
Farmer Killers United Corporations Against Macros
|
Posted - 2007.05.25 17:51:00 -
[201]
To those who were blaming the jr techie guy, and ccp for not having a high end guy on duty at the time.
This isn't unusual; in fact the military (US) does this as a common practice. You think the general is going to be up guarding the motor pool or pulling CQ duty? Heck no, some pvt or sgt is gonna be doing it. Then when something beyond their abilities to cope with occurs, they go up the chain of command.
And that's exactly what happened here: the guy did what he could, it was beyond him, he woke the general, and the general stormed in to the rescue :-)
That being said, the system has grown more unstable since I first started, pre-Bloodlines. But I've read about some initiatives they are working on to fix this, and I congratulate them for at least trying to fix it.
To those of us who are upset the server is down (yes, I get ****y too when it happens.. but it's anger out of love :-P ), I simply point to the other games out there. My fleet battle account might just take a vacation, with all the lag I get from fleet operations, until it is fixed, and go play another game (my other accounts, however, will stay nice and active.. so NO, you can't have my stuff, I'm coming back :-P ). Then when the server goes down here I can just swap to a different game: best of both worlds.
However, getting rid of some of those macro mission runners and miners might help the lag a little bit <.< >.>
|

Aterna
Talon's Grasp
|
Posted - 2007.05.25 17:53:00 -
[202]
Originally by: Garth Vaders Hello to all. Stuff
1) We all love our devs. Some people do the "tough love" thing though.
2) Read some of the posts earlier on. If they don't turn your ears to goo, you will probably understand. If they do, then you don't know enough about servers and DB admin to make any qualified remarks (I don't).
3) CCP has already mentioned that the website server and forums are being upgraded, someone from the web cell is working on it. Check the recent dev blogs?
4) EVE already does this, but on a much larger scale than you know. The universe isn't divided into just 2 or 3 servers. It is divided into over a hundred 'nodes.' Each node supports a small group of star systems. Any time you use a jumpgate or change system, you move from one server to another. The DB servers are not the same servers that the system nodes are on. The market, the contracts, bookmarks, hangar, etc. are all on their own server array. Again, read some of the dev blogs and the player comments above before you post.
5) You stopped making sense somewhere around here. If everyone would spread out and away from mission hub systems and market hub systems, it would lessen the lag felt on those nodes. EVE can support a lot of players, but the number of people you can cram onto a single node is still finite, and the more NPCs, wrecks, and missions being run, the lower that number gets.
Simply put, EVE needs a mission agent rework, to spread them out a bit to little-traveled systems, to ease node stress on places like Motsu and Saila.
I don't even know if it is possible, but a major solution would be some sort of 'Omgwtfhax' server code that allowed multiple nodes to support a single system, with the ability to add or remove nodes dynamically, on the fly so to speak. That way we wouldn't have to ask in advance to reinforce nodes for epic battles; they could happen seamlessly, and afterwards the node count for the battle could drop back to normal while the victor takes the spoils and the loser flies home or gets a new clone. - - -
WTB new sig, evemail me please. |

The Kinetic
Caldari Sword Production
|
Posted - 2007.05.25 18:02:00 -
[203]
Originally by: CCP Admiral Chamrajnagar Server is now online, we had a database problem where the database server went to 100% CPU and remained there for 1 hour. I made the decision after consulting with the GMs to take the server down gracefully instead of allowing it to melt down.
A job had caused an abnormal load and the server was trying to catch up but could not.
Upon startup of the cluster again a machine decided to reboot itself and cause the infinite startup loop.
After correcting that issue we are now online.
My apologies for the delay in updating you.
I must say, it is nice now you guys keep us informed as to what's going on and what caused the problems etc  --------
hi2u |

Lesch
Gallente Meyvn Corp
|
Posted - 2007.05.25 18:10:00 -
[204]
Quote: I wonder why the Junior Database-guy gets so many flames? He did what he thought was right and when he discovered that his "fix" didn't help, he contacted someone who probably had more experience than he did in analysing the cluster
I'm not sure anyone's upset with the junior guy himself. From my perspective at least, it's simply poor customer service to have put him and the customers in that position in the first place.
Quote: and he's assuming junior = not as good, and you know what they say about assumption...
I wasn't assuming anything. If you read the dev post it was laid out quite clearly there. I'll paste it here for your benefit:
Quote: When the junior database admin first looked at the problem he did not identify the real cause and thus the first reboot did not fix the problem.
After I got a call shortly after the second reboot I identified the cause and tried applying fixes live. The server was however gone too far to recover so a reboot was initiated at 7:10. The server went down at 7:36. While the server was down, I forced a query plan on the contracts lookup proc that works in all cases and manually updated statistics on the relevant tables.
pretty clear that in this case, junior = not as good.
Now, as far as this goes:
Quote: You are assuming that there are more North American players than European players. However, there are more European and Non-North American players playing Eve than North/South American players. I believe this was mentioned by Oveur or kieron somewhere as well.
I made no such assumption. I simply stated that this was allowed to occur during North American prime time.
I understand that it's in some people's nature to want to stand up for CCP in a situation like this. I also understand that there are people who just want to argue with someone on forums from time to time. What I'm suggesting is that people look at the real issues going on here and make themselves heard when they receive inadequate service, so that the people we pay for that service might take notice and make efforts to correct the situation.
|

Remorer
|
Posted - 2007.05.25 18:38:00 -
[205]
Originally by: Garth Vaders It shouldn't be so laggy with only 30-40 thousand people on.
ONLY 30-40 thousand.. Isn't that a lot of players on the same server??
|

Mari Onette
|
Posted - 2007.05.25 18:38:00 -
[206]
Originally by: Garth Vaders It shouldn't be so laggy with only 30-40 thousand people on.
There is no other game in the world that supports 30 thousand users in a single virtual world. WoW has more subscribers than Eve, but in terms of the number of people on a single world at a time, Eve beats it hands down. A WoW server with 30 thousand concurrent users would be WAY more laggy than Eve (actually, it would probably be outright broken).
Originally by: Garth Vaders
To remedy the lag that a universe so big hosted on one server has, I would suggest the following solution: "Divide the universe between 2 servers, so server A will have half the universe and server B the other half. Also make a server C to contain all the "transactions info"; in effect, put the whole "market" thing on that separate server C and have it communicate the "market" data to the other two. The universe will still remain the same ONE it is. You can convert certain jump gates into transitions from server A to server B, so when a ship jumps it gets transitioned to the other server." This way lag will be diminished a lot, I think.
Eve is not a single server. It is actually a cluster of some 150+ nodes (I am too lazy to look up the exact number), each running 1 or more star systems in Eve. Market and contracts are both on separate nodes, as is the web server for the forums (although that is a recent improvement). So your suggestion about separating Eve into different servers has already been done; it's part of the basic design of Tranquility. The limitation of the system is that you cannot put more than 1 node on a single star system. Combine this with the limitations of modern hardware (it's more a bandwidth issue than pure CPU power) and you start encountering lag once you get 300-some people in the same star system. Jita and Motsu are perfect examples of this problem. They both have dedicated nodes and there are still lag issues.
The simplest way to reduce lag in Eve is to move away from core systems like Jita and Motsu. There are a few thousand star systems in Eve, many of them completely empty most of the time. Make a system like that your home base and you won't encounter nearly as much lag anymore. I hardly ever encounter lag since I moved away from crowded systems, and I still have easy access to equipment, minerals, and good agents.
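To make the one-node-per-system limit concrete, here is a rough Python sketch. It is not how Tranquility actually assigns systems; the greedy placement, the quiet system names, and all the pilot counts are made up for illustration, and only the roughly 300-pilot figure comes from the post above.

```python
from collections import defaultdict

# Assumption: ~300 pilots per node before lag, taken from the post above.
NODE_CAPACITY = 300

def assign_systems(populations, node_count):
    """Greedy placement: busiest systems first, each onto the least-loaded node.

    A star system is never split across nodes, so a node's load can never be
    lower than the population of the busiest system it hosts.
    """
    loads = [0] * node_count
    placement = defaultdict(list)
    for system, pilots in sorted(populations.items(), key=lambda kv: -kv[1]):
        node = min(range(node_count), key=loads.__getitem__)
        loads[node] += pilots
        placement[node].append(system)
    return dict(placement), loads

def overloaded(loads, capacity=NODE_CAPACITY):
    """Return the indexes of nodes whose load exceeds the capacity."""
    return [n for n, load in enumerate(loads) if load > capacity]

# Hypothetical pilot counts; only the system names Jita and Motsu are from the thread.
populations = {"Jita": 450, "Motsu": 320, "QuietA": 12, "QuietB": 5, "QuietC": 0}

placement, loads = assign_systems(populations, node_count=4)
hot_nodes = overloaded(loads)

# Throwing more nodes at the cluster does not help the two hub systems.
_, loads_with_more_nodes = assign_systems(populations, node_count=10)
still_hot = overloaded(loads_with_more_nodes)
```

Even with dedicated nodes, the two hubs stay over the limit in this toy model, which is the point made above: spreading pilots across systems helps, adding nodes alone does not.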
|

Toramt
|
Posted - 2007.05.25 18:57:00 -
[207]
Edited by: Toramt on 25/05/2007 18:58:35 From what I know about how Oracle works (SQL Server is probably equivalent), query plans can change over time, especially if any sort of structural change was made to any of the tables in the query, or even if the number of rows crosses an invisible threshold. This can cause the performance of the query to change drastically if the database decides that a very 'bad' query plan is the best way to get the data now. DBAs can override this behavior by hard-coding the plan or 'faking' the statistics on the relevant tables, but this is not usually done, as the database normally does a good job of picking the right query plans and responding to changing table structures / sizes.
In this case, it is quite possible that something tangential to the Contracts tables changed (the Items or Players tables perhaps), but since they are referenced they impacted the query plan for Contracts. This kind of problem is the sort you would not always see in a dev/test system, and it is reasonable for it to show up after X years of the Contracts system existing, since the affected tables may not have been core to Contracts at all.
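As a toy illustration of the "invisible threshold" idea, here is a small Python sketch of a cost-based plan choice. It is not how Oracle or SQL Server actually cost plans; the two cost formulas and their weights (4.0 and 1.0) are assumptions picked only to show the flip, and the row counts are invented.

```python
def seek_cost(estimated_matches):
    # Index seek: assume each matching row costs one random lookup.
    # The weight 4.0 is a made-up number for this sketch.
    return 4.0 * estimated_matches

def scan_cost(table_rows):
    # Full scan: every row is read once, sequentially (made-up weight 1.0).
    return 1.0 * table_rows

def choose_plan(table_rows, estimated_matches):
    """Pick the plan that looks cheaper given the optimizer's estimates."""
    if seek_cost(estimated_matches) < scan_cost(table_rows):
        return "index_seek"
    return "table_scan"

def true_cost(plan, table_rows, actual_matches):
    """What the chosen plan really costs once the query runs."""
    if plan == "index_seek":
        return seek_cost(actual_matches)
    return scan_cost(table_rows)

TABLE_ROWS = 1_000_000

# Stale statistics: the optimizer still believes a lookup matches ~1,000 rows.
stale_plan = choose_plan(TABLE_ROWS, estimated_matches=1_000)

# Reality: a lookup across all regions now matches 900,000 rows.
stale_plan_cost = true_cost(stale_plan, TABLE_ROWS, actual_matches=900_000)

# Fresh statistics: the estimate matches reality, so the plan flips.
fresh_plan = choose_plan(TABLE_ROWS, estimated_matches=900_000)
fresh_plan_cost = true_cost(fresh_plan, TABLE_ROWS, actual_matches=900_000)

# The threshold in this model sits at table_rows / 4 estimated matches.
just_below = choose_plan(TABLE_ROWS, estimated_matches=249_999)
at_threshold = choose_plan(TABLE_ROWS, estimated_matches=250_000)
```

In this model the stale plan costs 3.6 times as much as the plan chosen with fresh statistics, and a one-row change in the estimate is enough to flip the choice, which is why hard-coding a plan or refreshing statistics can matter so much.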
|

Rigsta
Gallente Raddick Explorations Executive Outcomes
|
Posted - 2007.05.25 19:28:00 -
[208]
Originally by: Milz0r Yes, i understand that.
But if it took them 4 years to identify a contract problem with viewing all regions...then there is something wrong.
Holy paradox Batman! Contracts have been around for maybe 6 months, the problem has been around for about a day and it's taken 4 years to notice it! 
</sarcasm>
Originally by: Jim McGregor I felt the disturbance... it was like a million voices suddenly stopped whining for a second. Unfortunately it then continued.
|

WhitePhantom
Gallente Edenists
|
Posted - 2007.05.25 19:50:00 -
[209]
Originally by: Ulitio This is not to be meant as criticism.
This whole episode makes me wonder whether the current software engineering practice of using off-the-shelf products for purposes like this really works. What I mean is that MS SQL Server (and most RDBMS products) are designed as business products. They are not really designed to be used in real-time applications (and while MMOGs aren't RT apps, they are closer to RT than a pure business model). Now, I know what the vendors say, but what matters is how the product was designed, not what some marketing bots say it can do. So I have to wonder whether this sort of performance problem would have occurred if the storage mechanism had been specially designed. (And yes, I am well aware of the implications; I have spent the past 4 years or so adapting OtS products to specific purposes with good success. However, my experience is that when you bring someone new in, even if they have experience with the OtS product, they still get confused easily by the mods to it.)
Food for thought...
Eve is a business product, so how does it not fit the exact purpose of MSSQL?
|

Jimer Lins
Gallente Sanctuary
|
Posted - 2007.05.25 20:02:00 -
[210]
Just remember to take the scheduled job that runs UPDATE STATISTICS every hour out when they do deploy a better query. 
I've had to do something very similar in my job; we had a query that worked perfectly in test but took forever in a production mode. It turned out to be related to how the query plan was being optimized in the new environment- very badly. ;)
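For anyone who wants to see what "looking at the query plan" means in practice, here is a minimal sketch. It uses SQLite through Python's built-in sqlite3 module rather than SQL Server, so the commands differ from what CCP would run (SQLite's ANALYZE stands in for UPDATE STATISTICS), and the contracts table, its columns, and the index name are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema, invented for this sketch.
conn.execute(
    "CREATE TABLE contracts (id INTEGER PRIMARY KEY, region_id INTEGER, price REAL)"
)
conn.executemany(
    "INSERT INTO contracts (region_id, price) VALUES (?, ?)",
    [(i % 500, float(i)) for i in range(5000)],
)

def plan(sql, params=()):
    """Return the readable plan steps SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # last column is the description

query = "SELECT id, price FROM contracts WHERE region_id = ?"

# Without an index, the only option is to scan the whole table.
before = plan(query, (7,))

# Add an index and refresh the optimizer's statistics.
conn.execute("CREATE INDEX idx_contracts_region ON contracts (region_id)")
conn.execute("ANALYZE")

# The same query text now gets a different plan.
after = plan(query, (7,))

print(before)
print(after)
```

The exact wording of the plan text varies between SQLite versions, so the sketch prints it rather than assuming a particular string. The point is only that the same SQL can get a very different plan once the indexes or statistics underneath it change.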
|