
Quutar
Caldari Uxor Infensus
Posted - 2006.10.04 21:59:00 -
[1]
My understanding right now is that 1 node = 1 physical server in a cluster. 1 node can hold multiple solar systems, but a solar system can only be on a single node.
Every downtime they run a load balancing script that looks at the load the various solar systems encountered and shifts the allocation of solar systems to nodes around to try to spread out the processing, and to give really busy systems their own node. For instance, Jita is on its own node.
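To make that concrete, here is a rough sketch of what such a downtime rebalancing pass might look like. I have no idea how CCP's actual script works; this is just a plain greedy "heaviest system first, onto the least-loaded node" scheme, and the function name, load numbers, and node count are all made up for illustration.

```python
import heapq

def balance(system_load, node_count):
    """Assign each solar system to exactly one node, heaviest system first.

    system_load: dict of system name -> load measured since last downtime
                 (hypothetical units, not real CCP metrics).
    Returns a dict of system name -> node id.
    """
    # Min-heap of (total_load, node_id): the least-loaded node is always on top.
    heap = [(0.0, node_id) for node_id in range(node_count)]
    heapq.heapify(heap)

    assignment = {}
    # Place the busiest systems first so they land on empty nodes.
    for system, load in sorted(system_load.items(), key=lambda kv: kv[1], reverse=True):
        total, node_id = heapq.heappop(heap)
        assignment[system] = node_id
        heapq.heappush(heap, (total + load, node_id))
    return assignment

# Invented load figures, only to show the shape of the result.
yesterday = {"Jita": 900.0, "Amarr": 300.0, "Rens": 250.0, "Dodixie": 200.0, "Hek": 100.0}
nodes = balance(yesterday, 3)
print(nodes)
```

With these invented numbers Jita ends up alone on its node while the quieter systems share the other two, which matches the behaviour described above. Note that a system is never split: one system always maps to one node.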
Moving a solar system from one node to another while it is live will disconnect everybody currently in that solar system, and has potential bugs involving POSes.
So... ultimately a single server will handle a single solar system. At some point it always has to boil down to a single server. But where is the bottleneck?
When a node fails... why does it fail? Is it due to lack of CPU, lack of memory, or lack of back-end system (aka database)? I am assuming that the database is scalable, and is an MS-SQL server (I am familiar with Oracle... and I know we can cluster them on massive 300+ CPU machines if needed).
(If the node death is due to the database... then they need to contract with MS Professional Services and with HP to get super DBs.)
Let's assume that the node fails due to CPU or memory. Both of these have a theoretical limit that a single server can hold, especially if these are Windows machines (I am assuming that they are, since CCP is a Microsoft development partner). So just throwing more machines at the problem will not solve it, since the 800-man battle is on one of the machines.
So what can they do? Anything they can do to fix it will be a fundamental change to the EVE-Online client.
I think the only thing they can do is to break down the "partitioning" further. Instead of limiting one system to a node, they limit one "grid" to a node. I think that this would be a massive undertaking if they did it, and would basically completely rewrite the server and network portion, maybe even the client portion of the game.
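Here is a toy sketch of the difference between partitioning by system and partitioning by grid. Everything in it is an assumption on my part: the system name "FIGHT-1", the grid names, the ship counts, and the round-robin placement are invented, and it ignores everything hard about the idea (ships crossing grids, shared system state, and so on).

```python
from collections import defaultdict

def load_per_node(ships, key, node_count):
    """Count ships per node when partitions are chosen by key(ship).

    ships: list of (system, grid) tuples, one per ship.
    key:   function picking the partition a ship belongs to.
    """
    # Spread the distinct partitions over the nodes round-robin.
    partitions = sorted({key(ship) for ship in ships})
    node_of = {p: i % node_count for i, p in enumerate(partitions)}

    load = defaultdict(int)
    for ship in ships:
        load[node_of[key(ship)]] += 1
    return dict(load)

# 800 ships in one hypothetical system, spread over three grids.
ships = ([("FIGHT-1", "gate")] * 400
         + [("FIGHT-1", "station")] * 250
         + [("FIGHT-1", "belt")] * 150)

by_system = load_per_node(ships, key=lambda s: s[0], node_count=3)
by_grid = load_per_node(ships, key=lambda s: s, node_count=3)
print(by_system)  # all 800 ships on one node
print(by_grid)    # ships split across three nodes
```

The catch is visible in the numbers: this only helps when the ships are spread over several grids. If all 800 pilots sit on the same grid, that grid is still one partition on one node.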
Maybe there is something else, and since I don't have code access to their apps, I can't make any realistic observations, other than wild-ass guesses.
I wonder if CCP has considered getting some LoadRunner experts on staff. Then they could simulate thousands upon thousands of clients in a controlled fashion to find the problems and bottlenecks.
Meh... it's a thought.
sonofabeachballbouncingmarymotherfiretrucker