Andy Koraka
PonyWaffe Insidious Empire
20
|
Posted - 2013.12.04 18:26:00 -
[151] - Quote
Maybe I'm misunderstanding something, but as far as I can tell this will only have a negative effect on the quality of game play in regards to already painful fleet combat.
Frankly I don't remember the last time I was in a full fleet and there wasn't heavy Ti-Di. Every time a solitary 250 man fleet jumps a gate the system spikes to 10% tidi for 30-45 seconds. Even if every fleet fight was on an individual reinforced node (reinforced nodes are the exception, not the rule) the issue of gate Tidi is going to be exponentially worse under the new regional scheme since every individual fleet in the area traveling to (or from) the combat system is going to be sequentially triggering gate lag that affects everyone on the same node. It's going to be a particularly painful change given the recent quality of life hits to the majority of fleet ships, there's nothing fun or engaging about staring at a warp tunnel for 10 minutes per system the entire trip home.
As far as the metagame is concerned, even without a published node map it's going to be exploited. For example in a defensive Sov war, if most of a region is on the same node it's not going to be hard to find a linked system by trial and error and dock/undock repeatedly to cascade the entire node (most of a region in the current scheme) into a sustained 10% tidi to discourage siege fleets from grinding structures.
Yes the old system wasn't perfect, but the guy ratting in an empty system halfway across EvE could have just moved over to a different system and continued ratting. Maybe this is the right solution for Empire where loads are usually steady from day to day but it's the wrong approach in Nullsec. |
Melek D'Ivri
617 Squadron
30
|
Posted - 2013.12.04 19:04:00 -
[152] - Quote
Pretty sure that explanation is as simple and easily understood as it gets, folks! |
Rammix
TheMurk
175
|
Posted - 2013.12.04 19:11:00 -
[153] - Quote
Seems they have nothing else to do. I can list up to 100 things that need dev time, they only need to ask. Where is damn WIS by the way? OpenSUSE 12.2, wine 1.5 Covert cyno in highsec: https://forums.eveonline.com/default.aspx?g=posts&t=296129&find=unread |
Joshua Blue
3-Strikes Nulli Secunda
3
|
Posted - 2013.12.04 19:53:00 -
[154] - Quote
Best blog ever! |
|
CCP Phantom
C C P C C P Alliance
3768
|
Posted - 2013.12.04 20:41:00 -
[155] - Quote
Gilbaron wrote:is there any kind of support for university papers ? i might actually be interested (not on a technical level, but for markets or politics)
Yes, there is! It depends a bit on the type of research etc., but yes, in theory we can support academic research and have done so in the past. If you have serious interest and some specifics already in place, please contact the Community team (just send a support ticket with some details).
CCP Phantom - Senior Community Representative - Volunteer Manager |
|
Tasha Saisima
State War Academy Caldari State
74
|
Posted - 2013.12.04 21:10:00 -
[156] - Quote
The busiest system needs to share a node with the least busy system so fewer people are affected. |
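One way to read the suggestion above is as a pairing heuristic: sort the systems by load and put the busiest remaining system on a node with the quietest remaining one. The sketch below is only an illustration of that reading. The system names and load figures are made up, and it says nothing about how CCP's actual mapper works.

```python
def pair_busiest_with_quietest(loads):
    """Pair systems so each node holds the busiest and the quietest system left."""
    ranked = sorted(loads, key=loads.get, reverse=True)  # busiest first
    nodes = []
    lo, hi = 0, len(ranked) - 1
    while lo < hi:
        nodes.append((ranked[lo], ranked[hi]))
        lo += 1
        hi -= 1
    if lo == hi:  # odd number of systems: the median one gets a node to itself
        nodes.append((ranked[lo],))
    return nodes


# Hypothetical systems with hypothetical load figures.
example_loads = {"hub": 900, "staging": 400, "pipe": 50, "backwater": 5, "dead_end": 2}
node_plan = pair_busiest_with_quietest(example_loads)
print(node_plan)
```

Note that this only spreads people around; it does nothing about the locality of systems on a node, which is what the dev blog's regional scheme is concerned with.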
Dersen Lowery
Laurentson INC StructureDamage
859
|
Posted - 2013.12.04 21:35:00 -
[157] - Quote
Vincent Athena wrote:Ive heard the words "Brain in a Box" quite a bit and seen vague descriptions of it having to do with preparing session change data on a separate node. But is there a full description somewhere? What it does, how much load it will remove, and so on?
Based on what I remember from CCP Veritas talking about it:
The basic problem is that every time you do a session change, the new node has to query the database for all the information about you: skills, implants, clone, yadda yadda yadda. When 500 people undock, or jump a gate, that's 500 relatively large database queries at once, with the node twiddling its thumbs until the results come back (because it can't guess how your skills impact the particular fit of the particular ship you're flying, etc.). Boom, TiDi.
The "information about you" is the "brain." The "box" is a portable data structure--a cache, really, stored on a dedicated server--and so a handle to where your brain is, and in which box, can be handed from one node to another when you change sessions. Database queries are then decoupled from session changes, and they can be done as needed, asynchronously. Suddenly, fleet undocks, docks, jumps, etc., no longer spike TiDi.
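As a rough sketch of the idea as described above (from memory, so treat it as an illustration and not CCP's design): the database is queried once per character, the result lives in a cache, and session changes only pass a handle between nodes. Every class and method name here is hypothetical.

```python
class FakeDatabase:
    """Stand-in for the character database; it only counts queries."""

    def __init__(self):
        self.queries = 0

    def load_brain(self, character_id):
        self.queries += 1
        return {"character_id": character_id, "skills": {}, "implants": []}


class BrainBox:
    """Stand-in for the dedicated cache server: holds brains, hands out handles."""

    def __init__(self, database):
        self._database = database
        self._brains = {}

    def handle_for(self, character_id):
        # Query the database only the first time a character is seen.
        if character_id not in self._brains:
            self._brains[character_id] = self._database.load_brain(character_id)
        return character_id  # the handle is just a key into the box

    def brain(self, handle):
        return self._brains[handle]


class Node:
    """A node that accepts sessions by handle instead of querying the database."""

    def __init__(self, box):
        self._box = box
        self.sessions = set()

    def accept(self, handle):
        self.sessions.add(handle)

    def release(self, handle):
        self.sessions.discard(handle)

    def skills_of(self, handle):
        # Brain data is read from the box when needed, off the session-change path.
        return self._box.brain(handle)["skills"]


database = FakeDatabase()
box = BrainBox(database)
node_a, node_b = Node(box), Node(box)

fleet = [box.handle_for(pilot) for pilot in range(500)]  # one query per pilot
for handle in fleet:
    node_a.accept(handle)
for _ in range(3):  # three gate jumps, back and forth between two nodes
    for handle in fleet:
        node_a.release(handle)
        node_b.accept(handle)
    node_a, node_b = node_b, node_a

print(database.queries)  # 500: the jumps themselves added no queries
```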
Also from memory, the next initiative after that decouples the notification system from the physics engine so that it can run on its own node. Then the physics engine only has to figure out what happened, and it can asynchronously call another process on another core (or node) to tell everyone on grid what happened. That will reduce the level of sustained TiDi caused by a major fleet fight. If that initiative has a cute name, I haven't heard it yet. Proud founder and member of the Belligerent Desirables. |
Sentient Blade
Crisis Atmosphere
1071
|
Posted - 2013.12.04 21:44:00 -
[158] - Quote
Mioelnir wrote:Sentient Blade wrote:I've mentioned it elsewhere, but why are these machines not virtualised (or are they?) surely something like vMotion would be able to move high-use systems onto dedicated hardware without the need to pause anything. You really think a company, running the largest gaming cluster for over 10 years now would not have already bought a stock solution if it worked for them? Cute.
"Stock Solution"?
I'm not sure you appreciate the complexity it would take to roll out such a massive deploy and use live migration all while pumping the entire local network through virtual switches.
Easy it isn't... |
Abdiel Kavash
Paladin Order Fidelas Constans
2135
|
Posted - 2013.12.04 21:47:00 -
[159] - Quote
Andy Koraka wrote:As far as the metagame is concerned, even without a published node map it's going to be exploited. For example in a defensive Sov war, if most of a region is on the same node it's not going to be hard to find a linked system by trial and error and dock/undock repeatedly to cascade the entire node (most of a region in the current scheme) into a sustained 10% tidi to discourage siege fleets from grinding structures.
I wouldn't be too afraid of that. Keep in mind that intentionally putting extra load on the servers is a serious EULA violation. And since CCP are already closely monitoring system load, your docking shenanigans will show up as a big flare. And as soon as some dev looks closer and sees that there is no actual fighting associated with the extra load, you're in trouble.
People have been given warnings and bans in the past for all sorts of exploits trying to force a node to break. |
Mioelnir
Cataclysm Enterprises Easily Offended
184
|
Posted - 2013.12.04 22:54:00 -
[160] - Quote
Sentient Blade wrote:Mioelnir wrote:Sentient Blade wrote:I've mentioned it elsewhere, but why are these machines not virtualised (or are they?) surely something like vMotion would be able to move high-use systems onto dedicated hardware without the need to pause anything. You really think a company, running the largest gaming cluster for over 10 years now would not have already bought a stock solution if it worked for them? Cute. "Stock Solution"? I'm not sure you appreciate the complexity it would take to roll out such a massive deploy and use live migration all while pumping the entire local network through virtual switches. Easy it isn't...
Environment integration and live rollout of a technology have nothing to do with it. First that technology has to solve your particular problem. And yes, VMware ESX / vMotion "live migration" is pretty much a stock solution as far as virtualization goes.
TQ "nodes" are not machines, they are processes. With each process serving multiple solar systems. For that reason, you can't move a high-use solar system by moving a virtual OS around.
As far as workarounds go, one could run one virtual server with a single process running a single solar system for every system, sure. With all the increased overhead that brings along with it. And I am equally sure some Dev at CCP evaluated that already. And the fact that they did not adopt it (or something similar) means it did not work for them.
And if all that is sorted out, the EVE server code runs at 1 Hz. Freezing a node for 2 seconds to copy it over to somewhere else means 2 missed server cycles that the clients have to resynchronize with again. Which means a major redesign and rewrite of the network code on the client and server side.
For those kinds of development resources, you need a really strong business case. And while dynamically reinforcing nodes is a nice target, we push those into TiDi as well. All the time. Which means it's essentially a band-aid. Not something you get a man-year of development effort approved for. |
|
Sentient Blade
Crisis Atmosphere
1071
|
Posted - 2013.12.05 01:18:00 -
[161] - Quote
Mioelnir wrote:Sentient Blade wrote:Mioelnir wrote:Sentient Blade wrote:I've mentioned it elsewhere, but why are these machines not virtualised (or are they?) surely something like vMotion would be able to move high-use systems onto dedicated hardware without the need to pause anything. You really think a company, running the largest gaming cluster for over 10 years now would not have already bought a stock solution if it worked for them? Cute. "Stock Solution"? I'm not sure you appreciate the complexity it would take to roll out such a massive deploy and use live migration all while pumping the entire local network through virtual switches. Easy it isn't... Environment integration and live rollout of a technology have nothing to do with it. First that technology has to solve your particular problem. And yes, VMware ESX / vMotion "live migration" is pretty much a stock solution as far as virtualization goes. TQ "nodes" are not machines, they are processes. With each process serving multiple solar systems. For that reason, you can't move a high-use solar system by moving a virtual OS around. As far as workarounds go, one could run one virtual server with a single process running a single solar system for every system, sure. With all the increased overhead that brings along with it. And I am equally sure some Dev at CCP evaluated that already. And the fact that they did not adopt it (or something similar) means it did not work for them. And if all that is sorted out, the EVE server code runs at 1HZ. Freezing a node for 2 seconds to copy it over to somewhere else are 2 missed server cycles that the clients have to resynchronize with again. Which means a major redesign and rewrite of the network code on the client and server side. For those kind of development resources, you need a really strong business case. And while dynamically reinforcing nodes is a nice target, we push those into TiDi as well. All the time. Which means it's essentially a band-aid. 
Not something you get a man-year of development effort approved for.
I'll take these in turn...
#1 Yes it's a stock solution, but its method of deciding when to migrate guests between hardware isn't. You'd need some kind of real-time reporting from the solar system servers, collating, and then deciding what gets put on what hardware.
#2 You could go for this approach of 1 system per VM. In a virtualized system this would actually be rather easy to maintain and would provide the best real world gains for many-threads few-intensive workloads.
#3 If you have to freeze something for 2 seconds your migration code isn't working right. You should be able to migrate an entire VM over with maybe half a second's pause, if that. Not that you actually miss those cycles in the first place, their data just stays in the queue. The underlying VM has no idea at all it's been moved. |
Haseo Antares
Corollary Forest Fairytail.
57
|
Posted - 2013.12.05 03:41:00 -
[162] - Quote
Magic, got it. We currently have the world's greatest linguists and scientists trying to decode what you just said. |
Dersen Lowery
Laurentson INC StructureDamage
860
|
Posted - 2013.12.05 04:44:00 -
[163] - Quote
Sentient Blade wrote: The underlying VM has no idea at all it's been moved.
And since it's taken a nontrivial amount of time to move relative to the 1HZ physics engine, meaning that the odds are very good that your half a second will cross a tick boundary, that means that every move must be followed by a resync with adjacent systems to get everyone back on the same page, right? If one node is off by a server tick, how do you handle that? Proud founder and member of the Belligerent Desirables. |
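To put rough numbers on the tick-boundary question: with a 1 Hz tick, a pause crosses one boundary for every whole second that starts inside it. The small sketch below counts those boundaries. The function name is made up, and it assumes ticks fall exactly on whole seconds, which is a simplification.

```python
import math

TICK_SECONDS = 1.0  # the 1 Hz server tick mentioned in the thread


def ticks_crossed(pause_start, pause_seconds, tick=TICK_SECONDS):
    """Count tick boundaries that fall inside a pause beginning at pause_start."""
    pause_end = pause_start + pause_seconds
    return math.floor(pause_end / tick) - math.floor(pause_start / tick)


print(ticks_crossed(0.25, 0.5))  # 0: the pause sits inside a single tick
print(ticks_crossed(0.75, 0.5))  # 1: the pause straddles a boundary
print(ticks_crossed(0.25, 2.0))  # 2: a 2-second freeze always spans two boundaries
```

Under these assumptions, and if the move starts at a random point within a tick, a half-second pause crosses a boundary about half of the time, while anything of a second or longer always does.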
Abdiel Kavash
Paladin Order Fidelas Constans
2136
|
Posted - 2013.12.05 04:52:00 -
[164] - Quote
Dersen Lowery wrote:Sentient Blade wrote: The underlying VM has no idea at all it's been moved. And since it's taken a nontrivial amount of time to move relative to the 1HZ physics engine, meaning that the odds are very good that your half a second will cross a tick boundary, that means that every move must be followed by a resync with adjacent systems to get everyone back on the same page, right? If one node is off by a server tick, how do you handle that?
During TiDi different systems are not running in sync either.
(I'm not saying this as a proof that this will be easy, rather as anecdotal evidence for it.) |
Pak Narhoo
Splinter Foundation
1204
|
Posted - 2013.12.05 04:59:00 -
[165] - Quote
CCP Prism X, is there any relation between the "balanced universe" and the perceived unresponsiveness from this thread? |
NinjaTurtle
AQUILA INC Verge of Collapse
47
|
Posted - 2013.12.05 06:01:00 -
[166] - Quote
Great dev blog! Thanks so much for giving us insight into how you balance the clusters, I for one had been wondering what your process was for some time. Can't wait to see the results Co-host and editor of Declarations of War Podcast http://declarationsofwar.com Twitter- @schertt |
Rn Bonnet
Sniggerdly Pandemic Legion
20
|
Posted - 2013.12.05 08:09:00 -
[167] - Quote
Dersen Lowery wrote:Sentient Blade wrote: The underlying VM has no idea at all it's been moved. And since it's taken a nontrivial amount of time to move relative to the 1HZ physics engine, meaning that the odds are very good that your half a second will cross a tick boundary, that means that every move must be followed by a resync with adjacent systems to get everyone back on the same page, right? If one node is off by a server tick, how do you handle that?
Vmotion at least is truly transparent to the underlying VM. You will see a "pause" but incoming network packets etc. are not dropped, just queued while the machine is in motion afaik. |
Steve Ronuken
Fuzzwork Enterprises Vote Steve Ronuken for CSM
2119
|
Posted - 2013.12.05 11:00:00 -
[168] - Quote
Rn Bonnet wrote:Dersen Lowery wrote:Sentient Blade wrote: The underlying VM has no idea at all it's been moved. And since it's taken a nontrivial amount of time to move relative to the 1HZ physics engine, meaning that the odds are very good that your half a second will cross a tick boundary, that means that every move must be followed by a resync with adjacent systems to get everyone back on the same page, right? If one node is off by a server tick, how do you handle that? Vmotion at least is truly transparent to the underlying VM. You will see a "pause" but incoming network packets etc. are not dropped, just queued while the machine is in motion afaik.
Nope.
Set up a continuous ping of a VM, then vmotion it, and you'll see a couple of dropped packets. Steve Ronuken for CSM 9! http://www.fuzzwork.co.uk/ Twitter: @fuzzysteve on Twitter |
Cerulean Ice
EVE University Ivy League
49
|
Posted - 2013.12.05 15:25:00 -
[169] - Quote
I noticed a typo in the 3rd to last image, detailing how the x/y split works to better facilitate the repeated splitting in half. http://content.eveonline.com/www/newssystem/media/65499/1/wholePowerOfTwoSolution.jpg In the blue text for the 1st split, 85/64 is not 75.3%. 64/85 is, however. ^^ |
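The arithmetic is quick to check. The two figures below are simply the ones quoted from the image in the post above.

```python
share = 64 / 85     # the ratio that matches the quoted percentage
inverted = 85 / 64  # the ratio as printed in the image

print(f"{share:.1%}")     # 75.3%
print(f"{inverted:.1%}")  # 132.8%
```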
Cygnet Lythanea
World Welfare Works Association Independent Faction
316
|
Posted - 2013.12.05 16:43:00 -
[170] - Quote
It's nice to see work done on high sec, even if it took the servers burning up before CCP would admit that highsec exists... LOL
The Most Interesting Player In Eve. |
|
Mioelnir
Cataclysm Enterprises Easily Offended
184
|
Posted - 2013.12.05 21:03:00 -
[171] - Quote
About the 2 seconds: that's straight from the vendor. So while, in practice, it may not take more than half a second, you still need to design your cluster to be able to handle a 2 second move. Better yet, a 4 second move. If every client disconnects because a move took .7 instead of .5 seconds, you gained nothing.
And as for running every solar system on its own VM: yes, that is rather easy to maintain - from the POV of the virtual infrastructure. But it means x30 more connections on the internal end of the session servers. It also means x30 more SQL sessions which probably can't be scaled down by x30. It also means a larger memory footprint for the entire server (x30 more OS instances) and decreased cache efficiency. That's why I called it a workaround.
vMotion works nicely for applications which you can also loadbalance via IP failover. For protocols with standing connections and high degree of time synchronization - let's say it gets complicated fast.
Abdiel Kavash wrote:Dersen Lowery wrote:Sentient Blade wrote: The underlying VM has no idea at all it's been moved. And since it's taken a nontrivial amount of time to move relative to the 1HZ physics engine, meaning that the odds are very good that your half a second will cross a tick boundary, that means that every move must be followed by a resync with adjacent systems to get everyone back on the same page, right? If one node is off by a server tick, how do you handle that? During TiDi different systems are not running in sync either. (I'm not saying this as a proof that this will be easy, rather as anecdotal evidence for it.)
The tick between different systems runs differently. It probably always has. While all TQ nodes will run with similar latencies against the same NTP to keep the cluster internal clocks sync'ed, I doubt CCP sync'ed the server tick. Unless they use a wallclock second to initialize the first tick after starting the process - which actually they might have, thinking about it.
But this is not really that important inside the cluster. There, only the wallclock really has to be sync'ed so that timestamps mean the same thing to everyone involved. That can be handled, NTP solved that problem decades ago.
The move is much more likely to desync the tick-count between server and client dogma simulation. The clients would be some seconds ahead of the server. Here the server could:
- skip forward to the clients, discarding input for the skipped ticks
- skip forward, (try to) apply the entire input queue to the next processed tick
- instruct all clients to roll back to its tick, discarding input
- signal the clients a higher TiDi level than the server actually runs at until it has caught up again
In any case, the server would have to be notified by the infrastructure that it has been moved, since the EVE clients are untrusted terminals and the server cannot trust them even if every connected client agrees that the server tick is off by the same offset.
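The four options listed above can be laid side by side in a small sketch. This is purely illustrative: the strategy names, the `Resync` record and the treatment of queued input in the last option are assumptions, not anything known about the real server code.

```python
from dataclasses import dataclass


@dataclass
class Resync:
    server_tick: int      # tick the server continues from
    client_tick: int      # tick the clients are told to use
    applied_inputs: list  # queued commands the server will still process
    clients_slowed: bool  # True if clients are told to run slower for a while


def resync(server_tick, client_tick, queued_inputs, strategy):
    """Resolve a server that fell behind its clients during a move."""
    if strategy == "skip_discard":
        # Jump ahead to the clients and drop input for the skipped ticks.
        return Resync(client_tick, client_tick, [], False)
    if strategy == "skip_apply":
        # Jump ahead and apply the whole input queue to the next tick.
        return Resync(client_tick, client_tick, list(queued_inputs), False)
    if strategy == "rollback_clients":
        # Pull the clients back to the server's tick and drop their input.
        return Resync(server_tick, server_tick, [], False)
    if strategy == "overstate_tidi":
        # Report more TiDi than is real so the server can catch up.
        # Assumption: input is kept, since nothing is skipped or rolled back.
        return Resync(server_tick, client_tick, list(queued_inputs), True)
    raise ValueError(f"unknown strategy: {strategy}")


for name in ("skip_discard", "skip_apply", "rollback_clients", "overstate_tidi"):
    print(name, resync(100, 102, ["warp", "lock"], name))
```

In every branch the decision is made by the server alone, which matches the point that it cannot take the clients' word for how far off the tick is.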
Btw, I think it's awesome that we as players sit here talking about TQ's cluster architecture. |
Diomedes Calypso
Aetolian Armada
187
|
Posted - 2013.12.06 06:11:00 -
[172] - Quote
These sorts of blog posts make me love the game even though I tend to think of a python as a snake in the Amazon or a pet snake around someone's neck at a park in Berkeley, California.
Respect for the intelligence and knowledge of the users.
Treating us like adults.
I love that the company has so firmly decided (lol yes, since the $1000 pants debacle) not to assume that people who don't really grasp more than the broad strokes will be put off by "too much detail".
Yes I do understand the clusters and understand deviations and balancing etc but get lost or glazed eyed a bit deeper. I love that I'm told more than I want to know on some topics but can suck in the details on topics I'm interested in (start talking the velocity of money and I get real interested)
And .. heck.. I can always start researching terms I don't understand and enjoy the whole thing and be more knowledgeable about computers from playing the game! |
Blue Harrier
144
|
Posted - 2013.12.06 15:40:00 -
[173] - Quote
Can I just pop in and say having read all 9 pages of this thread I wish more threads were like this on the forums.
Constructive talking among a diverse group of some very and some not so very knowledgeable members, no one having tantrums, throwing teddies out of prams, nothing but reasoned arguments.
Some putting forward what if's, others debating and showing why this would not be possible but leaving room for further debate in case they missed something.
Must be the spirit of Christmas or something, well done to all.
"You wait - time passes, Thorin sits down and starts singing about gold." from The Hobbit on ZX Spectrum 1982. |
Katrina Bekers
Rim Collection RC Sorry We're In Your Space Eh
191
|
Posted - 2013.12.06 17:13:00 -
[174] - Quote
Steve Ronuken wrote:Nope.
Set up a continuous ping of a VM, then vmotion it, and you'll see a couple of dropped packets.
Ping is connectionless and has a timeout of 3 seconds.
A TCP connection is - duh! - connection based, and usually the timeout is at 30 seconds.
Perfect? No.
But a dropped ping doesn't necessarily mean a dropped connection. << THE RABBLE BRIGADE >> |
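The distinction being drawn here, lost packets versus a lost connection, can be shown with a toy model. It assumes a link that is simply down for the length of the move, a one-shot probe that never retries, and a sender that retries at a fixed interval until a give-up time. Real TCP uses adaptive retransmission timers, so the fixed retry interval and all function names are simplifications for illustration. The 30-second give-up figure is the one quoted in the post above.

```python
def link_is_up(t, pause_start, pause_seconds):
    """The link is down only while the (hypothetical) move is in progress."""
    return not (pause_start <= t < pause_start + pause_seconds)


def ping_losses(pause_start, pause_seconds, interval=1.0, count=10):
    """One-shot probes: anything sent while the link is down is lost."""
    sent_at = (i * interval for i in range(count))
    return sum(1 for t in sent_at if not link_is_up(t, pause_start, pause_seconds))


def reliable_send(send_time, pause_start, pause_seconds,
                  retry_interval=0.5, give_up_after=30.0):
    """Retry until delivered; return the delay, or None if the sender gives up."""
    t = send_time
    while t - send_time <= give_up_after:
        if link_is_up(t, pause_start, pause_seconds):
            return t - send_time
        t += retry_interval
    return None


print(ping_losses(3.0, 2.0))          # 2 probes lost during a 2-second pause
print(reliable_send(3.0, 3.0, 2.0))   # 2.0 seconds late, but delivered
print(reliable_send(3.0, 3.0, 40.0))  # None: the pause outlasts the give-up time
```

Both posters come out right in this model: packets are dropped during the pause, and a retrying connection survives it as long as the pause is shorter than its timeout.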
Steve Ronuken
Fuzzwork Enterprises Vote Steve Ronuken for CSM
2129
|
Posted - 2013.12.06 17:50:00 -
[175] - Quote
Katrina Bekers wrote:Steve Ronuken wrote:Nope.
Set up a continuous ping of a VM, then vmotion it, and you'll see a couple of dropped packets. Ping is connectionless and has a timeout of 3 seconds. A TCP connection is - duh! - connection based, and usually the timeout is at 30 seconds. Perfect? No. But a dropped ping doesn't necessarily mean a dropped connection.
It does mean dropped packets though. /That/ is what I was saying. Steve Ronuken for CSM 9! http://www.fuzzwork.co.uk/
Twitter: @fuzzysteve on Twitter |
Rain6637
Team Evil
6757
|
Posted - 2013.12.06 21:34:00 -
[176] - Quote
wormhole mass accumulation needs to be looked at, specifically: how it relates to traffic control. traffic control prevented a wormhole jump, giving me a "you will be cleared to jump within the next X seconds," but also counted my ship's mass against the remainder on the hole, subsequently shutting it down while I stared at a traffic control timer. if quiet systems = sisi-esque dropped jump attempts, the least consideration you could also make is preventing dropped jumps from contributing to wormhole mass limits. Rainfleet on Twitch |
Rain6636
Team Evil
823
|
Posted - 2013.12.07 00:43:00 -
[177] - Quote
I've submitted a bug report, referencing the dev blog, and outlining the scenario in which traffic control will reject a wormhole jump while the ship's mass is still counted toward the hole's mass limit (as if the jump was successfully made). I can't find a bug report number to list here. Rainf1337 on Twitch |
Jessica Danikov
Clan Shadow Wolf Fatal Ascension
144
|
Posted - 2013.12.07 13:20:00 -
[178] - Quote
Andy Koraka wrote:Maybe I'm misunderstanding something, but as far as I can tell this will only have a negative effect on the quality of game play in regards to already painful fleet combat.
Frankly I don't remember the last time I was in a full fleet and there wasn't heavy Ti-Di. Every time a solitary 250 man fleet jumps a gate the system spikes to 10% tidi for 30-45 seconds. Even if every fleet fight was on an individual reinforced node (reinforced nodes are the exception, not the rule) the issue of gate Tidi is going to be exponentially worse under the new regional scheme since every individual fleet in the area traveling to (or from) the combat system is going to be sequentially triggering gate lag on the same node. It's going to be a particularly painful change given the recent quality of life hits to the majority of fleet ships, there's nothing fun or engaging about staring at a warp tunnel for 10 minutes per system the entire trip home.
As far as the metagame is concerned, even without a published node map it's going to be exploited. For example in a defensive Sov war, if most of a region is on the same node it's not going to be hard to find a linked system by trial and error and dock/undock repeatedly to cascade the entire node (most of a region in the current scheme) into a sustained 10% tidi to discourage siege fleets from grinding structures.
Yes the old system wasn't perfect, but the guy ratting in an empty system halfway across EvE could have just moved over to a different system and continued ratting. Maybe this is the right solution for Empire where loads are usually steady from day to day but it's the wrong approach in Nullsec.
The changes made haven't done much to change this problem significantly- both systems create large areas of connected systems that are all on a single node, the new one just ignores constellation boundaries and balances the (predicted) load across nodes better, while also ensuring all solar systems on a node are fairly local to each other. At worst, it may make the contiguous spaces a little larger.
The static mapper could do a lot more for this issue by striping nodes if the difference between intra-node and inter-node jumps really is significant (especially when scaled up) and the efforts to do so should be fairly minimal. If not, the Brain in a Box is going to be the next big advance in that area. |
Rain6636
Team Evil
824
|
Posted - 2013.12.07 20:10:00 -
[179] - Quote
still waiting for confirmation that failed wormhole jumps with traffic control messages count against the wormhole mass, but will be looked into. (meanwhile there will be support tickets, handled by uninformed customer service staff) Rainf1337 on Twitch |
Alex Logan
Brutor Tribe Minmatar Republic
11
|
Posted - 2013.12.07 23:06:00 -
[180] - Quote
I don't think we should trust CCP Prism X.
I don't think Libras are serious and trustworthy.
Sorry but I won't read your stuff. |
|