
CCP Fallout

|
Posted - 2010.08.20 18:04:00 -
[1]
Our "Fixing Lag" series continues with CCP Atlas' blog on character nodes, which you can read here.
Fallout Associate Community Manager CCP Hf, EVE Online Contact us |
|

Orephia
|
Posted - 2010.08.20 18:08:00 -
[2]
Thanks!
& first?? |

Qoi
Exert Force
|
Posted - 2010.08.20 18:23:00 -
[3]
Great read, thanks a lot.
Looks really reasonable, I'll look forward to fleet fight numbers when the dreaded blackscreen issue has been resolved :)
|

Hieronomus
|
Posted - 2010.08.20 18:23:00 -
[4]
fourth !!!
hi
|

Dacil Arandur
|
Posted - 2010.08.20 18:25:00 -
[5]
Thanks for the blog! Seems like a very smart way to take some pressure off the solar system "location nodes." I also really like the idea of other services not directly related to the location being free of lag even if the location itself is overloaded.
Thanks again for keeping us informed!
|

Meissa Anunthiel
|
Posted - 2010.08.20 18:28:00 -
[6]
Better legends for the graphs would be appreciated, I have absolutely no clue what I'm looking at. Care to say what each colored line is?
Thanks a lot for the devblog however (and the character nodes). member of CSM 2, 3, 4 and 5. Feel free to contact me with queries. Convo, evemail or join the "meissaCSM" in-game channel. |

Alain Kinsella
Minmatar
|
Posted - 2010.08.20 18:29:00 -
[7]
Good read, was kinda what I expected when the last patch notes came out (we implement something similar at work).
Only question I have on this post: Is the EveMail system going to be placed on its own node set? I'm always surprised that my character (who gets maybe 3-4 evemails a week) takes nearly a minute to load the screen at startup.
[And on this note, can you also explain why it's so much quicker to access EveMails through EveGate, in an OOG browser like Firefox? My assumption here is that the character node for Mail is already implemented, but only being called directly by the EveGate/Web side, not by the Client.]
|

Master Akira
Child Head Injury and Laceration Doctors
|
Posted - 2010.08.20 18:29:00 -
[8]
Originally by: CCP Atlas Our hope is that very soon our beloved Tranquility will be able to support fleet fights of a scale that far exceeds anything you've seen before, hopefully going beyond the roof of roughly one thousand on a dedicated node.
Now that's a bold statement.
This was a very interesting blog, with interesting solutions to the given problem. My question then would be:
Are you guys already working on moving the load of a single solar system to multiple cores if needed? Are you guys already biting the bullet of doing multithread? Because it seems you will HAVE to do it at some point whether you like it or not, and Oveur already stated that it was a "first step" to do...
|

Liang Nuren
Parsec Flux War.Pigs.
|
Posted - 2010.08.20 18:36:00 -
[9]
Awesome dev blog - this should really help a lot. It sounds like you guys are really doing a fantastic job, and I think you're all awesome.
For my own curiosity though:
- Is the bottleneck in the database (finding/updating rows) or in the processing of individual requests (like loading/manipulating objects)? It seems like if it's the second, then this is really an awesome way to handle it.
- If it's the second, is there a single character database or did you distribute characters onto different databases? If you distributed them, is it difficult to move characters between databases for load balancing purposes?
- If you distributed it, is there an archival character database for offline/inactive characters, and perhaps a series of smaller character node databases for logged in characters which replicate to the master db?
Well, I could talk shop all day, and I probably shouldn't. But I do have a more serious question - it seems to me that the "Jita Inventory System" shouldn't be required to dump someone's stuff in a station. It seems like the interactions that can be had by docked people are limited to trade windows and local chat - neither of which I can imagine being handled by the location node. It seems like it's a perfect place to further distribute. Is this an improvement you guys are planning on making or are there things I don't know about?
I got money on the second, personally.
Also: sorry for the armchair development. A very well written blog that tangentially touches on my area of expertise.
-Liang -- Eve Forum ***** Extraordinaire On Twitter Blog
|

Aranial
Gallente Empyrean Warriors Lux Caelestia
|
Posted - 2010.08.20 18:42:00 -
[10]
Wahey! More mental nom nom .
|
|

CCP Explorer

|
Posted - 2010.08.20 18:50:00 -
[11]
Please note that the network traffic and CPU usage graphs were inadvertently swapped and didn't match their captions. I've fixed that now.
Erlendur S. Thorsteinsson Software Director EVE Online, CCP Games |
|

Daedalus II
|
Posted - 2010.08.20 18:50:00 -
[12]
How much memory does a typical system use? How long would it take to copy all that to another node? Would it be possible for example to measure if one system spikes, then temporarily pause all other systems on that node for a few ticks and copy them to a node with more resources left? If we're talking one or a few seconds I'm sure people would accept the game lagging a few seconds if it means they don't have to be on the same node as a 1000 man fleet fight.
|

Nye Jaran
|
Posted - 2010.08.20 18:52:00 -
[13]
Originally by: CCP Explorer Please note that the network traffic and CPU usage graphs were inadvertently swapped and didn't match their captions. I've fixed that now.
The same graph (network traffic) shows twice now.
|

Master Akira
Child Head Injury and Laceration Doctors
|
Posted - 2010.08.20 18:52:00 -
[14]
Originally by: CCP Explorer Please note that the network traffic and CPU usage graphs were inadvertently swapped and didn't match their captions. I've fixed that now.
You duplicated the images 
|
|

CCP Explorer

|
Posted - 2010.08.20 18:54:00 -
[15]
Truly fixed now.
Erlendur S. Thorsteinsson Software Director EVE Online, CCP Games |
|

Liang Nuren
Parsec Flux War.Pigs.
|
Posted - 2010.08.20 18:55:00 -
[16]
Originally by: CCP Explorer Truly fixed now.
It's why you're a director instead of a dev. (Thanks for the fix)
-Liang -- Eve Forum ***** Extraordinaire On Twitter Blog
|
|

CCP Explorer

|
Posted - 2010.08.20 18:58:00 -
[17]
Originally by: Meissa Anunthiel Better legends for the graphs would be appreciated, I have absolutely no clue what I'm looking at. Care to say what each colored line is?
There are larger versions of the images available by clicking them.
Figure #2 is the number of net read calls made in a given time period. Up to 80% of the calls were routed away from the Jita location node to the character nodes.
Figure #3 is the CPU usage on the Jita location node before and after.
Lower lines are "after" and lower is better.
Erlendur S. Thorsteinsson Software Director EVE Online, CCP Games |
|
|

CCP Explorer

|
Posted - 2010.08.20 19:01:00 -
[18]
Originally by: Alain Kinsella Good read, was kinda what I expected when the last patch notes came out (we implement something similar at work).
Only question I have on this post: Is the EveMail system going to be placed on its own node set? I'm always surprised that my character (who gets maybe 3-4 evemails a week) takes nearly a minute to load the screen at startup.
[And on this note, can you also explain why it's so much quicker to access EveMails through EveGate, in an OOG browser like Firefox? My assumption here is that the character node for Mail is already implemented, but only being called directly by the EveGate/Web side, not by the Client.]
EVE Mail is on the Character Nodes. In the first iteration we implemented Mail Nodes for EVE Mail but they then became Character Nodes in the second iteration and started hosting other services. I'll mention your concern to the devs.
Erlendur S. Thorsteinsson Software Director EVE Online, CCP Games |
|

Korerin Mayul
Amarr hirr
|
Posted - 2010.08.20 19:10:00 -
[19]
Lovely work! It must have been soul-destroying re-routing all those calls, but the scalability gains make it the kind of work that our children's children will thank you for!
Every time you do stuff like this, EVE gets a little bit smarter - that iteration is one of the reasons I'm still playing. Keep up the good work (after a few good beers perhaps).
|

Kaliba Mort
Minmatar Dark-Rising IT Alliance
|
Posted - 2010.08.20 19:11:00 -
[20]
Is it at all possible in the near to mid-term future to make the Location node (e.g. interactions of ships on the same grid) a multi-threaded node? Or at least make it multi-process, sharing data via shared memory?
|

Hawk TT
Caldari Bulgarian Experienced Crackers Circle-Of-Two
|
Posted - 2010.08.20 19:17:00 -
[21]
Great work! Keep going!
Could you (by any chance) share with us which services are still to be migrated from the Location nodes to the Character nodes? Or will this be posted in an upcoming blog?
Cheers! ___________________________________ Science & Diplomacy Manager @ BECKS Circle-of-Two |

Bartholomeus Crane
Gallente The Crane Family
|
Posted - 2010.08.20 19:18:00 -
[22]
This is a good blog. I liked reading it. I want to know more about the underlying distribution, what goes where, the load differences, and communication overhead, etc., but I doubt I'll ever get it. Nevermind, this method (splitting off functional entities from other functional entities) is a method that works, up to a point. Beyond that, you'll have nothing further to split away and will have to go look at multi-core partitioning, but for now it will help. Keep going (if you still can) ... Inappropriate signature removed. Zymurgist |

Malcanis
Caldari Vanishing Point. The Initiative.
|
Posted - 2010.08.20 19:27:00 -
[23]
I am very much appreciating this new, communicative CCP. Obviously we're not going to see DevBlogs and DevPosts sustained at quite this rate, but I hope the CCP staff are going to carry on this way.
Actual Information beats the hell out of speculation
And I also think there has been a great improvement in the mood of the playerbase. We're still very much waiting on real results, but a lot of us are feeling a lot more positive and optimistic that we'll get them.
Malcanis' Law: Whenever a mechanics change is proposed on behalf of "new players", that change is always to the overwhelming advantage of richer, older players. |
|

CCP Explorer

|
Posted - 2010.08.20 19:32:00 -
[24]
Originally by: Malcanis I am very much appreciating this new, communicative CCP. Obviously we're not going to see DevBlogs and DevPosts sustained at quite this rate, but I hope the CCP staff are going to carry on this way.
Actual Information beats the hell out of speculation
And I also think there has been a great improvement in the mood of the playerbase. We're still very much waiting on real results, but a lot of us are feeling a lot more positive and optimistic that we'll get them.
I do want to submit that this blog contains real live-on-TQ results (phase #2 of these changes was deployed to TQ on 12 August, phase #3 was a part of Tyrannis 1.0.4 this week on 18 August).
In addition there are dev blogs in the pipelines from other devs with other such results.
Erlendur S. Thorsteinsson Software Director EVE Online, CCP Games |
|

Callipygian Provocateur
BIG Majesta Empire
|
Posted - 2010.08.20 19:36:00 -
[25]
I'm honestly a bit surprised to hear that the location node code is, at the very least, governed by Python. I would have expected more of the server side code to be lower level. However, since you mentioned running into the GIL, and because the servers are running Windows, I'm curious if there has been any exploration of IronPython.
From my understanding, at least for some applications, it's faster than CPython on Windows. It also isn't hindered by the pesky GIL and allows easy access to native Windows threads. And there was an effort underway to slip some JIT compilation under the surface. Perhaps some interest from a 'large' customer might even reinvigorate Microsoft's interest in the project.
|

ShadowMaster
Gallente
|
Posted - 2010.08.20 19:52:00 -
[26]
Thank you again for yet another amazing dev blog. Looking forward to the next one today.
|

Ford Chicago
Einherjar Rising Cry Havoc.
|
Posted - 2010.08.20 19:53:00 -
[27]
Meissa Anunthiel, the dev blog states that each line is "the number of calls made onto the Jita [location] node during a 24 hour period". It is a bit confusing because the legend does not list the node IDs sequentially. This would have been obvious if the legend had simply used dates instead of an internal node ID.
I think that the lines in Figure 2 cover four sequential days; note that the difference in the numbers is approximately 200 which roughly corresponds with the number of nodes in the cluster. If so, it makes comparisons a bit difficult as there are known load differences on different days of the week. CCP Explorer, did you attempt to normalize the comparison against differences by day of the week in order to accurately quantify the benefit of this change or are you just showing us raw data?
I also found it interesting that "up to 80% of the calls were routed elsewhere" (other than the Location node) but the cpu utilization of the Location node only dropped 5-15% points. This means that 20% of the calls are responsible for the majority of cpu usage.
CCP Explorer, can you go into more detail about which types of calls generate the most cpu utilization? Which types of calls have been handed to the Character nodes besides mail. What are the 5-6 calls made on a jump event that *don't* need to be handled by the location node?
I found this to be one of the more interesting of the recent dev blogs, but even so, all it really says is that some things that used to be handled by the Location node are now handled elsewhere. As a programmer I suspect my interest is on the more technical side than the average player, but I'm frustrated with the recent "dev blogs" that seem more like marketing material.
|

Agrilad
|
Posted - 2010.08.20 20:07:00 -
[28]
A thought just occurred to me that I am sure has occurred to y'all.
If there are 4 calls that always have to be made on every jump, why don't you combine them into 1 call, so the 4 different round trips over the net don't have to occur?
What are those 4 calls?
Was distracted but was given a second to think. I may have answered my own question. Perhaps those graphs and data aren't the calls over the internet, but the calls inside your proxies and load balancers. So perhaps the call to make a jump is a single call from the client, but takes 4 separate calls to 4 different nodes to complete.
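The batching idea raised in this post can be sketched roughly as follows. This is only an illustration: the four call names, the `HANDLERS` table and the `BatchRequest` shape are all hypothetical, since the thread never says what the per-jump calls actually are or how EVE's RPC layer is shaped.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical handlers standing in for the separate per-jump calls.
HANDLERS: Dict[str, Callable[[int], Any]] = {
    "get_local_channel": lambda char_id: f"local-for-{char_id}",
    "get_sov_info": lambda char_id: {"holder": None},
    "get_station_list": lambda char_id: [],
    "get_standings": lambda char_id: {},
}


@dataclass
class BatchRequest:
    """One request carrying several call names for the same character."""
    char_id: int
    calls: List[str] = field(default_factory=list)


def handle_batch(request: BatchRequest) -> Tuple[Dict[str, Any], int]:
    """Serve every call in a single round trip; return (results, round trips)."""
    results = {name: HANDLERS[name](request.char_id) for name in request.calls}
    return results, 1


def handle_individually(char_id: int, calls: List[str]) -> Tuple[Dict[str, Any], int]:
    """Serve each call as its own round trip; return (results, round trips)."""
    results: Dict[str, Any] = {}
    round_trips = 0
    for name in calls:
        results[name] = HANDLERS[name](char_id)
        round_trips += 1
    return results, round_trips
```

Both paths return the same data; the only difference in this toy model is the round-trip count (1 versus 4). Whether that saving is real depends on whether the calls are client-to-server or node-to-node, which is exactly the open question in the post.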
|

James Bryant
|
Posted - 2010.08.20 20:17:00 -
[29]
Hey guys,
Fantastic dev blog. Certainly answers a whole slew of architectural questions that have been lingering in my head for some time.
My question is in regards to location node load balancing. The very fact that fleet fight requests are necessary seems to indicate a lack (or possibly not enough) automated load balancing of location nodes.
No doubt this is not a new idea to you all, so I am wondering what the difficulties are in implementing the capability to offload light traffic location nodes to underutilized CPUs when heavy traffic nodes start to throttle the CPU. Is there not a way to manage the connection while the handoff is being made? Or is it more an issue of detection and implementing the proper hysteresis in the algorithm (so that nodes don't start swapping around CPUs needlessly)?
-JB
|

Ix Forres
Caldari Righteous Chaps
|
Posted - 2010.08.20 20:20:00 -
[30]
Originally by: Callipygian Provocateur Edited by: Callipygian Provocateur on 20/08/2010 19:38:10 I'm honestly a bit surprised to hear that the location node code is, at the very least, governed by Python. I would have expected more of the server side code to be lower level. However, since you mentioned running into the GIL, and because the servers are running Windows, I'm curious if there has been any exploration of IronPython.
From my understanding, at least for some applications, it's faster than CPython on Windows. It also isn't hindered by the pesky GIL and allows easy access to native Windows threads. And there was an effort underway to slip some JIT compilation under the surface. Perhaps some interest from a 'large' customer might even reinvigorate Microsoft's interest in the project.
*Edit* Also, thanks for yet another awesome blog post. I <720 (that's <3!! [yes, I'm a math nerd]) this kind of info.
Nearly all of EVE is written in Python. A particular version called Stackless Python. IronPython has a number of significant drawbacks to go with the small positives, and last time I heard the project was being slowly abandoned by MS along with IronRuby. Stackless, however, is being developed heavily by people inside CCP; this is a big enough deal that they write their own version of Python for all intents and purposes. I don't think that any benefits would outweigh the huge work required to port away from it, not to mention losing things like StacklessIO which have been major CCP projects in the past to deliver significant IO improvements. This is one area where CCP really is on top of the game.
There's also the fact that quite a lot of the stuff on the server _can't_ be done in a thread-friendly way, and anything that could be threaded would still be limited to running on one core since that's how the LBUs work (as I understand it) so there's no advantage to threading over their existing methodology, which is to use stackless tasklets (which are like threads, without the overhead). An equivalent in Ruby would be fibers; I'm sure there's other equivalents for other languages.
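For anyone who hasn't met tasklets or fibers, the cooperative style described above can be sketched with plain Python generators. To be clear, this is not the Stackless API and not CCP's code; `tasklet` and `run_round_robin` are made-up names for a minimal single-threaded scheduler.

```python
from collections import deque


def tasklet(name, steps, log):
    """A cooperative task: does one unit of work, then yields control."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # hand control back to the scheduler


def run_round_robin(tasks):
    """Run generator-based tasks on a single thread until all have finished."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)          # run the task up to its next yield
            queue.append(task)  # not finished: reschedule at the back
        except StopIteration:
            pass                # finished: drop it


def demo():
    log = []
    run_round_robin([tasklet("a", 2, log), tasklet("b", 3, log)])
    return log  # ['a:0', 'b:0', 'a:1', 'b:1', 'b:2']
```

Everything runs on one OS thread, so there is no locking and no preemption; a task only gives up the CPU at an explicit `yield`. That is also why this style does not spread a single node's work across cores.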
Back to the blog post; there's a lot of great info and the separation of more code from the Location node to other nodes for the purposes of load balancing is an interesting approach to take, and seems to be delivering. While this will of course help, what further steps are being undertaken to improve performance and to either decrease load on the Location nodes, or split up the processing tasks on the node under load? What about transparent node movements and other ideas that've been thrown around in the past; has anything come of these, or are other methods being focused on before attacking those potentially more time consuming issues?
Either way, great informative blog from Atlas, a nice read. -- Ix Forres - 3rd Party Application Developer - EVE Metrics - accVIEW
|

Jinquoi
JSR1 AND GOLDEN GUARDIAN PRODUCTIONS Black Core Alliance
|
Posted - 2010.08.20 20:21:00 -
[31]
Wow! Congrats: for a techless geek with only an amoebic brain cell, you have just made a very technical topic understandable. Thanks!
|

Mynas Atoch
Eternity INC. -Mostly Harmless-
|
Posted - 2010.08.20 20:23:00 -
[32]
This is probably the best devblog of the bunch released this week. It's focused, informative, and positive; it describes the historical design, the improvement, and the success of the implementation. It even has evidence of before and after. I'll take one of these instead of four hurf and blurf devblogs any day of the week.
You are doing what your peers are not. Giving us a peek behind the curtain, not of vague promises and waffle, nor cool gee whiz features, nor even monologues wrapped up in post doctoral jargon, but of something we can understand and appreciate.
The rest of you please take note.
thanks
|

Zendoren
Aktaeon Industries
|
Posted - 2010.08.20 20:25:00 -
[33]
Edited by: Zendoren on 20/08/2010 20:26:16 Best blog thus far!
However, I would have liked a further explanation of how the server topology will be changing for TQ with the addition of these nodes. From what I remember, the original setup was Proxy Server -> Load balance server -> sol Server. How will this change with the addition of these nodes?
Also, I would have liked to see a little glimpse of CCP Soundwave's expectations of the potential performance increase from adding multi-processor support to the server-side code, coupled with these node changes.
|
|

CCP Atlas

|
Posted - 2010.08.20 20:33:00 -
[34]
Originally by: Ford Chicago CCP Explorer, did you attempt to normalize the comparison against differences by day of the week in order to accurately quantify the benefit of this change or are you just showing us raw data?
This is raw data but the 4 runs were quite similar in terms of a population and usage profile.
Originally by: Ford Chicago I also found it interesting that "up to 80% of the calls were routed elsewhere" (other than the Location node) but the cpu utilization of the Location node only dropped 5-15% points. This means that 20% of the calls are responsible for the majority of cpu usage.
Yes, that is a very good observation. This isn't giving us an 80% gain in terms of CPU since the calls that were routed elsewhere are much lighter than the ones that need to remain.
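A quick back-of-the-envelope model shows why moving 80% of the calls can still leave most of the CPU load behind. The cost figures here are invented purely for illustration (a "light" call costing 1 unit and a "heavy" one 30); they are not measurements from TQ.

```python
def cpu_after_offload(light_share, light_cost, heavy_cost, offloaded_fraction=1.0):
    """Fraction of the original call-handling CPU left after moving light calls away.

    light_share: fraction of all calls that are light (0..1)
    light_cost / heavy_cost: relative CPU cost per call (arbitrary units)
    offloaded_fraction: fraction of the light calls that were moved elsewhere
    """
    heavy_share = 1.0 - light_share
    before = light_share * light_cost + heavy_share * heavy_cost
    after = light_share * (1.0 - offloaded_fraction) * light_cost + heavy_share * heavy_cost
    return after / before


# Made-up example: 80% of calls are light and all of them are moved away.
remaining = cpu_after_offload(light_share=0.8, light_cost=1.0, heavy_cost=30.0)
# remaining is about 0.88, i.e. only about 12% of the CPU load went with those calls
```

With these assumed costs, 80% of the calls account for only about 12% of the work, which is the same shape as the result in the graphs even though the real per-call costs are unknown.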
Originally by: Ford Chicago CCP Explorer, can you go into more detail about which types of calls generate the most cpu utilization? Which types of calls have been handed to the Character nodes besides mail. What are the 5-6 calls made on a jump event that *don't* need to be handled by the location node?
Some examples of calls that now get routed to the character nodes are lookups of characters, corps and alliances (something that happens all the time when you see someone in your overview for example), certain show-info operations, sov info, some station info, etc, etc. It's all over the place, which is the reason it hasn't been structured properly up until now. Programmers have typically thought "I have this teeny tiny call, I'll just stick it on the location node".
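The split described above amounts to a routing decision per call. Here is a minimal sketch, assuming a fixed set of service names and a simple modulo spread over the character nodes; the names in `CHARACTER_NODE_SERVICES`, the node labels and the modulo rule are all hypothetical and not how the cluster necessarily does it.

```python
# Hypothetical service names loosely based on the examples in the post.
CHARACTER_NODE_SERVICES = {
    "char_lookup",
    "corp_lookup",
    "alliance_lookup",
    "show_info",
    "sov_info",
    "mail",
}


def pick_node(service, solar_system, char_id, location_nodes, character_nodes):
    """Return the node that should serve this call.

    location_nodes: dict mapping a solar system name to its location node
    character_nodes: list of character node names
    """
    if service in CHARACTER_NODE_SERVICES:
        # Location-independent call: spread by character over the character nodes.
        return character_nodes[char_id % len(character_nodes)]
    # Everything else stays with the node that owns the solar system.
    return location_nodes[solar_system]
```

For example, with `location_nodes = {"Jita": "loc-node-7"}` and three character nodes, a `char_lookup` for character 4 lands on the second character node, while an inventory move in Jita still goes to `loc-node-7`.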
Originally by: Ford Chicago I found this to be one of the more interesting of the recent dev blogs, but even so, all it really says is that some things that used to be handled by the Location node are now handled elsewhere. As a programmer I suspect my interest is on the more technical side than the average player, but I'm frustrated with the recent "dev blogs" that seem more like marketing material.
Thanks. :) Like I mentioned, it's just a bunch of little things, most of them very light calls but they add up to a big bunch of traffic.
|
|

Zendoren
Aktaeon Industries
|
Posted - 2010.08.20 20:33:00 -
[35]
Originally by: Alain Kinsella Good read, was kinda what I expected when the last patch notes came out (we implement something similar at work).
Only question I have on this post: Is the EveMail system going to be placed on its own node set? I'm always surprised that my character (who gets maybe 3-4 evemails a week) takes nearly a minute to load the screen at startup.
[And on this note, can you also explain why it's so much quicker to access EveMails through EveGate, in an OOG browser like Firefox? My assumption here is that the character node for Mail is already implemented, but only being called directly by the EveGate/Web side, not by the Client.]
Try clearing your mail cache within the Esc Option menu (on last tab).
|
|

CCP Atlas

|
Posted - 2010.08.20 20:41:00 -
[36]
Originally by: James Bryant Hey guys,
Fantastic dev blog. Certainly answers a whole slew of architectural questions that have been lingering in my head for some time.
My question is in regards to location node load balancing. The very fact that fleet fight requests are necessary seems to indicate a lack (or possibly not enough) automated load balancing of location nodes.
No doubt this is not a new idea to you all, so I am wondering what the difficulties are in implementing the capability to offload light traffic location nodes to underutilized CPUs when heavy traffic nodes start to throttle the CPU. Is there not a way to manage the connection while the handoff is being made? Or is it more an issue of detection and implementing the proper hysteresis in the algorithm (so that nodes don't start swapping around CPUs needlessly)?
Indeed. What is on our roadmap is in fact to allow for non-destructive (i.e. not kicking everyone out) live remapping of solar systems. I don't know when this will be a reality, but it's definitely something that we are very interested in.
With such a system in place when a solar system you're in gets too loaded to play nice with the other solar systems the load balancer would kick in automatically and you would just pause for a bit and then continue as if nothing had happened on a spiffy new node.
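The hysteresis question from the quoted post can be made concrete with a small sketch of the trigger logic such a load balancer might use. Nothing here comes from CCP: the `RemapDecider` class, the 85%/60% thresholds and the three-tick rule are assumptions chosen only to show the idea of not flapping between nodes.

```python
class RemapDecider:
    """Decide when a solar system should be moved, with hysteresis.

    A remap fires only after the load has stayed above `high` for
    `sustain_ticks` consecutive samples. After firing, the decider stays
    quiet until the load has dropped below `low`, so a system hovering
    around the threshold is not remapped over and over.
    """

    def __init__(self, high=0.85, low=0.60, sustain_ticks=3):
        self.high = high
        self.low = low
        self.sustain_ticks = sustain_ticks
        self._hot_ticks = 0
        self._armed = True

    def observe(self, cpu_load):
        """Feed one CPU load sample (0..1); return True if a remap should start."""
        if not self._armed:
            if cpu_load < self.low:
                self._armed = True  # load has calmed down: allow future remaps
                self._hot_ticks = 0
            return False
        if cpu_load > self.high:
            self._hot_ticks += 1
        else:
            self._hot_ticks = 0  # a single cool sample resets the streak
        if self._hot_ticks >= self.sustain_ticks:
            self._armed = False
            self._hot_ticks = 0
            return True
        return False
```

Feeding it the samples 0.9, 0.5, 0.9, 0.9, 0.9 fires on the fifth sample only; further hot samples are ignored until one falls below 0.60. The hard part in practice is the handoff itself, which this sketch does not attempt.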
|
|
|

CCP Atlas

|
Posted - 2010.08.20 20:50:00 -
[37]
Originally by: Zendoren Edited by: Zendoren on 20/08/2010 20:26:16 Best blog thus far!
However, I would have liked a further explanation of how the server topology will be changing for TQ with the addition of these nodes. From what I remember, the original setup was Proxy Server -> Load balance server -> sol Server. How will this change with the addition of these nodes?
Also, I would have liked to see a little glimpse of CCP Soundwave's expectations of the potential performance increase from adding multi-processor support to the server-side code, coupled with these node changes.
This does not change the topology of the cluster at all, and is a perfect fit for its existing layout. From the client's point of view the network is:
Client -> Proxy -> Sol -> SQL Server
The 'Sol' tier can be any node in the cluster, while the rest of the layers are exactly 1 for each client. For the sol nodes, as you saw in Figure 1 in the blog, you maintain a virtual connection to several sols at a time depending on the request context. It's all transparent to the application logic and pretty nifty and easy to work with. We do need to place certain restrictions on game design in order to maintain this schema, but it's the architecture that Eve was founded upon.
(I'm not mentioning above that there is a hardware load balancer in front of the proxy tier which picks a proxy for you when you connect since that will just confuse the layout)
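One way to picture the "one proxy, several virtual sol connections" layout is the sketch below. It is a toy model only: `ProxySession`, the `(kind, key)` context tuples and `example_resolver` are hypothetical names, and the real proxy tier surely does far more than keep a dictionary.

```python
class ProxySession:
    """One client's single proxy connection, fanning out to several sol nodes."""

    def __init__(self, resolver):
        self._resolver = resolver  # maps a request context to a sol node name
        self._virtual = {}         # context -> sol node currently bound to it

    def send(self, context, payload):
        """Route a request by its context, binding a virtual connection on first use."""
        node = self._virtual.get(context)
        if node is None:
            node = self._resolver(context)
            self._virtual[context] = node
        return node, payload

    def open_connections(self):
        """Return a copy of the context -> node bindings held for this client."""
        return dict(self._virtual)


def example_resolver(context):
    """Hypothetical resolver: pretend each context kind has its own node pool."""
    kind, key = context
    return f"{kind}-node-for-{key}"
```

A client that sends `("location", "Jita")` and `("character", 42)` requests through the same session ends up with two bindings, yet from its side there is still exactly one connection, to the proxy.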
|
|

Herschel Yamamoto
Agent-Orange Nabaal Syndicate
|
Posted - 2010.08.20 21:05:00 -
[38]
Those are some very impressive graphs you've got there. A few questions. What impact will this have on Jita - what will the new pop cap be? How does this seem to be affecting the jump-in lag that has plagued fleet fighting in recent months? And how will this affect lag in contexts other than people jumping into systems - does it speed things up for people who are in system doing things, or just on system load?
And thanks for a great week of dev blogs, all involved. I even understood like 2/3 of it. === "The data does not support that polished quality sells better than new features" "Once Incarna and Dust are fully implemented, focus will probably shift far more towards improvement" CCP, FTW? |
|

CCP Atlas

|
Posted - 2010.08.20 21:16:00 -
[39]
Originally by: Liang Nuren Edited by: Liang Nuren on 20/08/2010 18:36:26 Awesome dev blog - this should really help a lot. It sounds like you guys are really doing a fantastic job, and I think you're all awesome.
For my own curiosity though:
- Is the bottleneck in the database (finding/updating rows) or in the processing of individual requests (like loading/manipulating objects)? It seems like if it's the second, then this is really an awesome way to handle it.
- If it's the second, is there a single character database or did you distribute characters onto different databases? If you distributed them, is it difficult to move characters between databases for load balancing purposes?
- If you distributed it, is there an archival character database for offline/inactive characters, and perhaps a series of smaller character node databases for logged in characters which replicate to the master db?
Well, I could talk shop all day, and I probably shouldn't. But I do have a more serious question - it seems to me that the "Jita Inventory System" shouldn't be required to dump someone's stuff in a station. It seems like the interactions that can be had by docked people are limited to trade windows and local chat - neither of which I can imagine being handled by the location node. It seems like it's a perfect place to further distribute. Is this an improvement you guys are planning on making or are there things I don't know about?
I got money on the second, personally.
Also: sorry for the armchair development. A very well written blog that tangentially touches on my area of expertise.
-Liang
Ed: Also, I thought I saw an email on python-dev a couple months back where Guido accepted someone's method of getting rid of the GIL.
We only have a single database; it's easier to scale that up than the sol nodes, and we're already ahead of the curve in terms of what the DB can deliver. We do cache very aggressively on the server, though, and consolidating these character node calls onto half a dozen nodes rather than servicing them throughout the cluster does remove a bit of the DB load since we get more cache hits. But like I said, the DB is not a big issue in this regard today. What this particular change saves us mostly is having to process relatively light and simple calls on a given node.
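The cache-hit effect of consolidation is easy to see in a toy simulation. The model below is an assumption-heavy sketch: each node has an unbounded cache with no expiry, requests pick a node and a key uniformly at random, and the node counts (6 versus 200) are only loosely inspired by the figures mentioned in this thread.

```python
import random


def hit_rate(num_nodes, num_requests=20000, num_keys=500, seed=1):
    """Simulated cache hit rate when requests are spread over `num_nodes` caches."""
    rng = random.Random(seed)
    caches = [set() for _ in range(num_nodes)]
    hits = 0
    for _ in range(num_requests):
        key = rng.randrange(num_keys)             # e.g. a character or corp ID
        cache = caches[rng.randrange(num_nodes)]  # node that happens to serve it
        if key in cache:
            hits += 1
        else:
            cache.add(key)  # a miss means a DB read, then the value is cached
    return hits / num_requests


consolidated = hit_rate(num_nodes=6)    # a handful of dedicated nodes
scattered = hit_rate(num_nodes=200)     # the same traffic spread cluster-wide
```

With the same traffic, the six-node case misses at most once per key per node, so its hit rate is high, while the 200-node case keeps re-reading the same keys on different nodes. Real caches have size limits and invalidation, so treat this only as a direction-of-effect illustration.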
The inventory system is what lies at the heart of Jita's cpu cycles and it's really just a glorified DB cache. Moving items about and interacting with them causes a cascade of all sorts of events that must be handled by the game systems on that node. Therefore it's not really feasible to offload parts of those operations elsewhere.
Market hubs like Jita have the potential for load balancing stations separately from the solar system and other stations. That is something we are currently investigating as a possible 'end-all' fix for Jita. There is a fair bit of game design involved, however, and I'm not making any promises. :-)
Interesting tidbit about Guido-and-the-GIL. I need to google it.
|
|

Zendoren
Aktaeon Industries
|
Posted - 2010.08.20 21:23:00 -
[40]
Edited by: Zendoren on 20/08/2010 21:23:59
Originally by: CCP Atlas
Originally by: Zendoren Edited by: Zendoren on 20/08/2010 20:26:16 Best blog thus far!
However, I would have liked a further explanation of how the server topology will be changing for TQ with the addition of these nodes. From what I remember, the original setup was Proxy Server -> Load balance server -> sol Server. How will this change with the addition of these nodes?
Also, I would have liked to see a little glimpse of CCP Soundwave's expectations of the potential performance increase from adding multi-processor support to the server-side code, coupled with these node changes.
This does not change the topology of the cluster at all, and is a perfect fit for its existing layout. From the client's point of view the network is:
Client -> Proxy -> Sol -> SQL Server
The 'Sol' tier can be any node in the cluster, while the rest of the layers are exactly 1 for each client. For the sol nodes, as you saw in Figure 1 in the blog, you maintain a virtual connection to several sols at a time depending on the request context. It's all transparent to the application logic and pretty nifty and easy to work with. We do need to place certain restrictions on game design in order to maintain this schema, but it's the architecture that Eve was founded upon.
(I'm not mentioning above that there is a hardware load balancer in front of the proxy tier which picks a proxy for you when you connect since that will just confuse the layout)
Sorry I confused you for Soundwave, Atlas
|
|

CCP Atlas

|
Posted - 2010.08.20 21:24:00 -
[41]
Originally by: Herschel Yamamoto Those are some very impressive graphs you've got there. A few questions. What impact will this have on Jita - what will the new pop cap be? How does this seem to be affecting the jump-in lag that has plagued fleet fighting in recent months? And how will this affect lag in contexts other than people jumping into systems - does it speed things up for people who are in system doing things, or just on system load?
And thanks for a great week of dev blogs, all involved. I even understood like 2/3 of it.
This change isn't going to multiply the number of people we can cram into Jita, but I'm hopeful that it will give us 10-20% yield in population. We are taking it slow in Jita and have the population cap set at 1500 now; we will increase it once we see Jita handling that well. We would rather see a lag-free Jita at 1500 than a laggy one at 1800.
This will have a positive impact on the jump-in lag for fleets, since many of the calls that slow down the jumping are now serviced immediately elsewhere, leaving the location node free to do the important bits. This isn't a fix for jump-in lag though. We have some hopeful actual fixes (serious mitigation anyway) in the pipeline for the immediate future. More blogs on that soon.
This sort of change will speed up utility functions that don't impact your solar system directly. Your client should seem a bit 'spiffier' when talking to the server. You won't see an fps increase but you don't have to wait as long for things like loading up the map, right clicking on other players and things of that nature. There is also a bit less for the location node to do so it has more buffer for the pew-pew.
|
|

Manfred Rickenbocker
Pan Galactic Gargle Blasters Important Internet Spaceship League
|
Posted - 2010.08.20 21:38:00 -
[42]
Edited by: Manfred Rickenbocker on 20/08/2010 21:40:38 I notice y'all don't have "Station" nodes. Is there a reason you can't break station traffic (such as docked pilots list, fittings, inventory, industry, etc etc) onto a separate node from the people pew-pewing outside in their very important internet spaceships? I figure if you did that, you might be able to help split your traffic between those who are station spinning and those zooming around in space. ------------------------ Peace through superior firepower: a guiding principle for uncertain times. |

Sered Woollahra
Gallente Independent Traders and Builders MPA
|
Posted - 2010.08.20 22:08:00 -
[43]
You know, the information contained in this series of blogs should be combined & edited into a comprehensive case study on MMO infrastructure scaling and performance troubleshooting. It would make a terrific read for anyone interested in high performance/availability environments. It may be better to wait for some concrete results though :-)
|

Blue Harrier
Gallente
|
Posted - 2010.08.20 22:17:00 -
[44]
Originally by: CCP Atlas
Snip -
This sort of change will speed up utility functions that don't impact your solar system directly. Your client should seem a bit 'spiffier' when talking to the server. You won't see an fps increase but you don't have to wait as long for things like loading up the map, right clicking on other players and things of that nature. There is also a bit less for the location node to do so it has more buffer for the pew-pew.
Funny you should say that but after the last patch I was talking to my son (he was in 0.0 and I was in Essence in Empire), and I asked him did he notice the client seemed 'snappier' to use and he replied that he was thinking the same and about to ask me.
So it looks like you are on the right track, keep up the good work and thanks for some great blogs that even I can understand (well some of it).
|

Cailais
Amarr THE ORDAINED
|
Posted - 2010.08.20 22:28:00 -
[45]
I hope you guys fix EVE soon. 90% of my buddies no longer log in and one's quit completely (I did get his stuff though ;) )
C.
the hydrostatic capsule blog
|

TornSoul
BIG Majesta Empire
|
Posted - 2010.08.20 22:37:00 -
[46]
Quote:
We have multiple market regions living on a single node and currently four nodes servicing all the market regions. If the load on the market increases we can just increase the number of nodes dedicated to that task and decrease the number of markets on a given node.
When exactly did this happen???
I recall from far back (years) that that was one of the holy grails you were working on. It was my impression (not announced? or me not catching it?) that this hadn't been achieved yet.
Reading the blog it comes off as if this has been in place some time (how I read it anyhow). Is this correct - or is it in fact part of the described change(s) - i.e. a recent thing?
/confused...
---
Oh and - Awesome, awesome (series of) blog(s)
---
And seeing Oveur active on the forums again, and even torfi, really does wonders for the "karma bank account". Please keep it up guys.
BIG Lottery |

Camios
Minmatar Insurgent New Eden Tribe
|
Posted - 2010.08.20 22:39:00 -
[47]
Excellent. What are the next services you are going to "delocalize"? I read in a DevBlog some time ago that typing in local could reduce performance in fleet fights. Does it mean that chats run on the location nodes?
|

Xianthar
STK Scientific The Initiative.
|
Posted - 2010.08.20 22:52:00 -
[48]
Originally by: Liang Nuren
Ed: Also, I thought I saw an email on python-dev a couple months back where Guido accepted someone's method of getting rid of the GIL.
Be nice if it were looked at again. I know there was a patch back around 1.5ish that removed the GIL and implemented fine-grained locking, which lost support because performance was much worse on single core systems and didn't start to shine till 3+ cores. But that was ~10 years ago, prior to quad cores being the norm and 6-8 core CPUs + hyperthreading virtual cores being the performance segment. Perhaps the trade off makes much more sense now.
Then again, with the recent change to python 2.7 from 2.5 CCP picked up the multiprocessing package that was added in 2.6; maybe that is what they plan to use for spreading out node work loads.
|
|

CCP Explorer

|
Posted - 2010.08.20 22:53:00 -
[49]
Originally by: TornSoul
Quote: We have multiple market regions living on a single node and currently four nodes servicing all the market regions. If the load on the market increases we can just increase the number of nodes dedicated to that task and decrease the number of markets on a given node.
When exactly did this happen???
I recall from far back (years) that that was one of the holy grails you were working on. It was my impression (not announced? or me not catching it?) that this hadn't been achieved yet.
Reading the blog it comes off as if this has been in place some time (how I read it anyhow). Is this correct - or is it in fact part of the described change(s) - i.e. a recent thing?
The market has been run on its own set of nodes for years.
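The scaling knob from the quoted blog passage (more nodes dedicated to the market means fewer market regions per node) can be shown with a tiny sketch. The round-robin assignment and the twelve placeholder regions are assumptions for the example only; the blog does not say how regions are actually distributed.

```python
def assign_regions(regions, node_count):
    """Spread market regions over node_count nodes, round-robin."""
    nodes = [[] for _ in range(node_count)]
    for i, region in enumerate(regions):
        nodes[i % node_count].append(region)
    return nodes


# Twelve placeholder regions, not the real region list.
regions = ["region-%d" % n for n in range(1, 13)]

four = assign_regions(regions, 4)
six = assign_regions(regions, 6)

print(max(len(node) for node in four))  # 3 regions on the busiest node
print(max(len(node) for node in six))   # 2 regions on the busiest node
```

Adding two nodes drops the per-node share from three regions to two without touching the regions themselves, which is the whole appeal of keeping the market off the location nodes.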
Erlendur S. Thorsteinsson Software Director EVE Online, CCP Games |
|

TornSoul
BIG Majesta Empire
|
Posted - 2010.08.20 23:05:00 -
[50]
Originally by: CCP Explorer The market has been run on its own set of nodes for years.
Thanks for a quick! answer.
---
But dang-nabbit.. Now I have to go find that post I made a couple of weeks ago where I made a smart remark about this not having happened yet.
Do send me an EVE mail next time 
BIG Lottery |
|

CCP Atlas

|
Posted - 2010.08.20 23:09:00 -
[51]
Originally by: Camios Edited by: Camios on 20/08/2010 22:52:22 Excellent. What are the next services you are going to "delocalize"? I read in a DevBlog some time ago that typing in local could reduce performance in fleet fights. Does it mean that chats run on the location nodes?
The local chat channel runs off your location node, and in the current chat architecture that's where it needs to be since the location node is the only node that knows what people are in the solar system.
There is not a massive amount of work done in the chat channel though. Typing in local doesn't impact the server much but it does play a role in whether your client recovers or not when your session is hurting.
|
|

Jim Luc
Caldari Rule of Five Lucky Starbase Syndicate
|
Posted - 2010.08.20 23:15:00 -
[52]
Originally by: CCP Atlas
Originally by: Camios Edited by: Camios on 20/08/2010 22:52:22 Excellent. What are the next services you are going to "delocalize"? I read in a DevBlog some time ago that typing in local could reduce performance in fleet fights. Does it mean that chats run on the location nodes?
The local chat channel runs off your location node, and in the current chat architecture that's where it needs to be since the location node is the only node that knows what people are in the solar system.
There is not a massive amount of work done in the chat channel though. Typing in local doesn't impact the server much but it does play a role in whether your client recovers or not when your session is hurting.
What is the possibility of using a proxy node for these types of things? For instance, the location node is needed, yes, but could the location node dispatch events for when someone enters, and leaves the system - but the chat and client will be running on a completely separate node, listening for any change in activity. It seems to me that eliminating the chat from a node will drastically reduce load.
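The idea in the post above (location node only announces who enters and leaves, a separate chat node keeps its own member list) would look something like the sketch below. To be clear, this is the proposal, not how EVE works today: Atlas says the current chat architecture needs local on the location node. `LocationNode`, `ChatNode` and their methods are invented names.

```python
class LocationNode:
    """Owns the solar system; only announces enter/leave events."""

    def __init__(self):
        self.listeners = []
        self.pilots = set()

    def subscribe(self, callback):
        self.listeners.append(callback)

    def enter(self, pilot):
        self.pilots.add(pilot)
        self._dispatch("enter", pilot)

    def leave(self, pilot):
        self.pilots.discard(pilot)
        self._dispatch("leave", pilot)

    def _dispatch(self, event, pilot):
        for callback in self.listeners:
            callback(event, pilot)


class ChatNode:
    """Keeps its own copy of who is in local, fed by the events."""

    def __init__(self):
        self.members = set()
        self.log = []

    def on_event(self, event, pilot):
        if event == "enter":
            self.members.add(pilot)
        else:
            self.members.discard(pilot)

    def say(self, pilot, text):
        if pilot not in self.members:
            raise ValueError("pilot is not in local")
        self.log.append((pilot, text))
        return sorted(self.members)  # who should receive the line


location = LocationNode()
chat = ChatNode()
location.subscribe(chat.on_event)

location.enter("Alice")
location.enter("Bob")
recipients = chat.say("Alice", "o7")
location.leave("Bob")
print(recipients, chat.members)
```

Note the cost hiding in the sketch: the chat node now holds a second copy of the member list, and every jump in or out becomes a message between nodes.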
|

Mashie Saldana
BFG Tech
|
Posted - 2010.08.20 23:27:00 -
[53]
With these new nodes, would it be possible to have dedicated AI nodes to bring Sleeper AI to all NPCs in EVE?
18 months |

Alain Kinsella
Minmatar
|
Posted - 2010.08.20 23:49:00 -
[54]
Originally by: CCP Atlas
Indeed. What is on our roadmap is in fact to allow for non-destructive (e.g. not kick everyone out) live remapping of solar systems. I don't know when this will be a reality but it's definitely something that we are very interested in.
With such a system in place when a solar system you're in gets too loaded to play nice with the other solar systems the load balancer would kick in automatically and you would just pause for a bit and then continue as if nothing had happened on a spiffy new node.
*ears perk up*
That sounds a lot like VMWare VMotion, or the similar 'live migrate' feature in Solaris LDOM. [Not sure if Hyper-V has something similar.]
I keep getting this vision of the Sol node as a Stackless interpreter setup to run as a mini-VM. 
As for Mail, cache clearing may have fixed it. Still checking through it though (my other guy is on one of the Bulk lists, that should be a reasonable test). Will bug report if I can repro consistently (I was able to before, will see now).
|

The Paperwork
|
Posted - 2010.08.21 00:13:00 -
[55]
Quote: The local chat channel runs off your location node, and in the current chat architecture that's where it needs to be since the location node is the only node that knows what people are in the solar system.
So... and I'm just spitballin' here... what about letting "local" die in a lag fire, and just having big fleet fight grid nodes?
|

Ephemeral Waves
Silver Snake Enterprise
|
Posted - 2010.08.21 00:53:00 -
[56]
Quote: ...support fleet fights of a scale that far exceeds anything you've seen before...
We'd be happy if it would support fleet fights of a scale that we've ALREADY seen before rather than the current screwed up situation.
|

Frug
Omega Wing
|
Posted - 2010.08.21 00:59:00 -
[57]
Dude. A bunch of beige towers leading into an original imac?
No wonder there's lag issues.
- - - - - - - - - Do not use dotted lines - - - - - - If you think I'm awesome say BOOO BOOO!! - Ductoris Neat look what I found - Kreul Whisper/PrismX 4 emperor |

Jim Luc
Caldari Rule of Five Lucky Starbase Syndicate
|
Posted - 2010.08.21 01:06:00 -
[58]
Originally by: Frug Dude. A bunch of beige towers leading into an original imac?
No wonder there's lag issues.
I LOL'd 
|
|

CCP Atlas

|
Posted - 2010.08.21 01:11:00 -
[59]
Originally by: Mashie Saldana With these new nodes, would it be possible to have dedicated AI nodes to bring Sleeper AI to all NPCs in EVE?
Actually, the NPC AI is a perfect example of a system that needs to live on the location node. However, there are not any outstanding scalability issues with that.
There are no technical reasons Sleeper AI or something akin to that isn't on more or all NPC's, it's a game mechanical / balancing issue which is outside my expertise... 
|
|
|

CCP Atlas

|
Posted - 2010.08.21 01:12:00 -
[60]
Originally by: Jim Luc
Originally by: Frug Dude. A bunch of beige towers leading into an original imac?
No wonder there's lag issues.
I LOL'd 
I was wondering when someone would comment on that 
|
|

Yuki Kulotsuki
|
Posted - 2010.08.21 01:19:00 -
[61]
Edited by: Yuki Kulotsuki on 21/08/2010 01:19:37
Originally by: CCP Atlas freeing Jita up for important things like ... scams in local
I may just have a new signature.
Originally by: CCP Lemur THIS IS GOD: ... IF YOU HAVE ANY MORE REQUESTS I'M AVAILABLE SUNDAY FROM 10:30 TO 12:00 TO RECEIVE YOUR PRAYERS.
|

Hienz Doofenshmirtz
|
Posted - 2010.08.21 02:33:00 -
[62]
does this make 5 dev blogs in 5 days. stop writing blogs and get back to coding
just kidding, thanks for keeping us in the loop, and writing dev blogs that almost no one will really understand. keep on trucking and keeping us informed, if you need a ghost writer for your blogs so you can keep coding, I'm looking for a new job.
|

ghosttr
Amarr Muppet Factory
|
Posted - 2010.08.21 02:42:00 -
[63]
I have a question regarding location stuffs
When you are in a system does the node calculate all of the distances from you -> another object. Or does it just give all clients in system a 'broadcast' of objects?
Also are static objects (planets, gates) in the target system loaded (from the client) upon jumping, or do they have to be transmitted by the server as well?
Prospecting! |

dakin
Minmatar Starfish Operating Syndicate Annwn Matari
|
Posted - 2010.08.21 04:25:00 -
[64]
Quote: Even though the metrics that we have gathered are for the Jita node, this change will have a positive effect on all loaded nodes in the cluster--with Jita and Fleet Fight nodes benefiting the most. The reason I use Jita here is that it has a very predictable load pattern whereas fleet fights are anything but. However, the same principles apply. Before this change you would be making something like 5-10 server calls to your location node to finish jumping, each one of these calls could take a long time to complete. Now you'll be making something like 4, with the rest returning very quickly.
We're hoping you'll be able to tell the difference the next time you decide to invade your nearest friendly neighbor.
Was this applied to Singularity? Because after today's mass test, I can say there was no improvement on jumping.
|

Tres Farmer
Gallente Federation Intelligence Service
|
Posted - 2010.08.21 04:28:00 -
[65]
Originally by: CCP Atlas This sort of change will speed up utility functions that don't impact your solar system directly. Your client should seem a bit 'spiffier' when talking to the server. You won't see an fps increase but you don't have to wait as long for things like loading up the map, right clicking on other players and things of that nature. There is also a bit less for the location node to do so it has more buffer for the pew-pew.
Yeah.. map is way more speedy than it was - VERY GOOD work there. If the mail just could handle as well?! I just cleared the cache of my mail and it seems a bit faster now.. why isn't this 'clear the cache' an automated process when closing down the client?
On a more related note.. what comprises the services the location node is handling actually? Would you be able to give us a list of stuff this node handles pretty please?
And yeah.. best blog of the lot so far!
|

Tres Farmer
Gallente Federation Intelligence Service
|
Posted - 2010.08.21 04:39:00 -
[66]
Originally by: CCP Atlas
Originally by: Mashie Saldana With these new nodes, would it be possible to have dedicated AI nodes to bring Sleeper AI to all NPCs in EVE?
Actually, the NPC AI is a perfect example of a system that needs to live on the location node. However, there are not any outstanding scalability issues with that.
There are no technical reasons Sleeper AI or something akin to that isn't on more or all NPC's, it's a game mechanical / balancing issue which is outside my expertise... 
Why does the NPC AI need to live on the location node? What services do (and need) to live on the location node anyways.. hm, I'm repeating myself here.. 
|

Siiee
Recycled Heroes
|
Posted - 2010.08.21 05:15:00 -
[67]
Originally by: Tres Farmer Why does the NPC AI need to live on the location node? What services do (and need) to live on the location node anyways.. hm, I'm repeating myself here.. 
If I'm understanding it right you can think of the location node as the "local" node. You can only interact (pew pew) with NPCs that are on grid with you at your location (and the grid is attached to your solar system instance), and thus it has to be on the location node.
|

Tres Farmer
Gallente Federation Intelligence Service
|
Posted - 2010.08.21 05:31:00 -
[68]
Edited by: Tres Farmer on 21/08/2010 05:34:53
Originally by: Siiee
Originally by: Tres Farmer Why does the NPC AI need to live on the location node? What services do (and need) to live on the location node anyways.. hm, I'm repeating myself here.. 
If I'm understanding it right you can think of the location node as the "local" node. You can only interact (pew pew) with NPCs that are on grid with you at your location (and the grid is attached to your solar system instance), and thus it has to be on the location node.
I can interact (pew pew) with other players who are NOT running their thought-processes on that local node. The reason we got so dumb NPC might be just because the sleeper AI uses x-times more cpu and is only bearable in w-space where you don't got so many PvE'ers as in k-space. You see where I'm going with this?! Let's say CCP manages to 'outsource' NPC AI onto other node(s). Want more intelligent NPC? Just install more NPC nodes..
I can't argue on facts here as my knowledge of the server and the processes are so tiny. So any insight from Atlas is highly appreciated.
So again.. a table with services running on the local node and their load (might even need diff cases listed like Jita/PvE/PvP) would be needed for us all to look at..
|

Nareg Maxence
Gallente
|
Posted - 2010.08.21 05:34:00 -
[69]
Originally by: Tres Farmer Why does the NPC AI need to live on the location node? What services do (and need) to live on the location node anyways.. hm, I'm repeating myself here.. 
The AI needs to control the NPCs which are part of the space simulation. It needs to decide whether the NPC flies up or down or left or right or shoots at you or warps away. Anything that's part of the space simulation needs to be on the location node.
I guess it is technically possible that you could treat NPCs as ships controlled by a computer player, just like we have our ships and we control them via the client. The question is what that would accomplish. In low load systems with people just ratting or mining or whatever, there wouldn't be much point in trying to offload the probably fairly low contribution to load that the AI accounts for. In high load systems where there are fleet fights or systems like Jita, people aren't interacting with the AI, so it's idle anyway.
The only situation where I could see a possible benefit would be in mission hubs, but then you will have to replace the direct interaction of the AI with a message passing system to get the commands from the AI node to the location node and back. Thinking of the amount of NPCs that spawn and despawn all the time, it seems to me that this would increase network traffic on the node by an unacceptable amount, and would there then be a benefit at all?
|

Tres Farmer
Gallente Federation Intelligence Service
|
Posted - 2010.08.21 05:45:00 -
[70]
Edited by: Tres Farmer on 21/08/2010 05:45:39
Originally by: Nareg Maxence
Originally by: Tres Farmer Why does the NPC AI need to live on the location node? What services do (and need) to live on the location node anyways.. hm, I'm repeating myself here.. 
The AI needs to control the NPCs which are part of the space simulation. It needs to decide whether the NPC flies up or down or left or right or shoots at you or warps away. Anything that's part of the space simulation needs to be on the location node.
I guess it is technically possible that you could treat NPCs as ships controlled by a computer player, just like we have our ships and we control them via the client. The question is what that would accomplish. In low load systems with people just ratting or mining or whatever, there wouldn't be much point in trying to offload the probably fairly low contribution to load that the AI accounts for.
Low load is never the reason to something, that's right.
Originally by: Nareg Maxence In high load systems where there are fleet fights or systems like Jita, people aren't interacting with the AI, so it's idle anyway.
Well it doesn't need to be idle then, does it? At the moment PvE in high load systems isn't worth it because of the problems it causes.. What was there first - the egg or the chuck?
Btw.. we don't have orbiting planets/moons/stations because of the load they cause on the server.. ever thought of a way around this? Have an Orbital-Bodies-Node and let it do the calcs needed for all the stuff that's supposed to be moving. If a player enters the system his client talks to this node and gets up to date information about all the orbiting bodies and their status. We should have orbiting planets. No?
Originally by: Nareg Maxence The only situation where I could see a possible benefit, would be in mission hubs, but then you will have to replace the direct interaction of the AI with a message passing system to get the commands from the AI node to the location node and back. Thinking of the amount of NPCs that spawn and despawn all the time, it seems to me that this would increase network traffic on the node by an unacceptable amount, and would there then be a benefit at all?
I have no fricking clue because I have no idea about the services running on the location node nor the load each of the services cause under all those cases.. do you?
|

Siiee
Recycled Heroes
|
Posted - 2010.08.21 05:54:00 -
[71]
Originally by: Tres Farmer I can interact (pew pew) with other players who are NOT running their thought-processes on that local node.
The "thought process" is irrelevant (and absolutely trivial, even Sleeper AI is still fairly basic from my experience with it, only barely more "intelligent" and more random).
Normal NPC AI tends to boil down to something like this (bolded the important part) -- if(ships_on_grid) shootAt(first_ship_on_grid) --
selecting a target from a list, not difficult, but that information of the first ship on the grid is only available in the solar system simulation, and so has to occur on the location node. If you were to run NPC AI on a separate node it would need to have all of that grid information available from the location node, and you'd end up having to maintain two separate copies of the same information, lots of additional overhead, not worth it.
They would be better off working on breaking the location node away from a single Sol (think a grid-sized location) which would accomplish a lot more in the grand scheme of things.
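The one-liner in the post above, written out as runnable Python, together with a counter for the extra traffic a separate AI node would need. This is a sketch of the argument only: `npc_tick`, `Grid` and `sync_messages` are invented for the example and the real NPC logic is not shown anywhere in this thread.

```python
def npc_tick(ships_on_grid):
    """The quoted rule: shoot the first ship on grid, otherwise idle."""
    if ships_on_grid:
        return ("shoot", ships_on_grid[0])
    return ("idle", None)


class Grid:
    """Hypothetical grid state owned by the location node."""

    def __init__(self):
        self.ships = []
        self.sync_messages = 0  # updates a remote AI node would need

    def add_ship(self, name):
        self.ships.append(name)
        self.sync_messages += 1  # the mirror copy must hear about it

    def remove_ship(self, name):
        self.ships.remove(name)
        self.sync_messages += 1


grid = Grid()
grid.add_ship("Rifter")
grid.add_ship("Drake")
print(npc_tick(grid.ships))   # ('shoot', 'Rifter')
grid.remove_ship("Rifter")
print(npc_tick(grid.ships))   # ('shoot', 'Drake')
print(grid.sync_messages)     # 3
```

On the location node the decision is a list lookup. On a separate node, every change to the grid has to be shipped across first, which is the "two copies of the same information" overhead the post is talking about.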
|

Tres Farmer
Gallente Federation Intelligence Service
|
Posted - 2010.08.21 06:14:00 -
[72]
Originally by: Siiee
Originally by: Tres Farmer I can interact (pew pew) with other players who are NOT running their thought-processes on that local node.
The "thought process" is irrelevant (and absolutely trivial, even Sleeper AI is still fairly basic from my experience with it, only barely more "intelligent" and more random).
Normal NPC AI tends to boil down to something like this (bolded the important part) -- if(ships_on_grid) shootAt(first_ship_on_grid) --
Well, who here would like more intelligent NPC? Ok, who here does have an idea or know if this is possible with the current implementation of things.. like NPC AI on the location node?
Originally by: Siiee selecting a target from a list, not difficult, but that information of the first ship on the grid is only available in the solar system simulation, and so has to occur on the location node. If you were to run NPC AI on a separate node it would need to have all of that grid information available from the location node, and you'd end up having to maintain two separate copies of the same information, lots of additional overhead, not worth it.
How much information is that? The position of non-NPC ships, aka players on the local grid? Wow.. that's really a big chunk of information there..
Look.. the space-simulation-service does all the calcs for physics, the weapons, the movements.. all that buzz.. NPC AI doesn't need access to that data. Players can't get access to that data either for others on the same grid. Why should NPC need that?
Originally by: Siiee They would be better off working on breaking the location node away from a single Sol (think a grid-sized location) which would accomplish a lot more in the grand scheme of things.
and how do you streamline things? Do grids with POS or Custom Office need a NPC AI service idling around?
|

Jita Dancer
|
Posted - 2010.08.21 09:58:00 -
[73]
"Indeed. What is on our roadmap is in fact to allow for non-destructive (e.g. not kick everyone out) live remapping of solar systems. I don't know when this will be a reality but it's definitely something that we are very interested in.
With such a system in place when a solar system you're in gets too loaded to play nice with the other solar systems the load balancer would kick in automatically and you would just pause for a bit and then continue as if nothing had happened on a spiffy new node."
Why dont you go the other way? When a node gets busy, it remaps ALL THE NON BUSY SYSTEMS somewhere else? Perhaps not as beneficial as a busy node moving itself, but by definition - the non-busy nodes are either empty or have a very small number of players attached and will (more) likely be able to successfully hand those players off onto a different node, leaving the bulk of players not exposed to a "high load handover"... Just a thought.
|

Dragonia Redtail
|
Posted - 2010.08.21 10:32:00 -
[74]
Can CCP please explain to me how we are going to know when and where a fleet fight takes place?
In case of killing a pos, yes its known what we attack and where. But ehm, a 250 man roaming gang is not easy to announce in advance.
Is there not an option to be found to get a node to assist the other node when it gets up to 90% load and starts throwing up?
After all, you know we all love this game, but the lag monster involved is our biggest target....
|

Mashie Saldana
BFG Tech
|
Posted - 2010.08.21 11:34:00 -
[75]
Originally by: CCP Atlas
Originally by: Mashie Saldana With these new nodes, would it be possible to have dedicated AI nodes to bring Sleeper AI to all NPCs in EVE?
Actually, the NPC AI is a perfect example of a system that needs to live on the location node. However, there are not any outstanding scalability issues with that.
There are no technical reasons Sleeper AI or something akin to that isn't on more or all NPC's, it's a game mechanical / balancing issue which is outside my expertise... 
That is interesting, so server load isn't a reason to keep the current (near nonexistent) NPC AI?
18 months |

Kolatha
|
Posted - 2010.08.21 11:44:00 -
[76]
Originally by: Jita Dancer
Why dont you go the other way? When a node gets busy, it remaps ALL THE NON BUSY SYSTEMS somewhere else? Perhaps not as beneficial as a busy node moving itself, but by definition - the non-busy nodes are either empty or have a very small number of players attached and will (more) likely be able to successfully hand those players off onto a different node, leaving the bulk of players not exposed to a "high load handover"... Just a thought.
I would think this would be the better option. When a system starts getting busy it means something is going down. If you move that busy system with all the participants ramping up their game you are just asking for trouble, unless you can do it smoothly and seamlessly. Moving the busy node also means you need to make sure you have sufficient idle nodes waiting just for this purpose.
On the other hand I can see how a number of large roving gangs could cause some pretty hefty internal bandwidth usage as they cause system after system to get hit with remapping.
|
|

CCP Atlas

|
Posted - 2010.08.21 11:50:00 -
[77]
Originally by: Kolatha
Originally by: Jita Dancer
Why dont you go the other way? When a node gets busy, it remaps ALL THE NON BUSY SYSTEMS somewhere else? Perhaps not as beneficial as a busy node moving itself, but by definition - the non-busy nodes are either empty or have a very small number of players attached and will (more) likely be able to successfully hand those players off onto a different node, leaving the bulk of players not exposed to a "high load handover"... Just a thought.
I would think this would be the better option. When a system starts getting busy it means something is going down. If you move that busy system with all the participants ramping up their game you are just asking for trouble, unless you can do it smoothly and seamlessly. Moving the busy node also means you need to make sure you have sufficient idle nodes waiting just for this purpose.
On the other hand I can see how a number of large roving gangs could cause some pretty hefty internal bandwidth usage as they cause system after system to get hit with remapping.
Yes, indeed. Today we do either, depending on the situation. Sometimes the system itself is moved and sometimes the other systems, leaving the fight alone.
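The "either, depending on the situation" choice can be illustrated with a toy rule: move whichever side disturbs fewer players. That rule is an assumption made for this sketch, as are the system names and player counts; the thread does not say how the real decision is made.

```python
def plan_remap(systems, busy):
    """Pick what to move off a loaded node.

    systems: dict of system name -> player count on this node
    busy:    name of the system causing the load
    Heuristic (an assumption): disturb as few players as possible.
    """
    others = {name: count for name, count in systems.items() if name != busy}
    moved_if_others = sum(others.values())
    moved_if_busy = systems[busy]
    if moved_if_others <= moved_if_busy:
        return ("move_others", sorted(others))
    return ("move_busy", [busy])


# A fight in one system, neighbours nearly empty: leave the fight alone.
fight = plan_remap({"Hub": 800, "Quiet-1": 30, "Quiet-2": 10}, "Hub")

# A small but heavy system sharing a node with crowded ones: move it.
small = plan_remap({"Hot": 50, "Busy-1": 300, "Busy-2": 200}, "Hot")

print(fight)
print(small)
```

A real balancer would also have to weigh how costly each handover is and whether a spare node is available, which is the bandwidth worry raised in the quoted post.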
|
|
|

CCP Atlas

|
Posted - 2010.08.21 11:57:00 -
[78]
Originally by: Tres Farmer ... And why does it take +30seconds for the mail window to be responsive after opening it the first time in the actual client session? This just looks broken! What is it waiting for? New mails? Why can't it show the old mails.. all that had been cached already and should be saved locally on my hdd right away with the menu on the left? And then if new ones come in have it show them.. Making me wait for this is not user friendly! ...
This doesn't sound like the way it should work. Can you submit a bug report for us? Mention me please so that I get it.
|
|

Selene D'Celeste
Caldari The D'Celeste Trading Company ISK Six
|
Posted - 2010.08.21 13:19:00 -
[79]
Edited by: Selene D'Celeste on 21/08/2010 13:19:28
Originally by: CCP Atlas
Originally by: Tres Farmer ... And why does it take +30seconds for the mail window to be responsive after opening it the first time in the actual client session? This just looks broken! What is it waiting for? New mails? Why can't it show the old mails.. all that had been cached already and should be saved locally on my hdd right away with the menu on the left? And then if new ones come in have it show them.. Making me wait for this is not user friendly! ...
This doesn't sound like the way it should work. Can you submit a bug report for us? Mention me please so that I get it.
I believe this happens if you haven't cleaned out your evemail cache for a long time. When I remove all old mails from trash, etc, the mail system loads much faster. Over time as mails accumulate it slows down.
Also thanks for the blog. It's good to see logical, common sense improvements being made to the server topology. If you guys keep at it we should be zooming around again in a few more months =D
Originally by: CCP Atlas Market hubs like Jita have the potential for load balancing stations separately of the solar system and other stations. That is something we are currently investigating as a possible 'end-all' fix to Jita. There is a fair bit of game design involved and I'm not making any promises however. :-)
This would be another nice improvement along the same line of thought. Glad to see that you're looking into it.
|

Louis deGuerre
Gallente Amicus Morte Shock an Awe
|
Posted - 2010.08.21 16:35:00 -
[80]
Excellent blog, keep em coming  Sol: A microwarp drive? In a battleship? Are you insane? They aren't built for this! Clear Skies - The Movie
|

Caladain Barton
Navy of Xoc Wildly Inappropriate.
|
Posted - 2010.08.21 18:58:00 -
[81]
Edited by: Caladain Barton on 21/08/2010 18:58:55 CCP, we were having 1000 man battles on un-reinforced nodes without them crashing. you could squeeze 2k into a reinforced node back before dominion.
Just so you know how high the bar *was* at. Are you planning on restoring the game to that level, or to the 500 man battle on regular node, and 1000 on the special nodes?
|

Luke S
Zeta Corp.
|
Posted - 2010.08.21 20:11:00 -
[82]
I have a slightly off-topic question, and I may be thinking a little too far ahead. Have you guys thought about the servers for Dust 514? Will they go through Xbox Live / PlayStation Network servers, or will you have dedicated servers for your MMOFPS? How will they talk to each other?
If this is too early, stick this post in your dev blog list. I'd like to hear how it would work. ---
|

d4shing
|
Posted - 2010.08.21 23:05:00 -
[83]
Edited by: d4shing on 21/08/2010 23:07:58 Edit: whoops, accidentally posted blank message...
Anyways, I wanted to suggest that you guys use some sort of algorithm to pick which nodes to RF without being told by players.
I mean, you have timers for all these ihubs and POSes and SBUs coming out of reinforced, right?
Surely no more than 5ish ihub ref cycles come up on a given day... why not just automatically RF those nodes?
Seems like it would be pretty easy to do...
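
One way to read this suggestion, as a minimal sketch: if the reinforcement timers are already stored server-side, a scheduled job could pick the systems to reinforce. The `Timer` record, the 24-hour window and the cap of five nodes are all assumptions made for illustration, not anything CCP has described.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Timer(NamedTuple):
    system: str          # solar system name (hypothetical field)
    structure: str       # e.g. "ihub", "POS", "SBU"
    exits_at: datetime   # when the structure comes out of reinforced


def systems_to_reinforce(timers, now, window=timedelta(hours=24), max_nodes=5):
    """Return up to max_nodes systems whose timers expire within the window."""
    due = [t for t in timers if now <= t.exits_at < now + window]
    chosen = []
    for timer in sorted(due, key=lambda t: t.exits_at):   # earliest first
        if timer.system not in chosen:                     # one node per system
            chosen.append(timer.system)
        if len(chosen) == max_nodes:
            break
    return chosen
```

Picking the earliest timers first is only one possible policy; ranking by expected fleet size would need data the timers alone do not provide.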
|

Tres Farmer
Gallente Federation Intelligence Service
|
Posted - 2010.08.22 02:43:00 -
[84]
Originally by: CCP Atlas
Originally by: Tres Farmer ... And why does it take +30seconds for the mail window to be responsive after opening it the first time in the actual client session? This just looks broken! What is it waiting for? New mails? Why can't it show the old mails.. all that had been cached already and should be saved locally on my hdd right away with the menu on the left? And then if new ones come in have it show them.. Making me wait for this is not user friendly! ...
This doesn't sound like the way it should work. Can you submit a bug report for us? Mention me please so that I get it.
Bug Report ID is 99609. Good luck!
|

Bagehi
Association of Commonwealth Enterprises R.A.G.E
|
Posted - 2010.08.22 07:31:00 -
[85]
Originally by: Tres Farmer Edited by: Tres Farmer on 22/08/2010 02:48:11
Originally by: CCP Atlas
Originally by: Tres Farmer ... And why does it take +30seconds for the mail window to be responsive after opening it the first time in the actual client session? This just looks broken! What is it waiting for? New mails? Why can't it show the old mails.. all that had been cached already and should be saved locally on my hdd right away with the menu on the left? And then if new ones come in have it show them.. Making me wait for this is not user friendly! ...
This doesn't sound like the way it should work. Can you submit a bug report for us? Mention me please so that I get it.
Bug Report ID is 99609. Good luck! [Edit] I cleaned the cache of my mail 24hours ago I think.. but, yeah.. never before that afaik. But to be honest, I don't do this with thunderbird either and that one gets WAY more mails on a daily basis.
Too bad I couldn't get an answer on my question for more detail about those services/processes running on those location node(s).  Or is there a blog coming in about those details? Would love it!
My mailbox also takes quite a while to load up. Usually, when I first login, I'll open that, then go about doing other things (checking skills, checking my isk, checking my orders, check current fleets, etc) while waiting for the mail to come up. I just loaded up the client and timed it. Just over 18 seconds from opening the mail window to the mail loading up.
This signature is useless, but it is red.
|

Cheap Dude
|
Posted - 2010.08.22 09:10:00 -
[86]
Nice read.. Got one question though. When you warp to a location you end up making a new grid or joining an existing one. Can't grids be split up like the other stuff you mentioned in the blog? This way only the grid lags when numbers get too high and not the whole solar system.
|

Mielono
Caldari SWARTA
|
Posted - 2010.08.22 14:09:00 -
[87]
I think they probably could, but then there would be a session change/load screen setup in system instead of the seamless warp that we have at the moment. So they could probably isolate the different areas in the solar system onto different nodes, but then possibly introduce lag for people warping into an area in the solar system instead of just the people jumping into the system. Right now the lag monster seems to be on the side of the defender; introducing that without revamping how the system handles instant large loads and fleet fights would probably just end up hurting fleet fights more than it already does.
The idea itself is not without merit though, if the fight really starts to heat up at a certain location on the map, somehow figuring out how to bubble that area off onto another core while the rest of the system continues on as business as usual would be nice and would allow smaller split off fights to occur without being troubled by the lag monster.
Originally by: Culmen
A cat is like that carebear who sticks around only while there's food, and at best kills a few rats.A dog F*cking enforces NBSI, and deep down is slightly disappointed you aren't tak
|

ElfeGER
Versatech Co. Blade.
|
Posted - 2010.08.22 16:49:00 -
[88]
Originally by: CCP Atlas
Originally by: James Bryant Hey guys,
Fantastic dev blog. Certainly answers a whole slew of architectural questions that have been lingering in my head for some time.
My question is in regards to location node load balancing. The very fact that fleet fight requests are necessary seems to indicate a lack (or possibly not enough) automated load balancing of location nodes.
No doubt this is not a new idea to you all, so I am wondering what the difficulties are in implementing the capability to offload light traffic location nodes to underutilized CPUs when heavy traffic nodes start to throttle the CPU. Is there not a way to manage the connection while the handoff is being made? Or is it more an issue of detection and implementing the proper hysteresis in the algorithm (so that nodes don't start swapping around CPUs needlessly)?
Indeed. What is on our roadmap is in fact to allow for non-destructive (e.g. not kick everyone out) live remapping of solar systems. I don't know when this will be a reality but it's definitely something that we are very interested in.
With such a system in place when a solar system you're in gets too loaded to play nice with the other solar systems the load balancer would kick in automatically and you would just pause for a bit and then continue as if nothing had happened on a spiffy new node.
It might be easier to move 100 systems with 5 people each than to move one system with 500 people. Closing down empty systems on the loaded node might help as well, so that they restart somewhere else when needed, or right away.
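
A rough sketch of how the two ideas above could combine: the hysteresis James Bryant asks about (a high trigger threshold and a lower stop threshold, so systems don't bounce between nodes) and moving the quiet systems rather than the busy one. Using pilot count as the load measure, and the `high`/`low` fractions themselves, are assumptions for illustration only.

```python
def plan_evacuation(systems, capacity, high=0.9, low=0.7):
    """Pick quiet systems to move off one overloaded node.

    systems  -- dict of system name -> pilots currently in it
    capacity -- pilots the node is assumed to handle comfortably
    """
    total = sum(systems.values())
    if total <= high * capacity:
        return []                      # under the trigger threshold: leave it alone
    # Emptiest first, and never the busiest system: it keeps the node.
    candidates = sorted(systems.items(), key=lambda kv: kv[1])[:-1]
    moves = []
    for name, pilots in candidates:
        if total <= low * capacity:    # stop well below the trigger (hysteresis)
            break
        moves.append(name)
        total -= pilots
    return moves
```

For example, `plan_evacuation({"A": 500, "B": 5, "C": 5, "D": 40}, 600)` moves the three quiet systems and leaves the 500-pilot fight where it is.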
|

Akita T
Caldari Caldari Navy Volunteer Task Force
|
Posted - 2010.08.23 00:05:00 -
[89]
So, when can we expect to see problematic "location nodes" getting moved to a different CPU core in real-time (with a brief interruption in service before that happens, but without a server reboot) ?!? _
Beginner's ISK making guide | Manufacturer's helper | All about reacting _
|

SFX Bladerunner
Minmatar Black Serpent Technologies R.A.G.E
|
Posted - 2010.08.23 01:16:00 -
[90]
Originally by: Akita T So, when can we expect to see problematic "location nodes" getting moved to a different CPU core in real-time (with a brief interruption in service before that happens, but without a server reboot) ?!?
Do I really have to say it?...
It starts with an S, and ends with a ™ __________________________________________________
History is much like an endless waltz, the three beats of war, peace and revolution continue on forever.. |

Jim Luc
Caldari Rule of Five Lucky Starbase Syndicate
|
Posted - 2010.08.23 07:14:00 -
[91]
Originally by: SFX Bladerunner
Originally by: Akita T So, when can we expect to see problematic "location nodes" getting moved to a different CPU core in real-time (with a brief interruption in service before that happens, but without a server reboot) ?!?
Do I really have to say it?...
It starts with an S, and ends with a ™
Shortly™? 
|

Franga
Kangaroos With Frickin Lazerbeams
|
Posted - 2010.08.23 09:35:00 -
[92]
Good read. I appreciate the effort to explain it for those of us that are 'technically challenged'.
|

Vaerah Vahrokha
Minmatar Vahrokh Consulting
|
Posted - 2010.08.23 10:49:00 -
[93]
Edited by: Vaerah Vahrokha on 23/08/2010 10:54:18
Quote:
Our "Fixing Lag" series continues with CCP Atlas' blog on character nodes, which you can read here.
I read the blog, envisioned the whole Python and C++ architecture for some seconds, and it caused me to have a techgasm.
Quote:
Market hubs like Jita have the potential for load balancing stations separately of the solar system and other stations. That is something we are currently investigating as a possible 'end-all' fix to Jita. There is a fair bit of game design involved and I'm not making any promises however. :-)
Interesting tidbit about Guido-and-the-GIL. I need to google it.
I am not sure this is what you need, I am quite sure the 80%-20% rule also applies to Jita / other hubs, where you could indeed partition by station but it'd be useless because one station is really the one center of activity.
What about going all out and quasi fractally (since you already did this in a macro way) implement a sub-station layered level of partitioning?
- Every NNN players a new core is allocated
- The outer (and inner) calls to such core are re-routed to a dispatcher that makes both sides "think" they are dealing with one core like now (read: legacy code is somewhat preserved) while in reality the virtual sub-core is being managed by a freshly allocated resource elsewhere.
This would bring in massively insane scalability.
The scheduler would also build a moving average of the last amounts of virtual core splits that could be reported to you, so you know how to optimize its granularity and can plan hardware purchase in advance.
Edit: of course such virtual core partition manager would keep statistics about the lists of users associated to each core and would periodically coalesce emptying lists (due to players warping off or logging off) into other lists and basically perform its own flavour of garbage collection. - Auditing & consulting
When looking for investors, please read http://tinyurl.com/n5ys4h + http://tinyurl.com/lrg4oz
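
A toy sketch of the dispatcher idea in the post above: callers talk to one object while players are spread over sub-lists ("virtual cores") that are allocated on demand and coalesced as they empty. The class name, the `shard_size` parameter and the use of plain sets are all hypothetical; a real implementation would route calls to separate processes or machines.

```python
class ShardDispatcher:
    """Presents one logical location while spreading players over shards."""

    def __init__(self, shard_size):
        self.shard_size = shard_size   # "every NNN players a new core"
        self.shards = []               # one set of player names per shard

    def join(self, player):
        """Place a player on the first shard with room; allocate if none."""
        for index, shard in enumerate(self.shards):
            if len(shard) < self.shard_size:
                shard.add(player)
                return index
        self.shards.append({player})
        return len(self.shards) - 1

    def leave(self, player):
        """Remove a player who warped off or logged off."""
        for shard in self.shards:
            shard.discard(player)

    def coalesce(self):
        """Repack remaining players into as few shards as possible."""
        players = [p for shard in self.shards for p in sorted(shard)]
        size = self.shard_size
        self.shards = [set(players[i:i + size])
                       for i in range(0, len(players), size)]
```

The hard part the sketch leaves out is exactly what the thread discusses: players on different shards of the same station or grid still need to see each other, so the cross-shard traffic has to stay small for the split to pay off.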
|

corebloodbrothers
|
Posted - 2010.08.23 10:54:00 -
[94]
Edited by: corebloodbrothers on 23/08/2010 10:54:46 Awesome reading. Even if it isn't the holy grail of lag, it shows us, players or CCP customers, that CCP does care and is putting effort in. That alone gives confidence for the future.
I am curious what's more to come. As I am often in the fleet fights with lag, I wouldn't mind missing things during those events. Meaning: give players the option of closing certain player-performed services, or enforce it above a threshold.
Maybe there are tasks in that solar system still on the same node and not vital to the fleet fight:
* like checking your wallet
* or updating market orders
* or close and remove NPC stuff and wormholes if that helps
* strip tasks down to the actions used for the fleet fight
* hell, close chat windows and local chat if it helps
* turn ships into squares if it helps
* turn brackets off by default above a certain threshold
* reset graphics to low if it matters (probably only affects the player's computer though)
* turn off messages about damage and misses by default when the node is full
* kill relogging (debatable)
* or move them next door (very debatable ;))
* no reforming fleets (is used as a lag tactic these days)
The point is, people in a fleet fight occupying a (in general) 0.0 system will accept all sorts of trade-offs and things they miss if it helps clear out lag for the task they are there for.
You can put it under a magical no-lag button, or (my favourite) enforce it, so people don't use lag as a war method by making it heavier (examples are people reforming fleets in a lagged system or spamming local).
The list is endless. What do you think, CCP?
|

Yeay Fritg
Caldari Confrerie de Kaedri Cluster Of Rebirth
|
Posted - 2010.08.23 11:36:00 -
[95]
Hello,
All sounds logical except... do you remember having accessed your wallet in a fleet battle...while jumping by the Titan Bridge to Jita ?
Or may be was it when the FC told every Capital ship to jump and to check their market orders ?
Never tested this way to battle in eve, was just using weapons fitted on ships.
Still not convinced...
Yeay
|

FAIL Communicator
|
Posted - 2010.08.23 12:49:00 -
[96]
Edited by: FAIL Communicator on 23/08/2010 12:53:31 So you've never seen one side spam local during a 500-man fleet fight to crash the node, so that the TCU or blockade unit is either saved or timers expire? Well, I have.
Remaking fleets, putting in new fleet commanders and moving stuff around is also used, as is spamming one person with convos, especially if he's primary. Why do you think all those premade pictures with symbols are posted in local?
You should consider the worst of each player's behaviour if you battle lag, especially when you consider EVE is about that possible dark side.
But the points mentioned are just examples of the suggestion: not only move actions unrelated to the fleet fight to other nodes, but also clear out as much non-essential functionality as possible to reduce the lag caused by massive fleets. The point I'd like to stress is that players would gladly accept that as a trade-off. Looking at node load in a wider view would reveal other reductions that keep lag low.
|

BIZZAROSTORMY
|
Posted - 2010.08.23 12:56:00 -
[97]
so.. the Proxy server assumes we're all logging in from iMacs? 
|

Syekuda
State Protectorate
|
Posted - 2010.08.23 13:55:00 -
[98]
To CCP: To do my part, please take Faction Warfare into consideration when you ask people to report fleet fights through your form. Currently, it seems like you (or someone in CCP) don't care at all about fleet fights in faction warfare.
In case people or even CCP don't know (wouldn't be surprised at all), fights happen in low-sec. You've seen the data in the last economic newsletter: less than 15% of people in EVE Online live in low-sec. So I assume that lots of systems are on one node... or maybe not a lot. So you can work out for yourself that when a 200-man total fleet (both parties) jumps into a system, it can get laggy as hell.
Again, please ACCEPT faction warfare fleet fight requests. The last 2 attempts were denied and we suffered for it.
--------------------------------------------------
Life is pleasant. Death is peaceful. It's the transition that's troublesome.
ISAAC ASIMOV |

Keiko Kobayashi
Amarr Celestial Janissaries Curatores Veritatis Alliance
|
Posted - 2010.08.23 16:59:00 -
[99]
Re. the fleet fight notification form, I always figured that the Dominion changes allowed you to better predict large scale fleet fights, because you can take data of deployed SBUs and their timers as a reasonably accurate indicator of when and where large scale battles will occur?
Isn't that the case, and doesn't that make filling out fleet fight notification forms largely unnecessary?
|

HeliosGal
Caldari
|
Posted - 2010.08.24 11:59:00 -
[100]
So to push CCP further: have unintended fleet fights across multiple quiet nodes and encourage a spread of the player base, got it. Now CCP, for the big one: if someone is AFK in station for an hour, let's log them out. That will help reduce lag.
|

Steini OFSI
Gallente The Collective Against ALL Authorities
|
Posted - 2010.08.24 13:18:00 -
[101]
Can't you split up gunfire, movement and ship health in a similar architecture?
CPU1, Moving: one CPU handles movement and collisions, hands out distance, transversal and sig-radius
CPU2, Shooting: calculates guns, ammo, skills, dmg modifiers, tracking
CPU3, a combiner that takes only the most necessary numbers and calculates dmg
I'm just curious. ------- I love myself |

CyberGh0st
Minmatar Ara Veritas
|
Posted - 2010.08.24 14:21:00 -
[102]
Originally by: Caladain Barton Edited by: Caladain Barton on 21/08/2010 18:58:55 CCP, we were having 1000 man battles on un-reinforced nodes without them crashing. you could squeeze 2k into a reinforced node back before dominion.
Just so you know how high the bar *was* at. Are you planning on restoring the game to that level, or to the 500 man battle on regular node, and 1000 on the special nodes?
Perhaps you missed it, but CCP Atlas said these changes are improvements and will also improve jump ins amongst other things, but this is not the bugfix that fixes the problems that were introduced with Dominion.
However they think they are close to actually fixing what went wrong.
So I expect that with these changes, and once they finally fix the problems introduced with Dominion, a reinforced node should be able to handle more than 2000 players.
Some speculation here : CCP Atlas hopes for a max 20% increase in Jita, so we could possibly see a 2400 player battle on a reinforced node.
http://www.mmodata.net Favorite MMO's : DAoC Pre-TOA-NF / SWG Pre-CU-NGE |

Whatever Dood
|
Posted - 2010.08.24 18:49:00 -
[103]
Originally by: Steini OFSI Can't you split up gunfire, movement and ship health in a similar architecture?
CPU1, Moving: one CPU handles movement and collisions, hands out distance, transversal and sig-radius
CPU2, Shooting: calculates guns, ammo, skills, dmg modifiers, tracking
CPU3, a combiner that takes only the most necessary numbers and calculates dmg
I'm just curious.
I'm going to answer this one, and add some questions of my own.
First of all, what Atlas is talking about here is moving coarse-grain calculations like fleet bookkeeping out of the "location" (or fleet-fight) process and into their own process, or load-balancing unit (LBU). These LBU's can be scheduled on separate nodes.
The fleet bookkeeping LBU doesn't have to communicate with the location/fleet-fight LBU very often, because fleet composition doesn't change on a tick by tick basis. Therefore, the overhead for talking across nodes isn't prohibitive. (Actually, this is also why performing fleet bookkeeping actions during a battle is a good lag "cheat", because it does incur more load than would otherwise be the case due to the cross-node communication.)
Splitting off processing at the granularity you're talking about is fine-grain, ie, it'll involve communication at the per-tick level. Therefore, it's not appropriate to do so "in a similar architecture", as you say, ie, by splitting out into separate LBU's. <shrug> It probably would be appropriate at the node level, however. (or something like it, the actual bits we break out depend on how we wrote the original code.)
In other words -
Each server is a "node" with two cores. Talking across cores is very cheap. Talking across nodes is expensive. The LBU architecture is for cross-node deployment.
What you're suggesting is more appropriate for cross-core deployment, and involves recoding the location/fleet-fight process into a multi-core architecture. (But they said that's on the table too.) hth
My question relates to "tick by tick basis" - what's the "tick" or "frame" time for the location/fleet-fight process? It can't be one second, can it?
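
To put rough numbers on the coarse-grain versus fine-grain argument above, here is a back-of-the-envelope sketch. Every figure in it is an illustrative assumption (order-of-magnitude latencies, a one-second tick, 1000 ships each exchanging one message per tick, messages handled one after another); none of it comes from the blog.

```python
# Illustrative, order-of-magnitude latencies (assumptions, not measurements).
CROSS_CORE_NS = 100          # hand-off between cores in one box
CROSS_NODE_NS = 100_000      # round trip over the cluster LAN


def overhead_ns_per_second(messages_per_second, latency_ns):
    """Time spent waiting on communication in each second of simulation."""
    return messages_per_second * latency_ns


# Coarse-grain: fleet bookkeeping, assumed ~1 cross-node message per second.
fleet_lbu = overhead_ns_per_second(1, CROSS_NODE_NS)
# Fine-grain: assumed 1000 ships exchanging one message per one-second tick.
combat_across_nodes = overhead_ns_per_second(1000, CROSS_NODE_NS)
combat_across_cores = overhead_ns_per_second(1000, CROSS_CORE_NS)
```

Under these assumptions the fleet LBU costs 0.1 ms per second, while splitting per-tick combat across nodes would cost 100 ms per second, a tenth of the whole budget, against 0.1 ms if the same split stayed on cores within one box.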
|

LTcyberT1000
Caldari Free Space Tech Goonswarm Federation
|
Posted - 2010.08.24 18:54:00 -
[104]
Edited by: LTcyberT1000 on 24/08/2010 18:54:33 Well, it seems the way cluster works now and the way to distribute load is going to be in same way as distributing kernel-level calls over 1 big cluster machine. That would result as 1000+ CPU Cores SMP + distributed memory over LAN...
The way i see, probably it would be best to have same approach as Linux Single System Image project (http://openssi.org/cgi-bin/view?page=openssi.html) has. In result, CCP would end up as 1 super computer sharing low level distributed CPU/tasks load over entire cluster instead of traditional single point of failure in load distribution node with limited nodes to share. 
---- T-1000, the old school gamer, started with 80286 machine, 11 years so far for playing games. ******************************************** Skill level: Freelancer Wolf in Moon day :) ****** |

Whatever Dood
|
Posted - 2010.08.24 19:09:00 -
[105]
Originally by: LTcyberT1000 Edited by: LTcyberT1000 on 24/08/2010 18:54:33 Well, it seems the way cluster works now and the way to distribute load is going to be in same way as distributing kernel-level calls over 1 big cluster machine. That would result as 1000+ CPU Cores SMP + distributed memory over LAN...
The way i see, probably it would be best to have same approach as Linux Single System Image project (http://openssi.org/cgi-bin/view?page=openssi.html) has. In result, CCP would end up as 1 super computer sharing low level distributed CPU/tasks load over entire cluster instead of traditional single point of failure in load distribution node with limited nodes to share. 
I'm confused. You mention SMP, symmetrical multiprocessing, architecture - which usually refers to multiple CPU's connected to the same local memory (ie, via a hardware bus, ie, in the same box) and running the same OS image. But then you go on to mention distributed memory over LAN. That doesn't fit?
I think the key multiprocessing problem to solve for EVE is turning the "location" or fleet-fight LBU concurrent. That implies cross-thread communication at a fine-grain level. It also implies a limit to the total amount of processing we'd have to splat, ie, just the processing load of one "location" LBU.
We wouldn't want our communication actions there to go through a LAN. That's orders of magnitudes slower than cross-core communication. I think the appropriate hardware architecture for EVE is probably just what they have now, except to use servers with more (perhaps 4x) cores.
Of course the real work is converting the code base to take advantage of concurrent resources.
|

LTcyberT1000
Caldari Free Space Tech Goonswarm Federation
|
Posted - 2010.08.24 19:13:00 -
[106]
Edited by: LTcyberT1000 on 24/08/2010 19:14:56
Originally by: Whatever Dood
I'm confused. You mention SMP, symmetrical multiprocessing, architecture - which usually refers to multiple CPU's connected to the same local memory (ie, via a hardware bus, ie, in the same box) and running the same OS image. But then you go on to mention distributed memory over LAN. That doesn't fit? work is converting the code base to take advantage of concurrent resources.
The way Single System Image works - it is really running as 1 virtual multi-CPU computer. If you want to get an idea of how it works, see http://setiathome.ssl.berkeley.edu/ and other mathematical calculation projects that have been working for years... :)
---- T-1000, the old school gamer, started with 80286 machine, 11 years so far for playing games. ******************************************** Skill level: Freelancer Wolf in Moon day :) ****** |

Whatever Dood
|
Posted - 2010.08.24 19:29:00 -
[107]
Originally by: LTcyberT1000 Edited by: LTcyberT1000 on 24/08/2010 19:14:56
Originally by: Whatever Dood
I'm confused. You mention SMP, symmetrical multiprocessing, architecture - which usually refers to multiple CPU's connected to the same local memory (ie, via a hardware bus, ie, in the same box) and running the same OS image. But then you go on to mention distributed memory over LAN. That doesn't fit? work is converting the code base to take advantage of concurrent resources.
The way Single System Image works - it is really running as 1 virtual multi-CPU computer. If you want to get an idea of how it works, see http://setiathome.ssl.berkeley.edu/ and other mathematical calculation projects that have been working for years... :)
I wasn't questioning whether it works. I was questioning the usage of the "SMP" acronym for loosely-coupled, ie, over LAN, systems.
Regardless, the problem remains the same. It's much, much more expensive to communicate across a LAN. Enough so that single-box SMP architectures like what they're using now are more appropriate for problems like turning the fleet-fight LBU's concurrent.
|

Tripflare
|
Posted - 2010.08.24 22:26:00 -
[108]
Originally by: Alain Kinsella That sounds a lot like VMWare VMotion
I've often wondered if running nodes as Virtual Machines would be worth mentioning for load balancing: If nobody is using the node then the VM requests very little resources from the underlying host - just enough to tick over.
Meanwhile, where lots of CPU / RAM is required on a node, that VM ramps up and uses its full allocation on the host; 8 virtual CPUs and 255GB RAM can be allocated to a VM running on VMware vSphere (each vCPU is mapped to a physical core).
You can have up to 32 physical hosts per vSphere cluster - each host can have up to 1TB RAM..
If lots of nodes (VMs) happen to be getting hit hard on one host the Dynamic Resource Scheduler (DRS) will live migrate (vMotion) the nodes that aren't doing much automatically to other hosts in the cluster - this is a very fast process and is fully automated.
If a physical host suffers hardware failure, the nodes (VMs) that were running on it are automatically restarted on other hosts in the cluster. (VMware High Availability)
There's more but I think I'll stop there for now... :)
Trip
|

Taudia
Gallente Sane Industries Inc. Initiative Mercenaries
|
Posted - 2010.08.28 12:44:00 -
[109]
Wait, why does it say this blog was posted today? Fallout introduced the comment thread for it more than a week ago...?
|

Nareg Maxence
Gallente
|
Posted - 2010.08.28 14:01:00 -
[110]
The stackless python upgrade blog didn't receive as enthusiastic a response as this one did, so they decided to bump it.
|

Genya Arikaido
|
Posted - 2010.08.28 16:10:00 -
[111]
Dev Blog Necromancy? 
Originally by: CCP Tuxford my bad.
Rest assured I'm being ridiculed by my co-workers.
|

jwingenderowns
|
Posted - 2010.08.28 16:51:00 -
[112]
You have not seen this dev-blog before. You will appreciate it and enjoy it. </jedi-mind-trick>

|

Lan Tragger
|
Posted - 2010.08.29 11:38:00 -
[113]
I understand the GIL makes things annoyingly difficult in a single process. The easy solution is of course multiple processes. Of course, that still leaves you with synchronization between processes, but let's be honest: if Eve does not learn to use hardware more fully, you will always have scalability problems.
The ability to transfer the location node on the fly should easily be doable. Of course, that means running a split node during transfer, but that shouldn't be too large of a problem (it's actually a huge undertaking).
What about kicking queue processors which can act independently (they don't need to block us, but currently do because of the GIL) into separate processes and feeding the queue to them? In a multi-core system, this means you can use shared memory or similar to transfer the data. In a multi-platform scenario, pick your transmission medium. Of course, the gains must exceed the latency of the transfer, though I suspect some evil coding could make it happen. Sending micro code snippets to cluster nodes and dealing with the latency of the transfer isn't a new problem.
Even if you can't transfer the entire location yet, at least consider that effects/module queue processing (re the other blog entry) might be isolated enough to separate out from the main server thread and process independently on a separate core, which would fix the play-nice-with-GIL problem in that scenario. Depending on how crucial ordering of queue processing is, you could possibly even use multiple queue processors on demand in case the current processing gets too far behind (there are any number of ways to balance data across them, even if you played the simple odd/even game).
shutting up now. Visit google and see some of their clustering technologies. It's awesome. Mostly doesn't apply in this scenario, but I can think of a few tidbits that would.
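
The "simple odd/even game" mentioned above can be sketched as a partitioning step: events are dealt out to worker queues by ship ID, so each ship's events stay in order on one worker even though different workers run independently. The event shape `(ship_id, payload)` and the function name are assumptions for illustration; starting the actual worker processes (for example with Python's `multiprocessing` module) is left out.

```python
def partition_events(events, n_workers):
    """Split (ship_id, payload) events into per-worker queues.

    All events for one ship land on the same worker, in their original
    order, so per-ship ordering survives the split.
    """
    queues = [[] for _ in range(n_workers)]
    for ship_id, payload in events:
        queues[ship_id % n_workers].append((ship_id, payload))
    return queues
```

With `n_workers=2` this is literally odd/even. Whether per-ship ordering is enough depends on how much effects on different ships interact, which is the ordering caveat raised in the post.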
|

Mr LaForge
|
Posted - 2010.08.29 16:15:00 -
[114]
When I read the first few words of the blog, I thought this:
"Cobra sucks, GI Joe is better"
|

ELECTR0FREAK
Eye of God Black Star Alliance
|
Posted - 2010.08.29 18:36:00 -
[115]
This is what happens when the Devs drink on the job. We get ships that look like the Moa or the Dominix and blogs get reposted.
Hey, I'm all for you all having a good time when working, but I think it's time to put down the bottle guys and gals. 
Discoverer of the Original Missile Damage Formula |

Syekuda
State Protectorate
|
Posted - 2010.08.30 00:13:00 -
[116]
Uhh, what he ^^ said. Guys stop drinking. Vacation is over

Uhhh I may be first to say publicly but ccp got pwned by themselves lmao
--------------------------------------------------
Life is pleasant. Death is peaceful. It's the transition that's troublesome.
ISAAC ASIMOV |

Terminal Entry
New Fnord Industries
|
Posted - 2010.08.30 03:28:00 -
[117]
Originally by: Syekuda
ps: is this blog lagging or losing sync ?
LMAO!
Originally by: CCP kieron If you feel we as an entity are corrupt and abhorrent, we bid you good luck in finding a game and company that suits your interests.
|

Trebor Daehdoow
|
Posted - 2010.08.30 15:01:00 -
[118]
Originally by: ELECTR0FREAK This is what happens when the Devs drink on the job. We get ships that look like the Moa or the Dominix and blogs get reposted.
Actually, you have it exactly wrong -- Ships like the Dominix are what you get when devs are stone-cold sober.
For the Icelandic subspecies of dev, at least, the question is not whether they should be intoxicated, but what level of inebriation is appropriate for a particular task; for example, it is clear that game designers do their best work when falling down drunk, whereas the lag team is most effective when they've just had one or two shots "to take the edge off".
The exact blood-alcohol levels needed for optimum dev productivity is a subject of intensive ongoing research at CCP (and the subject of an upcoming devblog by CCP Tequila), and there are rumors that breathalyzer authentication modules will soon be added to all workstations to prevent the devs from logging in when they are under-medicated.
Confessions of a Noob Starship Politician Spending Hours blogging the Minutes
|

Syekuda
Valor Inc.
|
Posted - 2010.08.30 15:59:00 -
[119]
Originally by: Trebor Daehdoow
Originally by: ELECTR0FREAK This is what happens when the Devs drink on the job. We get ships that look like the Moa or the Dominix and blogs get reposted.
Actually, you have it exactly wrong -- Ships like the Dominix are what you get when devs are stone-cold sober.
For the Icelandic subspecies of dev, at least, the question is not whether they should be intoxicated, but what level of inebriation is appropriate for a particular task; for example, it is clear that game designers do their best work when falling down drunk, whereas the lag team is most effective when they've just had one or two shots "to take the edge off".
The exact blood-alcohol levels needed for optimum dev productivity is a subject of intensive ongoing research at CCP (and the subject of an upcoming devblog by CCP Tequila), and there are rumors that breathalyzer authentication modules will soon be added to all workstations to prevent the devs from logging in when they are under-medicated.
6 pack of redbull should do the trick
--------------------------------------------------
Life is pleasant. Death is peaceful. It's the transition that's troublesome.
ISAAC ASIMOV |
|

CCP Masterplan
C C P Alliance

|
Posted - 2010.08.30 18:01:00 -
[120]
Originally by: Trebor Daehdoow
Originally by: ELECTR0FREAK This is what happens when the Devs drink on the job. We get ships that look like the Moa or the Dominix and blogs get reposted.
Actually, you have it exactly wrong -- Ships like the Dominix are what you get when devs are stone-cold sober.
For the Icelandic subspecies of dev, at least, the question is not whether they should be intoxicated, but what level of inebriation is appropriate for a particular task; for example, it is clear that game designers do their best work when falling down drunk, whereas the lag team is most effective when they've just had one or two shots "to take the edge off".
The exact blood-alcohol levels needed for optimum dev productivity is a subject of intensive ongoing research at CCP (and the subject of an upcoming devblog by CCP Tequila), and there are rumors that breathalyzer authentication modules will soon be added to all workstations to prevent the devs from logging in when they are under-medicated.
I think this is what you're looking for... |
|

Hawk TT
Caldari Bulgarian Experienced Crackers Circle-Of-Two
|
Posted - 2010.08.30 18:59:00 -
[121]
That's called serious science & research!
But I knew it, I knew it - CCP Dr.EyjoG is a Doctor of Medicine and the whole "Economic lifecycle" b....t is just a cover!!! Actually, his sole responsibility is to keep the Devs @ The Ballmer's Peak 
Originally by: CCP Masterplan
Originally by: Trebor Daehdoow
Originally by: ELECTR0FREAK This is what happens when the Devs drink on the job. We get ships that look like the Moa or the Dominix and blogs get reposted.
Actually, you have it exactly wrong -- Ships like the Dominix are what you get when devs are stone-cold sober.
For the Icelandic subspecies of dev, at least, the question is not whether they should be intoxicated, but what level of inebriation is appropriate for a particular task; for example, it is clear that game designers do their best work when falling down drunk, whereas the lag team is most effective when they've just had one or two shots "to take the edge off".
The exact blood-alcohol levels needed for optimum dev productivity is a subject of intensive ongoing research at CCP (and the subject of an upcoming devblog by CCP Tequila), and there are rumors that breathalyzer authentication modules will soon be added to all workstations to prevent the devs from logging in when they are under-medicated.
I think this is what you're looking for...
___________________________________ Science & Diplomacy Manager @ BECKS Circle-of-Two |

Lan Tragger
|
Posted - 2010.08.30 19:02:00 -
[122]
Originally by: CCP Masterplan
I think this is what you're looking for...
I always wondered why I did my best work on the laptop at the bar and why getting married seemed to diminish my programming skills. :) |

Tiger's Spirit
Caldari
|
Posted - 2010.08.31 03:36:00 -
[123]
Edited by: Tiger's Spirit on 31/08/2010 03:36:50 "A new command line switch has been added that will disable the lighting effects on Alienware machines that support it. It is “/nolightfx”."
CCCP mistake: the 1.0.5 patch needs one more command line switch, "/nolagfix", again :D
|

Troy LS
|
Posted - 2010.08.31 20:34:00 -
[124]
I seem to recall reading a Dev Blog that explained that large fleet fights were unpredictable and that it was difficult to allocate computing resources to such fights. I'm not a programmer but, as an engineer, I have studied statistics and probability. It seems to me that there are some statistics that would help predict resource utilization on a particular node.
For instance, if a fleet is formed, there is a chance that it will engage a similarly sized fleet. The probability of engagement increases if there are two or more fleets in close proximity. The probability is further increased if the two fleets have negative standings with one another, and virtually assured if they are at war.
My point is that a stochastic model can be developed that would predict resource utilization and allocate resources accordingly. When the model indicates a high probability of resource utilization, the resources can be appropriately allocated PRIOR to the engagement. The resources that might be involved can be predicted by the number of pilots in fleets, the number of corpmates, alliance members and associates with standings that are online and in the region, and the resources (ships) that are available to those pilots.
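A minimal sketch of such a model in Python, purely for illustration. The logistic form, every weight, the field names and the 300-pilot threshold are assumptions invented here; none of it comes from CCP or from the dev blog.

```python
import math
from dataclasses import dataclass


@dataclass
class FleetPair:
    """Two fleets that might meet. All fields are hypothetical inputs."""
    pilots_a: int
    pilots_b: int
    jumps_apart: int   # distance between the two fleets, in jumps
    standing: float    # -10.0 (hostile) to +10.0 (friendly)
    at_war: bool


def engagement_probability(pair: FleetPair) -> float:
    """Logistic estimate of the chance that the two fleets fight.

    The weights below are made up; a real model would be fitted to
    historical fight data.
    """
    smaller = min(pair.pilots_a, pair.pilots_b)
    larger = max(pair.pilots_a, pair.pilots_b)
    size_match = smaller / larger if larger else 0.0  # 1.0 = evenly matched
    score = (
        -2.0
        + 2.0 * size_match          # similar sizes make a fight more likely
        - 0.5 * pair.jumps_apart    # distance makes it less likely
        - 0.3 * pair.standing       # negative standings raise the score
        + (4.0 if pair.at_war else 0.0)
    )
    return 1.0 / (1.0 + math.exp(-score))


def expected_pilots(pair: FleetPair) -> float:
    """Expected number of pilots in one fight, weighted by its probability."""
    return engagement_probability(pair) * (pair.pilots_a + pair.pilots_b)


def should_reinforce(pair: FleetPair, threshold: float = 300.0) -> bool:
    """Flag the system for extra resources before the fight starts."""
    return expected_pilots(pair) >= threshold
```

Two evenly matched fleets at war in the same system score close to 1, while friendly fleets six jumps apart score close to 0, so only the first pair would be flagged for reinforcement.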
I think it's funny that CCP has asked players to inform them when they expect large fleet engagements. I can't think of too many situations when enemies have agreed to fight at an agreed upon time and place.
Anyway, this is a great Blog post and I'm happy to see the developers really are taking action.
|

Dannemora
|
Posted - 2010.09.02 03:58:00 -
[125]
Originally by: Troy LS I seem to recall reading a Dev Blog that explained that large fleet fights were unpredictable and that it was difficult to allocate computing resources to such fights. I'm not a programmer but, as an engineer, I have studied statistics and probability. It seems to me that there are some statistics that would help predict resource utilization on a particular node.
For instance, if a fleet is formed, there is a chance that it will engage a similarly sized fleet. The probability of engagement increases if there are two or more fleets in close proximity. The probability is further increased if the two fleets have negative standings with one another, and virtually assured if they are at war.
My point is that a stochastic model can be developed that would predict resource utilization and allocate resources accordingly. When the model indicates a high probability of resource utilization, the resources can be appropriately allocated PRIOR to the engagement. The resources that might be involved can be predicted by the number of pilots in fleets, the number of corpmates, alliance members and associates with standings that are online and in the region, and the resources (ships) that are available to those pilots.
I think it's funny that CCP has asked players to inform them when they expect large fleet engagements. I can't think of too many situations when enemies have agreed to fight at an agreed upon time and place.
I'm not an engineer, but as a systems architect, I saw roughly the same as you in that dev blog :)
However I also noticed one big problem in the current design that negates the effect of being able to predict the resource utilization in real time.
They can't shift people off a location node once it's loaded for them without booting them off it in the process.
That's why they need to perform the load pattern during downtime.
It's like knowing how much beer you should stock an aircraft with before it takes off.
Sure, there are nice studies that show the normal consumption based on a boatload of parameters. But if one large group of, let's say, middle-aged accountants wasn't heading to some important seminar but was actually going to a 30-year celebration of their graduation, the calculation would most likely be off. And the accountants would run the aircraft dry, at least of beer.
Loading every single flight with enough beer to serve an entire cabin full of beer-happy accountants isn't a workable solution. It would be way too expensive for any company to add all that weight just to shuttle beer around the globe for no good.
Mid-air refuelling would work. But it would require that such an option was added at the design stage. Adding it to existing aircraft would be a really hairy job.
But when I read the devblog, I also noticed that they are moving as much as possible off the actual aircraft, er, location node.
This, by extension, would make it a good deal easier, or at least less complex, to re-engineer the choke points.
Redesigning the beer supply system to handle air-refuelling is easier than redesigning the entire aircraft.
So I'd say, with some experience, that they are on the right track, and improvements will come.
But it'll take a while to get it done, and they will have to tread carefully not to create new problems in other areas.
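A rough Python sketch of the downtime-only balancing described above. The greedy rule, the system names and the load numbers (as a fraction of one CPU core) are all assumptions for illustration; this is not how CCP actually maps solar systems to location nodes.

```python
import heapq


def plan_nodes(predicted_load: dict, node_count: int) -> list:
    """Greedy downtime plan: place the heaviest system first, each time
    onto the node with the smallest predicted total so far."""
    totals = [(0.0, i) for i in range(node_count)]
    heapq.heapify(totals)
    nodes = [[] for _ in range(node_count)]
    by_load = sorted(predicted_load.items(), key=lambda kv: kv[1], reverse=True)
    for system, load in by_load:
        total, i = heapq.heappop(totals)
        nodes[i].append(system)
        heapq.heappush(totals, (total + load, i))
    return nodes


def unbalanceable(actual_load: dict, core_capacity: float = 1.0) -> list:
    """Systems that exceed one core on their own. No mapping helps these,
    because a single solar system cannot be split across nodes."""
    return sorted(s for s, load in actual_load.items() if load > core_capacity)


if __name__ == "__main__":
    # Hypothetical predictions made at downtime.
    predicted = {"Jita": 0.9, "Quiet A": 0.05, "Quiet B": 0.05, "Quiet C": 0.1}
    print(plan_nodes(predicted, node_count=2))

    # What actually happened: an unannounced fleet fight in "Quiet C".
    actual = dict(predicted, **{"Quiet C": 1.4})
    print(unbalanceable(actual))
```

The plan is fixed once the nodes are loaded, so a quiet system that suddenly hosts a fleet fight stays on its shared node until the next downtime. That is the accountants drinking the plane dry.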
/D
|

ELECTR0FREAK
Eye of God Black Star Alliance
|
Posted - 2010.09.03 03:25:00 -
[126]
Edited by: ELECTR0FREAK on 03/09/2010 03:28:18 Edited by: ELECTR0FREAK on 03/09/2010 03:27:59
Originally by: CCP Masterplan
Originally by: Trebor Daehdoow
Originally by: ELECTR0FREAK This is what happens when the Devs drink on the job. We get ships that look like the Moa or the Dominix and blogs get reposted.
Actually, you have it exactly wrong -- Ships like the Dominix are what you get when devs are stone-cold sober.
For the Icelandic subspecies of dev, at least, the question is not whether they should be intoxicated, but what level of inebriation is appropriate for a particular task; for example, it is clear that game designers do their best work when falling down drunk, whereas the lag team is most effective when they've just had one or two shots "to take the edge off".
The exact blood-alcohol levels needed for optimum dev productivity is a subject of intensive ongoing research at CCP (and the subject of an upcoming devblog by CCP Tequila), and there are rumors that breathalyzer authentication modules will soon be added to all workstations to prevent the devs from logging in when they are under-medicated.
I think this is what you're looking for...
LMAO, awesome! 
Well then, someone's sneaking a little more than their designated beer ration!
Discoverer of the Original Missile Damage Formula |

Mal Lokrano
Gallente The Executives IT Alliance
|
Posted - 2010.09.03 09:07:00 -
[127]
Originally by: up to 80% of the calls were routed elsewhere freeing Jita up for important things like inventory operations and scams in local.
That made me lol, thanks. ____________________________________________ When going to a party with wine, women, and song. Always ascertain the vintage of the first two.
Don't bug me ingame about diplomats, I don't know wh |

Thy Collector
|
Posted - 2010.09.07 23:52:00 -
[128]
So... I get this (I think). But it doesn't exactly explain the gate lag. If the CPU was really being overloaded, wouldn't this just back up the queue, so that there would be a delay before grid loaded, but not one so long that the request would time out? Correct me if I'm wrong, but aren't all requests sent to the location node put into a queue? Under normal load, the queue is processed relatively quickly, so it often takes less than a second for things to complete. If the CPU was being jammed with more requests than could be processed in a normal fashion, wouldn't this just back up the queue a bit, and not cause a complete failure to process requests?
I figure that you guys are using a priority queue so that certain requests get placed higher up in the queue than other requests. This seems to make sense until those higher priority requests begin to fill up the queue at an unusually fast rate, and so low priority requests never get taken care of. Could this possibly be a reason why the grid never loads (or takes forever to load) for everyone jumping in? Are requests such as loading a grid from another system (or other things that would affect us actually loading grid after jumping) being put so far back in the queue relative to other requests that they are just never taken care of and time out?
If so, wouldn't it make sense to use some kind of run-time-like algorithm so that requests that have been in the queue for too long are forced to the front? E.g. requests must be completed in a certain amount of time, regardless of their original priority.
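Forcing long-waiting requests forward is usually called aging. Here is a minimal Python sketch of the idea. The class name, the aging rate and the request names are made up for illustration, and nothing here reflects how the EVE server actually queues requests.

```python
import heapq
import itertools


class AgingQueue:
    """Priority queue in which waiting requests gain priority over time.

    Lower numbers are served first. A request's effective priority at
    time t is priority - aging_rate * (t - enqueue_time). The term
    aging_rate * t is the same for every request, so ordering by the
    fixed key priority + aging_rate * enqueue_time gives the same result
    and lets a plain binary heap do the work.
    """

    def __init__(self, aging_rate: float):
        self.aging_rate = aging_rate       # priority levels gained per second
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps FIFO order

    def push(self, request, priority: float, now: float) -> None:
        key = priority + self.aging_rate * now
        heapq.heappush(self._heap, (key, next(self._counter), request))

    def pop(self):
        _key, _seq, request = heapq.heappop(self._heap)
        return request

    def __len__(self) -> int:
        return len(self._heap)


if __name__ == "__main__":
    q = AgingQueue(aging_rate=1.0)
    q.push("load grid", priority=10, now=0.0)   # low priority, early
    q.push("combat A", priority=0, now=5.0)     # high priority
    q.push("combat B", priority=0, now=15.0)    # high priority, much later
    while q:
        print(q.pop())
```

With an aging rate of 1.0 the grid load is served after "combat A" but before "combat B", because by the time "combat B" arrives the grid load has waited long enough to outrank it. With an aging rate of 0.0 the queue degrades to strict priority and the grid load always goes last, which is the starvation described above.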
I ask this because I don't understand why loading grid is never taken care of, as that seems to be the biggest issue. It's one thing for the grid to take an extra 10-30 seconds to load, and it's another thing entirely for the grid to take so long to load that the request times out and then we are left helpless.
Furthermore, since those already on grid have much less of an issue, are grid requests from those entering a new system less important than those from pilots already in system?
I understand that there are also other things that need to be taken care of, such as making sure that there are no unnecessary repeats of requests, especially high priority requests, but if it indeed is the case that some requests are just being pushed back so much that they are never taken care of, it seems that making sure they are processed at all should be a higher priority.
|

Nytemaster
The Perfect Storm
|
Posted - 2010.09.08 07:06:00 -
[129]
Quote: Through their work and the work of others, there's now, at this moment, a perfect storm within the company and we have a great number of fixes in the pipes that will knock your socks off.
I knew we had a Dev in corp and this post proves it.
|

Mashie Saldana
BFG Tech
|
Posted - 2010.09.09 12:56:00 -
[130]
Originally by: Nytemaster
Quote: Through their work and the work of others, there's now, at this moment, a perfect storm within the company and we have a great number of fixes in the pipes that will knock your socks off.
I knew we had a Dev in corp and this post proves it.
Omg, you must be George Clooney then, can I have your autograph please? 
|

Rakessh
Antares Shipyards Circle-Of-Two
|
Posted - 2010.09.30 20:47:00 -
[131]
Quote: Another example of a load balancing unit is a solar system. Typically we will have multiple, even hundreds of solar systems living on a single node. We call these types of nodes Location Nodes. Typically these solar systems have such low load that they can be mapped onto a node with a lot of other systems in the same way as the market is. However, now comes the gotcha: If a single solar system exceeds the capacity of the CPU core we have lost the ability to further balance it.
Not entirely correct, and CPU utilization figures alone do not tell the whole story, maybe not even the truth. Running several hundred solar systems on a single node because they are low-load ones doesn't mean you won't run into trouble. There are only so many time slices in a second for any task to complete! Typically one "balancing unit" running on the same node as another "balancing unit" will compete for CPU time, and thus you will have to pay close attention to your process queue depth.
Tweaking the kernel to solve this problem is a must; you have to create more time slices. It will increase kernel overhead, but also increase the responsiveness until you hit the utilization barrier. Do you guys actually do that to your Windows servers, or are you trying to run fewer than 50 "balancing units" per node? I think the normal Win NT (2000) kernel has 20 ms slices, so in one second you have 50 slice units. So you guys will have to decide the time resolution each process should have (how many times per second the process is allowed CPU time so it can respond to new input).
So whenever you are running more potentially CPU-intensive processes than configured time slices, you may end up with processes starved of CPU time. Response times will skyrocket, and even a solar system with only 10 guys in it can feel Jita-laggy.
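The arithmetic behind that, as a small Python sketch. It assumes plain round-robin scheduling on a single core where every busy process uses its whole slice, which is a simplification of any real scheduler, and the 20 ms quantum is only the figure guessed at above.

```python
def slices_per_second(quantum_ms: float) -> float:
    """Number of scheduler time slices that fit into one second."""
    return 1000.0 / quantum_ms


def worst_case_wait_ms(runnable: int, quantum_ms: float) -> float:
    """Longest gap before a process runs again under plain round-robin
    on one core, if every other runnable process uses its full slice."""
    return max(runnable - 1, 0) * quantum_ms


if __name__ == "__main__":
    quantum = 20.0  # ms, assumed
    print("slices per second:", slices_per_second(quantum))
    for busy in (10, 50, 300):
        print(busy, "busy processes ->", worst_case_wait_ms(busy, quantum), "ms")
```

At 20 ms per slice there are 50 slices per second. With 50 busy processes the worst-case gap is 49 x 20 ms = 980 ms, and with 300 it is 299 x 20 ms = 5980 ms, so a process can wait seconds for its turn even if its own work is tiny.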
Just thought I'd point it out, seeing the article was doing it easy-mode, I hope it doesn't mean you weren't aware though ;)
CEO Arachnea Phoenix Battalion |
|
|