
CCP Fallout

Posted - 2010.08.20 18:04:00 -
[1]
Our "Fixing Lag" series continues with CCP Atlas' blog on character nodes, which you can read here.
Fallout
Associate Community Manager, CCP Hf, EVE Online

CCP Explorer

Posted - 2010.08.20 18:50:00 -
[2]
Please note that the network traffic and CPU usage graphs were inadvertently swapped and didn't match their captions. I've fixed that now.
Erlendur S. Thorsteinsson
Software Director EVE Online, CCP Games

CCP Explorer

Posted - 2010.08.20 18:54:00 -
[3]
Truly fixed now.
Erlendur S. Thorsteinsson
Software Director EVE Online, CCP Games

CCP Explorer

Posted - 2010.08.20 18:58:00 -
[4]
Originally by: Meissa Anunthiel
Better legends for the graphs would be appreciated; I have absolutely no clue what I'm looking at. Care to say what each colored line is?
There are larger versions of the images available by clicking them.
Figure #2 is the number of net read calls made in a given time period. Up to 80% of the calls were routed away from the Jita location node to the character nodes.
Figure #3 is the CPU usage on the Jita location node before and after.
Lower lines are "after" and lower is better.
Erlendur S. Thorsteinsson
Software Director EVE Online, CCP Games

CCP Explorer

Posted - 2010.08.20 19:01:00 -
[5]
Originally by: Alain Kinsella
Good read, was kinda what I expected when the last patch notes came out (we implement something similar at work).
Only question I have on this post: Is the EveMail system going to be placed on its own node set? I'm always surprised that my character (who gets maybe 3-4 EveMails a week) takes nearly a minute to load the screen at startup.
[And on this note, can you also explain why it's so much quicker to access EveMails through EveGate, in an OOG browser like Firefox? My assumption here is that the character node for Mail is already implemented, but only being called directly by the EveGate/Web side, not by the Client.]
EVE Mail is on the Character Nodes. In the first iteration we implemented Mail Nodes for EVE Mail but they then became Character Nodes in the second iteration and started hosting other services. I'll mention your concern to the devs.
Erlendur S. Thorsteinsson
Software Director EVE Online, CCP Games

CCP Explorer

Posted - 2010.08.20 19:32:00 -
[6]
Originally by: Malcanis
I am very much appreciating this new, communicative CCP. Obviously we're not going to see DevBlogs and DevPosts sustained at quite this rate, but I hope the CCP staff are going to carry on this way.
Actual Information beats the hell out of speculation
And I also think there has been a great improvement in the mood of the playerbase. We're still very much waiting on real results, but a lot of us are feeling a lot more positive and optimistic that we'll get them.
I do want to point out that this blog contains real, live-on-TQ results (phase #2 of these changes was deployed to TQ on 12 August; phase #3 was part of Tyrannis 1.0.4 this week, on 18 August).
In addition, there are dev blogs in the pipeline from other devs with more results like these.
Erlendur S. Thorsteinsson
Software Director EVE Online, CCP Games

CCP Atlas

Posted - 2010.08.20 20:33:00 -
[7]
Originally by: Ford Chicago
CCP Explorer, did you attempt to normalize the comparison for day-of-week differences in order to accurately quantify the benefit of this change, or are you just showing us raw data?
This is raw data, but the 4 runs were quite similar in terms of population and usage profile.
Originally by: Ford Chicago
I also found it interesting that "up to 80% of the calls were routed elsewhere" (other than the Location node) but the CPU utilization of the Location node only dropped 5-15 percentage points. This means that 20% of the calls are responsible for the majority of CPU usage.
Yes, that is a very good observation. This isn't giving us an 80% gain in terms of CPU since the calls that were routed elsewhere are much lighter than the ones that need to remain.
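The gap between "80% of calls" and a 5-15 point CPU drop is weighted arithmetic. A minimal sketch with invented call counts and relative costs (none of these numbers come from TQ) shows how routing away most of the calls can free only a small slice of CPU when those calls are cheap:

```python
# Hypothetical workload: counts and per-call costs are invented to
# illustrate the arithmetic, not measured TQ figures.
LIGHT_CALLS = 80      # calls now routed to character nodes
HEAVY_CALLS = 20      # calls that must stay on the location node
LIGHT_COST = 1.0      # relative CPU cost of one light call
HEAVY_COST = 40.0     # relative CPU cost of one heavy call


def routed_cpu_share(light_calls, heavy_calls, light_cost, heavy_cost):
    """Fraction of location-node CPU freed by moving only the light calls."""
    light_total = light_calls * light_cost
    heavy_total = heavy_calls * heavy_cost
    return light_total / (light_total + heavy_total)


call_share = LIGHT_CALLS / (LIGHT_CALLS + HEAVY_CALLS)
cpu_share = routed_cpu_share(LIGHT_CALLS, HEAVY_CALLS, LIGHT_COST, HEAVY_COST)
print(f"calls routed away: {call_share:.0%}")
print(f"CPU freed:         {cpu_share:.1%}")
```

With a 40:1 cost ratio, 80% of the calls account for only about 9% of the CPU, which is the shape of the result in the graphs.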
Originally by: Ford Chicago
CCP Explorer, can you go into more detail about which types of calls generate the most CPU utilization? Which types of calls have been handed to the Character nodes besides mail? What are the 5-6 calls made on a jump event that *don't* need to be handled by the location node?
Some examples of calls that now get routed to the character nodes are lookups of characters, corps and alliances (something that happens all the time when you see someone in your overview for example), certain show-info operations, sov info, some station info, etc, etc. It's all over the place, which is the reason it hasn't been structured properly up until now. Programmers have typically thought "I have this teeny tiny call, I'll just stick it on the location node".
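As a rough illustration of the idea, and not CCP's actual routing code, the split can be modelled as a lookup from call name to node type, with anything unlisted defaulting to the location node. The call names and the trace below are hypothetical stand-ins for the examples above:

```python
from collections import Counter

# Hypothetical call names, loosely following the examples in the post.
CHARACTER_NODE_CALLS = {
    "lookup_character",
    "lookup_corporation",
    "lookup_alliance",
    "show_info",
    "sov_info",
    "station_info",
}


def route(call_name):
    """Send listed lookups to a character node; everything else stays local."""
    return "character" if call_name in CHARACTER_NODE_CALLS else "location"


def traffic_split(calls):
    """Count how many calls in a trace end up on each node type."""
    return Counter(route(name) for name in calls)


# A made-up trace of calls around a busy jump-in.
trace = (
    ["lookup_character"] * 5
    + ["lookup_corporation", "lookup_alliance", "show_info"]
    + ["move_item", "warp_to"]
)
print(traffic_split(trace))
```

The default-to-location rule mirrors the habit described above: a small call lands on the location node unless someone explicitly moves it.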
Originally by: Ford Chicago
I found this to be one of the more interesting of the recent dev blogs, but even so, all it really says is that some things that used to be handled by the Location node are now handled elsewhere. As a programmer I suspect my interest is on the more technical side than the average player, but I'm frustrated with the recent "dev blogs" that seem more like marketing material.
Thanks. :) Like I mentioned, it's just a bunch of little things, most of them very light calls but they add up to a big bunch of traffic.

CCP Atlas

Posted - 2010.08.20 20:41:00 -
[8]
Originally by: James Bryant
Hey guys,
Fantastic dev blog. Certainly answers a whole slew of architectural questions that have been lingering in my head for some time.
My question is in regards to location node load balancing. The very fact that fleet fight requests are necessary seems to indicate a lack (or possibly not enough) automated load balancing of location nodes.
No doubt this is not a new idea to you all, so I am wondering what the difficulties are in implementing the capability to offload light traffic location nodes to underutilized CPUs when heavy traffic nodes start to throttle the CPU. Is there not a way to manage the connection while the handoff is being made? Or is it more an issue of detection and implementing the proper hysteresis in the algorithm (so that nodes don't start swapping around CPUs needlessly)?
Indeed. What is on our roadmap is in fact to allow for non-destructive (i.e. without kicking everyone out) live remapping of solar systems. I don't know when this will be a reality, but it's definitely something that we are very interested in.
With such a system in place, when a solar system you're in gets too loaded to play nice with the other solar systems, the load balancer would kick in automatically and you would just pause for a bit and then continue, as if nothing had happened, on a spiffy new node.
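James's hysteresis point can be sketched as a two-threshold trigger. This is only an assumption about how such a balancer might decide when to remap; the thresholds and the RemapTrigger class are invented, and the hard part, moving a live solar system without kicking anyone, is not modelled:

```python
from dataclasses import dataclass


@dataclass
class RemapTrigger:
    """Hysteresis: fire at or above `high`, re-arm only at or below `low`."""
    high: float = 0.90   # hypothetical CPU fraction that requests a remap
    low: float = 0.60    # hypothetical CPU fraction that re-arms the trigger
    armed: bool = True

    def update(self, cpu_load: float) -> bool:
        """Return True when this load sample should request a remap."""
        if self.armed and cpu_load >= self.high:
            self.armed = False
            return True
        if not self.armed and cpu_load <= self.low:
            self.armed = True
        return False


trigger = RemapTrigger()
samples = [0.5, 0.92, 0.95, 0.85, 0.91, 0.55, 0.93]
print([trigger.update(s) for s in samples])
```

The gap between the two thresholds is what keeps a node hovering around 90% from being shuffled back and forth on every sample.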

CCP Atlas

Posted - 2010.08.20 20:50:00 -
[9]
Originally by: Zendoren
Best blog thus far!
However, I would have liked a further explanation of how the server topology will change for TQ with the addition of these nodes. From what I remember, the original setup was Proxy Server -> Load balance server -> sol Server. How will this change with the addition of these nodes?
Also, I would have liked to see a little glimpse of CCP Soundwave's expectations on the potential performance increase with the addition of multi-processor support to the server-side code coupled with these node changes.
This does not change the topology of the cluster at all, and is a perfect fit for its existing layout. From the client's point of view the network is:
Client -> Proxy -> Sol -> SQL Server
The 'Sol' tier can be any node in the cluster, while each client talks to exactly one node in each of the other layers. For the sol nodes, as you saw in Figure 1 in the blog, you maintain a virtual connection to several sols at a time, depending on the request context. It's all transparent to the application logic and pretty nifty and easy to work with. We do need to place certain restrictions on game design in order to maintain this schema, but it's the architecture that Eve was founded upon.
(I'm not mentioning above that there is a hardware load balancer in front of the proxy tier which picks a proxy for you when you connect since that will just confuse the layout)
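One way to picture the "virtual connection to several sols" is a session that resolves a node per request context. In this hypothetical sketch, location requests go to whichever node the cluster map assigns to the current solar system, and character requests are pinned to one of a fixed set of character nodes by character ID. The node IDs, the modulo pinning and the ClientSession class are all assumptions for illustration:

```python
from dataclasses import dataclass, field

# Hypothetical character-node IDs ("half a dozen", as mentioned in the thread).
CHARACTER_NODES = [101, 102, 103, 104, 105, 106]


@dataclass
class ClientSession:
    character_id: int
    solar_system_id: int
    # solar system id -> sol node id; stand-in for the cluster's mapping table
    system_map: dict = field(default_factory=dict)

    def node_for(self, context: str) -> int:
        """Pick the sol node that should service a request in this context."""
        if context == "location":
            return self.system_map[self.solar_system_id]
        if context == "character":
            # The same character always lands on the same node.
            return CHARACTER_NODES[self.character_id % len(CHARACTER_NODES)]
        raise ValueError(f"unknown context: {context}")


session = ClientSession(character_id=42, solar_system_id=1, system_map={1: 7})
print(session.node_for("location"), session.node_for("character"))  # 7 101
```

Pinning by ID is what lets one character's lookups keep hitting the same node's cache, whichever solar system the player is in.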

CCP Atlas

Posted - 2010.08.20 21:16:00 -
[10]
Originally by: Liang Nuren
Awesome dev blog - this should really help a lot. It sounds like you guys are really doing a fantastic job, and I think you're all awesome.
For my own curiosity though:
- Is the bottleneck in the database (finding/updating rows) or in the processing of individual requests (like loading/manipulating objects)? It seems like if it's the second, then this is really an awesome way to handle it.
- If it's the second, is there a single character database or did you distribute characters onto different databases? If you distributed them, is it difficult to move characters between databases for load balancing purposes?
- If you distributed it, is there an archival character database for offline/inactive characters, and perhaps a series of smaller character node databases for logged-in characters which replicate to the master db?
Well, I could talk shop all day, and I probably shouldn't. But I do have a more serious question - it seems to me that the "Jita Inventory System" shouldn't be required to dump someone's stuff in a station. It seems like the interactions that can be had by docked people are limited to trade windows and local chat - neither of which I can imagine being handled by the location node. It seems like it's a perfect place to further distribute. Is this an improvement you guys are planning on making or are there things I don't know about?
I got money on the second, personally.
Also: sorry for the armchair development. A very well written blog that tangentially touches on my area of expertise.
-Liang
Ed: Also, I thought I saw an email on python-dev a couple months back where Guido accepted someone's method of getting rid of the GIL.
We only have a single database, it's easier to scale that up than the sol nodes, and we're already ahead of the curve in terms of what the DB can deliver. We do cache very aggressively on the server, though, and consolidating these character node calls onto half a dozen nodes, rather than servicing them throughout the cluster, does remove a bit of the DB load since we get more cache hits. But, like I said, the DB is not a big issue in this regard today. What this particular change mostly saves us is having to process relatively light and simple calls on a given node.
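The cache-hit effect can be illustrated with a toy simulation. It assumes per-node LRU caches of a fixed size and compares requests scattered randomly over 60 nodes with the same requests pinned by key onto 6 nodes. The node counts, cache size and workload are made up, and nothing in the thread describes how EVE's server cache is actually organized:

```python
import random
from collections import OrderedDict


class LRUCache:
    """Tiny LRU cache; a miss stands in for a DB read."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        """Return True on a hit; on a miss, load the key and evict if full."""
        if key in self.entries:
            self.entries.move_to_end(key)
            return True
        self.entries[key] = None
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # drop least recently used
        return False


def hit_rate(requests, num_nodes, pick_node, capacity=50):
    """Replay requests against one cache per node and return the hit rate."""
    caches = [LRUCache(capacity) for _ in range(num_nodes)]
    hits = sum(caches[pick_node(key)].get(key) for key in requests)
    return hits / len(requests)


rng = random.Random(0)
requests = [rng.randrange(200) for _ in range(5000)]  # 200 hypothetical characters

spread = hit_rate(requests, 60, lambda key: rng.randrange(60))
consolidated = hit_rate(requests, 6, lambda key: key % 6)
print(f"spread over 60 nodes:      {spread:.0%}")
print(f"consolidated onto 6 nodes: {consolidated:.0%}")
```

When every request for a given character lands on the same node, each character costs at most one cold miss; scattered requests keep missing on whichever node they happen to reach.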
The inventory system is what lies at the heart of Jita's CPU cycles, and it's really just a glorified DB cache. Moving items about and interacting with them causes a cascade of all sorts of events that must be handled by the game systems on that node. Therefore it's not really feasible to offload parts of those operations elsewhere.
Market hubs like Jita have the potential for load balancing stations separately from the solar system and other stations. That is something we are currently investigating as a possible 'end-all' fix for Jita. There is a fair bit of game design involved, however, and I'm not making any promises. :-)
Interesting tidbit about Guido-and-the-GIL. I need to google it.

CCP Atlas

Posted - 2010.08.20 21:24:00 -
[11]
Originally by: Herschel Yamamoto
Those are some very impressive graphs you've got there. A few questions. What impact will this have on Jita - what will the new pop cap be? How does this seem to be affecting the jump-in lag that has plagued fleet fighting in recent months? And how will this affect lag in contexts other than people jumping into systems - does it speed things up for people who are in system doing things, or just on system load?
And thanks for a great week of dev blogs, all involved. I even understood like 2/3 of it.
This change isn't going to multiply the number of people we can cram into Jita, but I'm hopeful that it will give us a 10-20% gain in population. We are taking it slow in Jita and have the population cap set at 1500 for now; we will increase it once we see Jita handling that well. We would rather see a lag-free Jita at 1500 than a laggy one at 1800.
This will have a positive impact on jump-in lag for fleets, since many of the calls that slow down jumping are now serviced immediately elsewhere, leaving the location node free to do the important bits. This isn't a fix for jump-in lag, though. We do have some promising actual fixes (serious mitigation, anyway) in the pipeline for the immediate future. More blogs on that soon.
This sort of change will speed up utility functions that don't impact your solar system directly. Your client should seem a bit 'spiffier' when talking to the server. You won't see an fps increase but you don't have to wait as long for things like loading up the map, right clicking on other players and things of that nature. There is also a bit less for the location node to do so it has more buffer for the pew-pew.

CCP Explorer

Posted - 2010.08.20 22:53:00 -
[12]
Originally by: TornSoul
Quote: We have multiple market regions living on a single node and currently four nodes servicing all the market regions. If the load on the market increases we can just increase the number of nodes dedicated to that task and decrease the number of markets on a given node.
When exactly did this happen???
I recall from far back (years) that that was one of the holy grails you were working on. It was my impression (not announced? or me not catching it?) that this hadn't been achieved yet.
Reading the blog, it comes off as if this has been in place for some time (that's how I read it, anyhow). Is this correct, or is it in fact part of the described change(s), i.e. a recent thing?
The market has been run on its own set of nodes for years.
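The scaling knob in the quoted passage (more market nodes, fewer regions per node) can be sketched with a simple round-robin layout. Round-robin is a swapped-in assumption; the thread doesn't say how regions are actually assigned, and a real layout would likely weigh regions by load. The 64 region IDs are placeholders:

```python
def assign_regions(region_ids, num_nodes):
    """Spread market regions round-robin over the dedicated market nodes."""
    layout = {node: [] for node in range(num_nodes)}
    for i, region in enumerate(sorted(region_ids)):
        layout[i % num_nodes].append(region)
    return layout


regions = list(range(1, 65))   # hypothetical: 64 region IDs
for nodes in (4, 8):
    layout = assign_regions(regions, nodes)
    busiest = max(len(r) for r in layout.values())
    print(f"{nodes} market nodes -> at most {busiest} regions per node")
```

Doubling the node count halves the regions each node carries, which is the "add nodes, decrease markets per node" trade-off in the quote.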
Erlendur S. Thorsteinsson
Software Director EVE Online, CCP Games

CCP Atlas

Posted - 2010.08.20 23:09:00 -
[13]
Originally by: Camios
Excellent. What are the next services you are going to "delocalize"? I read in a DevBlog some time ago that typing in local could reduce performance in fleet fights. Does it mean that chats run on the location nodes?
The local chat channel runs on your location node, and in the current chat architecture that's where it needs to be, since the location node is the only node that knows who is in the solar system.
There is not a massive amount of work done in the chat channel though. Typing in local doesn't impact the server much but it does play a role in whether your client recovers or not when your session is hurting.

CCP Atlas

Posted - 2010.08.21 01:11:00 -
[14]
Originally by: Mashie Saldana
With these new nodes, would it be possible to have dedicated AI nodes to bring Sleeper AI to all NPCs in EVE?
Actually, the NPC AI is a perfect example of a system that needs to live on the location node. However, there aren't any outstanding scalability issues with it.
There are no technical reasons why Sleeper AI, or something akin to it, isn't on more or all NPCs; it's a game-mechanics/balancing issue, which is outside my expertise...

CCP Atlas

Posted - 2010.08.21 01:12:00 -
[15]
Originally by: Jim Luc
Originally by: Frug
Dude. A bunch of beige towers leading into an original iMac?
No wonder there's lag issues.
I LOL'd 
I was wondering when someone would comment on that 

CCP Atlas

Posted - 2010.08.21 11:50:00 -
[16]
Originally by: Kolatha
Originally by: Jita Dancer
Why don't you go the other way? When a node gets busy, it remaps ALL THE NON-BUSY SYSTEMS somewhere else? Perhaps not as beneficial as a busy node moving itself, but by definition the non-busy nodes are either empty or have a very small number of players attached, and will (more) likely be able to successfully hand those players off onto a different node, leaving the bulk of players not exposed to a "high load handover"... Just a thought.
I would think this would be the better option. When a system starts getting busy it means something is going down. If you move that busy system with all the participants ramping up their game you are just asking for trouble, unless you can do it smoothly and seamlessly. Moving the busy node also means you need to make sure you have sufficient idle nodes waiting just for this purpose.
On the other hand I can see how a number of large roving gangs could cause some pretty hefty internal bandwidth usage as they cause system after system to get hit with remapping.
Yes, indeed. Today we do either, depending on the situation. Sometimes the system itself is moved and sometimes the other systems, leaving the fight alone.
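The post says the choice depends on the situation without giving the criteria. A hypothetical heuristic in the spirit of Kolatha's suggestion would compare how many players each option hands off; plan_remap and its inputs are invented for illustration, and it deliberately ignores spare-node capacity and internal bandwidth, both of which the quoted posts raise:

```python
def plan_remap(systems, busy_id):
    """
    Choose between moving the busy system and evacuating its quiet
    neighbours, picking whichever option hands off fewer players.
    `systems` maps solar system id -> current player count on the node.
    """
    busy_players = systems[busy_id]
    others = {sid: n for sid, n in systems.items() if sid != busy_id}
    quiet_players = sum(others.values())
    if quiet_players < busy_players:
        return ("move_others", sorted(others))
    return ("move_busy", [busy_id])


# A fleet fight in system 1 sharing a node with three quiet systems.
print(plan_remap({1: 400, 2: 3, 3: 0, 4: 12}, busy_id=1))
```

Here the quiet systems hold 15 players against 400 in the fight, so the sketch leaves the fight alone and moves the neighbours.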

CCP Atlas

Posted - 2010.08.21 11:57:00 -
[17]
Originally by: Tres Farmer
... And why does it take 30+ seconds for the mail window to become responsive after opening it for the first time in the current client session? This just looks broken! What is it waiting for? New mails? Why can't it show the old mails, which have all been cached already and should be saved locally on my HDD, right away with the menu on the left? And then, if new ones come in, have it show them. Making me wait for this is not user friendly! ...
This doesn't sound like the way it should work. Can you submit a bug report for us? Mention me please so that I get it.

CCP Masterplan
C C P Alliance

Posted - 2010.08.30 18:01:00 -
[18]
Originally by: Trebor Daehdoow
Originally by: ELECTR0FREAK
This is what happens when the Devs drink on the job. We get ships that look like the Moa or the Dominix and blogs get reposted.
Actually, you have it exactly wrong -- Ships like the Dominix are what you get when devs are stone-cold sober.
For the Icelandic subspecies of dev, at least, the question is not whether they should be intoxicated, but what level of inebriation is appropriate for a particular task; for example, it is clear that game designers do their best work when falling down drunk, whereas the lag team is most effective when they've just had one or two shots "to take the edge off".
The exact blood-alcohol levels needed for optimum dev productivity are a subject of intensive ongoing research at CCP (and the subject of an upcoming devblog by CCP Tequila), and there are rumors that breathalyzer authentication modules will soon be added to all workstations to prevent the devs from logging in when they are under-medicated.
I think this is what you're looking for...