
CCP Fallout

|
Posted - 2010.06.16 12:00:00 -
[1]
On Wednesday, June 23, 2010 between 0900 and 1500 UTC, Tranquility will be down while we move the cluster. CCP Yokai's new dev blog gives all the details on the move and improvements to the cluster.
By the way, this is Yokai's first dev blog. Some of you may remember him from Fanfest 2008, when he and a pick-up team (Bubblegum!) made it all the way to second place in the mining PvP tournament :D
Fallout Associate Community Manager CCP Hf, EVE Online |
|
|

CCP Yokai

|
Posted - 2010.06.16 12:55:00 -
[2]
"Just try to keep use abit more in the loop would be great."
That is the plan. There are lots of exciting things brewing in Virtual World Operations right now...
As I mentioned, upcoming blog posts will talk about what we are doing for:
-Remapping EVE
-Next Level Fleet Fights
-Hot Spot Prediction
|
|
|

CCP Valar

|
Posted - 2010.06.16 13:33:00 -
[3]
Originally by: Trick Novalight
(1x72GB hd? No raid 5? Seems like setting up 15k rpm HDs in raid 5 or raid 10 would help increase the read/write of the SQL database...)
1x72GB hard disk in each application server... and those have nothing to do with the SQL Server. The EVE server does almost no I/O so disk performance on the application servers is of no concern.
---- Senior Virtual World Database Administrator Operations department CCP Games |
|
|

CCP Yokai

|
Posted - 2010.06.16 14:20:00 -
[4]
"Did you guys consider the Cisco UCS for the blade servers?"
We have a great relationship with Cisco. They have some very cool toys, and we try to keep our eyes open for anything that makes TQ better. That being said, the IBM blades have been so good, and the IBM team is working hand in hand with our team to make TQ better. We never stop looking for the best.
The UCS solution is very good for virtualized environments and for solutions where many if not most of the servers need lots of connection types (Fibre Channel, Gig-E, InfiniBand, etc.). The EVE code is quite amazing in that we can get 60,000-plus players on 64 servers with only Gig-E connectivity. Do some research and see how many servers someone like a game about secondary life needs to operate at that level.
Not to keep pimping future blogs... but the next one is all about how we map EVE's 7929 solar systems (w/wormholes) onto those 50 or so nodes that handle solar systems and make sure the one you are playing in has the correct load. That's where we'll make the most noticeable impact on performance in the short term.
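To make the mapping idea concrete, here is a minimal sketch of one common balancing approach: place the busiest systems first, each on whichever node currently carries the least load. This is an assumption for illustration, not CCP's actual method; the function name, the load figures, and the node count are all hypothetical.

```python
import heapq

def assign_systems(system_load, node_count):
    """Greedy balancing: busiest systems first, each placed on the
    currently least-loaded node. Hypothetical helper, not CCP tooling."""
    # Min-heap of (total_load, node_id) so the lightest node pops first.
    heap = [(0.0, node_id) for node_id in range(node_count)]
    heapq.heapify(heap)
    assignment = {}
    by_load = sorted(system_load.items(), key=lambda kv: kv[1], reverse=True)
    for system, load in by_load:
        total, node_id = heapq.heappop(heap)
        assignment[system] = node_id
        heapq.heappush(heap, (total + load, node_id))
    return assignment

# Made-up load figures, for illustration only.
loads = {"Jita": 90.0, "Amarr": 40.0, "Rens": 30.0, "Dodixie": 25.0, "Hek": 15.0}
mapping = assign_systems(loads, node_count=3)
```

With these made-up numbers the heaviest system ends up alone on its node while the quieter systems share the other two, which is the effect a dedicated node is meant to achieve.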
|
|
|

CCP Yokai

|
Posted - 2010.06.16 14:38:00 -
[5]
"why not have a huge VM array?"
We get this question a lot and the answer is pretty simple. Think of a server, even a very big one, as a loaf of bread. Each time you make a slice you leave some crumbs behind (the overhead of VMs). No matter how small or efficient the slicing, the fact is you don't get the peak capacity you could if it were dedicated to the one service.
In Eve we already virtualize, so to speak, by distributing solar systems onto servers based on usage data. But we don't need the overhead of many of those popular virtualization software providers when we do need to dedicate a node to Jita, Fleet Fights, etc. So, in some ways... Eve is very virtualized and very good at it.
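The loaf-of-bread argument can be put as rough arithmetic. The sketch below assumes a fixed overhead per VM slice; the 100-unit server and the 3-unit overhead are invented figures, not measurements from TQ.

```python
def usable_capacity(raw_capacity, slices, overhead_per_slice):
    """Capacity left for the workload after every VM slice pays a fixed
    overhead (the "crumbs"). Figures are illustrative, not measured."""
    lost = slices * overhead_per_slice
    return max(raw_capacity - lost, 0.0)

# One dedicated service versus the same server cut into four VMs.
dedicated = usable_capacity(100.0, slices=0, overhead_per_slice=3.0)
virtualized = usable_capacity(100.0, slices=4, overhead_per_slice=3.0)
```

Under these assumptions the dedicated server keeps all 100 units for the one service, while the sliced server has 88 units left in total, and no single slice can reach the full peak.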
|
|
|

CCP Yokai

|
Posted - 2010.06.16 15:51:00 -
[6]
Edited by: CCP Yokai on 16/06/2010 15:53:07
"Software efficiency is ALWAYS better than throwing more hardware at the problem."
I am a fan of this comment :)
Yes, but we try not to limit our efforts to just one source. I'm not a programmer, but the team I work with is focused on making sure the code that does get deployed does not suffer interference from limited or weak infrastructure design.
|
|
|

CCP Yokai

|
Posted - 2010.06.16 16:03:00 -
[7]
Liorah,
VMware and similar solutions are pretty damn cool for that kind of thing. Again, not that it's a bad idea, but there are complexities to moving sessions around on the virtual nodes, even in seconds.
Right now we have some very dedicated guys that make sure the systems get reallocated, and part of the tools I'm talking about in "Predicting Hot Spots" is all about knowing where and when to switch nodes to dedicated status and making it completely automated.
The ease-of-use trade-off with virtualization is just not as high on our list as getting fights bigger... we are at or near the limits of Moore's Law and I don't expect to see us getting CPU to 3x anytime soon... so every % we can protect, we do.
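As a rough illustration of what automated hot-spot detection could look like, the sketch below flags a system for a dedicated node when its recent average player count crosses a threshold. The moving-average rule, the function name, the threshold, and the sample data are all assumptions; the thread does not describe how the real tools decide.

```python
def predict_hot_spots(history, threshold, window=3):
    """Return systems whose recent average player count meets or exceeds
    the threshold. history maps system name -> counts, oldest first.
    Hypothetical rule, not the actual "Predicting Hot Spots" tooling."""
    hot = []
    for system, counts in history.items():
        recent = counts[-window:]
        if recent and sum(recent) / len(recent) >= threshold:
            hot.append(system)
    return sorted(hot)

# Invented player counts for three systems.
history = {
    "Jita": [900, 950, 1000],
    "Quiet system": [5, 8, 4],
    "Fleet fight system": [50, 400, 750],
}
dedicated = predict_hot_spots(history, threshold=300)
```

A real predictor would likely look at trends and scheduled events rather than a plain average, but the shape is the same: usage data in, a list of systems to move onto dedicated hardware out.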
|
|
|

CCP Yokai

|
Posted - 2010.06.16 16:51:00 -
[8]
I'll set all my accounts to a 6 hour skill just to be sure ;)
|
|
|

CCP Yokai

|
Posted - 2010.06.16 18:39:00 -
[9]
Tobin Shalim - I know that there have been rankings in the past comparing Eve's cluster to other supercomputers/systems in the world. How does the new hardware compare/improve the listing?
Answer - Not even close to the top 500... the lowest one on the list this year was 5136 cores... while cores alone do not quantify performance... 280 vs. 5136 is still pretty far away from that group.
Andrea Griffin - I thought the TQ cluster had many more machines than this!
Answer - It has changed over the years and at one point had a lot more servers, but with multi-core the quantity of servers needed actually goes down even though player numbers and PCU go up. Love progress.
Qoi - Can i visit you and fondle those blades? Just once.
Answer - No, but how about some video and pics? It's on the way.
Dakisha - How come, given the seriousness of lag in 0.0 these days - that you've not upgraded to modern cpu's?
Answer - The list today is, well... from today. We are looking for good reasons to make changes, but they have to make a significant impact. Since peak capacity on a node is so important for fights, 3.33GHz even on an older generation is still very high end. Give me some 10GHz CPUs and I'd be all over it.
Lord XSiV - fire your architect/systems integrator and spend the money on a qualified one.
Answer - I'm the new guy... give me a few days ;) We do have Brocade, and for Cisco it is not as simple as better or worse... it's about finding the right solution... WS6748-ge-tx with DFC3's makes a significant impact on side to side switching. In any case... as mentioned, only so much can be done on the internal network to help. Clock speed is still the big issue today.
|
|
|

CCP Yokai

|
Posted - 2010.06.17 18:58:00 -
[10]
CrazzyElk - Is there an approximate guesstimate on the ETA of the next blog.
Answer - Nothing exact, but I am going to go out and say weeks, not months. Just have some things to get done first so we can talk about what we are doing, not what we might do.
Hawk TT - Correct me if I am wrong, but SOL nodes should benefit from Intel QPI, larger & faster caches, more memory, more memory channels & bandwidth, Intel Turbo-Boost feature for single-threaded apps?
Answer - More than likely. Again, something we are in active conversations about with both IBM and Intel.
Hawk TT - Why not "Boot from SAN"?
Answer - Right now the ease of management is not a big issue. It's a fairly small number of servers, the data on the local disk is nothing important, and honestly we are using high-end (lots of MTBF) hard drives, so failures are few and far between. Not that it isn't easier/better booting from HBA/LUN0, but it's just quite a few items down the "to do" list.
Vahz Rex - Yokai, you briefly mention management tools in the dev blog, any chance you will cover this more in future blogs?
Answer - Yes, Remapping Eve is really about the tools we have to control what solar systems go where and how to balance. I'll talk to the software guys to see just how deep we can go down that rabbit hole.
Hustomte - Is it possible to get this very important announcement added to the Eve-Gate calendar?
Answer - Great idea... Sent a note off to community just now!
Dacil Arandur - What are the chances something like that could happen for this big move? (webcam for TQ move)
Answer - Sounds like fun, but have you seen us? Probably not the most exciting thing on cam and I couldn't handle the rejection if no one watched.
Lee Dalton - What are the RAM SAN and SSD SAN dedicated to?
Answer - Both are dedicated to the DB
joe hamil - ty for the insight into your end CCP, i will keep on turning up to mass testing as often as i can
Answer - Thank you! I know we always need more people on test servers... and it really does help.
Anikadir - haven't seen a covered cool aisle like that before. Do they make much difference?
Answer - Yes, it does. When you have unmanaged airflow... you really have a lot of waste, because you have to overcool the space. You can also "short cycle" the cooling systems and really screw things up. By doing a contained cold aisle, you get "some" control. As another post mentioned, it's not perfect or 100% controlled because we do depend on the datacenter to do their job, but I think we are confident in the datacenter we are working with given 6 years of history.
|
|
|
|

CCP Yokai

|
Posted - 2010.06.19 09:35:00 -
[11]
Koronos - As a network and datacenter administrator and all-around nerd myself I am quite geeked up by these improvements, but I'm pretty sure they won't resolve the horrible fleet lag issues we are experiencing now on TQ.
Response - I agree, and I hope I was not unclear here at all. We are going to improve everything we can to give the software that runs Eve the breathing room it needs. This move will make sure TQ has the space, power and cooling as well as a big step up in switching. It will not single-handedly fix all lag forever. CCP is not single-handed either. There are tons of people spending all day working on lag, tuning, etc. The ops guys are just doing our small part to help.
|
|
|

CCP Yokai

|
Posted - 2010.06.19 09:48:00 -
[12]
Lanu - Your my new favorite dev if you keep up with the blogging (+photo's and vids!!)!
Answer - Camera is already packed in the bags for London. The new space is dead sexy. We already have a lot of pics of the space empty... we'll get a few out with TQ actually alive in there.
|
|
|

CCP Yokai

|
Posted - 2010.06.19 16:30:00 -
[13]
Originally by: Batolemaeus
Yeah, but remember that it was a _lot_ better before Dominion. People just want back to the time where you could have lag free battles with 300 people.
There are things we are hoping to do to help smaller battles get better access to dedicated hardware quicker...
The upcoming blogs on predicting hot spots and remapping will talk about part of where we are going with this and the plans for the summer cycles.
|
|
|

CCP Yokai

|
Posted - 2010.06.20 12:23:00 -
[14]
Originally by: Budsin Adar
whats moving a cluster going to do to help with eve??
TheLostPenguin said it better than I did... "Moving it is going to do a few things, notably improving cooling and allowing more space for the server"
|
|
|

CCP Yokai

|
Posted - 2010.06.23 08:50:00 -
[15]
It begins... see you guys/gals in 6 hours.
|
|
|

CCP Yokai

|
Posted - 2010.06.23 22:59:00 -
[16]
Originally by: Darod Zyree
I bett the devs are just f5ing this thread like we all are :D
Almost there guys...
|
|
|

CCP Shadow
C C P C C P Alliance

|
Posted - 2010.06.24 15:48:00 -
[17]
Off-topic posts removed.
|
|
|

CCP Yokai

|
Posted - 2010.06.29 17:10:00 -
[18]
Originally by: carebear one
Edited by: carebear one on 25/06/2010 21:03:13
a tribut to CCP
http://www.eve-outtakes.de/server.jpg
Where are you getting your insider pics from?
|
|
|

CCP Yokai

|
Posted - 2010.06.30 14:01:00 -
[19]
Originally by: smashnecros
somebody from ccp promised write new dev blog about issues during TQ upgrade. it'll be interesting to read it.
Up now! http://www.eveonline.com/ingameboard.asp?a=topic&threadID=1345417
|
|
|
|
|