
CCP Phantom
C C P C C P Alliance
6159

|
Posted - 2015.10.13 15:54:00 -
[1] - Quote
The Tranquility server cluster is a powerful machine, enabling you to create the biggest living universe of science fiction with the most massive spaceship battles mankind has ever seen.
But you know what is even better than Tranquility? Tranquility Tech III!
Our engineers are working hard to fully revamp the server cluster with new hardware, with new storage, with new network connections, with a new location and new software. TQ Tech III will be much better than the already astonishing current TQ server.
Read more about this marvel of technology (including tech specs and pictures) in CCP Gun Show's latest blog Tranquility Tech III.
And all this is planned for very early 2016! EVE Forever!
CCP Phantom - Senior Community Developer - Volunteer Manager
|
|
|

CCP Falcon
12460

|
Posted - 2015.10.13 16:02:53 -
[2] - Quote
DELICIOUS!
CCP Falcon || EVE Universe Community Manager || @CCP_Falcon
Happy Birthday To FAWLTY7! <3
|
|

Ned Thomas
Signal Cartel EvE-Scout Enclave
1799
|
Posted - 2015.10.13 16:11:54 -
[3] - Quote
Man, we don't need more T3 shi- oh, I get it now. |

Zand Vor
Anomalous Existence Low-Class
13
|
Posted - 2015.10.13 16:19:42 -
[4] - Quote
I'm a super network geek....I really want to know what router, load balancer, and switch platforms you switched to since it sounds like you ditched Cisco.
Oh well, this is a great article and it's awesome to see just a glimpse of how all this infrastructure is designed to work together.
Thank you! |

Niraia
Nocturnal Romance Cynosural Field Theory.
345
|
Posted - 2015.10.13 16:30:40 -
[5] - Quote
[16:22:20] Niraia > /emote moistens
Thank you for posting this, very interesting read! I've been in love with ESXi for the past year, nice to know I'm in good company :)
Niraia
EVE Online Hold'Em
|
|

CCP FoxFour
C C P C C P Alliance
4142

|
Posted - 2015.10.13 16:32:26 -
[6] - Quote
Zand Vor wrote:I'm a super network geek....I really want to know what router, load balancer, and switch platforms you switched to since it sounds like you ditched Cisco.
Oh well, this is a great article and it's awesome to see just a glimpse of how all this infrastructure is designed to work together.
Thank you!
Will ask if they mind sharing said information.
@CCP_FoxFour // Technical Designer // Team Size Matters
Third-party developer? Check out the official developers site for dev blogs, resources, and more.
|
|

Rebekah Aivo
Alternate Mining Solutions
0
|
Posted - 2015.10.13 16:34:53 -
[7] - Quote
I don't know that I'd say that they ditched Cisco, just that they ditched Cisco at the LB layer.
I'd imagine that they're probably using Cisco Nexus switches for the core switching, and beefy F5s (I'd guess Viprion over LTM, but...).
Definitely some good geek porn here for sure :D
I hope to GOD we're using ESXi 5.5... it's the hotness. |

VaL Iscariot
The Concilium Enterprises The Volition Cult
82
|
Posted - 2015.10.13 16:35:44 -
[8] - Quote
the whole time I was reading this dev blog
|

Raphendyr Nardieu
Unpublished Chapter Chapters.
57
|
Posted - 2015.10.13 16:50:39 -
[9] - Quote
OMG, amazing blog. Nice that you added so many specifics.
I hope you get the virtualization working. Would provide nice benefits :) |

CCP Gun Show
C C P C C P Alliance
0
|
Posted - 2015.10.13 16:52:10 -
[10] - Quote
Rebekah Aivo wrote:I don't know that I'd say that they ditched Cisco, just that they ditched Cisco at the LB layer.
I'd imagine that they're probably using Cisco Nexus switches for the core switching, and beefy F5s (I'd guess Viprion over LTM, but...).
Definitely some good geek porn here for sure :D
I hope to GOD we're using ESXi 5.5... it's the hotness.
We are actually going with version 6.0 on ESXi  |

Tappits
North Eastern Swat Pandemic Legion
186
|
Posted - 2015.10.13 16:54:56 -
[11] - Quote
So in like 6 months we will find out the Drifter AI somehow escapes and skynets the EVE cluster and takes over the world.... I would like to welcome our new drifter overlords. |

BlitZ Kotare
Sniggerdly Pandemic Legion
134
|
Posted - 2015.10.13 17:00:20 -
[12] - Quote
Neat article, thanks for sharing some details with us. I'm especially looking forward to those bigger routing tables. Keep up the good work. |

ImYourMom
Republic University Minmatar Republic
90
|
Posted - 2015.10.13 17:00:58 -
[13] - Quote
Very impressive.. and for people who don't know, this is some serious investment. I can imagine how excited you guys are, just don't get toooo excited  |

Tiagra May
GoonWaffe Goonswarm Federation
14
|
Posted - 2015.10.13 17:03:07 -
[14] - Quote
+1 for Ragnarok in London |

DreadStarX
Fatum Imperium
2
|
Posted - 2015.10.13 17:03:45 -
[15] - Quote
This is absolutely fantastic. As someone who works in a Data Center, everything you said made perfect sense. However, not everyone has a clue what you're talking about.
I do have to ask, why not HP Blades? I laugh at the whopping 768GB of RAM, we've got Dell 2U Servers rolling 1TB of Physical Memory. As for WHY they need that amount, I'm clueless. They're probably running a Private Server of EVE and told no one. Darn those Goons ;)
Anyways, I'm impressed that CCP has actually taken the time to upgrade their hardware, and aren't like most of these companies who have the stance of "Oh, they want upgrades? But they keep shelling out money, and playing. So why should we care?"
Props to CCP, and for listening to Skalmold! It's my Friday Jams at work! |

Rebekah Aivo
Alternate Mining Solutions
0
|
Posted - 2015.10.13 17:07:27 -
[16] - Quote
CCP Gun Show wrote:Rebekah Aivo wrote:I don't know that I'd say that they ditched Cisco, just that they ditched Cisco at the LB layer.
I'd imagine that they're probably using Cisco Nexus switches for the core switching, and beefy F5s (I'd guess Viprion over LTM, but...).
Definitely some good geek porn here for sure :D
I hope to GOD we're using ESXi 5.5... it's the hotness. We are actually going with version 6.0 on ESXi 
Even better!! Most of the virtual environments that I manage are 5.5, looking to upgrade early next year, tho :D |

ArmEagle Kusoni
Knights of Nii The 20 Minuters
43
|
Posted - 2015.10.13 17:08:09 -
[17] - Quote
' ' = = = whoooosh > > - - - - - - - - - - - - - - - - - - head here
Did the old routers already or not (and if the latter; will the new ones) support IPv6? |

Chainsaw Plankton
IDLE GUNS IDLE EMPIRE
1890
|
Posted - 2015.10.13 17:10:27 -
[18] - Quote
I don't know what most of that meant but it sounded good! I always did like bigger numbers 
@ChainsawPlankto
|

Gilbaron
Free-Space-Ranger Northern Coalition.
1827
|
Posted - 2015.10.13 17:14:12 -
[19] - Quote
CCP Gun Show wrote:Rebekah Aivo wrote:I don't know that I'd say that they ditched Cisco, just that they ditched Cisco at the LB layer.
I'd imagine that they're probably using Cisco Nexus switches for the core switching, and beefy F5s (I'd guess Viprion over LTM, but...).
Definitely some good geek porn here for sure :D
I hope to GOD we're using ESXi 5.5... it's the hotness. We are actually going with version 6.0 on ESXi 
how come you don't have a dev tag? |

Emmy Mnemonic
Svea Rike Circle-Of-Two
51
|
Posted - 2015.10.13 17:21:52 -
[20] - Quote
Awesome post! But with all those fancy acronyms and cool h/w you REALLY should consider the possibility that TQ T3 will go self-aware and obliterate all known life on earth...
CEO Svea Rike
|

Arline Kley
PIE Inc. Praetoria Imperialis Excubitoris
627
|
Posted - 2015.10.13 17:23:50 -
[21] - Quote
Delicious, delicious tech filth.
I'll have a word with my boss about hopefully getting our servers updated to ESXi version 6... and goddamn what I would give to work there.
"For it was said they had become like those peculiar demons, which dwell in matter but in whom no light may be found." - Father Grigori, Ravens 3:57
|

Master Degree
I dont want to pay taxes
1
|
Posted - 2015.10.13 17:35:44 -
[22] - Quote
As an IT pro, from experience I can tell you that a high-IO SQL DB running in an MS failover cluster on VMware is not the best choice; rather go SQL AlwaysOn. More storage is needed, agreed, but failovers are much easier (and much faster), and you can replicate more times, e.g. active, replica 1, replica 2, etc. You can use one of the replicas for reads and not bother the active writing DB with those operations. The only thing that can be a problem is switching the listener between nodes during a sudden HW crash or vMotion (MAC address conflict in VMware 5.0; hope they fix it in 6.0 while running vMotion on loaded hosts).
Or eventually switch to Hyper-V (Core preferably, due to patches); the license is cheaper than ESX(i), but the downside is that Hyper-V is at least two feature releases behind VMware (if you don't pay huge money for SCVMM).
Just my 5 cents, I assume you did the math already :-)
PS: really nice HW, just vendor is not one of my favorites :) |

Beta Maoye
81
|
Posted - 2015.10.13 17:44:07 -
[23] - Quote
Awesome hardware.
Is TQ T3 ready to host hundreds of player built structures in every star system of New Eden? |
|

Chribba
Otherworld Enterprises Otherworld Empire
14965
|
Posted - 2015.10.13 17:46:16 -
[24] - Quote
Nom nom nom! Really yummy! Great work guys!
/c
Secure 3rd party service
Visit my in-game channel 'Holy Veldspar'
Twitter @Chribba
|
|
|

CCP Gun Show
C C P C C P Alliance
1

|
Posted - 2015.10.13 17:52:14 -
[25] - Quote
Gilbaron wrote:CCP Gun Show wrote:Rebekah Aivo wrote:I don't know that I'd say that they ditched Cisco, just that they ditched Cisco at the LB layer.
I'd imagine that they're probably using Cisco Nexus switches for the core switching, and beefy F5s (I'd guess Viprion over LTM, but...).
Definitely some good geek porn here for sure :D
I hope to GOD we're using ESXi 5.5... it's the hotness. We are actually going with version 6.0 on ESXi  how come you don't have a dev tag?
Fixed, as this is my first blog and the first time I've had my account active on the forums, so there was a tiny hiccup in the registration  |
|

l0rd carlos
TURN LEFT The Camel Empire
1254
|
Posted - 2015.10.13 18:08:00 -
[26] - Quote
Quote:What you see here are 2x IBM SAN volume controllers which govern and control 2x IBM V5000 controllers which store all the data with 3x expansion shelves that house 9x800 GB SSD's with a grand total of 83x 1.2TB 10K SAS disks.
I don't get this. The sentence make it sound like the SSDs have SAS disks build in :D That can't be right.
Youtube Channel about Micro and Small scale PvP with commentary: Fleet Commentary by l0rd carlos
|

Amarisen Gream
Divine Demise Apocalypse Now.
137
|
Posted - 2015.10.13 18:14:51 -
[27] - Quote
my poor little brain cells now hurt.
but i'm drooling with excitement.
xoxo
Amarisen Gream
|

TheMercenaryKing
Ultimatum. The Bastion
373
|
Posted - 2015.10.13 18:16:58 -
[28] - Quote
l0rd carlos wrote:Quote:What you see here are 2x IBM SAN volume controllers which govern and control 2x IBM V5000 controllers which store all the data with 3x expansion shelves that house 9x800 GB SSD's with a grand total of 83x 1.2TB 10K SAS disks. I don't get this. The sentence make it sound like the SSDs have SAS disks build in :D That can't be right.
Likely it is a tiered system with 9 SSDs and 83 10k drives. |

virm pasuul
FRISKY BUSINESS. No Handlebars.
317
|
Posted - 2015.10.13 18:18:55 -
[29] - Quote
Thanks for the tech porn :)
Still single thread server code?
|

virm pasuul
FRISKY BUSINESS. No Handlebars.
317
|
Posted - 2015.10.13 18:21:43 -
[30] - Quote
I have fallen in love with ESXi 5.5. My Windows boxes run better on 5.5 than on metal; that still amazes me. If I had a time machine and could go back in time and tell old me that Windows Server runs better on ESX than on bare metal, old me would call bull and refuse to believe time-travelling me, but it's true. |

Guillame Herschel
Quantum Cats Syndicate Spaceship Bebop
70
|
Posted - 2015.10.13 18:23:12 -
[31] - Quote
Quote:...what could possibly go wrong?!
Beware of vSwitch. |
|

CCP Gun Show
C C P C C P Alliance
2

|
Posted - 2015.10.13 18:23:38 -
[32] - Quote
TheMercenaryKing wrote:l0rd carlos wrote:Quote:What you see here are 2x IBM SAN volume controllers which govern and control 2x IBM V5000 controllers which store all the data with 3x expansion shelves that house 9x800 GB SSD's with a grand total of 83x 1.2TB 10K SAS disks. I don't get this. The sentence make it sound like the SSDs have SAS disks build in :D That can't be right. Likely it is a tiered system with 9 SSDs and 83 10k drives.
yes its tiered so hot data resides on SSD's and SAS takes the cold data
does that answer your question ?
 |
|

virm pasuul
FRISKY BUSINESS. No Handlebars.
317
|
Posted - 2015.10.13 18:28:49 -
[33] - Quote
One thing to watch out for CCP is default memory interleaving settings in the BIOS. I don't know if this is true for IBM, but for Dell out of the box, memory interleaving is turned off, which drops memory performance by a factor of 3 on quad path interconnect processors ( 3 not 4 because one is processor to processor ). Also Dell default mem config is NOT optimised to use interleaved memory, so you have to be careful how you choose your physical memory config and load the paths. I imagine IBM are much more on the ball than Dell on this front, but it's worth checking. |

l0rd carlos
TURN LEFT The Camel Empire
1255
|
Posted - 2015.10.13 18:32:41 -
[34] - Quote
CCP Gun Show wrote:yes its tiered so hot data resides on SSD's and SAS takes the cold data does that answer your question ?  Yes, thank you :-*
Youtube Channel about Micro and Small scale PvP with commentary: Fleet Commentary by l0rd carlos
|

Alyxportur
From Our Cold Dead Hands
112
|
Posted - 2015.10.13 18:51:40 -
[35] - Quote
I take it this means that the lag when making contracts, opening the market/any window, etc. will be reduced/gone, but will this also reduce the number of 'Socket Closed' events?
It would be interesting to see numerical statistics (not a line chart or diagram, please) on the instances and magnitude of each TIDI occurrence for (1) a year pre-Phoebe, (2) the time between Phoebe and Aegis, and (3) post-Aegis until now. |

Bienator II
madmen of the skies
3416
|
Posted - 2015.10.13 19:08:37 -
[36] - Quote
i send you a floppy disk to backup my char
how to fix eve: 1) remove ECM 2) rename dampeners to ECM 3) add new anti-drone ewar for caldari 4) give offgrid boosters ongrid combat value
|

tasman devil
HUN Corp. HUN Reloaded
54
|
Posted - 2015.10.13 19:21:30 -
[37] - Quote
CCP Phantom wrote:The Tranquility server cluster is a powerful machine, enabling you to create the biggest living universe of science fiction with the most massive spaceship battles mankind has ever seen. But you know what is even better than Tranquility? Tranquility Tech III! Our engineers are working hard to fully revamp the server cluster with new hardware, with new storage, with new network connections, with a new location and new software. TQ Tech III will be much better than the already astonishing current TQ server. Read more about this marvel of technology (including tech specs and pictures) in CCP Gun Show's latest blog Tranquility Tech III. And all this is planned for very early 2016! EVE Forever!
Many thanks to the folks who wrote this Dev Blog. It is a truly good read!
I do hope it'll be an "OP SUCCESS!" :-)
And do please take pictures.
I don't believe in reincarnation
I've never believed in it in my previous lives either...
|

xrev
EVE Corporation 987654321-POP The Marmite Collective
1
|
Posted - 2015.10.13 20:16:21 -
[38] - Quote
SAN storage enthusiast here :)
Keep in mind, before upgrading from 4 or 8 Gbps to 16 Gbps, to check the available and used buffer credits in combination with the average transmitted frame size. The default FC frame size is supposed to be 2112 bytes and uses 1 buffer credit to transmit. With the increased speed, more buffer credits are needed to fully utilize the line; once they are all in use, nothing will be transmitted until buffers are freed up. I had a customer some time ago who upgraded the line speed and, in combination with synchronous storage replication, production came to a halt because the switches ran out of buffers.
So CCP, please double check the frame sizes and buffer credits before upgrading the speed ;) |
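A rough way to sanity-check the buffer credit concern: each full FC frame in flight consumes one buffer-to-buffer credit, so the credits needed to keep a link streaming scale with line rate times round-trip time. The sketch below is illustrative only; the link length, nominal throughputs and the 2112-byte full-frame assumption are mine, not CCP's figures.

```python
# Rough buffer-to-buffer (BB) credit estimate for a Fibre Channel link.
# Illustrative assumptions: full 2112-byte frames, ~5 us/km one-way
# propagation in fibre, and nominal usable throughput per FC speed.
import math

FRAME_BYTES = 2112              # one full FC frame consumes one BB credit
PROPAGATION_US_PER_KM = 5       # roughly 5 microseconds per km, one way

FC_MB_PER_S = {"4G": 400, "8G": 800, "16G": 1600}   # nominal usable MB/s

def bb_credits_needed(speed: str, link_km: float) -> int:
    """Credits required to keep the link transmitting at full rate."""
    round_trip_s = 2 * link_km * PROPAGATION_US_PER_KM * 1e-6
    bytes_in_flight = FC_MB_PER_S[speed] * 1e6 * round_trip_s
    return math.ceil(bytes_in_flight / FRAME_BYTES)

for speed in ("4G", "8G", "16G"):
    print(speed, bb_credits_needed(speed, link_km=10))   # hypothetical 10 km link
```

Doubling the line rate doubles the credits needed, and frames smaller than 2112 bytes push the requirement up further, which is exactly the pitfall described above.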

Ix Method
Brutor Tribe Minmatar Republic
475
|
Posted - 2015.10.13 20:27:07 -
[39] - Quote
Volcano-powered Singularity.
Yes.
Travelling at the speed of love.
|
|

CCP Gun Show
C C P C C P Alliance
5

|
Posted - 2015.10.13 20:30:35 -
[40] - Quote
Ix Method wrote:Volcano-powered Singularity.
Yes.
we are thinking about renaming Singularity to Eyjafjallajökull 
kidding |
|

Steve Ronuken
Fuzzwork Enterprises Vote Steve Ronuken for CSM
5623
|
Posted - 2015.10.13 20:34:55 -
[41] - Quote
CCP Gun Show wrote:Ix Method wrote:Volcano-powered Singularity.
Yes. we are thinking about renaming Singularity to Eyjafjallajökull  kidding
I'd just like to say: You are a large scary man 
Woo! CSM X!
Fuzzwork Enterprises
Twitter: @fuzzysteve on Twitter
|

Haffsol
40
|
Posted - 2015.10.13 20:59:37 -
[42] - Quote
Quote:[..... bla blah nerdy things....] what could possibly go wrong?! Exactly  |

Bienator II
madmen of the skies
3416
|
Posted - 2015.10.13 21:10:26 -
[43] - Quote
so you will have fewer solar system nodes but they will have more bandwidth and be better connected?
how to fix eve: 1) remove ECM 2) rename dampeners to ECM 3) add new anti-drone ewar for caldari 4) give offgrid boosters ongrid combat value
|

Nafensoriel
KarmaFleet Goonswarm Federation
119
|
Posted - 2015.10.13 21:29:38 -
[44] - Quote
So... since your engineers have decided to stop purchasing our superior minmatar duct tape...
Well actually that's it.. we're screwed. CCP engineers were 90% of our customer base. I guess we can start making server polish?
Seriously though, awesome. Old code's going out the door and now the kludge hardware is too. This is an awesome day for EVE.
Though seriously.. the engineers convinced you to keep the old cluster so they could play Doom on it and have nerdgasms.. admit it. |

virm pasuul
FRISKY BUSINESS. No Handlebars.
318
|
Posted - 2015.10.13 21:29:48 -
[45] - Quote
Bienator II wrote:so you will have fewer solar system nodes but they will have more bandwidth and be better connected?
Hardware or software? I think the nodes are probably virtualised, so divide the total hardware resources by the number of nodes. Virtual stuff, when done properly, can be very efficient. E.g. a big roaming gang hops from one node to another, but these are virtual software nodes; if they are on the same host, the net load on the underlying hardware remains unchanged even though the gang has moved node.
CCP will be able to provision new nodes and drop unused nodes automatically. Also see the load balancing presentation they did a few fanfests ago where they explained their node balancing algorithm in detail. Moving nodes around to do hardware maintenance with virtualisation is a doddle. Nodes can be moved live from hardware host to hardware host whilst still doing active work for clients and not dropping a single packet mid move.
The hardware abstraction from virtualisation, the storage abstraction, along with all the hardware redundancy makes the setup described pretty bulletproof. The only point of failure left now is that little "feature" in the CCP automation system that no one thought could break. Amazon, Google, Microsoft, and pretty much every UK bank have all had unbreakable cloud setups break.
It is an amazing bit of kit that CCP is investing in. There's probably well over seven digits of new hardware there.
Now if only CCP could come up with multi threading server code.... :)
|
|
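As a toy illustration of the node-balancing idea mentioned above (mapping solar systems onto virtualised nodes by expected load), here is a minimal greedy sketch; the system choices, load figures and heuristic are invented for illustration and bear no relation to CCP's actual algorithm.

```python
# Toy greedy balancer: place each solar system on the currently least-loaded
# node, heaviest systems first. Purely illustrative, not CCP's algorithm.
import heapq

def balance(systems: dict, node_count: int) -> dict:
    """systems maps system name -> expected load; returns node id -> [systems]."""
    heap = [(0.0, node) for node in range(node_count)]   # (total load, node id)
    heapq.heapify(heap)
    placement = {node: [] for node in range(node_count)}
    for name, load in sorted(systems.items(), key=lambda kv: kv[1], reverse=True):
        node_load, node = heapq.heappop(heap)
        placement[node].append(name)
        heapq.heappush(heap, (node_load + load, node))
    return placement

print(balance({"Jita": 90.0, "Amarr": 40.0, "Dodixie": 25.0, "J115405": 1.0}, 2))
```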

CCP DeNormalized
C C P C C P Alliance
302

|
Posted - 2015.10.13 21:40:59 -
[46] - Quote
Master Degree wrote:as a IT pro, from experience i can tell that high IO SQL DB running in M$ Failover cluster @ vmware is not the best choice, rather go SQL always on, more storage needed, agree, but failovers are much easier (and much faster).. and you can replicate more times eg active, replica 1, replica 2 etc, can use one of the replica for reads and dont bother with operations on active writting db .. only thing what can be problem is switching of listener between nodes during sudden HW crash or vmotion (MAC address conflict in vmware 5.0, hope they fix it in 6.0 while running vmotion on loaded hosts)
eventually switch to hyper-v (core preferably due patches), license is cheaper as esx(i), but the downside is, that hyper-v is with features at least two releases behind vmware (if you dont pay huge money for scvmm)
just my 5 cents, i assume you made the math already :-)
PS: really nice HW, just vendor is not one of my favorites :)
thx for the comment and info MD!
I hear you on the VMWare possibly not being the best choice as there is definitely overhead involved (both in I/O resources as well as licensing costs!). We'll do some testing and see the impact it has, and if we don't get to where we want with it, it's out! :)
In regards to AlwaysOn we'll be using this on top of whatever route we go w/ the cluster. This will be our primary replication method for keeping both our DRS in sync as well as offering live reporting services to internal users.
CCP DeNormalized
DBA
Virtual World Operations
|
|
|

CCP DeNormalized
C C P C C P Alliance
302

|
Posted - 2015.10.13 21:49:41 -
[47] - Quote
Steve Ronuken wrote:CCP Gun Show wrote:Ix Method wrote:Volcano-powered Singularity.
Yes. we are thinking about renaming Singularity to Eyjafjallajökull  kidding I'd just like to say: You are a large scary man 
This doesn't become really really true until you spend 2 days of heavy drinking in the middle of the icelandic wilderness with the man...
"Don't wake the Balrog!" Is a slogan we force all new Operations team members to learn very early on :)
Ops Offsite best offsite!
CCP DeNormalized - Database Administrator
|
|

Gospadin
Bastard Children of Poinen Grumpy Space Bastards
242
|
Posted - 2015.10.13 21:52:27 -
[48] - Quote
I'm shocked that a system designed to deploy in 2016 is even using rotating drives. That data must be REALLY cold. |

TigerXtrm
KarmaFleet Goonswarm Federation
1287
|
Posted - 2015.10.13 22:03:50 -
[49] - Quote
No worries people. EVE is still dying on schedule. That's why they are pumping I don't even know how many hundreds of thousands of dollars into new server hardware. Because if it's going to die, it's going to die in style 
My YouTube Channel - EVE Tutorials & other game related things!
My Website - Blogs, Livestreams & Forums
|

xrev
EVE Corporation 987654321-POP The Marmite Collective
1
|
Posted - 2015.10.13 22:10:42 -
[50] - Quote
Gospadin wrote:I'm shocked that a system designed to deploy in 2016 is even using rotating drives. That data must be REALLY cold. It's called auto-tiering. The hot storage blocks reside on the fast SSD's or the internal read cache. When blocks of data aren't touched, they move to slower disks that are still more cost effective if you look to volume for your buck. Compared to Ssd's, hard disks suck at random i/o but serial streams will do just fine. |
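To make the auto-tiering idea concrete, here is a minimal sketch of the per-extent decision such an engine makes: the hottest extents get promoted to the SSD tier until it is full, everything else stays on SAS. The extent model and heat metric are invented for illustration; the real Easy Tier logic in the SVC/V5000 is far more sophisticated.

```python
# Minimal block-level auto-tiering sketch: promote the hottest extents to SSD,
# demote the rest to 10K SAS. Illustrative only, not IBM's Easy Tier algorithm.
from dataclasses import dataclass

@dataclass
class Extent:
    extent_id: int
    io_count: int        # I/Os observed in the last sampling window ("heat")
    tier: str = "sas"    # current placement: "ssd" or "sas"

def rebalance(extents, ssd_capacity_extents):
    """Rank extents by heat and fill the SSD tier from the top."""
    ranked = sorted(extents, key=lambda e: e.io_count, reverse=True)
    for i, ext in enumerate(ranked):
        ext.tier = "ssd" if i < ssd_capacity_extents else "sas"

# Example: six extents, room for two on SSD -- the two busiest get promoted.
pool = [Extent(i, heat) for i, heat in enumerate([5, 900, 30, 1200, 0, 45])]
rebalance(pool, ssd_capacity_extents=2)
print([(e.extent_id, e.tier) for e in pool])
```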

Bienator II
madmen of the skies
3416
|
Posted - 2015.10.13 22:30:22 -
[51] - Quote
virm pasuul wrote:Bienator II wrote:so you will have fewer solar system nodes but they will have more bandwidth and be better connected? Hardware or software? http://i.imgur.com/xCjjFc9.png
how to fix eve: 1) remove ECM 2) rename dampeners to ECM 3) add new anti-drone ewar for caldari 4) give offgrid boosters ongrid combat value
|

Cor'len
The Silence of Thunder
10
|
Posted - 2015.10.13 23:13:39 -
[52] - Quote
Bienator II wrote: Thats prob why ccp seems to see MT as low priority atm.
Actually, CCP would love to multithread the ~space code~ (can't remember the component name, haha). But it's practically impossible to get a consistent result; operations must be done in sequence, otherwise you get dead ships killing living ships, and other ~exciting~ edge cases.
This is the ultimate limiter on EVE performance. They might conceivably be able to MT the processing of different grids in a single system, but everything that happens on a single grid must execute in a deterministic fashion, and in the correct order.
Plus, even if that wasn't a problem, they run Stackless Python, with the beloved global interpreter lock which effectively prevents multithreading.
tl;dr CCP wants to multithread all the things, but it's so hard it's bordering on impossible. Hence the effort to not have big fights. |
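A tiny sketch of the constraint described above, under the assumption (mine, for illustration only) that separate grids share no state: events within one grid are applied strictly in order, while whole grids could be farmed out to separate worker processes to sidestep the interpreter lock. This is not CCP's Destiny code, just the shape of the idea.

```python
# "Deterministic within a grid, parallel across grids" -- hypothetical model,
# not CCP's actual simulation code.
from multiprocessing import Pool

def simulate_grid(grid):
    """Apply one grid's events strictly in order (the determinism requirement)."""
    grid_id, events = grid
    hp = {}
    for ship, delta in events:   # applied in submission order, same result every run
        hp[ship] = hp.get(ship, 100) + delta
    return grid_id, hp

if __name__ == "__main__":
    grids = [
        ("grid-1", [("ship-a", -30), ("ship-b", -100), ("ship-a", -20)]),
        ("grid-2", [("ship-c", -10)]),
    ]
    # Grids are assumed independent, so separate processes avoid the GIL.
    with Pool(processes=2) as pool:
        for grid_id, hp in pool.map(simulate_grid, grids):
            print(grid_id, hp)
```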

Gospadin
Bastard Children of Poinen Grumpy Space Bastards
242
|
Posted - 2015.10.13 23:16:18 -
[53] - Quote
xrev wrote:Gospadin wrote:I'm shocked that a system designed to deploy in 2016 is even using rotating drives. That data must be REALLY cold. It's called auto-tiering. The hot storage blocks reside on the fast SSD's or the internal read cache. When blocks of data aren't touched, they move to slower disks that are still more cost effective if you look to volume for your buck. Compared to Ssd's, hard disks suck at random i/o but serial streams will do just fine.
I know how it works.
It's just interesting to me that TQ's cold data store is satisfied with about 10K IOPS across those disk arrays. (Assuming 200/disk for 10K SAS and about 50% utilization given their expected multipath setup and/or redundancy/parity overhead) |
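For anyone following along, the ~10K figure falls straight out of the stated assumptions (all of them the poster's, not measured values):

```python
# Back-of-the-envelope check of the ~10K IOPS estimate for the SAS tier,
# using only the assumptions stated in the post above.
sas_disks = 83           # 1.2 TB 10K SAS drives from the dev blog
iops_per_disk = 200      # common rule of thumb for a 10K RPM drive
usable_fraction = 0.5    # multipath / redundancy / parity overhead

print(sas_disks * iops_per_disk * usable_fraction)   # 8300.0, i.e. roughly 10K IOPS
```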

Bienator II
madmen of the skies
3416
|
Posted - 2015.10.13 23:54:02 -
[54] - Quote
Cor'len wrote:Bienator II wrote: Thats prob why ccp seems to see MT as low priority atm. Actually, CCP would love to multithread the ~space code~ (can't remember the component name, haha). But it's practically impossible to get a consistent result; operations must be done in sequence, otherwise you get dead ships killing living ships, and other ~exciting~ edge cases.
Splitting tasks up is only one form of parallelism. You can distribute sequential tasks across different compute hardware via pipelining/layering etc.
But the thing is, CCP does not have to do that, since they can already achieve parallelism by simply running multiple processes on the same node. Again: they are running 100+ systems on a single node. All they have to do is run them in N processes instead of 1 (it would not surprise me if they ran every system in its own process, tbh).
Multithreading would only help in the worst-case scenario: the whole EVE population in the same system. And according to CCP even that is not certain, since the bottleneck seems to be memory bandwidth, not computing power.
how to fix eve: 1) remove ECM 2) rename dampeners to ECM 3) add new anti-drone ewar for caldari 4) give offgrid boosters ongrid combat value
|

Berahk
Lightweight Dynamics
0
|
Posted - 2015.10.14 00:13:40 -
[55] - Quote
So, a few questions:
How much closer does this server setup bring us to never needing downtime?
Also
How much closer are we to being able to fail over a tremendously busy system onto one of the combat nodes without having to wait until the following downtime (or booking it in advance)?
Thanks
/b |

Mara Rinn
Cosmic Goo Convertor
5835
|
Posted - 2015.10.14 00:56:01 -
[56] - Quote
Berahk wrote:How much closer does this server setup bring us to never needing downtime?
Most important question in the thread :D
http://community.eveonline.com/news/dev-blogs/death-to-downtimes/
Day 0 Advice for New Players
|

Alundil
Isogen 5
1034
|
Posted - 2015.10.14 01:45:01 -
[57] - Quote
Raphendyr Nardieu wrote:OMG, amazing blog. Nice that you added so many specifics.
I hope you get the virtualization working. Would provide nice benefits :) Came to say this. Excellent article. Vmotion on terrific hardware is sweet sweet sweet. We use this in our 20000 user environment to great effect.
Keep up the great work.
I'm right behind you
|

Shamwow Hookerbeater
Nine Inch Ninja Corp
0
|
Posted - 2015.10.14 03:26:38 -
[58] - Quote
Gospadin wrote:xrev wrote:Gospadin wrote:I'm shocked that a system designed to deploy in 2016 is even using rotating drives. That data must be REALLY cold. It's called auto-tiering. The hot storage blocks reside on the fast SSD's or the internal read cache. When blocks of data aren't touched, they move to slower disks that are still more cost effective if you look to volume for your buck. Compared to Ssd's, hard disks suck at random i/o but serial streams will do just fine. I know how it works. It's just interesting to me that TQ's cold data store is satisfied with about 10K IOPS across those disk arrays. (Assuming 200/disk for 10K SAS and about 50% utilization given their expected multipath setup and/or redundancy/parity overhead)
Kinda funny in a way: at my last company we had some rather beefy 7420 ZFS appliances with RAM/SSD/15K disks and we weren't happy when we were only getting approx 50-60K IOPS from pure disk operations across multiple pools. We could hit 200K+ on things that were cached... but we only needed that performance for some edge cases of ours. Then we tested an AFF on our extreme edge cases... and were like, crap, why didn't these things get cheaper faster.
The AFF was incredibly faster than our 7420s in most cases, especially anything approaching high levels of random IO (not surprising); it was so bad that a moderately powered VM (4 or 8 vCPUs and like 64 GB) was beating our 24-core, 196 GB physical boxes in total transactions when running things like HammerOra |

Bienator II
madmen of the skies
3416
|
Posted - 2015.10.14 04:56:12 -
[59] - Quote
having DT only every second day would be a start :P
how to fix eve: 1) remove ECM 2) rename dampeners to ECM 3) add new anti-drone ewar for caldari 4) give offgrid boosters ongrid combat value
|

Raiz Nhell
Internet Terrorists SpaceMonkey's Alliance
431
|
Posted - 2015.10.14 05:21:32 -
[60] - Quote
Amazing stuff...
Wish I could convince the boss that we need a 10th of this stuff...
Keep up the good work...
P.S. Would like to see photos of Sisi's Volcano powered lair :)
There is no such thing as a fair fight...
If you're fighting fair you have automatically put yourself at a disadvantage.
|

Puer Servus
Republic Military School Minmatar Republic
3
|
Posted - 2015.10.14 05:50:25 -
[61] - Quote
Cor'len wrote:Bienator II wrote: Thats prob why ccp seems to see MT as low priority atm. Actually, CCP would love to multithread the ~space code~ (can't remember the component name, haha). But it's practically impossible to get a consistent result; operations must be done in sequence, otherwise you get dead ships killing living ships, and other ~exciting~ edge cases. This is the ultimate limiter on EVE performance. They might conceivably be able to MT the processing of different grids in a single system, but everything that happens on a single grid must execute in a deterministic fashion, and in the correct order. Plus, even if that wasn't a problem, they run Stackless Python, with the beloved global interpreter lock which effectively prevents multithreading. tl;dr CCP wants to multithread all the things, but it's so hard it's bordering on impossible. Hence the effort to not have big fights.
Actually there is a ~space code~ multithread project called The Destiny Dispatcher.
http://www.youtube.com/watch?v=NEJbwZCgNgU&t=1h16m00s
http://www.youtube.com/watch?v=UcEUB6h4Br0&t=8m10s
Can devs give any update on this project? |

Corraidhin Farsaidh
Farsaidh's Freeborn
1738
|
Posted - 2015.10.14 07:58:45 -
[62] - Quote
The only problem I can see is with the database...should have been Oracle RAC cluster with ASM and Dataguard Active-Active failover :D
Ed; Admittedly the Oracle licensing structure makes me think it was designed by the Mittani himself (or one of his little minions) but everything has it's drawbacks... |
|

CCP DeNormalized
C C P C C P Alliance
304

|
Posted - 2015.10.14 09:43:49 -
[63] - Quote
Gospadin wrote:xrev wrote:Gospadin wrote:I'm shocked that a system designed to deploy in 2016 is even using rotating drives. That data must be REALLY cold. It's called auto-tiering. The hot storage blocks reside on the fast SSD's or the internal read cache. When blocks of data aren't touched, they move to slower disks that are still more cost effective if you look to volume for your buck. Compared to Ssd's, hard disks suck at random i/o but serial streams will do just fine. I know how it works. It's just interesting to me that TQ's cold data store is satisfied with about 10K IOPS across those disk arrays. (Assuming 200/disk for 10K SAS and about 50% utilization given their expected multipath setup and/or redundancy/parity overhead)
the DB averages around 2K IOPS during a regular run and while we spike upwards of 60-70K IOPS during startup, typically things are somewhat calm (2,000 batches per second @ the DB layer isn't massive by any means, but it's also far from quiet)
So in the end we have a flash tier of 5+ TB with a DB that's only 3 TB, plus we have over 700 GB of RAM for buffer pool space.
We really just don't need anything faster :)
CCP DeNormalized - Database Administrator
|
|
|

CCP Gun Show
C C P C C P Alliance
13

|
Posted - 2015.10.14 09:57:35 -
[64] - Quote
xrev wrote:SAN storage enthusiast here :)
Keep in mind before upgrading from 4 or 8 Gbps to 16 Gbps to check the available and used buffer credits in combination with the average transmitted package size. Default FC frame size is supposed to be 2112 bytes and uses 1 buffercredit to transmit. With the increased speed, more buffercredits are used to fully utilize the line. When they're used, nothing will be transmitted until buffers are freed up. Had a customer some time ago who had upgraded the linespeed and in combination with synchronous storage replication, production came to a halt because of the switches running out of buffers.
So CCP, please double check the framesizes and buffercredits before upgrading the speed ;)
Oh, just remembered: IIRC, the 48B-5 switches have 2 Condor3 ASICs, dividing the 48 ports into 0-23 on ASIC 1 and 24-47 on ASIC 2. Keep in mind there's a default oversubscription on the ASICs of 1.5:1, so make sure to distribute the FC ports of a host evenly over both ASICs to control the performance :)
Cool stuff though. Make pictures when it arrives in the datacenters!
Excellent advice, thanks for the constructive feedback  |
|
|
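A trivial way to express the port-spreading advice quoted above, assuming the 0-23 / 24-47 split holds (the helper below is hypothetical, not a Brocade tool):

```python
# Check that a host's FC ports are spread evenly across the two ASICs of a
# 48-port switch, assuming ports 0-23 sit on ASIC 1 and 24-47 on ASIC 2.
def asic_for_port(port: int) -> int:
    return 1 if port < 24 else 2

def is_balanced(host_ports) -> bool:
    on_asic1 = sum(1 for p in host_ports if asic_for_port(p) == 1)
    on_asic2 = len(host_ports) - on_asic1
    return abs(on_asic1 - on_asic2) <= 1   # at most one port of imbalance

print(is_balanced([0, 1, 24, 25]))   # True  -- two ports per ASIC
print(is_balanced([0, 1, 2, 3]))     # False -- all four on one ASIC
```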

CCP DeNormalized
C C P C C P Alliance
304

|
Posted - 2015.10.14 10:00:05 -
[65] - Quote
Corraidhin Farsaidh wrote:The only problem I can see is with the database...should have been Oracle RAC cluster with ASM and Dataguard Active-Active failover :D
Ed; Admittedly the Oracle licensing structure makes me think it was designed by the Mittani himself (or one of his little minions) but everything has it's drawbacks...
Oracle RAC is super sexy for sure! But as you say, the licensing is nuts and at this point we just don't see any need to switch.
Cost / benefit just isn't there, and really MS SQL is quite reliable and has great HA/DR features with AlwaysOn and Availability Groups.
We'll be doing some fun tests where we have a 4-node cluster with multiple AlwaysOn read secondaries: 2 nodes in our primary data center, a 3rd node in our DR site, and finally a 4th node hosted @ Amazon.
This is level of DR/BC that we're happy with :)
CCP DeNormalized - Database Administrator
|
|

virm pasuul
FRISKY BUSINESS. No Handlebars.
319
|
Posted - 2015.10.14 10:23:46 -
[66] - Quote
I wonder how many techies and possibly even sales & management people at your equipment vendors actually play Eve? I bet among Eve players you probably have as many tech workers as a fairly high-tier vendor. Add in Eve Serenity players and you have the manufacturing base covered too.
|

Xian Reevs
Armilies Corporation
0
|
Posted - 2015.10.14 10:38:19 -
[67] - Quote
I love when they talk dirty like that. Geek porn.
I am just wondering if anyone have even considered Hyper-V clustering? And no, I don't want to start a Hyper-V vs SEXi discussion. ;)
CCP DeNormalized wrote:We really just don't need anything faster :)
There is no such thing as "fast enough" for an IT enthusiast! |

Corraidhin Farsaidh
Farsaidh's Freeborn
1739
|
Posted - 2015.10.14 10:53:16 -
[68] - Quote
CCP DeNormalized wrote:Corraidhin Farsaidh wrote:The only problem I can see is with the database...should have been Oracle RAC cluster with ASM and Dataguard Active-Active failover :D
Ed; Admittedly the Oracle licensing structure makes me think it was designed by the Mittani himself (or one of his little minions) but everything has it's drawbacks... Oracle RAC is super sexy for sure! But as you say, the licensing is nuts and at this point we just don't see any need to switch. Cost / benefit just isn't there, and really MS SQL is quite reliable and has great HA/DR features with AlwaysOn and Availablity Groups. we'll be doing some fun tests where we have a 4 node cluster with multiple AlwaysOn read secondary's: 2 nodes in our primary data center with a 3rd node in our DR Site and finally a 4th Node hosted @ amazon. This is level of DR/BC that we're happy with :)
What's funny is you most likely have better DR availability and testing than most of the major banks I've worked at. I wish they would pay half the attention to detail as you have :D
Ed: Actually I don't. I'd be out of a job then.... |

Steve Ronuken
Fuzzwork Enterprises Vote Steve Ronuken for CSM
5626
|
Posted - 2015.10.14 11:23:41 -
[69] - Quote
Oracle licensing. ewww.
1 CPU license required per 2 cores (for most intel kit) at around 32K per license. Then there's 15k for each RAC license (1 per 2 cores). And then there can be all the addins, which can double that easily. (And to become liable for licensing, it can be as simple as 'run one query' or 'change one configuration setting'.)
Yes, I'm bitter
Woo! CSM X!
Fuzzwork Enterprises
Twitter: @fuzzysteve on Twitter
|

Nafensoriel
KarmaFleet Goonswarm Federation
124
|
Posted - 2015.10.14 12:05:21 -
[70] - Quote
virm pasuul wrote:I wonder how many techies and possibly even sales & management of your equipment vendors actually play Eve? I bet among Eve players you probably have as much tech workers as a fairly high tier vendor. Add in Eve Serenity players and you have the manufacture base covered too.
You would be amazed at how many physicists, geneticists, chemists, etc. play EVE... Not even considering the "just graduated" line members, we have access to a veritable horde of skilled technical and amazing minds.
It's not even limited to the sciences. Pick a topic and start a discussion and chances are you'll find someone in EVE online who's a kindred spirit to your discipline. |

Natya Mebelle
Center for Advanced Studies Gallente Federation
314
|
Posted - 2015.10.14 12:41:33 -
[71] - Quote
Every now and then I read an Eve tech article to see how much I understand. It sounds foreign and astounding as usual, because I really have no point of reference even with the statistics for what will be done and all. But for all that tech talk, it leaves me with little that I can actually take away from the devblog to look forward to :c Even browsing the 4 pages on the forum didn't get me anywhere further.
So it leaves me with two big questions:
One... because I'm curious! How much does the entire thing cost in total?
Two... what will ACTUALLY change for us players?
Will session timers be further reduced? Will transition between systems be faster? Will we get fewer of the dreaded "Socket closed" or other connectivity error messages? Because I know not every single one of them is related to the user. Will there be less random Tidi now in remote systems that are hardly populated, just because the performance is needed elsewhere? And most importantly... are there plans with all that new power under the hood to fight the war against the 1-second server tick? c:
oh and one that I forgot: With all the "redundancy" in the best way, is there a chance to look at no downtime or even faster downtime than we have nowadays? |

Alundil
Isogen 5
1040
|
Posted - 2015.10.14 13:12:10 -
[72] - Quote
Bienator II wrote:having DT only every second day would be a start :P Congratulations. You've succeeded in nerfing capital escalations by 50%
I'm right behind you
|

Corraidhin Farsaidh
Farsaidh's Freeborn
1741
|
Posted - 2015.10.14 13:16:27 -
[73] - Quote
Nafensoriel wrote:virm pasuul wrote:I wonder how many techies and possibly even sales & management of your equipment vendors actually play Eve? I bet among Eve players you probably have as much tech workers as a fairly high tier vendor. Add in Eve Serenity players and you have the manufacture base covered too.
You would be amazed at how many physicists, geneticists, chemists, etc play EVE... Not even considering the "just graduated line members" we have access to a veritable horde of skilled technical and amazing minds. It's not even limited to the sciences. Pick a topic and start a discussion and chances are you'll find someone in EVE online whos a kindred spirit to your discipline.
One of the reasons why I like the game :) |

Corraidhin Farsaidh
Farsaidh's Freeborn
1742
|
Posted - 2015.10.14 13:27:44 -
[74] - Quote
Steve Ronuken wrote:Oracle licensing. ewww.
1 CPU license required per 2 cores. (for most intel kit) at around 32K per license. Then there's 15k for each RAC license (1 per 2 cores). And then there can be all the addins, which can double that easily. And to become liable for licensing, it can be as simple as 'run one query' or 'change one configuration setting')
Yes, I'm bitter
Addins which include the Grid Control stuff which is pretty much a necessity! I too am not at all bitter about it. Mainly because I never have to pay for it, my clients do :D |

Tiddle Jr
Brutor Tribe Minmatar Republic
592
|
Posted - 2015.10.14 13:36:31 -
[75] - Quote
i'm excited! And you? |

Esrevid Nekkeg
Justified and Ancient
513
|
Posted - 2015.10.14 14:02:23 -
[76] - Quote
Nafensoriel wrote:... It's not even limited to the sciences. Pick a topic and start a discussion and chances are you'll find someone in EVE online who's a kindred spirit to your discipline.
Ship's carpenter here (woodworking on pricey yachts in my case)...
Natya Mebelle wrote:Every now and then I read an Eve tech article to see how much I understand. It sounds foreign and astounding as usual, because I really have no point of reference even with the statistics for what will be done and all. But for all that tech talk, it leaves me with little that I can actually take away from the devblog to look forward to :c Even browsing the 4 pages on the forum didn't get me anywhere further. ...
Same here. But I am more than willing to relax in the back seat of this limo, enjoying the ride while the technicians discuss the things going on under the hood.
Thanks CCP for the hard and undoubtedly necessary work on keeping TQ reliably going, now and in the future!
Here I used to have a sig of our old Camper in space. Now it is disregarded as being the wrong format.
Looking out the window I see one thing: Nothing wrong with the format of our Camper! Silly CCP......
|

Aryth
GoonWaffe Goonswarm Federation
1866
|
Posted - 2015.10.14 14:51:45 -
[77] - Quote
In the writeup I don't see why IBM. Did you bake these off against UCS and they were faster? Or just whitebox.
Maybe your performance needs are very niche but I have yet to see a bakeoff where IBM won against almost anyone.
Leader of the Goonswarm Economic Warfare Cabal.
Creator of Burn Jita
Vile Rat: You're the greatest sociopath that has ever played eve.
|

Freelancer117
So you want to be a Hero
352
|
Posted - 2015.10.14 15:45:24 -
[78] - Quote
Gratz on moving to DDR4 
Your new server CPUs have been on the market for over a year and are mid-range; hope you got a good price.
source: http://ark.intel.com/products/family/78583/Intel-Xeon-Processor-E5-v3-Family#@All
You mentioned EVE Forever and that Dust 514 will run on EVE proxies; is CCP Games still laser focused on tying Dust 514 and New Eden capsuleers together in A Future Vision with enhanced interactions?
Regards, a Freelancer
The players will make a better version of the game than CCP initially plans.
http://eve-radio.com//images/photos/3419/223/34afa0d7998f0a9a86f737d6.jpg
The heart is deceitful above all things and beyond cure. Who can understand it?
|
|

CCP DeNormalized
C C P C C P Alliance
306

|
Posted - 2015.10.14 15:56:25 -
[79] - Quote
CPU's are E7-8893 v3, not E5
http://ark.intel.com/products/84688/Intel-Xeon-Processor-E7-8893-v3-45M-Cache-3_20-GHz
Launch Date Q2'15
Errr, at least the DB CPU's are :) I don't really care so much about the others :)
CCP DeNormalized - Database Administrator
|
|

Freelancer117
So you want to be a Hero
354
|
Posted - 2015.10.14 16:01:11 -
[80] - Quote
Hehe, wouldn't mind switching my I7 cpu to one of those ! 
The players will make a better version of the game than CCP initially plans.
http://eve-radio.com//images/photos/3419/223/34afa0d7998f0a9a86f737d6.jpg
The heart is deceitful above all things and beyond cure. Who can understand it?
|

Josia
Sunrise Services
2
|
Posted - 2015.10.14 18:31:45 -
[81] - Quote
Can you run Minecraft on it? |

Luca Lure
Obertura
48
|
Posted - 2015.10.14 19:32:36 -
[82] - Quote
Clearly EVE is dying. Hamsters need new cages.
The essence of the independent mind lies not in what it thinks, but in how it thinks.
|

Indahmawar Fazmarai
4030
|
Posted - 2015.10.14 20:05:15 -
[83] - Quote
Out of curiosity... where will all the additional players needed to use/justify such powerful hardware come from? 
CCP Seagull: "EVE should be a universe where the infrastructure you build and fight over is as player driven and dynamic as the EVE market is now".
62% of players: "We're not interested. May we have Plan B, please?"
CCP Seagull: "What Plan B?"
|

Corraidhin Farsaidh
Farsaidh's Freeborn
1745
|
Posted - 2015.10.14 21:02:55 -
[84] - Quote
The others matter?
|
|

CCP Gun Show
C C P C C P Alliance
14

|
Posted - 2015.10.14 21:07:34 -
[85] - Quote
Aryth wrote:In the writeup I don't see why IBM. Did you bake these off against UCS and they were faster? Or just whitebox.
Maybe your performance needs are very niche but I have yet to see a bakeoff where IBM won against almost anyone.
We did an intensive comparison with a couple of vendors and came to this conclusion for a couple of reasons.
This is a vague answer and does not tell you much, apart from that we did our due diligence 
Plus, our relationship with the Icelandic vendor is excellent after a decade of cooperation; I can literally call the lead IBM SAN expert anytime 24/7, and they are quick to support us in a critical scenario with a good escalation path into IBM.
Hope this answer helps and please keep on asking about TQ Tech III |
|
|

CCP Gun Show
C C P C C P Alliance
14

|
Posted - 2015.10.14 21:09:34 -
[86] - Quote
Corraidhin Farsaidh wrote:
Oh yes they do ! The entire cluster matters
Demoralized is just lazer focused on the DB machines apparently  |
|

virm pasuul
FRISKY BUSINESS. No Handlebars.
319
|
Posted - 2015.10.14 22:20:57 -
[87] - Quote
What's the warranty on the new kit? Default 3 year or extended?
|
|

CCP DeNormalized
C C P C C P Alliance
307

|
Posted - 2015.10.14 22:27:23 -
[88] - Quote
CCP Gun Show wrote:Oh yes they do ! The entire cluster matters CCP Denormalized is just lazer focused on the DB machines apparently 
I suppose I care about the rest as well... If not for those others my shiny DB servers would just sit idle all day long :)
CCP DeNormalized - Database Administrator
|
|

Disco Dancer Dancing
State War Academy Caldari State
0
|
Posted - 2015.10.14 22:51:29 -
[89] - Quote
Being someone who builds complex, large datacenters for both private and public use on a fairly regular basis, I'm not that impressed by the physical architecture path you are looking at. Why are you looking at a traditional, silo-based solution with storage and compute in different silos, traversing a "slow" FC link whenever you don't hit the cache in RAM? Any particular reason why you are not looking at a more modern, flexible and scalable platform than the one described in the blog post?
Seeing that everything not in the RAM cache has to traverse the FC switch, we can quickly give a few numbers on the actual latency and round trip for several different ways of accessing data in different locations:
L1 cache reference: 0.5 ns
Branch mispredict: 5 ns
L2 cache reference: 7 ns (14x L1 cache)
Mutex lock/unlock: 25 ns
Main memory reference: 100 ns (20x L2 cache, 200x L1 cache)
Compress 1 KB with Zippy: 3,000 ns
Send 1 KB over 1 Gbps network: 10,000 ns (0.01 ms)
Read 4 KB randomly from SSD: 150,000 ns (0.15 ms)
Read 1 MB sequentially from memory: 250,000 ns (0.25 ms)
Round trip within datacenter: 500,000 ns (0.5 ms)
Read 1 MB sequentially from SSD: 1,000,000 ns (1 ms, 4x memory)
Disk seek: 10,000,000 ns (10 ms, 20x datacenter round trip)
Read 1 MB sequentially from disk: 20,000,000 ns (20 ms, 80x memory, 20x SSD)
Send packet CA -> Netherlands -> CA: 150,000,000 ns (150 ms)
Looking at the figures, as soon as we start to traverse several layers we add latency to the whole request; if we need to traverse the FC, hit the storage nodes, then hit the disk and come back, latency adds up rather quickly. Keeping the data as local as possible is the key - mainly in memory, or as close to the node as possible without traversing the network. (Sure, FC is stable, proven and gives rather low latency, but from another standpoint you could argue that it is dead in the coming years as we move towards RDMA over converged fabrics or the like.)
On another note, if we look at a mainstream enterprise SSD we can find a few figures: 500 MB/s read and 460 MB/s write. If we put these into the following calculation we can see when we saturate a traditional storage network: numSSD = ROUNDUP((numConnections * connBW (in GB/s)) / ssdBW (R or W))
We get the following table of SSDs required to saturate the network bandwidth:
Controller connectivity | Available network BW | Read I/O | Write I/O
Dual 4Gb FC | 8Gb == 1GB | 2 | 3
Dual 8Gb FC | 16Gb == 2GB | 4 | 5
Dual 16Gb FC | 32Gb == 4GB | 8 | 9
Dual 1Gb ETH | 2Gb == 0.25GB | 1 | 1
Dual 10Gb ETH | 20Gb == 2.5GB | 5 | 6
This is without taking into account the round trip to access the data, and it assumes unlimited CPU power, as the CPU can also become saturated. We can see that we don't need that many SSDs to saturate a network. Key point here: try to keep the data as local as possible, once again not traversing the network with its added latency and bandwidth limits.
We can also do calculations on the difference between hitting, say, a local storage cache in memory versus hitting a remote storage cache (say, a SAN controller with caching). I know off the top of my head which is faster; the key point, once again, is to keep the data as local as possible.
There are several technologies addressing exactly these issues seen in traditional silo datacenters - have you looked at any, and if so, what is the reason that they do not fit your needs?
|
|
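The saturation table in the post above is easy to reproduce; the only inputs are the per-SSD figures it quotes (500 MB/s read, 460 MB/s write):

```python
# Reproduce the "SSDs required to saturate the network" table from the post
# above, using only the per-SSD figures it quotes.
import math

SSD_READ_GBPS, SSD_WRITE_GBPS = 0.5, 0.46        # GB/s per enterprise SSD

def ssds_to_saturate(link_gbytes_per_s, ssd_gbytes_per_s):
    return math.ceil(link_gbytes_per_s / ssd_gbytes_per_s)

links = {                                        # connectivity -> usable GB/s
    "Dual 4Gb FC": 1.0, "Dual 8Gb FC": 2.0, "Dual 16Gb FC": 4.0,
    "Dual 1Gb ETH": 0.25, "Dual 10Gb ETH": 2.5,
}
for name, bw in links.items():
    print(name, ssds_to_saturate(bw, SSD_READ_GBPS), ssds_to_saturate(bw, SSD_WRITE_GBPS))
```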

CCP Gun Show
C C P C C P Alliance
16

|
Posted - 2015.10.15 00:05:47 -
[90] - Quote
Disco Dancer Dancing wrote:Being someone who builds complex, large datacenters for both private and public use on a fairly regular basis, I'm not that impressed by the physical architecture path you are looking at. [...] There are several technologies addressing exactly these issues seen in traditional silo datacenters - have you looked at any, and if so, what is the reason that they do not fit your needs?
Wow, that's one serious question right there. Hope you will come to Fanfest 2016 to talk about latency 
I just saw this on my mobile and allow me to get you a proper answer in a day or two
Excellent stuff !
|
|

BogWopit
Star Frontiers Brotherhood of Spacers
15
|
Posted - 2015.10.15 07:10:13 -
[91] - Quote
Nerdgasm,
It'll be interesting to see if you get SQL to perform on top of Hyper-V; I've seen it done wrong so many times, to the detriment of performance. |

Steve Ronuken
Fuzzwork Enterprises Vote Steve Ronuken for CSM
5629
|
Posted - 2015.10.15 11:06:28 -
[92] - Quote
CCP DeNormalized wrote:CCP Gun Show wrote:Oh yes they do ! The entire cluster matters CCP Denormalized is just lazer focused on the DB machines apparently  I suppose I care about the rest as well... If not for those others my shiny DB servers would just sit idle all day long :)
But then you've got plenty of time for maintenance tasks! Users just cause trouble for databases!
Woo! CSM X!
Fuzzwork Enterprises
Twitter: @fuzzysteve on Twitter
|

Sithausy Naj
School of Applied Knowledge Caldari State
0
|
Posted - 2015.10.15 11:24:36 -
[93] - Quote
CCP Phantom wrote:The Tranquility server cluster is a powerful machine, enabling you to create the biggest living universe of science fiction with the most massive spaceship battles mankind has ever seen. But you know what is even better than Tranquility? Tranquility Tech III! Our engineers are working hard to fully revamp the server cluster with new hardware, with new storage, with new network connections, with a new location and new software. TQ Tech III will be much better than the already astonishing current TQ server. Read more about this marvel of technology (including tech specs and pictures) in CCP Gun Show's latest blog Tranquility Tech III. And all this is planned for very early 2016! EVE Forever!
Hey dears
Is there anyone around who can hop on and chat about storage for a while? Are you using child pools on the Storwize V5k/SVC? Is SSD used, and in what configuration - is Easy Tier enabled or is it just standalone allocation? Any FlashCopy usage? Which SVC generation? Is it the 32 or 64 GB cache model, and are you considering using compression? What primary protocol is used (for the SVC - the V5k is definitely FC, but is it 8 or 16 Gb FC on the front end, or 10 Gbps iSCSI for the Flex connection)? |

Sithausy Naj
School of Applied Knowledge Caldari State
0
|
Posted - 2015.10.15 11:42:02 -
[94] - Quote
Disco Dancer Dancing wrote:Being someone who builds complex, large datacenters for both private and public use on a fairly regular basis, I'm not that impressed by the physical architecture path you are looking at. Why are you looking at a traditional, silo-based solution with storage and compute in different silos, traversing a "slow" FC link whenever you don't hit the cache in RAM? Any particular reason why you are not looking at a more modern, flexible and scalable platform than the one described in the blog post?
Mate, fair enough, but you have not calculated the TCO of the overall solution. An in-memory DB and in-memory processing require a somewhat different approach and would mean rebuilding the whole concept of the current infrastructure, while the more traditional route gives them the ability to extend what they already have without many changes.
SSD and in-memory might be the solution when compared to traditional HDDs. As we can see, they are going to use IBM SVC, which can virtualize external storage; that means the possibility of adding either SSDs to the Storwize V5000 or a whole IBM FlashSystem at the FC back-end level.
While traditional SSDs have a standard SAS 2.0/3.0 interface these days, IBM Flash uses direct PCIe flash modules with an FPGA-based architecture, allowing faster, more direct access to the storage media itself. SVC allows caching to be disabled on the storage-hypervisor side, meaning IO passes from the server HBA to the FlashSystem with (basically) 2 FC hops. Switch latency is ~5-25 us according to Brocade documentation (if I'm not mistaken).
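Just to make the "latency adds up per hop" point concrete, here is a rough back-of-the-envelope sketch in Python. Every number in it is an illustrative assumption for the sake of the example, not a measurement of TQ or of any specific SVC/FlashSystem setup:

```python
# Back-of-the-envelope latency budget for a single small read, summing assumed
# per-hop costs. All numbers are illustrative assumptions, not vendor specs.

def total_latency(title, stages):
    """Print a breakdown of (stage, microseconds) pairs and return the sum."""
    print(title)
    total = 0.0
    for stage, cost_us in stages:
        total += cost_us
        print(f"  {stage:<40} {cost_us:8.1f} us")
    print(f"  {'TOTAL':<40} {total:8.1f} us\n")
    return total

# Read served by external flash over FC, two switch hops each way
# (four traversals), at ~15 us per hop (middle of the 5-25 us range above).
total_latency("External flash behind SVC over FC:", [
    ("host HBA + driver overhead", 20.0),
    ("FC switch traversals (4 x ~15 us)", 4 * 15.0),
    ("SVC / controller pass-through", 50.0),
    ("flash module read", 150.0),
])

# The same read served by a local PCIe flash device in the compute node.
total_latency("Local PCIe flash in the compute node:", [
    ("driver overhead", 20.0),
    ("flash read", 100.0),
])
```

The exact figures matter less than the shape: the fabric hops and the extra controller in the path add a fixed tax to every IO that never hits a cache.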
So there are a lot of things to consider, and not ONLY technical ones. |
Sithausy Naj
School of Applied Knowledge Caldari State
0
|
Posted - 2015.10.15 12:55:10 -
[95] - Quote
Gospadin wrote:xrev wrote:Gospadin wrote:I'm shocked that a system designed to deploy in 2016 is even using rotating drives. That data must be REALLY cold. It's called auto-tiering. The hot storage blocks reside on the fast SSDs or the internal read cache. When blocks of data aren't touched, they move to slower disks that are still more cost-effective if you look at volume for your buck. Compared to SSDs, hard disks suck at random I/O, but serial streams will do just fine. I know how it works. It's just interesting to me that TQ's cold data store is satisfied with about 10K IOPS across those disk arrays. (Assuming 200/disk for 10K SAS and about 50% utilization given their expected multipath setup and/or redundancy/parity overhead.)
Ouch, this is a strange consideration.
If you want to see closer numbers, let's assume they are going to have 8 SSDs of 800 GB each in RAID 5 (plus one global spare = 9), and 80 drives of 1.2 TB each, 10k rpm SAS, all in RAID 5 with 8 drives per array (7 + parity) and 3 global spares - there we go for the basic Storwize V5000 configuration in the dev blog.
Let's assume all of them are in one pool, so the overall capacity available for mapping is 84,000 GB (SAS) + 5,600 GB (SSD). Let's say we have one host connected through 4 x 8 Gbps FC on the server side and 8 x 8 Gbps FC on the storage side, with allocated capacity of 70,000 GB (not going to push the limits).
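For anyone who wants to sanity-check that capacity arithmetic, here is a quick sketch using the drive counts and sizes assumed in this post (7+P RAID 5 groups, spares excluded from capacity):

```python
# Usable capacity for the assumed layout above: RAID 5 groups of 8 drives
# (7 data + 1 parity); global spares are not counted towards capacity.

def raid5_usable_gb(num_groups, drives_per_group, drive_gb):
    data_drives = drives_per_group - 1   # one drive's worth of parity per group
    return num_groups * data_drives * drive_gb

sas_gb = raid5_usable_gb(num_groups=10, drives_per_group=8, drive_gb=1200)  # 80 x 1.2 TB 10k SAS
ssd_gb = raid5_usable_gb(num_groups=1,  drives_per_group=8, drive_gb=800)   # 8 x 800 GB SSD

print(f"SAS tier usable: {sas_gb:,} GB")            # 84,000 GB
print(f"SSD tier usable: {ssd_gb:,} GB")            # 5,600 GB
print(f"Pool total:      {sas_gb + ssd_gb:,} GB")   # 89,600 GB
```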
According to what the Storwize is capable of: let's not be too conservative and load it a little, with a 16 KiB transfer block size, starting fairly at 5,000 IOPS.
Assuming the following cache statistics:
Read percentage - 70%
Read sequential - 20%
Read hit - 60%
Random read hit - 40%
Sequential read hit - 20%
Write percentage - 30%
20% of all writes sequential
Seek percentage - 33%
Random write efficiency - 35%
At 5,000 IOPS we will be here:
Total service time: 1.0 ms
Read service time: 1.3 ms
Write service time: 0.2 ms
Channel queue time: 0.0 ms
Processor utilization for I/O: 1.5%
Channel utilization: 1.2%
Host adapter utilization: 1.1%
SAS interface utilization: 3.3%
Flash drive utilization: 2.2%
SAS 10K drive utilization: 9.0%
While increasing the load until the drives reach 60-70% utilization, it stays close to these metrics:
(Charts: service time with IO rate growth, and highest SAS interface utilization with IO rate growth.)
So you might be right about ~10k IOPS for pure SAS performance (though it still depends on a lot of factors), but you are definitely not right saying 10k IOPS when there are SSDs and tiering involved.
As for utilization: (chart: utilization overview with IO rate.)
Edited: and this is not even considering how powerful the SVC sitting on top of all this is :)
And I'm sure there is more than 70% read :) |
ISK IRON BANK
I Want ISK Corp
34
|
Posted - 2015.10.15 14:15:37 -
[96] - Quote
Question is:
Does it play Crysis on max settings? |
Disco Dancer Dancing
State War Academy Caldari State
1
|
Posted - 2015.10.15 15:01:15 -
[97] - Quote
Sithausy Naj wrote:Disco Dancer Dancing wrote:Being someone that builds complex, large datacenters for both private and public use on a fairly regular basis, I'm not that impressed by the physical architecture path you are looking at. Why are you looking at a traditional, silo-based solution, with storage and compute in different silos, traversing a "slow" FC link whenever you are not hitting the cache in RAM? Any particular reason why you are not looking at a more modern, flexible and scalable platform than the one described in the blog post?
Mate, fair enough, but you have not calculated the TCO of the overall solution. An in-memory DB and in-memory processing require a somewhat different approach and would mean rebuilding the whole concept of the current infrastructure, while the more traditional route gives them the ability to extend what they already have without many changes. SSD and in-memory might be the solution when compared to traditional HDDs. As we can see, they are going to use IBM SVC, which can virtualize external storage; that means the possibility of adding either SSDs to the Storwize V5000 or a whole IBM FlashSystem at the FC back-end level. While traditional SSDs have a standard SAS 2.0/3.0 interface these days, IBM Flash uses direct PCIe flash modules with an FPGA-based architecture, allowing faster, more direct access to the storage media itself. SVC allows caching to be disabled on the storage-hypervisor side, meaning IO passes from the server HBA to the FlashSystem with (basically) 2 FC hops. Switch latency is ~5-25 us according to Brocade documentation (if I'm not mistaken). So there are a lot of things to consider, and not ONLY technical ones.
I hear you when it comes to rewriting the concept to process the DB in memory, but in this case that was not my intention. I'm mainly talking about a storage solution that, for the most part, scales better without any limitation in the SAN nodes, SAN network and the like, while also giving increased performance with a storage cache in the actual compute node and access to data without traversing a SAN network (latency is latency, and no matter how high or low, it adds up on every transaction).
There are several HCI solutions on the market giving the same or better performance as a high-end SAN, with a smaller footprint in rack units, lower energy consumption, lower cooling needs, and linear scaling where you only scale when you need to.
HCI ain't a one-size-fits-all, hence the questions.
|
Raithius
Discrete Astrographic Reconnaissance Technologies Wrong Hole.
3
|
Posted - 2015.10.15 16:00:52 -
[98] - Quote
/me Prays for an end to "The socket was closed" dc's. |
|
CCP DeNormalized
C C P C C P Alliance
307
|
Posted - 2015.10.15 16:13:42 -
[99] - Quote
Disco Dancer Dancing wrote:I hear you when it comes to rewriting the concept to process the DB in memory, but in this case that was not my intention. I'm mainly talking about a storage solution that, for the most part, scales better without any limitation in the SAN nodes, SAN network and the like, while also giving increased performance with a storage cache in the actual compute node and access to data without traversing a SAN network (latency is latency, and no matter how high or low, it adds up on every transaction).
There are several HCI solutions on the market giving the same or better performance as a high-end SAN, with a smaller footprint in rack units, lower energy consumption, lower cooling needs, and linear scaling where you only scale when you need to.
HCI ain't a one-size-fits-all, hence the questions.
As a DBA who has only recently started to get involved on the SAN storage side, everything you say is well beyond me :) But it's interesting!
Can you give some real examples of what you are talking about and not just buzzwords? :)
Edit: ok, so looking here: http://purestorageguy.com/2015/03/12/hyper-converged-infrastructures-are-not-storage-arrays/
This seems to be how Hadoop and other similar systems work? It's a bunch of servers with local disks that sit behind some shared filesystem to distribute the data across all the server nodes?
CCP DeNormalized - Database Administrator
|
|
Sithausy Naj
School of Applied Knowledge Caldari State
0
|
Posted - 2015.10.15 18:30:41 -
[100] - Quote
CCP DeNormalized wrote:Disco Dancer Dancing wrote:I hear you when it comes to rewriting the concept to process the DB in memory, but in this case that was not my intention. I'm mainly talking about a storage solution that, for the most part, scales better without any limitation in the SAN nodes, SAN network and the like, while also giving increased performance with a storage cache in the actual compute node and access to data without traversing a SAN network (latency is latency, and no matter how high or low, it adds up on every transaction).
There are several HCI solutions on the market giving the same or better performance as a high-end SAN, with a smaller footprint in rack units, lower energy consumption, lower cooling needs, and linear scaling where you only scale when you need to.
HCI ain't a one-size-fits-all, hence the questions.
As a DBA who has only recently started to get involved on the SAN storage side, everything you say is well beyond me :) But it's interesting! Can you give some real examples of what you are talking about and not just buzzwords? :) Edit: ok, so looking here: http://purestorageguy.com/2015/03/12/hyper-converged-infrastructures-are-not-storage-arrays/
This seems to be how Hadoop and other similar systems work? It's a bunch of servers with local disks that sit behind some shared filesystem to distribute the data across all the server nodes?
You are right.
At its roots, yes, but it can be either a bunch of servers or a set of storage devices - servers with drives, JBOD, NAS, SAN, or any type of storage, local or shared, in the cloud, in the server, or anywhere.
In-memory processing is a slightly different approach: it needs RAM only, or flash with direct access, without any SAN/NAS/DAS storage attached.
If you decide to go with any type of fast storage, you can (to be honest) use any of them behind SVC - SSD, flash, or any other implementation - but all of them will need Fibre Channel connectivity.
You guys are actually in a sweet spot with a storage hypervisor.
HCI is one kind of implementation; in case you want to look at one while staying with IBM, ask those guys about Spectrum Scale, they should know it.
|
xrev
EVE Corporation 987654321-POP The Marmite Collective
6
|
Posted - 2015.10.15 19:25:37 -
[101] - Quote
The general answer to storage questions is "It depends..."
I see a lot of discussion about purely technical stuff like spindles, RAID levels, cache, hops, latency, etc., but it really depends on which station you depart from. Let's break it up for a moment (from my point of view).
If you want to avoid latency from the compute layer (servers) to the storage array and storage network (SAN), you could put a number of SSDs in every server so that you have enough volume and IOPS to serve the application without (IMO) the small delays. The downside is that it's not very scalable or cost-effective, not to mention having to distribute the data to each server that needs it.
If you want a central storage supply, with or without replication to another node, you get an easier-to-manage solution with higher volume, higher IOPS, etc., where you can serve appropriate chunks to the compute layer. The downside is the mentioned latency and expensive networking. In my experience you choose Fibre Channel over iSCSI if you want the least amount of latency and network overhead. Yes, iSCSI can have higher throughput, but if you take into account the overhead, possible retransmits, temporary buffering and recalculation of packets, it's about the same speed as FC. FC goes that extra mile for you if you need it.
Then we have the topic of latency; are those milliseconds/nanoseconds really that important? Yes, of course, but of bigger importance is that you manage the full stack from application to storage as a whole. You f*ck up the disk alignment? The IOPS needed at least double. Build yourself a fancy SQL query that turns out to reread the complete table 5 times in a row to get a selection? Your storage won't help you much there. In fact, poorly written software can cripple your fancy storage array in no time. So review the complete stack from application down to storage before blindly focusing entirely on the storage array and network.
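To make the "fancy query" point concrete, here's a toy sketch of that anti-pattern. It uses an in-memory SQLite database purely as a stand-in; the table and queries are made up for illustration and have nothing to do with TQ's actual schema or SQL Server setup:

```python
# Toy illustration of the "query that rereads the whole table" anti-pattern,
# using an in-memory SQLite database as a stand-in for a real DB.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region INTEGER, price REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i % 5, float(i)) for i in range(100_000)])

# Bad: five separate aggregate queries, each one a full scan of the table.
# The storage layer sees five times the reads for no extra information.
counts_bad = {region: conn.execute(
                  "SELECT COUNT(*) FROM orders WHERE region = ?",
                  (region,)).fetchone()[0]
              for region in range(5)}

# Better: one pass with GROUP BY returns the same answer in a single scan.
counts_good = dict(conn.execute(
    "SELECT region, COUNT(*) FROM orders GROUP BY region").fetchall())

assert counts_bad == counts_good   # identical result, a fraction of the reads
```

No amount of SSD or cache makes the first version a good idea once the table no longer fits in memory.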
I haven't seen many examples where customers were able to congest the complete FC network, other than with large data streams (backups, replication, etc.). Most issues normally come from too-small frames, disk misalignment and fancy SQL queries that have to come from disk instead of memory.
The last topic for this post is the chosen storage solution. It may be a classical one, but sometimes that's just what you need. It's proven technology, where the needed expertise is easier to find than for the challenger solutions (Pure Storage (love it btw), Tintri or SolidFire, for example). CCP's demands are high, but not as high as a big financial's. The current storage leaders (HP, EMC, IBM, hell even NetApp) are able to meet that demand, and like I said, it's proven and easier to get support for than the challengers.
So CCP, way to go looking at the full stack rather than chunking up the separate layers. |
Disco Dancer Dancing
State War Academy Caldari State
2
|
Posted - 2015.10.15 21:02:42 -
[102] - Quote
A few interesting discussions going on here, and by the looks of it we have a few people working with storage solutions.
To answer: HCI does have some alignment with your comparison to Hadoop, sort of. While it is a bunch of servers with local disk (mainly SSD, with cold data on HDD to get the volume), there are a few key components to think about when talking about HCI. First, hyper-converged means different things depending on what you are talking about - storage, compute, networking - but in essence it means combining two or more solutions into a single, scalable unit. Looking at datacenters, this is mainly compute and storage.
In essence this means that when we scale, we scale both compute and storage linearly (both IOPS and capacity). We no longer depend on what a SAN controller can handle before it bottlenecks - that concern goes away - nor are we, in theory, dependent on the SAN network and when it may become a bottleneck, since every unit we add also scales the throughput of the "SAN network" (we still need data protection of some sort, with blocks on different nodes, to tolerate failures).
Depending on the vendor, a few utilize data locality, meaning they migrate the bits as close to the actual compute as possible; others depend on low-latency connections (RDMA over Ethernet, InfiniBand or the like), piggy-backing on the investment already made in the ToR switches.
Concern was also raised about the scalability and cost-effectiveness of a solution using a local storage layer, which I would say is unfounded. HCI is built for scalability by design. You can argue that HCI by design is:
- Predictable, as every unit adds compute, memory, storage capacity and storage IOPS.
- Repeatable, as HCI is built from the beginning to be clustered, so you get "single pane of glass" management and monitoring for the whole cluster, including storage.
- Scalable, as combining the above means we can predict how we scale with each node, and since the solution is built to be clustered and repeatable, we can easily scale up without much intervention.
A key point to remember is that the same person keeping track of the compute layer now also keeps track of the storage layer, which in most cases means lower TCO. Just to give a raw figure of what is achievable with a fairly standard setup from one of the vendors:
6U, 2 nodes per 2U
Random read IOPS: 30,000/node
Random write IOPS: 27,300/node
6 nodes giving a total of:
Random read IOPS: 30,000 * 6 = 180,000
Random write IOPS: 27,300 * 6 = 163,800
What about usable storage? Roughly 70 TB, counting with a redundancy factor of 2. Now, as in all solutions, as soon as we start hitting cold data those figures will go down, but in that case I would claim the calculations on the daily working set of data were not done correctly.
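To make that scaling arithmetic explicit, here's a small sketch. The per-node IOPS are the figures quoted above; the raw capacity per node is back-calculated from the ~70 TB usable at redundancy factor 2, so treat it as an assumption:

```python
# Linear scale-out arithmetic for a hypothetical HCI cluster, using the
# per-node IOPS quoted above. Raw capacity per node is back-calculated from
# the ~70 TB usable at redundancy factor 2, so it is an assumption.

NODE_READ_IOPS  = 30_000
NODE_WRITE_IOPS = 27_300
NODE_RAW_TB     = 23.3   # assumed raw TB per node
REPLICATION     = 2      # redundancy factor 2: every block stored twice

def cluster_totals(nodes):
    return {
        "read_iops":  nodes * NODE_READ_IOPS,
        "write_iops": nodes * NODE_WRITE_IOPS,
        "usable_tb":  nodes * NODE_RAW_TB / REPLICATION,
    }

for n in (3, 6, 12):   # scaling is linear: add nodes and everything grows with them
    t = cluster_totals(n)
    print(f"{n:2d} nodes: {t['read_iops']:,} read IOPS, "
          f"{t['write_iops']:,} write IOPS, ~{t['usable_tb']:.0f} TB usable")
```

The 6-node row reproduces the 180,000 / 163,800 IOPS and ~70 TB figures above; every extra node simply adds its share.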
As with all solutions, HCI also has a few caveats, the main one being that workloads in most enterprises don't look the same: some need a lot of memory, some need a lot of CPU, and some need a lot of capacity and/or IOPS. Since we try to scale linearly, with CPU, memory and storage in each unit, something might end up skewed. Some vendors therefore have more storage-heavy options, or vice versa on memory/CPU. But seeing as in this case the workloads are known, and should be the same across the whole solution, it should be doable to find a unit that scales well in all aspects.
As said before, HCI doesn't fit every case, and both approaches discussed have their pros and cons. But from the looks of it, you could get a smaller solution from the start, with lower TCO and complexity, that scales when you need it to and does so with ease, while still giving you the redundancy and performance from day one - in a smaller suit, so to say.
And just to clarify: Tintri, Pure Storage, SolidFire and the like are not HCI solutions; most of them should be seen as AFA solutions. GridStore, SimpliVity, Nutanix, EVO:RAIL (I bet I'm missing several others) should be seen as HCI solutions, as they combine several components into one scalable unit.
*Ninja Edit* Most HCI solutions are appliances, meaning that they have very low overhead when it comes to handling the whole stack, including automation, self-healing and the like. |
Numa Pompilious
Viziam Amarr Empire
1
|
Posted - 2015.10.15 21:05:01 -
[103] - Quote
I currently work with a very similar configuration: HS22 blade centers, IBM FC storage, Cisco switching and routing, VMware... currently working on integrating the hypervisor though.
It's sexy... vMotion is a winner, though I'm unsure if I'm going to stay with IBM blade centers... the next upgrade I have allocated is for EMC storage instead of IBM... unless, of course, CCP depletes the world's supply of SAS drives with TQ Tech III.
|
|
CCP DeNormalized
C C P C C P Alliance
307
|
Posted - 2015.10.16 10:07:15 -
[104] - Quote
Disco Dancer Dancing wrote:A few interesting discussions going on here, and by the looks of it we have a few people working with storage solutions.
Thanks for the crazy details, Dancer, I appreciate the time spent on these replies!
Can you throw out a ballpark $$ figure for that setup?
So can you run Windows servers and such on this stuff? Can I run my MS SQL cluster on top of it? Do I just carve out LUNs as with a typical SAN and present them to the cluster?
In which case, how would that stack look? There would be, say, 6U of appliances - plus now the hardware for the Windows cluster/DB (or does that run on the appliances as well?)
Cheers!
CCP DeNormalized - Database Administrator
|
|
Lucian Thorundan
House Of Serenity. Suddenly Spaceships.
10
|
Posted - 2015.10.16 10:43:42 -
[105] - Quote
CCP FoxFour wrote:Zand Vor wrote:I'm a super network geek....I really want to know what router, load balancer, and switch platforms you switched to since it sounds like you ditched Cisco.
Oh well, this is a great article and it's awesome to see just a glimpse of how all this infrastructure is designed to work together.
Thank you! Will ask if they mind sharing said information.
+1, as a career networking nerd I would also be very interested in the answer to this; a diagram with no models on it is like a stripper that doesn't take their clothes off.
I presume the answer is F5 LBs as well, but I've seen NetScalers around in a lot of big deployments and they generally cost out better, so it may well be that way too.
Also, can the DBA team please tell CCP Stephanie to change that name to CCP StephSQL (or something more creative than my quick thought)? |
Disco Dancer Dancing
State War Academy Caldari State
3
|
Posted - 2015.10.16 11:59:28 -
[106] - Quote
CCP DeNormalized wrote:Disco Dancer Dancing wrote:A few interesting discussions going on here, and by the looks of it we have a few people working with storage solutions.
Thanks for the crazy details, Dancer, I appreciate the time spent on these replies! Can you throw out a ballpark $$ figure for that setup? So can you run Windows servers and such on this stuff? Can I run my MS SQL cluster on top of it? Do I just carve out LUNs as with a typical SAN and present them to the cluster? In which case, how would that stack look? There would be, say, 6U of appliances - plus now the hardware for the Windows cluster/DB (or does that run on the appliances as well?) Cheers! Most HCI solutions combine storage and compute into one unit and use virtualization on top of it to keep the flexibility of technologies like vMotion and others. Without going into detail, several vendors have options for VMware, Hyper-V, KVM, etc. So we are not talking about adding an extra layer of hardware above the HCI solution where our SQL and other workloads live, since this is already integrated. (A few vendors do make it possible to use, for example, IBM blades as compute where SQL would live, but then again this wouldn't really be HCI, and there a traditional SAN would be a better fit from my point of view - although during a transition phase it is possible.)
So to answer the question: yes, your Windows servers, SQL and the like would reside inside this 6U, 6-node cluster, with each node having 512 GB of RAM along with 28 cores @ 2.6 GHz. They would, however, have to be virtualized on VMware, Hyper-V or the like to be able to use the platform, as virtualization is a key factor in most HCI solutions.
As I'm not a sales rep and not from any vendor, I can't really give exact $$$ figures, and I'm not sure how much of the figures I have from vendors I can share, but if I salt them a bit and add a few $$$ we get a ballpark figure of around $620-680k for the solution in this discussion. This is with redundancy inside one datacenter, able to survive node, disk and block failures, with the option to synchronize to another 6U solution in case of disaster (in Iceland, for example). To summarize, per node we are talking about:
512 GB RAM
28 cores @ 2.6 GHz
Random read IOPS: 30,000
Random write IOPS: 27,300
Total solution:
3 TB RAM
168 cores @ 2.6 GHz
Random read IOPS: 30,000 * 6 = 180,000
Random write IOPS: 27,300 * 6 = 163,800
70 TB of capacity.
HCI solutions can be a higher CapEx investment if you size a traditional solution and compare performance and price, but since they include both compute and storage - fewer personnel to maintain them, less rack space, less energy consumption, less cooling and less complexity, as we piggy-back on the investment already made in the ToR switches instead of a dedicated network for the storage layer - the combination mostly gives a lower TCO. Also, instead of having to pat the hardware to make sure it's okay, those guys can focus on other core business that helps the company.
Now, we haven't really discussed any specific vendor, the technology behind it (data locality, compression, deduplication, MapReduce, etc.), nor have we touched on all the caveats, pros and cons and the like. There are several vendors out there that I'm sure would happily talk with you, and since you already seem to have in-house expertise on traditional setups, you should be able to compare them and find what best fits your needs. But seeing that you know what your workloads are, how they perform and how you need to scale, you should be able to start small and only scale when you need to - for example when Dust hits PC, Valkyrie is released or the like - and if needed you can also scale down, as no hardware is tied together between the blocks; this is done in software instead. Keep in mind that HCI is by design meant to be clustered, and even the smallest solutions are fault-tolerant by design and give you the redundancy you need.
|
|
CCP DeNormalized
C C P C C P Alliance
307
|
Posted - 2015.10.16 12:49:40 -
[107] - Quote
Disco Dancer Dancing wrote:
So to answer the question: yes, your Windows servers, SQL and the like would reside inside this 6U, 6-node cluster, with each node having 512 GB of RAM along with 28 cores @ 2.6 GHz. They would, however, have to be virtualized on VMware, Hyper-V or the like to be able to use the platform, as virtualization is a key factor in most HCI solutions.
Great info again, DiscoD! Really gives me a good idea of what this HCI is all about; massive thanks for the time spent sharing knowledge!
CCP DeNormalized - Database Administrator
|
|
Tholuse
Imperium of Suns Nightwatchers.
0
|
Posted - 2015.10.16 16:18:19 -
[108] - Quote
Good news... for the new hardware environment!
I was in Barcelona this week at VMware VMworld 2015 and talked with many friends, co-workers and customers who play EVE. All were happy about the hardware change and the use of VMware technology for EVE Online.
I have many customers who use IBM SVC technology with VMware vSphere... a very fast system.
I hope the transformation steps - virtualization of the existing systems - work out fine.
I'm sure the change to VMware virtualization technology is the right step... on to a new experience with EVE.
EVE FOREVER
Tholuse *in real life a systems engineer for VMware (VCP, VTSP) and storage consultant*
I have an idea... we could make a VMware user group called VMEVE or VEVE, and next year at VMworld Barcelona 2016 hold a special event for the EVE players visiting. |
Rillek Ratseye
Abysmal Gentlemen We Didn't Mean It
3
|
Posted - 2015.10.16 16:43:18 -
[109] - Quote
You need to be very careful with the EasyTier on the Storwizes and SVC.
If I was configuring that stuff I'd dedicate some of the SSD space for the DB.
If you don't do this, when you fail over to the secondary SAN nothing will be tiered right, as the vDisk mirror will do all reads from one V5000 and the other will take writes only. So the second V5000 would detect hotspots based on write workload only, and you really want it based on read workload. If only the SVC could load-balance reads across both mirrored copies... o O (one can only dream of the future!)
Also, why did you opt for the x240 compute node? The x222 node seems a better fit for some of the stuff, and you get double density at 28 nodes per Flex chassis. Internal disks or the lack of a 4th memory channel per CPU are the only reasons I can imagine - but those might of course be significant.
And lastly, you bought Lenovo stuff, not IBM! The V5000 is a Lenovo product these days. The Flex chassis is Lenovo now. (Only the PureSystems are still IBM.)
And as someone wrote earlier in the thread, it seems the players have all the knowledge you need to run TQ! I'm personally a VMware/IBM storage consultant and work with Storwize, SVC, Flex, etc. daily.
/Ratseye
|
xrev
EVE Corporation 987654321-POP The Marmite Collective
8
|
Posted - 2015.10.16 16:49:53 -
[110] - Quote
We should create a corp for all the storage/virtualization workers... IOPS fleets and some iSCSI congestion on the gate!
@CCP, build some inter-station dark fiber connections ;) |
Sithausy Naj
School of Applied Knowledge Caldari State
1
|
Posted - 2015.10.16 21:42:18 -
[111] - Quote
Rillek Ratseye wrote: And lastly, you bought Lenovo stuff, not IBM! The V5000 is a Lenovo product these days. The Flex chassis is Lenovo now. (Only the PureSystems are still IBM.)
They are not :) |
ShyLion
Side Effect Gaming Violent Declaration
0
|
Posted - 2015.10.20 20:50:48 -
[112] - Quote
Posting from my cell in Portugal (while on vacation from the US)...
Before finalizing anything, I would suggest a POC using the new Cisco blade chassis, F5 load balancing (LTM, plus GTM for geographical balancing) and ASM/firewall products, and to top off the storage, EMC's XtremIO. As a systems engineer of over 20 years with experience and certifications in VMware/Red Hat virtualization, Dell, IBM, and now Cisco products, as well as on-the-fly automation with no interruption, I believe there is no equal at this point in time. If you have any questions you can contact me, and if you have any issues reaching sales reps let me know, I can facilitate vendor contacts. From a player and engineering perspective, this is a great but costly alternative with a lot of flexibility and great redundancy, depending on the implementation. |
bbb2020
Carebears with Attitude
87
|
Posted - 2015.10.20 21:29:27 -
[113] - Quote
Don't know if anyone has asked before, CCP, but can I have your "old" stuff? |
Merior
Class D In Space Weyr Syndicate
9
|
Posted - 2015.10.21 04:08:06 -
[114] - Quote
Will the server have a choice of skins? |
Indahmawar Fazmarai
4054
|
Posted - 2015.10.21 13:50:36 -
[115] - Quote
Q: Indahmawar Fazmarai wrote:Out of curiosity... where will all the additional players needed to use/justify such powerful hardware come from?
A: Those players will hold free-to-play accounts, of course.
That's why they need such massive hardware for TQ-III even as TQ-II is running at 20% capacity...
(/tinfoil hat)
I accept alternate explanations. Tell me that TQ-III is just the smallest it can be and all the extra power comes from technological evolution and I'll accept it... tell me that it's because of Valkyrie / Gunjack server needs and I'll accept it too...
CCP Seagull: "EVE should be a universe where the infrastructure you build and fight over is as player driven and dynamic as the EVE market is now".
62% of players: "We're not interested. May we have Plan B, please?"
CCP Seagull: "What Plan B?"
|
Indahmawar Fazmarai
4054
|
Posted - 2015.10.21 13:51:45 -
[116] - Quote
bbb2020 wrote:Don't know if anyone have asked before CCP but can I have your "old" stuff?
IIRC, they raffled off some used TQ blades last Fanfest...
CCP Seagull: "EVE should be a universe where the infrastructure you build and fight over is as player driven and dynamic as the EVE market is now".
62% of players: "We're not interested. May we have Plan B, please?"
CCP Seagull: "What Plan B?"
|
Insane TacoPaco
The Scope Gallente Federation
0
|
Posted - 2015.10.21 16:49:28 -
[117] - Quote
That's a fair amount of SQL Server Enterprise licenses. I bet your MS EA rep loves you guys. |
|
CCP Gun Show
C C P C C P Alliance
24
|
Posted - 2015.10.24 20:51:19 -
[118] - Quote
CCP Gun Show wrote:Disco Dancer Dancing wrote:Being someone that builds complex, large datacenters for both private and public use on a fairly regular basis, I'm not that impressed by the physical architecture path you are looking at. Why are you looking at a traditional, silo-based solution, with storage and compute in different silos, traversing a "slow" FC link whenever you are not hitting the cache in RAM? Any particular reason why you are not looking at a more modern, flexible and scalable platform than the one described in the blog post?
Seeing that everything not in the RAM cache has to traverse the FC switch, we can quickly give a few numbers on the actual latency and round trip for several different ways of accessing data in different locations:
L1 cache reference: 0.5 ns
Branch mispredict: 5 ns
L2 cache reference: 7 ns (14x L1 cache)
Mutex lock/unlock: 25 ns
Main memory reference: 100 ns (20x L2 cache, 200x L1 cache)
Compress 1 KB with Zippy: 3,000 ns
Send 1 KB over 1 Gbps network: 10,000 ns (0.01 ms)
Read 4 KB randomly from SSD: 150,000 ns (0.15 ms)
Read 1 MB sequentially from memory: 250,000 ns (0.25 ms)
Round trip within datacenter: 500,000 ns (0.5 ms)
Read 1 MB sequentially from SSD: 1,000,000 ns (1 ms, 4x memory)
Disk seek: 10,000,000 ns (10 ms, 20x datacenter round trip)
Read 1 MB sequentially from disk: 20,000,000 ns (20 ms, 80x memory, 20x SSD)
Send packet CA -> Netherlands -> CA: 150,000,000 ns (150 ms)
Looking at the figures, as soon as we start to traverse several layers we add latency to the whole request; if we need to traverse the FC to hit the storage nodes, then hit the disk and come back, latency can add up rather quickly. Keeping the data as local as possible is key - mainly in memory, or as close to the node as possible without traversing the network. (Sure, FC is stable, proven and gives rather low latency, but from another standpoint you could argue it is dead in the coming years as we move towards RDMA over converged fabrics or the like.)
On another note, if we look at a mainstream enterprise SSD we find figures of roughly 500 MB/s read and 460 MB/s write. If we put these into the following calculation to see when we saturate a traditional storage network:
numSSD = ROUNDUP((numConnections * connBW (in GB/s)) / ssdBW (R or W))
We get the following table (number of SSDs required to saturate the network bandwidth):
Controller connectivity | Available network BW | Read I/O | Write I/O
Dual 4Gb FC | 8Gb == 1GB/s | 2 | 3
Dual 8Gb FC | 16Gb == 2GB/s | 4 | 5
Dual 16Gb FC | 32Gb == 4GB/s | 8 | 9
Dual 1Gb ETH | 2Gb == 0.25GB/s | 1 | 1
Dual 10Gb ETH | 20Gb == 2.5GB/s | 5 | 6
This is without taking the round trip to access the data into account, and it assumes unlimited CPU power, which can also become saturated. We can see that we don't need that many SSDs to saturate a network. The key point here: try to keep the data as local as possible, once again avoiding the added latency and bandwidth limits of traversing the network.
We can also calculate the difference between hitting, say, a local storage cache in memory and hitting a remote storage cache (say, a SAN controller with caching). I know off the top of my head which one is fastest; the key point, once again, is to keep the data as local as possible.
There are several technologies targeting exactly these issues seen in traditional silo datacenters. Have you looked at any of them, and if so, why do they not fit your needs?
Wow, that's one serious question right there. Hope you will come to Fanfest 2016 to talk about latency! I just saw this on my mobile, so allow me to get you a proper answer in a day or two. Excellent stuff!
I promised an answer in a day or two, which I failed to deliver on; I apologize for that.
The reason is that the thread took an interesting turn, but regardless I have been working with our Lenovo partners (the IBM name is stuck in my head, sorry Lenovo marketing) and my storage team on the best response to this.
So, a little more time, but I have not forgotten that I owe you all a response. |
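For reference, the SSD saturation formula quoted above works out like this. A rough sketch only: the 500 MB/s read / 460 MB/s write throughput figures are the assumptions from the quoted post, not measured values, and the bandwidth figures are rounded the same way as in the quoted table:

```python
# Rough sketch of the saturation formula quoted above:
#   numSSD = ROUNDUP((numConnections * connBW) / ssdBW)
# using the assumed SSD throughput of ~500 MB/s read and ~460 MB/s write.
import math

SSD_READ_GBPS  = 0.500   # GB/s per SSD, sequential read (assumption)
SSD_WRITE_GBPS = 0.460   # GB/s per SSD, sequential write (assumption)

def ssds_to_saturate(num_connections, conn_gbytes_per_sec):
    link = num_connections * conn_gbytes_per_sec
    return (math.ceil(link / SSD_READ_GBPS),
            math.ceil(link / SSD_WRITE_GBPS))

links = [
    ("Dual 4Gb FC",   2, 0.5),    # 8 Gb  ~= 1 GB/s aggregate
    ("Dual 8Gb FC",   2, 1.0),    # 16 Gb ~= 2 GB/s
    ("Dual 16Gb FC",  2, 2.0),    # 32 Gb ~= 4 GB/s
    ("Dual 1Gb ETH",  2, 0.125),  # 2 Gb  ~= 0.25 GB/s
    ("Dual 10Gb ETH", 2, 1.25),   # 20 Gb ~= 2.5 GB/s
]

for name, n, bw in links:
    reads, writes = ssds_to_saturate(n, bw)
    print(f"{name:<14} saturated by {reads} SSDs (reads) / {writes} SSDs (writes)")
```

Running it reproduces the read/write columns of the quoted table, which is the whole point: a handful of SSDs is enough to fill a dual FC link.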
|
Freelancer117
So you want to be a Hero
371
|
Posted - 2015.11.01 22:42:09 -
[119] - Quote
In another 5 years' time, please create a "SOL" server and revitalize the New Eden wormhole so we can visit the Terran system.
The players will make a better version of the game than CCP initially plans.
http://eve-radio.com//images/photos/3419/223/34afa0d7998f0a9a86f737d6.jpg
The heart is deceitful above all things and beyond cure. Who can understand it?
|
| |
|
| Pages: 1 2 3 4 :: [one page] |