branodn lee
|
Posted - 2006.05.26 20:49:00 -
[151]
Thank you, Oveur, for giving me a reply. That's all I want, and I'm sure a lot of others in EVE want the same: just some answers on what's up, and not to be left in the dark. I know you all at CCP have been working your butts off for the last while. I give you all props for everything that's been done.
|
Shadris
|
Posted - 2006.05.26 20:52:00 -
[152]
Originally by: branodn lee Well, it's not just a freaky node death. When it starts happening every week, there is a problem that makes this happen, so freaky has nothing to do with it.
You should have been here in the first couple of weeks after release. The nodes were up and down like yo-yos. Not to mention a server rollback. This is just a minor inconvenience and nothing more.
|
Jim McGregor
|
Posted - 2006.05.26 20:53:00 -
[153]
I think the cluster is cool because it gives me money after every crash.
--- The Eve Wiki Project |
lofty29
|
Posted - 2006.05.26 20:56:00 -
[154]
Lol ---------------------------
I wanna be dev-jacked |
Stev Tomias
|
Posted - 2006.05.26 20:58:00 -
[155]
Eh, I saw there was a problem bringing the server down 4ish minutes after my logon, so I set a long-term skill to train until they can get things back to normal.
|
Elve Sorrow
|
Posted - 2006.05.26 21:02:00 -
[156]
More crashes please. Ensure at least 10 minutes between each though, hard to run a complex in a shorter time.
|
KillmAll187
|
Posted - 2006.05.26 21:09:00 -
[157]
Seems like the memory leak is back again. I'm no computer genius, but my page file was over a gig yesterday. Needless to say, I crashed. Got back into EVE and the page file went down to 411k.
|
|
Valar
|
Posted - 2006.05.26 21:09:00 -
[158]
A little post-mortem.
At 20:01 we lost all nodes except 13 and all players were dropped from the cluster. As players started to log in, resources got remapped and reloaded.
As soon as I was able to get to my computer and connect to the cluster to find out what was wrong, I initiated an emergency reboot, but at that time we had over 13k players running on only 13 nodes. It's incredible that the 13 nodes survived the load of 13,000 players logging in.
As soon as the cluster was down and cleanup tasks were done, I started the cluster again. The server was up and accepting connections less than 20 minutes after the original node crash and less than 10 minutes after I arrived.
This incredible rebooting time is due to the new SQL hardware and a hotfix that fixes distributed compression of serverlogs. After getting two fixes I have been waiting for, one with the patch and one as a hotfix yesterday, I was able to change the accept delay of the cluster to 2 minutes instead of the 10-minute wait it was before, and increase the throttling speed by 300 connections per minute. The fix for distributed compression also makes cleanups after shutdowns much quicker.
Well, since I started the cluster back up I've been looking into what happened. While I can't be exactly sure what happened at this point, the serverlogs show what looks like a mass disconnection between nodes internally... however, the logs also say that the connections being dropped weren't actually supposed to be there...
I guess I'll spend a big part of the evening playing detective. ------ Valar Database admin - Server operations team CCP Games How to write a good bugreport |
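For a rough sense of what the shorter accept delay and the faster throttle mean for a full relogin, here is a back-of-the-envelope sketch in Python. The post only says the throttle went up by 300 connections per minute; the base rate of 300 per minute used below is a made-up placeholder, and the function name is hypothetical too. It also assumes everyone is already queued when the cluster starts, which is a simplification.

```python
import math


def minutes_until_all_admitted(players, accept_delay_min, rate_per_min):
    """Minutes from cluster start until the last queued player is admitted.

    Assumes no logins are accepted during the accept delay, and that
    afterwards the throttle admits a fixed number of connections per minute.
    """
    return accept_delay_min + math.ceil(players / rate_per_min)


PLAYERS = 13_000   # roughly the player count mentioned in the post
OLD_RATE = 300     # hypothetical base throttle, NOT a figure from the post
NEW_RATE = OLD_RATE + 300  # "increase the throttling speed by 300" per minute

before = minutes_until_all_admitted(PLAYERS, accept_delay_min=10, rate_per_min=OLD_RATE)
after = minutes_until_all_admitted(PLAYERS, accept_delay_min=2, rate_per_min=NEW_RATE)
print(f"before: {before} min, after: {after} min")
```

With a different base rate the absolute numbers change, but the shape stays the same: the accept delay is a fixed cost, and the throttle sets how long the queue takes to drain.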
|
Raven Aure
|
Posted - 2006.05.26 21:11:00 -
[159]
Originally by: Valar A little post-mortem.
At 20:01 we lost all nodes except 13 and all players were dropped from the cluster. As players started to log in, resources got remapped and reloaded.
As soon as I was able to get to my computer and connect to the cluster to find out what was wrong, I initiated an emergency reboot, but at that time we had over 13k players running on only 13 nodes. It's incredible that the 13 nodes survived the load of 13,000 players logging in.
As soon as the cluster was down and cleanup tasks were done, I started the cluster again. The server was up and accepting connections less than 20 minutes after the original node crash and less than 10 minutes after I arrived.
This incredible rebooting time is due to the new SQL hardware and a hotfix that fixes distributed compression of serverlogs. After getting two fixes I have been waiting for, one with the patch and one as a hotfix yesterday, I was able to change the accept delay of the cluster to 2 minutes instead of the 10-minute wait it was before, and increase the throttling speed by 300 connections per minute. The fix for distributed compression also makes cleanups after shutdowns much quicker.
Well, since I started the cluster back up I've been looking into what happened. While I can't be exactly sure what happened at this point, the serverlogs show what looks like a mass disconnection between nodes internally... however, the logs also say that the connections being dropped weren't actually supposed to be there...
I guess I'll spend a big part of the evening playing detective.
Much <3 and all the best in your detective work. You guys really do spoil us ungrateful lot sometimes. ______________________ 106 days and still a hijack virgin... Cherry popped! ~kieron Kieron... I have some bad news... |
Jim McGregor
|
Posted - 2006.05.26 21:14:00 -
[160]
Originally by: Valar
Well, since I started the cluster back up I've been looking into what happened. While I can't be exactly sure what happened at this point, the serverlogs show what looks like a mass disconnection between nodes internally... however, the logs also say that the connections being dropped weren't actually supposed to be there...
Even hackers want to play EVE, apparently.
Good work with the updates and fixes... very cool with speedy reboots!
--- The Eve Wiki Project |
|
branodn lee
|
Posted - 2006.05.26 21:18:00 -
[161]
THANK YOU, Valar, for your post. That's 2 devs that have posted in here now, and I would like to say thank you very much for letting us know what's going on.
|
Vincent Gaines
|
Posted - 2006.05.26 21:22:00 -
[162]
Edited by: Vincent Gaines on 26/05/2006 21:23:02
Originally by: Oveur
Originally by: branodn lee OK, let's just say what's really happening here. CCP is spending so much time on the new China server that we are getting the very short end of the stick. This is the 2nd node death in 2 weeks, and if I'm right, the one last week was on the same day as this one. Yet CCP has said nothing about what the deal is or whether they are looking into it. Ever since they started working on the other server, we here on TQ have been left in the dark and not told anything about the problems our server is having. So please, CCP, devs, GMs, or anyone who can give a reason why this node death is happening every week now: an answer would be nice.
Most of it is actually due to our extensive hardware upgrades done to Tranquility in the last months. Practically everything is new or has been upgraded in the last three months - covering both hardware and software.
New stuff development for Tranquility suffering because of China? Yes, I pointed that out some time ago.
Tranquility stability and fixing suffering from China? No. It's growing pains. EVE doubled in the last year.
What brought us down today is still being investigated; the first order of business was to get the business back in order.
STP failure FTW?
edit: IAofficialReply
|
Julia Reave
|
Posted - 2006.05.26 21:29:00 -
[163]
Valar, thanks for quick information. Keep us posted ;)
|
|
Valar
|
Posted - 2006.05.26 22:37:00 -
[164]
An update. I found out what caused this.
After startup today, a proxy process went into a state where it kept erroring. At 20:01 it started dropping its connections and each connection to a sol server that dropped caused an event that broadcast a node death for the node that was connected to the disconnected transport. So the node death detector on a proxy process that was broken caused the server to kill almost all its nodes.
I'll forward this info to PapaSmurf and porkbelly and they will hopefully find a way to prevent this from happening in the future. Thankfully, proxy deaths are very rare. ------ Valar Database admin - Server operations team CCP Games How to write a good bugreport |
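The cascade described above can be sketched as a toy model in Python. This is not CCP's code: the `Cluster` and `Proxy` classes, the `healthy` flag, and the `guarded` option are all hypothetical names invented for illustration, and the guard shown is just one conceivable mitigation, not the fix the devs settled on. The point is only that when every dropped transport is reported as a node death, one broken proxy can take out every node it is connected to.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy cluster: the set of sol node ids currently considered alive."""
    live_nodes: set = field(default_factory=set)

    def broadcast_node_death(self, node_id):
        # Once a death is broadcast, the node is treated as dead cluster-wide.
        self.live_nodes.discard(node_id)


@dataclass
class Proxy:
    """Toy proxy holding one transport per sol node it is connected to."""
    cluster: Cluster
    healthy: bool = True
    transports: set = field(default_factory=set)

    def drop_transport(self, node_id, guarded=False):
        self.transports.discard(node_id)
        # Hypothetical guard: a proxy that is itself erroring does not get
        # to declare other nodes dead.
        if guarded and not self.healthy:
            return
        # Unguarded behaviour: any dropped transport counts as a node death.
        self.cluster.broadcast_node_death(node_id)


def simulate(node_count, guarded):
    """Return how many nodes survive a broken proxy dropping all transports."""
    cluster = Cluster(live_nodes=set(range(node_count)))
    proxy = Proxy(cluster=cluster, healthy=False,
                  transports=set(range(node_count)))
    # sorted() copies the ids, so dropping transports while looping is safe.
    for node_id in sorted(proxy.transports):
        proxy.drop_transport(node_id, guarded=guarded)
    return len(cluster.live_nodes)


print("unguarded survivors:", simulate(100, guarded=False))
print("guarded survivors:", simulate(100, guarded=True))
```

In the unguarded run the nodes themselves never fail; they are only declared dead because the proxy's side of the connection went away, which matches the "connections being dropped weren't actually supposed to be there" puzzle from the earlier post.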
|
Michiyo Daishi
|
Posted - 2006.05.26 22:48:00 -
[165]
Originally by: Valar Thankfully, proxy deaths are very rare.
Umm, rare eh? XD Someone get the probability mathematicians in here!
Seriously though, great work :D CCP at it again!
Now where's my cookie?! XD -
*posts posted are not official statements of EVEnews.com, and are the poster's own* |
Oron
|
Posted - 2006.05.26 22:52:00 -
[166]
Just name me one single MMORPG where a database admin steps into a downtime rant and explains what happened. Never saw this before, except in EVE.
Thx ya and now get those servers happy again ASAP! ****** wipe*
Need drugs? |
Felix Cole
|
Posted - 2006.05.27 00:38:00 -
[167]
When I read your replies to this I feel like I'm reading a mission analysis from a NASA mission. It sounds cool to me, and it's nice to get a cut-and-dried response rather than jokes.
|
|
Valar
|
Posted - 2006.05.27 00:57:00 -
[168]
Originally by: Michiyo Daishi
Originally by: Valar Thankfully, proxy deaths are very rare.
Umm, rare eh? XD Someone get the probability mathematicians in here!
Seriously though, great work :D CCP at it again!
Now where's my cookie?! XD
Proxy deaths are very rare and in 50% of cases are very visible to the players due to limitations of our load balancer. Sol node deaths are however more common. ------ Valar Database admin - Server operations team CCP Games How to write a good bugreport |
|
Calshim
|
Posted - 2006.05.27 01:32:00 -
[169]
Much respect to the dev team, especially Valar, for the response on this.
How many other MMOs can you name that would have the dev team posting at 1am telling the player base what was going on?
Most would have an admin reboot the server and pray it holds till they get into the office next morning, and other than a box standard "we are looking into the problem" you'd get bugger all communication.
That's why I continue to play EVE.
CCP giving Techie types their fix since 2003. ------------------------------------------ BiteFight |
|
|
|