Pages: 1 2 3 4 5 6 7 8 9 [10] 11 .. 11 :: one page |
|
Author |
|
CCP Valar
|
Posted - 2009.12.06 00:24:00 -
[271]
Hi everyone.
I'm Valar, the senior virtual world database administrator for EVE Online.
I'd like to start by offering sincere apologies for the outage tonight. At first we believed we had a network problem, but it quickly became clear that it was the database that was the actual issue. I seem to have not communicated it properly to the community team that there was not a network problem, so they continued talking about those in this thread and that was a mistake on my part.
The problem we had is not connected to the Dominion expansion. It is actually exactly the same problem as caused the crash we had last Sunday at 23:49, before the deployment of the expansion. This is a hard crash in the SQL Server engine and the SQL Server does not fail over automatically after the crash, so our high availability system does not work in this case. We have a high priority case open with Microsoft and they have escalated it to the SQL Server development team and they are looking into the problem for us.
For the people that were wondering about our backup servers and redundancy. We do have a standby SQL Server that is an exact copy of our main one, in a shared storage failover configuration. If one fails, the other one should take over. The failover is not seamless and in the past it has always crashed EVE.
The game simulation is run on a cluster of machines, 20 CPUs for proxies and over 200 for solarsystem simulation. We have some machines not running any solarsystems, so if one machine goes down, the solarsystem is mapped to another machine. All client connections to the systems are lost while the system is remapped. Node deaths in EVE are very rare nowadays so this does not affect a lot of people. Most disconnects are due to external factors, like ISP peering problems. ---- Virtual World Database Administrator Operations department CCP Games |
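Editorial sketch, not CCP's code: the remapping described in the post above (spare machines that run no solar systems take over when a node dies, and clients of the moved systems lose their connection) can be pictured roughly as below. The `Node` class, the `remap_on_failure` function, the node names, and the round-robin choice of spare are all assumptions made for illustration; the post does not say how a target machine is actually picked.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Node:
    """One simulation machine (hypothetical model, not CCP's schema)."""
    name: str
    alive: bool = True
    systems: Set[str] = field(default_factory=set)


def remap_on_failure(nodes: List[Node], failed_name: str) -> Dict[str, str]:
    """Move every solar system off a dead node onto spare nodes.

    Returns a mapping of solar system -> new node name. Clients in those
    systems would be disconnected while the move happens.
    """
    by_name = {node.name: node for node in nodes}
    failed = by_name[failed_name]
    failed.alive = False

    # Spare nodes are the live machines that host no solar systems right now.
    spares = [node for node in nodes if node.alive and not node.systems]
    if not spares:
        raise RuntimeError("no spare node available for remapping")

    moved: Dict[str, str] = {}
    # Assumption: spread the orphaned systems round-robin over the spares.
    for index, system in enumerate(sorted(failed.systems)):
        target = spares[index % len(spares)]
        target.systems.add(system)
        moved[system] = target.name
    failed.systems.clear()
    return moved
```

Calling `remap_on_failure(nodes, "sol-01")` on a list that contains a dead node `sol-01` and at least one empty node returns which system landed where; a real cluster would also have to reload system state from the database, which this sketch leaves out.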
|
Rosmary
|
Posted - 2009.12.06 00:30:00 -
[272]
Thank you for the explanation and apology accepted.
It is good to see answers and things being done.
Thumbs up. But next time, maybe, just maybe, release these info so people could understand and become less frustrated.
I still love CCP.
Don't flame me. Bye
|
JaseNZ
Gallente
|
Posted - 2009.12.06 00:30:00 -
[273]
Edited by: JaseNZ on 06/12/2009 00:32:54 Thanks a lot for the detailed explanation and apology, Valar.
It is appreciated that you've taken the time to come on here and tell us yourself.
Originally by: Rosmary Thank you for the explanation and apology accepted.
It is good to see answers and things being done.
Thumbs up. But next time, maybe, just maybe, release these info so people could understand and become less frustrated.
I still love CCP.
Don't flame me. Bye
As Wrangler said, he didn't want to disturb the techs while they were working on the problem. As I said in reply to that earlier, an update after the fact makes more sense, than trying to get one during the problem.
Thanks again Valar, if I am ever in Iceland, I shall shout you a beer! lol.
|
px3118
|
Posted - 2009.12.06 00:31:00 -
[274]
Originally by: CCP Valar Hi everyone.
I'm Valar, the senior virtual world database administrator for EVE Online.
I'd like to start by offering sincere apologies for the outage tonight. At first we believed we had a network problem, but it quickly became clear that it was the database that was the actual issue. I seem to have not communicated it properly to the community team that there was not a network problem, so they continued talking about those in this thread and that was a mistake on my part.
The problem we had is not connected to the Dominion expansion. It is actually exactly the same problem as caused the crash we had last Sunday at 23:49, before the deployment of the expansion. This is a hard crash in the SQL Server engine and the SQL Server does not fail over automatically after the crash, so our high availability system does not work in this case. We have a high priority case open with Microsoft and they have escalated it to the SQL Server development team and they are looking into the problem for us.
For the people that were wondering about our backup servers and redundancy. We do have a standby SQL Server that is an exact copy of our main one, in a shared storage failover configuration. If one fails, the other one should take over. The failover is not seamless and in the past it has always crashed EVE.
The game simulation is run on a cluster of machines, 20 CPUs for proxies and over 200 for solarsystem simulation. We have some machines not running any solarsystems, so if one machine goes down, the solarsystem is mapped to another machine. All client connections to the systems are lost while the system is remapped. Node deaths in EVE are very rare nowadays so this does not affect a lot of people. Most disconnects are due to external factors, like ISP peering problems.
All Hail CCP!
|
Tubbie
|
Posted - 2009.12.06 00:32:00 -
[275]
Edited by: Tubbie on 06/12/2009 00:36:34
Originally by: Rosmary
If that's so, if you went to a doctor and his response time is 2hrs but you have to pay 30grand, is that reasonable at all.
The difference being that you waited 30 minutes, paid 15 dollars for the entire month, and health care is just a tiny bit more important than a game.
Otherwise a great comparison...
Originally by: Rosmary Response time and money cannot be linked in any context whatsoever.
They can be, and are. If you have a problem with your ISP you have to call them, hold for some time, and get an underpaid and undereducated employee on the line to explain your problem to. They may or may not escalate the problem to the people actually supposed to fix the issue. If the problem is somewhat serious it's unlikely to be fixed within 24 hours. This 'service' costs you about 20/30 euro a month.
A small business paying 200/300 euro a month can get the same problem fixed within hours.
|
Kile Kitmoore
|
Posted - 2009.12.06 00:36:00 -
[276]
Thanks CCP Valar, but I just hope your explanation did not come too late, before people started hurting themselves or punching cute fuzzy kittens in the back of the head.
|
Mioelnir
Minmatar Meltdown Luftfahrttechnik
|
Posted - 2009.12.06 00:41:00 -
[277]
As always, Valar delivers.
And you really shouldn't have undergone that facial surgery, your Jovian self was much better.
|
|
CCP Valar
|
Posted - 2009.12.06 00:44:00 -
[278]
Originally by: Mioelnir As always, Valar delivers.
And you really shouldn't have undergone that facial surgery, your Jovian self was much better.
I needed to have my face modified so I could infiltrate your ranks
(And as always, because some people take everything seriously... I'm obviously joking :)) ---- Virtual World Database Administrator Operations department CCP Games |
|
Louis deGuerre
Gallente The Rise of The Dragon Knights Void Alliance
|
Posted - 2009.12.06 00:56:00 -
[279]
I am going to stop planning ops on saturday night.
Ah well. Sol: A microwarp drive? In a battleship? Are you insane? They aren't built for this! Clear Skies - The Movie ROTDK is recruiting
|
Tuscun Nebular
|
Posted - 2009.12.06 01:26:00 -
[280]
Edited by: Tuscun Nebular on 06/12/2009 01:26:49 Save face, And the stories go on and on!
|
|
Lekegolo Khanid
|
Posted - 2009.12.06 01:28:00 -
[281]
It would appear Abudban is jumping machines more frequently than Will Smith in I, Robot. This is messing with my virtual livelihood, make it stop plox.
kthxbai. |
Zheng Guo
Community against Justice INDUSTRIAL REV0LUTI0N
|
Posted - 2009.12.06 01:32:00 -
[282]
Edited by: Zheng Guo on 06/12/2009 01:36:36
I lost a Rorqual.
To prepare for the standard CCP answer, "we had no problems with the server":
I have screenshots of all the posts here, and I have a video of me crying in front of a webcam. I will also give my dog no food for 2 days, and when he looks really sad I will take pictures too (emo factor). Linkage
If you do not reimburse me, I will send both to UNICEF.
|
Tuscun Nebular
|
Posted - 2009.12.06 01:48:00 -
[283]
Originally by: CCP Valar
Originally by: Mioelnir As always, Valar delivers.
And you really shouldn't have undergone that facial surgery, your Jovian self was much better.
I needed to have my face modified so I could infiltrate your ranks
It's good to see some CCP members have a sense of humor :)
(And as always, because some people take everything seriously... I'm obviously joking :))
|
Presson
|
Posted - 2009.12.06 02:33:00 -
[284]
Howdy Yall,
I was playing EVE just fine, then the server went down and I have not yet been able to get the sign-in screen to come up. I was wondering how much longer it is going to be before I am able to play EVE again. I have not been able to get into EVE for about 4 hours now.
Any help on this would be greatly appreciated. -Presson-
|
RC Denton
|
Posted - 2009.12.06 08:32:00 -
[285]
Originally by: CCP Wrangler The following is from CCP Valar, who posted below:
Hi everyone.
I'm Valar, the senior virtual world database administrator for EVE Online.
I'd like to start by offering sincere apologies for the outage tonight. At first we believed we had a network problem, but it quickly became clear that it was the database that was the actual issue. I seem to have not communicated it properly to the community team that there was not a network problem, so they continued talking about those in this thread and that was a mistake on my part.
The problem we had is not connected to the Dominion expansion. It is actually exactly the same problem as caused the crash we had last Sunday at 23:49, before the deployment of the expansion. This is a hard crash in the SQL Server engine and the SQL Server does not fail over automatically after the crash, so our high availability system does not work in this case. We have a high priority case open with Microsoft and they have escalated it to the SQL Server development team and they are looking into the problem for us.
For the people that were wondering about our backup servers and redundancy. We do have a standby SQL Server that is an exact copy of our main one, in a shared storage failover configuration. If one fails, the other one should take over. The failover is not seamless and in the past it has always crashed EVE.
The game simulation is run on a cluster of machines, 20 CPUs for proxies and over 200 for solarsystem simulation. We have some machines not running any solarsystems, so if one machine goes down, the solarsystem is mapped to another machine. All client connections to the systems are lost while the system is remapped. Node deaths in EVE are very rare nowadays so this does not affect a lot of people. Most disconnects are due to external factors, like ISP peering problems. ---- Virtual World Database Administrator Operations department CCP Games
Update: We have now stopped throttling logins.
Update: Tranquility is now up, but logins to the server are being throttled due to the issues we are experiencing.
Update: The server is now expected to come up at 22:20 GMT/UTC.
We are bringing the server back up after experiencing network issues. The server is estimated to be up at 21:50 GMT/UTC.
Welcome to the wonderful world of the non-yielding scheduler fault problem.
|
Shrug123
|
Posted - 2009.12.06 08:58:00 -
[286]
Thanks Valar for the clarification. I seem to have been one of the ones getting in before turning in, and retrieving my ship which was scampering about, I don't know where!
Seems a lot of folks still have issues late into the night.
|
Pteranodon
Caldari Stealthfield Ihatalo Cartel Navy
|
Posted - 2009.12.06 10:42:00 -
[287]
Edited by: Pteranodon on 06/12/2009 10:51:56 Some of my thoughts on the deployment of new patches and if you don't agree- well we are not arguing that, just be democratic.
Many years ago I worked with a utility company who supported tens of thousands of customers.
They had a whole floor of a large building stuffed full of computers, effectively a parallel test infrastructure. Every single application used by that company was tested to destruction, down to the level of DLL analysis and conflict tables mapping where various applications had known issues. If any changes or updates to the infrastructure were planned, they were tested months in advance. Upgrades were subsequently smooth and problems were very few.
My take on the EVE update patch situation is that there is not enough testing, with the user base being used as a feedback machine to correct issues. In my opinion, the infrastructure as it stands at the moment can't support 50K users logged in at once. A history of all the major crashes seems to point to times when there are large numbers of users in game. When do we get large numbers of users? After patch day, of course.
As for the databases, I'm no expert, but there has to be something better than SQL to manage the data. Let's face it, it has a legendary reputation for being flaky.
Why have content updates twice yearly? Why not nine months of development and three months of testing? That seems better than rushing out hasty patches of new, flaky content.
My final comment is that I pay to play EVE, not to sit in a queue of 500 waiting to log in, and I have paid enough subscriptions to earn my point of view.
The test server is a moot point, because even when we test and provide feedback, you at CCP will do your own thing anyway. So get rid of public access and use it yourselves, as EVE as it stands could do with an awful lot of testing.
|
J Random
Tax Protestants
|
Posted - 2009.12.06 11:48:00 -
[288]
Edited by: J Random on 06/12/2009 11:48:44 Bah .. I still don't understand why companies with real internet services use half ass solutions like a clustered Microsoft SQL environment .... go buy Oracle on AIX or Solaris like a real company ... there is a reason you don't see AMEX, Visa, or any large EDI like structure running Microsoft SQL to support their public facing front ends (or mission critical back ends). Hello you think Microsoft is running its Peoplesoft or SAP systems on Microsoft SQL .. guess what, surprise surprise, they aren't.
Jesus, use the right tools for the right job.
|
TraderJade
Caldari Secure Production Research and Trading
|
Posted - 2009.12.06 12:20:00 -
[289]
Oh ffs, every time there's a problem, out comes the Microsoft bashing.
Give it a rest. You don't think they already looked at the rest and decided MS offered the best deal? Visa, etc. aren't all running a single-shard MMORPG either, and I'm sure both CCP and Microsoft will be looking to fix whatever is up ASAP.
|
Tzestocteru
|
Posted - 2009.12.06 12:28:00 -
[290]
My character is stuck with a "fetching mail" message. I tried to restart; same thing (without requesting mail this time).
|
|
Dahl Evonitek
|
Posted - 2009.12.06 12:30:00 -
[291]
I had suspected a DB issue right after getting back into the game after the crash. Getting back in I found my sister core probes gone from my launcher and 10 additional sisters combat probes in there instead.
I wonder when the fall-back DB gets its content replicated as I had indeed been using combats but had unloaded and loaded the cores fairly shortly before the crash I think (was afk at the time of the crash though, so not sure how shortly before that was).
It also seems weird that those 10 probes suddenly "materialized" out of thin air (and the cores vanishing into said thin air). If an outdated replicate DB had been the issue I'd have expected to have the cores in my hold and 10 of my owned combats in the launcher instead (the previous state of my ship before loading the cores).
The DB being such an integral part of eve, I hope these issues get resolved quickly. Regards, Dahl.
|
Gnulpie
Minmatar Miner Tech
|
Posted - 2009.12.06 12:35:00 -
[292]
Originally by: J Random
Bah .. I still don't understand why companies with real internet services use half ass solutions like a clustered Microsoft SQL environment .... go buy Oracle on AIX or Solaris like a real company ... there is a reason you don't see AMEX, Visa, or any large EDI like structure running Microsoft SQL to support their public facing front ends (or mission critical back ends). Hello you think Microsoft is running its Peoplesoft or SAP systems on Microsoft SQL .. guess what, surprise surprise, they aren't.
Jesus, use the right tools for the right job.
Oh yeah, and because they are such idiots at MS and CCP we see Eve crashing every second day ... oh wait ...
It is absolutely absurd how many people are getting so enraged about some INTERNET SPACESHIP GAME. Especially absurd while the real world around them is crashing and run by completely corrupt morons. But hey, let us get enraged about a spaceship game. Just lol.
And to CCP: Keep the good work going! |
Shade Millith
International House of PWNCakes
|
Posted - 2009.12.06 12:52:00 -
[293]
The mass whining in here is strong
Get over it, stop acting like children. There was a 30 minute problem; it's not the end of the bloody world.
|
LAZMAN
Esto Perpetua BiffCo.
|
Posted - 2009.12.06 12:56:00 -
[294]
Edited by: LAZMAN on 06/12/2009 12:59:17 my clients still getting stuck on log in alot and things are not loading when it does log in.....
seems to be one "fetching mail" causing massive lag's or rific.
|
Yon89
Triumvirate.
|
Posted - 2009.12.06 13:04:00 -
[295]
Originally by: LAZMAN Edited by: LAZMAN on 06/12/2009 12:59:17 my clients still getting stuck on log in alot and things are not loading when it does log in.....
seems to be one "fetching mail" causing massive lag's or rific.
same here
Originally by: MOTD Eve-mail is down at the moment and our engineers are working on restoring service as soon as possible.
============= SIG SIG SIG |
J Random
Tax Protestants
|
Posted - 2009.12.06 13:21:00 -
[296]
That's what I love about Microsoft fanatics, always making excuses for the vendor. Fact is, CCP's primary business driver is EVE, which means, like any business driver, it needs to be resourced properly. It is irrelevant whether Microsoft and CCP are working to resolve the issue; the fact is their solution failed. CCP chose to go with Microsoft SQL, and anybody worth their salt in the real DBA world (just not the click-monkey SAs who maintain databases) knows that the Microsoft SQL clustering solution has, and always has had, serious HAC and performance issues.
I am not bashing Microsoft here for the sake of bashing them. Their products are great for what they are meant for, and even MSSQL has its places, but this isn't one of those places. You say Microsoft probably gave them the best deal and they shopped around, but I doubt it. In my experience businesses don't factor in TCO, MTTF, reputation loss, support and sustainability cost, etc. when going with post products; they simply look at the initial price tag for the application, and my guess is CCP did the same. As the old saying goes, "nobody gets fired for buying Cisco or Microsoft when it fails; you do get fired if you buy some other vendor, because everybody knows you should only buy Microsoft or Cisco". Oracle has some seriously competitive pricing, as do the VARs (IBM, Sun, Fujitsu, etc.).
Also, the fact that they aren't back up after twelve hours (still having massive gate closures) means they have some serious DRP problems as well. They probably should have realized ten hours ago that they should just roll the DB back and drive on. As far as I can tell, the geeks are running the show, and instead of doing the right thing after thirty minutes (shut the whole thing down, initiate DRP, and roll back) they are trying to debug in a production environment; somebody in the business office needs to reel them in. At some point the cost to company reputation from downtime outweighs the cost of some lost SP.
|
|
CCP Valar
|
Posted - 2009.12.06 13:39:00 -
[297]
The mail issue is fixed, a relog should clear up any issues you still have.
Note that blaming MS for the issues we had now would not be fair as the startup issues were caused by an improper shutdown and the mail issues were caused by new code deployed with Dominion. The mail SQL procedures were also a major contributor to the SQL load problem we had after startup. ---- Senior Virtual World Database Administrator Operations department CCP Games |
|
fuze
Gallente Quam Singulari
|
Posted - 2009.12.06 14:14:00 -
[298]
It wouldn't have been a proper patch without any serious problems since that's what happened with all the major patches. Compared to the earliest patches this is just a tiny glitch.
Originally by: The Mittani Where's the excellence?
|
Florio
Blue Republic
|
Posted - 2009.12.06 15:42:00 -
[299]
Edited by: Florio on 06/12/2009 15:42:35 Hi, I've been having untypical problems over the last few days too. Namely lag spikes where EVE freezes. Lost a ship to it and it is regular enough to make EVE pretty unplayable at this time, certainly PvP. No idea why it is happening but lag like this has not happened to me for almost a year. No other internet problems, no change of computer or settings or net provider.
edit/ i'm on Windows xp and eve voice has been fine.
|
Pteranodon
Caldari Stealthfield Ihatalo Cartel Navy
|
Posted - 2009.12.06 16:38:00 -
[300]
Originally by: CCP Valar The mail issue is fixed, a relog should clear up any issues you still have.
Note that blaming MS for the issues we had now would not be fair as the startup issues were caused by an improper shutdown and the mail issues were caused by new code NOT TESTED THOROUGHLY BEFORE BEING deployed with Dominion. The mail SQL procedures were also a major contributor to the SQL load problem we had after startup.
I added the important bit for you.
|