      
      
        |  Sleepkevert
 Paradox v2.0
 Interstellar Alcohol Conglomerate
 
 
       | Posted - 2007.09.11 18:41:00 -
          [1] 
 Edited by: Sleepkevert on 11/09/2007 18:42:28
 Lulz, now we have 2 server-down stickies!
 
 
 Sign my sig
 | 
      
      
        |  Sleepkevert
 Paradox v2.0
 Interstellar Alcohol Conglomerate
 
 
       | Posted - 2007.09.11 18:44:00 -
          [2] 
 
 Lol, yeah, same. After I saw those messages popping up in the middle of the day, I expected the server to crash within 10 minutes.

 Originally by: franny
 knew the server was fubar this morning with the 039820598 traffic advisories, surprised it made it this long
 
  
 Sign my sig
 | 
      
      
        |  Sleepkevert
 Paradox v2.0
 Interstellar Alcohol Conglomerate
 
 
       | Posted - 2007.09.11 18:50:00 -
          [3] 
 
 FAILDIGGER!

 Originally by: CCP Wrangler
 The server failed over, again, and will be restarted.
 
 
 Mweh, I don't really care, I actually get to do some other stuff this way.
  
 Sign my sig
 | 
      
      
        |  Sleepkevert
 Paradox v2.0
 Interstellar Alcohol Conglomerate
 
 
       | Posted - 2007.09.11 18:52:00 -
          [4] 
 
 lulz, QFT!

 Originally by: Postlatta Mouseanon
 Originally by: Umbriele
 Need for speed?
 
 
 
 
 What? It crashes quite fast.
 
 You're just a naysayer!
 
 
 Sign my sig
 | 
      
      
        |  Sleepkevert
 Paradox v2.0
 Interstellar Alcohol Conglomerate
 
 
       | Posted - 2007.09.11 21:10:00 -
          [5] 
 
 Good!

 Originally by: CCP John Proctor
 We believe we have traced this issue to a failed controller card on our old RAMSAN that we attached a few days ago.
 
 Our hardware profile has not changed much during the past 30 days other than adding the unit back into the SAN array.
 
 The other failovers we have experienced since moving to the new SQL hardware had different causes. One was due to insufficient memory resources being assigned to the OS; that failover happened the day the new hardware was installed and was corrected. Another error was fixed by the installation of Service Pack 2 on the SQL server. The final one was due to max degree of parallelism being set too high, so our queries used too many processors at once, leaving other requests to starve.
 
 These failover problems we are now having are the ones we were having prior to the upgrade of the SQL hardware (same error codes and little evidence of errors in the logs).
 
 Unfortunately, we are currently running on the affected hardware, but we have implemented steps to reduce its use by shifting all I/O to the new RAMSAN. We will not be able to completely phase out the affected hardware until the next downtime. Later in the week we can then switch to a different controller on the unit and try to re-integrate it into the cluster once we have run a large battery of tests on it.
 
 We are working to address these issues; we care very deeply about the stability of TQ.
 
 To give you a brief recap on our database and the amazing things it does: with the increase in the player base and the new features rolled out via the API site and other tools, we are at over 8,500 transactions a second. You can see how going back through all those transactions and data for the one transaction call that causes the server to initiate a failover can be a daunting task.
 
 But we are fully confident that we can fix this issue in a timely manner. Please have patience with us.
 
 We are throwing our full resources into this problem and feel as helpless as you do.
 
 Our apologies.
 
 
 
 
 
 Now rip the power cord out of the old RAMSAN, and reboot the damn server. It's down AGAIN!
 
 Sign my sig
 | 
      
        |  |  |