| Pages: 1 2 3 4 :: [one page] |
| Author |
Thread Statistics | Show CCP posts - 5 post(s) |
|

kieron

|
Posted - 2006.08.31 17:32:00 -
[1]
Yes, I know I promosed to publish Tuxford's Dev Blog today, but that one needs to wait until tomorrow. Not to fear, I have a different blog to publish, and one that is pertinent to the delay of the patch that was supposed to be deployed this morning.
So, what is the Birthday Paradox and just how does it relate to 32-bit code? Mathematicians and cryptographers should be able to figure that out without batting an eye. For those of us who aren't as well versed in math or code, check out Porkbelly's Dev Blog.
kieron Community Manager, EVE Online |
|

Linerra Tedora
Criss Cross
|
Posted - 2006.08.31 17:33:00 -
[2]
Edited by: Linerra Tedora on 31/08/2006 17:35:50 ohhhh BLOG!!! *droolls all over keyboard*
edit : funnily enough, i first saw that math probability problem yesterday... along with the cool monty hall probability.... i love these now...  EvE-CCG Spoiler Database |

Tobias Sjodin
Caldari Ore Mongers Pirate Coalition
|
Posted - 2006.08.31 17:34:00 -
[3]
Thanks for the heads up, CCP.
[they solved the wcs-prob] |

Jim McGregor
Caldari
|
Posted - 2006.08.31 17:35:00 -
[4]
Edited by: Jim McGregor on 31/08/2006 17:37:36
Nice to see you fixed the problem caused by the german language. At least I think thats what you are saying. :)
--- Eve Wiki | Eve Tribune | Eve Pirate |
|

kieron

|
Posted - 2006.08.31 17:38:00 -
[5]
Yes, the blog is saying the issue with the German localization will be fixed with Tuesday's patch.
kieron Community Manager, EVE Online |
|

Jean Perrau
|
Posted - 2006.08.31 17:41:00 -
[6]
Are you saying that your hash table code did not handle collisions ? Wow 
|
|

Oveur

|
Posted - 2006.08.31 17:42:00 -
[7]
Originally by: kieron Yes, the blog is saying the issue with the German localization will be fixed with Tuesday's patch.
Or more correctly, we've been adding so much stuff that the german language pack tilted the scales. So this would have happened sooner or later, no need to blame the german language directly 
Senior Producer EVE Online
|
|

Linerra Tedora
Criss Cross
|
Posted - 2006.08.31 17:43:00 -
[8]
damn you Oveour... i was just going to lay the blame at the german language... 
EvE-CCG Spoiler Database |

Jim McGregor
Caldari
|
Posted - 2006.08.31 17:44:00 -
[9]
Originally by: Linerra Tedora damn you Oveour... i was just going to lay the blame at the german language... 
I tried, but they pwned me. 
--- Eve Wiki | Eve Tribune | Eve Pirate |

Linerra Tedora
Criss Cross
|
Posted - 2006.08.31 17:46:00 -
[10]
i must be tired.. didn't even read your post that way first.. now i get it...
btw does anyone see a problem with the mentioning of germans and hash in the same context?  EvE-CCG Spoiler Database |

Rakeris
Legio VIII
|
Posted - 2006.08.31 17:46:00 -
[11]
Originally by: Oveur
Or more correctly, we've been adding so much stuff that the german language pack tilted the scales. So this would have happened sooner or later, no need to blame the german language directly 
We can blame it indirectly though? Right?
--------
|

Skogen Gump
|
Posted - 2006.08.31 17:47:00 -
[12]
Can't you just use GUIDs ? they're pretty robust for things like this and they save you the need to see if it's used already; as long as they're generated by the same computer anyway.
|
|

Suvetar
Forum Moderator Interstellar Services Department

|
Posted - 2006.08.31 17:49:00 -
[13]
I'll ask nicely that people don't start smack talking about the German language, please.
|
|

Kilo Paskaa
Thugs 4 less
|
Posted - 2006.08.31 17:49:00 -
[14]
Umm woot?
|

Malthros Zenobia
Caldari Independent Navy Reserve Kimotoro Directive
|
Posted - 2006.08.31 17:49:00 -
[15]
This blog conufses me.
Sorry you can't afford a dev so you get me instead ^^ - Xorus |

Jim McGregor
Caldari
|
Posted - 2006.08.31 17:51:00 -
[16]
Originally by: Suvetar I'll ask nicely that people don't start smack talking about the German language, please.
Fine, but just so you know... the jokes wasnt intended to be taken seriously.
--- Eve Wiki | Eve Tribune | Eve Pirate |

Linerra Tedora
Criss Cross
|
Posted - 2006.08.31 17:53:00 -
[17]
Edited by: Linerra Tedora on 31/08/2006 17:53:59 Ahh yes.. the allmighty david...
Linkage
edit : no smack intended.. just to have a quiet fun time... after all, i have many german friends...  EvE-CCG Spoiler Database |

Magunus
The Forsakened Few The ARR0W Project
|
Posted - 2006.08.31 17:55:00 -
[18]
Originally by: Jim McGregor
Originally by: Suvetar I'll ask nicely that people don't start smack talking about the German language, please.
Fine, but just so you know... the jokes wasnt intended to be taken seriously.
Argh. Finally a chance to use that 'German sense of humour' joke and it'll get modded in seconds. :( ---
In the beginning the Universe was created. This has made a lot of people very angry and been widely regarded as a bad move. -- Douglas Adams, 'The Restaurant at the End of the Universe' |

Andrue
Amarr
|
Posted - 2006.08.31 17:59:00 -
[19]
Edited by: Andrue on 31/08/2006 17:59:34 32-bit hash? Meh. We use a hash in our text indexing engine and it's a lot wider than 32 bits :) -- (Battle hardened miner)
[Brackley, UK]
WARNING:This post may contain large doses of reality. |

Jim McGregor
Caldari
|
Posted - 2006.08.31 18:21:00 -
[20]
By the way... now I realize why you put the Tuxford blog on a friday. Damage control. While people rabble for 15 pages, you are at the pub enjoying life. 
--- Eve Wiki | Eve Tribune | Eve Pirate |

Caleb Paine
Infinite Technologies
|
Posted - 2006.08.31 18:23:00 -
[21]
Edited by: Caleb Paine on 31/08/2006 18:23:09 nm
Death smiles at us all, all a man can do is smile back. |

Linerra Tedora
Criss Cross
|
Posted - 2006.08.31 18:34:00 -
[22]
Originally by: Jim McGregor
By the way... now I realize why you put the Tuxford blog on a friday. Damage control. While people rabble for 15 pages, you are at the pub enjoying life. 
lol...  EvE-CCG Spoiler Database |

Scetrov
Gallente OcUK
|
Posted - 2006.08.31 18:49:00 -
[23]
A million system designers have done it before, and a million system designers will do it again.
:) --
|

Scorpyn
Caldari Infinitus Odium
|
Posted - 2006.08.31 18:52:00 -
[24]
That sounds like it might have been somewhat difficult to debug. I wonder if it's caused other problems without being noticed before 
|

Raith Seinar
|
Posted - 2006.08.31 19:00:00 -
[25]
This happened to me on patch day...might have been missed in the issues thread.
Link
|

Maya Rkell
Forsaken Empire
|
Posted - 2006.08.31 19:01:00 -
[26]
I'd imagine that the effect would be quite noticeable, and it's certainly checkable.
GUID hash collisions are someone I've encountered before myself...
//Maya |

SephiriotH
REUNI0N Red Alliance
|
Posted - 2006.08.31 19:06:00 -
[27]
Those "language" packs will bring nothin more than headache. I dont like to be negative, however I see it as a promoting peoples to be stupid. Allowing the peoples to learn english, and concentrating to fix known issues and hunting a damn old bugs (some of them are still alive since EXODUS release) could be much better option...
Remember : with a good torpedo and bad word You can hurt much more than with bad word alone. |

FireFoxx80
Caldari E X O D U S Imperial Republic Of the North
|
Posted - 2006.08.31 19:28:00 -
[28]
Once again, math triumps over man!
Math: 1 Man: 0
What I do the rest of the time - Vote for a Jita bypass! |

Skogen Gump
|
Posted - 2006.08.31 19:30:00 -
[29]
Originally by: Maya Rkell I'd imagine that the effect would be quite noticeable, and it's certainly checkable.
GUID hash collisions are someone I've encountered before myself...
Hey hey come on ... Hash collisions are one thing but Guid hash collisions are alltogether something else entirely, the fact that they're partly based (in standard implementation) on how many milliseconds since the Epoch and a computers mac-address makes the chance of getting a GUID collision extremely small.
So, if you've had a Guid hash collision, then unless you're inserting the same item into your hash-table twice, that's a story you've got to tell 
Originally by: "Wikipedia" While each generated GUID is not guaranteed to be unique, the total number of unique keys (2[super]128[/super] or 3.4028+1038) is so large that the possibility of the same number being generated twice is very small.
|

Regat Kozovv
Caldari Orion's Forge New Eden Conglomerate
|
Posted - 2006.08.31 19:36:00 -
[30]
Originally by: Skogen Gump
So, if you've had a Guid hash collision, then unless you're inserting the same item into your hash-table twice, that's a story you've got to tell 
I'd be hauling ass to the 7-eleven to buy my lottery ticket!
32bit hashes seem nuts today, but three years ago it probably wasn't too far fetched.
Facinating dev blog CCP. Good luck with the fixes...
|

xeom
Veto.
|
Posted - 2006.08.31 19:37:00 -
[31]
quick unrelated question kieron,sence you posted this blog today does that mean no tuxford blog?
CCP where are our t2 shield power relays? |

Skogen Gump
|
Posted - 2006.08.31 19:42:00 -
[32]
Originally by: Regat Kozovv
I'd be hauling ass to the 7-eleven to buy my lottery ticket!
32bit hashes seem nuts today, but three years ago it probably wasn't too far fetched.
Facinating dev blog CCP. Good luck with the fixes...
That's very true but yeah like you say, it wasn't a daft thing to do back then 
|

Skogen Gump
|
Posted - 2006.08.31 19:45:00 -
[33]
Lol, I just read this on Wikipedia to describe the flexibility of the GUID or UUID ...
During a 10 billion year lifetime of the Earth, about 1 trillion UUIDs have to be created every nanosecond to exhaust the number of UUIDs.
|

John McCreedy
Caldari Eve Defence Force Ascendant Frontier
|
Posted - 2006.08.31 19:54:00 -
[34]
For the love of god, some of us don't talk geek. WTF is a 32 bit hash and how does that affect my day to day playing of Eve? 
Make a Difference
|

Thiedo Mandragorian
Gallente
|
Posted - 2006.08.31 20:00:00 -
[35]
Edited by: Thiedo Mandragorian on 31/08/2006 20:01:53
Originally by: Skogen Gump I feel that I need to backup that last post with some evidence ...
I work for a company that makes a CRM, we use an Object orientated approach to storing data and we use our SQL database to store attributes in rows instead of columns, so we can have as few or as many values for an object as we like.
Our main table has over 1.8 million rows in our biggest installation and our primary key (SQL Server 2000) is of type UniqueIdentifier and the formula for it's Identity field is ... =NewID()
We've never ever seen a collision.
Hey.. isn't an auto-incrementing int (or bigint, depending on need) a better choice for a unique id? From what I understand, a GUID is stored as a string. Wouldn't that completely screw you when you place the clustered index on your identity field?
EDIT: Not to mention the extra record overhead storing that 36 (?) character key in various other tables as a foreign key...
Thanks! Thiedo Mandragorian President/CEO Mandragorian Mining |

Rahn Sohwant
|
Posted - 2006.08.31 20:01:00 -
[36]
I think 32 bits ought to be enough for anybody.
|

Todd Overbeck
|
Posted - 2006.08.31 20:02:00 -
[37]
Originally by: John McCreedy For the love of god, some of us don't talk geek. WTF is a 32 bit hash and how does that affect my day to day playing of Eve? 
It's like a fingerprint. Instead of copying the objects everywhere they are needed, you can store the fingerprint of the object and use that to look up the object when you need it.
Unfortunately, if two objects have the exact same fingerprint, the system can't determine which of the two objects is the right one, and is likely to guess wrong.
It's a case of mistaken identity 
|

Thiedo Mandragorian
Gallente
|
Posted - 2006.08.31 20:03:00 -
[38]
Originally by: Todd Overbeck
Originally by: John McCreedy For the love of god, some of us don't talk geek. WTF is a 32 bit hash and how does that affect my day to day playing of Eve? 
It's like a fingerprint. Instead of copying the objects everywhere they are needed, you can store the fingerprint of the object and use that to look up the object when you need it.
Unfortunately, if two objects have the exact same fingerprint, the system can't determine which of the two objects is the right one, and is likely to guess wrong.
It's a case of mistaken identity 
OMG!! Identity theft has permeated the Eve Universe!!

Thanks! Thiedo Mandragorian President/CEO Mandragorian Mining |

Crunch Hardiron
Caldari
|
Posted - 2006.08.31 20:03:00 -
[39]
Originally by: Rahn Sohwant I think 32 bits ought to be enough for anybody.
Is it sad that not only did I get that, I laughed? ---- "Everyone who doesn't believe the sky is falling is an ass-kissing fanboi! EVE is going to fail, mark my words!" -Customers |

Lina Morgai
Minmatar
|
Posted - 2006.08.31 20:03:00 -
[40]
Originally by: John McCreedy For the love of god, some of us don't talk geek. WTF is a 32 bit hash and how does that affect my day to day playing of Eve? 
The method eve code uses to create unique identifiers is not adequate compared to how many unique identifiers eve needs.
Does that clarify the issue any?
|

Skogen Gump
|
Posted - 2006.08.31 20:06:00 -
[41]
Originally by: John McCreedy For the love of god, some of us don't talk geek. WTF is a 32 bit hash and how does that affect my day to day playing of Eve? 
A hash, or hash table is a structure used in computers to organise Data that you want to be unique.
Say you've got a list of unique solar systems in EVE that you need to make sure is unique, you use a mathematical function to look at the stuff in the solar system and create a special number that can only be created by that combination of information - even changing a single letter from upper-case to lower-case would change that number.
So you take the data, and you store the number and you know you've got a list of unique data, the other nice thing is that you can then take that number and use it to as an index, so it makes it easy to find things - it becomes a reference number and as it can only have been generated by one set of information, you can trust as (as a computer programmer) that the data you find with that special number is the right Data.
What CCP are saying is that the way they made these special numbers has found a way of making the same number from two different pieces of information.
That's bad ...
|

Skogen Gump
|
Posted - 2006.08.31 20:08:00 -
[42]
Originally by: Crunch Hardiron
Originally by: Rahn Sohwant I think 32 bits ought to be enough for anybody.
Is it sad that not only did I get that, I laughed?
No and No ... I laughed too 
|

Maya Rkell
Forsaken Empire
|
Posted - 2006.08.31 20:18:00 -
[43]
Originally by: Skogen Gump
Originally by: Maya Rkell I'd imagine that the effect would be quite noticeable, and it's certainly checkable.
GUID hash collisions are someone I've encountered before myself...
Hey hey come on ... Hash collisions are one thing but Guid hash collisions are alltogether something else entirely, the fact that they're partly based (in standard implementation) on how many milliseconds since the Epoch and a computers mac-address makes the chance of getting a GUID collision extremely small.
So, if you've had a Guid hash collision, then unless you're inserting the same item into your hash-table twice, that's a story you've got to tell 
The engine I work with calls file hashes GUID's. Never realised there was a standard definition/way of generating tbh :P
//Maya |

Dessa DesPlains
|
Posted - 2006.08.31 20:39:00 -
[44]
This explains how one of my alts got German 'skill finished' messages on an english client..
|

Altaree
Gallente Red Frog Investments Daikoku Trade Syndicate
|
Posted - 2006.08.31 21:39:00 -
[45]
GREAT BLOG! I love all of this details in it. KEEP UP THE GREAT WORK!! --Altaree
Fix The Web Servers |

schurem
Anarchy Inc. Confederation of Independent Corporations
|
Posted - 2006.08.31 21:48:00 -
[46]
So ah, does this explain why the server been crashing like mad lately, or does it explain why the patch has been postponed a couple times? <<<< No Boundaries, No Fences, Fly Free Or Die Trying >>>>
Typhoon for the win. Nastier than whats faster, faster than whats nastier! |

Thario
|
Posted - 2006.08.31 22:03:00 -
[47]
Originally by: Thiedo Mandragorian
Hey.. isn't an auto-incrementing int (or bigint, depending on need) a better choice for a unique id? From what I understand, a GUID is stored as a string. Wouldn't that completely screw you when you place the clustered index on your identity field?
EDIT: Not to mention the extra record overhead storing that 36 (?) character key in various other tables as a foreign key...
I think you'r messing up (unique) ID of an item with its hash value. unique ID are what it says: unique number identifying the item. Hash values are typicaly used when you want to store items in a way you can easy access them (fast and without waisting to much memory). I think here we are talking about some resource file and once loaded you put it somewhere in the memory and store its hash value together with a pointer to where you can find it in the memory in an easy access hash table. that way you can later on (when you need it again) see if you already loaded it, or actualy have to go through all the trouble of loading the data you want from your harddisk (or even worse a remote server).
|

Shiraz Merlot
Octavian Vanguard
|
Posted - 2006.08.31 22:49:00 -
[48]
Originally by: Jean Perrau Are you saying that your hash table code did not handle collisions ? Wow 
He didn't describe a hash table, he described an object loader. You must be some junior coder to be so *****y whilst using the wrong terminology. Equivalent of a "speling" flame. Go back to your desk.
|

Thiedo Mandragorian
Gallente
|
Posted - 2006.08.31 22:57:00 -
[49]
Edited by: Thiedo Mandragorian on 31/08/2006 23:01:58
Originally by: Thario
Originally by: Thiedo Mandragorian
Originally by: Skogen Gump I feel that I need to backup that last post with some evidence ...
I work for a company that makes a CRM, we use an Object orientated approach to storing data and we use our SQL database to store attributes in rows instead of columns, so we can have as few or as many values for an object as we like.
Our main table has over 1.8 million rows in our biggest installation and our primary key (SQL Server 2000) is of type UniqueIdentifier and the formula for it's Identity field is ... =NewID()
We've never ever seen a collision.
Hey.. isn't an auto-incrementing int (or bigint, depending on need) a better choice for a unique id? From what I understand, a GUID is stored as a string. Wouldn't that completely screw you when you place the clustered index on your identity field?
EDIT: Not to mention the extra record overhead storing that 36 (?) character key in various other tables as a foreign key...
I think you'r messing up (unique) ID of an item with its hash value. unique ID are what it says: unique number identifying the item. Hash values are typicaly used when you want to store items in a way you can easy access them (fast and without waisting to much memory). I think here we are talking about some resource file and once loaded you put it somewhere in the memory and store its hash value together with a pointer to where you can find it in the memory in an easy access hash table. that way you can later on (when you need it again) see if you already loaded it, or actualy have to go through all the trouble of loading the data you want from your harddisk (or even worse a remote server).
Hey Thario!
I think you misunderstood. This was a slightly off-topic question I had for Skogen Gump specifically. He was saying that they used the MS-SQL function "NewID()" to generate GUID values for uniquely identitifing database records. I was stating that using numeric identifiers would be more efficient, both in a storage and indexing sense.
Thanks! Thiedo Mandragorian President/CEO Mandragorian Mining |

Lake
The Praxis Initiative Interstellar Starbase Syndicate
|
Posted - 2006.08.31 22:57:00 -
[50]
Simple fix. Just reword something on the german character sheet. 
*ducks* |

Skawl
GeoTech
|
Posted - 2006.08.31 23:12:00 -
[51]
Originally by: FireFoxx80 Once again, math triumps over man!
Math: 1 Man: 0
Surely if math triumphed once again it'd be at the very least
Math: 2 Man: 0
|

Skogen Gump
|
Posted - 2006.08.31 23:17:00 -
[52]
Originally by: Thiedo Mandragorian
Hey Thario!
I think you misunderstood. This was a slightly off-topic question I had for Skogen Gump specifically. He was saying that they used the MS-SQL function "NewID()" to generate GUID values for uniquely identitifing database records. I was stating that using numeric identifiers would be more efficient, both in a storage and indexing sense.
Yeah, I'd normally agree - but Microsoft are inlove with the GUID and thusly have an intrinsic data type in SQL Server called the Unique Identifier that can be used for Primary keys and Identity columns; I've not read any literature from MSDN or other sources (and trust me I don't implicitely trust the might word of redmond) that say that using a Unique Identifier as a lookup value causes any performance degredation, but I also don't know how it's stored internally for indexing purposes.
Given that nothings actually a type until it's loaded and parsed, and that a guid is basically hexadecimal, it's likely to be faster then a string at anytime; the parentheses, braces and hyphens to make it easily human readable are only added when they're printed as strings.
|

Magnus Card
Caldari
|
Posted - 2006.08.31 23:30:00 -
[53]
Originally by: Crunch Hardiron
Originally by: Rahn Sohwant I think 32 bits ought to be enough for anybody.
Is it sad that not only did I get that, I laughed?
Nope, Bill gates reference right? 640kb...
|

MysticNZ
Solstice Systems Development Concourse
|
Posted - 2006.08.31 23:49:00 -
[54]
Originally by: Rahn Sohwant I think 32 bits ought to be enough for anybody.
Good old Bill Gates  -=====-
Xorus is teh nub :D |

Nerfherder2000
Gallente Nanotechnologists Are Nurturing Outer Space Interstellar Starbase Syndicate
|
Posted - 2006.09.01 00:34:00 -
[55]
Originally by: Skawl
Originally by: FireFoxx80 Once again, math triumps over man!
Math: 1 Man: 0
Surely if math triumphed once again it'd be at the very least
Math: 2 Man: 0
Even better
Math: n+2 Man: n
|

bonesbro
|
Posted - 2006.09.01 01:23:00 -
[56]
Originally by: Skogen Gump
Originally by: Thiedo Mandragorian
Hey Thario!
I think you misunderstood. This was a slightly off-topic question I had for Skogen Gump specifically. He was saying that they used the MS-SQL function "NewID()" to generate GUID values for uniquely identitifing database records. I was stating that using numeric identifiers would be more efficient, both in a storage and indexing sense.
Yeah, I'd normally agree - but Microsoft are inlove with the GUID and thusly have an intrinsic data type in SQL Server called the Unique Identifier that can be used for Primary keys and Identity columns; I've not read any literature from MSDN or other sources (and trust me I don't implicitely trust the might word of redmond) that say that using a Unique Identifier as a lookup value causes any performance degredation, but I also don't know how it's stored internally for indexing purposes.
Yes, a clustered index on a GUID is a bad idea. GUIDs are randomly generated, and a clustered index sorts things into order. Thus, inserting a new row will insert into an essentially random place in the table. This will cause significant fragmentation - in fact, you get a table that's perfectly fragmented.
|

Crunch Hardiron
Caldari
|
Posted - 2006.09.01 01:27:00 -
[57]
Originally by: Skawl
Originally by: FireFoxx80 Once again, math triumps over man!
Math: 1 Man: 0
Surely if math triumphed once again it'd be at the very least
Math: 2 Man: 0
Math: 2 Man: 1
Math: 2 Man: 2
Math: 2 Man: 3
Man WINS! ---- "Everyone who doesn't believe the sky is falling is an ass-kissing fanboi! EVE is going to fail, mark my words!" -Customers |

Jaden Haryl
|
Posted - 2006.09.01 01:33:00 -
[58]
Originally by: Skogen Gump
Yeah, I'd normally agree - but Microsoft are inlove with the GUID and thusly have an intrinsic data type in SQL Server called the Unique Identifier that can be used for Primary keys and Identity columns; I've not read any literature from MSDN or other sources (and trust me I don't implicitely trust the might word of redmond) that say that using a Unique Identifier as a lookup value causes any performance degredation, but I also don't know how it's stored internally for indexing purposes.
Given that nothings actually a type until it's loaded and parsed, and that a guid is basically hexadecimal, it's likely to be faster then a string at anytime; the parentheses, braces and hyphens to make it easily human readable are only added when they're printed as strings.
Well SQL server hardly cares whether its a string or not when it indexes, but the size of the key does matter, and 128bit GUID is 4 times the size of a 32bit int, and this definetly affects performance. Also for a table with a high volume of inserts sequential ints will vastly out perform GUIDs as pages do not have to split. There are some white papers out there going into great depth about the performance implications of GUIDs for PKs, heck check out comp.databases.theory and can see it beaten to death, but I wouldnt go there telling them you store attributes in rows, they will tear you apart, as it is such a common elementary mistake to use a relational DB this way.
BTW any reason you guys are using a GUID on a table with only 1.8 mill rows, a 32bit int should be more than enough, or go big int, either way better performance. Maybe you guys are replicating? Then again you should just be creating new tables with different rows for different object types and not storing all objects attributes in one big table, but I digress.
|

jamesw
Omniscient Order
|
Posted - 2006.09.01 02:03:00 -
[59]
Did someone say there was spare hash?? --
NEW Vid: Domi For the Win! |

Alekto Erinys
Platinum Investments
|
Posted - 2006.09.01 03:28:00 -
[60]
So, uhhh.... for a practical concern 
How much programming effort will this take to fix? Do you need to rewrite a ton of code?
And I think I kind of understood what some of the programmers were saying ... so would it be better to use sequential numbers instead of the (randomly generated?) hashes you currently use?
|

Lala Ru
|
Posted - 2006.09.01 04:23:00 -
[61]
Why did you rely on RANDOM CHANCE to ensure data integrety?
Then again, I write medical software, where data integrety errors = we don't go home until its fixed.
|

Death Kill
Caldari direkte
|
Posted - 2006.09.01 05:13:00 -
[62]
Originally by: Andrue We use a hash
So do we, but I think in a different way than you
Recruitment |

Ralitge boyter
Minmatar
|
Posted - 2006.09.01 05:37:00 -
[63]
Edited by: Ralitge boyter on 01/09/2006 05:39:52 One does wonder how much other stuff is out there burried deep inside years old code that could potentialy make EVE go nuts...
Then again being the great company that CCP is I assume they will now assign the unlucky programer that wrote this bit of code to go over all the remaining antique code. So that after the dust is blown off he/she can check if there is not more fun to be had with 32bit hash tables or other mathematical pit falls. We all know that CCP houses some of you number loving math brains who don't mind doing fancy stuff with these dirty little digets. It might be that one of those people that came in with the furniture left some other surprize for you rookies 
Anyway, cool that you guys found the offending bit of code, and intresting that this issue has not shown up yet in the china code branch. Does this mean that the chineese cluster only speeks chineese? The likely hood of having +40k players most of whom are using a diffrent than standard language set not causing the same problems seems to remote to even want to calculate it. They should have caused the same error as was encountered on a +20k player base where only 2 to 3k players at a time use a diffrent than standard language pack, should it not? If not please do explain where this seemingly obvious calculation is flawed. 
CCP programers are just like dogs, when a bug is found management just screems: Go Patch !!! and within a few hours they run back to managment with heads held hy and a patch in their hands. 
Edit: because I am to dumb to do it right in a single go  ------------------------------------------- Should you disagree with me, well I guess that is because I disagree with you. If you have a problem with that please feel free not to tell me. |

Skogen Gump
|
Posted - 2006.09.01 06:53:00 -
[64]
Originally by: Jaden Haryl
Well SQL server hardly cares whether its a string or not when it indexes, but the size of the key does matter, and 128bit GUID is 4 times the size of a 32bit int, and this definetly affects performance. Also for a table with a high volume of inserts sequential ints will vastly out perform GUIDs as pages do not have to split. There are some white papers out there going into great depth about the performance implications of GUIDs for PKs, heck check out comp.databases.theory and can see it beaten to death, but I wouldnt go there telling them you store attributes in rows, they will tear you apart, as it is such a common elementary mistake to use a relational DB this way.
BTW any reason you guys are using a GUID on a table with only 1.8 mill rows, a 32bit int should be more than enough, or go big int, either way better performance. Maybe you guys are replicating? Then again you should just be creating new tables with different rows for different object types and not storing all objects attributes in one big table, but I digress.
It's all down to what you're doing, if you're storying lots of string like values then using a row attribute architechture works and its simply enough to implement, of course if I was going for the national/international sized database for storing genetic's data or something with billions and billions of rows, you'd use something more appropriate and you wouldn't be using SQL Server 
Why don't we use ints as Ids ? well; the same reason that CCP did and now get collisions.
If you're only generating Guids on the same machine, you can almost guarantee that they will never ever collide.
|

Ossi Black
|
Posted - 2006.09.01 06:59:00 -
[65]
"A patch to take on the world"
Famous last words.
Just kidding, glad you figured it out.
|

FireFoxx80
Caldari E X O D U S Imperial Republic Of the North
|
Posted - 2006.09.01 07:42:00 -
[66]
Originally by: Skawl
Originally by: FireFoxx80 Once again, math triumps over man!
Math: 1 Man: 0
Surely if math triumphed once again it'd be at the very least
Math: 2 Man: 0
Thus another triumph?
Math: 3 Man 0
What I do the rest of the time - Vote for a Jita bypass! |

Irimi Nage
Evolution Band of Brothers
|
Posted - 2006.09.01 08:03:00 -
[67]
-- COPIED FROM INTERNAL FORUM --
There seems to be a bit of missunderstanding around this, so for anyone interested I'm going to try and clear it up a bit, I'd post to eve-o, but I'm at work and posting is blocked, I'll probably copy this to there when I get home...
This has LOTS of assumptions in it.
Firstly without knowing the structure of the resource file, it likely works something like this: Each resource (logical file) is stored inside one giant resource archive (physical file). The location of the logical file within the physical file is stored in a File Allocation Table (FAT) which would be in the header of the physical file. The FAT is read into memory and stored in a hashtable. When a resource is loaded, the cached object likely replaces the file offset in the hashtable.
The FAT probably looks something like this:
<Resource ID, File Offset><Resource ID, File Offset>
etc.
It is more efficient to use a single physical file, rather than lots of them because the OS and disk managers will "prebuffer" data for you, and you will get better disk cache performance, as well as having a better likelyhood of the files being contiguous on disk.
A Hashtable is a data structure which makes it quick to find an item. Most people when they think of a lookup, think of it as follows: Compare all items against what you're searching for until you find a match. This is _very_ slow. A hashtable allows you store items in an indexed list. You take the search term do a quick little bit of maths on it, and it produces an index into the list. You then grab the item at that index, search over. The little bit of maths done is a "hash function". It is not a "cryptographic hash function", they are similar in maths definition, but _very_ different in practice.
Picture the following hashtable:
Index Item 0 Resource1 1 Resource3 2 <empty> 3 Resource2 4 <empty> 5 Resource4
Now, as the resource ID is likely to just be a 32 bit GUID the mathematical hash can collide. A collision is when two GUIDs when "hashed" produce the same index. Lets say we are in the process of loading the resource offsets. We've loaded 4 resorces so far, but when loading the fifth resource, we get a hash collision. The index we get is 1. This makes the hashtable look like this:
Index Item 0 Resource1 1 Resource5 2 <empty> 3 Resource2 4 <empty> 5 Resource4
Resource3 has been replaced by Resource5! Holy crap, hello weird client behaviour, and memory that looks corrupted. The client requests Resource3, but it actually gets Resource5.
There is a form of hashtable known as a "chaining hashtable". This data structure is more robust than a plain hashtable, but slightly slower and less efficient. This works by when a collision appears it stores BOTH items and when you try to look them up, it does a slow search if it detects multiple items and returns the correct one. With a chaining hashtable we would see the following:
Index Item 0 Resource1 1 Resource3, Resource5 2 <empty> 3 Resource2 4 <empty> 5 Resource4
Now, when the client requests a resource it gets the correct one all the time, although it gets it more slowly should there be collisions.
I suspect the new patch simply implements a chaining hashtable rather than a plain hashtable.
The above is probably a dramatic oversimplification of the problem and the resolution. But it should clear up all the comments about using GUIDs and about how unlikely collisions are with a cryptographic hash.
I've seen posts saying you shouldn't get a collision using cryptographic hashes etc, but they won't be using cryptographic hash functions because they're useless for hashtables.
It's also not collisions in DB IDs - it is a client problem. It is most likely a collision in the indexing of the resource manager's internal hashtable. ---
|

Eewec Ourbyni
Caldari
|
Posted - 2006.09.01 08:09:00 -
[68]
If two resource files have the same GUID why didn't something complain when you tried adding the second entry for the same GUID to the GUID index table?... think I need to go read up on GUID's but that would be my first concern after finding a clash like this.
This is a sig...
-- You think this guys post is nuts.... you should see his bio --
... good, ain't it! |

Irimi Nage
Evolution Band of Brothers
|
Posted - 2006.09.01 09:36:00 -
[69]
Originally by: Eewec Ourbyni Edited by: Eewec Ourbyni on 01/09/2006 08:26:23 Edited by: Eewec Ourbyni on 01/09/2006 08:18:28 If two resource files have the same Hashed GUID why didn't something complain when you tried adding the second entry for the same Hashed GUID to the resource index table?... think I need to go read up on GUID's but that would be my first concern after finding a clash like this.
EDITED: Ahh, course the Hashed Guid itself isn't stored... but still considering the resource files for things like the language packs are constant unless you specifically make a patch for them, it does raise the question as to why the resource files aren't checked for clashing Hashed GUID's. It also makes you wonder if there are any other clashing Hashed Guids out there in EVE-DB land.
Eewec, no complaints, because most plain hashtable implementations happily just replace the pointer in the array. EG:
hashtable[ hash( resourceID ) ] = resourceOffset
The server will send a message to the client which will give a resource ID. The resource ID will be hashed by the client (not precompiled) and turned into an index/offset by the resource manager. Given the nature of the quick hashtable hashing functions usually used, it isn't a suprise that so few resources caused a collision, especially with such a small sample input size (32bit).
Now, the resource file might have precompiled offsets known by the client, rather than a FAT, but they're still going to have to be looked up. The hashing of the GUID isn't done until load-time. The resource file offsets would be linked by resource ID (GUID) rather than an index number.
That's my guess anyway. ---
|
|

porkbelly

|
Posted - 2006.09.01 09:50:00 -
[70]
Edited by: porkbelly on 01/09/2006 09:51:54 Edited by: porkbelly on 01/09/2006 09:51:02
Originally by: Jean Perrau Are you saying that your hash table code did not handle collisions ? Wow 
Indeed. We have a good excuse though: At the time this was written, the set of resources was largely static. Also, we were only looking up files that we knew were already in there. Thus, any collisions would be noticed and acted on at build time. The overhead of storing the actual filenames wasn't needed.
What has changed is that not only do we have more resources, we are also querying for possibly non-existant resource files (to see if localized versions exist), and this can create hash collisions at run time.
Anyway, it's one of those things that happens when you write code with certain assumptions, and then go ahead and violate them.
What it is! |
|

Wilfan Ret'nub
Singularity.
|
Posted - 2006.09.01 10:15:00 -
[71]
I can't decide if this one is worthy of The Daily WTF or not . ------ No ISK, no fun |

Norris Neophitus
Minmatar Hidden Agenda
|
Posted - 2006.09.01 10:43:00 -
[72]
Edited by: Norris Neophitus on 01/09/2006 10:43:12 Hmmm so CCP take the reactive approach rather than the Proactive approach to their code.
Ahhhh screw it i can't be arsed with this things like this will happen and we can't do anything to help with or point out potential problems.
Honestly im surprised all this talk of hash and hash tables only got one drug reference and loads of maths ones.
/me needs drugs just to deal with the maths being talked about.
Originally by: Wild Rho -------------------------------------------------------------------------------- I'm not emo, I just hate my life and no one understands me ----------------------------- |

Joerd Toastius
Octavian Vanguard
|
Posted - 2006.09.01 11:13:00 -
[73]
Originally by: Nerfherder2000
Originally by: Skawl
Originally by: FireFoxx80 Once again, math triumps over man!
Math: 1 Man: 0
Surely if math triumphed once again it'd be at the very least
Math: 2 Man: 0
Even better
Math: n+2 Man: n
That assumes that Math is only leading by 2. Given the available data a better formulation would be
Math: n+x Man: n
Where x >= 2
That guarantees that Math has beaten Man by at least two while allowing it may have won further times too.
However, that still doesn't fully capture the statement "Once again, math triumps over man!", because it implicitly assumes that Math has beaten Man more times than Man has beaten Math, which is not specified. The best formulation would be
Math: x+1 Man: y
Where x>1. That simply specifies that Math and Man both have independent scores, that Math has won at least once in the past and that it's just won again; Man's score can be any number allowed by the scoring rules in question.
/logicmode
|

Cheyenne Shadowborn
Caldari Citizens of E.A.R.T.H. E.A.R.T.H. Federation
|
Posted - 2006.09.01 11:38:00 -
[74]
Edited by: Cheyenne Shadowborn on 01/09/2006 11:39:41 How do I post this in the most friendly way?
Erm ... if my product had code that generated a supposedly unique hash and didn't check if it already existed, I wouldn't blog about it 
Being the lazy guy that I am I'd probably hunt down the source for Linux mktemp() command because thats one command that makes a unique hash (32 bit wide, if you will) and actually CHECKS for collisions (whoops, then Eve would have to be open source ... nevermind )
Btw, how is the birthday thing a paradox? Surprising to some - yes. Defies common sense of most people - yes. A paradox? Not a chance. --
|

Cheyenne Shadowborn
Caldari Citizens of E.A.R.T.H. E.A.R.T.H. Federation
|
Posted - 2006.09.01 11:42:00 -
[75]
Originally by: Skogen Gump Can't you just use GUIDs ? they're pretty robust for things like this and they save you the need to see if it's used already; as long as they're generated by the same computer anyway.
www.google.com
Short for Globally Unique Identifier, a unique 128-bit number that is produced by the Windows OS or by some Windows applications to identify a particular component, application, file, database entry, and/or user.
Ouch. Writing portable code anyone?  --
|

Johan VanDelft
Gallente
|
Posted - 2006.09.01 11:58:00 -
[76]
Originally by: Eewec Ourbyni Edited by: Eewec Ourbyni on 01/09/2006 08:26:23 Edited by: Eewec Ourbyni on 01/09/2006 08:18:28 If two resource files have the same Hashed GUID why didn't something complain when you tried adding the second entry for the same Hashed GUID to the resource index table?... think I need to go read up on GUID's but that would be my first concern after finding a clash like this.
EDITED: Ahh, course the Hashed Guid itself isn't stored... but still considering the resource files for things like the language packs are constant unless you specifically make a patch for them, it does raise the question as to why the resource files aren't checked for clashing Hashed GUID's. It also makes you wonder if there are any other clashing Hashed Guids out there in EVE-DB land.
This has nothing to do with the database, but with the resources in the client (eg language strings)
|

F2C MaDMaXX
London Transport Freelancer Alliance
|
Posted - 2006.09.01 12:08:00 -
[77]
So, does this fixing actual help anything else/improve anything that already happens at all?
______________________________________ Natural Selection Developer
Sound FX |

Skogen Gump
|
Posted - 2006.09.01 12:17:00 -
[78]
Originally by: Cheyenne Shadowborn
www.google.com
Short for Globally Unique Identifier, a unique 128-bit number that is produced by the Windows OS or by some Windows applications to identify a particular component, application, file, database entry, and/or user.
Ouch. Writing portable code anyone? 
Firstly, EVE is only written for Windows - what's wrong with it using features for windows. Secondly, the UUID (of which GUID is just an implementation) is a standard by the OSF (Open Software Foundation) or as they're now known, The Open group. You know, the folks behind things like POSIX,LDAP and X Windows. 
|

Pesadel0
Vagabundos
|
Posted - 2006.09.01 12:28:00 -
[79]
 Just in to say hi and good job on the collision stuff ,we need more collisions dammit
|

Paddlefoot Aeon
Neogen Industries
|
Posted - 2006.09.01 14:03:00 -
[80]
Edited by: Paddlefoot Aeon on 01/09/2006 14:03:36 I'm not a programmer, so you will have to excuse my conceptual proposition.... there are probably 100s of good reasons this won't work, so don't flame too hard!
Most of the issues around Dragon have been related to getting the new code to work on the old code's data-tables, importing data that was saved using the old code. CHINA's release didn't have this problem, as there were no "saved games" (for lack of a better word), as there were no users before DRAGON in china.
So the question begs to be asked... why not start up character #1 using the old data, and have tool that simple "saves" that character using the new Dragon code format? When I create a PowerPoint presentation, I have to take into account that some people are using Office XP, while some are using the newer Office 2003 (or whatever the 2 most recent versions of Office are). I then save the file accordingly, to make sure my formatting will be preserved. Could not the same thing be done, to ensure a clean migration to Dragon? Have SiSi start up using Old code, get all the Character stats in active memory, then simply "save" the character in the new format on Tranquility?
FLAME INCOMING *ducks* Diplomat and Recruitment Officer |

Suze'Rain
|
Posted - 2006.09.01 15:03:00 -
[81]
I love these sort of blogs. tiny little design blips that cause weird results, and how they need massive work to fix. The coding is well beyond my mathamatically incompetent brain, but the work is intriguing.
as a matter of interest, if it's not under an NDA... how many lines of code is eve, for client and server? I'd love to read a "random weird facts" blog someday covering all those sort of stats. :)
|

Vladimir G'orkin
Minmatar
|
Posted - 2006.09.01 15:21:00 -
[82]
Originally by: Skogen Gump
Originally by: Cheyenne Shadowborn
www.google.com
Short for Globally Unique Identifier, a unique 128-bit number that is produced by the Windows OS or by some Windows applications to identify a particular component, application, file, database entry, and/or user.
Ouch. Writing portable code anyone? 
Firstly, EVE is only written for Windows - what's wrong with it using features for windows. Secondly, the UUID (of which GUID is just an implementation) is a standard by the OSF (Open Software Foundation) or as they're now known, The Open group. You know, the folks behind things like POSIX,LDAP and X Windows. 
Given the fact that CCP wants to keep Eve going for several more years it would be wise to keep the code as independent of the OS as possible. In the unlikely event that Windows ceases to be the OS of preference for CCPs customers it would be easier to port the code to another OS. I am not saying that it will happen but that unlikely events can occur. The topic of this blog is proof of that fact.
|

Jaden Haryl
|
Posted - 2006.09.01 16:03:00 -
[83]
Edited by: Jaden Haryl on 01/09/2006 16:03:40
Originally by: Skogen Gump
It's all down to what you're doing, if you're storying lots of string like values then using a row attribute architechture works and its simply enough to implement, of course if I was going for the national/international sized database for storing genetic's data or something with billions and billions of rows, you'd use something more appropriate and you wouldn't be using SQL Server 
Why don't we use ints as Ids ? well; the same reason that CCP did and now get collisions.
If you're only generating Guids on the same machine, you can almost guarantee that they will never ever collide.
Well I would argue that CREATE Table [ObjectName] ( [ObjectID] int PRIMARY KEY, [Atrribute1Name] varchar([Attribute1Size]), [Atrribute2Name] varchar([Attribute2Size]) ) inesrt into [ObjectName] ([ObjectID],[Atrribute1Name],[Atrribute2Name]) values (@GUID,@Attrib1,@Atrrib2)
select * from [ObjectName] where [ObjectID]=@ID etc.
Is not much harder than inserting those attribute rows into that big old name,value table plus your using a relational db as it was design instead of as an untyped name,value system, as there are better systems for that. Notice how with creating tables for each type you could actually specify the type instead of it just always being a string of the same size?
Sorry to nitpick, I have just seen this design many times and people wonder why thier system doesn't perform that great or is difficult to query properly.
And it's not about collisions on GUID's I am pretty confident you would never collide, and I'm not agaisnt GUID where approriate (like when you need to replicate). But they are slower than ints for sure.
Not that this has anything to do with hashes ;)
|

Harris
|
Posted - 2006.09.01 16:21:00 -
[84]
Originally by: Irimi Nage Lots of words like "hash" and "client" and "tables" and things that link them together...
Thanks Irimi, I have read the blog and read these pages and read a lot of words posted by people talking about stuff that I couldn't follow without someone holding my hand and holding a cold compress to me forehead while I squinted at the page for a long time.
Your explanation seemed - to me at least - to make sense and actually explain the problem in a way I could follow.
Up till then I could only imagine a the database holding a BPO in a drawer while the clients borrowed BPC's of it till suddenly the clients ordered a BPC but the draw had to store two BPOs and the wrong one was picked up and so they delivered the wrong one and wrong item was made cos it was the wrong BPC that they actually sent so the client tried to fit a seige module onto a bantam and it fitted but it shouldn't have so the game went mad trying to understand why and crashed.....
I am a man of limited imagination... .
Your explanation makes sense though and I guess it's closer to the truth,
|

Skogen Gump
|
Posted - 2006.09.01 17:29:00 -
[85]
Originally by: Jaden Haryl
Well I would argue that CREATE Table [ObjectName] ( [ObjectID] int PRIMARY KEY, [Atrribute1Name] varchar([Attribute1Size]), [Atrribute2Name] varchar([Attribute2Size]) ) inesrt into [ObjectName] ([ObjectID],[Atrribute1Name],[Atrribute2Name]) values (@GUID,@Attrib1,@Atrrib2)
select * from [ObjectName] where [ObjectID]=@ID etc.
That table wouldn't work Guids are 128bit if you store it as a number so I prefer to use the type:
CREATE Table [ObjectName] ( [ObjectID] uniqueidentifier PRIMARY KEY NOT NULL DEFAULT NewID(), [Atrribute1Name] varchar([Attribute1Size]), [Atrribute2Name] varchar([Attribute2Size]) )
Originally by: Jaden Haryl
Sorry to nitpick, I have just seen this design many times and people wonder why thier system doesn't perform that great or is difficult to query properly.
Well a combination of this system to store many string types of data, and a Hybrid nested set to store tree structure and we have no problems with data retrieval or performance, it's a simple inner join.
Originally by: Jaden Haryl And it's not about collisions on GUID's I am pretty confident you would never collide, and I'm not agaisnt GUID where approriate (like when you need to replicate). But they are slower than ints for sure.
Agreed; that wasn't the original point :) I simply meant to provide a good example of using Guids that showed massive generation without collision.
As you say though, this is going off topic; perhaps you could pick it up in the out-of-pod experience ?
|

Death Kill
Caldari direkte
|
Posted - 2006.09.01 17:37:00 -
[86]
Originally by: Norris Neophitus Edited by: Norris Neophitus on 01/09/2006 10:43:12 Hmmm so CCP take the reactive approach rather than the Proactive approach to their code.
Ahhhh screw it i can't be arsed with this things like this will happen and we can't do anything to help with or point out potential problems.
Honestly im surprised all this talk of hash and hash tables only got one drug reference and loads of maths ones.
/me needs drugs just to deal with the maths being talked about.

Recruitment |

Weps
Caldari Provisions
|
Posted - 2006.09.01 17:53:00 -
[87]
Originally by: porkbelly Edited by: porkbelly on 01/09/2006 09:51:54 Edited by: porkbelly on 01/09/2006 09:51:02
Originally by: Jean Perrau Are you saying that your hash table code did not handle collisions ? Wow 
Indeed. We have a good excuse though: At the time this was written, the set of resources was largely static. Also, we were only looking up files that we knew were already in there. Thus, any collisions would be noticed and acted on at build time. The overhead of storing the actual filenames wasn't needed.
What has changed is that not only do we have more resources, we are also querying for possibly non-existant resource files (to see if localized versions exist), and this can create hash collisions at run time.
Anyway, it's one of those things that happens when you write code with certain assumptions, and then go ahead and violate them.
That is not a good excuse but the number one reason for bugs. 
|

Felysta Sandorn
Caldari Murder of Crows Blood of the Innocents
|
Posted - 2006.09.01 18:34:00 -
[88]
Wait a second there... Recoding the whole base of which EVE is built on... Surely that'll take weeks to figure out with a lot of tricky porting? Or do you have a lovely OO system behind it? Also, won't implementing a patch that changes all of the items mean you have to reset the database? As in cancel all market orders and finish all skills?
(I know I can't get my point across very well... You'd be suprised that I'm in my third year studying a degree in computing, involving hash tables and probabilities! Honest!)
|

Jaden Haryl
|
Posted - 2006.09.01 21:18:00 -
[89]
Originally by: Skogen Gump
That table wouldn't work Guids are 128bit if you store it as a number so I prefer to use the type:
CREATE Table [ObjectName] ( [ObjectID] uniqueidentifier PRIMARY KEY NOT NULL DEFAULT NewID(), [Atrribute1Name] varchar([Attribute1Size]), [Atrribute2Name] varchar([Attribute2Size]) )
Originally by: Jaden Haryl
Sorry to nitpick, I have just seen this design many times and people wonder why thier system doesn't perform that great or is difficult to query properly.
Well a combination of this system to store many string types of data, and a Hybrid nested set to store tree structure and we have no problems with data retrieval or performance, it's a simple inner join.
Originally by: Jaden Haryl And it's not about collisions on GUID's I am pretty confident you would never collide, and I'm not agaisnt GUID where approriate (like when you need to replicate). But they are slower than ints for sure.
Agreed; that wasn't the original point :) I simply meant to provide a good example of using Guids that showed massive generation without collision.
As you say though, this is going off topic; perhaps you could pick it up in the out-of-pod experience ?
Yeah I didn't mean using GUID in the example even though I put @GUID as a variable name, was typing too fast and yes your would be if you did wanna use uniqueidenitfier ;).
I hate to think how many bytes are waisted for every "column" in your name-value table vs. a table driven design, what is it 128bits per column identifier(prob be less bytes just to use a the column name instead), plus un-typed string data plus 21 bytes row overhead minimum, so it takes 48bytes to store 10 ascii chars say in your system (58 if your doing nvchar). Now multiply by 1.8 million, thats a lot of extra disk i/o just to read and store the column values, but I'm sure it works for you.
Prob right this discussion should be taken elseware :)
|

Qutsemnie
Caldari Deep Core Mining Inc.
|
Posted - 2006.09.01 22:35:00 -
[90]
Edited by: Qutsemnie on 01/09/2006 22:44:27 Hashing should assume the hash generated isnt unique and plan for collisions in reliable fashion. The object isnt to not get a collision... the object is to not get them alot. Stroustrup says alot of time and money is spent thinking up good hashing functions that dont have alot of collisions, but im surprised you guys planned around not having any collisions... thats no where in any book I read. Ok I read your explanation. LOL cant say i havent made assumptions but we can file it under code that breaks in the future. =)
|

Plutoinum
German Cyberdome Corp Veritas Immortalis
|
Posted - 2006.09.02 02:09:00 -
[91]
Edited by: Plutoinum on 02/09/2006 02:14:16
Originally by: Qutsemnie
Hashing should assume the hash generated isnt unique and plan for collisions in reliable fashion.
If they just want a unique key at dev time and don't need to generate the hash-value later on the client once again, then my first idea to solve it now would be to either use a common database to check for duplicates or just use a common key generator on a single server that all devs use. Even better a combination of both that allows your machine to check out a range of unused keys for your work. Later your machine could connect again and update the server with the info, which keys are used and which are still free. Over 4 bil unique file keys should be enough for EVE's lifetime with that method. This would also allow you to check-in all used keys so far, so that the new generator knows it and doesn't assign them again and you could just go on with the 32-bit key like it is now without the need to make big changes. Well, maybe there are other reasons, why that isn't an option, but speculating is fun. 
|

Jenna Malone
Caldari W-hat LLC
|
Posted - 2006.09.02 14:51:00 -
[92]
Edited by: Jenna Malone on 02/09/2006 14:52:25
Originally by: Jaden Haryl Well SQL server hardly cares whether its a string or not when it indexes, but the size of the key does matter, and 128bit GUID is 4 times the size of a 32bit int, and this definetly affects performance. Also for a table with a high volume of inserts sequential ints will vastly out perform GUIDs as pages do not have to split. There are some white papers out there going into great depth about the performance implications of GUIDs for PKs, heck check out comp.databases.theory and can see it beaten to death, but I wouldnt go there telling them you store attributes in rows, they will tear you apart, as it is such a common elementary mistake to use a relational DB this way.
BTW any reason you guys are using a GUID on a table with only 1.8 mill rows, a 32bit int should be more than enough, or go big int, either way better performance. Maybe you guys are replicating? Then again you should just be creating new tables with different rows for different object types and not storing all objects attributes in one big table, but I digress.
GUIDs only affect performance when inserting. However there's no performance difference in retrieving rows by GUID or integers (well, there might be a very negligable one, on a long run, but it's small ass tiny), unless you forgot to put an index on it.
|

Skogen Gump
|
Posted - 2006.09.02 18:42:00 -
[93]
Originally by: Jenna Malone
GUIDs only affect performance when inserting. However there's no performance difference in retrieving rows by GUID or integers (well, there might be a very negligable one, on a long run, but it's small ass tiny), unless you forgot to put an index on it.
That's my experience too; the only problem really is that it's harder to type a guid into Query Analyser 
|

bovine syndrome
Amarr Intergalatic Blue Navy
|
Posted - 2006.09.03 07:17:00 -
[94]
surely it would be be,
math: 4,294,967,295 man:0
?
|

Miss Overlord
Gallente Garoun Investment Bank
|
Posted - 2006.09.03 09:17:00 -
[95]
Originally by: Skogen Gump
Originally by: Jenna Malone
GUIDs only affect performance when inserting. However there's no performance difference in retrieving rows by GUID or integers (well, there might be a very negligable one, on a long run, but it's small ass tiny), unless you forgot to put an index on it.
That's my experience too; the only problem really is that it's harder to type a guid into Query Analyser 
any other alternatives available then ?
|

Jaden Haryl
|
Posted - 2006.09.03 17:19:00 -
[96]
Edited by: Jaden Haryl on 03/09/2006 17:19:08
Originally by: Jenna Malone Edited by: Jenna Malone on 02/09/2006 14:52:25 GUIDs only affect performance when inserting. However there's no performance difference in retrieving rows by GUID or integers (well, there might be a very negligable one, on a long run, but it's small ass tiny), unless you forgot to put an index on it.
You are over simplifing by a long shot, yes inserts are where you will see the most performance reduction from using random GUID's, but as a consequence of all that page splitting in either the table or the index (clustered or non-clustered) you will see MUCH more index fragmentation which if not controlled through index rebuilds will degrade performance far in excess of the base line reduction in speed that will exist simply because the key itself is 4 times the size of an int.
Sure you can rebuild indexes more often, but that process in itself leads to degraded performance or downtime to complete, and if your DB is 24/7 you may not have a good "downtime".
So again whats the point of using a GUID if you don't have to, all your buying yourself is less performance for the same result? If you need it GUID are great though, don't get me wrong, I use them all the time in specific situations where they are necessary and therefore the costs have to be accepted.
Maybe it makes little difference in the size of the database you deal with, but I can tell you a good DBA will know there are significant implications.
|

elFarto
New Order Industries
|
Posted - 2006.09.04 19:10:00 -
[97]
I've got some resource (.stuff) loading code that doesn't have hash collisions, I can sell it to you for a reasonable price 
Regards elFarto
NPC database Mal: This is your captain speaking. We're having some technical difficulties, so we may experience some turbulence, and then explode. |

Admai Sket
|
Posted - 2006.09.05 09:34:00 -
[98]
Originally by: Linerra Tedora Edited by: Linerra Tedora on 31/08/2006 17:53:59 Ahh yes.. the allmighty david...
Linkage
edit : no smack intended.. just to have a quiet fun time... after all, i have many german friends... 
WTF?
|

Kweel Nakashyn
Minmatar Aeden
|
Posted - 2006.09.07 19:00:00 -
[99]
Edited by: Kweel Nakashyn on 07/09/2006 19:05:42 Edited by: Kweel Nakashyn on 07/09/2006 19:04:50
Originally by: Crunch Hardiron
Originally by: Rahn Sohwant I think 32 bits ought to be enough for anybody.
Is it sad that not only did I get that, I laughed?
Yes... @work we use 80 bits and we still get some naughty things sometimes :)
Edit : mmmm... k 
|

Kweel Nakashyn
Minmatar Aeden
|
Posted - 2006.09.07 19:02:00 -
[100]
Edited by: Kweel Nakashyn on 07/09/2006 19:04:35 Edited by: Kweel Nakashyn on 07/09/2006 19:02:59
Originally by: Miss Overlord
Originally by: Skogen Gump
Originally by: Jenna Malone
GUIDs only affect performance when inserting. However there's no performance difference in retrieving rows by GUID or integers (well, there might be a very negligable one, on a long run, but it's small ass tiny), unless you forgot to put an index on it.
That's my experience too; the only problem really is that it's harder to type a guid into Query Analyser 
any other alternatives available then ?
Do a damn select before any insert. If it gives a row, reinsert somewhere else. Perf = perf / 2 (or in case of very bad luck 3), and Data Security = Data Security x the 32 bit key :).
|
| |
|
| Pages: 1 2 3 4 :: [one page] |