Pages: 1 [2] 3 :: one page |
Alx Warlord
SUPERNOVA SOCIETY Tribal Conclave
104
Posted - 2012.05.03 21:46:00
[31]
Yammy database!!! Uhmmnnn, tasty!!!
* Oh, it's not yammy, it's YAML... D:
CCP Redundancy
C C P C C P Alliance
30
Posted - 2012.05.03 21:51:00
[32]
Packtu'sa wrote:
Cheers. To clarify, does CCP have any plans to add structures which can't easily be represented in a database? (I'm having difficulty imagining what these might be, but YAML can do a lot.) If there are any performance issues with YAML in third-party applications, I'm sure someone over at the Technology Lab will come up with a more useful package. (Something similar to the binary format that CCP Redundancy mentioned?)
[EDIT]
CCP Redundancy wrote:
I figure I'll just answer some questions in an incomprehensible techy way.
This, please, more of this! I've recently come back to EVE after playing some other in-development games, and it's refreshing to once again chat with devs who respect the player base and are themselves respectable.
I don't recommend YAML for anything where you worry about performance. Check out MongoDB (BSON) or MessagePack as a starting point (also NoSQL in general is an interesting thing to play with if you've been all-relational, but I won't pretend that it's a good solution to everything). If you need to use YAML, make sure you're using a native parser at least (pyYAML + libYAML, for example).
We'll be sticking to lists and dicts and nested objects (pretty much JSON), and mainly focusing on working out how to convert our existing datasets (that are already in the DB) to this sort of thing without screwing things up for everyone at CCP. Python is very handy at working with this sort of data, so I personally recommend that for transforming it to whatever format you prefer.
This sort of structure: { 1: ['a', 'cat'], 2:['two','dogs'] } is a pain in the ass in a DB... do-able, but I don't want to insist that people build a relational version unless they need to.
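A minimal Python sketch of what "building a relational version" of that structure involves, using the standard-library `sqlite3` module. The table and column names (`entries`, `entry_id`, `position`, `item`) are made up for illustration; the point is that list order has to become an explicit column, and reading the data back means regrouping rows.

```python
import sqlite3

# The nested structure from the post: integer keys mapping to ordered lists.
nested = {1: ['a', 'cat'], 2: ['two', 'dogs']}


def to_rows(data):
    """Flatten {key: [v0, v1, ...]} into (key, position, item) rows.

    A relational table has no 'list inside a row', so each list index
    becomes an explicit column to preserve ordering.
    """
    return [(key, pos, item)
            for key, items in data.items()
            for pos, item in enumerate(items)]


def from_rows(rows):
    """Rebuild the nested structure, ordering each list by position."""
    result = {}
    for key, _pos, item in sorted(rows):
        result.setdefault(key, []).append(item)
    return result


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE entries (entry_id INTEGER, position INTEGER, '
             'item TEXT, PRIMARY KEY (entry_id, position))')
conn.executemany('INSERT INTO entries VALUES (?, ?, ?)', to_rows(nested))
rows = conn.execute('SELECT entry_id, position, item FROM entries').fetchall()
restored = from_rows(rows)
print(restored == nested)  # True
```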
CCP Redundancy
C C P C C P Alliance
30
Posted - 2012.05.03 21:57:00
[33]
James Bryant wrote:
VS2010 Premium actually has some good SQL versioning capabilities when used with Team Foundation Server, but I have to say that I'm intrigued by CCP's approach here. Kinda the best of both worlds, in a certain sense (for you guys, a bit less so for us). A bit unwieldy for queries with multiple joins, but that stuff can be handled in code instead of in the database, I suppose.
We evaluated that technology, but determined that it probably wasn't going to fit our needs.
There are a few ways to handle multiple joins in a NoSQL-y way: you can separate the data out into another document collection and do the lookup (like MongoDB document links), and you can also duplicate and pre-embed the data. That sounds wasteful, but if you're sensible about it, it's nowhere near as bad as the memory overhead that Python has (~10MB of data in terms of pure integer/float etc. memory can easily blow up to 90MB, which starts to add up if you pickle and unpickle large data structures). We use schemas to omit type and attribute name information (like all of those "graphicID" strings in the raw data), which can be a big factor in more permissive structured data representations. [Side note: this is a big reason why we have typically seen a rise in the memory of the character selection screen each expansion as we add more/new data.]
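A rough illustration of the schema point, assuming JSON as the serialization and invented records (this is not CCP's actual format): when every record repeats its attribute names, such as `graphicID`, the names take up a large share of the payload. Storing the field order once in a schema and each record as a plain list removes that repetition.

```python
import json

# Invented records: every dict repeats the same attribute names.
records = [
    {'typeID': 1000 + i, 'graphicID': 20 + i, 'radius': 1.5 * i, 'published': i % 2 == 0}
    for i in range(100)
]

# Schema: attribute names stored once, in a fixed order.
SCHEMA = ('typeID', 'graphicID', 'radius', 'published')


def encode_with_schema(rows, schema):
    """Drop per-record attribute names; keep only values in schema order."""
    return {'schema': list(schema),
            'rows': [[row[field] for field in schema] for row in rows]}


def decode_with_schema(doc):
    """Re-attach attribute names to each row of values."""
    return [dict(zip(doc['schema'], values)) for values in doc['rows']]


verbose = json.dumps(records)
compact = json.dumps(encode_with_schema(records, SCHEMA))

print(len(verbose), len(compact))  # the schema-based encoding is noticeably smaller
print(decode_with_schema(json.loads(compact)) == records)  # True
```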
Keep in mind that this stuff is heavily built towards static data that's immutable at runtime (do you know how difficult it is to find a key-value storage system library that's built for that particular requirement?). We can build all sorts of indices however we want - planets could be embedded inside of a solar system document, but we could still make efficient indices for looking up planets by ID within that. We can also load the data from disk in a cache-friendly manner if needed.
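A minimal sketch of an index over embedded documents, assuming planets are stored as a list inside each solar-system document. The field names and IDs are invented for illustration. The index is built once at load time, which suits data that is immutable at runtime, and maps each planet ID to its parent system and position, so a lookup does not scan every system.

```python
# Invented solar-system documents with planets embedded inside them.
solar_systems = [
    {'solarSystemID': 30000001, 'name': 'Alpha',
     'planets': [{'planetID': 40000002, 'name': 'Alpha I'},
                 {'planetID': 40000003, 'name': 'Alpha II'}]},
    {'solarSystemID': 30000002, 'name': 'Beta',
     'planets': [{'planetID': 40000010, 'name': 'Beta I'}]},
]


def build_planet_index(systems):
    """Map planetID -> (index of parent system, index of planet within it).

    Built once; the data is static, so the index never needs updating.
    """
    index = {}
    for s_idx, system in enumerate(systems):
        for p_idx, planet in enumerate(system['planets']):
            index[planet['planetID']] = (s_idx, p_idx)
    return index


planet_index = build_planet_index(solar_systems)


def get_planet(planet_id):
    """Return (system document, planet document) for a planet ID."""
    s_idx, p_idx = planet_index[planet_id]
    system = solar_systems[s_idx]
    return system, system['planets'][p_idx]


system, planet = get_planet(40000010)
print(system['name'], planet['name'])  # Beta Beta I
```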
So in general, when dealing with static data, pre-bake your joins - funnily, we tend to already do this in performance-critical databases by denormalizing data (except that a denormalized relational database can't do this for lists or parent-child relations).
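The two join strategies mentioned in this post, side by side, as a Python sketch with invented document shapes (`types`, `blueprints` and `materials` are assumptions, not the real dump layout): a lookup by ID reference on every read, like a document link, versus a build-time step that copies the referenced field into each document, i.e. a pre-baked join.

```python
import copy

# Reference collection: typeID -> type document (sample data).
types = {
    34: {'typeID': 34, 'typeName': 'Tritanium'},
    35: {'typeID': 35, 'typeName': 'Pyerite'},
}

# Documents that reference other documents by ID only ("document links").
blueprints = [
    {'blueprintID': 1, 'materials': [{'typeID': 34, 'quantity': 100},
                                     {'typeID': 35, 'quantity': 20}]},
]


def material_names_linked(bp):
    """Runtime join: resolve each reference with a lookup on every read."""
    return [types[m['typeID']]['typeName'] for m in bp['materials']]


def prebake(docs, type_lookup):
    """Build-time join: embed the referenced field into each material.

    This duplicates data, but the data is static, so it cannot go stale,
    and readers no longer need the `types` collection at all.
    """
    baked = copy.deepcopy(docs)  # leave the source documents untouched
    for bp in baked:
        for m in bp['materials']:
            m['typeName'] = type_lookup[m['typeID']]['typeName']
    return baked


baked = prebake(blueprints, types)
print(material_names_linked(blueprints[0]))            # ['Tritanium', 'Pyerite']
print([m['typeName'] for m in baked[0]['materials']])  # ['Tritanium', 'Pyerite']
```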
At least, that's the theory...
CCP Nobody
Royal Amarr Institute Amarr Empire
1
Posted - 2012.05.03 22:09:00
[34]
..*slow clap*...
Zaotome
Schweine im Weltall.
1
Posted - 2012.05.03 22:58:00
[35]
slow clap? clap! clapclapclap!
Packtu'sa
Nabaal Construction and Industrials Corp Nabaal Syndicate
1
Posted - 2012.05.04 00:59:00
[36]
Alright, you've convinced me. A unit of Spirits to you!
James Bryant
Deep Core Mining Inc. Caldari State
8
Posted - 2012.05.04 01:37:00
[37]
CCP Redundancy wrote:
There are a few ways to handle multiple joins in a NoSQL-y way: you can separate the data out into another document collection and do the lookup (like MongoDB document links) and you can also duplicate and pre-embed the data.
The problem is, in YAML, that's the slowest part, and with a binary format like MessagePack, how the heck are you getting at a specific piece of data you want without unpacking the whole thing? If something as large as invTypes needs to be parsed (or unpacked), that's an awfully large piece of memory (and slow code). For the PHP and other web folks who have to load data for every page, that gets pretty nasty. I suppose that's probably just not the right tool for that particular job, but I'm just trying to flesh out the options.
I can see MongoDB or another No-SQL style as perhaps the weapon of choice for the web guys for that reason if the data ever gets to the point of being outside the realm of what can be handled by a traditional relational format.
I haven't read completely through the MessagePack docs (which are pretty bare), but I'm not seeing random access capability. It is entirely possible I'm completely missing something though.
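One common workaround for the random-access question, sketched in Python under stated assumptions rather than as a MessagePack feature: encode each record separately, append them to one file, and store an offset table so a reader can `seek()` straight to a single record. JSON stands in for the per-record encoding so only the standard library is needed, and the file layout (an 8-byte header pointing at a JSON offset table at the end) is invented for this example.

```python
import io
import json
import struct


def write_indexed(records, stream):
    """Write {id: record} as [header][record blobs...][offset table].

    Header: 8-byte little-endian offset of the table.
    Table: JSON mapping str(id) -> [offset, length] of each record blob.
    """
    stream.write(struct.pack('<Q', 0))  # placeholder for the table offset
    table = {}
    for rec_id, record in records.items():
        blob = json.dumps(record).encode('utf-8')
        table[str(rec_id)] = [stream.tell(), len(blob)]
        stream.write(blob)
    table_offset = stream.tell()
    stream.write(json.dumps(table).encode('utf-8'))
    stream.seek(0)
    stream.write(struct.pack('<Q', table_offset))  # patch the header


def read_table(stream):
    """Load only the (small) offset table, not the records."""
    stream.seek(0)
    (table_offset,) = struct.unpack('<Q', stream.read(8))
    stream.seek(table_offset)
    return json.loads(stream.read().decode('utf-8'))


def read_record(stream, table, rec_id):
    """Seek directly to one record and decode just that record."""
    offset, length = table[str(rec_id)]
    stream.seek(offset)
    return json.loads(stream.read(length).decode('utf-8'))


buf = io.BytesIO()
write_indexed({34: {'typeName': 'Tritanium'}, 21471: {'published': True}}, buf)
table = read_table(buf)
print(read_record(buf, table, 21471))  # {'published': True}
```

With a real file, `open(path, 'rb')` would replace the `BytesIO` buffer; only the offset table and the requested record are ever decoded.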
There's an additional problem I see for the third-party folks, and that's for statically typed languages. The YAML (or JSON, or BSON) can have any number of arbitrary data structures. I suppose, in Java or C#, you could maybe just use a hash map.
Quote:
So in general, when dealing with static data, pre-bake your joins - funnily, we tend to already do this in performance critical databases by denormalizing data (only denormalized relational databases can't do that for lists or parent-child relations).
True, and I tend to do the same, trying to denormalize when it makes performance sense, such as adding some information from invTypes to various other API pulls, like assets and the wallet transactions, to avoid having to join them every time.
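What that kind of denormalization might look like, as a small `sqlite3` sketch. The table layouts are simplified assumptions, not the real dump or API schema: the type name is copied into the transactions table once at import time, so the per-request query no longer joins against `invTypes`.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript('''
    CREATE TABLE invTypes (typeID INTEGER PRIMARY KEY, typeName TEXT);
    -- Denormalized: typeName is stored alongside each transaction.
    CREATE TABLE walletTransactions (
        transactionID INTEGER PRIMARY KEY,
        typeID INTEGER,
        typeName TEXT,
        quantity INTEGER,
        price REAL
    );
    INSERT INTO invTypes VALUES (34, 'Tritanium'), (35, 'Pyerite');
''')


def import_transaction(conn, transaction_id, type_id, quantity, price):
    """Copy typeName from invTypes at import time (the join happens once).

    If type_id is unknown, the SELECT matches nothing and no row is inserted.
    """
    conn.execute(
        'INSERT INTO walletTransactions '
        'SELECT ?, typeID, typeName, ?, ? FROM invTypes WHERE typeID = ?',
        (transaction_id, quantity, price, type_id))


import_transaction(conn, 1, 34, 1000, 5.5)
import_transaction(conn, 2, 35, 200, 12.0)

# Read path: no join needed.
rows = conn.execute(
    'SELECT typeName, quantity * price FROM walletTransactions '
    'ORDER BY transactionID').fetchall()
print(rows)  # [('Tritanium', 5500.0), ('Pyerite', 2400.0)]
```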
SkillQueueMonitor
Pator Tech School Minmatar Republic
2
Posted - 2012.05.04 02:13:00
[38]
Bout time. That denormalized table inside SQL made my soul hurt.
AND
I never have to install MSSQL ever again.
Lairel Dallocort
Dreddit Test Alliance Please Ignore
4
Posted - 2012.05.04 03:00:00
[39]
As a Linux user who has no access to an MSSQL server, this makes me super happy!
Jinli mei
Dreddit Test Alliance Please Ignore
109
Posted - 2012.05.04 05:31:00
[40]
James Bryant wrote:
For the PHP and other web folks who have to load data for every page, that gets pretty nasty. I suppose that's probably just not the right tool for that particular job, but I'm just trying to flesh out the options.
With web-based stuff you can cache it, either using a NoSQL approach like Mongo or with something sane people use, like memcached. If you think about it hard enough, you realize that most data you're pulling from CCP should be in a cached state anyway, rather than pinging the database for it or parsing it every time.
Khir
Het Kruidvat
3
Posted - 2012.05.04 05:53:00
[41]
RavenDB is a pretty nice NoSQL database for the .NET platform that is free if your project is open source. I was already thinking about trying it as my backend store, with denormalized data migrated from MSSQL.
I had no problem whatsoever with MSSQL, but I can appreciate the new setup will be better for those that don't develop on Microsoft platforms.
Any chance you guys want to share what you think the schemas for some of the other YAML documents will look like? Even if they will not be published as YAML data at this point?
Jack Tronic
borkedLabs
43
Posted - 2012.05.04 05:58:00
[42]
Meh, JSON is more readable and friendlier than YAML
Real Poison
Aura of Darkness Nulli Secunda
101
Posted - 2012.05.04 06:13:00
[43]
Jack Tronic wrote:
Meh, JSON is more readable and friendlier than YAML
While I love JSON for its purposes, that is plain wrong. YAML is the least cluttered and easiest format for storing arrays and hashed objects.
YAML Ain't Markup Language <- FTW!
Matthew
BloodStar Technologies
2
Posted - 2012.05.04 07:51:00
[44]
Many thanks for the detailed explanation. It sounds like it has the potential for significant benefits, which makes it easy to accept the additional work it'll need.
Though a third-party community project to script this back into SQL tables would be awesome (and, I suspect, far better than what I would otherwise kludge together on my own!).
Risingson
20
Posted - 2012.05.04 08:08:00
[45]
Even if it sounds like a state-of-the-art move, I hope CCP will provide an MSSQL dump to keep backward compatibility with existing tools. In my case, running a website for EVE is a hobby, not a job for a living. No MSSQL dump may make me quit it due to lack of time.... no crying, but panda.
Eveeye.com - New Eden Bordcomputer Systems
Freibuis
Legion of Lost Souls The Lego Cartel
1
Posted - 2012.05.04 08:18:00
[46]
Where do I start... CCP, thanks for making my day and ruining my day in the same dev blog. ;) Good one, CCP. /me looks through all the stored procs I have made over the years, shrugs and says... I guess I didn't need 'em anyway.
Moving to a NoSQL style is a great idea... not sure about YAML though. I never had it work properly; it ended up chewing more resources than it was worth. But it's good to see a decentralized approach in the future.
Question: are these tables being removed, or left in as well? If these tables are getting dropped, could you save us old-timers and give us a SQL file with all the inserts for each table, so we could use YAML and/or keep the SQL we grew up with?
Will we have to write our own tools to put the data back into the SQL database?
CCP Nobody
Royal Amarr Institute Amarr Empire
3
Posted - 2012.05.04 11:36:00
[47]
With this data structure change we wanted to move over to a standardized way of retrieving static data. This is what YAML provides: it gives us a vast collection of parsers for the majority of programming languages, and it is not tied to a particular OS (which apparently makes Lairel Dallocort super happy).
The process we in Team Core Graphics Tools are following is that after a system is ported to the new structure, we will drop the unneeded tables. After we have finished porting a system, we can give you a look at how that system's schema will look (because we won't know before we start porting it).
Currently we do not have any plans to create tools that put the data back into a DB. However, this is just us giving you the actual data that is used within the game (although the in-game data has been optimized to pieces), and the method of storing and reading that data is totally up to you guys/girls, because your needs differ. If your application needs some sort of fast key-value lookup, you could take a look at LevelDB. I would personally recommend MongoDB, because it is schemaless and should be easy to use with the YAML data structure.
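LevelDB needs a third-party binding, so this sketch swaps in Python's standard-library `dbm` module to show the same key-value pattern: convert the parsed static data once into an on-disk store keyed by ID, then fetch single entries without loading everything. The store name and record layout are assumptions for illustration.

```python
import dbm
import json
import os
import tempfile


def build_store(path, records):
    """Write {typeID: record} into an on-disk key-value store, one key per record."""
    with dbm.open(path, 'n') as db:
        for type_id, record in records.items():
            db[str(type_id)] = json.dumps(record)


def lookup(path, type_id):
    """Fetch and decode a single record; other records are not decoded."""
    with dbm.open(path, 'r') as db:
        raw = db.get(str(type_id))  # bytes, or None if the key is absent
    return None if raw is None else json.loads(raw)


tmp = tempfile.mkdtemp()
store = os.path.join(tmp, 'invTypes')
build_store(store, {34: {'typeName': 'Tritanium'}, 35: {'typeName': 'Pyerite'}})
print(lookup(store, 35))  # {'typeName': 'Pyerite'}
```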
Vessper
Eve Engineering Finance Eve Engineering
9
Posted - 2012.05.04 11:52:00
[48]
CCP Nobody wrote:
The process we in Team Core Graphics Tools are following is that after a system is ported to the new structure, we will drop the unneeded tables.
So just to confirm: future MSSQL data exports which have had some data converted to the new structure will be missing certain data tables (as these will be provided in the YAML files)? I guess I'm trying to establish whether we will continue to get a full (as in, the pre-Inferno schema) SQL export until you've finished this project, or whether we need to start working on partial conversions now.
Thebriwan
LUX Uls Xystus
41
Posted - 2012.05.04 12:01:00
[49]
Yesterday I held back my comments, because they would have been a bit bitter.
It seems even more pointless now, but I'll post them anyway...
Thank you, CCP Nobody, for the deep insight into your whys and hows.
BUT:
There is still a standard in web hosting. It's called (X)AMP(P). That is what you get. No MongoDBs, no nothing.
Yes, one can get one's own virtual server and do as one pleases. But as someone wrote before me: this is just a hobby.
I cannot spend an eternity setting up unfamiliar systems (and updating them every time a new zero-day exploit is found). I cannot spend the money, because it is still just a hobby.
And I would like to see the NoSQL DB that calculates the gain on my sell orders for the last 5 years on the fly, in a timely manner.
So I need MySQL tables, and I will be very thankful if someone can still provide them.
Freibuis
Legion of Lost Souls The Lego Cartel
1
Posted - 2012.05.04 12:16:00
[50]
CCP Nobody wrote:
[...] The process we in Team Core Graphics Tools are following is that after a system is ported to the new structure, we will drop the unneeded tables. [...] Currently we do not have any plans of creating tools that put the data back into a DB. [...]
Don't get me wrong, it's great what you are doing. But (and there is always a butt!) until all the data is in the new format, this method is going to be a pain: part of the data in one format and part in the other. Us old-timers will have to re-import (or, god forbid, not update) the YAML data into MS-SQL so that our functions/stored procs/SQL goodness will still work.
There would be no point moving to a new data structure until ALL static data is moved to YAML format. Also, this will cause coding issues every time a new YAML port is released...
I would still release the complete SQL database whole until 100% of the static data is released. That way we won't have to do code changes EVERY TIME... only once.
I would rather spend a week converting stored procs than spend a day here and there until 100% of the static data is converted to YAML.
I am not saying I don't want YAML... I am saying I would rather do it in one go than every month. Most people who have code like mine will have to import back into SQL to keep stuff working until the eventual day when there is no SQL at all.
Hosedna
FumbleFamily Corp
8
Posted - 2012.05.04 12:35:00
[51]
The shared hosting I pay for only has MySQL / PostgreSQL options, like most, so I guess it will become a bit tricky to do the industry queries on YAML files. We'll lose the expressive power of SQL and have to do the joins "by hand"... Unless there is something along the lines of XPath for YAML? It's not as good as SQL, but it could be a first step to help structure queries...
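Nothing in the thread points to a standard query language for parsed YAML, but a small path-style helper covers simple lookups once the file has been parsed into dicts and lists. This is a hypothetical sketch: `query` and its `*` wildcard syntax are made up here, not a library API, and the blueprint data is invented. Joins would still have to be written by hand.

```python
def query(data, path):
    """Return every value matching a slash-separated path.

    Each segment is a dict key or a list index; '*' matches every element
    of a list or every value of a dict. Missing keys yield no results.
    """
    results = [data]
    for segment in path.strip('/').split('/'):
        next_results = []
        for node in results:
            if segment == '*':
                if isinstance(node, dict):
                    next_results.extend(node.values())
                elif isinstance(node, list):
                    next_results.extend(node)
            elif isinstance(node, dict) and segment in node:
                next_results.append(node[segment])
            elif (isinstance(node, list) and segment.isdigit()
                  and int(segment) < len(node)):
                next_results.append(node[int(segment)])
        results = next_results
    return results


# Invented blueprint data, already parsed from YAML into Python objects.
blueprints = {
    'Rifter Blueprint': {'materials': [{'type': 'Tritanium', 'qty': 28000},
                                       {'type': 'Pyerite', 'qty': 6000}]},
    'Slasher Blueprint': {'materials': [{'type': 'Tritanium', 'qty': 20000}]},
}

print(query(blueprints, '*/materials/*/type'))
# ['Tritanium', 'Pyerite', 'Tritanium']
```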
CCP Nobody
Royal Amarr Institute Amarr Empire
3
Posted - 2012.05.04 14:07:00
[52]
The plan is to drop the migrated tables from the data dump with every release. We know that this is difficult, but it would add a lot of overhead to insist that, while moving over to a more flexible data format, we maintain backwards compatibility with a completely separate representation form; in the end this would make us less flexible and less able to take advantage of the benefits of the new format.
Unfortunately there are so many systems and so much static data in Eve that any attempt to do them all at once would be a multi-month effort that would be doomed to failure, because we wouldn't have worked through all the problems and issues while trying to apply the solution. We would also cause all feature development to stop and break all of the tools that we use in day-to-day development, while likely introducing issues into every single game system. This just isn't a practical option for us or for you.
James Bryant
Deep Core Mining Inc. Caldari State
8
Posted - 2012.05.04 14:29:00
[53]
Hosedna wrote:
The shared hosting I pay for only has MySQL / PostgreSQL options, like most, so I guess it will become a bit tricky to do the industry queries on YAML files. We'll lose the expressive power of SQL and have to do the joins "by hand"... Unless there is something along the lines of XPath for YAML? It's not as good as SQL, but it could be a first step to help structure queries...
That is definitely something that is going to bite quite a few folks. I happen to have a virtual server for my hosting, so not a big deal for me, but I have a feeling that I'm in the minority of Eve dev hobbyists. Still, the solutions are out there, this just might push a few people past their commitment point, unfortunately. Still, my feeling is that somebody will step up to the plate and convert all this into SQL after each release anyhow. There's no way I'd be able to do one of the more join heavy queries I do now like getting the top ten profitable market categories out of all our trades for the month, or maybe the wackiest query ever, T2 build requirements (uf!).
I'll tell you where this hurts the most, and that is in Android land, where I also develop. Many devices, especially ones still running Gingerbread or earlier, don't have much in terms of heap space, usually only 16 MB (or less on junk devices, of which there are many). That ought to be fun when trying to parse/unpack/unserialize something massive like the map data or invTypes. Combine that with statically typed Java, and you have a challenge.
Still, I like a challenge. We'll see how this shakes out.
Xander Hunt
35
Posted - 2012.05.04 14:32:00
[54]
*sigh*
I don't even know where to begin...
First...
YAML: YAML Ain't Markup Language
... come on.. really? I'm seriously, physically rolling my eyes at this.
I've very quickly skimmed over what the structure is about. So f'n not impressed.
Cons I see....
- First, looking at the "yaml.org" website, it looks like it was coded by a five-year-old with limited knowledge of anything to do with a computer, let alone designing a new type of data structure. Designed in something pre-Netscape Designer. It doesn't ooze a lot of professionalism or confidence in the code base (if there is a "code base" behind the specification of a data structure) or in the functionality and theory behind the actual concept of the data format, nor does it raise any kind of confidence in the designers of this data format, when this has been around since 2001. (Yes, it was a run-on sentence, sorta.) However, I'll give credit where credit is due and note that they did use an external style sheet... although, reading further down the code, it looks like the page was generated anyway. Makes me wonder if the site itself is read from a YAML file?
- Second, just like XML, JSON, and any other non-managed database system that doesn't rely on an index of sorts, one must read all data, or at least to the point where the data you want exists while assuming the data is sorted, from top to bottom, to get that bit (literal) of information to determine whether or not typeID 21471 is a Published object. What a waste. Don't get me wrong. Both have their place. Exchange of clear, described data to be put somewhere. I know massive XML documents float around at times, exchanging hands from one type of system to another, but that XML file isn't used as a "lookup information" source 99.995% of the time.
- The volume of data within EVE... Looking at the SQLite database conversion from Crucible, it's over 200 MB in size. That's with packed data (i.e., 10-character numbers in 4 bytes of data), indexing, structure definitions, page files, etc. A query to pull any data from anywhere in that file takes MILLISECONDS (just timed it: 19 ms to find out it is published). Text files? Too large to handle. I'd have to read thousands of lines to GET to that point.
- Not sure I'm too keen on the whole idea of just taking data out of the existing MS SQL backup and dropping it into text files to be re-consumed. I'd ask that all data stay in the MS SQL backup while you slowly roll out the new YAML files as you massage out the structure you want. Then, when all tables are done, drop MS SQL.
- Some of this structure looks similar to Windows INI files.... 'Cept, headers are marked with an identifier followed by a colon instead of a [identifier] type of ordeal. I do acknowledge there are advancements in comparison to the INI format, but not much more.
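A common answer to the lookup-cost concern is to treat the text files as an interchange format only: parse them once per release into something indexed, then query that. A minimal sketch with Python's `sqlite3`, assuming the parsed data is a list of dicts with `typeID`, `typeName` and `published` fields (an assumption about the eventual layout; the values are invented):

```python
import sqlite3


def import_types(conn, types):
    """One-time import per release: the primary key on typeID gives indexed lookups."""
    conn.execute('CREATE TABLE IF NOT EXISTS invTypes '
                 '(typeID INTEGER PRIMARY KEY, typeName TEXT, published INTEGER)')
    conn.executemany(
        'INSERT OR REPLACE INTO invTypes VALUES (:typeID, :typeName, :published)',
        types)
    conn.commit()


def is_published(conn, type_id):
    """Indexed lookup: no scan through the whole dataset."""
    row = conn.execute('SELECT published FROM invTypes WHERE typeID = ?',
                       (type_id,)).fetchone()
    return None if row is None else bool(row[0])


# Stand-in for data already parsed from the YAML files (invented values).
parsed = [{'typeID': 21471, 'typeName': 'Example Type', 'published': True},
          {'typeID': 34, 'typeName': 'Tritanium', 'published': True}]

conn = sqlite3.connect(':memory:')
import_types(conn, parsed)
print(is_published(conn, 21471))  # True
```

The parse cost is paid once per release; every later lookup goes through SQLite's index instead of re-reading the text.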
Pros...
- No MSSql - Although I started off training on MS SQL 2000 Enterprise, I've moved very far from it simply due to costs. Yes, I know it's free NOW, but it wasn't like that until recently, and I've never looked back. I might have been an MSSql fanboi if there had always been free versions. I'm cheap.
- Take the generalized data and put it into a proprietary structure our applications work with. MSSql, MySQL, SQLite, CSV, our own structure of data (I'm looking at you EVEMon! {wink}) or whatever we want is a GREAT bonus.
Final thoughts
Of course, all we (us?) developers are going to have to follow your lead if we're going to keep developing our tools for your game, but honestly, I've never been, and never will be, a complete fan of single or multiple text files that are supposed to relay some sort of structured data. I avoid creating XML, I avoid CSV, I avoid plain text, simply because repetitive reads of data slow the whole process down, ESPECIALLY when you get into thousands of lines.
With all the enhancements you ladies and gents at CCP have been putting into improving UI response times, I'm quite taken aback that you'd go to a text file to manage database-worthy information, static data or not.
{30 minutes later}
... come to think of it... YAML originated in 2001 and has had pretty much NO MOVEMENT since then... and you're using it for in-house processes and implementing it as a data store halfway through 2012?!?!?!
Katrina Bekers
Rim Collection RC Test Alliance Please Ignore
93
Posted - 2012.05.04 15:33:00
[55]
Speaking of NoSQL:
Redis.
You will never go back. EVER.
<< THE RABBLE BRIGADE >>
Kouryusei
The Bitter Sea Trading Company
28
Posted - 2012.05.04 16:15:00
[56]
Following up on Katrina, go play with Couchbase (not CouchDB), it's just as sexy as redis.
In other news, **** YAML. Royally. I'll convert it to a plethora of formats since, like I said - **** YAML.
Steve Ronuken
Fuzzwork Enterprises
392
Posted - 2012.05.04 16:22:00
[57]
As long as the data remains in a form that can be easily represented in a tabular form, I'll be backporting it, along with the mysql conversions I've been doing.
Something I would love, though: a separate file that specifies the keys (as some are optional) and the max lengths of values. It would reduce the amount of preprocessing I'll have to do on import.
It's not a biggie, though.
FuzzWork Enterprises http://www.fuzzwork.co.uk/ Blueprint calculator, invention chance calculator, ISK/m3 ore chart and other 'useful' utilities.
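Until such a file exists, a rough version can be derived from the data itself. A Python sketch, assuming the YAML has already been parsed into a list of dicts: for each key it records whether some records lack it (optional) and the longest string value seen, which is the figure you'd need to size `VARCHAR` columns. `infer_schema` and the sample rows are hypothetical, and the lengths are in characters, not bytes.

```python
def infer_schema(records):
    """Summarize keys across records: optional or not, and max string length."""
    counts = {}
    max_len = {}
    for record in records:
        for key, value in record.items():
            counts[key] = counts.get(key, 0) + 1
            if isinstance(value, str):
                max_len[key] = max(max_len.get(key, 0), len(value))
    total = len(records)
    return {
        key: {'optional': counts[key] < total,  # missing from some records
              'max_length': max_len.get(key)}   # None for non-string values
        for key in counts
    }


rows = [
    {'typeID': 34, 'typeName': 'Tritanium', 'description': 'The main building block.'},
    {'typeID': 35, 'typeName': 'Pyerite'},
]
for key, info in infer_schema(rows).items():
    print(key, info)
```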
Etil DeLaFuente
New Eclipse Initiative Mercenaries
6
Posted - 2012.05.04 17:32:00
[58]
So if I understood right, more and more data will be available in the client in YAML format?
Or will we still have to rely on the toolkit?
Lan Staz
Aperture Harmonics K162
16
Posted - 2012.05.04 18:41:00
[59]
I think you are going to have a perception problem due to the choice of initial samples which are far too simple to show the advantages of structured over relational data.
Maybe showing something more complex, even if it is just an indicator of how things might look, would be a good idea: something that currently requires several tables and lots of joins between them that would collapse down to one list of structured objects, such as ship definitions or the map.
I'd post something here as an example except there doesn't appear to be a way to post code samples on these boards without losing the structure.
Oh, and as someone who has no access to MS SQL and works in Python anyway, yay for YAML!
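As a rough illustration of the collapse described in this post, a Python sketch that groups flat join output into one nested document per ship. The row layout (typeID, typeName, attribute name, value) and the attribute values are invented stand-ins for the several tables a real ship definition spans.

```python
# Invented rows as they might come back from a multi-table join:
# (typeID, typeName, attributeName, value)
joined_rows = [
    (587, 'Rifter', 'hp', 350.0),
    (587, 'Rifter', 'maxVelocity', 365.0),
    (603, 'Merlin', 'hp', 400.0),
]


def rows_to_documents(rows):
    """Group flat join output into one structured object per ship."""
    ships = {}
    for type_id, name, attr, value in rows:
        ship = ships.setdefault(type_id, {'typeID': type_id, 'typeName': name,
                                          'attributes': {}})
        ship['attributes'][attr] = value
    return list(ships.values())


docs = rows_to_documents(joined_rows)
print(docs[1])  # {'typeID': 603, 'typeName': 'Merlin', 'attributes': {'hp': 400.0}}
```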
Antihrist Pripravnik
Scorpion Road Industry
6
Posted - 2012.05.04 19:19:00
[60]
Big thanks to all the devs who replied with a lot of technical stuff! I can see the future now.
CCP Ytterbium: Yarrblblbgrlblbgrlblblblbblbgrlblblbgrblblyarrrrdrooooooolonthekeyboardlikealunatic