Riethe
Fine Goods for Fine Gentlemen
|
Posted - 2008.05.05 04:22:00 -
[1]
Just a quick question:
Does the refID ever reset, or is it unique and constantly incrementing?
For some reason I recall seeing a thread that mentioned some ID resetting and I'm hoping it had to do with something else or that I simply misinterpreted it.
Thanks.
|
Ausser
|
Posted - 2008.05.05 10:17:00 -
[2]
Originally by: Riethe
Does the refID ever reset, or is it unique and constantly incrementing?
Good question. I'm clueless too.
The documentation at http://myeve.eve-online.com/api/doc/data-account.asp does not give any details about it.
If refID is transmitted as a signed 32-bit integer, then it should wrap around when reaching 0x7fffffff. This would happen if the real ID is larger than 32 bits and is simply truncated to 32 bits on transmission. In the long term we would get the same refID more than once, but it would not change for the same log entry between queries.
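To make the wrap-around idea concrete, here is a tiny Python sketch. It is only a guess at the mechanism; we don't know the real column type.

# Guess at the mechanism only: interpret the low 32 bits of a larger counter
# as a signed 32-bit integer, the way a truncating transmission would.
def truncate_to_int32(real_id):
    low = real_id & 0xFFFFFFFF
    return low - 0x100000000 if low > 0x7FFFFFFF else low

print(truncate_to_int32(2_147_483_646))      # 2147483646 (still positive)
print(truncate_to_int32(2_147_483_648))      # -2147483648 (wrapped past 0x7fffffff)
print(truncate_to_int32(4_294_967_296 + 5))  # 5 (collides with real ID 5)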
Unfortunately, in the API doc we find the following:
Quote:
Walking is performed by specifying the beforeRefID parameter as the lowest refID you have seen so far. You will then get another page with entries. This process can be repeated until you have gone back a week. The server will return an error if you attempt to go back farther.
That's a problem. If refID is a modulo truncation of some other, larger ID, then it will wrap around. If it does, then the lowest refID in one data pull is not necessarily the refID of the oldest entry. Table walking as described in the docs would not work, unless you take the refID of the last entry instead of the lowest refID in the result.
On the other hand, if the refID gets reset at some point, the server could give us results where the refID is sequential across all queries of one complete data pull. In that case, we would see the same entry with different refIDs each time the server resets the counters.
Which exact mechanism is used for key fields like refID in WalletJournal, WalletTransactions, MarketOrders, IndustryJobs?
|
Ambo
Dirty Deeds Corp. Axiom Empire
|
Posted - 2008.05.05 13:00:00 -
[3]
Yes, it does change. Every now and then they do some DB maintenance and the IDs all get moved around to make room.
If you give me my 900mil back I'll even link the post for you.
|
Matthew
Caldari BloodStar Technologies
|
Posted - 2008.05.05 18:17:00 -
[4]
RefID is a 32-bit integer. However, there have been more than 2 billion journal entries in the history of eve, which poses a (hopefully) obvious problem. Making RefID into a bigint to accommodate >2 billion unique values would carry with it a significant performance hit. So instead what they do is every so often archive off the journal table, and repopulate a fresh table (and thus a fresh refID starting again from the beginning) with the last month or so of entries (it looks like they left between 3 and 4 months last time).
What this effectively means is that RefID is not globally unique, but it is unique within any single generation of RefID. The process of moving between generations also keeps the journal entries in the same RefID order, so all you need to know is the offset between the old and new RefID in order to translate between the two.
There are two approaches you can then use. Either you use a bigint in your own database to store the "true" RefID (which you calculate from the given RefID and the cumulative offsets of all generation changes to date), or you store the generation number alongside the RefID returned from the API, and have a more complex procedure for checking for duplicates.
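A minimal sketch of the first approach in Python (the offset and generation numbering are only the ones discussed in this thread; extend the list as new generations appear):

# "True" RefID = API RefID plus the cumulative offsets of all generation
# changes so far. Only the Oct-07 offset is known at this point.
GENERATION_OFFSETS = [
    0,           # Gen1: RefIDs need no correction
    1085796677,  # Gen2: offset introduced by the Oct-07 archive/repopulate
]

def true_ref_id(ref_id, generation):
    # generation is 1-based: 1 for Gen1, 2 for Gen2, ...
    return ref_id + sum(GENERATION_OFFSETS[:generation])

print(true_ref_id(1_200_000_000, 1))  # 1200000000
print(true_ref_id(114_203_323, 2))    # 1200000000 -> the same entry after the change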
Since the release of the API, this generation change has only occurred once, in mid-October 2007. The offset in that case was 1085796677. This appears to have been triggered by the RefID exceeding the 1.5 billion mark. Based on my personal transactions to today (I have enough to give a fairly good picture over time), we are currently at approximately the 1.22 billion mark. Judging from the figures since December, RefID is growing at approx 100 mill per month. Assuming that rate remains constant (and it has been fairly stable over that period), and they keep the same trigger point, it will be another 3 months before the next generation change. I would expect the extended DT needed to perform that operation to occur in early August.
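Just spelling that arithmetic out:

# The estimate above, spelled out. All figures are the approximations quoted
# in this post, not measured values.
trigger = 1_500_000_000       # assumed trigger point (where the last change happened)
current = 1_220_000_000       # approx RefID today
growth_per_month = 100_000_000

print((trigger - current) / growth_per_month)  # 2.8 -> roughly three months, i.e. early August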
As an aside, taking figures from before December clearly shows the difference that batching up NPC bounty payments has had. Prior to that change, RefID had been growing at 150mill per month, compared to 100mill per month after the change.
Incidentally, I would have expected it to be around the beginning of March when people who hadn't allowed for this could have begun seeing problems. Because the API was enabled mid-generation, it wasn't until then that the Gen2 RefIDs entered the range of values that had also appeared in Gen1. The probability of ID collisions is going to be very small for individuals, though, so many may well get away without allowing for this for some time yet.
|
Ausser
Cybertech Industrials Agency
|
Posted - 2008.05.05 18:32:00 -
[5]
Originally by: Ambo Yes it does change. Every now and then they do some DB maintenance and the IDs all get moved around to make room.
Hmm... then I'll have to change my code before it messes up my database. The refID is pretty useless when collecting data over time.
It will be a pain to find a safe algorithm to distinguish between records already in the database and new ones, without using refID together with the date field...
It would be nice if CCP could slightly extend these query requests/results to make them more handy.
Some simple features would already help:
- An 'afterDate=' and 'afterRefID=' argument to control which data the user is interested in. This would also reduce load on the server. In the case of 'afterDate=', the server should make sure not to return data up to the current minute, to prevent splitting data that belongs in the same minute.
- An element called '<more/>' in the result, indicating there is more data, combined with 'beforeRefID', 'afterDate' and 'afterRefID' attributes to indicate where to start the next request for more data. This would also reduce load on the server, because the final request, which is doomed to fail just to indicate the end of the data, isn't necessary any more.
These should be simple to implement without breaking compatibility with existing applications.
I hope a nice dev reads this and finds 30 minutes of time to implement it
|
Ausser
Cybertech Industrials Agency
|
Posted - 2008.05.05 19:04:00 -
[6]
@Matthew:
Missed your post while writing mine.
Thanks a lot for the hint with the offset.
|
Matthew
Caldari BloodStar Technologies
|
Posted - 2008.05.05 19:08:00 -
[7]
Originally by: Ausser Which exact mechanism is used for key fields like refID in WalletJournal, WalletTransactions, MarketOrders, IndustryJobs?
I would imagine the same approach is taken with the others as with WalletJournal (which I've described above); there's no reason to do it any differently.
Of course, WalletTransactions, MarketOrders and IndustryJobs are going to be growing at a much slower rate than WalletJournal.
I only have historical data for WalletTransactions, which shows that the transaction ID is increasing at approx 25 million a month and is currently at 628 million. Hence I wouldn't expect a generation change to be necessary for another 3 years, assuming no growth in transactions.
While I don't have historical data for the other two, we can argue that those will also be growing at a much slower rate simply because all of these generate journal entries (through fees, taxes, payments etc), so the combined growth of all of these should be less than the growth in the journal (as other things, like bounties, also go into the journal).
I wouldn't expect to hit generation trouble with the other tables anytime soon.
Originally by: Ausser Hmm... then I'll have to change my code before it messes up my database. The refID is pretty useless when collecting data over time.
It will be a pain to find a safe algorithm to distinguish between records already in the database and new ones, without using refID together with the date field...
If you haven't already implemented something, and you have data in your table for Oct-07 or before, then unpicking the transitional period is going to be the main challenge.
For this, you need to take advantage of the step-change in RefID values and the one-week window offered by the API. While RefIDs have now looped round to values that were previously used, the transaction dates are going to be so different that you should easily be able to flag which entries are from which generation.
You'll need a different approach around the point of transition though (I'd suggest applying this to transactions dated Sep-07 to Nov-07). If you just imported regularly at the time, relying on RefID being unique, then you will definitely have duplicates in there. For this time period, you should be able to separate the two generations based on the RefID - anything above 1 billion will be Gen1, anything below 1 billion will be Gen2.
Once you have every existing record tagged as either Gen1 or Gen2, you can then use the known offset between them to check for duplicates. If your RDBMS has bigint capability, and you can afford the performance hit, then simply add a TrueRefID field. For the Gen1 entries, TrueRefID=RefID. For the Gen2 entries, TrueRefID=RefID+1085796677. You can then check for and eliminate duplicate values of TrueRefID. When the next generation change comes along, you just add that generation difference value on to the new imports as well.
If your RDBMS can't do bigint, then it gets slightly trickier. In that situation, you need to subtract 1085796677 from all the Gen1 IDs to convert them into Gen2 IDs (adding to Gen2 would overflow Int; subtracting from Gen1 keeps you within the limits of Int). You then check for duplicates between these converted IDs and the true Gen2 IDs. You'd then need to roll this process forward for each new generation.
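A rough Python sketch of the bigint variant of that clean-up (table and field names are invented; the window, cut-off and offset are the ones discussed in this thread):

# Rough sketch only: tag existing rows by generation, translate to a 64-bit
# "TrueRefID" and drop duplicates. Names are invented; the Sep-07..Nov-07
# window, the 1 billion split and the 1085796677 offset are from this thread.
from datetime import datetime

GEN2_OFFSET = 1085796677
WINDOW_START = datetime(2007, 9, 1)
WINDOW_END = datetime(2007, 12, 1)

def tag_generation(ref_id, ref_date):
    if ref_date < WINDOW_START:
        return 1
    if ref_date >= WINDOW_END:
        return 2
    return 1 if ref_id > 1_000_000_000 else 2   # transition window: over/under 1 billion

def true_ref_id(ref_id, generation):
    return ref_id if generation == 1 else ref_id + GEN2_OFFSET

def deduplicate(rows):
    """rows: iterable of (ref_id, ref_date, ...) tuples; keeps one row per TrueRefID."""
    seen, kept = set(), []
    for row in rows:
        key = true_ref_id(row[0], tag_generation(row[0], row[1]))
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept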
Obviously, going forward it is desirable to identify the generation of an import at the point of import, rather than having to identify it retroactively. The presence of a generation change is fairly easy to detect by comparing the highest current-generation RefID with the highest one in the current download. You then just need a process to alert you to the problem and reject imports until you've given it the new generation offset to use.
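That check can be as simple as comparing maxima, something like this (names are illustrative):

# Illustrative only: if the largest RefID in a fresh download is suddenly far
# below the largest current-generation RefID already stored, the counter has
# been reset and a new generation offset is needed before importing.
def generation_changed(max_ref_id_in_db, downloaded_ref_ids, slack=10_000_000):
    if not downloaded_ref_ids:
        return False
    return max(downloaded_ref_ids) + slack < max_ref_id_in_db

print(generation_changed(1_480_000_000, [401_234_567, 401_240_001]))  # True -> hold imports
|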
Matthew
Caldari BloodStar Technologies
|
Posted - 2008.05.05 19:27:00 -
[8]
Originally by: Matthew You'll need a different approach around the point of transition though (I'd suggest applying this to transactions dated Sep-07 to Nov-07).
Actually, there's an easy way to determine the exact time range you need to do this for. What you need to do is find the maximum and minimum RefID seen for each day based on transaction date. If all entries on that day are from the same generation, then the difference should be minimal - I'd expect to see differences in the 2-6 million range. In contrast, if you have data for that day from two different generations, the difference should be in the order of 1 billion - easily distinguishable!
Everything before the first day of large differences will be Gen1, everything after the last day of large differences will be Gen2, and you split the rest based on the over/under 1 billion rule as before.
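In code, that day-by-day check might look roughly like this (nothing here is API-specific; 'rows' is whatever you already have in your own journal table):

# Sketch of the per-day min/max check. The half-billion threshold is arbitrary:
# anything between the normal few-million daily spread and the ~1 billion jump works.
from collections import defaultdict

def mixed_generation_days(rows, threshold=500_000_000):
    """rows: iterable of (ref_id, ref_date) pairs from your own journal table."""
    per_day = defaultdict(list)
    for ref_id, ref_date in rows:
        per_day[ref_date.date()].append(ref_id)
    return sorted(day for day, ids in per_day.items() if max(ids) - min(ids) > threshold)
|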
Riethe
Fine Goods for Fine Gentlemen
|
Posted - 2008.05.06 11:11:00 -
[9]
Thanks for the interesting contributions everyone.
|
Ausser
Cybertech Industrials Agency
|
Posted - 2008.05.06 22:56:00 -
[10]
I've learned refID is not unique, and trueRefID isn't either.
It is only unique within a single char/corp account.
For example:
If you transfer money via player donation, there will be one journal entry with refTypeID=10 (Player Donation) on the accounts of both participating parties. Both entries will share the same refID (and also refTypeID) because they are related to each other.
Some other ref types that show up with more than one journal entry are:
33 - Agent Mission Reward - one entry for the character and one for his corp (tax)
85 - Bounty Prizes - again character + corp, because of tax, like Agent Mission Reward
1 - Player Trading - both characters get one entry with the same refID
37 - Corporation Account Withdrawal - character and corp get it this time
...
So you cannot use trueRefID directly as a primary key. But you can use it as part of one, if you also include accountID (note: this is not the same as accountKey).
accountID directly identifies the corporationID/characterID plus accountKey combination. It can be found in the AccountBalance request result.
It helps reduce the wallet journal table size, at the cost of one additional lookup to resolve charID+accountKey/corpID+accountKey <-> accountID.
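For example, something along these lines (sqlite syntax via Python; table and column names are just my own, not from the API doc):

# Sketch of the composite-key layout described above. Column names are invented;
# accountID is the value from AccountBalance, trueRefID the offset-corrected RefID.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (
    accountID  INTEGER PRIMARY KEY,   -- from the AccountBalance result
    ownerID    INTEGER NOT NULL,      -- characterID or corporationID
    accountKey INTEGER NOT NULL       -- wallet division key
);
CREATE TABLE wallet_journal (
    accountID  INTEGER NOT NULL REFERENCES account(accountID),
    trueRefID  INTEGER NOT NULL,      -- 64-bit, offset-corrected
    refTypeID  INTEGER NOT NULL,
    amount     REAL    NOT NULL,
    refDate    TEXT    NOT NULL,
    PRIMARY KEY (accountID, trueRefID)  -- refID alone is not unique; this pair is
);
""")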
I'm not sure whether accountID in /Corp/AccountBalance and /Char/AccountBalance share the same number space, but I suppose they do. Maybe a dev could answer this. Don't pod me if I'm wrong, I only have a couple of (just two^^) full API keys to test with.
|
Ambo
Dirty Deeds Corp. Axiom Empire
|
Posted - 2008.07.22 16:23:00 -
[11]
Mild bit of necro here, but I've been playing around with building an almost fully automated solution to this.
My conclusion is that it's simply not possible. Sure, you can get something that works most of the time, and with some optimisations it does not have to be much slower than just reading in the data normally.
However, there are cases where I simply can't see how to work out which generation you're dealing with and you certainly can't work out the offset for a new generation.
My question is: do I need to be putting more time into this to try and come up with something better, or are CCP working on a solution that is just around the corner and will remove the need for all this messing around?
|
|
CCP Lingorm
C C P
|
Posted - 2008.07.22 17:19:00 -
[12]
I went for a different approach when I started recording information.
I worked out a unique key from the other fields in the data and then generated my own ID field in my table.
By placing the unique constraint on the capturing table, it prevents me from inserting duplicate data.
I will check at home what the unique combination of fields I used was.
|
|
Ambo
Dirty Deeds Corp. Axiom Empire
|
Posted - 2008.07.22 18:21:00 -
[13]
Ah, I see. That's coming at it from a totally different angle..
Would certainly be interested to see what you came up with.
|
Dragonaire
|
Posted - 2008.07.24 06:16:00 -
[14]
OK, here's my 2 ISK worth on this, with some math. I'll round off numbers to make this easier to follow.
refID range: 2^31, or 2.1e9
refType range: 2^5, or 32 (the range is bigger, but not all types are used or very common)
charID/corpID range: 2^31, or 2.1e9
ownerID2: same range as charID
Just multiplying the first two together gets you a 1 in 6.9e10 chance of having a duplicate entry. Let's compare that to something in RL: DNA evidence for court cases only has to be better than 1 in 6.0e9, except if you have a twin, in which case DNA can be useless.
Now let's multiply that by the ownerID2 range to get 1 in 1.5e20. To compare that to something in RL: it's estimated there are 10e20 stars in the universe, so that's like the chance of drawing two consecutive numbers from a lottery of all those stars.
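Spelled out in Python, with the same rounding:

# The same back-of-envelope numbers, spelled out. Small differences from the
# 6.9e10 / 1.5e20 figures quoted above are just rounding.
ref_id_range = 2.1e9      # ~2^31
ref_type_range = 32       # ~2^5 types in common use
owner_id2_range = 2.1e9   # ~2^31

print(ref_id_range * ref_type_range)                    # ~6.7e10
print(ref_id_range * ref_type_range * owner_id2_range)  # ~1.4e20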
If I was you I'd take my chances of overwriting a record myself and get on with my programming instead of worrying about having a 'perfect' system
|
Ambo
Dirty Deeds Corp. Axiom Empire
|
Posted - 2008.07.24 07:59:00 -
[15]
Originally by: Dragonaire If I was you I'd take my chances of overwriting a record myself and get on with my programming instead of worrying about having a 'perfect' system
Yeah... I'm too anal to go for that.
Particularly because the system I'm writing could be deployed either as a standalone app or online. If it was online, it could have journal entries from 100s or 1000s of different chars and corps.
I have come up with a very solid system for working out the 'true' ref ID though. It's fully automated (except for when a new generation appears; I still can't think of a totally foolproof way to work out the offset, but I'm working on that as well...). I'm storing each journal entry with a senderID and receiverID (and associated data) rather than the way it's presented in the XML. This means that selling items to yourself, etc., is not a problem.
The only one that is a problem is when the same journal ID is used for a bounty/mission reward paid to a player and the tax on that paid to the corp. Why it has the same ID baffles me, but it screws up my system! On this one I decided to just take the easy way out and include 'receiverID' in my primary key. Not ideal, but it works.
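A rough sketch of that key choice (field names are mine, not the API's):

# Rough sketch of the "receiverID in the primary key" workaround. The point is
# only that the key is the pair (trueRefID, receiverID), so the player's reward
# row and the corp's tax row can coexist even though they share one refID.
journal = {}  # (trueRefID, receiverID) -> row

def store(row):
    key = (row["trueRefID"], row["receiverID"])
    if key not in journal:      # re-downloading the same entry is harmless
        journal[key] = row

store({"trueRefID": 1_300_000_001, "receiverID": 90123456, "amount": 450000.0})  # mission reward to the pilot
store({"trueRefID": 1_300_000_001, "receiverID": 98000001, "amount": 50000.0})   # tax on it, to the corp
print(len(journal))  # 2 -- both rows kept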
|
Dragonaire
|
Posted - 2008.07.24 08:21:00 -
[16]
As I was trying to point out above, just using a combo of ownerID1, ownerID2, and refID as the primary key in, say, the journal is really more than enough if you think about it. What's the chance of the same char doing something with the same other char years apart and getting the same refID too? Even if you have 1000s of entries every second, the universe will have ended before you are likely to get a duplicate record. There's a higher chance of the database hard drive dying when you don't have a good backup, and you losing all your records, than of having a dup.
|
Dragonaire
|
Posted - 2008.07.24 08:35:00 -
[17]
Just had another idea on how to approach it too. Tables that use refID, or other fields like it that recycle over time, also have a date/time field that could be used with it to tell them apart easily. DBs have no problem using composite primary keys until you start dealing with very high volume-per-second traffic, as long as all the parts are fixed length.
|
|
CCP Lingorm
C C P
|
Posted - 2008.07.24 09:35:00 -
[18]
OK, my unique key was
refDateTime, OwnerID1, OwnerID2, walletKey, argID1
then added an autonumber field in my table for the primary key.
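Roughly, in sqlite terms (not the exact DDL, just the idea):

# Roughly the idea: an autonumber primary key plus a unique constraint over the
# combination of fields above. "INSERT OR IGNORE" then silently drops any row
# that has already been captured, so overlapping API pages are harmless.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE wallet_journal (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,  -- the autonumber field
    refDateTime TEXT    NOT NULL,
    ownerID1    INTEGER NOT NULL,
    ownerID2    INTEGER NOT NULL,
    walletKey   INTEGER NOT NULL,
    argID1      INTEGER NOT NULL,
    amount      REAL,
    UNIQUE (refDateTime, ownerID1, ownerID2, walletKey, argID1)
);
""")
conn.execute("INSERT OR IGNORE INTO wallet_journal "
             "(refDateTime, ownerID1, ownerID2, walletKey, argID1, amount) "
             "VALUES (?, ?, ?, ?, ?, ?)",
             ("2008-07-22 17:00:00", 90123456, 98000001, 1000, 0, 1234.5))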
|