Friday, March 30, 2012

Invoice dimension - best way to model it

We have a system that will have ~15 million invoices per year. We would like to track lots of very unique attributes of each invoice (invoice number, etc). Currently we are using a junk dimension ([Dim Invoice Attribute]) that contains 4 or 5 of the attributes, but our changes will require going to the grain of the invoice table. I have read having a physical dimension table that has a 1:1 cardinality to the fact table isn't a good idea (strictly because of the size of the table and the operation of joining fact to dimension). Currently we have 2 or 3 attributes of the invoice in the fact table (degenerate dimensions) in order to fulfill relational queries for tying things back to the MDX results.

My question is what is the best way to model this? I have read both sides of the story - create an invoice dimension table, and store everything there with keys relating back to the fact table, and also store all of the attributes in the fact table and create a named query in the .dsv that selects out only the attributes for the [Dim Invoice] dimension, thus saving space and join energies during processing.

Can anyone shed some light on this?

Thank you in advance,

John Hennesey

Typically in this situation, you have a fact table that represents the invoicing process. One of the dimensions is the invoice which will most likely have a one-to-one relationship with the fact. Does this describe your model?

Kimball would describe this as a degenerate dimension and would recommend you move the dimension into the fact table. SSAS provides support for this through a fact dimension. (See Books Online topic "Dimension Relationships" for a more detailed description.)

So this works if we're talking about a fact dimension with just a few attributes. After that, the model starts to look a little sloppy. I would look at the cardinality of some of these other attributes. If the cardinality is relatively low, you may want to move these into their own dimensions or if the cardinality is very low lump them into a junk dimension.

The other thing I would consider is the user's experience. How often will users be accessing these other dimensions created to hold those attributes? If these are presented separate from the Invoice dimension, would these seem weird or be cumbersome for your user? Do you plan on providing user-hierarchies that role your invoice number up to these attributes? If not, then I'd recommend going ahead with moving them out.

From a performance standpoint, I don't think your design choice has too big an impact on SSAS. If you figure that every attribute hierarchy is given its own storage and the only thing that would be different internally would be the relationship maps. The impact of having or not having these depends on how you intend to use your attributes.

So, I guess my general recommendation would be keep to standard, Kimball principles and put an emphasis on end-user experience. If you run into a performance problems down the road, you should be able to easily tweak the model.

If others folks have thoughts on this, esp regarding how the SSAS engine handles this stuff, please, please chime in.

Thanks,
Bryan

|||

I agree with the approach of keeping the invoice attributes in the fact table.

One issue is the number of invoices(metadata) that will be downloaded to the client when they use the invoice dimension. This will be a really slow process if you show to many invoice members and the solution is to create artificial levels above the invoice number. I build the first level with the first character in the invoice number using the TSQL-left function in the data source view.. A guide line is to have no more than 10-15 groups on each level. On the leaf level you have have larger groups.

I have asked the SSAS2005 development team about any performance issues with building a fact table dimension and they have no negative impact.

HTH

Thomas Ivarsson

|||

Interesting - I have heard arguments on both sides of the argument. To implement a fact table dimension, it would be using a named query that goes against the fact table, right? Is there a special way to set up a fact table dimension (any attributes of the dimension itself?) I have also read in just about every article it is necessary to set up a hierarchy to limit the number of members returned to the client - indeed a very good idea.

Thank you all for your input,

John Hennesey

|||

Hello John! Or use named queries in the fact table for each artifical level in the invoice hierarchy/dimension.

Good luck!

/Thomas

No comments:

Post a Comment