| Chris's profileChris Webb's BI BlogBlogLists | Help |
|
|
10/9/2009 Using Dimensions as Parameters to Calculations and SetsOne of the advantages of using SSRS as a client for SSAS is the control you have over your MDX, and one of the advantages of control over your MDX is the ability to parameterise not only your queries but also any calculations defined in those queries. It would be great if we could parameterise calculations defined in the MDX Script (I have a Connect open on this, in fact – please vote!) but until we can, we can do something almost as good: we can use a dimension hierarchy to store a pre-defined range of parameter values and then use the CurrentMember on this hierarchy to pass one of these values to a calculation or dynamic named set. Here’s how. First of all, we need to create a dimension to hold these parameter values, something like a Time Utility or Shell dimension. We can create the source data for this easily in the DSV using a named query and a SQL SELECT statement as follows: SELECT 1 AS ParamID, '5%' AS ParamDesc, 1.05 AS ParamValue We can then build an SSAS dimension – I called it [Parameter Values] – with a single attribute, its KeyColumn property using the ParamID column above, its NameColumn property using the ParamDesc column, and its ValueColumn property using the ParamValue column. This means we have a simple hierarchy with four members on it. We then add the new dimension to the Adventure Works cube; it doesn’t need to have any relationship with any measure group. Now let’s use it. Imagine we want to see what the value of Internet Sales Amount would be if it grew by 5%, 10%, 15% or 20%, how would we do it? Well, what we could do is say that when the All Member on the Parameter Values is selected we see the real value of Internet Sales Amount, but when one of the other members is selected we increase the value of Internet Sales Amount by the percentage associated with the selected member. The MDX required would look like this: SCOPE([Measures].[Internet Sales Amount]); What I’m doing here is scoping on Internet Sales Amount and all of the members except the All Member on my new dimension, so that I’m only doing my calculation when a selection is made on the Parameter Values hierarchy. In this scope I’m then multiplying the value of Internet Sales Amount at the All Member with the value returned by the MemberValue function for the CurrentMember on [Parameter Values].[Parameter Values] – which is the value from the column I used in the ValueColumn property of the attribute. Even though I’m scoping on a real measure, the calculation doesn’t aggregate up to the All Member because this dimension has no relationship with the measure group that Internet Sales Amount is from (or indeed any other measure group). Here’s a query that shows the results: SELECT [Measures].[Internet Sales Amount] ON 0, And here’s the output, showing Internet Sales Amount and underneath it the value increased by 5%, 10%, 15% and 20%: So we’ve got a (sort of) parameterised calculation. We can also use a similar approach with dynamic named sets too – here’s an example dynamic named set definition that uses the same dimension to control the percentage passed into TOPPERCENT function: CREATE DYNAMIC SET MYSET AS The set returns all Customers if no selection is made on Parameter Values. However if a selection is made on Parameter Values then the selection drives the number of Customers that the set returns. So the query: SELECT [Measures].[Internet Sales Amount] ON 0, …returns all 18485 Customers, whereas: SELECT [Measures].[Internet Sales Amount] ON 0, …returns the top 5% of Customers (I’m using the MemberValue function minus 1, multiplied by 100, here so I can use the value 5 rather than the original MemberValue of 1.05) by Internet Sales Amount, which results in the top 164 Customers being returned. Slicing by the other members on Parameter Values will give me the top 10%, 15% and 20% of Customers by Internet Sales Amount. 9/10/2009 Now() and the Formula CacheYou know, I have the nagging feeling I’ve already blogged about this before… but I can’t find the post (I have been at this for over four years now) so it’s worth mentioning again… Anyway, a common question asked on the MSDN Forums is how to get the current date and then use it in an MDX calculation – see this thread for example. Usually this is because users want calculations that show the current day’s sales, or something similar. One answer is of course to use the Now() function, but what people don’t often realise is how this can impact the ability of Analysis Services to cache the values returned by calculated members, and therefore reduce overall query performance. To understand why, let’s look at some examples in Adventure Works. First of all create a calculated measure on the AW cube as follows: CREATE MEMBER CURRENTCUBE.[MEASURES].[NOWDEMO] AS NOW(); Then run the following query a few times: SELECT {[Measures].[NOWDEMO]} ON 0 As you’d expect, every time you run this query you see the current date and time – and every time you run it, you see a different value. But, you may be thinking, doesn’t Analysis Services cache the results returned by calculated members? Well, yes it does in most cases, but for non-deterministic functions (functions that could return a different result every time they’re called) like Now() no caching takes place, because otherwise the value returned from the cache might be different from the one the function actually returns. The next problem is that if you create any other calculated members that depend directly or indirectly on the value returned by a non-deterministic function, their values can’t be cached either. One problem I see sometimes in my consultancy work is poor query performance resulting from SSAS being unable to use the formula cache, because a large number of calculations have a dependency on a single calculation that uses the Now() function. Here’s a greatly simplified example of two calculated members, the first of which finds the current year and the second which returns a sales value for the year eight years before the current year: CREATE MEMBER CURRENTCUBE.[MEASURES].[Current Year] CREATE MEMBER CURRENTCUBE.[MEASURES].[SHOW SALES FOR A YEAR] If you run a query that references the second calculated measure on a cold cache, such as: SELECT [MEASURES].[SHOW SALES FOR A YEAR] ON 0 The first time you run it you’ll see SSAS going to disk as you’d expect; the second time you run it though you’ll see SSAS is able to use the Storage Engine cache but not the Formula Engine cache, as this Profiler trace shows: The highlighted Get Data From Cache event shows data being retrieved from the measure group cache. This is ok and can be beneficial for query performance, but if the calculation we’re doing is very expensive then it can still mean our query takes a long time to run. We’ll only get an instant response on a warm cache if we can work out how to use the formula cache somehow. Luckily, in most cases where Now() is used, we don’t usually want the system date and time, we just want the date. That means that we only want to return a different value when the date changes, once every 24 hours. What we can do therefore is use a named set to somehow store the value returned by Now(), for example like this rewrite of the calculation above: CREATE MEMBER CURRENTCUBE.[MEASURES].[Current Year] CREATE SET CURRENTCUBE.MYYEAR AS CREATE MEMBER CURRENTCUBE.[MEASURES].[SHOW SALES FOR A YEAR - CACHEABLE] What we’re doing here is finding the year we want, then finding the member for the year 8 years ago and storing that in a named set. Because a static named set is only evaluated once, when we do our final calculation we can reference the single member stored in the named set and therefore make use of the formula cache as the following Profiler trace shows: We’re now getting data from the flat cache, which is one part of the formula cache (which isn’t ideal either as it indicates the calculation is being evaluated in cell-by-cell mode, I guess because we’re referencing a named set inside it) and so warm-cache performance will be better. The next problem is that when the date does change, we need to clear the cache. This can be accomplished easily by running an XMLA ClearCache command, perhaps direct from SQL Server agent, every night at midnight or whenever necessary. To be honest, though, I’m not sure using the Now() function at all is a good thing – apart from the issues described here there are a lot of other risks involved, such as the time or date on your server being wrong or confusions with time zones and date formats. I think a better approach to the problem is to have an extra attribute in your Time dimension which flags up a date as ‘today’, and which changes every day. Of course this means you need to do some extra ETL and processing on your Time dimension as a result, but I think it’s a much cleaner solution than Now() and leads to much more efficient MDX calculations. 8/8/2009 Sets in the Where Clause and AutoexistsI don’t usually blog about bugs, but there are some cases where the dividing line between what’s a bug and what is ‘by design’ is unclear – and in these cases, a warning to the user community is always helpful. This is one of those cases… The other day I was talking to Peder Ekstrand of DSPanel and he showed me a pair of queries running on SSAS 2008 that had him confused and to be honest, to me looked clearly buggy. I managed to repro the behaviour on Adventure Works on 2008 (friends reproed it on 2005 too) and here are my queries. The first one returns a single cell containing the value $14,477.34, the value of Internet Sales on July 1st 2001, as you’d expect: select [Measures].[Internet Sales Amount] on 0, The second returns the value $29,358,677.22, the value of Internet Sales Amount across all time periods: select [Measures].[Internet Sales Amount] on 0, …which, incidentally, is the same value you’d get if you removed the Where clause completely. To me this second result makes no sense whatsoever and is extremely confusing. The only difference between the two queries is that in the first the Where clause contains a single member whereas in the second that member is enclosed in braces, meaning it is now a set containing a single member. The second important thing to point out is that we have members from different hierarchies on the Date dimension on Rows and in the Where clause, meaning that auto-exists is coming into play (see this section on BOL for an explanation of what auto-exists is). The third thing to note is that it only happens in some cases. So for example when you run the following query which has a member from the Calendar Year hierarchy rather than the Date hierarchy in the set: select [Measures].[Internet Sales Amount] on 0, …you get the value you’d expect, ie the Internet Sales Amount for the Calendar Year 2001. This query, with a Fiscal Year on Rows, also returns the ‘correct’ result, $14,477.34: select [Measures].[Internet Sales Amount] on 0, Most interestingly, where the members on Rows and in the Where clause from the second query above are swapped, also returns the ‘correct’ result: select [Measures].[Internet Sales Amount] on 0, What’s going on here? Clearly something to do with sets in the Where clause, auto-exists and probably attribute relationships, and something that could easily cause a lot of confusion for users and cube developers alike. I’ve been told that the current behaviour is ‘by design’ but the dev team are aware it’s less than ideal; it’s something to do with maintaining consistency with what happens when there are sets in the Where clause in some scenarios. But for this query: select [Measures].[Internet Sales Amount] on 0, …I would expect to see the aggregate of Internet Sales Amount for July 1st and July 2nd 2001. Anyway, here’s the Connect I opened about this: The more votes it gets, the more likely it’ll get fixed! 6/26/2009 SAP, MDX and the right language for BIVia Amyn Rajan’s blog I’ve learnt about a forthcoming MDX book, “MDX Reporting and Analytics with SAP”, and also seen some encouraging news on the SAP Business Objects Roadmap where MDX will be a core element. As an MDX guy – and more specifically as someone who make his living from MDX – I’m pleased to see this level of support for MDX from someone other than Microsoft and Mondrian. Even if I never do any consultancy work on SAP MDX (although I’d be interested since I hear rates are much higher than for Microsoft work!) I have a vested interest in the popularity of the MDX language in general. Yes, it’s difficult to learn and sometimes inconsistent, but I am 100% convinced it’s the best language for multidimensional queries and calculations and much better than SQL. To a certain extent we’re all guilty of promoting the technology we know and love as the way of solving all problems; and whenever we come across a problem our technology can’t solve easily, the temptation is to extend it and improve it so that it can solve the problem. However in the case of SQL and BI there comes a point where no amount of extensions and improvements can make it do what you want it to do efficiently and easily. To paraphrase Dr Johnson, a complex BI-type calculation done in SQL is like a dog walking on its hind legs. It is not done well, but you are surprised to find it done at all. There are plenty of vendors and bloggers who would disagree with me on this though, I know. One good example of where MDX scores over SQL is that it inherently knows about order on hierarchies: for example it knows (because you told it when you built your dimension) that Monday comes after Sunday, that 2009 comes after 2008, that June comes after May. This makes doing time-based calculations very easy, and time based calculations are part of just about every BI solution. The ability to order a set and then work with it is also important for other types of problem, such as the grouped string concatenation puzzle that Adam Machanic blogged about recently. In MDX, to solve this problem and get an ordered list of product names, you just need to order a set and then use the Generate function, for example: Generate( But I’m getting carried away again. Arguing the merits of a language is all very well, but it’s commercial support that is what actually matters of course. And to get back to my original point that’s why I’m pleased to see that SAP seems to agree with me that MDX is a good thing. 6/19/2009 Error messages in MDX SELECT statements and what they meanAnyone that has tried to learn MDX will know that, when you make a mistake somewhere in your code, the error messages that Analysis Services gives you are pretty unhelpful. It was suggested to me recently while I was teaching an MDX course that I should blog about common error messages and what they actually mean; so here’s a list of a few example queries using Adventure Works that return confusing errors, the error messages themselves, and details on how to solve the problems. I’ve deliberately concentrated on query-related errors rather than calculation-related errors (that can be a future blog post); if you can think of any more errors that I should cover please leave a comment. 1) Query causing error: SELECT Error message: Query (3, 1) Parser: The syntax for '[Date]' is incorrect. The first step to solving this fairly simple syntax error is understanding the values in brackets in the error message. (3,1) indicates that the error is at character 1 on the third line of the query, where we have the expression [Date].[Calendar Year].MEMBERS; we should also see a red squiggly underneath this text in SQL Management Studio. There’s nothing wrong with this expression though, apart from the fact that it’s in the wrong place: what has happened is that we’ve forgotten to include a comma after COLUMNS immediately beforehand. If we put one in, the query runs. Solution: SELECT
2) Query causing error: SELECT Error message: Query (3, 1) The CHILDREN function expects a member expression for the 1 argument. A tuple set expression was used. This is a very common error that people encounter while learning MDX, and it all comes down to understanding the difference between sets, tuples and members. In a lot of situations Analysis Services is very forgiving: if it expects a set and you give it a single member, then it will cast that member into a set with one item in it for example. It can’t do this for you all the time, though, and you do need to understand what kind of object each function returns and/or expects for a parameter. In this case, the problem is that the .CHILDREN function needs to be passed a member and the .MEMBERS function returns a set (strictly speaking, as the error says, it’s a set of tuples); therefore we can’t use the two functions together. If we want to find all of the children of all years, we can use the DESCENDANTS function instead, which can accept a set as its first parameter. Solution: SELECT
3) Query causing error: SELECT Error message: Parser: The statement dialect could not be resolved due to ambiguity. Analysis Services supports no less than three query languages: MDX, DMX and a very limited subset of SQL. As a result, when you run a query it needs to work out what query language you’re using and can easily get confused if you make a mistake. In the query above we’ve given a list of the two measures we want to see on the columns axis, but we’ve forgotten to surround this list in braces to turn it into a set – and it’s a set that is required for the axis definition. This is an error that is commonly made by people with a background in SQL, and indeed the problem here is that the error has made the query look a bit too much like SQL or DMX. Putting in braces where they’re needed fixes the problem and removes the ambiguity. Solution: SELECT 4) Query causing error: SELECT Error message: Query (2, 2) The member '[Internet Sales Amount1]' was not found in the cube when the string, [Measures].[Internet Sales Amount1], was parsed. A fairly straightforward error this: we’ve tried to reference a member that doesn’t exist in our query - it’s the extra 1 on the end of the name that’s the problem. The way to avoid this is to always let Analysis Services generate unique names for you, and you can do this by dragging the member (or any other object) from the metadata pane in SQL Management Studio into the MDX query pane when you’re writing queries. Here, using the correct member unique name solves the problem. Solution: SELECT Note that for dimensions other than the Measures dimension, what happens in this scenario depends on how you’ve set the MDXMissingMemberMode property. By default if you write something that looks like it could be an MDX unique name, but which isn’t actually the unique name of a member on a hierarchy, Analysis Services will simply ignore it. So the following query returns nothing on rows because the year 2909 doesn’t exist in our Calendar hierarchy: SELECT And worse, the in this query a genuine syntax error is completely ignored too: SELECT 5/27/2009 BeginRange and EndRange connection string propertiesUsing the Timeout connection string property is a good way of making sure that your queries don’t run for too long, but sometimes – for example when you’re using SSRS – you want to restrict the amount of data that a query returns. You can’t properly do this with Analysis Services, but it is almost possible… Consider the following query on Adventure Works: SELECT It returns 1189 rows and 3 columns. If you click on any of the cells containing data in SQL Management Studio, to see the cell properties, you’ll see that the CellOrdinal property contains the index of each cell in the cellset. So the top left hand cell is ordinal 0, the one to its right is 1, and so on until the last column where it starts again one row down: Using the BeginRange and EndRange connection string properties, you can limit the cells in a cellset that actually get populated with data. Note that you can’t restrict the overall number of cells though, which would be more useful. Both these properties take an integer value which represents a cell ordinal: BeginRange is the first cell ordinal you want to contain data, EndRange is the last cell ordinal. Their default value is –1, which for BeginRange means start at the first cell ordinal and for EndRange means end at the last cell ordinal. So, for example, with BeginRange=4 and EndRange=7, running the query above would give the following output: As I said, the overall number of cells in the cellset remains the same, but only the cells in the range we specified actually contain data. This ‘filtering’ happens after the query axes have been resolved, as far as I can see, so adding NON EMPTY on Rows for example does not filter out any of the empty rows. If you were using SSRS, however, you could do this filtering at the DataSet level. If you look in Profiler you’ll see that these properties have an affect on the amount of work SSAS does at query time. On a cold cache, with no BeginRange and EndRange set, the query scans all of the year partitions in the Internet Sales measure group as you would expect. But with BeginRange and EndRange set as above, on a cold cache SSAS only reads data from the 2001 partition. BTW, remember that if you’re experimenting with these connection string properties in SQLMS, when you’re finished you’ll need to either close and reopen SQLMS or set BeginRange=-1 and EndRange=-1 as a result of this bug (which still doesn’t seem to be fixed in SP1). 5/20/2009 Joining the results of two MDX queries togetherOne question I get asked occasionally is whether it’s possible to join the results of two MDX queries together. Although I seem to remember this kind of functionality is mentioned in the OLEDB for OLAP spec it certainly isn’t supported in Analysis Services MDX and I don’t expect it ever will be; therefore, as all good consultants know, when you’re faced with a request for functionality that doesn’t exist what you have to do is look closely at the requirement to see if there’s a different way of solving the problem to get the result the customer wants… What people usually want to do when they think about joining MDX queries is this: they want to create a query that shows members from two different hierarchies side-by-side on the same axis. For example, in Adventure Works you might want to see a query with Calendar Years on Rows and Countries followed by Product Categories on Columns, something like this:
It’s clear we can get the results we need by running two different queries, as follows: SELECT {[Customer].[Country].&[Australia],[Customer].[Country].&[Canada]} ON 0, SELECT {[Product].[Category].&[1],[Product].[Category].&[3] } ON 0, Depending on the tool we’re using, we could try to put the results next to each other to make them more easily comparable. What we can’t of course do is something like the following query: SELECT {[Customer].[Country].&[Australia],[Customer].[Country].&[Canada],[Product].[Category].&[1],[Product].[Category].&[3]} ON 0, This will result in the following error message: What can we do to make the query work? Well, there is a simple MDX solution: create a set of tuples containing Countries and Product Categories: SELECT What I’ve done here is created a set using two Crossjoins. The first returns a set containing the Countries we want crossjoined with the All Member from Product Categories; the second returns a set containing the All Member from Countries crossjoined with the Product Categories we’re interested in; we can then union them together and use them on the same axis because the tuples in the set have the same dimensionality, ie (Country, Product Category). Here’s what you get back: It’s not quite what we wanted, but it’s all the data we need in a single query and we can probably get the user to ignore the All Members, or possibly hide them in the client tool somehow. The only problem with this approach is that it becomes unwieldy the greater the number of different hierarchies we want to display on columns. If we’re using SSRS 2008 to display the results of our query, there’s another possible approach: we can use the new Tablix control to create the style of layout we’re after instead quite easily. You need to start by using the query designer and paste in a version of the query above with Years, Countries and Product Categories on Rows and Internet Sales Amount on columns: SELECT You then create a new matrix, drop Calendar Year onto the row group, Internet Sales Amount into the Data area, Country onto column group, then right click on the rightmost column and select Add Group->Column Group->Adjacent Right, to create a new column group, set it to group by Product Categories and again drop Internet Sales Amount into the data area: Then, for each Column Group you need to make sure that you don’t see aggregated values for the All Members (which of course in SSRS are returned not with the All Member’s name, but with blank names); You do this by setting a filter on each group property, using an expression like: Anyway, you then get an output like this, which is what we wanted: Here’s one last impractical but fun way to solve the problem. While playing around with DMX recently it occurred to me that the SHAPE statement could also be useful in solving this problem, and a lot of help on the syntax from my friend and DMX (as well as SSIS) guru Mr Allan Mitchell, I came up with the following: SELECT FLATTENED t.* To get this to work you just need to have a mining model in your SSAS database so you can put it in the FROM clause; it doesn’t matter what it is because it’s going to be ignored. I was able to join the queries on the MEMBER_CAPTION field from [Date].[Calendar Year], which contained the names of the Years on rows, although it was a struggle to work out how and where to add all the extra opening and closing square brackets that are needed in the RELATE clause! Notice, though, that we can just paste the MDX queries we need in there – usually SHAPE is used with OPENQUERY, but of course even though this is a DMX query we’re staying within the same SSAS database to get the data so that’s not necessary. Not the most elegant solution, of course, but interesting nonetheless. 3/31/2009 EXISTING AND NONEMPTYHere’s an interesting MDX nugget. The other day I was working on something similar to the following query, that finds the count of the number of customers who bought something in the current month as well as the previous month: WITH On my laptop, this runs in almost 16 seconds – not good. What I was doing here was making the assumption that it was a good idea to do the EXISTING first, to filter the set of customers down as much as possible, then do the two NONEMPTYs. In fact, what I found was that by moving the EXISTING to after the NONEMPTYs had been evaluated the query ran much faster: WITH On my laptop now, the above query executes in 3 seconds. Why is this? To be honest, I’m not completely sure… looking in Profiler and Perfmon I see differences, but not many; these things are better left to the likes of Mosha who know what’s going on inside in the formula engine. But anyway I thought I’d post this up because it’s a common form of calculation – so if you are using NonEmpty and Existing together, do the NonEmpty first and then the Existing. 3/26/2009 Aleri Live OLAPThere’s no real reason for this post except to indulge my trainspotterish habit of making a note whenever I find another OLAP product that supports MDX. From a recent post on Curt Monash’s blog I came across Aleri Live OLAP (see also here), an MDX-capable OLAP server that works on streaming data coming from their complex event processing platform. 2/11/2009 More on Oracle 11g and MDXFollowing on from reports last year that Simba Technologies had built a 2005-flavour OLEDB for OLAP provider for Oracle's OLAP option, here are some more details: 2/4/2009 Arbitrary-shaped sets and the Storage Engine cacheHere's a companion-piece to my post last week about query context and the formula engine cache - another scenario where you can easily stop caching taking place without knowing it, which has already been documented (although there is at least one important point to add) but again needs more visibility. This time the problem is that when you write an MDX query with an arbitrary-shaped set in the Where clause it stops Analysis Services using the storage engine cache. Queries that suffer from this will always read from disk and always perform as well or as badly as they did the first time they were run - so if cold cache performance is a problem for you, then this is an issue you need to understand and avoid. Rather than repeat the information, let me direct you to the blog entry where I first found out about this problem, on Thomas Keyser's blog from 2006: I can confirm that everything he says is still relevant on SSAS2008 except for the last query, where he has the whole of the Product.[Product Categories] hierarchy in the Where clause - run it twice and the second time you run it you'll see it does hit the storage engine cache. One other point I picked up on Mosha's MDX seminar in November is that it is possible for Analysis Services to think a set is arbitrary-shaped when it really isn't. Take the following query: SELECT [Measures].[Internet Sales Amount] ON 0, This does not have an arbitrary-shaped set in the Where clause, and as a result the second time you run it you'll see it hit the storage engine cache. However, if you rewrite the query so you have a set of tuples in the Where clause as follows: SELECT [Measures].[Internet Sales Amount] ON 0, Even though you might think this query is equivalent to the first one, you'll see that it does not use the storage engine cache. What can we do about this then? Not a lot with most client tools; I've not checked, but I'd be surprised if any of them generated their MDX to avoid this situation. If your users frequently use certain arbitrary-shaped sets the only thing you could maybe do is hack your dimension data to make them non-arbitrary - but that would almost certainly end up being a bad compromise; otherwise you'd just have to build aggregations to make cold cache queries fast. However, if you're using SSRS then of course you can rewrite the MDX yourself. Let's build a quick report on AdventureWorks that displays this problem: As you can see, I've got a multiselect parameter on the slicer that has a default selection of members from two different levels from [Product].[Product Categories] - an arbitrary shaped set. Here's the MDX that gets generated: SELECT And here's how I would rewrite it: SELECT What I've done here is:
1/30/2009 Formula Caching and Query ScopeThe Analysis Services formula engine's ability to cache the results of calculated members can be crucial to the overall performance of your cube, but it's all too easy to turn off this caching. I've blogged already about the impact that subselects have on this and today I thought I'd discuss how you need to be careful using different calculation contexts. This is a topic that's covered in the Analysis Services 2008 Performance Guide, but I thought it would be worth talking about here because it does deserve a lot more visibility as an important cause of poor query performance, especially in SSRS reports with hand-coded MDX. As you know, you can define calculated members in three different places in Analysis Services: on the cube (the global context), within the session (the session context) and in the WITH clause of a query (the query context). The important paragraph from the Performance Guide is this one: If the query processor encounters calculations created at query time, it always uses the query context, even if a query also references calculations from the global context (there is an exception to this – queries with query calculated members of the form Aggregate(<set>) do share the session cache). If there are no query calculations, but there are session calculations, the query processor uses the session cache. The query processor selects the cache based on the presence of any calculation in the scope. This behavior is especially relevant to users with MDX-generating front-end tools. If the front-end tool creates any session calculations or query calculations, the global cache is not used, even if you do not specifically use the session or query calculations. What does this mean in practical terms though? Consider the following query on Adventure Works that uses a calculated measure defined on the cube, Internet Ratio to Parent Product: SELECT [Measures].[Internet Ratio to Parent Product] ON 0, Clear the cache and run the query and you'll see the normal stuff taking place if you run a Profiler trace. Then, if you rerun the query on a warm cache, you should see something like this in Profiler (if you are looking at the Get Data From Cache event): Clearly the result of the calculation has been cached, and the second time you run the query you're retrieving values from the formula engine cache. Now, consider the following query: WITH MEMBER MEASURES.TEST AS 1 We're now creating a calculated measure in the WITH clause but not using it in the query. What happens when we run this query on a warm cache? We're still hitting cache, but it's the storage engine cache and not the formula engine cache; the calculations are being re-evaluated for the query. So, even the presence of a calculated measure in the WITH clause, even though we're not actually using it, prevents the use of the formula engine cache and if we've got complex calculations used in our query this could make our overall query performance significantly worse. Note that the presence of a named set in the WITH clause does not have the same effect, so the following query can use the formula cache: WITH SET TEST AS {} So, some recommendations that follow on from this:
11/27/2008 NonEmpty() and that all-important second parameterHere's a question which comes up all the time - it was asked at Mosha's MDX seminar last week, and a friend of mine asked me about it recently too - what does the NonEmpty function do if you don't specify the second parameter? Let's take a look at some example queries. I think everyone knows that you can use NON EMPTY before an axis definition to remove all the empty tuples on that axis, as with: SELECT [Measures].[Internet Sales Amount] ON 0, The problem comes when people assume that you can use the NonEmpty() function in the following way to get the same result: SELECT [Measures].[Internet Sales Amount] ON 0, In a lot of cases you might not see any obvious differences between what the two uses return, but if you run the query above you can see a lot of empty rows returned so they clearly aren't the same. So what's happening? If you clear the cache, rerun this second query and then run a Profiler trace you can get a hint: Why are the Reseller Sales measure group partitions being hit? Because the Reseller Sales Amount measure is the default measure on the Adventure Works cube, and since we didn't specify a measure in the second parameter for NonEmpty() it's using the default measure to decide which dates have values or not. To fix this we can explicitly tell AS which measure to use: SELECT [Measures].[Internet Sales Amount] ON 0, The moral here is always, always, always specify a measure in the second parameter for NonEmpty() whenever you use it. If you don't you may get unexpected results back and you may also get poor performance, for example if the default measure comes from a very large measure group. Oh, and as a bonus tip, don't ever use NonEmptyCrossjoin() with AS2005 or later. It's difficult to use and frankly unpredictable in what it does sometimes; you can always do whatever you want with NonEmpty or Exists (when specifying a measure group in the third parameter) much more reliably and just as fast. For more information on this topic, have a look at this old-but-good blog post from Mosha: 11/21/2008 Oracle OLAP support for MDXIs that a pig I see flying past? It seems that the folks at Simba have come up with an OLEDB for OLAP provider for Oracle's OLAP option, so you'll be able to query Oracle OLAP direct from Excel 2007. More details here:
More evidence that it's the lure of direct Excel connectivity that's the driver for MDX adoption. This is a big win for MDX as a query language...
10/28/2008 Reporting Services-generated MDX, Subselects and Formula CachingAnalysis Services can, of course, cache the results of MDX calculations and share that cache with other users most of the time... but when it can't the result is much unnecessary load on your server as calculations are recomputed again and again, and much longer query times. Unfortunately the details on what caching AS can do when are not properly documented anywhere and I've only learned the bits and pieces that I know about this subject from my own experience and from talking to the likes of Mosha; I've avoided blogging about it because my knowledge is incomplete and this is a complex topic. Happily the upcoming AS2008 Performance Guide should rectify this lack of proper documentation, and I believe Mosha is also going to address this subject in his pre-conf seminar at PASS. However, there is one specific scenario that I thought I'd write about because I come across it regularly and it affects anyone using a client that generates MDX with subselects in, such as the Reporting Services query builder. Let's say you've got an expensive calculated member on your cube - for example, add the following (deliberately slow) calculated measure to the Adventure Works cube: CREATE MEMBER CURRENTCUBE.MEASURES.EXPENSIVECALC AS In Reporting Services you might then use the query builder to create a query that looked like this to select just Calendar Year 2004 on rows and this new calculated measure on columns: SELECT On a cold cache, on my laptop, this query takes 33 seconds; on a warm cache it still takes 33 seconds. This is not good - practically all of the query time is taken up evaluating the calculation, and this calculation is clearly not being cached. The reason why is because the query contains a subselect, and when a query contains a subselect it means that the results of any calculations can only be cached for the lifetime of the query. You can easily rewrite the query as follows to avoid the subselect: SELECT On a cold cache this query still takes 33 seconds, but on a warm cache it comes back in 1 second because the calculation can now be cached and this cache can be reused for subsequent queries. Clearly this is much better. I wonder how many SSRS implementations are suffering from needlessly poor performance just because of this? Subselects are a useful addition to MDX and make calculating 'visual totals' type values very easy, but they should be used with care and not just as a means of selecting individual members. 10/20/2008 Calculated members are better than assignments to real members (at least sometimes)Earlier this year I blogged about how, in my experience, using real members and overwriting them with MDX Script assignments sometimes seemed to perform worse than using calculated members. See here for the full posting: Recently I came across this problem again but was able to get together a proper repro and open a support case with Microsoft. What was happening was that I had a time utility dimension with three members - let's call them member A, member B and member C. A was the default member and had no calculation, it just showed the real values from the rest of the cube; B was a straightforward YTD calculation; C was a really nasty, untuned, same period previous year calculation with lots of special logic. I had a query that included A and B but not C and which was performing really badly, and I found that when I commented out the calculation for C the query returned instantly. So I guessed that for some reason the calculation for C was being evaluated when the query ran, even though C wasn't actually requested in the query and this was confirmed by Microsoft. Unfortunately I was also told this behaviour was 'by design', something to do with prefetching and sonar. Thankfully it doesn't happen for all queries or all cubes (I tried and failed to repro it on Adventure Works) but in my opinion there should never be a situation where the formula engine evaluates calculations that aren't needed - precisely because in some cases it will mean the query runs much longer than it ever should. Probably the worst thing about all this is that you have no idea when exactly it's happening with the tools we've got available at the moment. I suspect that many people out there have poorly-performing cubes because of this issue... One of the workarounds was to use calculated members instead of real members, as I recommended in my original post, but as I also said in that post there are some cases where calculated members are not a good alternative to real members. In my particular case I was able to tune the calculation associated with member C so either the problem didn't happen or C evaluated so quickly it didn't matter - I'm not sure which. Anyway, I stand by what I said in my original posting: use calculated members rather than real members overwritten by Script assignments wherever you can! 10/14/2008 PALO supports MDX and XMLAThanks to Marco Groeneveld for sending me this link:
The open source OLAP server PALO now supports XMLA and MDX. Another victory for MDX! 9/16/2008 Displaying totals at the bottomFor someone like me without a financial background, the default behaviour of AS when displaying a hierarchy seems natural: when you ask for a set like MyHierarchy.Members, you'd get all the parent members appearing in the list before their child members. But for financial reports you want the opposite, you want to see all child members displayed before the parent members. The other day I was asked how to do this in turn and asked Andrew Wiles (who has a lot more financial OLAP experience than me), and he pointed me to the simple answer - the HIERARCHIZE function. Here's an example from Adventure Works: SELECT [Measures].[Amount] ON 0, Run the query and you'll see the following returned, with the parent member Statistical Accounts displayed above its children: However, if you want to display it in the financial style, all you need to do is wrap your set with HIERARCHIZE( <<set>>, POST) as follows: SELECT [Measures].[Amount] ON 0, I have to admit, I'd always wondered what the point of Hierarchize() was - this is the first time I've actually needed to use it! 7/30/2008 Many-to-Many Dimension bugJon Axon mailed me recently with an interesting bug he'd come across concerning many-to-many dimensions, where it looks like the AS engine is trying to be a bit too clever for its own good as far as query optimisation goes. It's reproducible on AS2005 SP2 and Katmai RC0, and I logged it on Connect here: https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=357469 Here's an example on Adventure Works that Jon gave me. The following query: SELECT Shows the total number of customers as being 18484. Looking at the Sales Reason dimension (which has a many-to-many relationship with the measure group Customer Count is from), if you run a relational query on the underlying data source you can see that not every order was associated with a sales reason - this is the key point here. Now if we run the query: SELECT We can see that it returns the same result as the first query, 18484, as you would expect. But what if we want to find the number of customers who specified a sales reason? You would think that the following query would do that: SELECT But it doesn't, it returns 18,484 again. However if you run this query which should be equivalent to the previous query: SELECT It returns the value we're looking for, 17022. Not good! What I suspect is happening here is that when the query optimiser sees the expression [Sales Reason].[Sales Reason].[All Sales Reasons].Children in the Where clause, it assumes that it's equivalent to [Sales Reason].[Sales Reason].[All Sales Reasons] which it is in most cases, just not here. Interestingly if you look in Profiler and run the queries on a cold cache, you see exactly the same activity when [Sales Reason].[Sales Reason].[All Sales Reasons].Children is in the Where clause as when [Sales Reason].[Sales Reason].[All Sales Reasons] is there; but when you explicitly list all of the individual sales reasons, as in the last query, because Sales Reason has to resolve itself through the Internet Sales Order Details dimension which is ROLAP, you see SQL queries being fired off against the relational database. 7/15/2008 Named Sets, AutoExists and KatmaiA couple of months ago Radim Hampel pointed out to me some very weird stuff happening with named sets and the Where clause. Since it turned out that Darren had run into the same issue and also been thrown by it, and since I tested it out on Katmai CTP6 and could see that it was behaving differently from AS2005, I opened an item on Connect: And now, after a long and detailed email thread involving Mosha, Edward Melomed, Marius Dumitru, Darren and Deepak we've got to the stage where I understand what's going on, Katmai RC0 does roughly what I want, and I can blog about it! Let me explain what I saw first. What would you expect the calculated member in following query to return? --Query 1 Just from looking at the code I would have set the calculated member should return the string representation of the set from January 2004 to June 2004. But if you run the query you will in fact see that it returns the set containing the member July 2004 on both AS2005 and Katmai. To me that made absolutely no sense... Now, take a look at this query: --Query 2 On AS2005 the calculation returns June as I would expect, on Katmai it returns an empty set. Now run this query: --Query 3 and this query: --Query 4 ...which to me should do the same thing. On AS2005 query 3 returns the set May and June but Katmai returns an empty set; query 4 returns an empty set on both AS2005 and Katmai. At this point I could see that something funny was happening that I didn't like! What are the practical implications of this? Take the following query from Mosha's blog entry on ranking: WITH Run it and you'll see that the Employee with the key 46, A Scott Wright, has a rank of 18. Now let's slice by this Employee: WITH Run this and you'll see that A Scott Wright has now supposedly got a rank of 1. Whatever the logic behind this, it doesn't make sense from an end user perspective does it? So how can we explain what's happening here? It's all to do with autoexists: in some cases it makes sense to apply autoexists to named sets, but in others (mostly when the set is intended for use in a calculation) then it doesn't. Let's forget about trying to understand what AS2005 does because it tries to guess when it should apply autoexists and gets very confused, but Katmai is mostly consistent and logical: by default it applies autoexists to all named sets. That explains why queries 2,3 and 4 all return empty sets on Katmai: May and June don't exist with July. After my initial item on Connect was opened those nice people in Redmond (who agreed with me that the way things were working wasn't ideal) added a new connection string property, Autoexists, which can have the following values: 0 – default (same as 1) Here's the explanation I got from Marius about what's meant by 'deep' and 'shallow' autoexists: Suppose a query axis or named set involves a set expression of the form F(G(s)), with F and G being set functions (e.g. TopCount, Tail etc.) So the behaviour I describe above for RC0 is also what you get when you put Autoexists=1 in the connection string. But what about Query 1 - why does the PeriodsToDate function return July with this setting? Hmm, well I think this is a bug and it should return an empty set. I opened another Connect about this: What happens with Autoexists=2 and Autoexists=3 then? In both cases, Query 1 returns the set of months from January to June; Query 2 returns June, Query 3 and Query 4 both return May and June. To see the difference between these two settings take a look at this query: WITH With Autoexists=1 and Autoexists=3 the set here is empty; with Autoexists=2 then it contains May and June. |
|
|