Chris's profileChris Webb's BI BlogBlogLists Tools Help

Blog


    11/11/2009

    Ragged Hierarchies, HideMemberIf and MDX Compatibility

    Here’s something that I suspect a few people out there will be surprised I didn’t know – but as far as I’m concerned, if I didn’t know it then it’s worth blogging about.

    Anyway, it regards ragged hierarchies, the HideMemberIf property and the MDX Compatibility connection string property. Now you probably know that if you want to turn a user hierarchy into a ragged hierarchy (perhaps to avoid using a parent child hierarchy) you need to use the HideMemberIf property on the user hierarchy. For example, imagine you were using the following SQL query as the basis of your customer dimension:

    SELECT        'UK' AS Country, 'Bucks' AS State, 'Amersham' AS City, 'Chris Webb' AS Customer
    UNION ALL
    SELECT        'Italy' AS Country, 'BI' AS State, 'Biella' AS City, 'Alberto Ferrari' AS customer
    UNION ALL
    SELECT        'Vatican' AS Country, 'Vatican' AS State, 'Vatican' AS City, 'The Pope' AS customer

    We could build a dimension off this with attributes for Country, State, City and Customer, and for two out of our three customers that would be fine. However the Pope lives in the Vatican, which is (at least for the purposes of this exercise) a Country with no concept of City or State; and in the case of customers who live in the Vatican, we just want to be able to drill down on the Country ‘Vatican’ and see all of the Customers who live there without drilling down through a meaningless State and a City.

    So what we can do is build a user hierarchy on our dimension with levels Country, State, City and Customer, and on the lower three levels set the HideMemberIf property to OnlyChildWithParentName:

    image

    Then, with any sensible client tool, we can connect to the cube and browse the dimension as we want:

    image

    I saw ‘sensible’ client tool, because of course this only works if you set:
    MDX Compatibility=2
    …in the connection string. And of course Excel 2007 hard-codes MDX Compatibility=1 in the connection string and doesn’t allow you to change it, so you can’t use ragged hierarchies properly.

    This much I knew.

    However, what I didn’t realise until last week when I was moaning about this to TK Anand from the SSAS dev team at PASS, is that for some ragged hierarchies you don’t need to set the MDX Compatibility connection string property at all.

    For example, if in our case we duplicate the Customer upwards rather than the Country downwards, like so:

    SELECT        'UK' AS Country, 'Bucks' AS State, 'Amersham' AS City, 'Chris Webb' AS Customer
    UNION ALL
    SELECT        'Italy' AS Country, 'BI' AS State, 'Biella' AS City, 'Alberto Ferrari' AS customer
    UNION ALL
    SELECT        'Vatican' AS Country, 'The Pope' AS State, 'The Pope' AS City, 'The Pope' AS customer

    …and then build the dimension, setting HideMemberIf on our user hierarchy to OnlyChildWithParentName, we can get the result we want without setting the MDX Compatibility property. Here’s a screenshot of this new dimension in Excel just to prove it:

    image

    The difference here is that we’re hiding all members below the State level right down to the bottom of the hierarchy, rather than hiding members somewhere in the middle of the hierarchy. Truly, this is one of those “Doh, if only I’d known!” moments… this at least means that in some of the scenarios where you’d use ragged hierarchies you can get them to work with Excel, even if it means that we have to hack the data (‘The Pope’ is a Customer, not a State or a City). On the other hand there are plenty of scenarios where you do need to hide members in the middle of a hierarchy, and frankly I don’t see why Excel 2007 can’t set MDX Compatibility=2 in its connection string so they work properly.

    11/5/2009

    Live Blogging @PASS - SSAS Consolidation and Virtualisation

    Here are some notes from the SQLCat team’s session on SSAS consolidation and virtualisation; they’re a bit fragmentary since I’m too busy paying attention to what’s being said! I get asked about these issues by my customers all the time.

    • Use Windows System Resource Manager to control how many resources SSAS can use. For more on WSRM see http://technet.microsoft.com/en-us/library/cc755056.aspx
    • If SSAS and SQL are on the same server, use the Shared Memory protocol to improve processing performance
    • Also since resource usage requirements for SSAS and SQL will be different when processing and when querying, if they’re on the same box you can use WSRM to dynamically change resource allocations at different times.
    • Consolidating multiple SSAS databases on the same machine, it can be good to use multiple instances (maybe one per database) on the same machine to give fine control over resource usage, service packs etc.
    • Someone asked the question of whether there is an overhead to using multiple instances each with one database rather than using a single instance with multiple databases. Answer: multiple instances would perform better but use slightly more resources; better to start with a single instance and only move to multiple instances when you have a good reason to do so.
    • Tests run comparing SSAS running on bare metal and Hyper V – no difference in performance between the two for querying, but for the Storage Engine (processing and SE activity when querying) you use 1.5 times more threads on Hyper V (can modify the default number of threads available for processing, may therefore need to change this for Hyper V).
    • Description of a custom-built system for load balancing SSAS developed by the MSSales team inside Microsoft. Code and white paper will be available in a few months.
    • There were various issues with Synchronization in SSAS that have been fixed in the late CUs.
    • IIS7 performs much better for HTTP access to SSAS – performs as well as a direct connection. I’m sure I also heard somewhere that there were some performance issues for HTTP access that were noticeable over a slow network that have also been fixed in the latest CUs.

    The most useful session so far at this conference for me – I learned a lot.

    Quest add support for SSAS monitoring

    Something I saw yesterday at PASS: Quest now have support for monitoring SSAS from their “Spotlight on SQL Server Enterprise” product. See

    http://www.quest.com/newsroom/news-releases-show.aspx?contentid=10602
    http://www.quest.com/spotlight-on-SQL-Server-enterprise/features-benefits.aspx

    It’s pretty basic at the moment – they capture some Perfmon counters and data from schema rowsets, but no trace data – and nowhere near as sophisticated as what SQLSentry have, but it’s good to see another vendor entering this market.

    10/26/2009

    Actions and Multiselect

    At the beginning of this week a customer asked me why, in a certain third-party client tool that shall remain nameless, they could no longer do a drillthrough when they did a multiselect on a filter axis. It seemed a bit weird to me, and it got weirder when I asked around for ideas and Greg Galloway pointed out that Excel 2007 didn’t show any actions at all when there was a multiselect, and Marco Russo noted that the current beta of Excel 2010 didn’t either. This made me wonder whether the problem was in fact with Analysis Services rather than the client tools…

    I didn’t actually know how a client tool worked out what actions were available when, so I did some research and found out that the MDSCHEMA_ACTIONS schema rowset was how it was done. Here’s the documentation on MSDN:
    http://msdn.microsoft.com/en-us/library/ms126032.aspx

    For example, if a client tool needs to know which actions can be called when a user clicks on a cell in a resultset, then it will execute an XMLA command something like this one on Adventure Works:

       1: <Discover xmlns="urn:schemas-microsoft-com:xml-analysis">
       2:     <RequestType>MDSCHEMA_ACTIONS</RequestType>
       3:     <Restrictions>
       4:       <RestrictionList xmlns="urn:schemas-microsoft-com:xml-analysis" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
       5:         <CUBE_NAME>Adventure Works</CUBE_NAME>
       6:         <ACTION_TYPE>401</ACTION_TYPE>
       7:         <COORDINATE>([Customer].[Country].&amp;[Australia],[Measures].[Internet Order Quantity])</COORDINATE>
       8:         <COORDINATE_TYPE>6</COORDINATE_TYPE>
       9:       </RestrictionList>
      10:     </Restrictions>
      11:   <Properties>
      12:   </Properties>
      13: </Discover>

    You can see that a tuple is being passed into the COORDINATE to indicate which cell we’re interested in. But when there’s a multiselect, which cell in the cube are we actually clicking on? Good question… Different client tools handle multiselect in different ways, and it turns out there’s no way of telling SSAS you’re doing a multiselect in this situation. If you try to pass a set of tuples to the COORDINATE you get no actions returned, for instance.

    Having talked this over with Akshai Mirchandani from the dev team, what the client tool needs to do is to make multiple calls to MDSCHEMA_ACTIONS, one for each member selected in the multiselect. It then needs to work out from each of the rowsets returned which actions should be available in the current context – and of course, in this case, there’s a good chance that different client tools will do different things (if they do anything at all). Not ideal.

    To be honest, this really needs to be something that is solved in SSAS rather than on the client and the key to solving it properly would be to have a standard way of handling and detecting multiselect in MDX. As Mosha hinted here, it’s something that’s been on the dev team’s radar for a while but it’s still not made it into the product unfortunately. In the meantime, if there are any client tool developers from the Excel team or third parties out there reading this, it would be great if you could at least do something rather than nothing here!

    9/30/2009

    Building a Better Cache-Warmer, Part 1: the Storage Engine Cache

    SSAS caching is an interesting subject: I’ve blogged numerous times about when caching can and can’t take place, and one of the most popular posts I’ve ever written was on how to build a cache-warmer in SSIS. However that last post was written a while ago, before the CREATE CACHE statement for warming the Storage Engine cache was reintroduced in SSAS 2005 SP2, and I’ve not been satisfied with it for a number of other reasons too. So as a result I’ve started to do more research into the issue and, with the help of Mosha, Akshai Mirchandani, and the chapter 29 of the indispensible “Microsoft SQL Server 2008 Analysis Services Unleashed” I’ve been collecting my thoughts on how to build a better cache warmer with a view to blogging about it again. This first post will deal with the relatively easy bit – warming the SE cache – and when I have time I’ll write about the more difficult but more useful task of warming the Formula Engine cache, and by hopefully then I’ll be in a position to create the definitive SSIS cache warmer package and be able to blog about that too.

    Before I go on I need to mention that one other reason that has made me want to revisit this subject is that, at his MDX Deep Dive session at PASS last year, Mosha asserted that cache-warming was a Bad Thing. This intrigued me, since it seemed like such an obviously Good Thing to do, but of course it turns out he had good reason for saying what he did… SSAS caching is a lot more complex than it first appears and cache warming done badly can be counter-productive.

    I’m also tempted to say that if you don’t need to warm the FE cache (ie you don’t have any complex calculations that take a long time to run) then there’s not much point in warming the SE cache. In my experience if you have the right aggregations built then you can get excellent response times even on a cold cache. Of course to build aggregations you need to know the granularities of data that your users’ queries are, but you need the same information to warm the cache. The only difference would be that you can make your cache warmer much more dynamic than your aggregation design: if your users’ query patterns change regularly it would be difficult to keep your aggregation design in synch with them, whereas with a cache-warmer you can record the most recent queries that have been run and warm the cache accordingly.

    There are a number of things to bear in mind when warming the SE cache:

    • We must not try to cache too much data. If we go over the LowMemoryLimit data will start to be evicted from the cache, which means we get no benefit from that point on; if we go over the TotalMemoryLimit it’s likely that just about the entire cache will be emptied. We also have to remember that the cache will grow as a result of normal usage during the day, so we should plan to use only a fraction of the available memory when warming the cache.
    • We want to avoid cache fragmentation. The SE cache uses a data structure called the data cache registry to store its data, and when the FE requests a (potentially filtered) subcube of data from the SE that data subsequently gets stored in the SE cache as an entry in the data cache registry. As a result it is possible that instead of having one entry in the SE cache that contains a lot of data, the same data could be spread across multiple smaller entries. This is bad because:
      • It means that lookup in the cache is slower as there are more entries to search
      • The SE doesn’t bother to continue searching in the data cache registry after it has compared the current request with more than 1000 subcubes, and will then go to disk even if the required data is in the cache. If the cache is filled with lots of small entries then this will happen more often.
      • If the FE requests the slice {A,B} from a hierarchy in a subcube and the data for those two members A and B are held in different entries in the cache, then it will not be found. It will only be matched if a single entry in the cache contains data for both A and B.
    • Data in the SE cache can be aggregated to answer a request – but only if the data itself if aggregatable, and this is not the case when there are distinct count measures in the measure group or when there are many-to-many relationships present. Assuming that the cache is aggregatable though we should perform SE cache warming at lower rather than higher granularities, although if we go too low down we’ll end up caching data we don’t need, using up too much memory and find that the cache warming itself takes too long.

    Of course all of the above is subject to change in later versions, but I understand that this is the current situation with SSAS 2008 SP1.

    How can we find out what data to use to warm the SE cache then? The MDX queries that have been run by users aren’t much help, but we can see what subcubes are being requested when a query is run by looking at the Query Subcube and Query Subcube Verbose events in Profiler; we could therefore run a Profiler trace to capture this data, but in fact the easiest option is to just use the Usage Based Optimization wizard’s Query Log since it contains the same data as Query Subcube. The following blog entry from James Snape shows how to interpret this information:
    http://www.jamessnape.me.uk/blog/2006/11/09/SubcubeQueries.aspx 
    A truly sophisticated SE cache warmer would be able to take this data, along with data about attribute relationships, and work out what the most recent, frequent (but slowest) subcube requests were, then try to find some lowest (but not too low) common granularities to use for cache warming, perhaps also slicing so that only the most recent time periods were used as well.

    Last of all, how do we actually go about loading data into the SE cache? We could run MDX queries but this is a bit of a risky approach since we may not know what calculations are present and so we can’t be sure that our query requests the right data; the best option is to use the CREATE CACHE statement that is described here:
    http://blogs.msdn.com/sqlcat/archive/2007/04/26/how-to-warm-up-the-analysis-services-data-cache-using-create-cache-statement.aspx

    9/23/2009

    Slow metadata

    There’s an interesting thread on the Analysis Services MSDN Forum that I’ve been following for a while now (it’s over 40 posts long at the time of writing) that gives a good explanation of why metadata access on very large or complex cubes can be so painfully slow. Here’s the link:

    http://social.msdn.microsoft.com/Forums/en-US/sqlanalysisservices/thread/679b510f-3d42-4eed-ba2c-df2a2bd39a00

    In summary, it turns out that the more files there are on disk in your SSAS database folder, the slower metadata reads become, and when you reach very large numbers of files almost any kind of metadata access becomes incredibly slow. There are a lot of different factors determining how many files are created for your SSAS database but large numbers of cubes, attributes, partitions and aggregations (which end up being multiplied across each partition) are the main culprits; the problem is that the way the EstimatedSize property on metadata objects has been implemented results in SSAS having to iterate over all the files in every subdirectory in the database’s directory in the data folder to find their size.

    One workaround is to set the Storage Location property of your partitions to somewhere outside the data directory since this means they won’t get scanned, but doing this makes backup and restore difficult and means you get inaccurate size estimates. The only other option is to reduce the number of partitions and aggregations you have; given that the SQLCat team are now saying that partitions of 50-60 million rows each are feasible in 2008, and that you can use different partition designs for partitions containing old and new data, there’s probably a lot of scope for doing this in most cubes. Hopefully, though, this problem will be fixed in a future release…

    9/20/2009

    adCenter and SSAS 2008 White Paper

    There’s a new white paper on how adCenter uses Analysis Services 2008 available here:
    http://technet.microsoft.com/en-us/library/ee410017.aspx

    For the most part it’s a fairly run-of-the-mill list of reasons to upgrade to 2008 (and, to be honest, if you’re running 2005 I can’t think of a good reason not to upgrade to 2008 – the performance is so much better) but there are several good nuggets on performance tuning for large SSAS implementations; for example, table 3 has a list of ini file settings for optimising processing. Worth a look.

    7/10/2009

    Analysis Services Connect digest

    It’s a fairly popular thing for SQL Server bloggers and MVPs to put together lists of items on Connect (bugs that need fixing, new bits of functionality that should be in a future version, etc) and ask readers to vote on them so they get more visibility. Jamie Thomson did a post for SSIS the other day, for instance:
    http://blogs.conchango.com/jamiethomson/archive/2009/07/05/ssis-connect-digest-2009-07-05.aspx

    While I’ve done something similar in the past, I thought it would be a good idea to put together a more comprehensive list of Connect items for SSAS in the hope that some of the more important issues will be addressed in the next major release. So, please look down the list below and vote on anything that you think is important!

    1. MDX needs a special division operator that returns null instead of infinity when dividing by zero or null. Why do we always have to trap this in code ourselves? I have never, ever wanted to return infinity from a calculation!
      https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=448127
    2. BI Development Studio is ridiculously slow doing Deployment or Saving (when in Online mode) sometimes. This isn’t a BIDS issue, of course, more of a reflection on how long the XMLA it’s generating takes to run, but still it wastes so much time…
      https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=447405
    3. It would be cool if, after every time the server crashed, SSAS could make a copy of the current Flight Recorder trace file so we can see what was happening on the server. I know PSS can get this information from the mdmp files that get generated, but sometimes I want to do some detective work of my own before opening a support call:
      https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=442606
    4. Attribute relationships serve a second purpose as member properties, but sometimes you want to display an attribute as a member property of another attribute when there isn’t a direct relationship between them. This leads people to creating either redundant attribute relationships or duplicate attributes in the dimension, both of which are equally bad:
      https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=436023
    5. We have role-playing dimensions, so why not role-playing measure groups? There are a few times when I’ve wanted to reuse the same measure group with dimensions joining onto different columns (eg when working with many-to-many relationships), and not have to create and process multiple different physical measure groups:
      https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=436081
    6. Apart from the Slowly-Changing Dimension support that we already have, it would be useful to be able to create different versions of the same dimension and allow users to choose which version they want to see. This would allow us to expose snapshots of what a dimension looked like at any given point in time:
      https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=436064
    7. Richard Tkachuk’s approach for handling ‘events in progress’ highlights the fact that SSAS isn’t very good at dealing with what is a reasonably common BI problem. Let’s have a new ‘Range’ dimension relationship type to do this out of the box:
      https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=428465
    8. Let calculated members in MDX return sets. This would allow us to optimise calculations where expensive set operations currently have to be duplicated:
      https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=426719
    9. At the moment, the IgnoreUnrelatedDimensions property controls what happens for measures when you query them with dimensions that have no relationship with the measure group, and when you query them with attributes below the granularity of the measure group. Very often, though, I do not want the same behaviour for these two scenarios, so can we have two different properties please? Or even be able to set this on a per-dimension basis?
      https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=350232
    10. At the moment, when using connection string properties in SQLMS, after you’ve set them the properties get persisted for all future connections without being visible in the connection dialog. Either don’t persist the properties or show us which ones are being set!
      https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=358956
    11. No more arbitrary-shaped set errors! When writing scoped assignments, I want to be able to scope on whatever area in the cube I want; at the moment, I often find I have to repeat the same assignment several times to get around the requirement that I can’t assign to an arbitrary-shaped set.
      https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=339861
    12. We can’t currently deny access to an entire dimension, rather than individual members on it, yet this is another common requirement:
      https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=233410
    13. It would be cool to be able to parameterise server-side MDX calculated members and named sets:
      https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=188842
    14. Here’s an old one: we really, really need to be able to rename hierarchies on each instance of a role-playing dimension. Not being able to do so seriously limits the usability of role-playing dimensions.
      https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=144500
    15. Similarly, I’d like to be able to set individual All Member captions for all of the attributes on a dimension:
      https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=144234
    16. One mistake I make all too often is change something in the cube, redeploy it, process and then at the very end of processing have it fail because of an MDX Script error. I know it’s possible to use the ScriptErrorHandling mode to do something similar, but that’s a bit dangerous to set in production. What I have is an option for processing in a dev environment that is set by default and allows you to ignore MDX Script errors if any occur.
      https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=144248
    17. Support for cross-dimensional user hierarchies would be useful for two reasons: first it would help us create (and possibly optimise) complex drillpaths; second, it would help when you have a ROLAP attribute that you want to expose only as the lowest level of a user hierarchy, to encourage people to cut down the scope of their query before they used it. Another approach to solving this second problem would be the ability to have attributes on the same dimension with different storage modes (which Greg Galloway and Harsh suggested on my older post).
      https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=473684
    18. Dimensions are getting bigger and bigger, and it would be useful to be able to partition them for manageability and performance reasons. For example, doing this would allow you to delete some members from the dimension quickly by deleting the partition; you could also speed up Process Updates by only processing the partitions where dimension members have changed.
      https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=473687
    19. I find it frustrating that dimension calculations (ie custom member formulas, unary operators etc) can’t be edited from the MDX Script. I want all my MDX calculations in one place!
      https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=473688
    20. It would be really cool if we could declare parameterised MDX Select statements on the server, and then allow client tools to expose them as pre-canned queries, similar to what can be done with stored procedures in the relational world. This would mean that as developers we could write complex queries that most end users would never be able to create themselves (even if their client tools were capable of building them). Similarly, it would be useful if we could declare custom MDX functions in the MDX Script: for example, you could declare a complex set operation as a new function, then use it in your own calculations and also expose it to users so that they could use it in their own client tools.
      https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=473694
    21. Fix parent/child hierarchies! They’re so commonly used, but there are so many ‘known issues’… I want to:
      1. Them to perform better, perhaps by building aggregations within them
      2. To be able to scope calculations on them properly
      3. Have more flexibility building them, by creating more than one per dimension and not having to build them from the key attribute
        https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=473698
    22. I’d like calculated members to do everything, and have all of the same properties, as real members. This means I want to be able to
      1. Control the order they appear on a level, and mix them in with real members
      2. Allow them to have other calculated members as children
      3. Allow them to have member properties
      4. Allow all properties (member properties as well as properties such as Name) to be dynamically generated using an MDX expression
      5. Be able to dynamically generate whole sets of calculated members with a single expression; this would, for example, be useful for doing pareto analysis or dynamically generating bandings.
      6. Be able to expose the MDX definition as a property visible in MDX
      7. Be able to secure calculated members with dimension security
        https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=473693
    23. From Jamie Thomson (from a long time ago) - automating partition generation when new data appears in the cube is a pain – this is something SSAS should be able to do for us:
      https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=127378
    24. One from Thomas Ivarsson here – it would be good to have a flag for the BottomCount function to get it to automatically remove empty tuples from the set we’re passing in:
      https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=472560
    25. From Darren Gosbell – we really need an official, built-in way of making calculations work properly when the current context is a set (for example when doing a multiselect results in a set in the Where clause):
      https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=267570
    26. Also from Darren – at present, if you rename a database in SQLMS you then can’t deploy to this database from BIDS:
      https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=281595
    27. From Marco Russo – at the moment, the DSV still has problems with handling TinyInt columns, casting them to different types in different scenarios:
      https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=126798
    28. From Marco Russo – fix drillthrough! It’s almost unusably slow at the moment on large cubes and it doesn’t work with semi-additive measures, let alone calculated measures. This is such an important bit of functionality it needs to work properly:
      https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=403083
    29. Following on from this, we need better control over the SQL that Analysis Services generates full stop. I’d like more influence over how SQL is generated for supported relational data sources and also official support (and help) for creating cartridges for new relational data sources.
      https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=473689
    30. A tool that displayed MDX query plans would make performance tuning queries much easier:
      https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=321161
    31. From David Clem – you can’t currently control the order in which calculated measures are displayed. It would be useful if there was a ‘display order’ property that allowed you to do this:
      https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=294907

    Of course, if you’ve got your own issues open on Connect that you’d like to publicise, please leave a comment below…

    7/6/2009

    Disk Partition Alignment white paper

    Via Greg Lowe, I see there’s a new white paper out on disk partition alignment for SQL Server:
    http://www.microsoft.com/downloads/details.aspx?displaylang=en&FamilyID=5b343389-f7c9-43d0-9892-ddcf55890529

    Of course its main focus is the SQL Server relational engine, but it’s all equally relevant to Analysis Services. If you’re running Analysis Services on Windows 2003 or before (or even if you’re running Windows 2008 but your partitions were created on an earlier OS) then this is a must-read.

    6/18/2009

    Microsoft .NET Provider for Oracle dead

    Products, die, come back to life, die again… Anyway, those of you building cubes from Oracle data sources may be interested to know that the Microsoft .NET provider for Oracle has been killed off. Here’s the official announcement-disguised-as-a-blog-entry:
    http://blogs.msdn.com/adonet/archive/2009/06/15/system-data-oracleclient-update.aspx

    Yet again, it’s those persuasive “customers, partners, and MVPs” who are asking for the product to be canned. I just hope that there aren’t any “customers, partners, and MVPs” who are asking for Steve Ballmer to sell Microsoft to Google for $1, because then we’ll all be in trouble... Anyway, needless to say, not everyone is pleased:
    http://www.theregister.co.uk/2009/06/18/microsoft_kills_oracle_connector/

    One blog entry I’ve had half-written for a long time is a discussion of the problems faced when building cubes from Oracle data sources (thanks for your help on this David; anyone else with any hints, please leave a comment). There are a lot of bugs and data type issues that need to be dealt with, and from what I’ve gathered using the Microsoft .NET provider was part of how they could be worked around. 

    5/4/2009

    SSIS Trace File Source Adapter now available

    Hurray! At long last, my friends at www.sqlis.com have got round to making their Trace File Source Adapter for SSIS available:
    http://www.sqlis.com/post/Trace-File-Source-Adapter.aspx

    What’s this got to do with Analysis Services? Well, as the post mentions, there are tons of cool things you can use this for when you’ve got some Analysis Services trace files: cache warming and usage monitoring among other things. It works for SQL Server relational engine traces too, if you care about that sort of thing…

    4/24/2009

    SQLSentry Performance Advisor for Analysis Services

    I’m currently in Germany at the PASS European Conference. I ran a pre-conf with Allan Mitchell two days ago on monitoring SSAS, SSRS and SSIS, an expanded version of the material I used at the PASS Summit last year; we talked about how you could build a monitoring solution for the MS BI stack yourself, but for the last few months I’ve also been talking to the guys at SQL Sentry – who are also here exhibiting – about their new product called Performance Advisor for Analysis Services which has just been made public. I’ve had an in-depth demo and my first impression is that this is the first Analysis Services monitoring solution that I would actually want to use; it looks really, really good. Here’s a screenshot:

    I believe SQL Sentry have the first solution that will monitor the SQL Server relational engine, Analysis Services, and SSIS and SSRS as well. As I’ve said before, I can’t believe it’s taken this long for a big third party vendor to notice the MS BI monitoring market! You can find out more and sign up for the beta here:
    http://www.sqlsentry.net/performance-advisor/sql-server-analysis-services.asp
    I’ll be getting hold of a beta version myself soon, and I’ll blog in more detail then.

    Of course, collecting this information is one thing – actually understanding what it tells us, and using that information effectively, is something else completely. To be honest, it’s only after the prep work I did for my precon that I’ve started to think about this problem properly. Indeed, it’s only really when this kind of monitoring data is easily accessible that best practices for what to monitor can emerge; perhaps the SQLCat team need to write a white paper on this subject? In our preconf Allan also did a few cool demos using data mining on perfmon data to predict when a server is going to ‘go pop’ (in his words), and I think there are a lot of interesting possibilities here too.

    4/20/2009

    Using MDX to browse the file system

    One very obscure feature of Analysis Services that I’ve only ever seen documented in the books “Microsoft SQL Server Analysis Services 2005” and its successor “Microsoft SQL Server 2008 Analysis Services Unleashed” (both highly recommended for advanced SSAS users, by the way) is the fact that you can use MDX to browse the file system of the server that Analysis Services is running on, with some restrictions. Full details on this can be found on P804 of the 2008 book but it neglects to give any real examples of how to do it, so since it took me a few minutes to work out the correct syntax to use I thought it was worth a blog post.

    There are four MDX extensions you can use, which can be executed from SQL Management Studio just like any other MDX statement.

    • SystemGetLogicalDrives returns a list of the drives on your AS box, with the amount of free space.
    • SystemGetSubdirs returns a list of directories under a specified path. This only actually works on directories that you have listed on the AllowedBrowsingFolders server property for Analysis Services (which you can find in SQL Management Studio by right-clicking on your instance name in the Object Explorer and clicking Properties) and their parents, so on my machine the following query returns only the Program Files directory:
      SystemGetSubdirs 'C:\'
    • SystemGetFiles returns all the files in a given directory, again only working for the directories listed in AllowedBrowsingFolders. For example on my machine:
      SystemGetFiles 'C:\Program Files\Microsoft SQL Server\MSAS10.MSSQLSERVER\OLAP\Backup\'
      returns all the files in my backup directory because that’s automatically included in AllowedBrowsingFolders, whereas
      SystemGetFiles 'C:\'
      returns an empty result set.
    • SystemGetFileExists returns whether a file exists or not in a directory you’re allowed to browse. So, for example:
      SystemGetFileExists 'C:\Program Files\Microsoft SQL Server\MSAS10.MSSQLSERVER\OLAP\Backup\MyBackupFile.abf'
      will return 1 if the file MyBackupFile.abf exists, or 0 if it doesn’t.

    Clearly this functionality is quite useful if you’re automating things like backups and you don’t have administrative rights to the server file system, although you need to be an Analysis Services administrator to do this.

    4/14/2009

    Update Isolation Level connection string property

    I recently came across a new connection string property, Update Isolation Level, mentioned briefly in BOL but with so few accompanying details that I had no idea how it could be used. Here’s what is currently documented:

    From http://technet.microsoft.com/en-us/library/ms145488.aspx :

    Important:

    When updated cells do not overlap, the Update Isolation Level connection string property can be used to enhance performance for UPDATE CUBE. For more information, see ConnectionString.

    From http://msdn.microsoft.com/en-us/library/microsoft.analysisservices.adomdclient.adomdconnection.connectionstring.aspx :

    Update Isolation Level

    None

    Specifies whether the cells in a single UPDATE CUBE statement are isolated from each other. The default is not isolated.

    Luckily Akshai Mirchandani and Marius Dumitru from the dev team were able to provide me more details on it…

    Basically, Update Isolation Level is useful when you are using writeback and you are writing values back to multiple cells within the same UPDATE CUBE statement. This point is important: you won’t see a benefit if you’re just writing back to one cell in your UPDATE CUBE statement, or if you’re issuing multiple single-cell UPDATE CUBE statements within a transaction. What it allows you to do is to say that each tuple within the SET clause in the UPDATE CUBE statement doesn’t overlap with any of the others; if that’s true, AS doesn’t need to do as much work at execution time and so the UPDATE CUBE statement runs faster.

    Let’s take a look at some examples (I’ve used a simple cube I built based on the Adventure Works database, but it’s not Adventure Works). First of all, the following UPDATE CUBE statement contains two tuple assignments in the SET clause which overlap:

    UPDATE CUBE [Sales]
    SET
    ([Measures].[Sales Amount], [Order Date].[Date].&[20010701]
    , [Product].[Product].&[1], [Customer].[Country - Customer].&[21383]) = 10,
    ([Measures].[Sales Amount]) = 10

    The first tuple assigns the value 10 to Sales Amount for a particular Date, Product and Customer; the second assigns the same value to a tuple that contains just Sales Amount, and this tuple overlaps with the previous tuple because its value is the aggregation of all Sales Amounts for all Dates, Products and Customers. However, in the following example the two tuples do not overlap because they assign to Sales Amount for different Products:

    UPDATE CUBE [Sales]
    SET
    ([Measures].[Sales Amount], [Order Date].[Date].&[20010701]
    , [Product].[Product].&[1], [Customer].[Country - Customer].&[21383]) = 2,
    ([Measures].[Sales Amount], [Order Date].[Date].&[20010701]
    , [Product].[Product].&[486], [Customer].[Country - Customer].&[21383]) = 3

    The Update Isolation Level connection string property has two possible values:

    • Update Isolation Level=1 means that each tuple update is guaranteed to be Isolated, ie the tuples don’t overlap
    • Update Isolation Level=2 means that each tuple update is not guaranteed to be Isolated

    So, in theory, for the second example above if you set Update Isolation Level=1 in the connection string, the statement should run faster. I couldn’t see any differences in my limited testing but I’m assured that in more complex scenarios there are improvements – how much will depend on the amount of overlap, the structure of the cube and so on.

    4/12/2009

    SSAS2008 and Kerberos

    Kerberos is notoriously difficult to set up – it’s something I’ve always left to someone else to do! Anyway, I’ve just come across a very detailed white paper from Boyan Penev and Mark Dasco about setting up Kerberos for Microsoft BI which I thought was worth linking to for future reference:
    http://bp-msbi.blogspot.com/2009/04/enhanced-security-and-integration-of.html

    However, I’ve heard from a number of sources (Dr John Tunnicliffe; Dan English) recently about problems with getting Kerberos to work with Windows 2008 Server or Vista, and this post from John Desch at MS confirms the bug – worse, it doesn’t seem to be fixed in SP1 (which got released this week):
    http://blogs.msdn.com/psssql/archive/2009/04/03/errors-may-occur-after-configuring-analysis-services-to-use-kerberos-authentication-on-advanced-encryption-standard-aware-operating-systems.aspx

    One to watch out for…

    3/31/2009

    SQL Solutions OLAP Heartbeat and OLAP Performance Advisor

    Hmm, you wait for years for commercial tools for monitoring Analysis Services (the only one I’d ever seen before was Companion for Analysis Services from SQLMinds) and then two come out at once. One of these tools I’ll be blogging about towards the end of this week, hopefully – I’ve had a sneak preview and it looks very cool – but today I found out the following from SQL Solutions:

    I’ll download both and give them a thorough test as soon as I can.

    3/19/2009

    Query behaviour when you cross the LowMemoryLimit

    Here’s something that had me stumped for quite a long time earlier this week…

    I’m doing some performance tuning work for a customer at the moment, and following my usual approach I took a test query, cleared the cache, ran it and noted how long it ran for and what I saw in Profiler and MDX Studio. However this time I saw some strange things happening: I’d run the same query on a cold cache more than once and the query times would differ by anything up to a minute; even more confusing, the information I got from Profiler and MDX Studio would be different too – for example, the number of Query Subcube Verbose events in recorded in Profiler for a query might differ from run to run by several hundred. This had me completely confused. There was no-one else on the server and I was running on a cold cache. What was going on?

    It turned out that I was running out of memory. I was working on a 32-bit box with a lot of databases on, so even on a cold cache the amount of memory used by SSAS was approaching the LowMemoryLimit threshold (see Marco’s post here, the AS2005 Performance Guide and this white paper for an explanation of what this is). When I ran the query – which had a lot of calculations - memory usage went above the LowMemoryLimit so the cleaner thread was waking up and evicting elements of shrinkable memory, which presumably included data in cache that had been put there by the current query. Since the exact moment that this threshold was crossed would vary from run to run, and as I assume that what the cleaner thread would evict from memory would also vary, this explained the different query times and Profiler activity. As soon as I moved onto a 64-bit box with no other databases this stopped happening.

    Of course nowadays I would expect that most people are running on 64-bit so it’s much more unlikely that you’ll ever cross the LowMemoryLimit, but I do see it happen from time to time. So if you’re in the same position and suffer from inconsistent query performance, check your memory usage!

    3/12/2009

    BI Survey 8

    As I think I've probably said before, I don't get many freebies as a blogger but one that I do get and I really appreciate is my review copy of the BI Survey. It being that time of year, I got the latest edition - the BI Survey 8 - last week along with an email encouraging me to blog about it, and who am I to refuse a request to blog about something as fascinating as this? It's a whopping 489 pages long so I can't even begin to summarise it, but there are some points regarding Microsoft BI that I'd like to pick up on.

    In general, the findings show the Microsoft BI stack as a solid and successful suite, extremely good value for money, but by no means a stellar performer. There's a long section at the end of the report showing a wide range of technical and project-related KPI ratings (for example query performance, business benefits achieved, quality of support, cost of deployment and so on) and in almost every respect Analysis Services and Reporting Services come out in the middle of the rankings. This doesn't really surprise me much: let's face it, what we've got works well, but from a technical perspective there's not been much new and exciting in the world of Microsoft BI for a while now (although with Gemini and Madison coming soon that will change), the platform in general still has some glaring holes and overlaps, and as the PerformancePoint debacle showed recently Microsoft's overall BI strategy is somewhat confused. In fact, I suspect Microsoft's entire BI strategy is not a BI strategy at all but a get-people-to-upgrade-to-the-latest-version-of-Office strategy, but I digress...

    For Analysis Services in particular, the survey showed 15% of those surveyed were on AS2K, 79.6% were on AS2005 and 3.5% on AS2008. Given that most people must have been surveyed last summer, well before the RTM of AS2008 that's pretty good and with the migration path from 2005 to 2008 very smooth I should think the AS2008 percentage will look much better next year. Analysis Services remains the top-ranked BI tool used against Microsoft's own databases, as you'd expect since it's essentially free with SQL Server, but it also comes in a close third for Oracle (while Oracle's own BI tools come in 12th!), second for IBM, third for Teradata and top for Open Source databases.

    In terms of the client tools used with Analysis Services, just over 70% of people were using one of Microsoft's own tools - mostly pivot tables and Reporting Services. 25% of SSAS users had Proclarity, and interestingly there were more people using the old free Excel addin than PerformancePoint , and a sizeable minority still using Data Analyzer. There's clearly a lot of demand for a client tool from Microsoft but from the looks of things most people are still stuck with Office 2003; this just adds weight to my argument that Microsoft coupling its BI strategy so closely to Office might help Office adoption rates but has a serious negative impact on the success of BI strategy itself... sorry... there I go again... As far as third-party tools go Panorama is still the #1 vendor, but it only has an 11% share and only 22% of SSAS users overall had any kind of third party tool in use. Incidentally Panorama was treated as a BI vendor in its own right for the purposes of the survey and came out top in a lot of the high-level KPIs including the overall KPI ranking - they obviously have some very enthusiastic customers.

    So who, apart from Panorama, seems to be doing well? Qlikview certainly is, which at least validates Microsoft's decision to go after that market with Gemini; Microstrategy does well too. Essbase comes out badly with great query performance offset by poor support and product quality; Cognos, apart from TM1, doesn't seem to do too well either. Clearly it's the vendors who are 100% focused on BI that are the most successful, which is as you'd expect.

    3/4/2009

    Analysis Services and the System File Cache

    Earlier this week Greg Galloway sent me an email about some new code he'd added to the Analysis Services Stored Procedure Project to clear the Windows system file cache:
    http://asstoredprocedures.codeplex.com/Wiki/View.aspx?title=FileSystemCache

    I thought this was quite interesting: several times when I've been doing performance tuning I've noticed that the same query, running on a cold Analysis Services cache, runs much slower when the cube has just been processed. This I put down to some form of caching happening at the OS level or below that; I also vaguely knew that it was a good idea to limit the system file cache, having seen the following on Jesse Orosz's blog:
    http://jesseorosz.spaces.live.com/blog/cns!E322FD91218E57CF!295.entry

    Anyway, doing some more research on this subject I came across the following blog entry that discusses the problem of excessive caching in more detail:
    http://blogs.msdn.com/ntdebugging/archive/2009/02/06/microsoft-windows-dynamic-cache-service.aspx
    ...and announces a new tool called the Microsoft Windows Dynamic Cache Service that aims to provide a better way of managing the system file cache:
    http://www.microsoft.com/downloads/details.aspx?FamilyID=e24ade0a-5efe-43c8-b9c3-5d0ecb2f39af&displaylang=en

    Has anyone got any experience with this? From what I can see, installing the Dynamic Cache Service on a 64-bit SSAS box with a big cube on looks like a good idea - has anyone tried it? If you have, or are willing to, can you let me know how you get on? Comments are welcome...

    2/25/2009

    Madison and SSAS?

    On Monday Microsoft announced "SQL Server Fast Track", a set of reference architectures for data warehousing with SQL Server. This has all been blogged very well by others, so if you're interested in finding out more I suggest you read Peter Koller:
    http://peterkol.spaces.live.com/Blog/cns!68755AEAC31F9A6C!1022.entry
    and Curt Monash:
    http://www.dbms2.com/2009/02/23/microsoft-sql-server-fast-track/

    A couple of accompanying white papers have also been released, though, and I was reading this one:
    http://msdn.microsoft.com/en-us/library/dd458815.aspx
    When I noticed the following statement:
    Project code name "Madison" is the upcoming Microsoft scale-out solution for very large data warehouses (VLDW). Madison is based on the MPP technology developed by DATAllegro and the proven SQL Server 2008 database. Madison expands the DATAllegro hub-and-spoke solution to include not only MPP appliances but also standard symmetric multi-processing (SMP) instances of SQL Server 2008 and SQL Server Analysis Services, (SSAS), allowing either to be viewed as nodes within a grid.

    and also:
    With the upcoming release of Madison, MPP scalability and grid connectivity can be taken to a new level. Madison expands the DATAllegro hub-and-spoke solution to include not only MPP appliances but also standard SMP instances of SQL Server 2008 and SSAS to be viewed as nodes within a grid. A grid of SMP databases and MPP appliances can be used as the basis for any large-scale data warehouse environment or architecture. However, it is particularly suitable for a hub-and-spoke architecture.

    So Analysis Services is clearly going to be supported as part of Madison somehow, as a 'node within a grid'. What does this mean exactly? I'm not really sure. The focus of the paper is the 'Hub and Spoke' architecture and how Madison will enable this through its ability to transfer large amounts of data quickly via 'high speed, parallel transfers' over its grid. The following DATAllegro white paper offers some more detail on this:
    http://www.datallegro.com/grid/faq_datallegro_grid.pdf

    Maybe I'm reading too much into the specific references to SSAS above, but it does seem like something is afoot with Madison and SSAS even if it is just that we'll get a quick way of moving SSAS databases around. I suppose we'll find out soon enough...