| Chris's profileChris Webb's BI BlogBlogLists | Help |
|
|
11/11/2009 Ragged Hierarchies, HideMemberIf and MDX CompatibilityHere’s something that I suspect a few people out there will be surprised I didn’t know – but as far as I’m concerned, if I didn’t know it then it’s worth blogging about. Anyway, it regards ragged hierarchies, the HideMemberIf property and the MDX Compatibility connection string property. Now you probably know that if you want to turn a user hierarchy into a ragged hierarchy (perhaps to avoid using a parent child hierarchy) you need to use the HideMemberIf property on the user hierarchy. For example, imagine you were using the following SQL query as the basis of your customer dimension: SELECT 'UK' AS Country, 'Bucks' AS State, 'Amersham' AS City, 'Chris Webb' AS Customer We could build a dimension off this with attributes for Country, State, City and Customer, and for two out of our three customers that would be fine. However the Pope lives in the Vatican, which is (at least for the purposes of this exercise) a Country with no concept of City or State; and in the case of customers who live in the Vatican, we just want to be able to drill down on the Country ‘Vatican’ and see all of the Customers who live there without drilling down through a meaningless State and a City. So what we can do is build a user hierarchy on our dimension with levels Country, State, City and Customer, and on the lower three levels set the HideMemberIf property to OnlyChildWithParentName: Then, with any sensible client tool, we can connect to the cube and browse the dimension as we want: I saw ‘sensible’ client tool, because of course this only works if you set: This much I knew. However, what I didn’t realise until last week when I was moaning about this to TK Anand from the SSAS dev team at PASS, is that for some ragged hierarchies you don’t need to set the MDX Compatibility connection string property at all. For example, if in our case we duplicate the Customer upwards rather than the Country downwards, like so: SELECT 'UK' AS Country, 'Bucks' AS State, 'Amersham' AS City, 'Chris Webb' AS Customer …and then build the dimension, setting HideMemberIf on our user hierarchy to OnlyChildWithParentName, we can get the result we want without setting the MDX Compatibility property. Here’s a screenshot of this new dimension in Excel just to prove it: The difference here is that we’re hiding all members below the State level right down to the bottom of the hierarchy, rather than hiding members somewhere in the middle of the hierarchy. Truly, this is one of those “Doh, if only I’d known!” moments… this at least means that in some of the scenarios where you’d use ragged hierarchies you can get them to work with Excel, even if it means that we have to hack the data (‘The Pope’ is a Customer, not a State or a City). On the other hand there are plenty of scenarios where you do need to hide members in the middle of a hierarchy, and frankly I don’t see why Excel 2007 can’t set MDX Compatibility=2 in its connection string so they work properly. 11/5/2009 Live Blogging @PASS - SSAS Consolidation and VirtualisationHere are some notes from the SQLCat team’s session on SSAS consolidation and virtualisation; they’re a bit fragmentary since I’m too busy paying attention to what’s being said! I get asked about these issues by my customers all the time.
The most useful session so far at this conference for me – I learned a lot. Quest add support for SSAS monitoringSomething I saw yesterday at PASS: Quest now have support for monitoring SSAS from their “Spotlight on SQL Server Enterprise” product. See http://www.quest.com/newsroom/news-releases-show.aspx?contentid=10602 It’s pretty basic at the moment – they capture some Perfmon counters and data from schema rowsets, but no trace data – and nowhere near as sophisticated as what SQLSentry have, but it’s good to see another vendor entering this market. 10/26/2009 Actions and MultiselectAt the beginning of this week a customer asked me why, in a certain third-party client tool that shall remain nameless, they could no longer do a drillthrough when they did a multiselect on a filter axis. It seemed a bit weird to me, and it got weirder when I asked around for ideas and Greg Galloway pointed out that Excel 2007 didn’t show any actions at all when there was a multiselect, and Marco Russo noted that the current beta of Excel 2010 didn’t either. This made me wonder whether the problem was in fact with Analysis Services rather than the client tools… I didn’t actually know how a client tool worked out what actions were available when, so I did some research and found out that the MDSCHEMA_ACTIONS schema rowset was how it was done. Here’s the documentation on MSDN: For example, if a client tool needs to know which actions can be called when a user clicks on a cell in a resultset, then it will execute an XMLA command something like this one on Adventure Works: 1: <Discover xmlns="urn:schemas-microsoft-com:xml-analysis"> 2: <RequestType>MDSCHEMA_ACTIONS</RequestType> 3: <Restrictions> 4: <RestrictionList xmlns="urn:schemas-microsoft-com:xml-analysis" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"> 5: <CUBE_NAME>Adventure Works</CUBE_NAME> 6: <ACTION_TYPE>401</ACTION_TYPE> 7: <COORDINATE>([Customer].[Country].&[Australia],[Measures].[Internet Order Quantity])</COORDINATE> 8: <COORDINATE_TYPE>6</COORDINATE_TYPE> 9: </RestrictionList> 10: </Restrictions> 11: <Properties> 12: </Properties> 13: </Discover> You can see that a tuple is being passed into the COORDINATE to indicate which cell we’re interested in. But when there’s a multiselect, which cell in the cube are we actually clicking on? Good question… Different client tools handle multiselect in different ways, and it turns out there’s no way of telling SSAS you’re doing a multiselect in this situation. If you try to pass a set of tuples to the COORDINATE you get no actions returned, for instance. Having talked this over with Akshai Mirchandani from the dev team, what the client tool needs to do is to make multiple calls to MDSCHEMA_ACTIONS, one for each member selected in the multiselect. It then needs to work out from each of the rowsets returned which actions should be available in the current context – and of course, in this case, there’s a good chance that different client tools will do different things (if they do anything at all). Not ideal. To be honest, this really needs to be something that is solved in SSAS rather than on the client and the key to solving it properly would be to have a standard way of handling and detecting multiselect in MDX. As Mosha hinted here, it’s something that’s been on the dev team’s radar for a while but it’s still not made it into the product unfortunately. In the meantime, if there are any client tool developers from the Excel team or third parties out there reading this, it would be great if you could at least do something rather than nothing here! 9/30/2009 Building a Better Cache-Warmer, Part 1: the Storage Engine CacheSSAS caching is an interesting subject: I’ve blogged numerous times about when caching can and can’t take place, and one of the most popular posts I’ve ever written was on how to build a cache-warmer in SSIS. However that last post was written a while ago, before the CREATE CACHE statement for warming the Storage Engine cache was reintroduced in SSAS 2005 SP2, and I’ve not been satisfied with it for a number of other reasons too. So as a result I’ve started to do more research into the issue and, with the help of Mosha, Akshai Mirchandani, and the chapter 29 of the indispensible “Microsoft SQL Server 2008 Analysis Services Unleashed” I’ve been collecting my thoughts on how to build a better cache warmer with a view to blogging about it again. This first post will deal with the relatively easy bit – warming the SE cache – and when I have time I’ll write about the more difficult but more useful task of warming the Formula Engine cache, and by hopefully then I’ll be in a position to create the definitive SSIS cache warmer package and be able to blog about that too. Before I go on I need to mention that one other reason that has made me want to revisit this subject is that, at his MDX Deep Dive session at PASS last year, Mosha asserted that cache-warming was a Bad Thing. This intrigued me, since it seemed like such an obviously Good Thing to do, but of course it turns out he had good reason for saying what he did… SSAS caching is a lot more complex than it first appears and cache warming done badly can be counter-productive. I’m also tempted to say that if you don’t need to warm the FE cache (ie you don’t have any complex calculations that take a long time to run) then there’s not much point in warming the SE cache. In my experience if you have the right aggregations built then you can get excellent response times even on a cold cache. Of course to build aggregations you need to know the granularities of data that your users’ queries are, but you need the same information to warm the cache. The only difference would be that you can make your cache warmer much more dynamic than your aggregation design: if your users’ query patterns change regularly it would be difficult to keep your aggregation design in synch with them, whereas with a cache-warmer you can record the most recent queries that have been run and warm the cache accordingly. There are a number of things to bear in mind when warming the SE cache:
Of course all of the above is subject to change in later versions, but I understand that this is the current situation with SSAS 2008 SP1. How can we find out what data to use to warm the SE cache then? The MDX queries that have been run by users aren’t much help, but we can see what subcubes are being requested when a query is run by looking at the Query Subcube and Query Subcube Verbose events in Profiler; we could therefore run a Profiler trace to capture this data, but in fact the easiest option is to just use the Usage Based Optimization wizard’s Query Log since it contains the same data as Query Subcube. The following blog entry from James Snape shows how to interpret this information: Last of all, how do we actually go about loading data into the SE cache? We could run MDX queries but this is a bit of a risky approach since we may not know what calculations are present and so we can’t be sure that our query requests the right data; the best option is to use the CREATE CACHE statement that is described here: 9/23/2009 Slow metadataThere’s an interesting thread on the Analysis Services MSDN Forum that I’ve been following for a while now (it’s over 40 posts long at the time of writing) that gives a good explanation of why metadata access on very large or complex cubes can be so painfully slow. Here’s the link: In summary, it turns out that the more files there are on disk in your SSAS database folder, the slower metadata reads become, and when you reach very large numbers of files almost any kind of metadata access becomes incredibly slow. There are a lot of different factors determining how many files are created for your SSAS database but large numbers of cubes, attributes, partitions and aggregations (which end up being multiplied across each partition) are the main culprits; the problem is that the way the EstimatedSize property on metadata objects has been implemented results in SSAS having to iterate over all the files in every subdirectory in the database’s directory in the data folder to find their size. One workaround is to set the Storage Location property of your partitions to somewhere outside the data directory since this means they won’t get scanned, but doing this makes backup and restore difficult and means you get inaccurate size estimates. The only other option is to reduce the number of partitions and aggregations you have; given that the SQLCat team are now saying that partitions of 50-60 million rows each are feasible in 2008, and that you can use different partition designs for partitions containing old and new data, there’s probably a lot of scope for doing this in most cubes. Hopefully, though, this problem will be fixed in a future release… 9/20/2009 adCenter and SSAS 2008 White PaperThere’s a new white paper on how adCenter uses Analysis Services 2008 available here: For the most part it’s a fairly run-of-the-mill list of reasons to upgrade to 2008 (and, to be honest, if you’re running 2005 I can’t think of a good reason not to upgrade to 2008 – the performance is so much better) but there are several good nuggets on performance tuning for large SSAS implementations; for example, table 3 has a list of ini file settings for optimising processing. Worth a look. 7/10/2009 Analysis Services Connect digestIt’s a fairly popular thing for SQL Server bloggers and MVPs to put together lists of items on Connect (bugs that need fixing, new bits of functionality that should be in a future version, etc) and ask readers to vote on them so they get more visibility. Jamie Thomson did a post for SSIS the other day, for instance: While I’ve done something similar in the past, I thought it would be a good idea to put together a more comprehensive list of Connect items for SSAS in the hope that some of the more important issues will be addressed in the next major release. So, please look down the list below and vote on anything that you think is important!
Of course, if you’ve got your own issues open on Connect that you’d like to publicise, please leave a comment below… 7/6/2009 Disk Partition Alignment white paperVia Greg Lowe, I see there’s a new white paper out on disk partition alignment for SQL Server: Of course its main focus is the SQL Server relational engine, but it’s all equally relevant to Analysis Services. If you’re running Analysis Services on Windows 2003 or before (or even if you’re running Windows 2008 but your partitions were created on an earlier OS) then this is a must-read. 6/18/2009 Microsoft .NET Provider for Oracle deadProducts, die, come back to life, die again… Anyway, those of you building cubes from Oracle data sources may be interested to know that the Microsoft .NET provider for Oracle has been killed off. Here’s the official announcement-disguised-as-a-blog-entry: Yet again, it’s those persuasive “customers, partners, and MVPs” who are asking for the product to be canned. I just hope that there aren’t any “customers, partners, and MVPs” who are asking for Steve Ballmer to sell Microsoft to Google for $1, because then we’ll all be in trouble... Anyway, needless to say, not everyone is pleased: One blog entry I’ve had half-written for a long time is a discussion of the problems faced when building cubes from Oracle data sources (thanks for your help on this David; anyone else with any hints, please leave a comment). There are a lot of bugs and data type issues that need to be dealt with, and from what I’ve gathered using the Microsoft .NET provider was part of how they could be worked around. 5/4/2009 SSIS Trace File Source Adapter now availableHurray! At long last, my friends at www.sqlis.com have got round to making their Trace File Source Adapter for SSIS available: What’s this got to do with Analysis Services? Well, as the post mentions, there are tons of cool things you can use this for when you’ve got some Analysis Services trace files: cache warming and usage monitoring among other things. It works for SQL Server relational engine traces too, if you care about that sort of thing… 4/24/2009 SQLSentry Performance Advisor for Analysis ServicesI’m currently in Germany at the PASS European Conference. I ran a pre-conf with Allan Mitchell two days ago on monitoring SSAS, SSRS and SSIS, an expanded version of the material I used at the PASS Summit last year; we talked about how you could build a monitoring solution for the MS BI stack yourself, but for the last few months I’ve also been talking to the guys at SQL Sentry – who are also here exhibiting – about their new product called Performance Advisor for Analysis Services which has just been made public. I’ve had an in-depth demo and my first impression is that this is the first Analysis Services monitoring solution that I would actually want to use; it looks really, really good. Here’s a screenshot:
I believe SQL Sentry have the first solution that will monitor the SQL Server relational engine, Analysis Services, and SSIS and SSRS as well. As I’ve said before, I can’t believe it’s taken this long for a big third party vendor to notice the MS BI monitoring market! You can find out more and sign up for the beta here: Of course, collecting this information is one thing – actually understanding what it tells us, and using that information effectively, is something else completely. To be honest, it’s only after the prep work I did for my precon that I’ve started to think about this problem properly. Indeed, it’s only really when this kind of monitoring data is easily accessible that best practices for what to monitor can emerge; perhaps the SQLCat team need to write a white paper on this subject? In our preconf Allan also did a few cool demos using data mining on perfmon data to predict when a server is going to ‘go pop’ (in his words), and I think there are a lot of interesting possibilities here too. 4/20/2009 Using MDX to browse the file systemOne very obscure feature of Analysis Services that I’ve only ever seen documented in the books “Microsoft SQL Server Analysis Services 2005” and its successor “Microsoft SQL Server 2008 Analysis Services Unleashed” (both highly recommended for advanced SSAS users, by the way) is the fact that you can use MDX to browse the file system of the server that Analysis Services is running on, with some restrictions. Full details on this can be found on P804 of the 2008 book but it neglects to give any real examples of how to do it, so since it took me a few minutes to work out the correct syntax to use I thought it was worth a blog post. There are four MDX extensions you can use, which can be executed from SQL Management Studio just like any other MDX statement.
Clearly this functionality is quite useful if you’re automating things like backups and you don’t have administrative rights to the server file system, although you need to be an Analysis Services administrator to do this. 4/14/2009 Update Isolation Level connection string propertyI recently came across a new connection string property, Update Isolation Level, mentioned briefly in BOL but with so few accompanying details that I had no idea how it could be used. Here’s what is currently documented: From http://technet.microsoft.com/en-us/library/ms145488.aspx :
Luckily Akshai Mirchandani and Marius Dumitru from the dev team were able to provide me more details on it… Basically, Update Isolation Level is useful when you are using writeback and you are writing values back to multiple cells within the same UPDATE CUBE statement. This point is important: you won’t see a benefit if you’re just writing back to one cell in your UPDATE CUBE statement, or if you’re issuing multiple single-cell UPDATE CUBE statements within a transaction. What it allows you to do is to say that each tuple within the SET clause in the UPDATE CUBE statement doesn’t overlap with any of the others; if that’s true, AS doesn’t need to do as much work at execution time and so the UPDATE CUBE statement runs faster. Let’s take a look at some examples (I’ve used a simple cube I built based on the Adventure Works database, but it’s not Adventure Works). First of all, the following UPDATE CUBE statement contains two tuple assignments in the SET clause which overlap: UPDATE CUBE [Sales] The first tuple assigns the value 10 to Sales Amount for a particular Date, Product and Customer; the second assigns the same value to a tuple that contains just Sales Amount, and this tuple overlaps with the previous tuple because its value is the aggregation of all Sales Amounts for all Dates, Products and Customers. However, in the following example the two tuples do not overlap because they assign to Sales Amount for different Products: UPDATE CUBE [Sales] The Update Isolation Level connection string property has two possible values:
So, in theory, for the second example above if you set Update Isolation Level=1 in the connection string, the statement should run faster. I couldn’t see any differences in my limited testing but I’m assured that in more complex scenarios there are improvements – how much will depend on the amount of overlap, the structure of the cube and so on. 4/12/2009 SSAS2008 and KerberosKerberos is notoriously difficult to set up – it’s something I’ve always left to someone else to do! Anyway, I’ve just come across a very detailed white paper from Boyan Penev and Mark Dasco about setting up Kerberos for Microsoft BI which I thought was worth linking to for future reference: However, I’ve heard from a number of sources (Dr John Tunnicliffe; Dan English) recently about problems with getting Kerberos to work with Windows 2008 Server or Vista, and this post from John Desch at MS confirms the bug – worse, it doesn’t seem to be fixed in SP1 (which got released this week): One to watch out for… 3/31/2009 SQL Solutions OLAP Heartbeat and OLAP Performance AdvisorHmm, you wait for years for commercial tools for monitoring Analysis Services (the only one I’d ever seen before was Companion for Analysis Services from SQLMinds) and then two come out at once. One of these tools I’ll be blogging about towards the end of this week, hopefully – I’ve had a sneak preview and it looks very cool – but today I found out the following from SQL Solutions:
I’ll download both and give them a thorough test as soon as I can. 3/19/2009 Query behaviour when you cross the LowMemoryLimitHere’s something that had me stumped for quite a long time earlier this week… I’m doing some performance tuning work for a customer at the moment, and following my usual approach I took a test query, cleared the cache, ran it and noted how long it ran for and what I saw in Profiler and MDX Studio. However this time I saw some strange things happening: I’d run the same query on a cold cache more than once and the query times would differ by anything up to a minute; even more confusing, the information I got from Profiler and MDX Studio would be different too – for example, the number of Query Subcube Verbose events in recorded in Profiler for a query might differ from run to run by several hundred. This had me completely confused. There was no-one else on the server and I was running on a cold cache. What was going on? It turned out that I was running out of memory. I was working on a 32-bit box with a lot of databases on, so even on a cold cache the amount of memory used by SSAS was approaching the LowMemoryLimit threshold (see Marco’s post here, the AS2005 Performance Guide and this white paper for an explanation of what this is). When I ran the query – which had a lot of calculations - memory usage went above the LowMemoryLimit so the cleaner thread was waking up and evicting elements of shrinkable memory, which presumably included data in cache that had been put there by the current query. Since the exact moment that this threshold was crossed would vary from run to run, and as I assume that what the cleaner thread would evict from memory would also vary, this explained the different query times and Profiler activity. As soon as I moved onto a 64-bit box with no other databases this stopped happening. Of course nowadays I would expect that most people are running on 64-bit so it’s much more unlikely that you’ll ever cross the LowMemoryLimit, but I do see it happen from time to time. So if you’re in the same position and suffer from inconsistent query performance, check your memory usage! 3/12/2009 BI Survey 8As I think I've probably said before, I don't get many freebies as a blogger but one that I do get and I really appreciate is my review copy of the BI Survey. It being that time of year, I got the latest edition - the BI Survey 8 - last week along with an email encouraging me to blog about it, and who am I to refuse a request to blog about something as fascinating as this? It's a whopping 489 pages long so I can't even begin to summarise it, but there are some points regarding Microsoft BI that I'd like to pick up on. In general, the findings show the Microsoft BI stack as a solid and successful suite, extremely good value for money, but by no means a stellar performer. There's a long section at the end of the report showing a wide range of technical and project-related KPI ratings (for example query performance, business benefits achieved, quality of support, cost of deployment and so on) and in almost every respect Analysis Services and Reporting Services come out in the middle of the rankings. This doesn't really surprise me much: let's face it, what we've got works well, but from a technical perspective there's not been much new and exciting in the world of Microsoft BI for a while now (although with Gemini and Madison coming soon that will change), the platform in general still has some glaring holes and overlaps, and as the PerformancePoint debacle showed recently Microsoft's overall BI strategy is somewhat confused. In fact, I suspect Microsoft's entire BI strategy is not a BI strategy at all but a get-people-to-upgrade-to-the-latest-version-of-Office strategy, but I digress... For Analysis Services in particular, the survey showed 15% of those surveyed were on AS2K, 79.6% were on AS2005 and 3.5% on AS2008. Given that most people must have been surveyed last summer, well before the RTM of AS2008 that's pretty good and with the migration path from 2005 to 2008 very smooth I should think the AS2008 percentage will look much better next year. Analysis Services remains the top-ranked BI tool used against Microsoft's own databases, as you'd expect since it's essentially free with SQL Server, but it also comes in a close third for Oracle (while Oracle's own BI tools come in 12th!), second for IBM, third for Teradata and top for Open Source databases. In terms of the client tools used with Analysis Services, just over 70% of people were using one of Microsoft's own tools - mostly pivot tables and Reporting Services. 25% of SSAS users had Proclarity, and interestingly there were more people using the old free Excel addin than PerformancePoint , and a sizeable minority still using Data Analyzer. There's clearly a lot of demand for a client tool from Microsoft but from the looks of things most people are still stuck with Office 2003; this just adds weight to my argument that Microsoft coupling its BI strategy so closely to Office might help Office adoption rates but has a serious negative impact on the success of BI strategy itself... sorry... there I go again... As far as third-party tools go Panorama is still the #1 vendor, but it only has an 11% share and only 22% of SSAS users overall had any kind of third party tool in use. Incidentally Panorama was treated as a BI vendor in its own right for the purposes of the survey and came out top in a lot of the high-level KPIs including the overall KPI ranking - they obviously have some very enthusiastic customers. So who, apart from Panorama, seems to be doing well? Qlikview certainly is, which at least validates Microsoft's decision to go after that market with Gemini; Microstrategy does well too. Essbase comes out badly with great query performance offset by poor support and product quality; Cognos, apart from TM1, doesn't seem to do too well either. Clearly it's the vendors who are 100% focused on BI that are the most successful, which is as you'd expect. 3/4/2009 Analysis Services and the System File CacheEarlier this week Greg Galloway sent me an email about some new code he'd added to the Analysis Services Stored Procedure Project to clear the Windows system file cache: I thought this was quite interesting: several times when I've been doing performance tuning I've noticed that the same query, running on a cold Analysis Services cache, runs much slower when the cube has just been processed. This I put down to some form of caching happening at the OS level or below that; I also vaguely knew that it was a good idea to limit the system file cache, having seen the following on Jesse Orosz's blog: Anyway, doing some more research on this subject I came across the following blog entry that discusses the problem of excessive caching in more detail: Has anyone got any experience with this? From what I can see, installing the Dynamic Cache Service on a 64-bit SSAS box with a big cube on looks like a good idea - has anyone tried it? If you have, or are willing to, can you let me know how you get on? Comments are welcome... 2/25/2009 Madison and SSAS?On Monday Microsoft announced "SQL Server Fast Track", a set of reference architectures for data warehousing with SQL Server. This has all been blogged very well by others, so if you're interested in finding out more I suggest you read Peter Koller: A couple of accompanying white papers have also been released, though, and I was reading this one: and also: So Analysis Services is clearly going to be supported as part of Madison somehow, as a 'node within a grid'. What does this mean exactly? I'm not really sure. The focus of the paper is the 'Hub and Spoke' architecture and how Madison will enable this through its ability to transfer large amounts of data quickly via 'high speed, parallel transfers' over its grid. The following DATAllegro white paper offers some more detail on this: |
|
|