| Chris 的个人资料Chris Webb's BI Blog日志列表 | 帮助 |
|
2009/9/30 Building a Better Cache-Warmer, Part 1: the Storage Engine CacheSSAS caching is an interesting subject: I’ve blogged numerous times about when caching can and can’t take place, and one of the most popular posts I’ve ever written was on how to build a cache-warmer in SSIS. However that last post was written a while ago, before the CREATE CACHE statement for warming the Storage Engine cache was reintroduced in SSAS 2005 SP2, and I’ve not been satisfied with it for a number of other reasons too. So as a result I’ve started to do more research into the issue and, with the help of Mosha, Akshai Mirchandani, and the chapter 29 of the indispensible “Microsoft SQL Server 2008 Analysis Services Unleashed” I’ve been collecting my thoughts on how to build a better cache warmer with a view to blogging about it again. This first post will deal with the relatively easy bit – warming the SE cache – and when I have time I’ll write about the more difficult but more useful task of warming the Formula Engine cache, and by hopefully then I’ll be in a position to create the definitive SSIS cache warmer package and be able to blog about that too. Before I go on I need to mention that one other reason that has made me want to revisit this subject is that, at his MDX Deep Dive session at PASS last year, Mosha asserted that cache-warming was a Bad Thing. This intrigued me, since it seemed like such an obviously Good Thing to do, but of course it turns out he had good reason for saying what he did… SSAS caching is a lot more complex than it first appears and cache warming done badly can be counter-productive. I’m also tempted to say that if you don’t need to warm the FE cache (ie you don’t have any complex calculations that take a long time to run) then there’s not much point in warming the SE cache. In my experience if you have the right aggregations built then you can get excellent response times even on a cold cache. Of course to build aggregations you need to know the granularities of data that your users’ queries are, but you need the same information to warm the cache. The only difference would be that you can make your cache warmer much more dynamic than your aggregation design: if your users’ query patterns change regularly it would be difficult to keep your aggregation design in synch with them, whereas with a cache-warmer you can record the most recent queries that have been run and warm the cache accordingly. There are a number of things to bear in mind when warming the SE cache:
Of course all of the above is subject to change in later versions, but I understand that this is the current situation with SSAS 2008 SP1. How can we find out what data to use to warm the SE cache then? The MDX queries that have been run by users aren’t much help, but we can see what subcubes are being requested when a query is run by looking at the Query Subcube and Query Subcube Verbose events in Profiler; we could therefore run a Profiler trace to capture this data, but in fact the easiest option is to just use the Usage Based Optimization wizard’s Query Log since it contains the same data as Query Subcube. The following blog entry from James Snape shows how to interpret this information: Last of all, how do we actually go about loading data into the SE cache? We could run MDX queries but this is a bit of a risky approach since we may not know what calculations are present and so we can’t be sure that our query requests the right data; the best option is to use the CREATE CACHE statement that is described here: 2009/9/23 Slow metadataThere’s an interesting thread on the Analysis Services MSDN Forum that I’ve been following for a while now (it’s over 40 posts long at the time of writing) that gives a good explanation of why metadata access on very large or complex cubes can be so painfully slow. Here’s the link: In summary, it turns out that the more files there are on disk in your SSAS database folder, the slower metadata reads become, and when you reach very large numbers of files almost any kind of metadata access becomes incredibly slow. There are a lot of different factors determining how many files are created for your SSAS database but large numbers of cubes, attributes, partitions and aggregations (which end up being multiplied across each partition) are the main culprits; the problem is that the way the EstimatedSize property on metadata objects has been implemented results in SSAS having to iterate over all the files in every subdirectory in the database’s directory in the data folder to find their size. One workaround is to set the Storage Location property of your partitions to somewhere outside the data directory since this means they won’t get scanned, but doing this makes backup and restore difficult and means you get inaccurate size estimates. The only other option is to reduce the number of partitions and aggregations you have; given that the SQLCat team are now saying that partitions of 50-60 million rows each are feasible in 2008, and that you can use different partition designs for partitions containing old and new data, there’s probably a lot of scope for doing this in most cubes. Hopefully, though, this problem will be fixed in a future release… 2009/9/20 adCenter and SSAS 2008 White PaperThere’s a new white paper on how adCenter uses Analysis Services 2008 available here: For the most part it’s a fairly run-of-the-mill list of reasons to upgrade to 2008 (and, to be honest, if you’re running 2005 I can’t think of a good reason not to upgrade to 2008 – the performance is so much better) but there are several good nuggets on performance tuning for large SSAS implementations; for example, table 3 has a list of ini file settings for optimising processing. Worth a look. 2009/9/16 Top Ten Reasons to come to SQLBitsIt seems like every conference nowadays has to come up with a ‘top ten’ of reasons to attend, doesn’t it? Well, we at SQLBits thought we shouldn’t be left out and I got the job of writing it. So, even though it’s already been used in one of our promotional emails and Simon’s already put them on his blog, I thought I’d post them up here: 10. We’re going West (Newport, to be exact)! Which is great news if you live in Wales or the South West and you get fed up with all the tech conferences being held in London or the Thames Valley. Even if you don’t live in the local area it’s only 1hr 45 by car or train from London, it’s close to Birmingham, and with Bristol and Cardiff airports nearby it’s easy to get to from anywhere in Europe. 9. SQLBits has grown to three days. We have a day of pre-conference seminars on Thursday November 19th; on Friday November 20th we have a special SQL 2008 and SQL 2008 R2 day; and on Saturday November 21st we have our flagship free community conference. That’s probably all the SQL Server education you’ll want or need this year. 8. Meet other SQL Server professionals. Are you the only DBA in the village? Do people think you’re talking about your private life when you say you’re a “BI consultant”? Do you have to pretend to be an estate agent or a lawyer at parties to avoid the stigma of being seen as an IT geek? At SQLBits you’re among friends: no-one will mind you talking about clustered indexes or the finer points of MDX syntax. 7. Attend an advanced technical training course. Got some budget for a training course, but don’t want to spend it on yet another basic Microsoft Official Curriculum course where the instructor only read the course notes the night before? Our pre-conference seminars are run by world-class SQL Server experts and will give you the knowledge you need to do your job. Our big draw is Donald Farmer doing a whole day on Gemini and self-service BI, but I’ll be running an introductory MDX course too… 6. Win some swag, drink some beer and eat some pizza. It’s not all work, work, work at SQLBits: we have great giveaways, there are loads of opportunities for socialising and there’s beer, pizza and games at the end of the day. And if you like golf, you might be interested to know that the venue, Celtic Manor Resort, has three championship courses and is playing host to the 2010 Ryder Cup. 5. Check out the latest SQL Server-related products. Don’t have time to download and test the latest tools and utilities? Our sponsors will be happy to demo them for you and provide eval versions. They’ll probably also have loads of those cheap plastic promotional freebies that your kids love. 4. Find out what’s coming in the next version. Our new SQLBits Friday event will give you the lowdown in what’s new in SQL 2008 and what’s coming in SQL 2008 R2 (including Madison, Gemini, StreamInsight, Master Data Services and changes in Reporting Services), so you can plan for the future more effectively. 3. Bring the family. Want to come to SQLBits, but feel guilty about leaving the family at home? No need, they can come too! The venue has its own swimming pools, a spa, tennis courts and restaurants as well as the golf courses. Repeat after me: “It’s not a conference, darling, it’s a luxury weekend break...” 2. Deep technical content. We won’t make you sit through hours of marketing guff from some sales guy who doesn’t know a database from a spreadsheet - SQLBits is all about SQL Server professionals sharing their hard-won technical knowledge with the community. We’ve got big names like Donald Farmer (program manager on the Analysis Services/Gemini dev team) and Thomas Kejser (from the SQLCat team), and MVPs like Simon Sabin, Allan Mitchell and me speaking. 1. Did we mention that SQLBits Saturday is FREE? Yes, as always, the SQLBits community day on Saturday November 21st is free to attend. Top-class speakers, a massive selection of sessions and 100% SQL Server focused – and you pay nothing. That’s 100% cheaper than most tech conferences! So, what are you waiting for? Register now at: Oh, and if you are already registered you might be interested to know that voting has just opened, so you can pick the sessions you’d most like to see. Last of all, would you like to win free admission to the training day of your choice, the Friday event, and free hotel acommodation at Celtic Manor for the Thursday and Friday night? Then check out our mugshot competition: 2009/9/10 Now() and the Formula CacheYou know, I have the nagging feeling I’ve already blogged about this before… but I can’t find the post (I have been at this for over four years now) so it’s worth mentioning again… Anyway, a common question asked on the MSDN Forums is how to get the current date and then use it in an MDX calculation – see this thread for example. Usually this is because users want calculations that show the current day’s sales, or something similar. One answer is of course to use the Now() function, but what people don’t often realise is how this can impact the ability of Analysis Services to cache the values returned by calculated members, and therefore reduce overall query performance. To understand why, let’s look at some examples in Adventure Works. First of all create a calculated measure on the AW cube as follows: CREATE MEMBER CURRENTCUBE.[MEASURES].[NOWDEMO] AS NOW(); Then run the following query a few times: SELECT {[Measures].[NOWDEMO]} ON 0 As you’d expect, every time you run this query you see the current date and time – and every time you run it, you see a different value. But, you may be thinking, doesn’t Analysis Services cache the results returned by calculated members? Well, yes it does in most cases, but for non-deterministic functions (functions that could return a different result every time they’re called) like Now() no caching takes place, because otherwise the value returned from the cache might be different from the one the function actually returns. The next problem is that if you create any other calculated members that depend directly or indirectly on the value returned by a non-deterministic function, their values can’t be cached either. One problem I see sometimes in my consultancy work is poor query performance resulting from SSAS being unable to use the formula cache, because a large number of calculations have a dependency on a single calculation that uses the Now() function. Here’s a greatly simplified example of two calculated members, the first of which finds the current year and the second which returns a sales value for the year eight years before the current year: CREATE MEMBER CURRENTCUBE.[MEASURES].[Current Year] CREATE MEMBER CURRENTCUBE.[MEASURES].[SHOW SALES FOR A YEAR] If you run a query that references the second calculated measure on a cold cache, such as: SELECT [MEASURES].[SHOW SALES FOR A YEAR] ON 0 The first time you run it you’ll see SSAS going to disk as you’d expect; the second time you run it though you’ll see SSAS is able to use the Storage Engine cache but not the Formula Engine cache, as this Profiler trace shows: The highlighted Get Data From Cache event shows data being retrieved from the measure group cache. This is ok and can be beneficial for query performance, but if the calculation we’re doing is very expensive then it can still mean our query takes a long time to run. We’ll only get an instant response on a warm cache if we can work out how to use the formula cache somehow. Luckily, in most cases where Now() is used, we don’t usually want the system date and time, we just want the date. That means that we only want to return a different value when the date changes, once every 24 hours. What we can do therefore is use a named set to somehow store the value returned by Now(), for example like this rewrite of the calculation above: CREATE MEMBER CURRENTCUBE.[MEASURES].[Current Year] CREATE SET CURRENTCUBE.MYYEAR AS CREATE MEMBER CURRENTCUBE.[MEASURES].[SHOW SALES FOR A YEAR - CACHEABLE] What we’re doing here is finding the year we want, then finding the member for the year 8 years ago and storing that in a named set. Because a static named set is only evaluated once, when we do our final calculation we can reference the single member stored in the named set and therefore make use of the formula cache as the following Profiler trace shows: We’re now getting data from the flat cache, which is one part of the formula cache (which isn’t ideal either as it indicates the calculation is being evaluated in cell-by-cell mode, I guess because we’re referencing a named set inside it) and so warm-cache performance will be better. The next problem is that when the date does change, we need to clear the cache. This can be accomplished easily by running an XMLA ClearCache command, perhaps direct from SQL Server agent, every night at midnight or whenever necessary. To be honest, though, I’m not sure using the Now() function at all is a good thing – apart from the issues described here there are a lot of other risks involved, such as the time or date on your server being wrong or confusions with time zones and date formats. I think a better approach to the problem is to have an extra attribute in your Time dimension which flags up a date as ‘today’, and which changes every day. Of course this means you need to do some extra ETL and processing on your Time dimension as a result, but I think it’s a much cleaner solution than Now() and leads to much more efficient MDX calculations. 2009/9/4 Implementing IE8 Web Slices in an SSRS ReportOne of the new features that caught my eye in Internet Explorer 8 when it came out was Web Slices – the ability for a web developer to carve up a page into snippets that a user can then subscribe to. There’s a brief overview of what they are here: Being the BI geek that I am, my first thought was to see whether they could be used with Reporting Services reports. After all, wouldn’t it be cool if you could subscribe to a table in an SSRS report, or even better a cell within a table, and get notified when that value changed rather than have to keep pinging the report yourself? Of course it would! Here’s how to do it… The challenge with implementing web slices is to get SSRS to generate the necessary html when it renders your report. I first looked at using the new rich formatting functionality that’s available in SSRS 2008 that Teo Lachev describes here, but it turns out that you can’t use this to create Web Slices because SSRS doesn’t support the necessary attributes (see here for details – at least I assume this is why, because I couldn’t get it to work). The only way I could get it to work was to render the report as XML and then use an XSLT file to give me complete control over the HTML that SSRS generates. I won’t go into too much detail about how this works; once again, Teo has an excellent explanation in his book “Applied Microsoft SQL Server 2008 Reporting Services” (reviewed here – it’s an excellent book) on pages 263-265. To be honest this isn’t a satisfying approach for me because it involves a lot more effort to get the report looking the way you want, and of course you have to have control over how the report is rendered. However, it still makes for a fun proof-of-concept :-) The first thing I did was create a simple SSRS report in BIDS that brought back values for Internet Sales broken down by country: I then rendered the report to XML, took a look at the XML generated, and created a simple XSLT file that would generate a HTML report from that XML. I then added the XSLT file to my project and associated my report with it using the report object’s DataTransform property, so that it was always used when the report was rendered to XML. I was then able to deploy the project and, by using URL access to the report get it to render to XML and get the result treated as html, was able to see the following in IE8: Here’s an example SSRS URL that does this: Then I went back to BIDS and altered the XSLT file to add the necessary tags for a Web Slice around the main table. When I went back to IE and reopened the report after deployment I could see two new things. First, the Web Slice button appeared in the IE toolbar: And when I moved the mouse over the table in the report, it was highlighted with a green box as a Web Slice: I could then click on either to subscribe to the Web Slice and have it added to my favourites. This then meant I could see the contents of the table in my Favourites bar whenever I wanted: And whenever the data changes (you can control how often IE polls the original web page in the Web Slice’s properties, and also in the definition of the Web Slice itself) the text in the Favourites bar turns bold: So there you are. Even with the limitations that having to render to XML imposes I can think of a few useful applications of this approach… maybe I’ll test them out in a future blog entry. Let me know if you have any ideas! One last thing: I think it would great (and involve relatively little dev work) if SSRS supported the creation of Web Slices out of the box. If you agree, please vote: You can download my proof-of-concept SSRS project here: 2009/9/1 DataWarehouse ExplorerContinuing my occasional series of SSAS client tool reviews, here’s another contender in the post-Proclarity power-user market: DataWarehouse Explorer, from Dutch company CNS International. DWE is a standalone, ‘rich client’ application that gives you a lot more functionality than you get in Excel pivot tables and as such is competing in the same market that Proclarity Desktop Professional used to dominate and which is still pretty crowded. There’s also a web-based portal that you can publish reports to (see here for full details on the architecture) but if you want to build queries you need to do it on your desktop. So what’s it like? I liked it: it’s not got any flashy features that mark it out particularly, but it does everything it needs to do and it does so well. Probably the best thing is the UI – a nice Office 2007 look-and-feel and most importantly very clear and easy to use. As someone who has spent plenty of time working with Analysis Services over the last ten years or so, when I start using a new client tool I expect to be able to do what I want to do very easily: I know all the basic concepts of cubes, I know the Adventure Works cube, and I know the queries I want to run, so if I can’t work out how to do something then I lay the blame on the UI design. And if I can’t do something there’s not point expecting an end user to do it. In the case of DWE I had no problems at all and in many respects it’s much easier to use than something like Proclarity or Excel. Here’s a screenshot: The filter dialog provides a good example of how they’ve got the UI right. Filtering is something that every worthwhile client tool needs to do, but it’s easy to make it confusing for the user especially when you’re applying multiple conditions. The DWE filter dialog is uncluttered, shows all the filters you’ve already set up, makes it easy to add new ones or delete existing ones, and has a number of nice touches like the way it automatically formats any numeric conditions you enter to match the format string of the measure you’re filtering on. Other features worth mentioning include:
Overall, then, a good product and one worth evaluating if you’re looking for a desktop-based SSAS client tool. |
|
|