| Chris 的个人资料Chris Webb's BI Blog日志列表 | 帮助 |
|
2009/3/31 SQL Solutions OLAP Heartbeat and OLAP Performance AdvisorHmm, you wait for years for commercial tools for monitoring Analysis Services (the only one I’d ever seen before was Companion for Analysis Services from SQLMinds) and then two come out at once. One of these tools I’ll be blogging about towards the end of this week, hopefully – I’ve had a sneak preview and it looks very cool – but today I found out the following from SQL Solutions:
I’ll download both and give them a thorough test as soon as I can. EXISTING AND NONEMPTYHere’s an interesting MDX nugget. The other day I was working on something similar to the following query, that finds the count of the number of customers who bought something in the current month as well as the previous month: WITH On my laptop, this runs in almost 16 seconds – not good. What I was doing here was making the assumption that it was a good idea to do the EXISTING first, to filter the set of customers down as much as possible, then do the two NONEMPTYs. In fact, what I found was that by moving the EXISTING to after the NONEMPTYs had been evaluated the query ran much faster: WITH On my laptop now, the above query executes in 3 seconds. Why is this? To be honest, I’m not completely sure… looking in Profiler and Perfmon I see differences, but not many; these things are better left to the likes of Mosha who know what’s going on inside in the formula engine. But anyway I thought I’d post this up because it’s a common form of calculation – so if you are using NonEmpty and Existing together, do the NonEmpty first and then the Existing. 2009/3/29 SQLBits IVSo, yet another SQLBits finished, and judging by the feedback we got it was the best yet. Certainly we had our highest ever attendance and the sponsors all seemed very happy (which is important because, after all, they’re the ones paying for it); we’ve also had a number of very kind emails thanking us too. Now for a few months of peace and quiet before we start planning for the next one, which I suppose will be sometime in the autumn. Any suggestions on where to hold it? Bristol, Oxford, Leeds? Incidentally, if you took any photos or videos at SQLBits why not share them on the SQLBits Facebook group? There’s also a SQLBits LinkedIn group and the #sqlbits tag on Twitter that you might want to check out too. Oh, and while you’re at it why not join the UK SQL Server User Group Facebook group as well? We’re planning some BI user group events soon, I promise. I suspect the current economic problems have had a lot to with the success of SQLBits. Apart from the fact that a free conference is the perfect answer to a reduced training budget, I guess everyone’s worried about job security and therefore keen to invest in their own education - Simon Munro wrote a good blog entry on exactly this topic last week which is worth a read. However, I found it interesting talking to other BI professionals at the conference (and yes, networking is another great reason to come along) about their pipelines because everyone was saying how busy they were at the moment. When the recession hit I was sceptical of predictions that the BI sector would survive unscathed but, although I’ll admit I’ve not been as busy this year as I was last, I’m still doing more than OK. Looking at the visitor stats for this blog and my company website, I’ve noticed hits from a number of companies that have been in the news for their, ahem, financial problems – and the conclusion I draw is that there are a lot of organisations out their who have realised how little they know about their own financial situation and hope BI software is the answer. 2009/3/26 Aleri Live OLAPThere’s no real reason for this post except to indulge my trainspotterish habit of making a note whenever I find another OLAP product that supports MDX. From a recent post on Curt Monash’s blog I came across Aleri Live OLAP (see also here), an MDX-capable OLAP server that works on streaming data coming from their complex event processing platform. 2009/3/23 SDS, Analysis Services in the cloud, and local cubes?I was just reading through a list of questions and answers about the new, more SQL Server-like SDS, on the SDS team blog and had a thought. Here’s three points that are made in the post: When? or to quote JamieT “When do I get to party on this new service”? We’re on track to deliver a public CTP mid-calendar year 2009 and ship in the second half of calendar year 2009. The blog entry states “If it works with SQL Server, it will largely work with SQL Data Services.”. That word “largely” bothers me a little – it suggests the functionality is going to be reduced slightly. Details please? We will be providing documentation soon on what is and is not supported in SDS. I’ll post an entry to the blog once the guidance is available and you can also keep an eye out for it on our MSDN Dev Center. But, to answer the question – We say *largely* due to the fact that there are things that just don’t apply in a cloud based world like setting the location of a data or log file or making server wide configuration changes. In v1 we expect to deliver a surface area that will support the vast majority of SQL Server database applications. Will you offer hosted SSIS/SSAS/SSRS? It’s on the product roadmap, but I can’t comment on specifics or timing. So, we’ll get a CTP in a few months, it’s going to be mostly compatible with existing SQL Server apps, but we’re not going to get Analysis Services in the cloud just yet. What can we do while we’re waiting for cloud-based SSAS then? Well…
2009/3/19 Query behaviour when you cross the LowMemoryLimitHere’s something that had me stumped for quite a long time earlier this week… I’m doing some performance tuning work for a customer at the moment, and following my usual approach I took a test query, cleared the cache, ran it and noted how long it ran for and what I saw in Profiler and MDX Studio. However this time I saw some strange things happening: I’d run the same query on a cold cache more than once and the query times would differ by anything up to a minute; even more confusing, the information I got from Profiler and MDX Studio would be different too – for example, the number of Query Subcube Verbose events in recorded in Profiler for a query might differ from run to run by several hundred. This had me completely confused. There was no-one else on the server and I was running on a cold cache. What was going on? It turned out that I was running out of memory. I was working on a 32-bit box with a lot of databases on, so even on a cold cache the amount of memory used by SSAS was approaching the LowMemoryLimit threshold (see Marco’s post here, the AS2005 Performance Guide and this white paper for an explanation of what this is). When I ran the query – which had a lot of calculations - memory usage went above the LowMemoryLimit so the cleaner thread was waking up and evicting elements of shrinkable memory, which presumably included data in cache that had been put there by the current query. Since the exact moment that this threshold was crossed would vary from run to run, and as I assume that what the cleaner thread would evict from memory would also vary, this explained the different query times and Profiler activity. As soon as I moved onto a 64-bit box with no other databases this stopped happening. Of course nowadays I would expect that most people are running on 64-bit so it’s much more unlikely that you’ll ever cross the LowMemoryLimit, but I do see it happen from time to time. So if you’re in the same position and suffer from inconsistent query performance, check your memory usage! 2009/3/12 BI Survey 8As I think I've probably said before, I don't get many freebies as a blogger but one that I do get and I really appreciate is my review copy of the BI Survey. It being that time of year, I got the latest edition - the BI Survey 8 - last week along with an email encouraging me to blog about it, and who am I to refuse a request to blog about something as fascinating as this? It's a whopping 489 pages long so I can't even begin to summarise it, but there are some points regarding Microsoft BI that I'd like to pick up on. In general, the findings show the Microsoft BI stack as a solid and successful suite, extremely good value for money, but by no means a stellar performer. There's a long section at the end of the report showing a wide range of technical and project-related KPI ratings (for example query performance, business benefits achieved, quality of support, cost of deployment and so on) and in almost every respect Analysis Services and Reporting Services come out in the middle of the rankings. This doesn't really surprise me much: let's face it, what we've got works well, but from a technical perspective there's not been much new and exciting in the world of Microsoft BI for a while now (although with Gemini and Madison coming soon that will change), the platform in general still has some glaring holes and overlaps, and as the PerformancePoint debacle showed recently Microsoft's overall BI strategy is somewhat confused. In fact, I suspect Microsoft's entire BI strategy is not a BI strategy at all but a get-people-to-upgrade-to-the-latest-version-of-Office strategy, but I digress... For Analysis Services in particular, the survey showed 15% of those surveyed were on AS2K, 79.6% were on AS2005 and 3.5% on AS2008. Given that most people must have been surveyed last summer, well before the RTM of AS2008 that's pretty good and with the migration path from 2005 to 2008 very smooth I should think the AS2008 percentage will look much better next year. Analysis Services remains the top-ranked BI tool used against Microsoft's own databases, as you'd expect since it's essentially free with SQL Server, but it also comes in a close third for Oracle (while Oracle's own BI tools come in 12th!), second for IBM, third for Teradata and top for Open Source databases. In terms of the client tools used with Analysis Services, just over 70% of people were using one of Microsoft's own tools - mostly pivot tables and Reporting Services. 25% of SSAS users had Proclarity, and interestingly there were more people using the old free Excel addin than PerformancePoint , and a sizeable minority still using Data Analyzer. There's clearly a lot of demand for a client tool from Microsoft but from the looks of things most people are still stuck with Office 2003; this just adds weight to my argument that Microsoft coupling its BI strategy so closely to Office might help Office adoption rates but has a serious negative impact on the success of BI strategy itself... sorry... there I go again... As far as third-party tools go Panorama is still the #1 vendor, but it only has an 11% share and only 22% of SSAS users overall had any kind of third party tool in use. Incidentally Panorama was treated as a BI vendor in its own right for the purposes of the survey and came out top in a lot of the high-level KPIs including the overall KPI ranking - they obviously have some very enthusiastic customers. So who, apart from Panorama, seems to be doing well? Qlikview certainly is, which at least validates Microsoft's decision to go after that market with Gemini; Microstrategy does well too. Essbase comes out badly with great query performance offset by poor support and product quality; Cognos, apart from TM1, doesn't seem to do too well either. Clearly it's the vendors who are 100% focused on BI that are the most successful, which is as you'd expect. 2009/3/10 SDS: the new relational features announcedAfter all the rumours, here's the official announcement of the new relational features that are coming to SQL Data Services: Given that the team have already made noises about adding BI features to SDS soon, I can't wait to see what form they'll take. Of course there are already lots of ways of doing BI with data stored online as my last blog entry showed; there are also couple of startups like Birst and GoodData who do very sophisticated BI things in the cloud already. But I hope Microsoft has something up its sleeve, and that I can run an MDX query against it... Guardian Data Store - free data, and some ideas on how to play with itI was reading the Guardian (a UK newspaper) online today and saw that they have just launched something called Open Platform, basically a set of tools that allow you to access and build applications on top of their data and content. The thing that really caught my eye was the Data Store, which makes available all of the numeric data they would usually publish in tables and graphs in the paper in Google Spreadsheet format. Being a data guy I find free, interesting data irresistible: I work with data all day long, and building systems to help other people analyse data is what I do for a living, but usually I'm not that interested in analysing the data I work with myself because it's just a company's sales figures or something equally dull. However give me information on the best-selling singles of 2008 or crime stats for example, I start thinking of the fun stuff I could do with it. If you saw Donald Farmer's fascinating presentation at PASS 2008 where he used data mining to analyse the Titantic passenger list to see if he could work out the rules governing who survived and who didn't, you'll know what I mean. Given that all the data's in Google Spreadsheets anyway, the first thing I thought of doing was using Panorama's free pivot table gadget to analyse the data OLAP-style (incidentally, if you saw it when it first came out and thought it was a bit slow, like I did, take another look - it's got a lot better in the last few months). Using the data I mentioned above on best-selling singles, here's what I did to get the gadget working:
Of course, this isn't the only way you can analyse data in Google spreadsheets. Sisense Prism, which I reviewed here a few months ago, has a free version which can connect to Google spreadsheets and work with limited amounts of data. I still have it installed on my laptop, so I had a go connecting - it was pretty easy so I won't go through the steps, although I didn't work out how to get it to recognise the column headers as column headers and that polluted the data a bit. Here's a screenshot of a dashboard I put together very quickly: Lastly, having mentioned Donald Farmer's Titanic demo I thought it would be good to do some data mining. The easiest way for me was obviously to use the Microsoft Excel data mining addin: there are two flavours of this: the version (available here) that needs to be able to connect to an instance of Analysis Services, and the version that can connect to an instance of Analysis Services in the cloud (available here; Jamie MacLennan and Brent Ozar's blog entries on this are worth reading, and there's even a limited web-based interface for it too). Here's what I did:
Here's what I got out: From this we can see very clearly that if you're from the UK or under 25 you're much more likely to be producing Pop, Groups are more likely to produce Rock, and various other interesting facts. So, lots of fun certainly (at least for a data geek like me), but everything I've shown here is intended as a serious business tool. It's not hard to imagine that, in a few years time when more and more data is available online through spreadsheets or cloud-based databases, we'll be doing exactly what I've demonstrated here with that boring business data you and I have to deal with in our day jobs. 2009/3/4 Analysis Services and the System File CacheEarlier this week Greg Galloway sent me an email about some new code he'd added to the Analysis Services Stored Procedure Project to clear the Windows system file cache: I thought this was quite interesting: several times when I've been doing performance tuning I've noticed that the same query, running on a cold Analysis Services cache, runs much slower when the cube has just been processed. This I put down to some form of caching happening at the OS level or below that; I also vaguely knew that it was a good idea to limit the system file cache, having seen the following on Jesse Orosz's blog: Anyway, doing some more research on this subject I came across the following blog entry that discusses the problem of excessive caching in more detail: Has anyone got any experience with this? From what I can see, installing the Dynamic Cache Service on a 64-bit SSAS box with a big cube on looks like a good idea - has anyone tried it? If you have, or are willing to, can you let me know how you get on? Comments are welcome... |
|
|