More servicesWindows Live
HomeHotmailSpacesOneCare
 
MSN
Sign in
 
 
Spaces home  Chris Webb's BI BlogProfileFriendsBlogMore Tools Explore the Spaces community

Chris Webb's BI Blog

MDXtreme Programming!
August 27

SSWUG BI Virtual Conference

The SSWUG are organising a virtual BI conference on September 24th-26th for a very reasonable $100:
 
They've got some good speakers and sessions, so it should be worth checking out.

Generating large numbers of partitions using Excel

Quite often when I'm doing proof-of-concept type work I find myself in the situation where I need to build a cube with near-production data volumes, and in order to make sure that performance is good I have to partition that cube. However setting up tens or even possibly hundreds of partitions in BIDS manually is no-one's idea of fun, so how can you automate this process easily? If you're using a SQL Server datasource then you should try using the functionality built into the Analysis Services Stored Procedure Project:
http://www.codeplex.com/ASStoredProcedures/Wiki/View.aspx?title=Partition&referringTitle=Home

But what if you're using an Oracle datasource or hit a bug with the ASSP code? Most consultants have a preferred method (such as their own custom code, or SSIS packages) but I thought I'd blog about the most commonly used approach which is to use Excel. Here are the steps:

  1. Your starting point should be a cube with just one partition in each measure group.I would recommend not putting all your data in this partition, but create it as a slice of the data, the first of the partitions you want to create. So you'll probably want to make it query-bound rather than table bound, and set the Slice property appropriately (see here for why this is important).
  2. Open up SQL Management Studio and expand the tree in the Object Explorer so you can see the partition for your first measure group then right click and Script Partition As -> CREATE To. This will open a new XMLA window and give you your template XMLA command to create a partition.
  3. Let's say we are going to partition by month, we have three months and they have surrogate keys 1 to 3. Our template partition is correctly configured for Month 1 and we want to be able to alter this template for months 2 and 3, so we're going to 'parameterise' the following properties:
    1. ID - so, if your existing ID property is set to "Month_1", we need to replace the 1 with a string we can easily find with a search and replace like "@@@", making the new ID "Month_@@@"
    2. Name - which is usually the same as the ID
    3. Query Definition - you will have a Where clause in the SQL query behind the partition which is something like "Where Month_ID=1" and this should be changed to "Where Month_ID=@@@"
    4. Slice - the tuple will be something like "[Period].[Month].&[1]" which again should be changed to "[Period].[Month].&[@@@]"
  4. Copy this XMLA command text into a cell in a new Excel workbook, say cell A1. Make sure you paste the text into the formula bar and not directly onto the worksheet - you want it all in one cell.
  5. Underneath this cell we're going to use Excel formulas to take this template and generate the XMLA needed for all the partitions we want. In cell A2 enter the value 1, in A3 enter 2 and so on for as many months as you need. Remember in Excel if you enter values like this that increment by 1, if you select that area then drag it downwards Excel will automatically fill the new cells with incrementing values
  6. In cell B2 we're going to use an Excel formula to replace the string @@@ with the value in A2. So something like the following will work:
    =SUBSTITUTE($A$1,"@@@",A2)
  7. You can then copy and drag this formula downwards, and you'll see all your new XMLA commands to create partitions appear in B3 and the cells underneath
  8. Copy and paste the new XMLA into a new XMLA query window in SQL Management Studio
  9. You may find that some unwanted double-quotes have appeared now. You need to replace the double sets of double quotes ("") with (") and the delete the single sets of double quotes. So first do a find and replace on "" and change it to something like @@@, then do a find and replace to delete all instances of ", then do another find and replace to change @@@ to ".
  10. You now need to wrap these XMLA commands in a batch statement so they can be run together. So paste the following text before the first Create:
    <Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
    and this at the very end:
    </Batch>
  11. Now delete your original partition and execute the new XMLA batch command you've just created and hopefully you'll have your new partitions created. You can then process them.
August 20

Post-Holiday Roundup

Ahh, so I'm back from my holiday and feeling much better -even if it did manage to rain every single day while I was away (that's the risk you take with holidays in England). Now all I have to do is get through the massive pile of emails waiting for me and steel myself in preparation for the next few months of hard work... roll on Xmas! Anyway, a few interesting things that happened/thoughts that occurred to me while I was away...

Of course the big thing that happened, the day after I left, was the RTM of SQL2008. Hopefully you've heard this news by now, but the big questions here are: is AS2008 any good? Do I want to migrate, and if so, when? Personally, I've been using it for a few months now on a project and my impressions of it are positive. As I've said before there aren't any really amazing wow features that will make you want to upgrade, but the performance improvements can in some cases be quite significant, the new BIDS is easier to use, and there are a few obscure fixes/changes in behaviour which tie up some loose ends left over from AS2005. Since migration is very, very easy indeed I would encourage you to install it on a test machine if you haven't already and start thinking about moving up. Of course the mantra of 'wait until SP1' is so deeply ingrained in people's minds that most people will want to do exactly that - and there's a lot of sense to that approach, since the first bugs are being found already (see here) but equally there are a fair few known problems with AS2005 SP2 and given the problems that all of the CU releases have (see here for example, and I've heard the same story for every single CU, they create as many new bugs as they fix) I wouldn't recommend them; I suppose you could wait for 2005 SP3 but my feeling is that AS2008 is the better bet.

Meanwhile, in the cloud I see that Good Data have gone into beta, and there's a new, mysterious MDX-queryable (Mondrian-based?) offering that has broken cover called BI Cloud. If I have time, I'll try to check them out. Also on the net http://www.learnmicrosoftbi.com/ seems to have a lot of good videos explaining the basics of AS. And there's a new podcast featuring Richard Tkachuk from the SQLCat team where he talks about the performance improvements in AS2008 and seems to suggest that it's now possible to use hints in MDX with a new function whose name I couldn't make out - I'll post if I get more details.

I've also been thinking some more about the DATAllegro deal. There seems to be some discussion about when something that works with SQL2008 can be released, and the folks at DATAllegro are keen to stress that their architecture allows them to plug in new RDBMSs easily so the implication is that it will be sooner rather than later; clearly the investigation work has been going on for a while, and must somehow tie in with the MatrixDB stuff that got leaked a few months ago. All of this would be good for AS running in ROLAP/HOLAP mode on a MPP SQL Server, but can this technology but I wonder whether it could be made to work with AS in MOLAP mode? I think it could - surely the hooks are already there with the remote partitions/linked measure groups/dimensions stuff. Just conjecture though; I think we'll find out more around the time of PASS and the BI Conference.

Lastly, I've booked my place for Mosha's MDX Deep Dive pre-conf seminar at PASS this year. Who else is going?

August 04

SQLBits and my training day

And before I disappear off on my hols for a few weeks, can I remind you that SQLBits is happening on September 13th and that you really ought to be there? We're just about at the stage of finalising the sessions (and we've got a great BI track lined up) so check http://www.sqlbits.com/default.aspx for more details!
 
As I mentioned before, Allan Mitchell and I will be doing a 1-day pre-conf seminar the day before (September 12th) on the Microsoft BI stack:
http://sqlknowhow.com/training/CWAM20080912/default.htm
It's an introductory session, so if you've got colleagues who want to get a good overview of what BI is and what you can do with the MS tools in this area, then send them along.

Bill Baker leaving MS

Bill Baker, pretty much the top guy in BI at Microsoft since Microsoft first got interested in BI, is leaving the company. Not that I'm reading anything much into the move though - after ten years he's probably looking for a new challenge or some way of spending all that money he's earned.
July 30

Many-to-Many Dimension bug

Jon Axon mailed me recently with an interesting bug he'd come across concerning many-to-many dimensions, where it looks like the AS engine is trying to be a bit too clever for its own good as far as query optimisation goes. It's reproducible on AS2005 SP2 and Katmai RC0, and I logged it on Connect here: https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=357469

Here's an example on Adventure Works that Jon gave me. The following query:

SELECT
[Measures].[Customer Count] on 0
FROM [Adventure Works]

Shows the total number of customers as being 18484. Looking at the Sales Reason dimension (which has a many-to-many relationship with the measure group Customer Count is from), if you run a relational query on the underlying data source you can see that not every order was associated with a sales reason - this is the key point here. Now if we run the query:

SELECT
[Measures].[Customer Count] on 0
FROM [Adventure Works]
WHERE [Sales Reason].[Sales Reason].[All Sales Reasons]

We can see that it returns the same result as the first query, 18484, as you would expect. But what if we want to find the number of customers who specified a sales reason? You would think that the following query would do that:

SELECT
[Measures].[Customer Count] on 0
FROM [Adventure Works]
WHERE [Sales Reason].[Sales Reason].[All Sales Reasons].Children

But it doesn't, it returns 18,484 again. However if you run this query which should be equivalent to the previous query:

SELECT
[Measures].[Customer Count] on 0
FROM [Adventure Works]
WHERE {[Sales Reason].[Sales Reason].&[1],
       [Sales Reason].[Sales Reason].&[2],
       [Sales Reason].[Sales Reason].&[3],
       [Sales Reason].[Sales Reason].&[4],
       [Sales Reason].[Sales Reason].&[5],
       [Sales Reason].[Sales Reason].&[6],
       [Sales Reason].[Sales Reason].&[7],
       [Sales Reason].[Sales Reason].&[8],
       [Sales Reason].[Sales Reason].&[9],
       [Sales Reason].[Sales Reason].&[10]}

It returns the value we're looking for, 17022. Not good! What I suspect is happening here is that when the query optimiser sees the expression [Sales Reason].[Sales Reason].[All Sales Reasons].Children in the Where clause, it assumes that it's equivalent to [Sales Reason].[Sales Reason].[All Sales Reasons] which it is in most cases, just not here. Interestingly if you look in Profiler and run the queries on a cold cache, you see exactly the same activity when [Sales Reason].[Sales Reason].[All Sales Reasons].Children is in the Where clause as when [Sales Reason].[Sales Reason].[All Sales Reasons] is there; but when you explicitly list all of the individual sales reasons, as in the last query, because Sales Reason has to resolve itself through the Internet Sales Order Details dimension which is ROLAP, you see SQL queries being fired off against the relational database.

July 24

Microsoft buys DATAllegro

Woweee, great news: Microsoft is buying the DW appliance vendor DATAllegro! This is big, big news for Microsoft BI and I am super-excited. Curt Monash has a few posts that explain why this is so big on his blog:
See also:
 
I've long wanted to be able to stick Analysis Services in ROLAP/HOLAP mode on top of one of these MPP babies, and maybe now I'll get the chance. Potentially in the long term we might get MPP MOLAP too, which would be even better...

XLCubed (and a rant about Microsoft's client tool strategy)

The other week I stopped by in Maidenhead to see the guys at XLCubed, and to take a look at their latest stuff. XLCubed have been around a long time and their Excel addin AS client has always been one of the best out there, but with the improved Analysis Services support in Excel 2007 (especially with the introduction of 'convert to formulas') and the Proclarity acquisition has put a squeeze on the client tools sector. A lot of the third party client tools out there, XLCubed included, are better in a lot of ways than the equivalent Microsoft offerings but it's often hard to explain to someone who isn't very experienced with Analysis Services what the advantages are and why they represent a good reason to buy a non-Microsoft product. So, in order to survive, you need a clear, unique selling point and XLCubed now have one in the form of Microcharts after they bought Bonavista Systems last year (I blogged about the Microcharts product in its original form here). Microcharts gives you the ability to create sparklines, bullet graphs and other in-cell charts, which is not only impressive when used in conjunction with regular Excel and Reporting Services (with or without AS as a data source) but enters the realm of extreme coolness when you see how it's been integrated with XLCubed.

Here's just one example of the kind of dashboard you can build with XLCubed:

CIODashboard_550X 

You can see a whole page of sample dashboards here:
http://www.xlcubed.com/en/Demo_Overview.html

Nice, eh? I should also mention they have an excellent data visualisation blog that's well worth a read:
http://blog.xlcubed.com/

While on the subject of client tools, can I veer off on a tangent here and criticise Microsoft's strategy in this area? In my opinion (and just about everyone I've met agrees with me, not least disgruntled ex-Proclarity employees) what they've done has actually harmed the core Microsoft BI market over the last two years. Before the Proclarity acquisition it wasn't an ideal situation, for sure, since telling customers that they had to buy their client tools from a third party looked bad. But what Microsoft have done is bought the leading third-party client tool and effectively chucked it in the bin, saying people should use Excel and PerformancePoint instead. Excel 2007 is a good client tool but a) a lot of companies are still on Excel 2003 and before, and are not going to upgrade just for the sake of a BI project, b) it has nowhere near the kind of advanced functionality that the Proclarity desktop tool had and never will, and c) it still has a few glaring problems (see here for example); PerformancePoint too is encouraging but very much a version 1.0. Microsoft's long release cycles for both mean that we have to wait way too long for any upgrade in functionality, and in the meantime we're left with a vacuum: the third party client tool market has been weakened because now all customers will want to use Microsoft client tools as a first choice, but these client tools are not yet up to scratch. Why on earth didn't they carry on developing the Proclarity product line for a few more years until a smoother transition could be made? Why the prejudice against standalone client tools? Once again I'm left with the feeling that senior people in Redmond have little idea what's going on in the real world and more importantly are insulated from the impact that their decisions have on the bottom line. On the positive side, though, Microsoft's actions have given companies like XLCubed the breathing space they needed to innovate and survive.

July 23

Show Hidden Objects in the Calculations tab

Here's a new button I've just noticed in the toolbar on the Calculations tab in BIDS 2008: Show Hidden Objects. Quite often when you're writing MDX you want to reference hierarchies that are hidden to the end user, and in BIDS 2005 you had to unhide them to be able to see them (and so find out their unique name) in the metadata pane, which was a pain; now in BIDS 2008 you just need to click the new 'Show Hidden Objects' button to be able to see them. For some bizarre reason it doesn't seem to allow me to see hidden sets or calculated members though - why?

showhiddenobjects

July 21

Kalido Universal Information Director now generates Analysis Services cubes

Here's the press release from Kalido:
http://www.kalido.com/5a04ea36-c50e-4617-b25d-85ea56a22690/news-and-events-press-center-press-releases-detail.htm

I don't have any direct experience with Kalido's products (although I've heard good things) but I'd be interested to see the cubes it generates. I wonder to what extent it's possible to optimise automatically generated cubes?

View more entries