Chris's profileChris Webb's BI BlogBlogLists Tools Help

Blog


    6/29/2009

    Kapow Technologies

    In all of the Gemini demos seen so far, a big thing is made of the fact that it can be used to integrate data from outside sources (eg some stuff you found on the web) with internal data (eg data from your data warehouse). This is all very well, but it assumes you have some way of actually capturing the data that’s out there on the web in a usable form and automating the capture of new data from these sources on a regular basis.

    For example, if you’ve found a table containing some data you want on a website, what do you need to do to actually use it? You’d need to copy the table and then paste it into Excel perhaps; you’d then need to do some reformatting and editing, copy it again and then paste it to wherever you need it such as Gemini. All very manual; in fact, a right pain and not something you’d want to do on a regular basis. There might be an RSS or an Atom feed, and you might be able to use a tool like Yahoo Pipes (see also Jamie Thomson’s recent post), but there isn’t always one and even when there is it might not contain all the data you need or be in the right format.

    Last week I had a call with Kapow Technologies, and they’ve got a really cool tool that addresses this problem:
    http://kapowtech.com/index.php/solutions/web-and-business-intelligence
    It’s something like a cross between a screenscraper and an ETL tool, and it allows you to harvest data from web pages, do some transformation and cleansing, and then move it somewhere else like a database table (where more traditional ETL tools can take over) or another application. There’s a bit of background on this type of technology here:
    http://en.wikipedia.org/wiki/Web_scraping

    I got hold of an eval version and had a play with it, and while I’m hardly the right person to give an expert review it does seem very powerful. It’s certainly easy to use: I had a simple scenario working within twenty minutes, and I think anyone with some SSIS experience will find it fairly straightforward. First of all you have to declare a set of objects (which can then be mapped onto relational tables) that will hold the data you want to collect; you then create a robot that will harvest data from web pages and load it into the objects; finally you can run the robot either from the command line or on a schedule. More information can be found in the help:
    http://help.kapowtech.com/7.0/index.jsp

    In my example, I took a web page from the BBC website that shows the weather where I live:
    http://news.bbc.co.uk/weather/forecast/2302?&search=amersham&itemsPerPage=10&region=uk&area=Amersham

    I wanted to harvest the time, the basic outlook and the temperature from the first box in the “first 24 hours” section:

    image

    Here’s a screenshot of what that page looks like in the RoboMaker tool:
    image

    You can see that I’ve selected the tag containing the time in the central browser window, and that it will be mapped to the attribute weather.forecastTime.

    I think the thing that I find really exciting about this tool is that, when you think about it, there is a whole load of useful data out there on the web that would be great to use as part of your BI if you can extract it quickly and easily - and indeed legally, because I suspect there might be some legal objections to doing this on a large scale. One of the examples Kapow gave me was of a customer that harvested comments on their products from places like Amazon, then fed that data through a text mining application, to monitor public opinion on their products (today’s Dilbert shows how this information could be used!); I would have thought that this is something a lot of companies would like to do. The weather forecasts I was looking at could also be useful for predicting retail sales in the short term; if you’re an online retailer you’d probably want to compare your prices for certain products with those of your competitors. I’m sure we’ll be seeing much more use of web data in BI, captured using tools like this, in the future…

    6/26/2009

    SAP, MDX and the right language for BI

    Via Amyn Rajan’s blog I’ve learnt about a forthcoming MDX book, “MDX Reporting and Analytics with SAP”, and also seen some encouraging news on the SAP Business Objects Roadmap where MDX will be a core element. As an MDX guy – and more specifically as someone who make his living from MDX – I’m pleased to see this level of support for MDX from someone other than Microsoft and Mondrian. Even if I never do any consultancy work on SAP MDX (although I’d be interested since I hear rates are much higher than for Microsoft work!) I have a vested interest in the popularity of the MDX language in general. Yes, it’s difficult to learn and sometimes inconsistent, but I am 100% convinced it’s the best language for multidimensional queries and calculations and much better than SQL.

    To a certain extent we’re all guilty of promoting the technology we know and love as the way of solving all problems; and whenever we come across a problem our technology can’t solve easily, the temptation is to extend it and improve it so that it can solve the problem. However in the case of SQL and BI there comes a point where no amount of extensions and improvements can make it do what you want it to do efficiently and easily. To paraphrase Dr Johnson, a complex BI-type calculation done in SQL is like a dog walking on its hind legs. It is not done well, but you are surprised to find it done at all. There are plenty of vendors and bloggers who would disagree with me on this though, I know.

    One good example of where MDX scores over SQL is that it inherently knows about order on hierarchies: for example it knows (because you told it when you built your dimension) that Monday comes after Sunday, that 2009 comes after 2008, that June comes after May. This makes doing time-based calculations very easy, and time based calculations are part of just about every BI solution. The ability to order a set and then work with it is also important for other types of problem, such as the grouped string concatenation puzzle that Adam Machanic blogged about recently. In MDX, to solve this problem and get an ordered list of product names, you just need to order a set and then use the Generate function, for example:

    Generate(
    Order(
      NonEmpty(
       [Product].[Product].[Product].MEMBERS
      ,[Measures].[Internet Sales Amount])
    ,[Product].[Product].CurrentMember.Name,BASC)
    ,[Product].[Product].CurrentMember.Name,",")

    But I’m getting carried away again. Arguing the merits of a language is all very well, but it’s commercial support that is what actually matters of course. And to get back to my original point that’s why I’m pleased to see that SAP seems to agree with me that MDX is a good thing.

    6/22/2009

    Speaking at the PASS Summit 2009

    The agenda for the PASS Summit 2009 has been announced, and I’m pleased to say that I’ve been accepted as a speaker. A full listing of who’s speaking can be found here:
    http://summit2009.sqlpass.org/Agenda/ProgramSessions.aspx
    http://summit2009.sqlpass.org/Agenda/SpotlightSessions.aspx

    I’ll be doing a session on ‘Designing Effective Aggregations in Analysis Services 2008’. Hope to see some of you there!

    6/19/2009

    Error messages in MDX SELECT statements and what they mean

    Anyone that has tried to learn MDX will know that, when you make a mistake somewhere in your code, the error messages that Analysis Services gives you are pretty unhelpful. It was suggested to me recently while I was teaching an MDX course that I should blog about common error messages and what they actually mean; so here’s a list of a few example queries using Adventure Works that return confusing errors, the error messages themselves, and details on how to solve the problems. I’ve deliberately concentrated on query-related errors rather than calculation-related errors (that can be a future blog post); if you can think of any more errors that I should cover please leave a comment.

    1) Query causing error:

    SELECT
    {[Measures].[Internet Sales Amount]} ON COLUMNS 
    [Date].[
    Calendar Year].MEMBERS ON ROWS
    FROM [Adventure Works]

    Error message: Query (3, 1) Parser: The syntax for '[Date]' is incorrect.

    The first step to solving this fairly simple syntax error is understanding the values in brackets in the error message. (3,1) indicates that the error is at character 1 on the third line of the query, where we have the expression [Date].[Calendar Year].MEMBERS; we should also see a red squiggly underneath this text in SQL Management Studio. There’s nothing wrong with this expression though, apart from the fact that it’s in the wrong place: what has happened is that we’ve forgotten to include a comma after COLUMNS immediately beforehand. If we put one in, the query runs.

    Solution:

    SELECT
    {[Measures].[Internet Sales Amount]} ON COLUMNS, 
    [Date].[
    Calendar Year].MEMBERS ON ROWS
    FROM [Adventure Works]

     

    2) Query causing error:

    SELECT
    {[Measures].[Internet Sales Amount]} ON COLUMNS,
    [Date].[Calendar].[Calendar Year].MEMBERS.CHILDREN ON ROWS
    FROM [Adventure Works]

    Error message: Query (3, 1) The CHILDREN function expects a member expression for the 1 argument. A tuple set expression was used.

    This is a very common error that people encounter while learning MDX, and it all comes down to understanding the difference between sets, tuples and members. In a lot of situations Analysis Services is very forgiving: if it expects a set and you give it a single member, then it will cast that member into a set with one item in it for example. It can’t do this for you all the time, though, and you do need to understand what kind of object each function returns and/or expects for a parameter. In this case, the problem is that the .CHILDREN function needs to be passed a member and the .MEMBERS function returns a set (strictly speaking, as the error says, it’s a set of tuples); therefore we can’t use the two functions together. If we want to find all of the children of all years, we can use the DESCENDANTS function instead, which can accept a set as its first parameter.

    Solution:

    SELECT
    {[Measures].[Internet Sales Amount]} ON COLUMNS,
    DESCENDANTS(
    [Date].[Calendar].[Calendar Year].MEMBERS
    , [Date].[Calendar].[Calendar Semester])
    ON ROWS
    FROM [Adventure Works]

     

    3) Query causing error:

    SELECT
    [Measures].[Internet Sales Amount], [Measures].[Internet Tax Amount] ON COLUMNS,
    [Date].[Calendar].[Calendar Year].MEMBERS
    ON ROWS
    FROM [Adventure Works]

    Error message: Parser: The statement dialect could not be resolved due to ambiguity.

    Analysis Services supports no less than three query languages: MDX, DMX and a very limited subset of SQL. As a result, when you run a query it needs to work out what query language you’re using and can easily get confused if you make a mistake. In the query above we’ve given a list of the two measures we want to see on the columns axis, but we’ve forgotten to surround this list in braces to turn it into a set – and it’s a set that is required for the axis definition. This is an error that is commonly made by people with a background in SQL, and indeed the problem here is that the error has made the query look a bit too much like SQL or DMX. Putting in braces where they’re needed fixes the problem and removes the ambiguity.

    Solution:

    SELECT
    {[Measures].[Internet Sales Amount], [Measures].[Internet Tax Amount]} ON COLUMNS,
    [Date].[Calendar].[Calendar Year].MEMBERS
    ON ROWS
    FROM [Adventure Works]

    4) Query causing error:

    SELECT
    {[Measures].[Internet Sales Amount1], [Measures].[Internet Tax Amount]} ON COLUMNS,
    [Date].[Calendar].[Calendar Year].MEMBERS
    ON ROWS
    FROM [Adventure Works]

    Error message: Query (2, 2) The member '[Internet Sales Amount1]' was not found in the cube when the string, [Measures].[Internet Sales Amount1], was parsed.

    A fairly straightforward error this: we’ve tried to reference a member that doesn’t exist in our query - it’s the extra 1 on the end of the name that’s the problem. The way to avoid this is to always let Analysis Services generate unique names for you, and you can do this by dragging the member (or any other object) from the metadata pane in SQL Management Studio into the MDX query pane when you’re writing queries. Here, using the correct member unique name solves the problem.

    Solution:

    SELECT
    {[Measures].[Internet Sales Amount], [Measures].[Internet Tax Amount]} ON COLUMNS,
    [Date].[Calendar].[Calendar Year].MEMBERS
    ON ROWS
    FROM [Adventure Works]

    Note that for dimensions other than the Measures dimension, what happens in this scenario depends on how you’ve set the MDXMissingMemberMode property. By default if you write something that looks like it could be an MDX unique name, but which isn’t actually the unique name of a member on a hierarchy, Analysis Services will simply ignore it. So the following query returns nothing on rows because the year 2909 doesn’t exist in our Calendar hierarchy:

    SELECT
    {[Measures].[Internet Sales Amount], [Measures].[Internet Tax Amount]} ON COLUMNS,
    {[Date].[Calendar].[Calendar Year].&[2909]}
    ON ROWS
    FROM [Adventure Works]

    And worse, the in this query a genuine syntax error is completely ignored too:

    SELECT
    {[Measures].[Internet Sales Amount], [Measures].[Internet Tax Amount]} ON COLUMNS,
    [Date].[Calendar].[Calendar Year].MAMBERS
    ON ROWS
    FROM [Adventure Works]

    6/18/2009

    Microsoft .NET Provider for Oracle dead

    Products, die, come back to life, die again… Anyway, those of you building cubes from Oracle data sources may be interested to know that the Microsoft .NET provider for Oracle has been killed off. Here’s the official announcement-disguised-as-a-blog-entry:
    http://blogs.msdn.com/adonet/archive/2009/06/15/system-data-oracleclient-update.aspx

    Yet again, it’s those persuasive “customers, partners, and MVPs” who are asking for the product to be canned. I just hope that there aren’t any “customers, partners, and MVPs” who are asking for Steve Ballmer to sell Microsoft to Google for $1, because then we’ll all be in trouble... Anyway, needless to say, not everyone is pleased:
    http://www.theregister.co.uk/2009/06/18/microsoft_kills_oracle_connector/

    One blog entry I’ve had half-written for a long time is a discussion of the problems faced when building cubes from Oracle data sources (thanks for your help on this David; anyone else with any hints, please leave a comment). There are a lot of bugs and data type issues that need to be dealt with, and from what I’ve gathered using the Microsoft .NET provider was part of how they could be worked around. 

    PerformancePoint Planning – Back from the Dead!

    Here’s the official announcement from http://www.microsoft.com/bi/partners/default.aspx :

    Financial Planning Accelerator

    Microsoft is pleased to make available the Financial Planning Accelerator (FPA). The FPA is source code and project files derived from the PerformancePoint Server 2007 Planning module. Based on requests from customers and partners, we are making this code available on a no-cost, individual license.
    This is unsupported source code that customers and partners can use to support or change PerformancePoint Server Planning functionality. Derived object code files can be distributed to end users with Microsoft SharePoint Server Enterprise Client Access Licenses. To obtain access to the FPA a license agreement between Microsoft and the customer or partner is required. After that agreement is in place, download instructions will be made available.
    Please e-mail
    fpasupp@microsoft.com to request the agreement.

    It’s not exactly open source, but it does mean that the partners who were hit hardest when PerformancePoint Planning was killed off can now get their hands on the source code, modify it and sell it on to their customers so long as those customers have the right Sharepoint licences. The question is now, will anyone take Microsoft up on this offer?

    6/16/2009

    Implementing Analysis Services Drillthrough in Reporting Services

    For some reason I’ve never needed to implement Analysis Services drillthough (note: not the same thing as Reporting Services drillthrough; why can’t they use consistent terminology?) in a Reporting Services report. Of course, Reporting Services support for Analysis Services being what it is, it’s not a straightforward task and since I’ve recently come across a few good blog posts that discuss the different ways you can do it I thought I’d link to them.

    The main problem is that you can’t execute an MDX Drillthrough statement using the MDX query designer and the Analysis Services data source. You have four options then:

    1. You can execute the Drillthrough statement through an OLEDB data source instead. Gurvan Guyader shows how to do this in the following blog entry (in French, but with lots of screenshots):
      http://gurvang.blogspot.com/2009/05/drillthrough-ssas-dans-ssrs.html
      The problem with using an OLEDB data source is that you lose the ability to use parameters and have to use Reporting Services expressions to dynamically built your Drillthrough statement instead.
    2. It turns out you can also execute a Drillthrough statement by pretending it’s DMX, and so use regular MDX parameters, as Francois Jehl describes here (also in French):
      http://fjehl.blogspot.com/2009/06/drillthrough-ssas-dans-ssrs-ajout-au.html
    3. If you buy Intelligencia Query (which, as always, I need to state that I have a financial interest in) then Drillthrough statements now work with no tricks necessary:
      http://andrewwiles.spaces.live.com/Blog/cns!43141EE7B38A8A7A!562.entry
    4. Last of all, you can try not using a Drillthrough statement at all and use an MDX query instead to get the same data. You will lose some functionality though by doing this, however, most notably the MAXROWS option.
    6/12/2009

    Google Fusion Tables

    Well, well, well… another week, another BI-related announcement from Google. Jamie Thomson just brought my attention to Google Fusion Tables which got released this week with almost no fanfare (maybe Google wanted to avoid the kind of backlash they got with Google Squared?). Jamie’s first comment was pretty much inline with what I thought: this looks a lot like a basic version of Gemini, or indeed any other DIY BI tool. Basically you upload data, you can filter it, aggregate it, edit it and even join datasets together; then you can format the results as tables, maps, charts and so on and share the results with other people. You can find out more about how it works here:
    http://tables.googlelabs.com/public/faq.html
    http://googleresearch.blogspot.com/2009/06/google-fusion-tables.html

    So, even though I’ve got loads to do today I had to check it out, didn’t I? Google provide a number of different free datasets for you to play with, but I thought I’d have a go with some data about the hot topic of the moment here in the UK: MP’s expenses. This data is available in Google spreadsheet form – ideal for loading into Fusion Tables – from the Guardian data store site:
    http://www.guardian.co.uk/news/datablog/2009/may/08/mps-expenses-houseofcommons

    After a bit of trial and error (and Fusion Tables is definitely prone to errors – although of course it is a beta) I managed to create a view that shows the average value of MP’s expense claims, excluding travel expenses, as a bar chart. I'm supposed to be able to share it here and I've got the HTML, but at the time of writing I can't get the gadget to embed in this blog post. When I do, I'll update this post to include it. In the meantime here’s a screenshot:

    image  

    Nevertheless, it's fun even if it’s not quite a useful business tool yet. But hmmm… is it just me or does Google have some kind of BI strategy?

    UPDATE: this article has a little more detail on the technology behind it:
    http://www.itworld.com/saas/69183/watch-out-oracle-google-tests-cloud-based-database
    although I think it's a bit premature to say that this is going to kill Oracle, Microsoft and IBM...

    6/4/2009

    Google Wave, Google Squared and Thinking Outside the Cube

    So, like everyone else this week I was impressed with the Google Wave demo, and like everyone else in the BI industry had some rudimentary thoughts about how it could be used in a BI context. Certainly a collaboration/discussion/information sharing tool like Wave is very relevant to BI: Microsoft is of course heavily promoting Sharepoint for BI (although I don’t see it used all that much at my customers, and indeed many BI consultants don’t like using it because it adds a lot of extra complexity) and cloud-based BI tools like Good Data are already doing something similar. What it could be used for is one thing; whether it will actually gain any BI functionality is another and that’s why I was interested to see the folks at DSPanel not only blog about the BI applications of Wave:
    http://beyondbi.wordpress.com/2009/06/01/google-wave-the-new-face-of-bi/
    …but also announce that their Performance Canvas product will support it:
    http://www.dspanel.com/2009-jun-02/dspanel-performance-canvas-adds-business-intelligence-to-google-wave/
    It turns out that the Wave API (this article has a good discussion of it) makes it very easy for them to do this. A lot of people are talking about Wave as a Sharepoint-killer, and while I’m not sure that’s a fair comparison I think it’s significant that DSPanel, a company that has a strong history in Sharepoint and Microsoft BI, is making this move. It’s not only an intelligent, positive step for them, but I can’t help but wonder whether Microsoft’s encroachment onto DSPanel’s old market with PerformancePoint has helped spur them on. It’s reminiscent of how Panorama started looking towards SAP and Google after the Proclarity acquisition put them in direct competition with Microsoft…

    Meanwhile, Google Squared has also gone live and I had a play with it yesterday (see here for a quick overview). I wasn’t particularly impressed with the quality of the data I was getting back in my squares though. Take the following search:
    http://www.google.com/squared/search?q=MDX+functions#
    The first results displayed are very good, but then click Add Next Ten Items and take a look at the description for the TopCount function, or the picture for the VarianceP function:
    squared

    That said, it’s still early days and of course it does a much better job with this search than Wolfram Alpha, which has no idea what MDX is and won’t until someone deliberately loads that data into it. I guess tools like Google Squared will return better data the closer we get to a semantic web.

    I suppose what I (and everyone else) like about both of these tools is that they are different, they represent a new take on a problem, unencumbered by the past. With regard to Wave, a lot of people have been pointing out how Microsoft could not come up with something similar because they are weighed down by their investment in existing enterprise software and the existing way of doing things; the need to keep existing customers of Exchange, Office, Live Messenger etc happy by doing more of the same thing, adding more features, means they can’t take a step back and do something radically new. Take the example of how, after overwhelming pressure from existing SQL Server users, SQL Data Services has basically become a cloud-based, hosted version of SQL Server with all the limitations that kind of fudge involves. I’m sure cloud-based databases will one day be able to do all of the kind of things we can do today with databases, but I very much doubt they will look like today’s databases just running on the cloud. It seems like a failure of imagination and of nerve on the part of Microsoft.

    It follows from what I’ve just said that while I would like to see some kind of cloud-based Analysis Services one day, I would be more excited by some radically new form of cloud-based database for BI. With all the emphasis today on collaboration and doing BI in Excel (as with Gemini), I can’t help but think that I’d like to see some kind of hybrid of OLAP and spreadsheets – after all, in the past they were much more closely interlinked. When I saw the demos of Fluidinfo on Robert Scoble’s blog I had a sense of this being something like what I’d want, with the emphasis more on spreadsheet than Wiki; similarly when I see what eXpresso is doing with Excel collaboration it also seems to be another part of the solution; and there are any number of other tools out that I could mention that do OLAP-y, spreadsheet-y type stuff (Gemini again, for example) that are almost there but somehow don’t fuse the database and spreadsheet as tightly as I’d like. Probably the closest I’ve seen anyone come to what I’ve got in mind is Richard Tanler in this article:
    http://www.sandhill.com/opinion/daily_blog.php?id=45
    But even then he makes a distinction between the spreadsheet and the data warehouse. I’d like to see, instead of an Analysis Services cube, a kind of cloud-based mega-spreadsheet, parts of which I could structure in a cube-like way, that I could load data into, where only I could modify the cube-like structures containing the data, where I could define multi-dimensional queries and calculations in an MDX-y but also Excel-y  and perhaps SQL-y type way – where a range or a worksheet also behaved like a table, and where multiple ranges or worksheets could be joined, where they could be stacked together into multidimensional structures, where they could even be made to represent objects. It would also be important that my users worked in essentially the same environment, accessing this data in what would in effect be their own part of the spreadsheet, entering their own data into other parts of it, and doing the things they love to do in Excel today with data either through formulas, tables bound to queries, pivot tables or charts. The spreadsheet database would of course be integrated into the rest of the online environment so users could take that data, share it, comment on it and collaborate using something like Wave; and also so that I as a developer could suck in data in from other cloud-based data stores and other places on the (semantic) web – for example being able to bind a Google Square into a range in a worksheet.

    Ah well, enough dreaming. I’m glad I’ve got that off my chest: some of those ideas have been floating around my head for a few months now. Time to get on with some real work!

    6/1/2009

    Upcoming BI User Group Events

    Two UK SQL Server user group dates to flag up for anyone with an interest in BI:

    Unfortunately I’m going to be out of the country on the 10th, but it looks like it will be a good evening…

    5/27/2009

    BeginRange and EndRange connection string properties

    Using the Timeout connection string property is a good way of making sure that your queries don’t run for too long, but sometimes – for example when you’re using SSRS – you want to restrict the amount of data that a query returns. You can’t properly do this with Analysis Services, but it is almost possible…

    Consider the following query on Adventure Works:

    SELECT
    {[Measures].[Internet Sales Amount],[Measures].[Internet Tax Amount]}
    ON 0,
    [Date].[Date].[Date].MEMBERS
    ON 1
    FROM [Adventure Works]

    It returns 1189 rows and 3 columns. If you click on any of the cells containing data in SQL Management Studio, to see the cell properties, you’ll see that the CellOrdinal property contains the index of each cell in the cellset. So the top left hand cell is ordinal 0, the one to its right is 1, and so on until the last column where it starts again one row down:

    image

    Using the BeginRange and EndRange connection string properties, you can limit the cells in a cellset that actually get populated with data. Note that you can’t restrict the overall number of cells though, which would be more useful. Both these properties take an integer value which represents a cell ordinal: BeginRange is the first cell ordinal you want to contain data, EndRange is the last cell ordinal. Their default value is –1, which for BeginRange means start at the first cell ordinal and for EndRange means end at the last cell ordinal. So, for example, with BeginRange=4 and EndRange=7, running the query above would give the following output:

    image

    As I said, the overall number of cells in the cellset remains the same, but only the cells in the range we specified actually contain data. This ‘filtering’ happens after the query axes have been resolved, as far as I can see, so adding NON EMPTY on Rows for example does not filter out any of the empty rows. If you were using SSRS, however, you could do this filtering at the DataSet level.

    If you look in Profiler you’ll see that these properties have an affect on the amount of work SSAS does at query time. On a cold cache, with no BeginRange and EndRange set, the query scans all of the year partitions in the Internet Sales measure group as you would expect. But with BeginRange and EndRange set as above, on a cold cache SSAS only reads data from the 2001 partition.

    BTW, remember that if you’re experimenting with these connection string properties in SQLMS, when you’re finished you’ll need to either close and reopen SQLMS or set BeginRange=-1 and EndRange=-1 as a result of this bug (which still doesn’t seem to be fixed in SP1).

    5/21/2009

    Yet more Gemini demos dissected

    Some more Gemini demos have appeared on the BI Blog, with more new Gemini features revealed, so let’s step through them and see what we can see…

     
    • 2:45 Nothing much so far we haven’t seen already. However the toolbars are much easier to see in this video and the first thing pointed out is the list of tables loaded into Gemini listed at the bottom of the screen.
    • 3:15 We can also see a lot more of the toolbar at the top here too. In the ribbon we can see the following areas:
      • New Table, with buttons to import new tables from a database or from the clipboard
      • Table Tools, with buttons to create relationships between tables and to manage relationships. So far I’m getting a very strong feeling of relational database concepts coming through – ok if the Gemini user is familiar with them (perhaps through Access), but is it asking too much of a user to think in terms of tables and joins?
      • Columns Tools. Can’t see much here, but we saw a bit earlier in the demo that on the far right hand side of the data area it seems you can add new columns onto the end of the table, and the buttons here allow you to manage these columns, delete them, resize them etc.
      • Sort and Filter, pretty much self-explanatory
      • Calculations. Again can’t see much here, but the button at the top says Manual. I wonder if it’s going to give you the option to either automatically apply all your calculations, or only apply them when you press a button (in case calculation takes a long time).
      • View. The options here are Pivot Table and Switch to Excel. I guess the demo is done in some kind of Table view, and we’ll have the option to view the data instead in a pivot table or go to Excel and work with the data there in the way we would with any other external data source.
    • 4:17 The now inevitable OOHHH moment in a Gemini demo where we see 20 million rows of data being manipulated in memory. Of course, though, the amount of data we can work with will not only depend on how much memory we have but also how well it can be compressed. From what I understand of COP databases like Gemini, you get great compression because it only stores the distinct values held in each column; but if your data contains a lot of different values then you won’t be able to compress it as much and you won’t be able to work with as much of it. I think.
    • 4:46 And not wishing to sound like Mr Sceptical, but watching all these demos of sorting and filtering large amounts of data very quickly raises a question in my mind: are all the rows in the table actually sorted and filtered, or does Gemini just do enough sorting and filtering to fill the screen? Finding the top 30 or so rows out of 20 million based on a value is certainly impressive, but it’s not the same as sorting all those 20 million rows.
    • 6:23 The Manage Relationships dialog. Again, very relational and strangely non-visual as well; I’d have expected a graphical representation of the two tables joined, just like you’d get in any other database tool. Maybe it’s not ready yet though.
    • 6:55 Looks like our first sight of DAX. The expression is:
      sumx(RELATEDTABLE(Purchase), Purchase[PurchaseSourceId])
      Hmm, again seems more like a SQL expression (a sum/inner join) translated to Excel rather than anything resembling MDX. It does the calculation very quickly although it’s the first time something has been less than instant.
     
    • 0:25 We’re in Excel now, using a pivot table, but notice that on the right-hand side we have the ‘Gemini task pane’ so perhaps it’s not a regular pivot table?
    • 2:48 Create Relationship dialog. Again it doesn’t seem very graphical, and notice the use of relational database terminology again with the mention of primary keys and foreign keys; for someone who is used to working with databases this is fine, the obvious term to use, but are these concepts we should expect Gemini users to understand? Shouldn’t things be less technical, more user friendly?
    • 2:59 Interesting that creating a relationship takes a few seconds and some crunching to do. I wonder what’s going on here exactly? Cube reprocessing?
    • 3:43 Show Values As menu option – ok, this is what you get in Excel anyway, but am I right in thinking there are a lot more options here now than are available in 2007? Maybe I’m wrong, but this all seems to be Excel calculations rather than calculations happening in Analysis Services.
    • 8:15 The Excel workbook containing this data is 203MB – interesting, because although Gemini is in-memory, it’s clearly possible to persist the data to disk if it’s being stored inside the workbook somehow.

    One last point prompted by all the relational database-related terms we’ve seen: if I was a pure SQL Server relational database guy, with no interest in Analysis Services, I’d still like to get my hands on Gemini and use it server side if it’s this quick. Which goes back to a point I’ve made before in the past that if Analysis Services could be used inside SQL Server as an invisible layer to speed up the execution of data warehouse/BI style TSQL queries, in the same way as Oracle OLAP can be, it would be very cool. Just think of that working with Madison, in fact…  

    5/20/2009

    Joining the results of two MDX queries together

    One question I get asked occasionally is whether it’s possible to join the results of two MDX queries together. Although I seem to remember this kind of functionality is mentioned in the OLEDB for OLAP spec it certainly isn’t supported in Analysis Services MDX and I don’t expect it ever will be; therefore, as all good consultants know, when you’re faced with a request for functionality that doesn’t exist what you have to do is look closely at the requirement to see if there’s a different way of solving the problem to get the result the customer wants…

    What people usually want to do when they think about joining MDX queries is this: they want to create a query that shows members from two different hierarchies side-by-side on the same axis. For example, in Adventure Works you might want to see a query with Calendar Years on Rows and Countries followed by Product Categories on Columns, something like this:
               

      Australia Canada Bikes Clothing
    CY 2002 $2,154,284.88 $621,602.38
    $6,530,343.53 (null)
    CY 2003   $3,033,784.21 $535,784.46 $9,359,102.62 $138,247.97

    It’s clear we can get the results we need by running two different queries, as follows:

    SELECT {[Customer].[Country].&[Australia],[Customer].[Country].&[Canada]}  ON 0,
    {[Date].[Calendar Year].&[2002],[Date].[Calendar Year].&[2003]} ON 1
    FROM [Adventure Works]
    WHERE([Measures].[Internet Sales Amount])

    SELECT {[Product].[Category].&[1],[Product].[Category].&[3] }  ON 0,
    {[Date].[Calendar Year].&[2002],[Date].[Calendar Year].&[2003]} ON 1
    FROM [Adventure Works]
    WHERE([Measures].[Internet Sales Amount])

    Depending on the tool we’re using, we could try to put the results next to each other to make them more easily comparable. What we can’t of course do is something like the following query:

    SELECT {[Customer].[Country].&[Australia],[Customer].[Country].&[Canada],[Product].[Category].&[1],[Product].[Category].&[3]}  ON 0,
    {[Date].[Calendar Year].&[2002],[Date].[Calendar Year].&[2003]} ON 1
    FROM [Adventure Works]
    WHERE([Measures].[Internet Sales Amount])

    This will result in the following error message:
    Members belong to different hierarchies in the  function.
    …for the very good reason that we have violated one of the fundamental rules of MDX – a set has to contain members of the same dimensionality, and here we have a set containing Countries and Product Categories.

    What can we do to make the query work? Well, there is a simple MDX solution: create a set of tuples containing Countries and Product Categories:

    SELECT
    {
    CROSSJOIN({[Customer].[Country].&[Australia],[Customer].[Country].&[Canada]},{[Product].[Category].[All Products]}),
    CROSSJOIN({[Customer].[Country].[All Customers]},{[Product].[Category].&[1],[Product].[Category].&[3]}) 
    }
    ON 0,
    {[Date].[Calendar Year].&[2002],[Date].[Calendar Year].&[2003]} ON 1
    FROM [Adventure Works]
    WHERE([Measures].[Internet Sales Amount])

    What I’ve done here is created a set using two Crossjoins. The first returns a set containing the Countries we want crossjoined with the All Member from Product Categories; the second returns a set containing the All Member from Countries crossjoined with the Product Categories we’re interested in; we can then union them together and use them on the same axis because the tuples in the set have the same dimensionality, ie (Country, Product Category). Here’s what you get back:

    image

    It’s not quite what we wanted, but it’s all the data we need in a single query and we can probably get the user to ignore the All Members, or possibly hide them in the client tool somehow. The only problem with this approach is that it becomes unwieldy the greater the number of different hierarchies we want to display on columns.

    If we’re using SSRS 2008 to display the results of our query, there’s another possible approach: we can use the new Tablix control to create the style of layout we’re after instead quite easily. You need to start by using the query designer and paste in a version of the query above with Years, Countries and Product Categories on Rows and Internet Sales Amount on columns:

    SELECT
    [Measures].[Internet Sales Amount] ON 0,
    {
    CROSSJOIN({[Customer].[Country].&[Australia],[Customer].[Country].&[Canada]},{[Product].[Category].[All Products]}),
    CROSSJOIN({[Customer].[Country].[All Customers]},{[Product].[Category].&[1],[Product].[Category].&[3]}) 
    }
    *
    {[Date].[Calendar Year].&[2002],[Date].[Calendar Year].&[2003]} ON 1
    FROM [Adventure Works]

    You then create a new matrix, drop Calendar Year onto the row group, Internet Sales Amount into the Data area, Country onto column group, then right click on the rightmost column and select Add Group->Column Group->Adjacent Right, to create a new column group, set it to group by Product Categories and again drop Internet Sales Amount into the data area:

    image

    Then, for each Column Group you need to make sure that you don’t see aggregated values for the All Members (which of course in SSRS are returned not with the All Member’s name, but with blank names); You do this by setting a filter on each group property, using an expression like:
    =Fields!Country.Value IS Nothing
    In this case [Country] is the name of the Country in the report, and if this expression returns False we have a Country name and we’re therefore not looking at the All Member.

    Anyway, you then get an output like this, which is what we wanted:

    image

    Here’s one last impractical but fun way to solve the problem. While playing around with DMX recently it occurred to me that the SHAPE statement could also be useful in solving this problem, and a lot of help on the syntax from my friend and DMX (as well as SSIS) guru Mr Allan Mitchell, I came up with the following:

    SELECT FLATTENED t.*
    FROM
                    [Sequence Clustering] -- arbitrary just has to be a mining model
    NATURAL PREDICTION JOIN
    SHAPE
    {
    SELECT {[Customer].[Country].&[Australia],[Customer].[Country].&[Canada]}  ON 0,
    {[Date].[Calendar Year].&[2002],[Date].[Calendar Year].&[2003]} ON 1
    FROM [Adventure Works]
    WHERE([Measures].[Internet Sales Amount])
    }
    APPEND
    (
                    {
    SELECT {[Product].[Category].&[1],[Product].[Category].&[3] }  ON 0,
    {[Date].[Calendar Year].&[2002],[Date].[Calendar Year].&[2003]} ON 1
    FROM [Adventure Works]
    WHERE([Measures].[Internet Sales Amount])
                    }
    RELATE [[Date]].[Calendar Year]].[Calendar Year]].[MEMBER_CAPTION]]]
    TO [[Date]].[Calendar Year]].[Calendar Year]].[MEMBER_CAPTION]]]
    ) AS MyNestedTable as t

    image

    To get this to work you just need to have a mining model in your SSAS database so you can put it in the FROM clause; it doesn’t matter what it is because it’s going to be ignored. I was able to join the queries on the MEMBER_CAPTION field from [Date].[Calendar Year], which contained the names of the Years on rows, although it was a struggle to work out how and where to add all the extra opening and closing square brackets that are needed in the RELATE clause! Notice, though, that we can just paste the MDX queries we need in there – usually SHAPE is used with OPENQUERY, but of course even though this is a DMX query we’re staying within the same SSAS database to get the data so that’s not necessary. Not the most elegant solution, of course, but interesting nonetheless.

    5/13/2009

    Google Squared

    Today Google announced an interesting new product: Google Squared. Here are some links:
    http://www.theregister.co.uk/2009/05/13/google_squared/
    http://www.computerworld.com/action/article.do?command=viewArticleBasic&articleId=9132972
    Basically it will return data from search results in spreadsheet format. And of course, when you’ve got data in Google spreadsheet format you can do all kinds of cool stuff with it, like stick Panorama’s pivot table gadget on top of it.

    This, plus moves towards support of RDFa also announced today:
    http://www.techcrunch.com/2009/05/12/google-takes-steps-towards-a-more-structured-web/
    means that there’s going to be some really interesting possibilities for doing BI direct from data sourced from the web.

    Oh, and let’s not forget about Wolfram Alpha, also coming soon and equally exciting from a web/data/BI point of view. Imagine, instead of it being able to tell you things like the distance between the Earth and the Moon right now, having your business modelled in it and then letting end users query this model using a search-engine interface.

    Metadata? Complex Event Processing?

    In part one of today’s ‘Interesting BI-related stuff I saw on the web today’ posts…

    After MDM finally reared its head, it seems like Microsoft is working on some kind of metadata tool as well:
    http://www.tdwi.org/News/display.aspx?ID=9434
    More news later this year apparently. Interesting comment:
    ”He did disclose that Microsoft's still-percolating metadata management effort will encompass both its MDM and search assets.”

    AND it seems like Microsoft is entering the Complex Event Processing market:
    http://www.biztalkgurus.com/blogs/biztalksyn/archive/2009/05/11/microsoft-announces-entry-into-complex-event-processing-cep-market.aspx
    Since other CEP vendors support some kind of OLAP on top of their data (eg SQLStream/Mondrian, Aleri) I wonder if Microsoft have a story for SSAS and CEP?

    UPDATE: more details on MS CEP here:
    http://www.dbms2.com/2009/05/13/microsoft-announced-cep-this-week-too/

    5/12/2009

    SQL2008 R2 Site Live

    So the announcements are starting to flow at TechEd – for instance, Microsoft’s long-awaited master data managment solution, now called Master Data Services, will be available as part of the SQL2008 R2 (what was known as Kilimanjaro) release. More details on this and other BI-related features can be found here:
    http://www.microsoft.com/sqlserver/2008/en/us/R2.aspx

    Looks like SSRS will be getting some new stuff - certainly the collaboration features brought in by the 90 Degree Software acquisition look like they’re going to be added to Report Builder. Perhaps we’ll finally see the Officewriter/SSRS functionality too?

    UPDATE: one other thing, mentioned by Teo here:
    http://prologika.com/CS/blogs/blog/archive/2009/05/11/transmissions-for-teched-usa-2009-day-1.aspx

    Gemini will be able to source data from SSRS reports, and SSRS will be able to expose data as 'data feeds' (ie have a new RSS/ATOM rendering extension?).

    UPDATE #2: Rob Kerr has a very good write-up and analysis of what was shown of Gemini here:
    http://www.robkerr.com/post/2009/05/Microsoft-BI---Gemini.aspx

    It looks like there's been the official announcement of a feature I've heard rumours about, namely that Gemini will have its own language for defining multidimensional calculations called DAX. As Rob says, it'll be interesting to see whether it suffers the same fate as that other attempt to simplify MDX, PEL...

     

    5/6/2009

    New Gemini Demos Dissected

    On the BI Blog on Monday a new set of Gemini demos were posted; they’re also available on YouTube. They look like the same demos I saw at PASS Europe a few weeks ago and while they don’t show much in the way of different functionality compared to what was shown late last year, I think there are a few interesting points to note. Unfortunately the quality of the picture is so poor you can’t make out much detail on the screen, so I can only really comment on what Donald Farmer (who’s presenting) specifically points out.

    Let’s step through each demo and I’ll give you a running commentary on them…

     
    • 0:11 Note that Gemini isn’t built into Excel, it’s an Excel addin. I’m not sure whether it will work in Excel 2007 or only Excel 2010 (or whatever it’s called) but this is significant for another reason: it means that Gemini release cycles are not tied to Office release cycles, so potentially new releases of Gemini can appear reasonably regularly.
    • 0:35 We start off with sourcing data from a data warehouse – probably intentionally, to forestall some of the hostility that was seen when Gemini was first announced, when Gemini was seen as being yet another “you don’t need a data warehouse” type tool.
    • 1:05 Arggh, why can’t I see the buttons on the Data-Cleaning ribbon? It looks like there’s a lot of stuff there, although it might not be all working properly yet.
    • 1:45 The obligatory boast about how much data you can work with – in this case 20 million rows – on a regular desktop machine. In my experience that’s the average number of rows I see in a fact table underneath SSAS (though of course it can handle way more than that), so the number was probably deliberately chosen for that reason, as well of course to get the Excel users out there salivating. Suddenly posts like this seem less funny, more like a chilling prediction of things to come…
    • 2:48 You can copy data into Gemini from the clipboard. Note that you don’t seem to be able to link to the data directly in Excel, at least not yet. Donald also mentions that ‘other data sources’ will be supported – it’ll be interesting to see which ones.
    • 3:18 Creating a pivot table. We seem to be back in regular Excel here and out of the Gemini addin, although Donald says that ‘in Gemini we have some cool pivot tables we can handle’. Perhaps what we’re seeing here are Excel 2010 pivot tables.
    • 3:50 Pointing out the inferred relationship between tables. I suspect this relationship was inferred well before this point; we already know you are going to be able to set these relationships up manually.
    • 4:26 Showing data as a percentage of total. There seems to be a big button to do this; are there going to be any other easy calculations available? Where are the calculations taking place, and how are they expressed – in Excel or the underlying Gemini cube?
    • 4:50 New slicer bars – mentioned as a ‘new feature in Excel’, specifically for Gemini but also available for other Excel users. So this must be new generic Excel pivot table, rather than Gemini functionality. This looks really good; I like the way they are aware of each other too, and aware of what data is available, though I wonder how exactly they know whether data is available and how this would work with cubes containing calculations etc.
     
    • 0:20 Set theme – whoa, so you can apply a theme to an Excel spreadsheet? Hmm, turns out you can already do this. But it is a powerful feature when you want to create a report.
    • 0:30 Publishing to Sharepoint, but notice how Donald mentions that publishing a model containing 20 million rows would take a bit of time. How long exactly? Minutes? Hours?
    • 0:40 The Sharepoint report centre. OK, so we can rate reports with stars, yeah that’s going to be useful… but other ‘social tools for collaboration’ might be interesting.
    • 1:03 Setting a refresh rate. Basically how often the local cube underneath Gemini gets processed, I suppose. How long will a refresh take though?
    • 1:21 Seeing the report in a thin client. This is Excel Services, I think…? This will only make it harder to choose between Excel/Excel Services/Gemini on one hand and SSRS on the other. It would be nice if there was some kind of story linking the two.
    • 2:57 The operations dashboard – again, I wish I could see more detail of what’s on screen. I can see some of the stuff you’d expect, like metrics on CPU usage and query response times. It’s all done in Excel Services again – I wonder if there’s a cube behind it all storing the performance data?
    • 3:46 Upgrading and formalising a popular app. But notice that the option is ‘upgrade to PerformancePoint’…? The focus is on upgrading for better maintenance and management rather than performance; I guess in PerformancePoint you’ve got IT control over the report design. Possibly, when server-side SSAS gets the Gemini storage engine, you’ll be able to push the Gemini cube into an instance of SSAS. But when you’ve done this will you still be able to use the performance metrics dashboard we’ve just seen?
    5/4/2009

    SSIS Trace File Source Adapter now available

    Hurray! At long last, my friends at www.sqlis.com have got round to making their Trace File Source Adapter for SSIS available:
    http://www.sqlis.com/post/Trace-File-Source-Adapter.aspx

    What’s this got to do with Analysis Services? Well, as the post mentions, there are tons of cool things you can use this for when you’ve got some Analysis Services trace files: cache warming and usage monitoring among other things. It works for SQL Server relational engine traces too, if you care about that sort of thing…

    5/1/2009

    Microsoft Solver Foundation

    Via SoCalDevGirl, I’ve just discovered another interesting piece in the somewhat fragmented Microsoft Business Intelligence story: Microsoft Solver Foundation. Here’s the official website:
    http://www.solverfoundation.com/Default.aspx

    What is it then? Ahem, well, if you can’t make much sense of the blurb on the website (like me) it’s probably not aimed at you. Here’s a sample quote:

    Solver Foundation is a Microsoft framework designed to deliver critical business insight tools to CxOs, quantitative analysts and developers of mission-critical systems. Traditionally referred to as mathematical programming, these tools provide business intelligence and planning support to organizations seeking maximal competitive advantage.

    I suggest you read the full overview to get a better idea of what it does. What I do understand, though, is that anyone who uses this is going to be interested in using the rest of the Microsoft BI stack; I sincerely hope that the Solver Foundation team is talking to the other BI teams and that some kind of coherent BI strategy will emerge. If one does it’s clearly going to be Excel-centric (which makes a lot of sense): Solver Foundation has an Excel addin; there’s also the data mining addin; there’s SSAS’s own integration with Excel; and of course Gemini will be surfaced through Excel, tying up SSAS, some kind of data cleansing functionality, and possibly some data mining functionality too into one compelling package.

    4/28/2009

    Speaking next week in Zurich

    I do quite a lot of work in Switzerland: I lived in Basel for three years and still have a lot of friends and business contacts there as a result. I’ll be there next week, in fact, and while I’m there I’ll be speaking at the May meeting of the Swiss PASS chapter in Zurich. I’ll be doing the same session I did at PASS Europe last week (so no prep time needed, luckily) on ‘Designing Effective Aggregations in SSAS 2008’. All the details are here:
    http://www.sqlpass.ch/

    Hope to see some of you there…