Chris Webb's BI Blog
2009/11/9 PASS Summit Thoughts
The PASS Summit is over for another year and I’m just starting out on the long trip back home, so there’s plenty of time to get my thoughts together on what’s happened over the past week. In fact there’s not much to say about the event itself: it was, as ever, a lot of fun and totally worthwhile. Hey, within 30 minutes of arriving at the conference I learned I’d won an award for the best BI-related blog entry, for my post on implementing real SSAS drilldown in SSRS! Attendance was up from last year although probably the recession still took its toll: remember that there was no BI Conference this year and I would have thought that a lot of people who would have gone to it would have gone to PASS instead. To be honest I think not having a BI Conference is a good thing, actually. I don’t like having to choose which conference to attend, and part of the benefit of a conference is to get as many members of a tech community together in one place. And this was certainly the largest gathering of Analysis Services people I’ve ever seen: all the usual crowd were there, I met a lot of people who I’d only met a few times before, and I finally got to meet Darren Gosbell in person after having known him by email for at least five years.
One complaint I would make about the event was that the sessions weren’t scheduled particularly well. I know everyone always complains about this but in this case it did seem worse than usual: my session, for example, was up against two other SSAS-specific sessions, but in other cases there were time slots with no SSAS content at all.
The other benefit of PASS is that you get to talk at length about what’s going on in the world of SQL Server with other like-minded people. As a result you get to crystallise your thoughts on a lot of matters and, guess what, I’m going to share mine here.
First of all, the topic that was on everyone’s lips was PowerPivot.
In fact everyone at the conference must have seen the standard demo at least five times and there were also a lot of advanced sessions on it too. Don’t get me wrong, I really think PowerPivot is cool from a technology point of view, I am going to take the time to learn it, and I also think from a make-money-by-getting-people-to-upgrade-to-Office-2010 point of view it is a very clever move for Microsoft. But my feelings about it remain ambiguous. Quite apart from the arguments about it discouraging ‘one version of the truth’ and encouraging spreadmarts that have already been discussed ad nauseam, I have another problem with it: I don’t honestly know whether I, as a consultant, will be able to make any money from it. The very nature of it, as a self-service tool, means no expensive outside consultancy is necessary. I don’t think it will take business away from me though; it will be widely used and it will be used instead of regular SSAS for more basic projects, but the more serious stuff will stay with SSAS I hope. I think the need for sophisticated security and more complex calculations will be the deciding factor when people choose between SSAS and PowerPivot; I’m not sure I see many people upselling from PowerPivot to SSAS either. We’ll see.
Something that worries me more about PowerPivot is the fact that it seems to have diverted the attention of the SSAS dev team. For SSAS 2008 we had few new features, although the performance improvements were very welcome. For 2008 R2 I can only think of one new feature in SSAS, and that’s the ability to use calculated members in subselects that will allow Excel 2010 to use time utility dimensions properly (I’ll blog about that at some point). Even though work on good old server-side SSAS will resume for the next major release of SQL Server I worry that PowerPivot will take priority in the future.
If this happened it would be bad for me and other BI partners from a business point of view, and seems crazy given that SSAS has been such a successful product in the enterprise sector; it’s not like there aren’t a lot of new features and fixes that could be done. Shades of IE6 and Microsoft getting complacent once it’s cornered a market, I think. Last of all on PowerPivot, I suspect that there is something new relating to it in the roadmap that hasn’t been announced yet. David DeWitt devoted his keynote on Thursday to it, the specifics of column-store databases and the Vertipaq engine (which is the new in-memory storage engine that PowerPivot uses), and at the end hinted at this saying that although he couldn’t make any announcements, those people who had been paying attention might have some ideas on what the future held for it. Of course I hadn’t been paying attention properly, but the obvious thing would be to integrate it with the relational database somehow. Given that PowerPivot is now being hosted inside Sharepoint, why not host it in SQL Server too? It’s already very table and join friendly, and I could imagine a scenario where it was used inside SQL, pointed at a schema, some kind of proactive caching kept the data in SQL in synch with the data in the Vertipaq store, difficult BI calculations could be expressed in DAX, but the whole thing was transparent to TSQL. Imagine integrating that with Madison too! Moving on, the other thing that has become clear to me is that I really have to sit down and learn Sharepoint (or at least the relevant bits of it) properly. It’s at the heart of Microsoft’s BI strategy and there’s no avoiding it. I have to admit to some mixed feelings about this move though, and I know other people I talked to at the conference share them. 
Partly it’s because, in the past, there were BI specialists and there were Sharepoint specialists and we didn’t necessarily have much to do with each other; now, though, the two worlds are colliding and I’m outside my comfort zone. You might say that Sharepoint has been part of the MS BI strategy for ages now, what with PerformancePoint etc, but I see an awful lot of MS BI customers in my work and I very rarely seem to see any Sharepoint, although it could be because I’m not looking out for it. A more valid objection is that the need for Sharepoint Enterprise Edition CALs adds a lot of extra cost to a project; and from a technical standpoint Sharepoint itself carries a very big overhead – its installation and maintenance may put a lot of customers off if they don’t already have a company-wide Sharepoint strategy, and if they do have one they may not be willing to go to 2010 for some time. Sharepoint might be just too big for some customers to swallow, and be a difficult sell for BI partners. I’d like to stress though, once again, that I see the considerable technical benefits for using Sharepoint for BI, and even if the reception of the latest wave of PerformancePoint has been somewhat muted (eg the realisation that the decomposition tree has been tacked on at the last minute and isn’t properly integrated) I am impressed with what’s coming with Excel 2010 and Excel Services too; for example I think the Excel Services REST API is very cool indeed, and as a SSAS client Excel 2010 is a big improvement on 2007 (which wasn’t all that bad either). I’ve decided I also need to learn Excel properly now as well – get to know all those advanced Excel functions, use Solver and all that. Once again two worlds are colliding: the Excel guys and the SSAS guys are going to have to learn a lot more about each others’ technologies for truly effective BI applications to get built. Anyway, I think this post has gone on quite long enough now. 
As always, your comments on everything I’ve written here would be much appreciated.

2009/6/4 Google Wave, Google Squared and Thinking Outside the Cube
So, like everyone else this week I was impressed with the Google Wave demo, and like everyone else in the BI industry had some rudimentary thoughts about how it could be used in a BI context. Certainly a collaboration/discussion/information sharing tool like Wave is very relevant to BI: Microsoft is of course heavily promoting Sharepoint for BI (although I don’t see it used all that much at my customers, and indeed many BI consultants don’t like using it because it adds a lot of extra complexity) and cloud-based BI tools like Good Data are already doing something similar. What it could be used for is one thing; whether it will actually gain any BI functionality is another and that’s why I was interested to see the folks at DSPanel not only blog about the BI applications of Wave:
Meanwhile, Google Squared has also gone live and I had a play with it yesterday (see here for a quick overview). I wasn’t particularly impressed with the quality of the data I was getting back in my squares though. Take the following search:
That said, it’s still early days and of course it does a much better job with this search than Wolfram Alpha, which has no idea what MDX is and won’t until someone deliberately loads that data into it. I guess tools like Google Squared will return better data the closer we get to a semantic web.
I suppose what I (and everyone else) like about both of these tools is that they are different, they represent a new take on a problem, unencumbered by the past.
With regard to Wave, a lot of people have been pointing out how Microsoft could not come up with something similar because they are weighed down by their investment in existing enterprise software and the existing way of doing things; the need to keep existing customers of Exchange, Office, Live Messenger etc happy by doing more of the same thing, adding more features, means they can’t take a step back and do something radically new. Take the example of how, after overwhelming pressure from existing SQL Server users, SQL Data Services has basically become a cloud-based, hosted version of SQL Server with all the limitations that kind of fudge involves. I’m sure cloud-based databases will one day be able to do all of the kind of things we can do today with databases, but I very much doubt they will look like today’s databases just running on the cloud. It seems like a failure of imagination and of nerve on the part of Microsoft. It follows from what I’ve just said that while I would like to see some kind of cloud-based Analysis Services one day, I would be more excited by some radically new form of cloud-based database for BI. With all the emphasis today on collaboration and doing BI in Excel (as with Gemini), I can’t help but think that I’d like to see some kind of hybrid of OLAP and spreadsheets – after all, in the past they were much more closely interlinked. When I saw the demos of Fluidinfo on Robert Scoble’s blog I had a sense of this being something like what I’d want, with the emphasis more on spreadsheet than Wiki; similarly when I see what eXpresso is doing with Excel collaboration it also seems to be another part of the solution; and there are any number of other tools out that I could mention that do OLAP-y, spreadsheet-y type stuff (Gemini again, for example) that are almost there but somehow don’t fuse the database and spreadsheet as tightly as I’d like. 
Probably the closest I’ve seen anyone come to what I’ve got in mind is Richard Tanler in this article:
Ah well, enough dreaming. I’m glad I’ve got that off my chest: some of those ideas have been floating around my head for a few months now. Time to get on with some real work!

2008/12/31 Fourth Blog Birthday
For the second year running I'm late celebrating my blog birthday (it was yesterday); my only excuse is that I'm still reeling from the amount I've eaten over the last two weeks. But four years of blogging... wow... it feels like ages. And to a certain extent I feel that, after all this time, I'm running out of things to say here. The actual writing of blog entries isn't a problem, it's more the problem of having something to write about. Part of the problem is me: I don't want to write about things I don't find interesting so I haven't gone down the route of turning the blog into an MDX tutorial (Bill Pearson does that much better than I ever could, and only Mosha could ever cover the advanced stuff properly), but at the same time I'm not coming across so many MDX/SSAS issues or obscure features as I used to. Part of the problem is, too, that SSAS2008 was so light on new features that it didn't provide me with much new to write about. So I'm hoping that Gemini, Kilimanjaro, Excel 14, Azure etc will give me something to get my teeth into in 2009; I'm sure they will. If not, well, I've always wanted to spend some time getting into Mondrian and other open source BI technologies. And with the economy the way it is I suppose I'll have a lot more spare time for learning new stuff in the coming twelve months...
But anyway, bear with me and keep reading! For those of you who have stuck with me for the last four years, thanks, and best wishes for 2009.

2008/12/30 Why can't we just draw our own reports?
Here's a way-out thought I had over the Xmas break for a new approach to building BI reports....
Have you, when you've asked a typical non-technical business user what they want a report to look like, asked them to draw a quick sketch? I do all the time - I find seeing what the user wants the report layout to look like is much the best way to understand what they want and for the user it's the best way to express their requirements. So on the back of the proverbial envelope you'd get something like this: ...and then go back to your desk and write the query and design the report layout in something like SSRS. So - why can't we cut out a step and go direct from the sketch to the report design? I can see two options for the first step here:
You would then take the freehand drawing and:
Working out what the borders of a table should be from a freehand drawing must be possible (although implementation would be well beyond me). Interpreting what the user has written they want on columns and rows would present more problems:
So it certainly wouldn't work like magic, but at the same time I think it would offer some advantages over current report design tools, the designers of which have fallen into the trap of building a UI on top of the functionality they've got available in MDX or SQL, rather than building a UI for what the user actually wants to do. After all, don't you think that it's actually very difficult to lay out anything other than the most simplistic reports in most report design tools, compared to how easy it would be to draw the report?

2008/10/1 Interesting stuff coming from Microsoft soon
A couple of interesting (and possibly BI-related) technologies are coming soon from Microsoft:
2007/11/8 Enterprise Search and BI
I notice from various sources (for example, Don Dodge) that Microsoft have released a free version of their Enterprise Search product, Microsoft Search Server 2008 Express. The thing that caught my eye was the list of federated connectors: http://www.microsoft.com/enterprisesearch/connectors/federated.aspx ...which includes Business Objects, Cognos, SAS, but there's no mention of Reporting Services or Analysis Services anywhere. As I think I've said here before, I'm not convinced that a search interface on top of a BI platform is going to be useful in the real world (though I bet you could do some cool demos with it): I suppose if you have hundreds of SSRS reports for example you might want to look for the ones that contain figures for a particular Product or Customer, but I would have thought that it's just as likely that you'd do a search, find a report and then find you don't have permission to view it. As for using a search interface as a way of querying a cube, all I have to say about that is two words: English Query.
But I think there's a more interesting application for BI here: what if you could build a cube off the index this thing creates? You could have dimensions like Date Updated, Keyword, File Type and Path, and measures like Count of Files and File Size; you'd be able to do things like create reports which tracked the overall space taken by mp3 files on your network and where these files were, the number of emails with the phrase "new job" in; even just browsing ad hoc in Excel you'd have a new way of searching for files: for example, you could slice on File Type=Word doc, Keyword="CV" or "Resume" and then put the Path dimension on rows and drill down to find all the CVs on your network.

2006/8/25 Adapting SQLIS to work with tuples and sets as a data source
It's been a long time since I posted in my 'random thoughts' category...
but I just had such an interesting idea I thought I'd post it up (even if there's 0% chance I'll ever get round to implementing this).
I was looking at a (non-Microsoft platform) BI tool today and got thinking about MDX, how people find it hard to work with, how most client tools don't really expose the power of MDX sets, and how handy it would be to be able to do some procedural things in MDX too. This particular tool had some cool set-based selection functionality and I reflected that even though I'd seen similar set-based selection tools, some on AS (didn't Proclarity have something in this area?), they'd never really taken off; I also thought about the much-missed MDX Builder tool which had a similarly visual approach to building MDX expressions. I started thinking about whether it would be worth building another client tool which took this approach, but quickly came to the conclusion that the world needed another AS client tool like a hole in the head. I did realise, though, how much such a tool would resemble Integration Services if I were to build it. And then I had my idea: why not extend Integration Services so it can treat MDX sets and tuples as a data source, and then use its existing functionality and create new transformations to implement MDX set-based operations?
Let me explain in more detail. I'm not talking about simply getting data out of AS in the same way you'd get it out of a SQL Server table, using an MDX query. What I'm saying is that what would be flowing through the IS data flow tasks would be members, sets and tuples: each 'row' of data would be an MDX expression returning a member, or tuple, or set. So you'd create a custom data source where you could define a set as your starting point - probably at this point you'd just select a whole level, or the children of a member, or some such simple set of members. For example you might select the [Customer].[Customer].[Customer] level in your Customer dimension; the output from this would be a single text column and a single row containing the set expression [Customer].[Customer].[Customer].Members. You could then put this through an Exists() transform to return only the customers in the UK and France, the output from which would be the set expression Exists([Customer].[Customer].[Customer].Members, {[Customer].[Country].&[United Kingdom], [Customer].[Country].&[France]}). Similarly you could then put this through a Crossjoin() transform to crossjoin this set with the set of all your Products, then put the result through a NonEmpty() transform to remove all empty combinations from the set. At this point your output would still be a single row and column, consisting of the MDX expression:
    NonEmpty(
        Crossjoin(
            Exists(
                [Customer].[Customer].[Customer].Members,
                {[Customer].[Country].&[United Kingdom], [Customer].[Country].&[France]}),
            [Product].[Product].[Product].Members),
        [Measures].[Internet Sales Amount])
So far, so dull though. All we've got is a way of building up a string containing an MDX set expression and SQLIS brings little to the party. But the real fun would start with two more custom transformations: SetToFlow and FlowToSet. The former would take an input containing MDX set expressions (and conceivably there could be more than one row, although we've only got one so far) and would output a flow containing all the tuples in the set(s) we've passed in. Taking the set above, the output would be the contents of measures.outputdemo in the following query on AdventureWorks:
    with member measures.outputdemo as
        TupleToStr(([Customer].[Customer].Currentmember, [Product].[Product].Currentmember))
    select {measures.outputdemo} on 0,
    NonEmpty(
        Crossjoin(
            Exists(
                [Customer].[Customer].[Customer].Members,
                {[Customer].[Country].&[United Kingdom], [Customer].[Country].&[France]}),
            [Product].[Product].[Product].Members),
        [Measures].[Internet Sales Amount]) on 1
    from [Adventure Works]
The FlowToSet transform would do the opposite, ie take an input containing tuples and return a single row containing the set represented by the entire input. For the above example, this would be a big set:
{([Customer].[Customer].&[12650],[Product].[Product].&[214]), ([Customer].[Customer].&[12650],[Product].[Product].&[225]),...}
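As a rough illustration of the string handling involved, here is a minimal Python sketch of FlowToSet, plus the trivial case of SetToFlow where the input is already a literal set of tuples. The function names flow_to_set and literal_set_to_flow are made up for this example. A real SetToFlow would have to ask Analysis Services to enumerate the tuples of an arbitrary set expression, which this sketch does not attempt, and it also ignores escaped closing brackets inside member names.

```python
from typing import Iterable, List


def flow_to_set(tuples: Iterable[str]) -> str:
    """Collapse a flow of MDX tuple strings into one literal set expression."""
    return "{" + ", ".join(tuples) + "}"


def literal_set_to_flow(set_expr: str) -> List[str]:
    """Split a literal set such as {(a,b), (c,d)} back into one row per tuple.

    Assumption: the set is written out tuple by tuple. Parentheses inside
    [bracketed] names are skipped so they do not confuse the tuple matching.
    """
    text = set_expr.strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError("expected a literal set wrapped in { }")
    body = text[1:-1]
    rows: List[str] = []
    depth = 0
    in_brackets = False
    start = 0
    for i, ch in enumerate(body):
        if ch == "[":
            in_brackets = True
        elif ch == "]":
            in_brackets = False
        elif in_brackets:
            continue  # ignore anything inside a [name]
        elif ch == "(":
            if depth == 0:
                start = i  # a new tuple starts here
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                rows.append(body[start:i + 1])
    return rows


if __name__ == "__main__":
    flow = [
        "([Customer].[Customer].&[12650],[Product].[Product].&[214])",
        "([Customer].[Customer].&[12650],[Product].[Product].&[225])",
    ]
    as_set = flow_to_set(flow)
    print(as_set)
    print(literal_set_to_flow(as_set))
```

The point of keeping both directions as plain string operations is that everything in between could then be an ordinary data flow row.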
But the point of this would be that you could then apply more MDX set expressions efficiently, although of course there's no reason why you can't apply MDX set expressions to individual tuples in a data flow. The final important custom transform you'd need would be an Evaluate transform, which would append one or more numeric or text columns to a tuple or set dataflow: each of these columns would be populated by evaluating an MDX expression which returned a value against the set or tuple for each row. So, for example, if a row contained the set we've been using we could apply the Count function to it and get the value 12301 back; if a row contained the tuple ([Customer].[Customer].&[12650],[Product].[Product].&[214]) we could ask for the value of this tuple for the measure [Internet Freight Cost] and get the value 0.87 back; or to the same tuple we could ask for the value of [Customer].[Customer].CurrentMember.Name and get back the value "Aaron L. Wright".
Of course the beauty of this is that once you've got a flow containing sets, tuples and numeric values retrieved from the cube for them then you can use all the cool existing SQLIS functionality too, like multicasts, lookups, UnionAlls, Aggregates etc to do stuff with your sets that is hard in pure MDX; and of course you can easily integrate other forms of data such as relational or XML, and do useful things at the end of it all like send an email to all your male customers in the UK who bought three or more products in the last year, or who live in London and have incomes in excess of £50000 and have averaged over £50 per purchase, or who have been identified as good customers by a data mining model, and who aren't on the list of bad debtors that you've got from the Accounts department's Excel spreadsheet.
Now of course all of this is possible using only relational data with SQLIS, or even without using SQLIS and just using pure MDX. I guess the point of this is, as always, that it provides an easier way to do stuff: build MDX expressions without having to know much MDX, integrate AS data with other data and other applications without doing (much) coding, and so on.
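To make the "build MDX expressions without having to know much MDX" part concrete, the Exists, Crossjoin and NonEmpty steps described earlier amount to wrapping one set expression inside another. Here is a minimal Python sketch of that chaining. It only builds strings, it is not Integration Services code, and all of the function names (level_members, exists, crossjoin, non_empty, run_flow) are hypothetical.

```python
from typing import Callable, List

SetExpr = str
Transform = Callable[[SetExpr], SetExpr]


def level_members(level: str) -> SetExpr:
    """Source: the set of all members on a level."""
    return level + ".Members"


def exists(filter_members: List[str]) -> Transform:
    """Transform: keep only members that exist with the given filter members."""
    filter_set = "{" + ", ".join(filter_members) + "}"
    return lambda s: "Exists(" + s + ", " + filter_set + ")"


def crossjoin(other: SetExpr) -> Transform:
    """Transform: crossjoin the incoming set with another set."""
    return lambda s: "Crossjoin(" + s + ", " + other + ")"


def non_empty(measure: str) -> Transform:
    """Transform: drop combinations that are empty for the given measure."""
    return lambda s: "NonEmpty(" + s + ", " + measure + ")"


def run_flow(source: SetExpr, transforms: List[Transform]) -> SetExpr:
    """Push the source expression through each transform in order."""
    expr = source
    for transform in transforms:
        expr = transform(expr)
    return expr


if __name__ == "__main__":
    result = run_flow(
        level_members("[Customer].[Customer].[Customer]"),
        [
            exists([
                "[Customer].[Country].&[United Kingdom]",
                "[Customer].[Country].&[France]",
            ]),
            crossjoin(level_members("[Product].[Product].[Product]")),
            non_empty("[Measures].[Internet Sales Amount]"),
        ],
    )
    print(result)
```

Each transform only needs to know about its own function, which is roughly what a drag-and-drop data flow designer would give you.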
So, as ever, I'd be interested in your comments on this. I have the distinct feeling that this is a solution in search of a problem... but if you can think of some problems it might solve, then let me know!

2006/3/2 Microsoft, BI and Search
It's inevitable, when you get a whole bunch of new functionality as we have with SQL2005, that you start thinking of the new types of applications that become possible. One of the things I've been thinking about for a while is how you could take the results of an RSS feed or a search engine search, do text mining on the results and build a cube to analyse what comes back. Quite an interesting idea, I think, and I know plenty of other people have been thinking along the same lines too, eg
And it's not just in the Microsoft world that these ideas are cropping up. For example, only today I saw a reference to a (non-Microsoft) OLAP solution which built cubes from the results of text mining:
Anyway, on a different note, one of the fun things about blogging is all the rumours and snippets of information about new solutions coming soon, most of which I'm not really at liberty to discuss (not that I know much anyway). You get to take these snippets, rumours and other stuff you read on the web and put them together in a 1+1=3 operation... Here, for example, is a link that Jon-who-sits-next-to-me just sent which he saw on Slashdot:
How can Microsoft beat Google in the search game? There are some interesting hints on the second page of this article, for example:
He said that Microsoft's goal -- but not its initial offering -- would go beyond finding URLs and instead focus in on the specific information sought by Internet users. "Generally these days what you get back is URLs, and based upon research 50 percent of the time you do a search you don't get the URL you're looking for," he said. Holloway said that the promise of Microsoft's search capability is to dig down. For example, he said, potential home-buyers might find a group of houses in the price range and with the precise amenities they are seeking. Or a surfer might find a restaurant with the kind of menu a diner wants in a particular geographic area. Hmm, is it me or is there a potential BI angle here? Dig down == drill down, perhaps? Slice, dice and analyse your resultset rather than just get a flat list of links? I wonder...
UPDATE: Jon, bless his heart, has come up with another interesting link on this topic:
Don't you just love wild speculation? The whole Origami thing is so last week...
UPDATE#2: Now this could just be me reading way too much into something, but here's another relevant link:
There's a coincidence here that's too good to be true...

2005/11/16 Reward Beta Testers better!
Before I even write this I know I'm going to sound ungrateful, whingeing, grasping, greedy and all sorts of other things, but here goes...
There was a SQL2005 launch event in London yesterday, and one of my colleagues from another team went along. He sent the link to register for the event to me and some others on my team, but we had a look at the agenda and we realised that we'd seen pretty much all of the material before so we didn't go. I started working with AS2005 in the Spring of last year, and when I joined my current team my colleagues had already been working with AS2005 for more than a year before; as a result we know quite a lot about AS2005 and as beta testers we found a lot of bugs. Anyway, I just spoke to the guy who went to the launch event this morning and he told me that he got a FREE copy of SQL2005 Standard Edition and a FREE copy of Visual Studio Professional just for turning up. And I admit that I was jealous. Yes I got a fleece when I went to my first Yukon airlift back in 2002(?), yes I got a Yukon t-shirt and picture frame last year, and yes I got a *lot* of inside knowledge, help and support as a beta tester which has been invaluable professionally and which I'm very, very grateful for, but copies of SQL2005 and VS are really quite tasty gifts. You'd think that Microsoft would at least treat beta testers as well as the people that go to launch events by giving them a free copy of the product they've been testing, wouldn't you?

2005/10/20 Usage-Based Partitioning
I was reading Dave Wickert's excellent white paper "Project REAL: Analysis Services Technical Drilldown" the other day (you can get it here), specifically the section on P39 about partitioning. In it he discusses the new functionality in AS2005 which automatically determines which members from your dimensions have data in a given partition, and goes on to talk about the new possibilities this opens up in terms of partitioning strategy. Here's an excerpt:
The partitions in the Project REAL database seem to violate one of the basic best practices of SQL Server 2000. There is no data slice set for the partitions. In SQL Server 2000, partitions must have the data slice set so that the run-time engine knows which partition to access. This is similar to specifying a hint to a relational query optimizer. In SQL Server 2005, this is no longer necessary. Processing the partition now automatically builds a histogram-like structure in the MOLAP storage. This structure identifies which members from all dimensions are included in the partition. Thus, so long as the storage method is MOLAP, the data slice is an optional (and unused) property. However, the data slice is used with ROLAP storage or when proactive caching involves a ROLAP access phase. In both of these circumstances, the actual fact data is never moved so the system does not have a chance to identify a member. In this case, setting the data slice for the partition remains a necessary and critical step if you expect the system to perform well. Because the MOLAP structures dynamically determine the data slice, a new type of partitioning technique is possible with SQL Server 2005. The best way to describe this technique is via a simple example. Suppose a system that you are designing has a product dimension of 1,000 products. Of these, the top 5 products account for 80% of the sales (roughly evenly distributed). The remaining 995 products account for the other 20% of the sales. An analysis of the end-user query patterns show that analysis based on product is a common and effective partitioning scheme. For example, most of the reports include a breakdown by product. Based on this analysis, you create six partitions. You create one partition each for the top 5 products and then one “catchall” partition for the remainder. It is easy to create a catchall partition. In the query binding, add a WHERE clause to the SQL statement as in the following code. 
In the top five partitions (1 through 5) use the following code.
    SELECT * FROM <fact table>
    WHERE SK_Product_ID = <SK_TopNthProduct#>
In the catchall partition use the following code.
    SELECT * FROM <fact table>
    WHERE SK_Product_ID NOT IN (<SK_TopProduct#>, <SK_2ndTopProduct#>, <SK_3rdTopProduct#>, <SK_4thTopProduct#>, <SK_5thTopProduct#>)
This technique requires a lot of administrative overhead in SQL Server 2000 Analysis Services. In SQL Server 2000, the data slice must identify each and every member in the partition—even if there are thousands and thousands of members. To implement the example, you would need to create the catchall partition data slice with 995 members in it. This is in addition to the administrative challenge of updating that list as new members are added to the dimension. In SQL Server 2005 Analysis Services, the automatic building of the data slice in the partition eliminates the administrative overhead.
This got me thinking... if we've got a Usage-Based Optimisation wizard for helping design the right aggregations for a cube, surely it's possible to do something similar so that we can design partitions on the basis of the queries that users actually run? Here's an idea on how it might work (nb this would be a strategy to use in addition to partitioning by Time, Store or other 'obvious' slices rather than a replacement):
What does everyone think? There seems to be a lot of activity these days in the comments section of my blog, so I thought I'd invite feedback. Can anyone see a fatal flaw in this approach?

2005/6/17 LinkShare: my idea for a $50000-prize winning app
As I said, I don't have the time to enter the Connected Systems Developer competition that I blogged about the other week, but that hasn't stopped me thinking about what I might build if I did enter. The following idea came to me at around 2am this morning when I was desperately trying to get my 22-month-old daughter to go back to sleep, and having nothing better to do this afternoon I thought I'd bounce it off anyone reading my blog. So comments are invited - even if they are just to say that it's a rubbish idea and/or someone's thought of it before and/or it'll never work. I have after all categorised this post under 'Random Thoughts'!
Business Case: In the modern office everyone does a lot of web surfing; some of it might even be business-related. And whenever we see something interesting we typically copy the link into a mail, add a few words of explanation and send it on to a few people who might also want to have a look. I send at least two or three such emails a day. For the typically lazy web surfer, though, this process is a bit of a hassle so we only bother to do it when we think the link is really interesting and (because we don't want to get a reputation as the office spammer) we only send it to a small number of people we know who we think are going to find it interesting too. It's my contention that it would be cool if we could share more of these links with more people.
So, we need to solve three problems in our quest to share the interesting links we find during our daily surfing:
Of course there are plenty of existing ways that people share links, such as newsgroups, email discussion lists and blogs, but they typically only address the third of the above problems fully, the second only partially and the first not very well at all. For instance, anyone reading my blog is presumably doing so because they're interested in Microsoft's BI tools and they're going to be interested in any links to webcasts, articles etc that I post up, but if they're like me they subscribe to upwards of a hundred RSS feeds - and that's only on subjects they're really interested in - so we still have the proverbial information overload. The same goes for email discussion lists and newsgroups. And in all these cases, in order to share information you have to open an email, write a blog post etc, all of which requires effort.
Let me give you an idea of the kind of scenario I want to tackle. This morning I was reading this story on the Register, and followed a link on a whim to this page, a set of pictures of Cybermen with funny captions. It brought a smile to my face but I didn't send it on to anyone else because a) it didn't seem worth the bother, and b) I didn't know whether any of the people I usually send stuff on to were at least mildly into Dr Who in the way I am. I'm not going to blog about it because it's not relevant to BI, I don't subscribe to any Dr Who blogs, discussion lists or newsgroups because I'm not that much of a Dr Who fan, and so no-one else is going to see it. Which is a shame.
Functional Spec: Anyway, enough waffle about the theory. The solution I'm thinking of would consist of something like the following:
So, in practice, let's imagine it working as follows. Chris, Jon and Colin all work in a large corporation, in the same team doing the same kind of BI stuff. During his morning surfing, Chris submits 5-10 links; one, on a new feature of MDX, gets recommended automatically to Jon and Colin because everyone in the team works with MDX and has submitted MDX-related pages in the past. One, containing pictures of Cybermen with amusing comments, gets recommended only to Colin and only appears about halfway down his list because he's a bit of a sci-fi fan and has submitted a few sci-fi links in the past. Meanwhile, David, who works in a different team and doesn't know Chris, Jon or Colin finds a cool article on C-Omega and submits it so it gets recommended to the rest of his team; they all in turn click their buttons and so it eventually appears at the top of Jon's list (because he's really into coding) and somewhere down the list for Chris (because he's not so into coding, but this is a really cool article nonetheless). The larger the number of users with similar taste, the better it should work – more links submitted plus more people voting on the same links, and so the mining models can get to know people’s tastes much more quickly. I could imagine it doing well as an intranet app at a large tech company. It would probably need to give more priority to newer links (people want the latest stuff, and you don’t want old but popular links clogging up your recommendations) and maybe have some way of removing links you’ve already seen from your list of recommendations. One other extra feature that occurred to me was that the app could also generate a report showing the users who submitted the most interesting links, so as to generate a bit of rivalry and encourage future usage. 
The key to it all though is the fact that all you need to do to submit a page is click a button in IE - the absolute minimum effort possible - and the fact that the job of the mining model is clear - recommend a page which will make you click your button in turn.
Technology: It should be fairly straightforward to build the toolbar and the web service. Qualification for the competition comes with the use of SQL 2005 for storing all the data, SQLIS to do the processing, AS to do the data mining, and RS to do the web-based reports, daily email, even the RSS feed (maybe as a custom rendering extension?). I'll admit that I don't know enough about data mining to know whether that bit will really work, but hey, it might.
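To make the ranking idea a bit more concrete, here's a minimal sketch in Python. It is not the data mining model itself (that job would belong to AS); it swaps in a much simpler hand-written scoring rule, and every name in it (`Link`, `interest_profile`, `score_link`, `recommend`, the topic tags and the `half_life_days` parameter) is made up for illustration. A link scores higher if the user has submitted links on the same topics before or if other people have voted for it, and the score decays with age so that newer links float to the top.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    url: str
    topics: frozenset   # hypothetical topic tags, e.g. frozenset({"mdx"})
    age_days: float     # how long ago the link was submitted
    votes: int = 0      # how many other users clicked their button on it


def interest_profile(submitted):
    """Count how often each topic appears in a user's past submissions."""
    profile = Counter()
    for link in submitted:
        profile.update(link.topics)
    return profile


def score_link(link, profile, half_life_days=7.0):
    """Interest match plus popularity, halved every half_life_days."""
    interest = sum(profile[topic] for topic in link.topics)
    decay = 0.5 ** (link.age_days / half_life_days)
    return (interest + link.votes) * decay


def recommend(candidates, submitted, seen_urls=frozenset(), top_n=10):
    """Rank unseen links for one user, best first, dropping zero scores."""
    profile = interest_profile(submitted)
    scored = [
        (score_link(link, profile), link)
        for link in candidates
        if link.url not in seen_urls
    ]
    ranked = sorted(
        (pair for pair in scored if pair[0] > 0),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return [link for _, link in ranked[:top_n]]


# Colin mostly submits MDX links, plus the odd sci-fi one.
colin_history = [
    Link("mdx-tips", frozenset({"mdx"}), 30.0),
    Link("mdx-sets", frozenset({"mdx"}), 20.0),
    Link("daleks", frozenset({"sci-fi"}), 10.0),
]
todays_links = [
    Link("cybermen", frozenset({"sci-fi"}), 0.0),
    Link("new-mdx-feature", frozenset({"mdx"}), 0.0),
]
for link in recommend(todays_links, colin_history):
    print(link.url)
```

A real mining model would learn these weights from who clicks what rather than have them hard-coded, but the shape of the job is the same: take a user's history and a pile of candidate links, and hand back a short ordered list.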
OK, enough fantasising. If anyone does implement this and enters the competition, please can I have a share of the winnings?
2005/1/24 Grouping in MDX - response to Mosha
I'm honoured by my mention in Mosha's blog! But I think my entry on Grouping in MDX, and Mosha's comments on it, need some further clarification and discussion. First of all, CREATE SESSION CUBE. I agree it is probably the best way to implement grouping at the moment and that it works well in Excel, but it's not ideal:
Secondly, to address Mosha's point on why you would need the VisualTotals and Aggregate functions in the same query: it's because you'd want to use your new group member in any scenario where you could use a normal member, and that includes a query which used VisualTotals. Imagine you had a measure which showed the distinct count of customers across all your stores, and you wanted a report which a) had a single group member containing your top 5 stores, b) had several other individual stores and c) showed the visual total of all the distinct customers in both the group and the individual stores displayed. I think that would be a reasonable requirement and one which wouldn't be possible unless AS 'knew' what members went into the group.
Thirdly, sets in the WHERE clause (and also subcubes in the FROM clause, which do the same thing) in Yukon. Unfortunately, this only works when you're slicing by the group and not when the group is on a visible axis so it doesn't fit the scenario I was describing.
Overall, then, CREATE SESSION CUBE is almost the functionality that I want but it doesn't allow groups to be defined on the server. So we're close...!
2005/1/12 Grouping members together
One of the weaknesses of Analysis Services, in my opinion, is support for creating custom groupings of members. I reckon that 90% of all calculated members on non-measures dimensions must be doing just this, ie just doing an AGGREGATE or SUM over a set of members, and yes calculated members will return the right values but my complaint is something else. It's that you then have no idea what members were aggregated together inside this calculated member, and that functions like VISUALTOTALS, NONEMPTYCROSSJOIN etc that you would like to be 'group aware' of course aren't. Some examples needed, I think...
Consider the following query on Foodmart 2000:
WITH MEMBER [Customers].[All Customers].[USA].DEMO AS 'AGGREGATE({[Customers].[All Customers].[USA].[CA], [Customers].[All Customers].[USA].[OR]})'
Wouldn't it be nice, then if VISUALTOTALS 'knew' what was in the set and this query
WITH MEMBER [Customers].[All Customers].[USA].DEMO AS 'AGGREGATE({[Customers].[All Customers].[USA].[CA], [Customers].[All Customers].[USA].[OR]})'
returned the same results as this query?
WITH MEMBER [Customers].[All Customers].[USA].DEMO AS 'AGGREGATE({[Customers].[All Customers].[USA].[CA], [Customers].[All Customers].[USA].[OR]})'
And that when you did a NONEMPTYCROSSJOIN against your calculated member, it would return the same results as when you did a NONEMPTYCROSSJOIN against the set that was aggregated in the calculated member? And perhaps also that you could drill down from the calculated member to see the members inside it?
Of course this isn't possible at the moment, because a calculated member could contain any sort of calculation, so AS simply can't make any assumptions. But if there was a special kind of group calculated member, which simply took a set of members as its definition and which always returned an AGGREGATE of that set, surely AS could make these assumptions? Just a thought... |
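To show what 'group aware' could mean in practice, here's a toy model in Python rather than MDX. None of it reflects how AS works internally, and the names (`GroupMember`, `expand`, `distinct_count`, `visual_total`) and the sample customer ids are invented for the illustration. The point is only that a group which remembers its member set can be drilled into, and can take part in a visual total of a distinct count without any customer being counted twice, whereas an opaque calculation gives the engine nothing to work with.

```python
from dataclasses import dataclass

# Hypothetical sample data: the customer ids seen in each state.
CUSTOMERS_BY_STATE = {
    "CA": {1, 2, 3},
    "OR": {3, 4},
    "WA": {4, 5},
}


@dataclass(frozen=True)
class GroupMember:
    """A group defined purely as a set of members, never as a formula."""
    name: str
    members: tuple

    def drill_down(self):
        """Because the definition is a set, the contents can be listed."""
        return list(self.members)


def expand(item):
    """Return the base members behind a plain member name or a group."""
    if isinstance(item, GroupMember):
        return item.drill_down()
    return [item]


def customers_for(states, data):
    """Union the customer ids of the given states."""
    customers = set()
    for state in states:
        customers |= data[state]
    return customers


def distinct_count(item, data=CUSTOMERS_BY_STATE):
    """Distinct customers for one displayed member or group."""
    return len(customers_for(expand(item), data))


def visual_total(displayed, data=CUSTOMERS_BY_STATE):
    """Distinct customers across only what is displayed, groups included."""
    base_states = {state for item in displayed for state in expand(item)}
    return len(customers_for(base_states, data))


demo = GroupMember("DEMO", ("CA", "OR"))
report_rows = [demo, "WA"]

print(demo.drill_down())
print([distinct_count(row) for row in report_rows])
print(visual_total(report_rows))
```

Adding up the row values would overstate the total, because customer 4 shops in both OR and WA. Only by knowing that DEMO is CA plus OR can the total be worked out from the underlying members.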