Chris's profileChris Webb's BI BlogBlogLists Tools Help

Blog


    5/19/2007

    Using Linear Regression to Calculate Growth

    A few blog entries back I showed the MDX I used to calculate a seasonally-adjusted total in my chalk talk at the BI conference. This is useful but if we're looking for a calculation that we can use for the Trend property of a KPI it's not the whole story - we still need to find a way of expressing how much a value is growing or shrinking over time. Although previous period growth calculations are a lot more useful with seasonally-adjusted values, we can use simple linear regression (and it has to be simple because, as I said, I'm no statistician) to do a better job.

    The starting point for understanding how to use linear regression in MDX is (surprise, surprise) Mosha's blog entry on the subject:

    http://sqljunkies.com/WebLog/mosha/archive/2004/12/21/5689.aspx

    However, the function that's going to be most useful here is the linregslope function. If we're looking at the values in our time series and trying to find a line of best fit for those values with the equation y=ax+b, linregslope returns the value of a in that equation, ie the gradient - when the value of x increases by 1, y increases by the value of a. Here's an example of how to use it:

    with member measures.gradient as
    linregslope(
    lastperiods(3, [Date].[Calendar].currentmember) as last3
    , [Measures].[Internet Sales Amount]
    ,rank([Date].[Calendar].currentmember, last3)
    )


    select {[Measures].[Internet Sales Amount], measures.gradient} on 0,
    [Date].[Calendar].[Month].members on 1
    from [Adventure Works]

    The trick with using this function in MDX with a time series is to be able to work out what values you want to pass in for the x axis. Here I've used the lastperiods function to get a set containing the current member on the Calendar hierarchy, the previous member on the Calendar hierarchy and the member before that, in the first parameter of the function; at the same time I've declared a named set and then used that with a rank function in the third parameter to return the values 1, 2 and 3 for each of these three members.

    This gets us the slope, then, but I was thinking it would be better to express this value as a percentage - but of what? The current period's value? Or one of the preceding two periods values? I have to admit I don't know which would be correct. Can someone help me out here? Please leave a comment..

    Comments (1)

    Please wait...
    Sorry, the comment you entered is too long. Please shorten it.
    You didn't enter anything. Please try again.
    Sorry, we can't add your comment right now. Please try again later.
    To add a comment, you need permission from your parent. Ask for permission
    Your parent has turned off comments.
    Sorry, we can't delete your comment right now. Please try again later.
    You've exceeded the maximum number of comments that can be left in one day. Please try again in 24 hours.
    Your account has had the ability to leave comments disabled because our systems indicate that you may be spamming other users. If you believe that your account has been disabled in error please contact Windows Live support.
    Complete the security check below to finish leaving your comment.
    The characters you type in the security check must match the characters in the picture or audio.

    To add a comment, sign in with your Windows Live ID (if you use Hotmail, Messenger, or Xbox LIVE, you have a Windows Live ID). Sign in


    Don't have a Windows Live ID? Sign up

    Steve Brownwrote:
    Business speak is typically annual growth, so the trailing 12 periods, how much have I grown in the last 12 months?
    May 26

    Trackbacks

    The trackback URL for this entry is:
    http://cwebbbi.spaces.live.com/blog/cns!7B84B0F2C239489A!1201.trak
    Weblogs that reference this entry
    • None