
statistics books

  • Jim Sterne
    Message 1 of 17, Jun 28, 2007
      I tend to cover theory and strategy and have successfully
      avoided digging deep into statistics. What are your favorite
      books on statistical analysis techniques? How about database
      marketing/analytical CRM books that cover the quantitative part?


      ------------------------------------------------------
      Emetrics Summit, The Web Analytics Conference
      Marketing Optimization Summit
      Washington D.C., Oct 14-17
      http://www.emetrics.org
      -----------------------------------------------------
      Jim Sterne <jsterne@...>
      http://www.targeting.com +1-805-965-3184
      Chairman, http://www.WebAnalyticsAssociation.org

    • Mike Bradley
      Message 2 of 17, Jun 29, 2007
        Jim.

        Try "Discovering Statistics Using SPSS" by Andy Field for univariate
        statistics and "Multivariate Data Analysis" by Joseph Hair for more
        complex multivariate stats. They are both great books.

        Mike



      • Janet Park
        Message 3 of 17, Jun 29, 2007
          The following books go beyond mere statistics into the heavy lifting of
          Data Mining and data modeling. Take these to the beach with you this
          summer and no quantitative bullies from Cal Tech will kick sand in your
          face!

          Mastering Data Mining: The Art and Science of Customer Relationship
          Management, by Michael J.A. Berry and Gordon Linoff, John Wiley & Sons,
          2000.

          Data Mining Techniques For Marketing, Sales, and Customer Support, by
          Michael J.A. Berry and Gordon Linoff, John Wiley & Sons, 1997.

          Both books offer slightly different twists on the subject and I'd
          recommend the pair, even though there is some redundancy. The authors
          are practicing Data Mining Consultants and their real world experience
          shows. They not only tell "how to," but even more important, "how not
          to." Here's a bold example from Mastering Data Mining:

          "Is the Data Mining Effort Necessary?

          A Senior Vice President in the credit card group of a large bank has
          spent tens of thousands of dollars developing a response model. This
          predictive model is designed to identify the prospects who are most
          likely to respond to the bank's next offering. The VP is told that by
          using the model, she can save money; using only 20 percent of the
          prospect list will yield 70 percent of the responders. However, despite
          these findings, she replies that she wants every single responder -- not
          just some of them. Getting every responder requires using the entire
          prospect list, since no model is perfect. In this case, data mining is
          not necessary.

          Moral: She could have saved tens of thousands of dollars by not building
          predictive models in the first place.
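As a rough illustration of the "20 percent of the prospect list yields 70 percent of the responders" idea, the sketch below ranks prospects by model score and counts how many responders fall in the top slice. The scores and response flags are made-up toy data, and `cumulative_gains` is a hypothetical helper name, not something taken from the book.

```python
def cumulative_gains(scores, responded, fraction):
    """Share of all responders captured by contacting the top `fraction` of the list."""
    # Rank prospects from highest to lowest model score.
    ranked = sorted(zip(scores, responded), key=lambda pair: pair[0], reverse=True)
    cutoff = round(len(ranked) * fraction)
    captured = sum(flag for _, flag in ranked[:cutoff])
    return captured / sum(responded)


# Hypothetical list of 10 prospects: model score and whether each one responded.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
responded = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

top_20 = cumulative_gains(scores, responded, 0.20)  # contact only the top 2 prospects
full_list = cumulative_gains(scores, responded, 1.00)  # contact everyone
lift = top_20 / 0.20  # how much better than picking 20% at random

print(f"top 20% captures {top_20:.0%} of responders (lift {lift:.1f}x)")
print(f"full list captures {full_list:.0%} of responders")
```

In this toy data the top slice captures most but not all responders. Capturing every responder means contacting the whole list, which is the VP's point: at that cutoff the ranking adds nothing.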

          I keep both handy as references and learn something each time I pick
          them up, even though I've been practicing in this field for over 15
          years. You don't need to be a statistical guru to understand them --
          just skip over the hairy details if you prefer an "executive's" approach
          to the subject.



        • nevertrustab
          Message 4 of 17, Jun 29, 2007
            Hi Janet,

            Those books sound interesting. Actually, data mining is what I had
            wanted to get into before I "found out" about the web :-).

            I read lots of stuff about data mining and actually I've asked
            myself this question before, too:

            Everybody talks about it and all...but is it really effective?

            From what I've read, data mining has a lot of the typical "tech guy
            doesn't understand business guy, business guy thinks tech guy is not
            necessary" thing going on, lol (probably not the only
            interdisciplinary field which has that problem).

            To be honest, I still don't know how effective "data mining" really
            is and whether it's simply hyped up. (I mean, it IS sort of a cool
            buzzword that aims to make the dry-sounding "statistics" more
            exciting, when statistics is the main part of it, or at least so
            I've been told.)

            But I would argue that one example where it was a waste of
            resources doesn't mean it is generally a waste of resources.
            For example (surprise, surprise), most of the time when I
            read about "data mining", the cases cited were mostly
            (exclusively?) ones with extremely positive results.

            Chances are, if we took a (statistically significant ;)) sample of
            cases and looked at how many times it worked and how many times it
            didn't, we'd find many cases where it worked wonders and many
            cases where it didn't work at all. (I don't dare guess which
            cases are more frequent, or whether the overall balance is a plus
            or a minus.)

            However, I would guess that there are quite a few (a lot?) of
            companies for which analytical CRM works out well, as it seems to
            have become quite an established field. Actually, I read a couple of
            days ago that "data mining had its biggest success in business in
            the field of CRM". Whether that is a good thing or not, I can't
            say ;) but it sounds like it does work for some companies
            out there.

            So all in all, I guess it's like saying web analytics works
            (or doesn't), or SEO works (or doesn't). We can't really say
            whether something is effective unless we have a big enough
            sample of cases, as there will always be cases where it works and
            cases where it doesn't. The question is in how many of those cases
            it works, and how big the benefits are when it does work
            vs. the losses when it doesn't.
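The "how often does it work, and how big are the gains vs. the losses" question is an expected-value calculation. Here is a minimal sketch, assuming a made-up sample of eight projects and made-up dollar figures; none of these numbers come from real data.

```python
def expected_net_benefit(p_success, gain_if_success, loss_if_failure):
    """Average payoff per project given a success rate and the two outcomes."""
    return p_success * gain_if_success - (1 - p_success) * loss_if_failure


# Hypothetical sample of past projects: True = it worked, False = it did not.
outcomes = [True, True, False, True, False, False, True, False]
success_rate = sum(outcomes) / len(outcomes)

# Hypothetical payoff when it works and cost sunk when it does not.
value = expected_net_benefit(success_rate, gain_if_success=200_000, loss_if_failure=50_000)
print(f"success rate {success_rate:.0%}, expected net benefit per project {value:,.0f}")
```

A field can be worthwhile on average even with a modest success rate, as long as the wins are large relative to the losses, which is why a single failed case says little either way.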

            I wrote more than I thought I would, once again ;) but I'm sure you
            know that we shouldn't draw a conclusion about a whole field from
            one case. I just meant to point this out for data mining, as there
            seems to be a lot of talk about that.


          • nevertrustab
            Message 5 of 17, Jun 29, 2007
              "I keep both handy as references and learn something each time I pick
              them up, even though I've been practicing in this field for over 15
              years."

              SILLY ME! Don't tell me you meant to say you're practicing in the
              field of *data mining*? (I thought you were talking more about web
              analytics.)

            • nevertrustab
              Message 6 of 17, Jul 2, 2007
                I've looked at these books and they have great reviews.

                However, one thing I'd still like to ask:

                What is really the difference between univariate and multivariate
                statistics/data analysis?

                I know what a multiple regression is and all the other basics I
                learned in college (in a business degree), but..

                Is multiple regression = multivariate statistics (as opposed to a
                regression with only one variable that can be changed)?

                Or does multivariate statistics refer to changing multiple variables
                at the same time? I tried reading up a bit on this, and as for
                multivariate testing in web analytics (for example), I read that it
                is about testing multiple variables at the same time because it's more
                time-efficient, etc. But how can this be done? Is that really
                possible, and is that what multivariate data analysis is about (not
                just having multiple variables like in a multiple regression, but
                analyzing the effects of some of them at the same time)?

                Something else I'm curious about:

                I came across this study in an SEO forum:

                http://live.psu.edu/story/24878

                There are 4 search queries, 4 different search engines (all of which
                display Google's results) and 32 participants.

                The study took the results from Google and displayed them on
                Yahoo's, MSN's and an in-house search engine's pages, as well as
                on Google's own.

                I assume each person saw only one query per search engine
                (per search engine's design, to be correct), because otherwise they
                would obviously have caught on if they had seen the same results
                four times for each query.

                I guess this study is completely ridiculous because the data sample
                is so small, but what I was wondering is this:

                How do you determine the sample size of this? Is the sample size =
                32 (32 different individuals) or is it 32*4 b/c of 32*4 different
                results (4 queries for each of the 32 individuals)?

                On the one hand 32*4 would seem logical to me but on the other hand
                only the different individuals are probably statistically
                independent from one another.

                Does somebody know which one would be right? (My guess is 32*4, but
                I'm not sure.) And are things like these (rather basic...) covered
                in one of the two books?
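One common way to reason about "32 or 32*4" is the design effect for repeated measurements: when estimating an overall average, the effective sample size lies between the two, depending on how strongly one person's answers correlate with each other. This is a sketch only, using the standard formula n_eff = n * m / (1 + (m - 1) * rho). The correlation values are assumptions, not figures from the study.

```python
def effective_sample_size(n_people, obs_per_person, within_person_corr):
    """Effective number of independent observations under equal-sized clusters."""
    design_effect = 1 + (obs_per_person - 1) * within_person_corr
    return n_people * obs_per_person / design_effect


# 32 participants, 4 observations each; the correlations are assumed values.
for rho in (0.0, 0.5, 1.0):
    n_eff = effective_sample_size(32, 4, rho)
    print(f"within-person correlation {rho:.1f} -> effective n = {n_eff:.1f}")
```

With fully independent answers the effective size is 128. If a person always gives the same answer it collapses to 32. Note that for comparisons made within each person (such as engine A vs. engine B for the same participant), the repeated measurements can help rather than hurt, so this formula is not the whole story.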

                thank you!

              • Michael Wexler
                Message 7 of 17, Jul 2, 2007
                  My comments inline below:

                  --- In webanalytics@yahoogroups.com, "nevertrustab" <patriccc@...> wrote:
                  >
                  > I've looked at these books and they have great reviews.
                  >
                  > However, one thing I'd still like to ask:
                  >
                  > What is really the difference between univariate and multivariate
                  > statistics/data analysis?
                  >
                  > I know what a multiple regression is and all the other basics I
                  > learned in college (in a business degree), but..
                  >
                  > is multiple regression = multivariate statistics (as opposed to a
                  > regression with only one variable that can be changed)?
                  >
                  > or does multivariate statistics refer to changing multiple variables
                  > at the same time? I tried reading up a bit on this and as for
                  > multivariate testing in web analytics (for example) I read that it
                  > was about testing multiple variables at the same time, b/c it's more
                  > time-efficient, etc. but how can this be done? Is that really
                  > possible and is that what multivariate data analysis is about (not
                  > just having multiple variables like in a multiple regression but
                  > analyzing the effects of some of them at the same time?)?

                  At the end of the day, it's very simple. You test multiple variables
                  to discover two related but different things:
                  1) What ELSE can I change about my site/stimulus other than the price
                  to impact eventual behavior? (everyone tests price first, it seems)
                  2) What COMBINATION of things makes the most impact in eventual behavior?

                  Univariate testing tests one thing at a time. So, you can indeed
                  answer number 1 above with univariate testing, after you test price,
                  to see what else has impact. This happens serially, or in parallel if
                  you can split your exposed population (the visitors to the site, the
                  email you send out, whatever).
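A minimal sketch of a one-variable (A/B) test on a split population, assuming conversion counts are what gets measured. The visitor and conversion numbers are made up, and the two-proportion z-test is one common choice of analysis, not necessarily the one any particular tool uses.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rate between two variants."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical split: 5000 visitors see each version of the page.
z, p_value = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

Running several such tests one after another (or on separate slices of traffic) answers question 1, but each test only ever sees one variable move.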

                  But number 2, the combination, is what Multivariate is designed for.
                  Now, you can do it easy (no interactions) or hard (interactions). If
                  you do it easy, you can discover that some combinations of your
                  variables appear to work better than others, but they are still
                  independent. That is, you are just summing the effects of the 2
                  variables, running them together but analytically splitting out the
                  impacts.

                  The hard one says, well, it's not enough just to know both variables at
                  the same time; what if one variable impacts or moderates the effect of
                  the other? For example, at high prices, color may make a huge
                  difference on the page, but at low prices, color doesn't matter. This
                  interaction lets you know that color and price are not independent
                  variables but are related in their impact on purchase.
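The price-and-color example can be made concrete with a 2x2 table of conversion rates. The rates below are invented for illustration. The point is only how the "easy" additive effect and the "hard" interaction are read off the four cells.

```python
# Hypothetical conversion rates for each (price, color) combination.
rates = {
    ("low", "blue"): 0.050,
    ("low", "red"): 0.051,
    ("high", "blue"): 0.020,
    ("high", "red"): 0.035,
}

# Effect of switching color, measured separately at each price level.
color_effect_at_low = rates[("low", "red")] - rates[("low", "blue")]
color_effect_at_high = rates[("high", "red")] - rates[("high", "blue")]

# "Easy" view: one average color effect, assumed to be the same at every price.
main_color_effect = (color_effect_at_low + color_effect_at_high) / 2

# "Hard" view: the interaction is how much the color effect changes with price.
interaction = color_effect_at_high - color_effect_at_low

print(f"color effect at low price:  {color_effect_at_low:+.3f}")
print(f"color effect at high price: {color_effect_at_high:+.3f}")
print(f"average color effect:       {main_color_effect:+.3f}")
print(f"interaction:                {interaction:+.3f}")
```

An interaction near zero means the additive view is adequate. A large one, as in this toy table, means the average color effect hides the fact that color matters mostly at the high price.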

                  MV testing is the only way to get at that. And yet, do you need to
                  know this? Well, if the impact of color is so minimal, or you don't
                  really plan to change the color long term, then no, perhaps basic
                  testing is fine. But most analysts like to dig a bit. It all depends
                  on time, resources, and the business question you need to answer.

                  >
                  > Something else I'm curious about:
                  >
                  > I came across this study in an SEO forum:
                  >
                  > http://live.psu.edu/story/24878
                  >
                  > There are 4 search queries, 4 different search engines (all of which
                  > display Google's results) and 32 participants.
                  >
                  > The study was conducted in a way that they took the results from
                  > Google and displayed them on yahoo's msn's and an inhouse search
                  > engine as well as the one of Google.
                  >
                  > I assume each person saw only one query per search engine
                  > (per search engine's design to be correct) because otherwise they
                  > would have obviously caught on if they had seen the same results
                  > four times for each query.
                  >
                  > I guess this study is completely ridiculous because the data sample
                  > is so small, but what I was wondering is this:
                  >
                  > How do you determine the sample size of this? Is the sample size =
                  > 32 (32 different individuals) or is it 32*4 b/c of 32*4 different
                  > results (4 queries for each of the 32 individuals)?
                  >
                  > On the one hand 32*4 would seem logical to me but on the other hand
                  > only the different individuals are probably statistically
                  > independent from one another.
                  >
                  > Does somebody know which one would be right? (My guess is 32*4 but
                  > I'm not sure)...and are things like these (rather basic..) broken
                  > down into one of the two books?


                  The issue is the level of analysis. I suspect each person saw all 4
                  queries (say, for yacht, bill gates, soccer, norway), but the "engine
                  design" was randomized. This would mean that the person count was 32,
                  but the total number of observations was probably 128. This allows
                  them to remove personal bias (the same person is tracked across 4
                  queries), query bias (32 people saw each query), and engine bias (with
                  4 engines, you have 32 obs per engine, or 8 per query-engine pair).
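To check the counts under the design guessed at above (every person sees all 4 queries, each on a different engine), here is a small sketch. It rotates the engine assignment instead of truly randomizing it, and the query list is the "say, for" example from this message, so treat the whole layout as an assumption about the study rather than a description of it.

```python
from collections import Counter

queries = ["yacht", "bill gates", "soccer", "norway"]
engines = ["Google", "Yahoo", "MSN", "in-house"]

observations = []
for person in range(32):
    for q_index, query in enumerate(queries):
        # Rotate engines so each person sees every engine exactly once.
        engine = engines[(person + q_index) % len(engines)]
        observations.append((person, query, engine))

per_query = Counter(query for _, query, _ in observations)
per_engine = Counter(engine for _, _, engine in observations)
per_cell = Counter((query, engine) for _, query, engine in observations)

print("total observations:", len(observations))
print("per query:", dict(per_query))
print("per engine:", dict(per_engine))
print("per query-engine pair:", set(per_cell.values()))
```

Under this layout there are 128 observations in total, 32 for each query, 32 for each engine design, and 8 for each query-engine combination.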

                  Yes, this is a small sample size in some ways, but sample size matters
                  more when the effect you are looking for is small, or there is high
                  variability in the measures you are using. I suspect neither of these
                  was the case here (clear Google bias is well known, and the metrics
                  were probably 7-point Likert scales on trust, reliability, etc.).
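The link between effect size, variability, and sample size can be sketched with the usual normal-approximation formula for comparing two group means, n per group = 2 * ((z_alpha + z_beta) * sd / effect)^2. The effect sizes and standard deviation below are assumed values on a 7-point scale, not numbers from the study.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect, sd, alpha=0.05, power=0.80):
    """Approximate sample size per group to detect a mean difference of `effect`."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) * sd / effect) ** 2)


# Assumed spread of ratings on a 7-point scale.
sd = 1.5
large_effect_n = n_per_group(effect=1.0, sd=sd)   # a full scale point of difference
small_effect_n = n_per_group(effect=0.25, sd=sd)  # a quarter of a scale point

print("n per group for a 1.0-point difference: ", large_effect_n)
print("n per group for a 0.25-point difference:", small_effect_n)
```

Cutting the effect to a quarter of its size multiplies the required sample by about sixteen, which is why a few dozen participants can be enough for a big, clean effect and hopeless for a subtle one.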

                  Coda: Here is the link to the study:
                  http://ist.psu.edu/faculty_pages/jjansen/academic/pres/chi2007/jansen_branding_of_search_engines.pdf
                  I haven't read it yet, so we can all look at it and laugh at me if I
                  got it all wrong!

                  Michael
                • Mike Bradley
                  Message 8 of 17, Jul 2, 2007
                    Univariate statistical methods test a single relationship and multivariate
                    or multivariable statistical methods test more than one relationship at the
                    same time. Multiple regression is one of many multivariate methods.
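For readers who want to see multiple regression in action, here is a minimal sketch that fits one outcome against two predictors at once using ordinary least squares via the normal equations. It uses only the standard library, and the data are synthetic, generated from known coefficients so the result can be checked. `fit_multiple_regression` is a hypothetical helper written for this example, not a library function.

```python
def fit_multiple_regression(rows, y):
    """Ordinary least squares: solve (X'X) b = X'y by Gauss-Jordan elimination."""
    X = [[1.0, *row] for row in rows]  # prepend a column of ones for the intercept
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * yi for x, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Synthetic data generated exactly from y = 1 + 2*x1 + 3*x2.
rows = [(1, 0), (0, 1), (1, 1), (2, 1), (1, 2), (3, 0)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]

intercept, b1, b2 = fit_multiple_regression(rows, y)
print(f"intercept = {intercept:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```

Each coefficient is the estimated effect of one predictor with the other held fixed, which is what separates this from running two one-variable regressions.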

                    Mike


                    On 7/2/07, Michael Wexler <wexler@...> wrote:
                    >
                    > My comments inline below:
                    >
                    > --- In webanalytics@yahoogroups.com <webanalytics%40yahoogroups.com>,
                    > "nevertrustab" <patriccc@...> wrote:
                    > >
                    > > I've looked at these books and they have great reviews.
                    > >
                    > > However, one thing I'd still like to ask:
                    > >
                    > > What is really the difference between univariate and multivariate
                    > > statistics/data analysis?
                    > >
                    > > I know what a multiple regression is and all the other basics I
                    > > learned in college (in a business degree), but..
                    > >
                    > > is multiple regression = multivariate statistics (as opposed to a
                    > > regression with only one variable that can be changed)?
                    > >
                    > > or does multivariate statistics refer to changing multiple variables
                    > > at the same time? I tried reading up a bit on this and as for
                    > > multivariate testing in web analytics (for example) I read that it
                    > > was about testing multiple variables at the same time, b/c it's more
                    > > time-efficient, etc. but how can this be done? Is that really
                    > > possible and is that what multivariate data analysis is about (not
                    > > just having multiple variables like in a multiple regression but
                    > > analyzing the effects of some of them at the same time?)?
                    >
                    > At the end of the day, its very simple. You test multiple variables
                    > to discover two related but different things:
                    > 1) What ELSE can I change about my site/stimulus other than the price
                    > to impact eventual behavior? (everyone tests price first, it seems)
                    > 2) What COMBINATION of things makes the most impact in eventual behavior?
                    >
                    > Univariate testing tests one thing at a time. So, you can indeed
                    > answer number 1 above with univariate testing, after you test price,
                    > to see what else has impact. This happens serially, or in parallel if
                    > you can split your exposed population (the visitors to the site, the
                    > email you send out, whatever).
                    >
                    > But number 2, the combination, is what Multivariate is designed for.
                    > Now, you can do it easy (no interactions) or hard (interactions). If
                    > you do it easy, you can discover that some combinations of your
                    > variables appear to work better than others, but they are still
                    > independent. That is, you are just summing the effects of the 2
                    > variables, running them together but analytically splitting out the
                    > impacts.
                    >
                    > The hard one says, well, its not just enough to know both variables at
                    > the same time; what if one variable impacts or moderates the effect of
                    > the other? For example, at high prices, color may make a huge
                    > difference on the page, but at low prices, color doesn't matter. This
                    > interaction lets you know that color and price are not independent
                    > variables but are related in their impact on purchase.
                    >
                    > MV testing is the only way to get at that. And yet, do you need to
                    > know this? Well, if the impact of color is so minimal, or you don't
                    > really plan to change the color long term, then no, perhaps basic
                    > testing is fine. But most analysts like to dig a bit. It all depends
                    > on time, resources, and the business question you need to answer.
                    >
                    > >
                    > > Something else I'm curious about:
                    > >
                    > > I came across this study in an SEO forum:
                    > >
                    > > http://live.psu.edu/story/24878
                    > >
                    > > There are 4 search queries, 4 different search engines (all of which
                    > > display Google's results) and 32 participants.
                    > >
                    > > The study was conducted in a way that they took the results from
                    > > Google and displayed them on yahoo's msn's and an inhouse search
                    > > engine as well as the one of Google.
                    > >
                    > > I assume each person only saw one query per search engine
                    > > (per search engine's design to be correct) because otherwise they
                    > > would have obviously caught on if they had seen the same results
                    > > four times for each query.
                    > >
                    > > I guess this study is completely ridiculous because the data sample
                    > > is so small, but what I was wondering is this:
                    > >
                    > > How do you determine the sample size of this? Is the sample size =
                    > > 32 (32 different individuals) or is it 32*4 b/c of 32*4 different
                    > > results (4 queries for each of the 32 individuals)?
                    > >
                    > > On the one hand 32*4 would seem logical to me but on the other hand
                    > > only the different individuals are probably statistically
                    > > independent from one another.
                    > >
                    > > Does somebody know which one would be right? (My guess is 32*4 but
                    > > im not sure)...and are things like these (rather basic..) broken
                    > > down into one of the two books?
                    >
                    > The issue is the level of analysis. I suspect each person saw all 4
                    > queries (say, for yacht, bill gates, soccer, norway), but the "engine
                    > design" was randomized. This would mean that the person count was 32,
                    > but the total number of observations was probably 128. This allows
                    > them to remove personal bias (the same person is tracked across 4
                    > queries), query bias (32 people saw each query), and engine bias (with
                    > 4 engines, you have 8 obs per engine per query).
                    >
                    > Yes, this is a small sample size in some ways, but sample size matters
                    > more when the effect you are looking for is small, or there is high
                    > variability in the measures you are using. I suspect neither of these
                    > was the case here (a clear Google bias is well known, and the metrics
                    > were probably 7 pt Likert scales on trust, reliability, etc.).
                    >
                    > Coda: Here is the link to the study:
                    >
                    > http://ist.psu.edu/faculty_pages/jjansen/academic/pres/chi2007/jansen_branding_of_search_engines.pdf
                    > I haven't read it yet, so we can all look at it and laugh at me if I
                    > got it all wrong!
                    >
                    > Michael
                    >
                    >
                    >


                  • justo_ibarra
                    Message 9 of 17 , Jul 3, 2007
                      Dear Patric:

                      I agree with both Mike and Michael about what univariate and
                      multivariate analyses are.
                      Basically, you can say that univariate analysis tests the impact of
                      one single variable on the results, and that multivariate analysis
                      tests the impact of, and the interaction between, many variables on
                      the results.
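To make the idea of an interaction concrete, here is a minimal sketch in Python. The conversion rates are made up for illustration (they are not from any study in this thread), and the price/color setup simply follows the example used earlier in the discussion.

```python
# Hypothetical conversion rates (%) for a 2x2 test:
# price (low/high) crossed with button color (blue/red).
rates = {
    ("low", "blue"): 10.0,
    ("low", "red"): 10.0,
    ("high", "blue"): 4.0,
    ("high", "red"): 8.0,
}


def color_effect(price):
    """Red minus blue conversion rate at one price level."""
    return rates[(price, "red")] - rates[(price, "blue")]


def main_effect_color():
    """Average color effect across both price levels."""
    return (color_effect("low") + color_effect("high")) / 2


def interaction():
    """How much the color effect changes between high and low price."""
    return color_effect("high") - color_effect("low")


print(color_effect("low"))   # 0.0 -> color does not matter at low prices
print(color_effect("high"))  # 4.0 -> color matters at high prices
print(main_effect_color())   # 2.0
print(interaction())         # 4.0 -> non-zero, so the effects are not additive
```

If the interaction came out as zero, the two variables could be treated as independent and their effects simply summed.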

                      For the sample size calculation you can take a more practical approach
                      and simply establish a theoretical number of cases. How does it work?
                      In your problem we have 4 different search engines and 4 different
                      search terms, which produce 4*4=16 categories, so you need to fill
                      every category with an appropriate number of cases in order to reach
                      any conclusion about it.
                      So what is the number? As a minimum I would say about 30 cases in each
                      category, which is still a small sample per category (N=480 in total).
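The arithmetic behind that rule of thumb can be written out as a short Python sketch. The 30-per-category minimum is the rule of thumb from this message, and the assumption that each of the 32 participants saw all 4 queries is only the guess made earlier in the thread, not something confirmed from the study itself.

```python
engines = 4
queries = 4
min_cases_per_cell = 30  # rule-of-thumb minimum per category

cells = engines * queries
required_n = cells * min_cases_per_cell

# Assumed study design: 32 participants, each seeing 4 queries.
participants = 32
queries_per_participant = 4
observations = participants * queries_per_participant
observations_per_cell = observations / cells

print(cells)                  # 16
print(required_n)             # 480
print(observations)           # 128
print(observations_per_cell)  # 8.0
```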

                      In my view the appropriate statistical method would be a factorial
                      ANOVA (Analysis of Variance), or MANOVA (Multivariate Analysis of
                      Variance) if you measure several outcomes at once. It allows you to
                      compare the impact of several independent variables (numeric or
                      categorical) on a single dependent quantitative variable (I suppose
                      you might use CTR%).
                      Basically it compares the means of every group and establishes
                      whether the differences between them are statistically significant.
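As a rough illustration of "comparing means between groups", here is a one-way ANOVA F statistic computed by hand in Python. This is deliberately the simplest member of the family (one factor, one outcome), not a full factorial or multivariate analysis, and the CTR% values per engine design are invented for the example.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the F statistic: between-group variance over within-group variance."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)      # number of groups
    n = len(all_values)  # total number of observations

    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Hypothetical CTR% measurements for three engine designs.
design_a = [2.0, 3.0, 4.0]
design_b = [4.0, 5.0, 6.0]
design_c = [6.0, 7.0, 8.0]

f_stat = one_way_anova_f([design_a, design_b, design_c])
print(f_stat)  # 12.0
```

A large F means the group means differ by more than the scatter within groups would suggest; turning F into a p-value requires the F distribution with (k - 1, n - k) degrees of freedom, which is left out here.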

                      I hope you find my comments useful.

                      Regards,

                      Justo Ibarra
                      jibarra@...



                    • nevertrustab
                      Message 10 of 17 , Jul 3, 2007
                        Hi Justo,

                        thanks for the reply. And of course thanks also to Michael and Mike,
                        if you're reading this.

                        Does this mean that a multiple regression can be univariate
                        statistics if you only test one variable w/o changing the others?

                        As for this study: actually I'm not doing this study; somebody
                        already did it (I think I posted a link).

                        But what you're saying about 30 per category (almost 500 in total)
                        confirms what I was thinking: the sample size of 128 observations
                        (32 participants x 4 queries) for this study is probably too small
                        to draw any real conclusions from it.
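Whether a sample is "too small" depends on how big the effect is and how noisy the measure is, as was pointed out earlier in the thread. The Python sketch below uses the standard normal-approximation formula for comparing the means of two independent groups; it is only a rough guide and does not model the repeated-measures design of the study discussed here. The effect sizes and standard deviations are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect, sd, alpha=0.05, power=0.80):
    """Approximate cases needed per group to detect a mean difference `effect`
    when the measure has standard deviation `sd` (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) * sd / effect) ** 2)


# A large effect relative to the noise needs few cases...
print(n_per_group(effect=1.0, sd=1.0))  # 16
# ...halving the effect (or doubling the noise) roughly quadruples the need.
print(n_per_group(effect=0.5, sd=1.0))  # 63
print(n_per_group(effect=1.0, sd=2.0))  # 63
```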




                      • Mike Bradley
                        Message 11 of 17 , Jul 4, 2007
                          Hi.

                          Multiple regression is multivariate because more than two variables are
                          involved. Think, for example, of the effect of two variables on a single
                          variable, like display ad spend and search spend on revenue or conversion
                          rate.

                          Simple linear regression is the univariate version of multiple regression.
                          It tests for the linear relationship between two variables.
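Here is a small Python sketch of the difference, using ordinary least squares via the normal equations. The spend and revenue figures are invented, and the helper names (`solve`, `least_squares`) are just for this example. The revenue was constructed as exactly 10 + 2*display + 3*search so the fit is easy to check.

```python
def solve(a, b):
    """Solve the small linear system a @ x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def least_squares(predictors, y):
    """Fit y = b0 + b1*x1 + ...; `predictors` is a list of columns."""
    columns = [[1.0] * len(y)] + predictors
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(p * v for p, v in zip(ci, y)) for ci in columns]
    return solve(xtx, xty)


# Hypothetical weekly spend and revenue figures.
display = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
search = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
revenue = [10 + 2 * d + 3 * s for d, s in zip(display, search)]

# Simple linear regression: one predictor.
simple_fit = least_squares([display], revenue)
# Multiple regression: both predictors at the same time.
multiple_fit = least_squares([display, search], revenue)

print([round(b, 3) for b in simple_fit])
print([round(b, 3) for b in multiple_fit])
```

In this made-up data the two spends rise together, so the simple regression credits display with part of the effect of search (its slope comes out well above 2), while the multiple regression recovers the separate coefficients of roughly 2 and 3.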

                          Mike


                          On 7/3/07, nevertrustab <patriccc@...> wrote:
                          >
                          > Hi Justo,
                          >
                          > thanks for the reply. And of course also thx @Michael and Mike if
                          > you're reading this.
                          >
                          > Does this mean that a multiple regression can be univariate
                          > statistics if you only test one variable w/o changing the others?
                          >
                          > As for this study: actually, I'm not doing this study; somebody
                          > did it already (I think I posted a link).
                          >
                          > But what you're saying about 30 for each case (almost 500) confirms
                          > what I was thinking: the sample size of 132 for this study is
                          > probably too small to draw any real conclusions from it.
                        • justo_ibarra
                          Message 12 of 17 , Jul 4, 2007
                            Hi again Patric:

                            With reference to your question, my answer is yes. If you don't
                            change the control variables and only test changes in one
                            independent variable, your analysis is only univariate.
                            But in the specific case of multiple regression, the model doesn't
                            work well if you do that.
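
One way to read "doesn't work well", sketched under my own assumptions rather than taken from Justo's message: if a predictor never changes in the data, its column in the regression design matrix is just a multiple of the intercept column, so the model cannot estimate a separate effect for it. The variable names and values below are hypothetical.

```python
import numpy as np

n = 8
intercept = np.ones(n)
varied = np.arange(1.0, n + 1.0)   # the one variable being tested
held_fixed = np.full(n, 5.0)       # a "control" variable that never changes

# Design matrix with three columns: intercept, varied, held_fixed.
design = np.column_stack([intercept, varied, held_fixed])

# held_fixed equals 5 * intercept, so only two columns are linearly independent.
rank = np.linalg.matrix_rank(design)
print("columns:", design.shape[1], "rank:", rank)
```

With fewer independent columns than coefficients, least squares has no unique solution for the held-fixed variable, which is effectively a single-predictor model in disguise.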

                            With reference to the study (although I haven't read it), the
                            conclusions carry a high level of statistical error at that sample
                            size. In other words, the conclusions may be wrong (or not).
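
A rough sketch of why the sample size matters here, using the figures quoted in this thread (32 participants, 4 queries, 4 engines, and the suggested 30 cases per category). It assumes the observations are spread evenly over the 16 engine-by-query cells, and the standard deviation is a made-up placeholder, not a value from the study.

```python
import math

observations = 32 * 4                     # 32 participants x 4 queries
cells = 4 * 4                             # 4 engines x 4 search terms
per_cell_actual = observations // cells   # assumes an even spread over cells
per_cell_suggested = 30                   # rule of thumb quoted in the thread


def standard_error(sd, n):
    """Standard error of a mean estimated from n independent observations."""
    return sd / math.sqrt(n)


assumed_sd = 1.5  # hypothetical spread of a 7-point rating; not from the study
se_actual = standard_error(assumed_sd, per_cell_actual)
se_suggested = standard_error(assumed_sd, per_cell_suggested)

# The ratio does not depend on the assumed standard deviation.
ratio = se_actual / se_suggested
print(per_cell_actual, round(ratio, 2))
```

Whatever the true spread is, the uncertainty on each cell mean at this sample size is roughly twice what it would be at 30 cases per cell. It also treats repeated measurements from the same person as independent, which flatters the smaller sample.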

                            Justo.

                            --- In webanalytics@yahoogroups.com, "nevertrustab" <patriccc@...>
                            wrote:
                            >
                            > Hi Justo,
                            >
                            > thanks for the reply. And of course also thx @Michael and Mike if
                            > you're reading this.
                            >
                            > Does this mean that a multiple regression can be univariate
                            > statistics if you only test one variable w/o changing the others?
                            >
                            > As for this study: actually, I'm not doing this study; somebody
                            > did it already (I think I posted a link).
                            >
                            > But what you're saying about 30 for each case (almost 500) confirms
                            > what I was thinking: the sample size of 132 for this study is
                            > probably too small to draw any real conclusions from it.
                            >
                            >
                            >
                            >
                            > --- In webanalytics@yahoogroups.com, "justo_ibarra" <jibarra@>
                            > wrote:
                            > >
                            > > Dear Patric:
                            > >
                            > > I agree with both Mike and Michael about what univariate and
                            > > multivariate analyses are.
                            > > Basically, univariate analysis tests the impact of one single
                            > > variable on the results, and multivariate analysis tests the
                            > > interaction between many variables on the results.
                            > >
                            > > For the sample size calculation, you can take a more practical
                            > > approach by establishing a theoretical number of cases. How does it
                            > > work? In your problem we have 4 different search engines and 4
                            > > different search terms, which produce 4*4=16 categories, so you need
                            > > to fill every category with an appropriate number of cases in order
                            > > to reach any conclusion about it.
                            > > So what's the number? At a minimum, I'd say about 30 cases in each
                            > > category, which is still a small sample per category (N=480).
                            > >
                            > > In my view, the appropriate statistical method would be MANOVA
                            > > (Multivariate Analysis of Variance), which lets you compare the
                            > > impact of many independent variables (numeric or categorical) on a
                            > > single dependent quantitative variable (I suppose you might use
                            > > CTR%).
                            > > Basically, it compares the means of every group and establishes
                            > > whether the differences between them are statistically significant.
                            > >
                            > > I hope you find my comments useful.
                            > >
                            > > Regards,
                            > >
                            > > Justo Ibarra
                            > > jibarra@
                            > >
                            > >
                            > >
                            > > --- In webanalytics@yahoogroups.com, "Mike Bradley"
                            > > <michaeljohnbradley@> wrote:
                            > > >
                            > > > Univariate statistical methods test a single relationship and
                            > > multivariate
                            > > > or multivariable statistical methods test more than one
                            > > relationship at the
                            > > > same time. Multiple regression is one of many multivariate
                            > methods.
                            > > >
                            > > > Mike
                            > > >
                            > > >
                            > > > On 7/2/07, Michael Wexler <wexler@> wrote:
                            > > > >
                            > > > > My comments inline below:
                            > > > >
                            > > > > --- In webanalytics@yahoogroups.com <webanalytics%
                            > > 40yahoogroups.com>,
                            > > > > "nevertrustab" <patriccc@> wrote:
                            > > > > >
                            > > > > > I've looked at these books and they have great reviews.
                            > > > > >
                            > > > > > However, one thing I'd still like to ask:
                            > > > > >
                            > > > > > What is really the difference between univariate and
                            > > multivariate
                            > > > > > statistics/data analysis?
                            > > > > >
                            > > > > > I know what a multiple regression is and all the other
                            > basics I
                            > > > > > learned in college (in a business degree), but..
                            > > > > >
                            > > > > > is multiple regression = multivariate statistics (as
                            opposed
                            > to
                            > > a
                            > > > > > regression with only one variable that can be changed)?
                            > > > > >
                            > > > > > or does multivariate statistics refer to changing multiple
                            > > variables
                            > > > > > at the same time? I tried reading up a bit on this and as
                            for
                            > > > > > multivariate testing in web analytics (for example) I read
                            > that
                            > > it
                            > > > > > was about testing multiple variables at the same time, b/c
                            > it's
                            > > more
                            > > > > > time-efficient, etc. but how can this be done? Is that
                            really
                            > > > > > possible and is that what multivariate data analysis is
                            > about
                            > > (not
                            > > > > > just having multiple variables like in a multiple
                            regression
                            > but
                            > > > > > analyzing the effects of some of them at the same time?)?
                            > > > >
                            > > > > At the end of the day, its very simple. You test multiple
                            > > variables
                            > > > > to discover two related but different things:
                            > > > > 1) What ELSE can I change about my site/stimulus other than
                            > the
                            > > price
                            > > > > to impact eventual behavior? (everyone tests price first, it
                            > > seems)
                            > > > > 2) What COMBINATION of things makes the most impact in
                            > eventual
                            > > behavior?
                            > > > >
                            > > > > Univariate testing tests one thing at a time. So, you can
                            > indeed
                            > > > > answer number 1 above with univariate testing, after you test
                            > > price,
                            > > > > to see what else has impact. This happens serially, or in
                            > > parallel if
                            > > > > you can split your exposed population (the visitors to the
                            > site,
                            > > the
                            > > > > email you send out, whatever).
                            > > > >
                            > > > > But number 2, the combination, is what Multivariate is
                            > designed
                            > > for.
                            > > > > Now, you can do it easy (no interactions) or hard
                            > (interactions).
                            > > If
                            > > > > you do it easy, you can discover that some combinations of
                            your
                            > > > > variables appear to work better than others, but they are
                            still
                            > > > > independent. That is, you are just summing the effects of the
                            2
                            > > > > variables, running them together but analytically splitting
                            > out
                            > > the
                            > > > > impacts.
                            > > > >
                            > > > > The hard one says, well, its not just enough to know both
                            > > variables at
                            > > > > the same time; what if one variable impacts or moderates the
                            > > effect of
                            > > > > the other? For example, at high prices, color may make a huge
                            > > > > difference on the page, but at low prices, color doesn't
                            > matter.
                            > > This
                            > > > > interaction lets you know that color and price are not
                            > independent
                            > > > > variables but are related in their impact on purchase.
                            > > > >
                            > > > > MV testing is the only way to get at that. And yet, do you
                            > need to
                            > > > > know this? Well, if the impact of color is so minimal, or you
                            > > don't
                            > > > > really plan to change the color long term, then no, perhaps
                            > basic
                            > > > > testing is fine. But most analysts like to dig a bit. It all
                            > > depends
                            > > > > on time, resources, and the business question you need to
                            > answer.
                            > > > >
                            > > > > >
                            > > > > > Something else I'm curious about:
                            > > > > >
                            > > > > > I came across this study in an SEO forum:
                            > > > > >
                            > > > > > http://live.psu.edu/story/24878
                            > > > > >
                            > > > > > There are 4 search queries, 4 different search engines (all
                            > of
                            > > which
                            > > > > > display Google's results) and 32 participiants.
                            > > > > >
                            > > > > > The study was conducted in a way that they took the results
                            > from
                            > > > > > Google and displayed them on yahoo's msn's and an inhouse
                            > search
                            > > > > > engine as well as the one of Google.
                            > > > > >
                            > > > > > I assume each of the person only saw one query per search
                            > engine
                            > > > > > (per search engine's design to be correct) because
                            otherwise
                            > > they
                            > > > > > would have obviously caught on if they had seen the same
                            > results
                            > > > > > four times for each query.
                            > > > > >
                            > > > > > I guess this study is completely ridiculous because the
                            data
                            > > sample
                            > > > > > is so small, but what I was wondering is this:
                            > > > > >
                            > > > > > How do you determine the sample size of this? Is the sample
                            > > size =
                            > > > > > 32 (32 different individuals) or is it 32*4 b/c of 32*4
                            > > different
                            > > > > > results (4 queries for each of the 32 individuals)?
                            > > > > >
                            > > > > > On the one hand 32*4 would seem logical to me but on the
                            > other
                            > > hand
                            > > > > > only the different individuals are probably statistically
                            > > > > > independent from one another.
                            > > > > >
                            > > > > > Does somebody know which one would be right? (My guess is
                            > 32*4
                            > > but
                            > > > > > im not sure)...and are things like these (rather basic..)
                            > broken
                            > > > > > down into one of the two books?
                            > > > >
                            > > > > The issue is the level of analysis. I suspect each person saw all
                            > > > > 4 queries (say, for yacht, bill gates, soccer, norway), but the
                            > > > > "engine design" was randomized. This would mean that the person
                            > > > > count was 32, but the total number of observations was probably
                            > > > > 128. This allows them to remove personal bias (the same person is
                            > > > > tracked across 4 queries), query bias (32 people saw each query),
                            > > > > and engine bias (with 4 engines, you have about 32 obs per engine,
                            > > > > or about 8 obs per engine-query combination).
                            > > > >
                            > > > > Yes, this is a small sample size in some ways, but sample size
                            > > > > matters more when the effect you are looking for is small, or
                            > > > > there is high variability in the measures you are using. I suspect
                            > > > > neither of these was the case here (clear Google bias is well
                            > > > > known, and the metrics were probably 7 pt Likert scales on trust,
                            > > > > reliability, etc.).
                            > > > >
                            > > > > Coda: Here is the link to the study:
                            > > > >
                            > > > > http://ist.psu.edu/faculty_pages/jjansen/academic/pres/chi2007/jansen_branding_of_search_engines.pdf
                            > > > >
                            > > > > I haven't read it yet, so we can all look at it and laugh at me if
                            > > > > I got it all wrong!
                            > > > >
                            > > > > Michael
                          • nevertrustab
                            Message 13 of 17 , Jul 4, 2007
                              Thank you, Mike.

                              Maybe I'm jumping to conclusions here, but if I wanted another
                              stats book that taught me more than I know now (I know the very
                              basic stuff and a whole lot about multiple regressions and tests
                              for them), would the Multivariate Data Analysis book be a better
                              bet for me than the one on univariate statistics using SPSS?

                              --- In webanalytics@yahoogroups.com, "Mike Bradley"
                              <michaeljohnbradley@...> wrote:
                              >
                              > Hi.
                              >
                              > Multiple regression is multivariate because more than two
                              > variables are involved. Think, for example, of the effect of two
                              > variables on a single variable, like display ad spend and search
                              > spend on revenue or conversion rate.
                              >
                              > Simple linear regression is the univariate version of multiple
                              > regression: it has a single predictor and tests for the linear
                              > relationship between two variables.
                              >
                              > Mike
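To make the display-spend/search-spend example concrete, here is a minimal Python sketch of a multiple regression fitted by ordinary least squares. The data and the names `solve_linear_system`, `fit_ols`, `spend` and `revenue` are invented for illustration and do not come from the thread; real work would normally use a statistics package rather than hand-rolled normal equations.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(predictors: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Return [intercept, b1, b2, ...] from the normal equations X'X b = X'y."""
    rows = [[1.0] + list(p) for p in predictors]  # prepend intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Made-up weekly data: (display_spend, search_spend) -> revenue.
spend = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
# Revenue generated without noise as 10 + 2*display + 3*search.
revenue = [10 + 2 * d + 3 * s for d, s in spend]

coeffs = fit_ols(spend, revenue)
print([round(c, 6) for c in coeffs])
```

Because the made-up revenue is an exact linear function of the two spends, the fit recovers the intercept and both slopes used to generate it (up to floating-point rounding). Passing a single column, for example `fit_ols([(d,) for d, _ in spend], revenue)`, gives the simple one-predictor regression instead.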
                              >
                              >
                              > On 7/3/07, nevertrustab <patriccc@...> wrote:
                              > >
                              > > Hi Justo,
                              > >
                              > > thanks for the reply. And of course also thx @Michael and Mike if
                              > > you're reading this.
                              > >
                              > > Does this mean that a multiple regression can be univariate
                              > > statistics if you only test one variable w/o changing the others?
                              > >
                              > > As for this study: Actually I'm not doing this study, but somebody
                              > > did it already (I think I posted a link).
                              > >
                              > > But what you're saying about 30 for each case (almost 500)
                              > > confirms what I was thinking: the sample size of 128 (32*4) for
                              > > this study is probably too small to draw any real conclusions
                              > > from it.
                              > >
                              > > --- In webanalytics@yahoogroups.com, "justo_ibarra" <jibarra@>
                              > > wrote:
                              > > >
                              > > > Dear Patric:
                              > > >
                              > > > I agree with both Mike and Michael about what univariate and
                              > > > multivariate analyses are.
                              > > > Basically, you can say that univariate analysis tests the impact
                              > > > of one single variable on the results, and that multivariate
                              > > > analysis tests the interaction between many variables on the
                              > > > results.
                              > > >
                              > > > For the sample size calculation you can take a more practical
                              > > > approach and just establish a theoretical number of cases. How
                              > > > does it work? In your problem we have 4 different search engines
                              > > > and 4 different search terms, which produce 4*4=16 categories, so
                              > > > you need to fill every category with an appropriate number of
                              > > > cases in order to reach some conclusion about it.
                              > > > So what's the number? At a minimum I'd say about 30 cases in each
                              > > > category, which is still a small sample per category (N=480 in
                              > > > total).
                              > > >
                              > > > In my judgment the appropriate statistical method would be MANOVA
                              > > > (Multivariate Analysis of Variance; with a single outcome it
                              > > > reduces to a factorial ANOVA), which lets you compare the impact
                              > > > of many independent variables (numeric or categorical) on a
                              > > > single dependent quantitative variable (I suppose you might use
                              > > > CTR%).
                              > > > Basically it compares the means of every group and establishes
                              > > > whether the differences between them are statistically
                              > > > significant.
                              > > >
                              > > > I hope you find my comments useful.
                              > > >
                              > > > Regards,
                              > > >
                              > > > Justo Ibarra
                              > > > jibarra@
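The arithmetic behind the 4*4 design can be sketched in a few lines of Python. The function names are made up for illustration, and the 30-cases-per-category figure is the rule of thumb quoted above, not a formal power calculation.

```python
from math import prod
from typing import Sequence


def cells(levels: Sequence[int]) -> int:
    """Number of categories (cells) in a fully crossed design."""
    return prod(levels)


def required_n(levels: Sequence[int], per_cell: int) -> int:
    """Total observations needed to put `per_cell` cases in every cell."""
    return cells(levels) * per_cell


def per_cell_available(levels: Sequence[int], total_obs: int) -> float:
    """Average observations per cell if `total_obs` are spread evenly."""
    return total_obs / cells(levels)


levels = (4, 4)                     # 4 search engines x 4 search terms
print(cells(levels))                # 16 categories
print(required_n(levels, 30))       # 480 observations at 30 per category
print(per_cell_available(levels, 32 * 4))  # 8.0 per category from 128 obs
```

So the study's 32*4 = 128 observations come to about 8 per engine-query category, well short of the 30 suggested by the rule of thumb.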
                              > > >
                              > > >
                              > > >
                              > > > --- In webanalytics@yahoogroups.com, "Mike Bradley"
                              > > > <michaeljohnbradley@> wrote:
                              > > > >
                              > > > > Univariate statistical methods test a single relationship and
                              > > > > multivariate or multivariable statistical methods test more than
                              > > > > one relationship at the same time. Multiple regression is one of
                              > > > > many multivariate methods.
                              > > > >
                              > > > > Mike
                              > > > >
                              > > > >
                              > > > > On 7/2/07, Michael Wexler <wexler@> wrote:
                              > > > > >
                              > > > > > My comments inline below:
                              > > > > >
                              > > > > > --- In webanalytics@yahoogroups.com, "nevertrustab"
                              > > > > > <patriccc@> wrote:
                              > > > > > >
                              > > > > > > I've looked at these books and they have great reviews.
                              > > > > > >
                              > > > > > > However, one thing I'd still like to ask:
                              > > > > > >
                              > > > > > > What is really the difference between univariate and
                              > > > > > > multivariate statistics/data analysis?
                              > > > > > >
                              > > > > > > I know what a multiple regression is and all the other basics
                              > > > > > > I learned in college (in a business degree), but..
                              > > > > > >
                              > > > > > > is multiple regression = multivariate statistics (as opposed
                              > > > > > > to a regression with only one variable that can be changed)?
                              > > > > > >
                              > > > > > > or does multivariate statistics refer to changing multiple
                              > > > > > > variables at the same time? I tried reading up a bit on this
                              > > > > > > and as for multivariate testing in web analytics (for example)
                              > > > > > > I read that it was about testing multiple variables at the
                              > > > > > > same time, b/c it's more time-efficient, etc. but how can this
                              > > > > > > be done? Is that really possible and is that what multivariate
                              > > > > > > data analysis is about (not just having multiple variables
                              > > > > > > like in a multiple regression but analyzing the effects of
                              > > > > > > some of them at the same time?)?
                              > > > > >
                              > > > > > At the end of the day, it's very simple. You test multiple
                              > > > > > variables to discover two related but different things:
                              > > > > > 1) What ELSE can I change about my site/stimulus other than the
                              > > > > > price to impact eventual behavior? (everyone tests price first,
                              > > > > > it seems)
                              > > > > > 2) What COMBINATION of things makes the most impact in eventual
                              > > > > > behavior?
                              > > > > >
                              > > > > > Univariate testing tests one thing at a time. So, you can indeed
                              > > > > > answer number 1 above with univariate testing, after you test
                              > > > > > price, to see what else has impact. This happens serially, or in
                              > > > > > parallel if you can split your exposed population (the visitors
                              > > > > > to the site, the email you send out, whatever).
                              > > > > >
                              > > > > > But number 2, the combination, is what Multivariate is designed
                              > > > > > for. Now, you can do it easy (no interactions) or hard
                              > > > > > (interactions). If you do it easy, you can discover that some
                              > > > > > combinations of your variables appear to work better than
                              > > > > > others, but they are still independent. That is, you are just
                              > > > > > summing the effects of the 2 variables, running them together
                              > > > > > but analytically splitting out the impacts.
                              > > > > >
                              > > > > > The hard one says, well, it's not enough just to know both
                              > > > > > variables at the same time; what if one variable impacts or
                              > > > > > moderates the effect of the other? For example, at high prices,
                              > > > > > color may make a huge difference on the page, but at low prices,
                              > > > > > color doesn't matter. This interaction lets you know that color
                              > > > > > and price are not independent variables but are related in their
                              > > > > > impact on purchase.
                              > > > > >
                              > > > > > MV testing is the only way to get at that. And yet, do you need
                              > > > > > to know this? Well, if the impact of color is so minimal, or you
                              > > > > > don't really plan to change the color long term, then no,
                              > > > > > perhaps basic testing is fine. But most analysts like to dig a
                              > > > > > bit. It all depends on time, resources, and the business
                              > > > > > question you need to answer.
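The price/color example can be written down as a "difference of differences" on a 2x2 table. This is only a sketch: the purchase counts are invented, and the names `color_effect` and `interaction` are hypothetical, but it shows what separates the "easy" (purely additive) case from the "hard" (interacting) one.

```python
from typing import Dict, Tuple

Cell = Tuple[str, str]  # (price level, color)


def color_effect(purchases: Dict[Cell, int], price: str) -> int:
    """Effect of switching color from blue to red at one price level."""
    return purchases[(price, "red")] - purchases[(price, "blue")]


def interaction(purchases: Dict[Cell, int]) -> int:
    """Difference of differences: 0 means the two effects simply add up."""
    return color_effect(purchases, "high") - color_effect(purchases, "low")


# Hypothetical purchases per 1,000 visitors in each price x color cell.
additive = {("high", "red"): 60, ("high", "blue"): 40,
            ("low", "red"): 100, ("low", "blue"): 80}
interacting = {("high", "red"): 80, ("high", "blue"): 40,
               ("low", "red"): 100, ("low", "blue"): 100}

print(interaction(additive))     # 0: color adds 20 at either price
print(interaction(interacting))  # 40: color matters only at the high price
```

Testing color at a single price level would give 20 in the first table regardless of which price was used, but 40 or 0 in the second, which is why the combination has to be tested to see the interaction.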
                              > > > > >
                              > > > > > >
                              > > > > > > Something else I'm curious about:
                              > > > > > >
                              > > > > > > I came across this study in an SEO forum:
                              > > > > > >
                              > > > > > > http://live.psu.edu/story/24878
                              > > > > > >
                              > > > > > > There are 4 search queries, 4 different search engines (all of
                              > > > > > > which display Google's results) and 32 participants.
                              > > > > > >
                              > > > > > > The study was conducted in a way that they took the results
                              > > > > > > from Google and displayed them on Yahoo's, MSN's and an
                              > > > > > > in-house search engine as well as the one of Google.
                              > > > > > >
                              > > > > > > I assume each person only saw one query per search engine
                              > > > > > > (per search engine's design to be correct) because otherwise
                              > > > > > > they would have obviously caught on if they had seen the same
                              > > > > > > results four times for each query.
                              > > > > > >
                              > > > > > > I guess this study is completely ridiculous because the data
                              > > > > > > sample is so small, but what I was wondering is this:
                              > > > > > >
                              > > > > > > How do you determine the sample size of this? Is the sample
                              > > > > > > size = 32 (32 different individuals) or is it 32*4 b/c of 32*4
                              > > > > > > different results (4 queries for each of the 32 individuals)?
                              > > > > > >
                              > > > > > > On the one hand 32*4 would seem logical to me but on the other
                              > > > > > > hand only the different individuals are probably statistically
                              > > > > > > independent from one another.
                              > > > > > >
                              > > > > > > Does somebody know which one would be right? (My guess is 32*4
                              > > > > > > but I'm not sure)...and are things like these (rather basic..)
                              > > > > > > broken down into one of the two books?
                              > > > > >
                              > > > > > The issue is the level of analysis. I suspect each person saw
                              > > > > > all 4 queries (say, for yacht, bill gates, soccer, norway), but
                              > > > > > the "engine design" was randomized. This would mean that the
                              > > > > > person count was 32, but the total number of observations was
                              > > > > > probably 128. This allows them to remove personal bias (the same
                              > > > > > person is tracked across 4 queries), query bias (32 people saw
                              > > > > > each query), and engine bias (with 4 engines, you have about 32
                              > > > > > obs per engine, or about 8 obs per engine-query combination).
                              > > > > >
                              > > > > > Yes, this is a small sample size in some ways, but sample size
                              > > > > > matters more when the effect you are looking for is small, or
                              > > > > > there is high variability in the measures you are using. I
                              > > > > > suspect neither of these was the case here (clear Google bias is
                              > > > > > well known, and the metrics were probably 7 pt Likert scales on
                              > > > > > trust, reliability, etc.).
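The point that sample size matters more for small effects or noisy measures can be illustrated with the usual normal-approximation formula for comparing two independent group means, n per group = 2 * ((z_alpha + z_beta) * sd / effect)^2. This is a rough sketch only: the function name and the defaults (two-sided alpha of 0.05, power of 0.80) are assumptions, and the study discussed here appears to be a repeated-measures design, which this simple formula does not model.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect: float, sd: float,
                alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate cases per group to detect a mean difference `effect`
    when the outcome has standard deviation `sd` (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) * sd / effect) ** 2)


# Hypothetical 7-point scale ratings with a standard deviation of 1 point.
print(n_per_group(effect=1.0, sd=1.0))  # large effect, low noise
print(n_per_group(effect=0.5, sd=1.0))  # half the effect
print(n_per_group(effect=1.0, sd=2.0))  # same effect, twice the noise
```

Halving the effect or doubling the standard deviation both roughly quadruple the number of cases needed, because the required n grows with the square of sd / effect.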
                              > > > > >
                              > > > > > Coda: Here is the link to the study:
                              > > > > >
                              > > > > > http://ist.psu.edu/faculty_pages/jjansen/academic/pres/chi2007/jansen_branding_of_search_engines.pdf
                              > > > > >
                              > > > > > I haven't read it yet, so we can all look at it and laugh at me
                              > > > > > if I got it all wrong!
                              > > > > >
                              > > > > > Michael
                            • nevertrustab
                              Message 14 of 17 , Jul 4, 2007
                                Thanks for your help, Justo.
                                --- In webanalytics@yahoogroups.com, "justo_ibarra" <jibarra@...>
                                wrote:
                                >
                                > Hi again Patric:
                                >
                                > With reference to your question, my answer is yes. If you don't
                                > change the control variables and only test changes in one
                                > independent variable, your analysis is only univariate.
                                > But in the specific case of multiple regression, the model
                                > doesn't work well if you do that.
                                >
                                > With reference to the study (although I haven't read it), the
                                > conclusions carry a high level of statistical error at that
                                > sample size. In other words, the conclusions may be wrong (or
                                > not).
                                >
                                > Justo.
                                >
                                > --- In webanalytics@yahoogroups.com, "nevertrustab" <patriccc@>
                                > wrote:
                                > >
                                > > Hi Justo,
                                > >
                                > > thanks for the reply. And of course also thx @Michael and Mike
                                > > if you're reading this.
                                > >
                                > > Does this mean that a multiple regression can be univariate
                                > > statistics if you only test one variable w/o changing the others?
                                > >
                                > > As for this study: Actually I'm not doing this study, but
                                > > somebody did it already (I think I posted a link).
                                > >
                                > > But what you're saying about 30 for each case (almost 500)
                                > > confirms what I was thinking: the sample size of 128 (32*4) for
                                > > this study is probably too small to draw any real conclusions
                                > > from it.
                                > >
                                > >
                                > >
                                > >
                                > > --- In webanalytics@yahoogroups.com, "justo_ibarra" <jibarra@>
                                > > wrote:
                                > > >
                                > > > Dear Patric:
                                > > >
                                > > > I agree with both Mike and Michael about what univariate and
                                > > > multivariate analyses are.
                                > > > Basically, you can say that univariate analysis tests the impact
                                > > > of one single variable on the results, and that multivariate
                                > > > analysis tests the interaction between many variables on the
                                > > > results.
                                > > >
                                > > > For the sample size calculation you can take a more practical
                                > > > approach and just establish a theoretical number of cases. How
                                > > > does it work? In your problem we have 4 different search engines
                                > > > and 4 different search terms, which produce 4*4=16 categories, so
                                > > > you need to fill every category with an appropriate number of
                                > > > cases in order to reach some conclusion about it.
                                > > > So what's the number? At a minimum I'd say about 30 cases in each
                                > > > category, which is still a small sample per category (N=480 in
                                > > > total).
                                > > >
                                > > > In my judgment the appropriate statistical method would be MANOVA
                                > > > (Multivariate Analysis of Variance; with a single outcome it
                                > > > reduces to a factorial ANOVA), which lets you compare the impact
                                > > > of many independent variables (numeric or categorical) on a
                                > > > single dependent quantitative variable (I suppose you might use
                                > > > CTR%).
                                > > > Basically it compares the means of every group and establishes
                                > > > whether the differences between them are statistically
                                > > > significant.
                                > > >
                                > > > I hope you find my comments useful.
                                > > >
                                > > > Regards,
                                > > >
                                > > > Justo Ibarra
                                > > > jibarra@
                                > > >
                                > > >
                                > > >
                                > > > --- In webanalytics@yahoogroups.com, "Mike Bradley"
                                > > > <michaeljohnbradley@> wrote:
                                > > > >
                                > > > > Univariate statistical methods test a single relationship and
                                > > > > multivariate or multivariable statistical methods test more than
                                > > > > one relationship at the same time. Multiple regression is one of
                                > > > > many multivariate methods.
                                > > > >
                                > > > > Mike
                                > > > >
                                > > > >
                                > > > > On 7/2/07, Michael Wexler <wexler@> wrote:
                                > > > > >
                                > > > > > My comments inline below:
                                > > > > >
                                > > > > > --- In webanalytics@yahoogroups.com, "nevertrustab"
                                > > > > > <patriccc@> wrote:
                                > > > > > >
                                > > > > > > I've looked at these books and they have great reviews.
                                > > > > > >
                                > > > > > > However, one thing I'd still like to ask:
                                > > > > > >
                                > > > > > > What is really the difference between univariate and
                                > > > > > > multivariate statistics/data analysis?
                                > > > > > >
                                > > > > > > I know what a multiple regression is and all the other basics
                                > > > > > > I learned in college (in a business degree), but..
                                > > > > > >
                                > > > > > > is multiple regression = multivariate statistics (as opposed
                                > > > > > > to a regression with only one variable that can be changed)?
                                > > > > > >
                                > > > > > > or does multivariate statistics refer to changing multiple
                                > > > > > > variables at the same time? I tried reading up a bit on this
                                > > > > > > and as for multivariate testing in web analytics (for example)
                                > > > > > > I read that it was about testing multiple variables at the
                                > > > > > > same time, b/c it's more time-efficient, etc. but how can this
                                > > > > > > be done? Is that really possible and is that what multivariate
                                > > > > > > data analysis is about (not
                                > > > > > > just having multiple variables like in a multiple
                                > regression
                                > > but
                                > > > > > > analyzing the effects of some of them at the same time?)?
                                > > > > >
                                > > > > > At the end of the day, it's very simple. You test multiple
                                > > > > > variables to discover two related but different things:
                                > > > > > 1) What ELSE can I change about my site/stimulus other than the
                                > > > > > price to impact eventual behavior? (everyone tests price first,
                                > > > > > it seems)
                                > > > > > 2) What COMBINATION of things makes the most impact in eventual
                                > > > > > behavior?
                                > > > > >
                                > > > > > Univariate testing tests one thing at a time. So, you can
                                > > indeed
                                > > > > > answer number 1 above with univariate testing, after you
                                test
                                > > > price,
                                > > > > > to see what else has impact. This happens serially, or in
                                > > > parallel if
                                > > > > > you can split your exposed population (the visitors to the
                                > > site,
                                > > > the
                                > > > > > email you send out, whatever).
                                > > > > >
                                > > > > > But number 2, the combination, is what Multivariate is
                                > > designed
                                > > > for.
                                > > > > > Now, you can do it easy (no interactions) or hard
                                > > (interactions).
                                > > > If
                                > > > > > you do it easy, you can discover that some combinations of
                                > your
                                > > > > > variables appear to work better than others, but they are
                                > still
                                > > > > > independent. That is, you are just summing the effects of
                                the
                                > 2
                                > > > > > variables, running them together but analytically
                                splitting
                                > > out
                                > > > the
                                > > > > > impacts.
                                > > > > >
                                > > > > > The hard one says, well, it's not enough just to know both
                                > > > > > variables at the same time; what if one variable impacts or
                                > > > > > moderates the effect of the other? For example, at high prices,
                                > > > > > color may make a huge difference on the page, but at low prices,
                                > > > > > color doesn't matter. This interaction lets you know that color
                                > > > > > and price are not independent variables but are related in their
                                > > > > > impact on purchase.
                                > > > > >
                                > > > > > MV testing is the only way to get at that. And yet, do you
                                > > need to
                                > > > > > know this? Well, if the impact of color is so minimal, or
                                you
                                > > > don't
                                > > > > > really plan to change the color long term, then no,
                                perhaps
                                > > basic
                                > > > > > testing is fine. But most analysts like to dig a bit. It
                                all
                                > > > depends
                                > > > > > on time, resources, and the business question you need to
                                > > answer.
                                > > > > >
                                > > > > > >
                                > > > > > > Something else I'm curious about:
                                > > > > > >
                                > > > > > > I came across this study in an SEO forum:
                                > > > > > >
                                > > > > > > http://live.psu.edu/story/24878
                                > > > > > >
                                > > > > > > There are 4 search queries, 4 different search engines
                                (all
                                > > of
                                > > > which
                                > > > > > > display Google's results) and 32 participiants.
                                > > > > > >
                                > > > > > > The study was conducted in a way that they took the
                                results
                                > > from
                                > > > > > > Google and displayed them on yahoo's msn's and an
                                inhouse
                                > > search
                                > > > > > > engine as well as the one of Google.
                                > > > > > >
                                > > > > > > I assume each of the person only saw one query per
                                search
                                > > engine
                                > > > > > > (per search engine's design to be correct) because
                                > otherwise
                                > > > they
                                > > > > > > would have obviously caught on if they had seen the same
                                > > results
                                > > > > > > four times for each query.
                                > > > > > >
                                > > > > > > I guess this study is completely ridiculous because the
                                > data
                                > > > sample
                                > > > > > > is so small, but what I was wondering is this:
                                > > > > > >
                                > > > > > > How do you determine the sample size of this? Is the
                                sample
                                > > > size =
                                > > > > > > 32 (32 different individuals) or is it 32*4 b/c of 32*4
                                > > > different
                                > > > > > > results (4 queries for each of the 32 individuals)?
                                > > > > > >
                                > > > > > > On the one hand 32*4 would seem logical to me but on the
                                > > other
                                > > > hand
                                > > > > > > only the different individuals are probably statistically
                                > > > > > > independent from one another.
                                > > > > > >
                                > > > > > > Does somebody know which one would be right? (My guess
                                is
                                > > 32*4
                                > > > but
                                > > > > > > im not sure)...and are things like these (rather
                                basic..)
                                > > broken
                                > > > > > > down into one of the two books?
                                > > > > >
                                > > > > > The issue is the level of analysis. I suspect each person saw all
                                > > > > > 4 queries (say, for yacht, bill gates, soccer, norway), but the
                                > > > > > "engine design" was randomized. This would mean that the person
                                > > > > > count was 32, but the total number of observations was probably
                                > > > > > 128. This allows them to remove personal bias (the same person is
                                > > > > > tracked across 4 queries), query bias (32 people saw each query),
                                > > > > > and engine bias (with 4 engines, you have 8 obs per engine for
                                > > > > > each query).
                                > > > > >
                                > > > > > Yes, this is a small sample size in some ways, but sample size
                                > > > > > matters more when the effect you are looking for is small, or
                                > > > > > there is high variability in the measures you are using. I
                                > > > > > suspect neither of these was the case here (clear Google bias is
                                > > > > > well known, and the metrics were probably 7 pt Likert scales on
                                > > > > > trust, reliability, etc.).
                                > > > > >
                                > > > > > Coda: Here is the link to the study:
                                > > > > >
                                > > > > > http://ist.psu.edu/faculty_pages/jjansen/academic/pres/chi2007/jansen_branding_of_search_engines.pdf
                                > > > > > I haven't read it yet, so we can all look at it and laugh at me
                                > > > > > if I got it all wrong!
                                > > > > >
                                > > > > > Michael
                                > > > > >
                                > > > > >
                                > > > > >
                                > > > >
                                > > > >
                                > > > > [Non-text portions of this message have been removed]
                                > > > >
                                > > >
                                > >
                                >
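
Michael's "easy" versus "hard" distinction in the quoted reply above can be sketched with a 2x2 test of price against color. This is only an illustration: the conversion rates, the variable levels, and the function names below are all invented, and nothing here comes from the thread or from any real test. The idea is that the interaction is the difference between the color effect at a high price and the color effect at a low price; if that difference is near zero, the two variables are simply additive.

```python
# Hypothetical conversion rates for a 2x2 test (price x color).
# Every number here is made up for illustration only.
rates = {
    ("low", "blue"): 0.050,
    ("low", "red"): 0.051,
    ("high", "blue"): 0.020,
    ("high", "red"): 0.035,
}


def color_effect(price):
    """Change in conversion rate from switching blue -> red at a given price."""
    return rates[(price, "red")] - rates[(price, "blue")]


def interaction():
    """Difference between the color effect at high and at low price.

    A value near zero means price and color act additively (the "easy"
    case); a value far from zero means one moderates the other.
    """
    return color_effect("high") - color_effect("low")


if __name__ == "__main__":
    for price in ("low", "high"):
        print(f"color effect at {price} price: {color_effect(price):+.3f}")
    print(f"interaction (difference of differences): {interaction():+.3f}")
```

With these invented rates, color barely matters at the low price and matters a lot at the high price, which is the pattern Michael describes. A real analysis would also need a significance test on the interaction, which this sketch leaves out.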
                              • Dave and Kathleen Barber
                                Message 15 of 17 , Jul 5, 2007
                                  Janet, I got the 1st book you listed...Jim is great, but your marketing is
                                  better. Patriccc, that's the longest post I've seen in a while. Perhaps
                                  you should write your own book.

                                  On 6/29/07, nevertrustab <patriccc@...> wrote:
                                  >
                                  > Hi Janet,
                                  >
                                  > those books sound interesting. Actually data mining is what I had
                                  > wanted to get into before I "found out" about the web :-).
                                  >
                                  > I read lots of stuff about data mining and actually I've asked
                                  > myself this question before, too:
                                  >
                                  > Everybody talks about it and all...but is it really effective?
                                  >
                                  > From what I read in data mining theres a lot of the typical tech-guy
                                  > doesnt understand business guy. business guy thinks tech guy is not
                                  > necessary - thing going on lol. (probably not the only
                                  > interdisciplinary field which has that problem).
                                  >
                                  > To be honest, I still don't know how effective "data mining" really
                                  > is and whether it's simply hyped up (I mean it IS sort of a cool
                                  > buzz-word that aims to make the dry-sounding "statistics" more
                                  > exciting...when statistics is the main part of it (at least Ive been
                                  > told so)).
                                  >
                                  > But I would argue, that just because of one example where it was a
                                  > waste of resources that doesn't mean it is generally a waste of
                                  > resources. For example..surprise surprise...most of the time when I
                                  > read about "data mining" they cited mostly (exclusively?) cases with
                                  > extremely positive results.
                                  >
                                  > Chances are if we took a (statistically significant ;)) sample of
                                  > cases and looked how many times it worked and how many times it
                                  > didn't work we'll find many cases where it worked wonders and many
                                  > cases where it didn't work at all. (I dont dare make an assumption
                                  > which cases are more frequent/if the overall situation has a plus or
                                  > a minus).
                                  >
                                  > However, I would guess that there are quite a few (a lot?) of
                                  > companies for which analytical CRM works out well as it seems to
                                  > have become quite an established field. Actually I read a couple of
                                  > days ago that "data mining had its biggest success in business in
                                  > the field of CRM" - whether that is a good thing or not..I cant
                                  > answer that ;) but it sounds like it does work for some companies
                                  > out there.
                                  >
                                  > So all in all..I guess it's sort of like saying web analytics work
                                  > (dont work), SEO works (doesnt work). We can't really make a
                                  > statement whether something is effective or not unless we have a big
                                  > enough sample of it as there'll always be cases where it works and
                                  > where it doesnt. The matter is just in how many of those cases does
                                  > it work/ doesnt it work? And how big are the benefits if it does
                                  > work vs. the losses when it doesn't work?
                                  >
                                  > I wrote more than I thought I would.. once again ;) but Im sure you
                                  > know that we shouldnt make a conclusion for a whole field based on
                                  > one case..I just meant to point this out for data mining as there
                                  > seems to be a lot of talk about that.
                                  >
                                  > --- In webanalytics@yahoogroups.com, "Janet Park" <jparkmfi@...> wrote:
                                  > >
                                  > >
                                  > > The following books go beyond mere statistics into the heavy
                                  > lifting of
                                  > > Data Mining and data modeling. Take these to the beach with you
                                  > this
                                  > > summer and no quantitative bullies from Cal Tech will kick sand in
                                  > your
                                  > > face!
                                  > >
                                  > > Mastering Data Mining: The Art and Science of Customer Relationship
                                  > > Management, by Michael J.A. Berry and Gordon Linoff, John Wiley &
                                  > Sons,
                                  > > 2000.
                                  > >
                                  > > Data Mining Techniques For Marketing, Sales, and Customer Support,
                                  > by
                                  > > Michael J.A. Berry and Gordon Linoff, John Wiley & Sons, 1997.
                                  > >
                                  > > Both books offer slightly different twists on the subject and I'd
                                  > > recommend the pair, even though there is some redundancy. The
                                  > authors
                                  > > are practicing Data Mining Consultants and their real world
                                  > experience
                                  > > shows. They not only tell "how to," but even more important, "how
                                  > not
                                  > > to." Here's a bold example from Mastering Data Mining:
                                  > >
                                  > > "Is the Data Mining Effort Necessary?
                                  > >
                                  > > A Senior Vice President in the credit card group of a large bank has
                                  > > spent tens of thousands of dollars developing a response model. This
                                  > > predictive model is designed to identify the prospects who are most
                                  > > likely to respond to the bank's next offering. The VP is told that by
                                  > > using the model, she can save money; using only 20 percent of the
                                  > > prospect list will yield 70 percent of the responders. However, despite
                                  > > these findings, she replies that she wants every single responder --
                                  > > not just some of them. Getting every responder requires using the
                                  > > entire prospect list, since no model is perfect. In this case, data
                                  > > mining is not necessary.
                                  > >
                                  > > Moral: She could have saved tens of thousands of dollars by not
                                  > > building predictive models in the first place.
                                  > >
                                  > > I keep both handy as references and learn something each time I
                                  > pick
                                  > > them up, even though I've been practicing in this field for over 15
                                  > > years. You don't need to be a statistical guru to understand them -
                                  > -
                                  > > just skip over the hairy details if you prefer an "executive's"
                                  > approach
                                  > > to the subject.
                                  > >
                                  > >
                                  > >
                                  > > [Non-text portions of this message have been removed]
                                  > >
                                  >
                                  >
                                  >
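
The "20 percent of the prospect list will yield 70 percent of the responders" claim in the book excerpt quoted above is a cumulative-gains figure. A rough sketch of how such a number is computed follows. The function name `captured_share` and the synthetic list of 100 prospects are assumptions made for this illustration; the data is constructed so that the top 20 percent happens to contain 7 of 10 responders, and it does not come from the book.

```python
def captured_share(scores, responded, fraction):
    """Share of all responders found in the top `fraction` of prospects,
    ranked by model score (highest score first)."""
    ranked = sorted(zip(scores, responded), key=lambda pair: pair[0], reverse=True)
    cutoff = int(round(len(ranked) * fraction))
    total = sum(responded)
    if total == 0:
        return 0.0
    hits = sum(flag for _, flag in ranked[:cutoff])
    return hits / total


# Synthetic list of 100 prospects (illustration only). Prospect i gets
# score 100 - i, so lower ids rank higher; 10 of them are responders.
responder_ids = {0, 2, 5, 7, 11, 14, 19, 40, 63, 88}
scores = [100 - i for i in range(100)]
responded = [1 if i in responder_ids else 0 for i in range(100)]

if __name__ == "__main__":
    for fraction in (0.2, 0.5, 1.0):
        share = captured_share(scores, responded, fraction)
        print(f"top {fraction:.0%} of list captures {share:.0%} of responders")
```

The last line of output reflects the VP's point in the excerpt: the only cutoff guaranteed to capture every responder is the whole list, at which point the ranking adds nothing.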


                                • nevertrustab
                                  Message 16 of 17 , Jul 7, 2007
                                    Writing my own book lol..you're about the 5th person that has told
                                    me this during the last 12 months. The other day some woman who
                                    needed help with her (altruistic) website asked a question and I
                                    answered by breaking down all the basics on how to get started with
                                    SEO in a >2 pages long post to help her..afterwards I was advised I
                                    might want to break down my posts into chapters LOL (ironically of
                                    course).

                                    Maybe I will get to do that later hehe.

                                    --- In webanalytics@yahoogroups.com, "Dave and Kathleen Barber"
                                    <barbers@...> wrote:
                                    >
                                    > Janet, I got the 1st book you listed...Jim is great, but your
                                    > marketing is better. Patriccc, that's the longest post I've seen in
                                    > a while. Perhaps you should write your own book.
                                    >
                                    > On 6/29/07, nevertrustab <patriccc@...> wrote:
                                    > >
                                    > > Hi Janet,
                                    > >
                                    > > those books sound interesting. Actually data mining is what I had
                                    > > wanted to get into before I "found out" about the web :-).
                                    > >
                                    > > I read lots of stuff about data mining and actually I've asked
                                    > > myself this question before, too:
                                    > >
                                    > > Everybody talks about it and all...but is it really effective?
                                    > >
                                    > > From what I read in data mining theres a lot of the typical tech-
                                    guy
                                    > > doesnt understand business guy. business guy thinks tech guy is
                                    not
                                    > > necessary - thing going on lol. (probably not the only
                                    > > interdisciplinary field which has that problem).
                                    > >
                                    > > To be honest, I still don't know how effective "data mining"
                                    really
                                    > > is and whether it's simply hyped up (I mean it IS sort of a cool
                                    > > buzz-word that aims to make the dry-sounding "statistics" more
                                    > > exciting...when statistics is the main part of it (at least Ive
                                    been
                                    > > told so)).
                                    > >
                                    > > But I would argue, that just because of one example where it was
                                    a
                                    > > waste of resources that doesn't mean it is generally a waste of
                                    > > resources. For example..surprise surprise...most of the time
                                    when I
                                    > > read about "data mining" they cited mostly (exclusively?) cases
                                    with
                                    > > extremely positive results.
                                    > >
                                    > > Chances are if we took a (statistically significant ;)) sample of
                                    > > cases and looked how many times it worked and how many times it
                                    > > didn't work we'll find many cases where it worked wonders and
                                    many
                                    > > cases where it didn't work at all. (I dont dare make an
                                    assumption
                                    > > which cases are more frequent/if the overall situation has a
                                    plus or
                                    > > a minus).
                                    > >
                                    > > However, I would guess that there are quite a few (a lot?) of
                                    > > companies for which analytical CRM works out well as it seems to
                                    > > have become quite an established field. Actually I read a couple
                                    of
                                    > > days ago that "data mining had its biggest success in business in
                                    > > the field of CRM" - whether that is a good thing or not..I cant
                                    > > answer that ;) but it sounds like it does work for some companies
                                    > > out there.
                                    > >
                                    > > So all in all..I guess it's sort of like saying web analytics
                                    work
                                    > > (dont work), SEO works (doesnt work). We can't really make a
                                    > > statement whether something is effective or not unless we have a
                                    big
                                    > > enough sample of it as there'll always be cases where it works
                                    and
                                    > > where it doesnt. The matter is just in how many of those cases
                                    does
                                    > > it work/ doesnt it work? And how big are the benefits if it does
                                    > > work vs. the losses when it doesn't work?
                                    > >
                                    > > I wrote more than I thought I would.. once again ;) but Im sure
                                    you
                                    > > know that we shouldnt make a conclusion for a whole field based
                                    on
                                    > > one case..I just meant to point this out for data mining as there
                                    > > seems to be a lot of talk about that.
                                    > >
                                    > > --- In webanalytics@yahoogroups.com, "Janet Park" <jparkmfi@> wrote:
                                    > > >
                                    > > >
                                    > > > The following books go beyond mere statistics into the heavy
                                    > > lifting of
                                    > > > Data Mining and data modeling. Take these to the beach with you
                                    > > this
                                    > > > summer and no quantitative bullies from Cal Tech will kick
                                    sand in
                                    > > your
                                    > > > face!
                                    > > >
                                    > > > Mastering Data Mining: The Art and Science of Customer
                                    Relationship
                                    > > > Management, by Michael J.A. Berry and Gordon Linoff, John
                                    Wiley &
                                    > > Sons,
                                    > > > 2000.
                                    > > >
                                    > > > Data Mining Techniques For Marketing, Sales, and Customer
                                    Support,
                                    > > by
                                    > > > Michael J.A. Berry and Gordon Linoff, John Wiley & Sons, 1997.
                                    > > >
                                    > > > Both books offer slightly different twists on the subject and
                                    I'd
                                    > > > recommend the pair, even though there is some redundancy. The
                                    > > authors
                                    > > > are practicing Data Mining Consultants and their real world
                                    > > experience
                                    > > > shows. They not only tell "how to," but even more
                                    important, "how
                                    > > not
                                    > > > to." Here's a bold example from Mastering Data Mining:
                                    > > >
                                    > > > "Is the Data Mining Effort Necessary?
                                    > > >
                                    > > > A Senior Vice President in the credit card group of a large
                                    bank
                                    > > has
                                    > > > spent tens of thousands of dollars developing a response model.
                                    > > This
                                    > > > predictive model is designed to identify the prospects who are
                                    most
                                    > > > likely to respond to the bank's next offering. The VP is told
                                    that
                                    > > by
                                    > > > using the model, she can save money; using only 20 percent of
                                    the
                                    > > > prospect list will yield 70 percent of the responders. However,
                                    > > despite
                                    > > > these findings, she replies that she wants every single
                                    responder -
                                    > > - not
                                    > > > just some of them. Getting every responder requires using the
                                    > > entire
                                    > > > prospect list, since no model is perfect. In this case, data
                                    > > mining is
                                    > > > not necessary.
                                    > > >
                                    > > > Moral: She could have saved tens of thousands of dollars by not
                                    > > building
                                    > > > predictive models in the first place.
                                    > > >
                                    > > > I keep both handy as references and learn something each time I
                                    > > pick
                                    > > > them up, even though I've been practicing in this field for
                                    over 15
                                    > > > years. You don't need to be a statistical guru to understand
                                    them -
                                    > > -
                                    > > > just skip over the hairy details if you prefer an "executive's"
                                    > > approach
                                    > > > to the subject.
                                    > > >
                                    > > >
                                    > > >
                                    > > > [Non-text portions of this message have been removed]
                                    > > >
                                    > >
                                    > >
                                    > >
                                    >
                                    >
                                    > [Non-text portions of this message have been removed]
                                    >
                                  • Paula Thornton
                                    Data mining is typically ineffective for marketing...MCI spent millions on predictive models to segment their leads to make sure that the highest profiled
                                    Message 17 of 17 , Jul 8, 2007
                                    • 0 Attachment
                                      Data mining is typically ineffective for marketing...MCI spent millions on
                                      predictive models to segment their leads to make sure that the 'highest
                                      profiled' leads were marketed the 'highest profile' products. When you
                                      legally can only pitch to a number every 3 months, you have to make sure not
                                      to 'waste' a single effort.

                                      The problem was that the models were based on artifacts, not facts. Now,
                                      data mining against facts would be different...but marketers typically don't
                                      have facts, because they haven't aggressively taken a strategic position to
                                      drive IT initiatives. But MCI could have learned a lot more from their own
                                      customers by simply capturing their preferences. It's simply a matter of
                                      opening up 'listening' channels and/or creating data models to accept new
                                      data. MCI spent millions 'buying' data from the likes of the credit
                                      providers (yes, those same people who issue your credit score should give it
                                      to you for free because they make plenty by selling the data to
                                      corporations). Certainly you might want to buy that data, but not for
                                      predictions.

                                      Let's start with the basics of data mining. You mine the data to look for
                                      patterns...you make a prediction, you capture the model and you then apply
                                      it to other data and select the data based on the predictions. Companies
                                      should first start with mining and understanding their own data, and
                                      leverage the other data to identify people who match the criteria and then
                                      test the results to see how accurate the predictions and/or the assumptions
                                      about the pattern were in the first place. Does this typically happen?
                                      Categorically, no.
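
A minimal sketch of that mine, capture, apply, and test loop, in Python. The segment labels and response flags are made-up illustration data, and the "model" is just a set of segments that beat the overall response rate, not any particular vendor's technique.

```python
from collections import defaultdict

def mine_pattern(rows):
    """Find segments whose response rate beats the overall rate (the 'pattern')."""
    totals, hits = defaultdict(int), defaultdict(int)
    for segment, responded in rows:
        totals[segment] += 1
        hits[segment] += responded
    overall = sum(hits.values()) / len(rows)
    return {s for s in totals if hits[s] / totals[s] > overall}

def holdout_precision(model, rows):
    """Apply the captured model to unseen rows and measure how often it was right."""
    selected = [responded for segment, responded in rows if segment in model]
    return sum(selected) / len(selected) if selected else 0.0

# Hypothetical (segment, responded) records: your own data, then a held-out test set.
own_data = [("A", 1), ("A", 1), ("A", 0), ("B", 0), ("B", 0), ("B", 1), ("C", 0), ("C", 0)]
test_data = [("A", 1), ("A", 0), ("B", 0), ("C", 0), ("A", 1)]

model = mine_pattern(own_data)                    # mine your own data first
precision = holdout_precision(model, test_data)   # then test the prediction
print(model, round(precision, 2))
```

The point of the second function is the step that usually gets skipped: checking the selected records against what actually happened.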

                                      Web transactions and even behavioral data are factual, but they are
                                      incomplete facts. You can surely draw the wrong conclusions from the
                                      findings. You have to know the data and the implications of the conditions.
                                      In most situations I read about, I can extrapolate the key issues in a moment.
                                      It's a matter of very simplistic parts and pieces that fit together.

                                      MCI did make data mining very successful in one particular case and it paid
                                      handsomely. But look at what it required. The mining itself is based on a
                                      certain algorithm. Different algorithms are more appropriate to find certain
                                      types of patterns. That means that you have to know enough about the data to
                                      know what type of algorithm (or combination thereof) might produce the most
                                      fruitful results (this is not something to try at home). In the case of the
                                      success at MCI, they 'bought' the author of a particular algorithm -- an
                                      advanced university grad student. The guy knew his own creation. It was like
                                      a living being. It only performed based on the conditions. He knew enough to
                                      alter the algorithm to meet the needs of the data...and data changes...he
                                      constantly tuned the algorithm (you're not going to get this from some 3rd
                                      party product). His algorithm was used to identify anomalies in calling
                                      patterns. Effectively it was a fraud sniffer. And it was VERY effective...it
                                      saved MCI in both credibility and in actual costs and it locked down
                                      criminal factions.
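
To make "anomalies in calling patterns" concrete, here is a toy sketch in Python. It is emphatically not the MCI algorithm, which is not described here; it is a plain leave-one-out z-score check on invented daily call minutes, with an assumed threshold of 3 standard deviations.

```python
from statistics import mean, stdev

def flag_anomalies(daily_minutes, threshold=3.0):
    """Return indices of days whose usage sits far above the rest of the history.

    Each day is compared against the mean and standard deviation of all the
    OTHER days, so one extreme day cannot hide itself by inflating the baseline.
    """
    flagged = []
    for i, value in enumerate(daily_minutes):
        others = daily_minutes[:i] + daily_minutes[i + 1:]
        if len(others) < 2:
            continue  # not enough history to estimate spread
        spread = stdev(others)
        if spread == 0:
            continue  # perfectly flat history; z-score is undefined
        if (value - mean(others)) / spread > threshold:
            flagged.append(i)
    return flagged

# Hypothetical call minutes per day for one account; day 6 is the odd one out.
history = [30, 28, 35, 32, 29, 31, 400, 33]
print(flag_anomalies(history))
```

A fixed threshold like this is exactly the kind of thing that needs constant tuning as the data changes.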

                                      On a more reasonable side, iPerceptions has a mining algorithm to find
                                      patterns in the words used in textual comments. This simplistic pattern
                                      recognition is useful in identifying certain activities and behaviors, but
                                      it is only as good as the tuning that goes on -- if the model does not
                                      account for the fact that in your business the term 'fast' is the name of
                                      your product, you're not going to get good results.
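
A small Python sketch of that tuning problem. The word list, the comments, and the product name "Fast" are all invented for illustration, and this keyword match is far simpler than whatever iPerceptions actually does.

```python
import re

# Assumed vocabulary for "this comment is about speed".
SPEED_TERMS = {"fast", "slow", "quick"}

def tag_speed_comments(comments, product_names=()):
    """Flag comments that mention speed, ignoring words that are product names."""
    ignore = {name.lower() for name in product_names}
    flagged = []
    for text in comments:
        words = set(re.findall(r"[a-z']+", text.lower()))
        if (words & SPEED_TERMS) - ignore:
            flagged.append(text)
    return flagged

comments = [
    "I love the Fast plan",
    "Checkout was slow today",
    "The site is quick",
]

untuned = tag_speed_comments(comments)
tuned = tag_speed_comments(comments, product_names=["Fast"])
print(len(untuned), len(tuned))
```

Without the product name in the ignore list, the first comment is counted as feedback about speed when it is really about a product.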

                                      Once it has found something, you have to be intelligent enough to know how
                                      to interpret it. That is, you don't use the tool to prove something you
                                      already assume to be true, because you will try to 'find' a pattern that may
                                      not exist and/or you will use a pattern that means something else to
                                      reinforce the thing you've already decided, which may already be flawed.

                                      So data mining is not one thing. It is many different things. It's no
                                      different than the term 'transportation'. You can say, "I'm going to use
                                      'transportation' to get to China". But if the form of transportation is
                                      'walking', you might never actually arrive.

                                      One last story...in the mid-90s when calls came in to our Director from data
                                      mining vendors he'd send them to me. They were often college guys who'd
                                      been hired to make phone calls (maybe it was a friend of the tool
                                      'author')...they didn't know what they were selling or what it could do. And
                                      they certainly had no idea about scalability. I soon learned that there was
                                      a simple question that I could start with: what platform did it operate on?
                                      You see, many of these algorithms started like the one we bought for
                                      fraud...they were created at a university. But they were used against very
                                      small sets of data. Therefore, they often were only designed (at this time)
                                      to operate on a SPREADSHEET! The database we were leveraging the tools
                                      against had close to 3 million rows. But that's yet another small irony of
                                      some aspects of data mining (the pattern case against the words noted
                                      above is different, it's already specific)...you have to have determined the
                                      question before you start and pull out a segment of the data that's
                                      relevant. If your goal is simply to uncover something interesting...well
                                      then, you have to start with a data survey! [Yes, there's a book for that
                                      too.]


                                      [Non-text portions of this message have been removed]