
RE: [SearchCoP] Google search appliance

  • John Kane
    Message 1 of 14 , Apr 3, 2008

      Lee,

      You may want to review the below article and blog for a review of Google Search Appliance (GSA):

       

      Put to the test: Google Enterprise Search Appliance Version 5
      The Google Mini and Google Search Appliance have long offered solid alternatives for Web site and departmental search. Targeting broader deployments, Google's latest upgrades have brought improved results ranking and an option for "source biasing," but security and adaptability still come up short for the demands of enterprise deployments.


      By Adriaan Bloem and Tony Byrne (Note: Adriaan Bloem and Tony Byrne are, respectively, Contributing Analyst and Founder of CMS Watch, a vendor-neutral technology evaluation firm.)

      March 10, 2008

      PROS
      -- Exceptionally easy to install, configure, and maintain
      -- Familiar interface perceived as authoritative by enterprise searchers
      -- Good default relevancy rankings, especially for Web pages and office files
      CONS
      -- Not well suited for environments requiring document-level security
      -- Clumsy and inefficient approaches to accessing non-Web-page content
      -- Lacks advanced tuning controls needed in more complex, enterprise settings
      http://www.intelligententerprise.com/showArticle.jhtml?articleID=206902752

      See also comments on this blog: http://jiyeon.interspike.com/?p=43

       

       

      Regards,

      John

       

      John T. Kane

      Search Evangelist

       

      From: SearchCoP@yahoogroups.com [mailto:SearchCoP@yahoogroups.com] On Behalf Of Lee Romero
      Sent: Thursday, April 03, 2008 12:11 PM
      To: searchcop
      Subject: [SearchCoP] Google search appliance

       

      Hi all - I'm researching the pros and cons of the Google appliance as
      an enterprise search engine.

      I attended the Search CoP webcast (about a year ago, I think) where it
      was discussed and also attended the Enterprise Search Summit last fall
      in San Jose (where a session by Kerry Hughes from Dow Chemical covered
      pros and cons of it). So I have some (limited) insights.

      Does anyone have any more recent experience they can share?

      What are the things that work well in your environment?

      What are the gotchas you didn't anticipate that have caused stumbling blocks?

      Do you use the appliance to index content that requires authentication
      to access? Do you index static content and also crawl web
      applications?

      I know in the past I've heard it described as a tool you just turn on
      and let it run - there is relatively little tweaking you can even do
      as an administrator. Is that still the case? Has that been an issue
      for you?

      Thanks for any insight you can share.

      Lee Romero

    • Tim
      Message 2 of 14 , Apr 3, 2008

        We are an Autonomy shop and that hasn't been an easy path.  However, I do keep an eye on the Google GSA because we have so many users and executives that ask why we're not using Google.  Since we are not Google customers, I am only echoing what I've heard so take my info with a grain of salt.

        I too attended the Dow presentation and took good notes. Here are my recollections from the Dow presentation (which closely matched comments from other people I spoke with at the conference who also used the Google GSA).

        • GSA was a tactical solution to fill an immediate void
        • GSA would not be the strategic solution
        • Dow only crawled web content
        • Took 3 FTEs six months to work out the issues and get relevant results
        • They ran into infrastructure "features" that they did not anticipate
        • GSA requires constant "care and feeding" to sustain
        • Licensing is by document count and that was a constant limiting factor
        • Looping content such as the Buzz Lightyear calendar
        • Many painful limitations (file size, metadata, abstracts, fields and the stemming file all had size limits)
        • Had to write custom web application as a work-around to some limitations
        • IIS web server must be installed on the file shares and they experienced several issues related to IIS web server integration with the GSA
        • Suffered from inaccurate product documentation
        • Ultimately the project was judged a success

        We ran into a couple of the same issues with Autonomy, so some of Dow's issues are just the nature of the search beast. These issues were: "constant care and feeding" is a vast understatement with Autonomy. We are constantly finding looping issues or, more specifically, web sites/pages that cause duplicates. We also had the Buzz Lightyear calendars. Finally, we also ran into a lot of unanticipated "infrastructure features."
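The "Buzz Lightyear" calendar mentioned in the Dow notes and in the paragraph above is a crawler trap: a dynamic calendar whose "next month" link always produces a new URL, so a spider can follow it to infinity and beyond. A minimal sketch of one common guard, assuming a simple breadth-first crawler over an in-memory link graph (the `MAX_PER_PATH` limit and the simulated calendar are hypothetical, not a GSA or Autonomy setting), caps how many query-string variants of the same host and path get fetched:

```python
from collections import Counter, deque
from urllib.parse import urlsplit

MAX_PER_PATH = 3  # hypothetical cap on query-string variants per host+path


def crawl(start, links):
    """Breadth-first crawl that stops following a host+path once it has
    been fetched MAX_PER_PATH times (a likely crawler trap)."""
    seen, fetched = set(), []
    per_path = Counter()
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        parts = urlsplit(url)
        key = (parts.netloc, parts.path)
        if per_path[key] >= MAX_PER_PATH:
            continue  # too many variants of one page: skip it and its links
        per_path[key] += 1
        fetched.append(url)
        queue.extend(links(url))
    return fetched


def calendar_links(url):
    # Simulated infinite calendar: every month links to the next month.
    if "month=" not in url:
        return ["http://intranet/cal?month=1"]
    n = int(url.split("month=")[1])
    return [f"http://intranet/cal?month={n + 1}"]


pages = crawl("http://intranet/home", calendar_links)
print(pages)  # home page plus months 1-3; month 4 is never fetched
```

A per-path cap is crude (a legitimate search page with many parameters gets cut off too), which is why such guards usually end up as per-site exceptions, the "care and feeding" described above.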

        We looked closely at Google before we went with Autonomy. Google was ruled out because of the requirement to have IIS installed on our file shares (we have thousands of file shares!). Also, the GSA security was lacking; even Autonomy, which had a great reputation for security, needed enhancements to support a whole range of features we required, such as large Windows AD groups, cross-domain groups and Windows DFS support, among others. Late binding was also an issue with Google, and as of last fall they were still clinging to that technology. Most enterprise-class vendors support both early and late binding.
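For readers less familiar with the early versus late binding distinction in the paragraph above: with early binding, a document's access list is captured at index time and results are filtered inside the index; with late binding, each candidate result is checked against the source system at query time, costing one authorization call per hit. A minimal sketch, assuming a toy in-memory index and a hypothetical `check_access` callback standing in for a call to the content repository (neither is a real GSA or Autonomy API):

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    url: str
    text: str
    # Groups allowed to read the document, captured at index time (early binding).
    allowed_groups: frozenset = field(default_factory=frozenset)


def early_binding_search(index, term, user_groups):
    # ACLs are already in the index, so filtering is a cheap set intersection.
    return [d.url for d in index
            if term in d.text and d.allowed_groups & user_groups]


def late_binding_search(index, term, user, check_access):
    # The index holds no ACLs; every hit needs a live authorization check.
    calls = 0
    results = []
    for d in index:
        if term in d.text:
            calls += 1
            if check_access(user, d.url):
                results.append(d.url)
    return results, calls


index = [
    Doc("/hr/salaries", "budget report", frozenset({"hr"})),
    Doc("/eng/plan", "budget plan", frozenset({"eng"})),
    Doc("/pub/news", "company news", frozenset({"all"})),
]
acl = {"/hr/salaries": {"alice"}, "/eng/plan": {"bob"}, "/pub/news": {"alice", "bob"}}

print(early_binding_search(index, "budget", {"eng", "all"}))  # ['/eng/plan']
print(late_binding_search(index, "budget", "bob",
                          lambda u, url: u in acl[url]))      # (['/eng/plan'], 2)
```

Both return the same answer here; the difference is that the late-binding path makes one check per matching document, which is why heavily secured content slows query response.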

        The final killer was that the sheer number of GSAs we would need to scale to 80 TB of content was impractical and not cost-effective. Keep in mind that this was over 2 years ago, so the facts regarding the GSA may have changed.

        From the people I've spoken with at the conference and the speakers I heard talk about the GSA, it sounds like a great tactical search solution: for example, a public web site search engine with thousands (not millions) of documents and only simple security requirements. I would be concerned about scalability and security if those are issues for you.

        Susan Feldman stated at the conference that once you get into millions of documents you can't get away from a large-scale deployment.  I would agree.  There are so many complexities and nuances in a large enterprise search deployment, across the vast IT infrastructure, in a large company, that I would have a hard time believing that you could just plug in a search appliance and have it work - let alone having end users be happy with the search results.

        Tim


      • Lee Romero
        Message 3 of 14 , Apr 7, 2008
          John - Thanks for the pointer. I subscribe to that blog's feed and
          noticed that article but it didn't come to mind as I've started doing
          a bit more digging into the details.

          Thanks for the reminder :-)

          Lee Romero

        • Lee Romero
          Message 4 of 14 , Apr 7, 2008
            Tim - Thank you very much for summarizing your notes. I had many of
            the same notes but you definitely caught a couple that had escaped my
            pen at the time.

            I appreciate you taking the time to send along the details.

            Lee Romero

          • Jim
            Message 5 of 14 , Apr 8, 2008


              --- In SearchCoP@yahoogroups.com, "Lee Romero" <pekadad@...> wrote:
              > Hi all - I'm researching the pros and cons of the Google appliance as
              > an enterprise search engine....
              >
              >... Thanks for any insight you can share.


              Lee

              The company I work for recently completed a fairly intensive search analysis, with an RFQ and then a POC installation at our site, to compare the final two search companies for an enterprise search solution. The final companies analyzed were Autonomy and Google. Here are my observations and opinions (the Good, the Bad and the Ugly). This is not a complete list, and I can give you more time if you want to discuss this on a call. Sorry, my examples may not make sense outside the industry I work in, but I didn't have the time to generalize them. I'll try to give a consistent comparison.

              Google -

              • Google uses an appliance approach – no hardware or software for client to maintain
              •  Search appliances are refreshed every two years
              • Initial setup of the appliance and indexing of "public" content was very fast and easy (hours). Nice plug and play without significant effort.
              • Google was able to query Documentum (our ECM), Day (our WCM) and wikis.
              • Searching of protected content: it could only handle one domain/security model per appliance.
              • Security is checked after indexing (at query time), so search slows down when there is a lot of secured content.
              • Very limited number of content connectors; others come from third parties or need to be created by you or a third-party partner (not sure about long-term support).
              • Indexes about 100,000 documents per hour.
              • Auto-tuning, so static content will eventually be checked only a couple of times a month; thus if a document is added to or deleted from the system, the change will not be indexed until the next cycle.
              • Pricing is per document (actually per unique link; this is a UNIX-based system, so URLs are case-sensitive). Thus if you have multiple variations of a URL on an IIS server (mixed case, all lowercase and all uppercase), each instance is counted as a new document and will also appear in the search results.
              • Handling multiple domains probably requires more appliances.
              • Tuning is limited to global settings.
              • Granularity of any tuning parameters is global in most cases.
              • The search console/dashboard has no hierarchy of roles other than admin or business manager.
              • Keyword-based (you must manually enter relationships, such as HVAC being the same as heating and ventilation).
              • Best Bet/keyworded documents: each keyword is added manually, and then all URLs tied to that keyword are added to that file.
              • Cannot easily mix relevance models. E.g., if some content is highly relevant based on dates (e.g., news and events) whereas other documents are relevant based on keywords and usage, we were not able to determine how that could be done except through separate appliances.
              • Foreign language support is not fully developed. It can read most core languages, but only a few languages are fully supported for stemming etc.
              • Main focus is on basic enterprise search, not complex business applications.
              • Employees recognize the name Google.
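To make the per-URL pricing gotcha in the Google list above concrete: IIS treats URL paths case-insensitively, so `/Docs/Report.aspx` and `/docs/report.aspx` serve the same page, but a crawler that compares URLs as exact strings counts them as separate documents. A minimal sketch, assuming letter case is the only variation (real sites also vary default-document names, trailing slashes and so on), of how case-folding the host and path before counting collapses the variants; the `iis_key` helper is hypothetical:

```python
from urllib.parse import urlsplit, urlunsplit


def iis_key(url):
    """Hypothetical canonical key for a case-insensitive (IIS-style) server:
    lowercase scheme, host and path; leave the query string untouched."""
    p = urlsplit(url)
    return urlunsplit((p.scheme.lower(), p.netloc.lower(), p.path.lower(), p.query, ""))


crawled = [
    "http://intranet/Docs/Report.aspx",
    "http://intranet/docs/report.aspx",
    "http://INTRANET/DOCS/REPORT.ASPX",
    "http://intranet/docs/other.aspx",
]

print(len(set(crawled)))                   # 4 "documents" as exact strings
print(len({iis_key(u) for u in crawled}))  # 2 after case-folding
```

The query string is deliberately left alone, since parameter values can be case-sensitive even on an IIS site.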

              Summary: Low requirement for technical support; a very black-and-white tool set, which reduces support requirements. When special requirements occur, it's a yes or no response. You may end up investing in other search products for other business needs, such as call center handling. Google does what it does very well, but it is not a fully evolved enterprise solution for large companies with a large amount of content or diverse employee populations. General relevance is high; knowledge management/concept searching is not really available.
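The keyword-based bullet in the Google list above means relationships such as "HVAC" and "heating and ventilation" have to be entered by hand. A minimal sketch of what a hand-maintained synonym list does at query time, assuming a hypothetical in-code dictionary (this is not the GSA's actual synonym or query-expansion file format):

```python
# Hypothetical hand-maintained synonym groups; phrases in a group are equivalent.
SYNONYM_GROUPS = [
    {"hvac", "heating and ventilation", "heating ventilation and air conditioning"},
    {"chiller", "chilled water system"},
]


def expand_query(query):
    """Return the query plus every phrase in any synonym group containing it."""
    q = query.lower().strip()
    expanded = {q}
    for group in SYNONYM_GROUPS:
        if q in group:
            expanded |= group
    return sorted(expanded)


def matches(doc_text, query):
    # Simple substring match of any expanded phrase against the document text.
    text = doc_text.lower()
    return any(phrase in text for phrase in expand_query(query))


print(matches("Annual heating and ventilation maintenance", "HVAC"))     # True
print(matches("Annual heating and ventilation maintenance", "chiller"))  # False
```

Every new relationship means another manual entry; concept-based engines try to infer such relationships from the content instead.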

              Autonomy:

              • Runs on UNIX, Linux and Microsoft platforms. Other platforms are supported as a second tier release.
              •  Initially more expensive but does have an enterprise license model. 
              • Poor out-of-the-box relevance. They haven't developed a plug-and-play kick-off application yet (it requires tuning and an understanding of your business content).
              • Able to index nearly any kind of readable content, allowing for more types of uses with the application. We did not have any issues, and connectors are kept current with releases of the content software.
              • Understands that various URL formats refer to the same URL.
              • Has a large set of features and functions for various business problems.
              • Indexes files faster and can handle any size of document. I believe it uses a faster process with a better understanding of what has and has not been indexed.
              • Can index across multiple domains.
              • Handles most standard security, including LDAP, Microsoft and UNIX security.
              • Tuning of indexes can be scheduled, so systems with particular content schedules can be indexed right after the content changes.
              • Many features that allow you to tune content searches based on business rules (I can go into detail privately about business rules). Very granular, which helps with very specific issues.
              • Can do near-real-time indexing of highly active areas.
              • Indexes databases and generates a well-formatted document on the fly.
              • The new admin/business console is flexible, so different people can be given control over part of the system, either at an admin level or as a business analyst.
              • Can handle keyword searches but also does concept-based searches.
              • Sometimes confusing to the user, but can also help discover content that's related but written with a different focus (e.g., management of an HVAC system could also relate to management of an air conditioning system or a chiller system).
              • Creates a company dictionary of words automatically, so special terms, "company terminology", special spellings etc. are all in the dictionary. It will also help you create a thesaurus through its concept-based search.
              • Fully handles over 70 languages and can read over 100.
              • Best Bet/keyworded documents: they have a tool in the admin to manage this, but they were also able to read our metadata tags and use them for the keyword matches.
              •  Main focus is on deep enterprise tools with search and knowledge management across all types of content.
              •  Plays well with various metadata and taxonomies you may already have in place. 
              •  General employees do not recognize Autonomy as a brand.
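The Google bullet about auto-tuning (static content eventually checked only a couple of times a month) and the Autonomy bullets about scheduled and near-real-time indexing describe two different recrawl strategies. A minimal sketch of the adaptive kind, assuming a simple doubling/halving rule with hypothetical 1-day and 15-day bounds (the GSA's actual scheduling algorithm is not described in this thread):

```python
MIN_DAYS = 1.0   # hypothetical lower bound on the recrawl interval
MAX_DAYS = 15.0  # hypothetical cap, roughly "a couple of times a month"


def next_interval(current_days, changed):
    """Halve the interval when the page changed since the last visit,
    double it when it did not; clamp to [MIN_DAYS, MAX_DAYS]."""
    new = current_days / 2 if changed else current_days * 2
    return max(MIN_DAYS, min(MAX_DAYS, new))


interval = 1.0
history = []
for changed in [False, False, False, False, False, True]:
    interval = next_interval(interval, changed)
    history.append(interval)
print(history)  # [2.0, 4.0, 8.0, 15.0, 15.0, 7.5]
```

Under a rule like this, a document added to a rarely changing area can wait up to the maximum interval before it is indexed; a crawl scheduled to run right after the content system publishes avoids that wait.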

              Summary: Autonomy is not for the hobbyist. If you are not dedicated to using search beyond just the portal search, then Autonomy is overkill. If you expect to index and use search for a number of special needs and you want to integrate it into B2B and B2C, then it will do it. You will also need to be committed to having a dedicated resource who is trained in the advanced tools. It is very open, so you can make it do what you want, but on the other hand it will do poorly if you don't invest the time and skills. I think the economics of this tool set become more attractive as you go global and/or the business is complex and/or covers a broad number of businesses, related or not. It has great discovery tools and other enhancements, but you have to use them. The product also needs to be marketed to your IT and business organizations, or they will fall back to third-party specialty products that cost more and do less than Autonomy can.

              My analogy –

              Google search is like getting to know a popular person as a friend. As long as you both have the same goals, everyone gets along. Once your requirements change as the relationship grows, you discover the friend's limitations. You can learn to give up those requirements; if you can't, it puts a lot of pressure on the friendship because you have to develop other relationships to meet those needs. Since it's a 2-year agreement you can part ways, but it is still expensive to develop a new relationship with someone else, although you will be much more knowledgeable about your needs.

              Autonomy is much more of a life partner with many strong attributes. It costs more in time and effort to get started and requires commitment from both sides; if you are both committed, the rewards follow and you are able to continue to grow. If you are not committed, eventually you have a rough relationship and possibly a costly separation due to your sunk costs.

              Email me if you need further explanation.

            • Lee Romero
              Message 6 of 14 , Apr 8, 2008
                Wow, Jim! That's some great insight and I really appreciate you
                taking the time to write it up.

                Hopefully others on the list will also benefit from your insights :-)

                A couple questions for you based on your insights:

                * Your evaluation was for an intranet search (not a public site),
                correct? Does your intranet require a login and did you look at how
                the Google Search Appliance works in that environment (requiring
                form-based authentication)?

                * When you say "domain" (as in your comment, "could only handle one
                domain/security model per appliance." or your comment "To handle
                multiple domains probably require more appliances") - what is a
                "domain" to the GSA? Is abc.acme.com different from xyz.acme.com,
                which is different from abc.xyz.acme.com? Or does it mean the
                top-level domain being different (so each of those examples would be
                in the same domain)?

                * Related to the previous question - when you describe to GSA what to
                index, is that done by specifically telling it domains to index or is
                there any kind of "wild-carding" of domains to index? In other words,
                if I want to index abc.acme.com and also xyz.acme.com, can I do that
                on the same appliance by configuring it to index *.acme.com or do I
                need to tell it each specific domain to look at?


                While I'm not currently specifically looking at Autonomy - some
                clarifications on that as well:

                * When you say Autonomy has "an enterprise license model" - you mean
                to differentiate it from the per-document licensing of Google (so you
                pay for the software and then can index however much content you need
                to)?

                * When you say, "Understands various URL formats are the same URL" -
                does Autonomy even recognize that two query strings with the same
                parameters but in a different order are the same URL?


                Thanks again, Jim - much appreciated.

                Lee

              • Tim
                Message 7 of 14 , Apr 8, 2008
                  > * When you say Autonomy has "an enterprise license model" - you mean
                  > to differentiate it from the per-document licensing of Google (so you
                  > pay for the software and then can index however much content you need
                  > to)?

                  Our Autonomy licensing model is pretty much unlimited, if my memory
                  serves me correctly. I think the only restriction was that we can't use
                  their technology to create our own Google or Yahoo (which it's not
                  designed for anyway).

                  > * When you say, "Understands various URL formats are the same URL" -
                  > does Autonomy even recognize that two query strings with the same
                  > parameters but in a different order are the same URL?

                  I'm not sure this is true exactly as stated. There are a lot of
                  things you can do to tweak URLs during the Autonomy fetch (aka,
                  spider) process. Ultimately, the URL is like the unique identifier
                  for each search record. So it can be easy to generate duplicates.

                  For example, our Plumtree portal generated a lot of duplicate search
                  records based on permutations of the same URL. Our Jahia portal also
                  generated duplicates because it passes "session state" information in
                  the URL.
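Lee's question about query strings with the same parameters in a different order, and the Jahia session-state duplicates, are both URL-normalization problems. A minimal sketch of the usual fix, assuming the session parameter is called `sessionid` (a hypothetical name; check what your own portal actually emits) and that parameter order never changes what the server returns, which is not true of every application:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

SESSION_PARAMS = {"sessionid"}  # hypothetical names of session-state parameters


def normalize(url):
    """Drop session parameters and sort the remaining ones so that
    equivalent URLs map to a single key."""
    p = urlsplit(url)
    params = [(k, v) for k, v in parse_qsl(p.query, keep_blank_values=True)
              if k.lower() not in SESSION_PARAMS]
    query = urlencode(sorted(params))
    return urlunsplit((p.scheme, p.netloc.lower(), p.path, query, ""))


a = "http://portal/page?b=2&a=1&sessionid=XYZ"
b = "http://portal/page?a=1&b=2&sessionid=ABC"
print(normalize(a))                 # http://portal/page?a=1&b=2
print(normalize(a) == normalize(b))  # True
```

Each portal tends to need its own list of parameters to drop, which matches the one-off configuration work described above.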

                  Autonomy gives you crude tools to configure around these one-offs as
                  you discover them. One such tool is to use a completely different
                  technique, a checksum value that actually compares the content on each
                  page. This feature instantly solves the URL issue but introduces
                  another one: a dynamic date on a web page, for example, changes the
                  checksum and generates duplicates again. (The story of my life as a
                  search analyst! Fix one problem and generate two more.)
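The checksum technique Tim describes, and the way a dynamic date defeats it, can be illustrated with a small sketch. It assumes an MD5 digest of the page text and a hypothetical ISO-style date pattern stripped before hashing; Autonomy's actual checksum settings are not described in this thread:

```python
import hashlib
import re

# Hypothetical pattern for a dynamic date banner such as "Updated 2008-04-09".
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def checksum(text, strip_dates=False):
    if strip_dates:
        text = DATE_PATTERN.sub("", text)
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def dedupe(pages, strip_dates=False):
    """Keep the first URL seen for each distinct content checksum."""
    seen = {}
    for url, text in pages:
        seen.setdefault(checksum(text, strip_dates), url)
    return sorted(seen.values())


# Two URL variants of the same page, fetched on different days.
pages = [
    ("http://portal/a?x=1", "Quarterly report. Updated 2008-04-08"),
    ("http://portal/a?x=1&y=2", "Quarterly report. Updated 2008-04-09"),
]
print(dedupe(pages))                    # both survive: the dates differ
print(dedupe(pages, strip_dates=True))  # ['http://portal/a?x=1']
```

Stripping the date fixes this case but is itself another one-off rule, which is the "fix one problem and generate two more" pattern.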

                  Tim
                • Jim
                  Message 8 of 14 , Apr 9, 2008

                    Lee, I'll respond to your questions in context below. (My background is in mechanical engineering and sales. I moved into the Web in 1996 and have focused on WCM/CM/KM metadata, taxonomies and search since then. I only understand the concepts and approaches, not the full details of the IT side of search. Again, these are my observations and notes gleaned from the other technical analysis. The above are only the main points; there are other nuances to the Google product, so my understanding of any IT issue can be discounted if someone has a better understanding. All I can respond with is information gathered from December 2007 to February 2008, so it's as fresh as you can find today.)

                    --- In SearchCoP@yahoogroups.com, "Lee Romero" <pekadad@...> wrote:
                    >
                    > Wow, Jim! That's some great insight and I really appreciate you
                    > taking the time to write it up.  ......

                    >
                    > * Your evaluation was for an intranet search (not a public site),
                    > correct?

                    YES.

                    > Does your intranet require a login

                    YES.

                    > and did you look at how the Google Search Appliance works in that environment (requiring
                    > form-based authentication)?

                    YES. In the POC they were able to handle our SiteMinder forms-based authentication, and they can handle other types of security; what they cannot handle is more than one security model per appliance. They may have this resolved by the end of the year. But they still do the security check after the search (at query time) instead of at index time, so if there is a lot of authorization within your portal, that will definitely impact search response speed.


                    > * When you say "domain" (as in your comment, "could only handle one
                    > domain/security model per appliance." or your comment "To handle
                    > multiple domains probably require more appliances") - what is a
                    > "domain" to the GSA? Is abc.acme.com different from xyz.acme.com,
                    > which is different from abc.xyz.acme.com? Or does it mean the
                    > top-level domain being different (so each of those examples would be
                    > in the same domain)?

                    I meant top-level domain. For instance, if the company has 3 active top-level domains, e.g. AAA.com, AAAWI.net and AmericanAutomobileAssociation.com, then content in each domain will require another appliance.
                    >
                    > * Related to the previous question - when you describe to GSA what to
                    > index, is that done by specifically telling it domains to index or is
                    > there any kind of "wild-carding" of domains to index? In other words,
                    > if I want to index abc.acme.com and also xyz.acme.com, can I do that
                    > on the same appliance by configuring it to index *.acme.com or do I
                    > need to tell it each specific domain to look at?

                    Actually, it's a simple spider, so if you link off to these various domains and tell it to do a full index, it will crawl them. It has to find them, so you would at least have to provide it a starting point on each server. I assume you would just give it your DNS list if you have hundreds of domains you want to index. That's not my area of expertise.
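Whatever the appliance's own configuration syntax is (it is not covered in this thread), the idea behind a pattern such as `*.acme.com` is a host-suffix match applied to every link the spider discovers, on top of the start URLs it needs in order to find each server. A minimal sketch, assuming hypothetical pattern strings and treating `*.` as "the domain itself or any subdomain":

```python
from urllib.parse import urlsplit


def in_scope(url, patterns):
    """True if the URL's host matches any pattern. '*.acme.com' matches
    acme.com and any subdomain; a bare host must match exactly."""
    host = (urlsplit(url).hostname or "").lower()
    for pattern in patterns:
        pattern = pattern.lower()
        if pattern.startswith("*."):
            base = pattern[2:]
            if host == base or host.endswith("." + base):
                return True
        elif host == pattern:
            return True
    return False


patterns = ["*.acme.com"]
for url in ["http://abc.acme.com/", "http://abc.xyz.acme.com/x",
            "http://acme.com/", "http://notacme.com/"]:
    print(url, in_scope(url, patterns))
```

Matching on the leading dot is what keeps a look-alike host such as notacme.com out of scope while still accepting abc.acme.com and abc.xyz.acme.com.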


                    >
                    >
                    > While I'm not currently specifically looking at Autonomy - some
                    > clarifications on that as well:
                    >
                    > * When you say Autonomy has "an enterprise license model" - you mean
                    > to differentiate it from the per-document licensing of Google (so you
                    > pay for the software and then can index however much content you need
                    > to)?

                    Yes. We can crawl our current domains internally and externally, we have an unlimited number of documents, and our user license was set high enough that we don't have to be concerned for the next 5 years. With an enterprise license you are moving from à la carte to a bundled solution, so everything is open to negotiation.

                    > * When you say, "Understands various URL formats are the same URL" -
                    > does Autonomy even recognize that two query strings with the same
                    > parameters but in a different order are the same URL?

                    If it's indexing a query string, I'm not sure. But it does recognize standard URLs that are written in different styles (for example, different letter case) but still resolve to the same page because you are working with an IIS server.
                    >
                    >
                    > Thanks again, Jim - much appreciated.
                    >
                    > Lee
                    >

                  • Walter Underwood
                    Message 9 of 14 , Apr 9, 2008
                      With Autonomy, were you using HTTP Fetch or Ultra Spider? Ultra is a much, much smarter spider.

                      wunder

                      On 4/9/08 9:36 AM, "Jim" <jim.smith@...> wrote:


                       

                      Lee , I'll respond to your question in context below (My background is Mechanical engineering, sales.  I moved into the Web in 1996 and have focused on WCM/CM/KM meta data and taxonomies and search since then.  I only understand the concepts and approaches and not the full details of the IT side of search.  Again these are  my observation and notes I gleaned from the other technical analysis.  Above is only the main points their are other nuances to the Google product so my understanding of any IT issue can be discounted if someone has a better understanding.  All I can respond to is my information that was gather 12/2007 - February /2008 so it's as fresh as you can find today.
                      --- In SearchCoP@yahoogroups.com, "Lee Romero" <pekadad@...> wrote:
                      >
                      > Wow, Jim! That's some great insight and I really appreciate you
                      > taking the time to write it up.  ......

                      >
                      > * Your evaluation was for an intranet search (not a public site),
                      > correct? (YES)

                      > Does your intranet require a login (YES)

                      > and did you look at how the Google Search Appliance works in that environment (requiring
                      > form-based authentication)?

                      YES. In the POC they were able to handle our SiteMinder forms-based authentication, and they can handle other types of security, but they cannot handle more than one security model per appliance. They may have this resolved by the end of the year. However, they still do the security check after the search (at query time) instead of at index time, so if there is a lot of authorization within your portal, that will definitely slow down search response.
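The difference Jim describes, checking security after the search instead of at index time, is usually called late versus early binding of security. The sketch below is a minimal illustration under stated assumptions, not the GSA's or Autonomy's actual code: `is_authorized` is a hypothetical stand-in for whatever per-document check the engine makes (for example, a request back to the content server), and the ACL sets are made up. It counts how many checks one query needs to fill a page of results.

```python
from typing import Callable, Dict, List, Set, Tuple


def late_binding_trim(
    hits: List[str],
    user: str,
    is_authorized: Callable[[str, str], bool],
    page_size: int = 10,
) -> Tuple[List[str], int]:
    """Query-time (late-binding) trimming: ranked hits are checked one at a
    time until a full page of visible results is collected.
    Returns the visible hits and the number of authorization checks made."""
    visible: List[str] = []
    checks = 0
    for doc_id in hits:
        checks += 1  # in a real deployment, often a round trip to the content server
        if is_authorized(user, doc_id):
            visible.append(doc_id)
            if len(visible) == page_size:
                break
    return visible, checks


def early_binding_trim(
    hits: List[str],
    user_groups: Set[str],
    doc_acls: Dict[str, Set[str]],
    page_size: int = 10,
) -> List[str]:
    """Index-time (early-binding) trimming: ACLs were stored with each document
    when it was indexed, so filtering is a local set intersection per hit."""
    visible = [d for d in hits if doc_acls.get(d, set()) & user_groups]
    return visible[:page_size]


# Hypothetical result list: the user may see only every fifth document.
hits = [f"doc{i}" for i in range(100)]
acls = {d: ({"eng"} if i % 5 == 0 else {"hr"}) for i, d in enumerate(hits)}
page, checks = late_binding_trim(hits, "alice", lambda u, d: "eng" in acls[d])
print(len(page), checks)  # 10 46
```

With early binding the per-hit checks become an in-memory lookup, at the cost of ACLs that can go stale between crawls; with late binding the cost grows with how much of the result list the user is not allowed to see.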


                      > * When you say "domain" (as in your comment, "could only handle one
                      > domain/security model per appliance." or your comment "To handle
                      > multiple domains probably require more appliances") -
                      > "domain" to the GSA? Is abc.acme.com different from xyz.acme.com,
                      > which is different from abc.xyz.acme.com? Or does it mean the
                      > top-level domain being different (so each of those examples would be
                      > in the same domain)?

                      I meant top-level domain. For instance, if the company has three active top-level domains, e.g. AAA.com, AAAWI.net and AmericanAutomobileAssociation.com, then content in each domain will require another appliance.
                      >
                      > * Related to the previous question - when you describe to GSA what to
                      > index, is that done by specifically telling it domains to index or is
                      > there any kind of "wild-carding" of domains to index? In other words,
                      > if I want to index abc.acme.com and also xyz.acme.com, can I do that
                      > on the same appliance by configuring it to index *.acme.com or do I
                      > need to tell it each specific domain to look at?

                      Actually, it's a simple spider, so if you link off to these various domains and tell it to do a full index, it will crawl them. It has to find them, so you would at least have to provide it a starting point on each server. I assume you would just give it your DNS list if you have hundreds of domains you want to index. That's not my area of expertise.
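Jim's point, that the spider only finds hosts it can reach by following links so each server needs at least a starting point, can be illustrated with a small breadth-first crawl. This is a generic sketch under stated assumptions, not how the GSA is actually configured: the `*.acme.com`-style wildcard matching in `in_scope`, the `links` dictionary standing in for fetched pages, and both function names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Iterable, List, Set
from urllib.parse import urlsplit


def in_scope(url: str, patterns: Iterable[str]) -> bool:
    """True if the URL's host matches one of the patterns.
    "*.acme.com" matches acme.com itself and any subdomain of it;
    any other pattern must match the host exactly."""
    host = (urlsplit(url).hostname or "").lower()
    for pattern in patterns:
        pattern = pattern.lower()
        if pattern.startswith("*."):
            base = pattern[2:]
            if host == base or host.endswith("." + base):
                return True
        elif host == pattern:
            return True
    return False


def crawl(start_urls: Iterable[str], links: Dict[str, List[str]], patterns: List[str]) -> Set[str]:
    """Breadth-first discovery. `links` maps each page to the URLs it links to,
    standing in for fetching and parsing real pages."""
    seen: Set[str] = set()
    queue: Deque[str] = deque()
    for url in start_urls:
        if url not in seen and in_scope(url, patterns):
            seen.add(url)
            queue.append(url)
    while queue:
        url = queue.popleft()
        for target in links.get(url, []):
            # Out-of-scope links (e.g. www.example.org) are never queued.
            if target not in seen and in_scope(target, patterns):
                seen.add(target)
                queue.append(target)
    return seen


links = {
    "http://abc.acme.com/": ["http://xyz.acme.com/", "http://www.example.org/"],
    "http://xyz.acme.com/": ["http://abc.xyz.acme.com/docs"],
}
found = crawl(["http://abc.acme.com/"], links, ["*.acme.com"])
print(sorted(found))
# hr.acme.com is in scope but nothing links to it, so it is only found if it
# is given as a start URL.
```

The wildcard only limits where the spider may go; it does not tell the spider where hosts are, which is why a seed list (or, as Jim suggests, a DNS list) is still needed for servers nothing links to.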


                      >
                      >
                      > While I'm not currently specifically looking at Autonomy - some
                      > clarifications on that as well:
                      >
                      > * When you say Autonomy has "an enterprise license model" - you mean
                      > to differentiate it from the per-document licensing of Google (so you
                      > pay for the software and then can index however much content you need
                      > to)?

                      Yes. We can crawl our current domains both internally and externally, we have an unlimited number of documents, and our user license was set high enough that we don't have to be concerned for the next 5 years. With an enterprise license you are moving from a la carte to a bundled solution, so everything is open to negotiation.
                      >
                      > * When you say, "Understands various URL formats are the same URL" -
                      > does Autonomy even recognize that two query strings with the same
                      > parameters but in a different order are the same URL?

                      If it's indexing a query string, I'm not sure. But it does recognize standard URLs that are written in different styles but still resolve to the same page because you are working with an IIS server.
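Lee's question about parameter order comes down to URL canonicalization: reducing each URL to one normalized key before deciding whether two URLs are the same document. The sketch below is a generic illustration, not Autonomy's actual rules. Sorting the query parameters and optionally lowercasing the path (only safe on a case-insensitive server such as IIS, hence the hypothetical `case_insensitive_path` flag) are assumptions chosen for the example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def canonicalize(url: str, case_insensitive_path: bool = False) -> str:
    """Reduce a URL to a normalized key for duplicate detection."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()  # hostname drops any user:password@ prefix
    port = parts.port
    # Drop the port when it is the scheme's default (e.g. :80 for http).
    netloc = host if port is None or port == DEFAULT_PORTS.get(scheme) else f"{host}:{port}"
    path = parts.path or "/"
    if case_insensitive_path:
        # Only safe when the server treats paths case-insensitively (e.g. IIS).
        path = path.lower()
    # Sort parameters so ?b=2&a=1 and ?a=1&b=2 produce the same key.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    # The fragment (#...) is never sent to the server, so it is dropped.
    return urlunsplit((scheme, netloc, path, query, ""))


a = canonicalize("HTTP://Intranet.Example.com:80/Docs/Page.aspx?b=2&a=1#top")
b = canonicalize("http://intranet.example.com/Docs/Page.aspx?a=1&b=2")
print(a == b)  # True
```

Sorting parameters assumes the application ignores their order, which is usually but not always true; a spider that gets this wrong can merge pages that are actually different.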
                      >
                      >
                      > Thanks again, Jim - much appreciated.
                      >
                      > Lee
                      >
                      > On Tue, Apr 8, 2008 at 12:37 PM, Jim jim.smith@... wrote:
                      > > Lee
                      > > The company I work for recently completed a fairly intensive search
                      > > analysis with an RFQ and then a POC installation at our site to
                      > > compare the final two search companies for an enterprise search
                      > > solution. The final companies analyzed were Autonomy and Google.
                      > > Here are my observations and opinions (The Good, The Bad and The
                      > > Ugly). This is not a complete list, and I can give you more time if
                      > > you want to discuss this on a call. Sorry, my examples may not make
                      > > sense because of the industry I am working in, but I didn't have the
                      > > time to generalize them. I'll try to give a consistent comparison.
                      >
                       

                          


                    • Jim
                      Message 10 of 14 , Apr 10, 2008
                        I believe we will be moving to Ultra Spider for the HTTP content.

                        --- In SearchCoP@yahoogroups.com, Walter Underwood <wunderwood@...>
                        wrote:
                        >
                        > With Autonomy, were you using HTTP Fetch or Ultra Spider? Ultra is a much,
                        > much smarter spider.
                        >
                        > wunder
                      • Tim
                        Message 11 of 14 , Apr 10, 2008
                          So what is Ultra Spider? Is it the latest version of the HTTP fetch
                          with the KeyView technology or is this a separate product?

                          Tim

                          --- In SearchCoP@yahoogroups.com, "Jim" <jim.smith@...> wrote:
                          >
                          > I believe we will be moving top Ultra spider for the http content.
                          >
                          > --- In SearchCoP@yahoogroups.com, Walter Underwood <wunderwood@>
                          > wrote:
                          > >
                          > > With Autonomy, were you using HTTP Fetch or Ultra Spider? Ultra is a much,
                          > > much smarter spider.
                        • Walter Underwood
                          Message 12 of 14 , Apr 10, 2008
                            Re: [SearchCoP] Re: Google search appliance

                            Three or four years ago at Verity, I adapted the Ultraseek spider/search engine to be spider-only and to submit batches to K2. That product was called "Ultra Spider" and effectively replaced vspider and K2 Spider.
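The spider-only design Walter describes, a crawler that hands documents to a separate indexer in batches, can be pictured with a small batching loop. This is a generic sketch, not Verity or Autonomy code: `submit_batch` is a hypothetical callback standing in for whatever submission interface K2 or the IDOL indexer exposes, and the batch size is arbitrary.

```python
from typing import Callable, Dict, Iterable, List

Doc = Dict[str, str]


def submit_in_batches(
    docs: Iterable[Doc],
    submit_batch: Callable[[List[Doc]], None],
    batch_size: int = 100,
) -> int:
    """Collect crawled documents and hand them to the indexer in groups.
    Returns the number of batches submitted."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[Doc] = []
    sent = 0
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            submit_batch(batch)
            sent += 1
            batch = []  # start a fresh list; the submitted one is left untouched
    if batch:  # flush whatever is left when the crawl ends
        submit_batch(batch)
        sent += 1
    return sent


sizes: List[int] = []
crawled = ({"url": f"http://abc.acme.com/page{i}"} for i in range(7))
count = submit_in_batches(crawled, lambda b: sizes.append(len(b)), batch_size=3)
print(count, sizes)  # 3 [3, 3, 1]
```

Batching spreads the per-request overhead of talking to the indexer over many documents and lets the spider and the indexer run as separate processes.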

                            More recently, Ultra Spider has been IDOLized, and can talk to the Autonomy indexer. That happened after I left, so I don't have details. I don't see it mentioned on the Autonomy website, but Autonomy has never been big on putting lots of info on their website.

                            Ultra Spider is a completely different code base than HTTP Fetch.

                            wunder
                            ==
                            Walter Underwood
                            Former Ultraseek Architect

                            On 4/10/08 11:20 AM, "Tim" <tbwendt@...> wrote:

                            So what is Ultra Spider?  Is it the latest version of the HTTP fetch
                            with the KeyView technology or is this a separate product?

                            Tim

                          • Jim
                            Message 13 of 14 , Apr 17, 2008
                              Tim and Walter,

                              I have some contacts at Autonomy. I'll try and get the current
                              information and a better understanding and then get back to you.
                              Jim