
[TaxoCoP] Re: Single subject domain?

  • Janice M Herd
    Message 1 of 21 , Apr 22, 2005
      Software does a huge number of things.
      One of them is to integrate taxonomies (automated or humanly created) with databases, Web sites, etc.
      Well-formed hierarchical taxonomies in any subject area must be carefully created just as you described "should" be done in the computer science/programming field.
      If you are interested in this area perhaps you would like to attend a session on
      Metadata and Enterprise Architecture http://www.montague.com/roundtable24.htm
      Jan

      >>> dbe@... 4/22/05 11:28:00 AM >>>


      Jan

      >
      > in financial systems you may have need to cover "stocks" and
      > "bonds" but will never use the definition of
      > soup stock
      > arms and legs in stocks
      > stockyards
      > she takes stock of her wardrobe
      > etc.
      >

      I beg to differ... while in an ideal world one SHOULD not mix
      stockyard & financial terminology, inside a software system
      there is no inherent mechanism to prevent that from happening.
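A minimal sketch of the kind of mechanism that would have to be added deliberately, since the language itself supplies none: a vocabulary scoped to one domain that refuses out-of-domain senses. The `Sense` and `ScopedVocabulary` names are hypothetical, and the senses are taken from the "stock" examples quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sense:
    term: str
    domain: str
    gloss: str


# Illustrative sense inventory built from the "stock" examples quoted above.
SENSES = [
    Sense("stock", "finance", "share of ownership in a company"),
    Sense("bond", "finance", "debt security"),
    Sense("stock", "cooking", "soup stock"),
    Sense("stocks", "history", "restraint for a person's arms and legs"),
    Sense("stockyard", "livestock", "enclosure for holding animals"),
]


class ScopedVocabulary:
    """Hypothetical helper: admits only the senses of one domain."""

    def __init__(self, domain, senses):
        self.domain = domain
        # Keeps one sense per term within the domain; a real vocabulary
        # would need to handle several senses of the same term.
        self._senses = {s.term: s for s in senses if s.domain == domain}

    def resolve(self, term):
        sense = self._senses.get(term)
        if sense is None:
            raise LookupError(f"{term!r} has no sense in domain {self.domain!r}")
        return sense


finance = ScopedVocabulary("finance", SENSES)
print(finance.resolve("stock").gloss)  # share of ownership in a company
try:
    finance.resolve("stockyard")
except LookupError as err:
    print("rejected:", err)  # rejected: 'stockyard' has no sense in domain 'finance'
```

Nothing stops a programmer from bypassing such a helper, which is the point being made here: the discipline has to live outside the language.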

      May I safely assume you've never written software for a living or
      managed a software project?

      One of the major, major difficulties in software is that it is
      essentially a language written for non-human consumption.

      As a librarian you're accustomed to dealing with stuff written &
      organized with the intent of it being consumed (read) by humans.

      Software is written to be read by a machine (the compiler).


      Example:

      In computer software, these 3 statements are functionally identical: to the machine, the names carry no meaning.

      a = b * c

      weeklyPay = hoursWorked * payRate

      Bobbie = Fred * Fido
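The claim can be checked in Python (an assumption; the thread names no language). In CPython each statement compiles to the same instruction bytes, only the table of names differs, and running them with the same inputs gives the same result. The inputs 40 and 25 are illustrative.

```python
statements = {
    # source text: (assignment target, values for the two names on the right)
    "a = b * c": ("a", {"b": 40, "c": 25}),
    "weeklyPay = hoursWorked * payRate": ("weeklyPay", {"hoursWorked": 40, "payRate": 25}),
    "Bobbie = Fred * Fido": ("Bobbie", {"Fred": 40, "Fido": 25}),
}

# Compile each statement without running it.
compiled = {src: compile(src, "<example>", "exec") for src in statements}

# The instruction bytes are identical; only the name tables differ.
same_instructions = len({co.co_code for co in compiled.values()}) == 1
print(same_instructions)  # True
for co in compiled.values():
    print(co.co_names)

# Execute each with the same numbers and collect the assigned value.
results = []
for src, (target, namespace) in statements.items():
    exec(compiled[src], {}, namespace)
    results.append(namespace[target])
print(results)  # [1000, 1000, 1000]
```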



      One of the most productive & effective known means of improving
      the quality of software is for the organization to RELIGIOUSLY hold
      formal peer reviews.

      Unfortunately it is a very rarely practiced activity. I've
      never seen any statistics on how many organizations ALWAYS
      subject their software to peer reviews, but I'd be more than happy
      to bet it's well under 5%.

      This lack of visibility/review/edit by other humans leads to all sorts
      of ugly behavior. Since software is fundamentally invisible, who
      cares that it's unreadable & incomprehensible... certainly not
      management.

      Management is reading the HBR article about how irrelevant systems
      are.

      - David







      Yahoo! Groups Links
      To visit your group on the web, go to:
      http://groups.yahoo.com/group/TaxoCoP/
      To unsubscribe from this group, send an email to:
      TaxoCoP-unsubscribe@yahoogroups.com
      Your use of Yahoo! Groups is subject to the Yahoo! Terms of Service.
    • David Eddy
      Message 2 of 21 , Apr 22, 2005
        Seth -

        >
        > Also, though I don't think you mean it this way, the statement
        > " you've never written software... " etc might be taken in a
        > contentious way.
        >
        > Janice's point is from the perspective of humans. When developing
        > taxonomies, we need to limit the scope of knowledge to what is
        > appropriate for the audience and their tasks.
        >

        I certainly did not intend to be contentious... argumentative, yes,
        but not contentious.

        The point I was trying to make is the entirely non-human aspect of
        software is one of the major reasons we've dug this hole so deep.

        I certainly hear people grinding their teeth in frustration since
        they're drowning in "digital stuff."

        Somehow we're going to have to use software in a different way to help
        climb out of the hole. (I hope)

        To the best of my knowledge systems have not been written with any
        sort of taxonomy in consideration... so we're going to have to
        retro-fit around existing systems.

        So I want to make sure that people very quickly lose any sort of
        assumption that meaningful words are used inside software systems.


        Also there's the context of my having found that in the domain of
        "document management", software code is not considered to be a
        document!

        Hmmm... software code is one of the primary machine tools of the 21st
        century & we ignore it? This I do not grok.

        - David
      • Kathleen.a.ellis@att.net
        Message 3 of 21 , Apr 22, 2005

          David,

          In my experience there were times when the software types and librarians did a better job of working together.  At one time, I worked for a couple of government agencies that built huge repositories of bibliographic information that cataloged their research. The software types had to communicate with the library types to build systems that could be offered as indexes of research to the public. That isn't to say that the systems were perfect or that the two professions always got along but they certainly cooperated enough to do the job. Our organization used COSATI as the cataloging standard and added keywords that were based on a subject thesaurus. That was the world of mainframes and structured databases. Organizations such as NASA, DOE and DoD are still supporting some form of these databases that started as printed indexes.

          Today, we live in the world of web-based computing and unstructured data. Companies accumulate gigabytes of reports, spreadsheets, PowerPoint presentations, etc. This data rarely has any metadata or cataloging attached to it and is often stored in file folders on company servers or even personal hard drives. Today, the problem we face is not how to properly fill the fields of a well-structured database, but how to find this amorphous mass of company data and then force some kind of structure over it so that it can be retrieved and reused. The software types have given us crawlers to sift through the file servers, websites, and company databases. These crawlers reduce the documents to words and tease out their meaning from verb and noun phrases. Then they classify the documents using clustering, Bayesian, or semantic analysis to impose some order or classification on them. This application of technology works to some extent, but it is the librarians, with their taxonomies, thesauri, and other structured word lists, who still provide the organization and language that make the software technologies work better and bring the information to the user. As professionals, the software types and library types are still trying to work together. The connection is just not as obvious, and, as in our company, the technical or software people keep trying to go off on their own. The technology may eventually eliminate the need for the librarian's intervention, but it isn't there yet.
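
Of the techniques listed above, Bayesian classification is the easiest to sketch. Below is a minimal multinomial naive Bayes classifier in Python, assuming the category labels come from a taxonomy; the training snippets and the "finance"/"hr" labels are invented for illustration, and real crawler products differ in many details.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Reduce a document to lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # documents seen per category
        self.word_counts = defaultdict(Counter)  # word frequencies per category
        self.vocab = set()

    def train(self, text, category):
        words = tokenize(text)
        self.doc_counts[category] += 1
        self.word_counts[category].update(words)
        self.vocab.update(words)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for category, n_docs in self.doc_counts.items():
            counts = self.word_counts[category]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / denom)  # smoothed log likelihood
            if score > best_score:
                best, best_score = category, score
        return best


clf = NaiveBayes()
# Hypothetical training documents labeled with hypothetical taxonomy terms.
clf.train("quarterly revenue forecast spreadsheet budget", "finance")
clf.train("budget variance and revenue report", "finance")
clf.train("employee onboarding and benefits policy", "hr")
clf.train("vacation policy and employee handbook", "hr")

print(clf.classify("revenue budget for next quarter"))  # finance
print(clf.classify("new employee benefits"))  # hr
```

The classifier only picks among labels it is given; deciding what those labels should be, and how they relate, is the taxonomy work described above.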

          Kathy Ellis

          -------------- Original message from "David Eddy" <dbe@...>: --------------



          Seth -

          Thanks for bringing this topic to "public" discussion.

          I'm quite new to this arena under the au courant labels of taxonomy,
          ontology & semantics.

          As a software type, it (unfortunately) recently dawned on me that
          perhaps the folks over in libraries MIGHT have some useful
          experience on how to organize information so that others can
          find what they need.

          As far as I can tell, in my time in this trade, there has been
          minimal cross-fertilization between librarianship & organizing
          software/data on computers.  Why there is such a chasm between
          these two professions remains a mystery to me.

          What we primarily have today is a huge mess in systems... stuff that
          has been thrown together at a furious pace over the past several
          decades, with minimal consideration or thought as to how the
          stuff/systems/data should be organized in the greater whole.

          I see the challenge of taxonomy (unfortunately my grasp of these
          precise meanings is so limited that I include ontology & semantics
          in the same fuzzy cloud) as being to somehow retrofit it around
          existing legacy systems.

          Promises to be an interesting journey.






        • David Eddy
          Message 4 of 21 , Apr 23, 2005
            Seth -

            >
            > Your point is that many times people write code without human
            > context. In our role of developing and applying meaningful systems
            > for classification, we need to keep that context and thus limit the
            > domains of knowledge as Janice described.
            >

            The point I'm trying to emphasize is that by the time a business
            solution is expressed in working software, a tremendous amount of
            context has been stripped out.

            I'm hoping that this rising interest in taxonomy/ontology/semantics
            (while I dimly acknowledge these are different things, I do lump all
            three into the same fuzzy cloud in my brain... rather like keeping
            "nut" and "bolt" together mentally even when I do use them
            incorrectly) will provide much needed flexibility in dealing with
            systems.

            The base business/social problem is that software systems are very
            difficult to understand, & I'm expecting that wrapping taxonomy
            around existing systems (remember "rip & replace" is NOT an option
            anymore) will provide benefits to many players.

            If this taxonomy process is to be done while ignoring the very real
            limits of EXISTING software systems then I don't see the point.

            I largely see "reality" (whatever that is) thru a lens of history...
            which is about 99.9% chaotic. Humans--bless their perpetually
            optimistic souls--seem to have an inherent need to throw a veneer
            of organization around the chaos of life... and I think flexible
            taxonomies should be much more widely accessible to help give
            us a satisfying illusion of order in an inherently chaotic world.

            - David
          • David Eddy
            Message 5 of 21 , Apr 23, 2005
              Kathy -

              >
              > Today, the problem we face is not how to properly fill the fields of
              > a well-structured database, but how to find this amorphous mass
              > of company data and then force some kind of structure over it so
              > that it can be retrieved and reused.
              >

              Strong agreement.

              You point at multiple significant issues.

              Coming from the allegedly structured side of things, software, I am
              not at all comfortable with what seems to be the unchallenged mantra
              in the document management realm, which to my ear seems to assume
              that documents are inherently unstructured & therefore difficult &
              software/databases are structured & therefore "easy."

              I see software as the base machine tool of the 21st century that
              produces most other documents... and as far as I can tell software
              (the source code/programming languages themselves) is NOT considered
              worthy of being called a "document."

              I am completely baffled by this assumption.

              Is it because non-software folks ASSUME software is inherently
              organized & structured? <peals of raucous laughter!!!>

              Is it because I've had it up to my eyeballs with the chaotic nature of
              software & am able to delude myself that those wise folks using
              taxonomies in libraries know how to retrofit order to native chaos?


              What I currently embrace is the belief (fed by experience) that
              systems--in order to be more flexible--MUST have some additional
              organizing net thrown around them (the systems).

              I do know that when I was actively writing systems, if I'd said:
              "STOP! ...I'm not writing any more code until we have a taxonomy."
              I'd have been shown the door VERY promptly.

              I'm hoping that taxonomy/ontology/semantics is part of that "we gotta
              be better organized" effort. And it has to be somehow done to the
              systems that run our society & were most decidedly NOT built with any
              consideration to having an organized taxonomy.

              - David
            • kathy_a_ellis
              Message 6 of 21 , Apr 23, 2005
                Dave,

                I'm sure that I'm missing the point of your concern, but I would like
                to understand what you're saying. I've never written a software
                program, and I don't understand how a taxonomy could be used to
                enhance the organization of a software program. Does it have
                something to do with the notations or documentation that are
                included within programs that explain what you are doing and why?

                Kathy

              • David Eddy
                Message 7 of 21 , Apr 23, 2005
                  Kathy -

                  >
                  > I've never written a software program
                  > and I don't understand how a taxonomy could be used to enhance the
                  > organization of a software program? Does it have something to do
                  > with the notations or documentation that are included within
                  > programs that explain what you are doing and why?
                  >

                  Individual programs are a pain ... but it's when you start dealing
                  with the huge numbers of systems, programs & program components that
                  some sort of accessible "taxonomy" would be very useful.

                  I first became aware of the nature of this communications/vocabulary
                  problem at an insurance company that had found 70 different names for
                  the concept of "policy number." Two that I remember were: M0101 and
                  MSTR-POL-NO. "Words" like that is what makes software so difficult to
                  understand.

                  A basic issue is that by the time a business problem is expressed in
                  software, virtually all the original intent & context has been
                  stripped out. Then factor in time, & many different people with
                  different motivations & skill levels wandering thru & maintaining
                  the code. It's gibberish.

                  In more normal documents there are typically lots of contextual
                  clues from surrounding text for what a particular word or phrase
                  means. Typically there is very little accurate, useful
                  human-sensible documentation in code.


                  Here's one example... a friend told me of a system that he had to
                  work on where the original programmer/designer/builder used names
                  from his own family genealogy to indicate the hierarchy of what was
                  going on in the system.


                  If there were a vocabulary roadmap--which is what I envision a
                  taxonomy being--it would really, really help in making systems
                  easier to maintain.
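A vocabulary roadmap of this kind could start as a simple lookup from physical field names to preferred business terms. The Python sketch below uses the two names remembered above; the COBOL-style line it annotates is a made-up example, and the identifier pattern is an assumption about how such names are spelled.

```python
import re

# Hypothetical roadmap: preferred business term -> field names found in code.
VOCABULARY_ROADMAP = {
    "policy number": ["M0101", "MSTR-POL-NO"],  # two of the 70 names
}

# Invert it so a field name leads straight to its business term.
FIELD_TO_TERM = {
    field.upper(): term
    for term, fields in VOCABULARY_ROADMAP.items()
    for field in fields
}


def annotate(source_line):
    """Return (identifier, business term) pairs for mapped identifiers in a line."""
    # Assumed identifier shape: letters, digits and hyphens, COBOL style.
    identifiers = re.findall(r"[A-Za-z0-9][A-Za-z0-9-]*", source_line)
    return [
        (ident, FIELD_TO_TERM[ident.upper()])
        for ident in identifiers
        if ident.upper() in FIELD_TO_TERM
    ]


print(annotate("MOVE M0101 TO MSTR-POL-NO."))
# [('M0101', 'policy number'), ('MSTR-POL-NO', 'policy number')]
```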


                  BUT... a word of caution... this means there will need to be MANY
                  taxonomies in a business. Sometimes such taxonomies can be blissfully
                  ignorant of others... and at other places, overlapping taxonomies
                  are going to have to be mapped to each other.

                  BUT any kind of delusion of THE grand taxonomy in the sky that does
                  all, knows all, etc. is to be avoided, since I do not believe it to
                  be possible.
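Mapping overlapping taxonomies, rather than merging them into one grand taxonomy, can be sketched as an explicit crosswalk. In this Python sketch the two departmental vocabularies and their terms are hypothetical (apart from "policy number" and "contract id", which appear elsewhere in this thread), and the relation labels are loosely modeled on the SKOS mapping properties. Terms with no mapping simply stay independent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    source: str
    target: str
    relation: str  # "exact" or "related"; labels loosely modeled on SKOS


# Hypothetical vocabularies of two parts of one insurance business.
POLICY_ADMIN = {"policy number", "insured party", "rider"}
BILLING = {"contract id", "payer", "invoice"}

CROSSWALK = [
    Mapping("policy number", "contract id", "exact"),
    Mapping("insured party", "payer", "related"),
]


def translate(term, crosswalk):
    """Return (target term, relation) pairs for a source term."""
    return [(m.target, m.relation) for m in crosswalk if m.source == term]


def unmapped(vocabulary, crosswalk, side):
    """Terms of one vocabulary that no mapping touches ("blissfully ignorant")."""
    used = {getattr(m, side) for m in crosswalk}
    return sorted(vocabulary - used)


print(translate("policy number", CROSSWALK))  # [('contract id', 'exact')]
print(unmapped(POLICY_ADMIN, CROSSWALK, "source"))  # ['rider']
print(unmapped(BILLING, CROSSWALK, "target"))  # ['invoice']
```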

                  Does this make sense?

                  - David
                • librarianjoshua
                  Message 8 of 21 , Apr 25, 2005
                    David,

                    I've read of many in the AI realm working to develop the "ideal
                    world" you describe, personalized agents that are able to navigate
                    across taxonomies to cull information in response to a query. This
                    would be a kind of auto-researcher that would be on the lookout for
                    new and relevant information 24/7. Very cool indeed!

                    Referring back to your first message and the librarian/software
                    developer chasm, it does seem a bit odd to me that software folks
                    don't interact more with librarians for we so often work with
                    researchers and understand the process (and hard work) it takes
                    getting them to express their information needs. The, "Gee... I don't
                    quite understand when you just said XYZ..." comment of yours is very
                    apt and is a part of the librarian's reference interview process. I
                    would think Google would be all over us for our experience and
                    seeking ways to automate that process. I recently came across a great
                    comparison of agents and librarians that perhaps you'd be interested
                    in taking a look at:

                    http://www.firstmonday.dk/issues/issue5_5/zick/index.html#z1

                    As a software developer, why do you think your group doesn't reach
                    out more to the library community? I know many in the library world
                    that love technology and take any/every opportunity (lectures, demos,
                    email) to learn new stuff and meet the technologists that create the
                    stuff. How could we as librarians make ourselves better known in your
                    circles?

                    -Joshua


                    --- In TaxoCoP@yahoogroups.com, "David Eddy" <dbe@j...> wrote:
                    >
                    >
                    > --- In TaxoCoP@yahoogroups.com, "David Eddy" <dbe@j...> wrote:
                    >
                    > >
                    > > Assuming your a professionally trained librarian, do you have any
                    > > speculations as to why there is such a chasm between librarianship
                    > > and software people?
                    > >
                    >
                    > I ***HATE*** it when those things called fingers at the end of my arms
                    > produce "your" when I was thinking "you're." <hrumph!>
                    >
                    >
                    > Which is a wonderful example of one challenge that Google seems to
                    > make a dent in... when you search for a term & mizpell it, Google will
                    > push back ever so gently & say "Did you mean....misspell?"
                    >
                    > In my ideal world, I'd have a taxonomy on my computer that has
                    > learned the vocabulary I use and the explicit meanings I want
                    > for those words. Just because a word may have 18 meanings in a
                    > dictionary does not mean I use, much less know, all such possible
                    > meanings.
                    >
                    > ....so my taxonomy can negotiate with "your" taxonomy.
                    >
                    > Humans do it all the time... constantly listening for out of context
                    > words... "Gee... I don't quite understand when you just said XYZ...
                    > did you mean <offers some personal contextual image>?"
                    >
                    >
                    > A more direct example would be... say I'm working on a software
                    > system in an insurance company. Therefore "policy number"
                    > will be a concept that I have to pay attention to. Given that I'd
                    > rather think of myself as a Java hotshot & not particularly interested
                    > in the business of insurance, I'm going to need some assistance in
                    > learning/finding that I should also be paying attention to "contract
                    > id."
                    >
                    > - David
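The "Did you mean...?" idea quoted above can be sketched against a small personal vocabulary using Python's standard `difflib.get_close_matches`, which ranks candidates by string similarity. This is not how Google does it; the vocabulary, its "related" links, and the 0.6 cutoff (difflib's default) are assumptions for illustration.

```python
import difflib

# Hypothetical personal vocabulary: preferred term -> related terms to suggest.
MY_VOCABULARY = {
    "misspell": [],
    "policy number": ["contract id"],
    "taxonomy": ["ontology", "semantics"],
}


def did_you_mean(word, vocabulary, cutoff=0.6):
    """Return the word itself if known, else the closest known term, else None."""
    if word in vocabulary:
        return word
    matches = difflib.get_close_matches(word, list(vocabulary), n=1, cutoff=cutoff)
    return matches[0] if matches else None


def see_also(term, vocabulary):
    """Related terms the user has asked to be reminded about."""
    return vocabulary.get(term, [])


print(did_you_mean("mizpell", MY_VOCABULARY))  # misspell
print(see_also("policy number", MY_VOCABULARY))  # ['contract id']
print(did_you_mean("stockyard", MY_VOCABULARY))  # None
```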
                  • kathy_a_ellis
                    Message 9 of 21 , Apr 27, 2005
                      I think that I'm beginning to understand. In the library world, part
                      of this problem would be handled by some kind of standard.

                      > BUT... a word of caution... this means there will need to be MANY
                      > taxonomies in a business. Sometimes such taxonomies can be blissfully
                      > ignorant of others... and at other places, overlapping taxonomies
                      > are going to have to be mapped to each other.

                      Dave Clarke talked about two taxonomy solutions that he called
                      Meta-Vocabulary Cluster and Umbrella & Silo Taxonomies, which seem to
                      meet your description. Did you attend the Advanced Taxonomy Webinar?

                    • Bob Doyle
                      Message 10 of 21 , Apr 27, 2005
                        Hi David,

                        Actually the past two decades or more have seen enormous advances in
                        "structured" software, from Ed Yourdon, Tom DeMarco, and their
                        Structured Programming revolution to the latest techniques of
                        object-oriented software, which use a concept of inheritance that a
                        librarian or a biologist would indeed recognize as a kind of taxonomy.
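As a rough illustration of that parallel, a Python class hierarchy can be read like the broader/narrower relations of a thesaurus. The class and helper names below are hypothetical, borrowing the stocks-and-bonds example from earlier in the thread.

```python
class FinancialInstrument:
    """Top term of this hypothetical hierarchy."""


class Security(FinancialInstrument):
    pass


class Stock(Security):
    pass


class Bond(Security):
    pass


def broader_terms(cls):
    """Walk up the inheritance chain, nearest ancestor first."""
    return [c.__name__ for c in cls.__mro__[1:] if c is not object]


def narrower_terms(cls):
    """Direct subclasses, i.e. the next level down."""
    return sorted(c.__name__ for c in cls.__subclasses__())


print(broader_terms(Stock))  # ['Security', 'FinancialInstrument']
print(narrower_terms(Security))  # ['Bond', 'Stock']
print(issubclass(Bond, FinancialInstrument))  # True
```

One difference worth noting: a class usually sits in one place in its hierarchy, while a thesaurus term may have several broader terms; multiple inheritance is the closest code analogue.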

                        In the document management field, and in the bibliographic universe of
                        printed books, documents are truly unstructured. Dublin Core metadata
                        and the work on MAchine-Readable Cataloging (MARC) are efforts to
                        "force some kind of structure," as Kathy says, outside of the documents
                        themselves.
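A minimal sketch of a Dublin Core record kept outside the document it describes, using Python's standard `xml.etree.ElementTree`. The fifteen element names and the namespace URI are those of the Dublin Core Metadata Element Set 1.1; the record describes this message, and the subject terms and the `metadata` wrapper element are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher", "contributor",
    "date", "type", "format", "identifier", "source", "language",
    "relation", "coverage", "rights",
}
ET.register_namespace("dc", DC_NS)  # serialize with the conventional dc: prefix


def dc_record(**fields):
    """Build an XML record; each value may be a string or a list of strings."""
    unknown = set(fields) - DC_ELEMENTS
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for name, value in fields.items():
        values = value if isinstance(value, list) else [value]
        for v in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = v
    return root


record = dc_record(
    title="Re: Single subject domain?",
    creator="Bob Doyle",
    date="2005-04-27",
    subject=["structured programming", "taxonomy"],  # illustrative subject terms
    type="Text",
)
print(ET.tostring(record, encoding="unicode"))
```

The document itself is untouched; all the structure lives in the separate record, which is the pattern described above.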

                        And the vast majority of web pages are unstructured. Many modern
                        content management systems, for example, store reusable elements as
                        XML structures, but these are pretty expensive to operate.

                        These reusable elements of content (especially when they have an
                        application widget associated) have something in common with the O-O
                        software effort to identify "design patterns" in the code and create
                        reusable approaches to software components. Those design patterns are
                        in turn arranged in a library of components that has a taxonomic
                        organization.

                        See http://en.wikipedia.org/wiki/Design_pattern_%28computer_science%29
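
One widely cited example of that taxonomic arrangement is the 1994 "Gang of Four" book Design Patterns, which classifies its 23 patterns by purpose as creational, structural, or behavioral (it also classifies them by scope, omitted here). A Python sketch of that classification as a lookup table; the table and index names are hypothetical.

```python
GOF_PATTERNS_BY_PURPOSE = {
    "creational": ["Abstract Factory", "Builder", "Factory Method",
                   "Prototype", "Singleton"],
    "structural": ["Adapter", "Bridge", "Composite", "Decorator",
                   "Facade", "Flyweight", "Proxy"],
    "behavioral": ["Chain of Responsibility", "Command", "Interpreter",
                   "Iterator", "Mediator", "Memento", "Observer", "State",
                   "Strategy", "Template Method", "Visitor"],
}

# Invert the table so a pattern name leads to its category.
CATEGORY_OF = {
    pattern: purpose
    for purpose, patterns in GOF_PATTERNS_BY_PURPOSE.items()
    for pattern in patterns
}

print(CATEGORY_OF["Observer"])  # behavioral
print(sum(len(p) for p in GOF_PATTERNS_BY_PURPOSE.values()))  # 23
```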

                        Bob Doyle



                        --
                        Bob Doyle
                        Editor In Chief, CMS Review - http://www.cmsreview.com
                        Technology Adviser, CM Pros - http://www.cmprofessionals.org
                        CEO, skyBuilders - http://www.skybuilders.com
                        77 Huron Avenue
                        Cambridge, MA 02138
                        617-876-5678