Loading ...
Sorry, an error occurred while loading the content.

Re: Changing wide unnormalised table to tall thin tables - performance ? - HELP

Expand Messages
  • Joe Celko
    ... Well, that is the problem. By definitions: 1) All datatypes in RDBMS can have a NULL, but you can constraint a particular domain to not allow NULLs. 2) A
    Message 1 of 25 , Aug 13 10:54 AM
    • 0 Attachment
      >> The BIT column could be made NULLable surely? <<

      Well, that is the problem. By definitions:

      1) All datatypes in RDBMS can have a NULL, but you can constraint a
      particular domain to not allow NULLs.

      2) A BIT must be either '1' or '0' and absolutely, positively,
      never, never anything else. These guys live in physical hardware
      and have to obey this law.

      You can argue for breaking one of these "laws of the Universe" or
      you can simply say that BIT cannot belong in SQL because it is a low
      level, physical implementation concept.
    • Richard Quinn
      Help! Mr. Celko has well and truly mauled me! Pounced and ripped me to shreds. And I thought I was so clever! Anyway: It is smart to argue with a genius;
      Message 2 of 25 , Aug 13 1:47 PM
      • 0 Attachment
        Help!

        Mr. Celko has well and truly mauled me! Pounced and ripped me to shreds.
        And I thought I was so clever! Anyway: "It is smart to argue with a
        genius; listeners can't tell which is which." :))

        But it is true, I didn't know that ISO had standardized GENDER (maybe
        someday they *will* get around to standardizing sex, there is certainly
        too much variety around :) ). Can't wait to hear the rant about DBAs who
        fail to use industry standard codes :)

        But to answer your questions, I don't hate 1FN! I love it! Nearly as
        much as I love 2NF! Which is nearly as loveable as 3NF! And so on... But
        let's see:

        ObjectID SubjectID Permission (CHAR)
        ------ ------- ---------
        2 103 read
        2 103 write
        2 103 exec
        ... ... . . .

        Compared with
        ObjectID SubjectID Permission(Byte)
        ------ ------- ---------
        2 103 7

        I agree, the 2nd example is not in 1NF (it is in "hand-optimized 0NF"
        lol) so building sets and ranges may become quite difficult. But (and as
        long as I'm running on x86 architctures it remains true) I can read
        every state of the subject <has permission on> object relationship in
        one fell swoop on the heap or the index, obviating the need for most
        sets. Same goes for deleting all values or for updating the state. In
        the systems that I've helped push off the cliff, with millions of
        objects having one of many possible states on hundreds of subjects, we
        have been able to realize *serious* reduction in concurrency contention
        and big performance gains. We did think about it, we even argued a bit
        too, but the need for speed plus a willingness to sacrifice some of
        Codd's principles helped us to quickly make a very fast system. (And I
        have to point out, we actually do do quite a bit of normalization, too
        much for some people on this list perhaps).

        BTW, there are no !illegal! Combinations here, that would be quite a
        problem with a byte datatype. Or any other metadata describing the
        acceptable state of an attribute :))

        So, in summing up, it may not be normalized, but it sure as hell is fast
        whilst remaining thought out. And by the time my customers get around to
        wanting to count the number of people with permission x or y and z on an
        action, they've already changed the data so often the question looses
        meaning. Its just my evolutionary response to an ever more aggressive
        predator...

        - Richard







        >>
        >> A classic example: I have a field in my PERSON table called
        IsMale (BIT). When set to true it informs me that the person is a
        male. <<

        This is a clasic example, but not how you think. What is the
        **attribute** we are measuring? Sex (or gender)! What is the
        domain of values for this atribute? The ISO sex codes, of course:

        0= Unknown
        1= Male
        1= Female
        1= N/A (lawful persons, such as corporations)

        Now, anyone who uses ISO Standard can share my data. No experienced
        DBA would fail to use industry standard codes, while newbies today
        are so @&$%! lazy or stupid, they never think of looking for one.
        But that is another rant of mine.

        >> The attribute can not possibly be logically or physically in any
        other state, but the information stored is just as valid as that
        stored in any other field [sic]. <<

        That is a major problem -- the domain for a BIT is not the domain
        for sex. The name "Ismale" tells you this is *not* an attribute,
        but a predicate, a shorthand for (foobar.sex = 1) in ISO codes,
        which is the state in the database of a column within a row.

        >> I see that the third state of "UNKNOWN" is not quite the same
        state as "NULL", but who cares? <<

        ISO cares. Anyone who uses ISO standards cares.

        This is very subtle, like the difference betweeen a number and a
        numeral, or predicate calculus and first-order logic. Quantifiers
        are at a different logical level than simple predicates.

        >> On another note, and I can tell you're gooing to hate this one :)
        I occasionally use BYTE fields, especially when storing things like
        permissions in an ACL kind of table. 0=no rights, 1=read, 2=write,
        4=execute, 8=delete, 16=take ownership. <<

        I don't mind -- I made a six-digit income fixing this assembly
        language coding :). Wait until you move from a high-end to a low-
        end machine or change word size!! Why do you hate first normal form
        so much that you drop back to 1950's machine code and all the
        complexity in the application code? Where are your constraints to
        prevent illegal combinations? Etc.

        You can get away with numeric codes unless you try to actually pass
        it as a BYTE without conversion to the appropriate host langauge
        datatype.

        >> So if the value in there is a 12 it is easy to compute what
        permissions the subject has on an object (exec and del). It means
        that I can store - in a single row - much more densely packed info.
        I think that this fits in quite nicely with some Agile principles.<<

        The "code first, think later, normalize never!" principle? I see a
        lot of that with agile programming; we used to call it a kludge
        or "Wright Brothers system engineering" (put it all together, push
        it off a cliff, see if it flies and try again ).

        <<
      • Bayley, Alistair
        ... Sorry. Just read the SQL Server docs. Bit is a bit special: you can t store nulls (and it s not indexable). AFAICT it s the only SQL Server datatype that
        Message 3 of 25 , Aug 14 12:52 AM
        • 0 Attachment
          > From: Joe Celko [mailto:celko@...]
          >
          > 2) A BIT must be either '1' or '0' and absolutely, positively,
          > never, never anything else. These guys live in physical hardware
          > and have to obey this law.
          >
          > You can argue for breaking one of these "laws of the Universe" or
          > you can simply say that BIT cannot belong in SQL because it is a low
          > level, physical implementation concept.

          Sorry. Just read the SQL Server docs. Bit is a bit special: you can't store
          nulls (and it's not indexable). AFAICT it's the only SQL Server datatype
          that doesn't allow nulls. How odd...


          > From: Richard Quinn [mailto:rquinn@...]
          >
          > But it is true, I didn't know that ISO had standardized GENDER (maybe
          > someday they *will* get around to standardizing sex, there is
          > certainly too much variety around :) ).

          If you're recording data about whether a person is Male, Female, or
          Something Else, then it's probably more correct to call it Sex. See:
          http://www.iath.virginia.edu/~umw8f/Barbarians/Essays/gender.html

          Sex: "Biological sex is obvious -- a person is either male or female." <--
          hermaphrodites?

          Gender: "Gender is a cultural construct, part of the way an individual
          defines him/herself and is defined by others in the society. Gender is
          connected only in part to biological sex and issues of sexuality and
          reproduction; to a great extent, we study gender in the context of economic
          and power relations, of production and class, of ritual, belief and
          ideology."

          See also:
          http://www.bartleby.com/64/C005/010.html

          Gender was a linguistic term, but has been borrowed by fields outside of
          linguistics to refer to sex-based categories. It's meaning has become
          ambiguous.


          *****************************************************************
          The information in this email and in any attachments is
          confidential and intended solely for the attention and use
          of the named addressee(s). This information may be
          subject to legal professional or other privilege or may
          otherwise be protected by work product immunity or other
          legal rules. It must not be disclosed to any person without
          our authority.

          If you are not the intended recipient, or a person
          responsible for delivering it to the intended recipient, you
          are not authorised to and must not disclose, copy,
          distribute, or retain this message or any part of it.
          *****************************************************************
        • Scott W. Ambler
          Seems to me you might want to consider two fields: 1. Boolean: Indicator whether or not you know the value. 2. Char(1): M=Male, F=Female, A=Androgynous (that s
          Message 4 of 25 , Aug 14 5:36 AM
          • 0 Attachment
            Seems to me you might want to consider two fields:

            1. Boolean: Indicator whether or not you know the value.
            2. Char(1): M=Male, F=Female, A=Androgynous (that's a valid value in some
            domains), NULL if unknown.

            Or you could just go with one field, the second, and add U for unknown as
            the previous poster suggested.

            I'm a firm believer in following standards, see
            www.agilemodeling.com/practices.htm#ApplyModelingStandards, when they're
            applicable. I'm not sure the ISO standard has it covered, but that would
            depend on the actual requirements that you're trying to fulfill, wouldn't it?

            - Scott

            At 10:02 AM 8/13/2003 +0300, you wrote:
            >Hello Richard,
            >
            > >A classic example: I have a field in my PERSON table callied IsMale
            > >(BIT). When set to true it informs me that the person is a male. If NULL
            > >I can't say. False == female. The attribute can not possibly be
            >
            >In my opinion it's more correct for your field to be called SEX.
            >So you will have values: Male, Female, Unknown, NULL which is perfectly
            >correct to be coded using numeric values (except NULL ofcourse).
            >
            >--
            >Best regards,
            > Alexander mailto:saho@...

            ====================================================
            Scott W. Ambler
            Senior Consultant, Ronin International, Inc.
            www.ronin-intl.com/company/scottAmbler.html

            www.agiledata.org
            www.agilemodeling.com
            www.ambysoft.com
            www.enterpriseunifiedprocess.info
            www.modelingstyle.info
            www.ronin-intl.com
          • Alex Weatherall
            ... can t store ... datatype ... Err... according to Books Online, the BIT type is NULLABLE (I tried this and you can set a bit column to NULL)
            Message 5 of 25 , Aug 15 1:33 AM
            • 0 Attachment
              > Sorry. Just read the SQL Server docs. Bit is a bit special: you
              can't store
              > nulls (and it's not indexable). AFAICT it's the only SQL Server
              datatype
              > that doesn't allow nulls. How odd...

              Err... according to Books Online, the BIT type is NULLABLE (I tried
              this and you can set a bit column to NULL)
              <MS SQLSERVER Books Online>
              bit
              Integer data type 1, 0, or NULL.

              Remarks
              Columns of type bit cannot have indexes on them.

              Microsoft® SQL Server™ optimizes the storage used for bit columns. If
              there are 8 or fewer bit columns in a table, the columns are stored
              as 1 byte. If there are from 9 through 16 bit columns, they are
              stored as 2 bytes, and so on.
              </MS SQLSERVER Books Online>

              Obviously setting an index on this column wouldn't be a good idea as
              there are only two (not including null) values to index. That would
              be like writing a book in binary and then having an index at the back
              with all the pages with 1's in and all the pages with 0's in. The
              index would be larger than the data!
            • Bayley, Alistair
              ... Oh, I see. I looked at the 6.5 docs first. The docs for SQL Server 2000 state that nulls are allowed. They must have changed it sometime after 6.5.
              Message 6 of 25 , Aug 15 1:38 AM
              • 0 Attachment
                > From: Alex Weatherall [mailto:alexweatherall@...]
                >
                > Err... according to Books Online, the BIT type is NULLABLE (I tried
                > this and you can set a bit column to NULL)
                > <MS SQLSERVER Books Online>
                > bit
                > Integer data type 1, 0, or NULL.


                Oh, I see. I looked at the 6.5 docs first. The docs for SQL Server 2000
                state that nulls are allowed. They must have changed it sometime after 6.5.


                *****************************************************************
                The information in this email and in any attachments is
                confidential and intended solely for the attention and use
                of the named addressee(s). This information may be
                subject to legal professional or other privilege or may
                otherwise be protected by work product immunity or other
                legal rules. It must not be disclosed to any person without
                our authority.

                If you are not the intended recipient, or a person
                responsible for delivering it to the intended recipient, you
                are not authorised to and must not disclose, copy,
                distribute, or retain this message or any part of it.
                *****************************************************************
              • Joe Celko
                ... sometime after 6.5.
                Message 7 of 25 , Aug 15 8:12 AM
                • 0 Attachment
                  >> 6.5 docs ... SQL Server 2000 .. They must have changed it
                  sometime after 6.5. <<

                  Yet another problem with using proprietary datatypes and working at
                  the machine hardware level; the vendor gets to make up the rules as
                  they go along and you cannot appeal to ANSI or ISO. I sure hope
                  that you like an Intel architecture, because you can be locked into
                  it for awhile.
                • Eytan Ben-Meir
                  Hi, This is a general question. I m interested in peoples experience & insight. Have you encountered a situation in your practice where: 1) Amount of data was
                  Message 8 of 25 , Aug 15 9:36 AM
                  • 0 Attachment
                    Hi,
                     
                    This is a general question.
                    I'm interested in peoples experience & insight.
                    Have you encountered a situation in your practice where:
                    1) Amount of data was very large (much larger - at least one order of magnitude - than available RAM).
                    2) Requirements included flexibile schema
                         - Need to add(/remove) attributes to entities after deployment
                         - Need to add(/remove)  relationships between existing entities
                         - Need to add(/remove) new entities with relationships to existing (old) entities
                    ???
                    3) If so, what are the patterns for coping?
                     
                    Thanks,
                     
                    Eytan Ben-Meir
                     
                       
                    -----Original Message-----
                    From: Joe Celko [mailto:celko@...]
                    Sent: Fri, August 15, 2003 8:12 AM
                    To: agileDatabases@yahoogroups.com
                    Subject: [agileDatabases] Re: Changing wide unnormalised table to tall thin tables - performance ? - HELP

                    >> 6.5 docs ... SQL Server 2000 .. They must have changed it
                    sometime after 6.5. <<

                    Yet another problem with using proprietary datatypes and working at
                    the machine hardware level; the vendor gets to make up the rules as
                    they go along and you cannot appeal to ANSI or ISO.  I sure hope
                    that you like an Intel architecture, because you can be locked into
                    it for awhile. 





                    To unsubscribe from this group, send an email to:
                    agileDatabases-unsubscribe@yahoogroups.com



                    Your use of Yahoo! Groups is subject to the Yahoo! Terms of Service.
                  • Simon Larsen
                    Actually I think that was mssql 7 and earlier. 2k does allow nulls (don t remember how I know this but it has a memory of bad associated with it, in that I
                    Message 9 of 25 , Aug 17 3:48 PM
                    • 0 Attachment
                      Actually I think that was mssql 7 and earlier. 2k does allow nulls
                      (don't remember how I know this but it has a memory of "bad" associated
                      with it, in that I think I got myself in some difficulty using them that
                      way when I first came to 2K).

                      From BOL: "If the setting of sp_dbcmptlevel is 65 or lower, bit data
                      types default to NOT NULL if the column does not have an explicit NULL
                      or NOT NULL."

                      Although you are correct that it is not indexable, although why anyone
                      would do that is beyond me, even with nullable you only have 3 possible
                      values.

                      sl


                      From: Bayley, Alistair [mailto:alistair_bayley@...]

                      Sorry. Just read the SQL Server docs. Bit is a bit special: you can't
                      store nulls (and it's not indexable). AFAICT it's the only SQL Server
                      datatype that doesn't allow nulls. How odd...
                    Your message has been successfully submitted and would be delivered to recipients shortly.