
[boost] units library: A hash function approach to type safe quantities

  • Dean Foster
    Message 1 of 41 , Oct 1, 2001
      The following idea kept me awake last night.

      GOAL:

      Construct a system that would allow an unlimited number of different units
      to be defined and still catch 99.9% of all type errors. It should
      automatically generate new units when basic units are multiplied or
      divided.

      SUMMARY:

      A units class is described that will have a single integer that
      represents a "hash" of the unit involved. If two units (say foot and
      lbs) are multiplied, a hash of the product (foot_lbs) is created to act
      as a signature of this new unit. If two units are added and their
      hashes don't agree, the system complains at compile time. Type safety
      is thus 99.999% assured. (There is a small chance that two complex
      units would have the same hash value and so mistakenly look the same.)
      The hash function has the property that any way of generating foot_lbs
      from other units will generate the same computed hash.

      INTRODUCTION:

      There are two problems the units library can solve: unit conversion
      and type safety. If the first goal is dropped entirely, I think we can
      generate an elegant solution to the second problem. This suggests
      that we should have two libraries at the end of the day:
      SIunits, which does conversions (but ignores most type safety), and a
      type safe library, to be named later, that does no unit conversions.

      It seems that several of our candidate libraries look something like:

      // feet
      // | lbs
      // | | seconds
      // | | |
      // v v v
      //
      typedef Unit< 0 > pure;
      typedef Unit< 1 > foot;
      typedef Unit< 0, 1 > pound;
      typedef Unit< 0, 0, 1> second;
      typedef Unit< 1, 1, 0> foot_lbs;
      typedef Unit< 1, 0, -1> feet_per_second;

      What makes this totally ugly is that if you want 10 different units,
      you have 10 template integers. Not only is the code unreadable, but
      to use 12 different units instead of 10 is almost impossible without
      learning sed first!

      Basically what is going on is that when two units are multiplied we
      want to add a vector that represents the units involved. When two
      units are divided, we subtract the vector of their units.
      Mathematically this means we need a "ring" to represent the dimension
      of our units. Currently all the systems that I've heard about use
      basis elements, which leads to the vector-space representation. If
      instead we don't use basis elements, we don't need such a large space
      to represent each unit.
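      For comparison, here is a minimal sketch of the basis-vector design
      described above, written in C++11 or later (so not something a 2001
      compiler would accept). The names Quantity, pure, foot and so on are
      illustrative assumptions, not any candidate library's API. Multiplying
      adds the exponent vectors, dividing subtracts them, and addition only
      compiles when the two vectors are identical.

```cpp
#include <type_traits>

// Hypothetical sketch of the basis-vector design: one integer exponent per
// base unit (feet, pounds, seconds), carried as template arguments.
template <int Feet, int Pounds, int Seconds>
struct Quantity {
    double value;
};

// Multiplying two quantities adds their exponent vectors.
template <int F1, int P1, int S1, int F2, int P2, int S2>
constexpr Quantity<F1 + F2, P1 + P2, S1 + S2>
operator*(Quantity<F1, P1, S1> a, Quantity<F2, P2, S2> b) {
    return {a.value * b.value};
}

// Dividing two quantities subtracts their exponent vectors.
template <int F1, int P1, int S1, int F2, int P2, int S2>
constexpr Quantity<F1 - F2, P1 - P2, S1 - S2>
operator/(Quantity<F1, P1, S1> a, Quantity<F2, P2, S2> b) {
    return {a.value / b.value};
}

// Addition is only defined when both exponent vectors are identical,
// so foot + second fails to compile.
template <int F, int P, int S>
constexpr Quantity<F, P, S> operator+(Quantity<F, P, S> a, Quantity<F, P, S> b) {
    return {a.value + b.value};
}

typedef Quantity<0, 0, 0>  pure;
typedef Quantity<1, 0, 0>  foot;
typedef Quantity<0, 1, 0>  pound;
typedef Quantity<0, 0, 1>  second;
typedef Quantity<1, 1, 0>  foot_lbs;
typedef Quantity<1, 0, -1> feet_per_second;

// (1,0,0) + (0,1,0) = (1,1,0) and (1,0,0) - (0,0,1) = (1,0,-1).
static_assert(std::is_same<decltype(foot{} * pound{}), foot_lbs>::value,
              "foot * pound should be foot_lbs");
static_assert(std::is_same<decltype(foot{} / second{}), feet_per_second>::value,
              "foot / second should be feet_per_second");
```

      Every new base unit means another template parameter on Quantity and
      on every operator, which is exactly the growth problem described above.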

      Each basic unit is given a hash value h(unit). If we multiply two
      units:

      h(unit_A * unit_B) = h(unit_A) + h(unit_B)

      If we divide two units:

      h(unit_A / unit_B) = h(unit_A) - h(unit_B)

      A complex transformation looks like:

      h(unit_A^2 * unit_B / unit_C) = 2 * h(unit_A) + h(unit_B) - h(unit_C)
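      A quick worked check of these rules, as a C++11 sketch using
      compile-time constants. The hash values for unit_A, unit_B and unit_C
      are arbitrary made-up integers, and h_power, h_product and h_ratio are
      hypothetical helper names. The point is only that applying the rules
      term by term reproduces the formula, and that a different route to the
      same unit gives the same hash.

```cpp
// Arbitrary, made-up hash values for three base units.
constexpr long h_A = 1000003;
constexpr long h_B = 2000029;
constexpr long h_C = 3000017;

// The three rules: a power scales the hash, a product adds, a ratio subtracts.
constexpr long h_power(long h, long n) { return n * h; }
constexpr long h_product(long a, long b) { return a + b; }
constexpr long h_ratio(long a, long b) { return a - b; }

// h(unit_A^2 * unit_B / unit_C), built up step by step ...
constexpr long h_stepwise = h_ratio(h_product(h_power(h_A, 2), h_B), h_C);
// ... agrees with the closed form 2 * h(unit_A) + h(unit_B) - h(unit_C).
static_assert(h_stepwise == 2 * h_A + h_B - h_C, "rules reproduce the formula");

// A different route, unit_A * (unit_A * (unit_B / unit_C)), gives the same
// hash, because integer addition is commutative and associative.
constexpr long h_other_route = h_product(h_A, h_product(h_A, h_ratio(h_B, h_C)));
static_assert(h_other_route == h_stepwise, "hash does not depend on the route");
```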


      Using this scheme we can represent the previous units as:

      typedef Unit< 0 > pure;
      typedef Unit< h_foot > foot;
      typedef Unit< h_pound > pound;
      typedef Unit< h_second > second;
      typedef Unit< h_foot+h_pound > foot_lbs;
      typedef Unit< h_foot - h_second > feet_per_second;

      where h_foot, h_pound and h_second are carefully chosen integers. I
      laid it out so that it looks like the vector math we did above.
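      A minimal sketch of how Unit< h > could carry the hash as its only
      template argument, assuming C++11 or later. The hash constants, the
      operator overloads and the can_add helper are hypothetical
      illustrations: products add hashes, ratios subtract them, and addition
      compiles only when the two hashes are equal, so foot + second is
      rejected at compile time.

```cpp
#include <type_traits>
#include <utility>

// The quantity carries a single integer: the hash of its unit.
template <long Hash>
struct Unit {
    double value;
};

// Product of units: hashes add.
template <long H1, long H2>
constexpr Unit<H1 + H2> operator*(Unit<H1> a, Unit<H2> b) {
    return {a.value * b.value};
}

// Ratio of units: hashes subtract.
template <long H1, long H2>
constexpr Unit<H1 - H2> operator/(Unit<H1> a, Unit<H2> b) {
    return {a.value / b.value};
}

// Addition requires identical hashes; otherwise deduction of H fails.
template <long H>
constexpr Unit<H> operator+(Unit<H> a, Unit<H> b) {
    return {a.value + b.value};
}

// "Carefully chosen" base hashes; these particular values are made up.
constexpr long h_foot   = 10000019;
constexpr long h_pound  = 20000003;
constexpr long h_second = 30000001;

typedef Unit<0>                 pure;
typedef Unit<h_foot>            foot;
typedef Unit<h_pound>           pound;
typedef Unit<h_second>          second;
typedef Unit<h_foot + h_pound>  foot_lbs;
typedef Unit<h_foot - h_second> feet_per_second;

// Detects whether A + B compiles (SFINAE), to show mismatches are rejected.
template <class A, class B, class = void>
struct can_add : std::false_type {};

template <class A, class B>
struct can_add<A, B, decltype(void(std::declval<A>() + std::declval<B>()))>
    : std::true_type {};

static_assert(can_add<foot, foot>::value, "same unit adds");
static_assert(!can_add<foot, second>::value, "different units do not");
```

      Adding a twelfth or hundredth base unit only means picking one more
      constant; no template gains a parameter.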

      The cool thing about such a representation is that now we should be
      able to replace the last two templates with the following:

      typedef Product_of_units<foot,pound> foot_lbs;
      typedef Ratio_of_units<foot,second> feet_per_second;

      where

      Product_of_units<unit_A,unit_B>

      is interconvertible with a

      Unit< h(unit_A) + h(unit_B) >

      and similarly for Ratio_of_units. Thus our final code would look
      something like:

      typedef Unit< 0 > pure;
      typedef Unit< h_foot > foot;
      typedef Unit< h_pound > pound;
      typedef Unit< h_second> second;
      typedef Product_of_units<foot,pound> foot_lbs;
      typedef Ratio_of_units<foot,second> feet_per_second;
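      One possible way to make Product_of_units<unit_A,unit_B>
      interconvertible with Unit< h(unit_A) + h(unit_B) > is to make it the
      very same type, so that no conversion is needed at all. This sketch
      assumes C++11 alias templates (not available when this was written; a
      C++98 version would use a nested typedef such as
      Product_of_units<A,B>::type instead) and a hypothetical static member
      hash on Unit. The hash constants are made up.

```cpp
#include <type_traits>

// Unit exposes its hash so that aliases can compute with it.
template <long Hash>
struct Unit {
    static constexpr long hash = Hash;
    double value;
};

// Hypothetical helpers: rather than separate types that must convert to
// Unit<...>, they are aliases, so they *are* Unit<...>.
template <class A, class B>
using Product_of_units = Unit<A::hash + B::hash>;

template <class A, class B>
using Ratio_of_units = Unit<A::hash - B::hash>;

constexpr long h_foot   = 10000019;
constexpr long h_pound  = 20000003;
constexpr long h_second = 30000001;

typedef Unit<0>                       pure;
typedef Unit<h_foot>                  foot;
typedef Unit<h_pound>                 pound;
typedef Unit<h_second>                second;
typedef Product_of_units<foot, pound> foot_lbs;
typedef Ratio_of_units<foot, second>  feet_per_second;

// Any route to the same unit yields the same type.
static_assert(std::is_same<Product_of_units<foot, pound>,
                           Product_of_units<pound, foot>>::value,
              "order of factors does not matter");
static_assert(std::is_same<foot_lbs, Unit<h_foot + h_pound>>::value,
              "alias is the plain Unit type");
```

      With this choice the third open issue listed below (converting between
      Product_of_units and the basic Units) largely disappears, since the
      compiler only ever sees one type.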


      Such a scheme would allow as many type safe doubles as desired.
      Regardless of the number of types introduced, almost all incorrect
      assignments will be caught.
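      A rough way to put a number on "almost all", assuming the base hashes
      h_i are drawn independently and uniformly at random modulo 2^32 (as the
      first issue below proposes), and writing e_X for the exponent vector of
      unit X: a wrong assignment of X to Y slips through only when the hash
      of the exponent difference vanishes.

```latex
d = e_X - e_Y \neq 0, \qquad
2^{k} = \text{largest power of } 2 \text{ dividing every } d_i \quad (k < 32)

\Pr\!\left[\, \sum_i d_i \, h_i \equiv 0 \pmod{2^{32}} \right] = 2^{\,k-32}
```

      For the typical mistake, where some exponent difference is odd (k = 0),
      this is 2^-32, about 2.3 x 10^-10. The reason: after dividing out 2^k,
      some coefficient d_j / 2^k is odd and therefore invertible modulo 2^32,
      so the remaining sum is uniform modulo 2^32, and 2^k times a uniform
      value is 0 modulo 2^32 with probability 2^(k-32).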

      ISSUES AND PROBLEMS:

      o It would be nice to use a proper hash function, say addition
      modulo 2^32. This would allow larger hashes and better collision
      avoidance. Is there an easy way of doing modulo arithmetic in
      templates? (I know they are Turing complete, but kinda slow!)

      o Is there a way of automatically generating good hash values for the
      basic units? Something close to random would be ideal. If we can't
      do the modulo arithmetic, they have to be kinda small though (say
      around 10-100 million on a typical machine, with 2^31 - 1 being the
      maximum signed value).

      o I don't know how to do the conversions easily between the
      Product_of_units and the basic Units.
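      On the first two issues, a sketch assuming C++11 or later (again, not
      2001-era C++): unsigned arithmetic in C++ is defined to wrap modulo
      2^N, including inside constant expressions, so a std::uint32_t template
      argument gives addition modulo 2^32 with no extra template machinery.
      For near-random base hashes, the sketch swaps in the 32-bit FNV-1a
      string hash evaluated at compile time over the unit's name; that
      choice, and the fnv1a helper, are assumptions rather than anything
      proposed in this thread.

```cpp
#include <cstdint>
#include <type_traits>

// Unit tagged by a 32-bit hash; all hash arithmetic wraps modulo 2^32.
template <std::uint32_t Hash>
struct Unit {
    double value;
};

// Product: hashes add modulo 2^32 (the cast keeps this modular even if
// uint32_t were promoted to a wider int).
template <std::uint32_t H1, std::uint32_t H2>
constexpr Unit<static_cast<std::uint32_t>(H1 + H2)> operator*(Unit<H1> a, Unit<H2> b) {
    return {a.value * b.value};
}

// Ratio: hashes subtract modulo 2^32.
template <std::uint32_t H1, std::uint32_t H2>
constexpr Unit<static_cast<std::uint32_t>(H1 - H2)> operator/(Unit<H1> a, Unit<H2> b) {
    return {a.value / b.value};
}

// 32-bit FNV-1a over a NUL-terminated name, written recursively so it is a
// valid C++11 constexpr function. The multiply is done in 64 bits and then
// truncated, which is the same as multiplying modulo 2^32.
constexpr std::uint32_t fnv1a(const char* s, std::uint32_t h = 2166136261u) {
    return *s == '\0'
        ? h
        : fnv1a(s + 1, static_cast<std::uint32_t>(
              static_cast<std::uint64_t>(h ^ static_cast<unsigned char>(*s)) * 16777619u));
}

// Base units get their hashes from their names at compile time.
typedef Unit<0u>              pure;
typedef Unit<fnv1a("foot")>   foot;
typedef Unit<fnv1a("pound")>  pound;
typedef Unit<fnv1a("second")> second;
```

      Hashing by name has a side benefit: two headers or libraries that both
      define "foot" automatically agree on its hash, while unrelated names
      land on effectively random 32-bit values.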

      =============================================================================
      Dean Foster dean@...
      Statistics, Wharton, U. Penn 215 898 8233
      Philadelphia PA 19104-6302 http://diskworld.wharton.upenn.edu
    • George A. Heintzelman
      Message 41 of 41 , Oct 8, 2001
        Kevin Lynch wrote:
        > "George A. Heintzelman" wrote:
        > > No. Amount<> is not dimensionless.
        >
        > I would also agree that it doesn't make sense to take sin(number of
        > apples), but that isn't a good argument for dimensionality, I don't
        > think: it makes no sense to take sin(mass fraction) in most cases, but
        > "mass fraction" is a dimensionless quantity in both the SI definition
        > and my different definition. Furthermore, in some systems, like the
        > natural units of particle physics, some quantities are dimensionless
        > (for example, velocity) that would in SI be dimensionful.

        Yes, but it makes sense in some contexts to take the sin of the
        dimensionless velocity (or 2 pi times it, anyway). Mass fraction I
        don't have a good argument for. It seems like another category of the
        clearly dimensionless stuff; it makes sense to multiply a dimensionful
        quantity by a mass fraction or other ratio, and get another
        same-dimensioned quantity. I don't think this will usually hold for
        number of apples.

        > But I'm perfectly happy to accept the SI definition going forward from
        > here, because I don't think dimensionful/dimensionless is going to be a
        > useful distinction for building a unit framework for C++.
        >
        > > Angles, binomial coefficients, and such are examples of truly
        > > dimensionless numbers.
        >
        > I think we need to be more careful. Binomial coefficients (and other
        > constants such as the Bernoulli numbers, pi, e, etc...) are not only
        > dimensionless (by any reasonable definition); they are pure numbers; no
        > units can be attached. Angles may or may not be different.

        Hrm. I think I see what you're saying here, but I'll have to think
        about it a little more.


        > > I still think something needs to address the
        > > difference between an angle and other dimensionless units, but that's a
        > > different question.
        >
        > I'll give it a shot. I'm not attached to this description, so if you
        > have another approach feel free to convince me :-)
        >
        > "angular units" are not units in the sense that length and time units
        > are "units"; I will go so far as to say that, in the language that I've
        > been using, the names of "angular units" are just "tags" on pure
        > numbers, not actual units. In the SI, we have the radian as an "angular
        > unit", but it is a special case, and isn't like any other unit in the SI:
        > "when one expresses the values of derived quantities involving plane
        > angle or solid angle, it often aids understanding if the special names
        > (or symbols) "radian" (rad) or "steradian" (sr) are used in place of
        > the number 1."
        > http://physics.nist.gov/Pubs/SP811/sec04.html#4.3
        >
        > But, radian can't really be 1! It can't be, because that would imply
        > that degree and grad are also 1.

        No, it wouldn't. It would imply that 'degree' = pi/180 and 'grad' =
        pi/200 (both dimensionless numbers). So I think this is all a
        consistent picture, but I'm still not entirely sure whether this is
        the right way to encode this in a C++ library. Walter's 'Views'
        in SIUnits or something similar might be a better way to deal with it.
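        A tiny sketch of the reading described above, assuming angles are
        plain dimensionless numbers: radian is the number 1, and degree and
        grad are just the scale factors pi/180 and pi/200. The constant names
        and the approx_equal helper are hypothetical; the point is that 90
        degrees and 100 grad come out as the same number, pi/2.

```cpp
// Angles as pure numbers: the "units" are only scale factors.
constexpr double pi     = 3.14159265358979323846;
constexpr double radian = 1.0;
constexpr double degree = pi / 180.0;
constexpr double grad   = pi / 200.0;

// Floating-point products can differ in the last bit, so compare loosely.
constexpr bool approx_equal(double a, double b) {
    return (a > b ? a - b : b - a) < 1e-12;
}

// A right angle expressed three ways is the same dimensionless number.
static_assert(approx_equal(90.0 * degree, pi / 2.0 * radian), "90 degrees");
static_assert(approx_equal(100.0 * grad, pi / 2.0 * radian), "100 grad");
```

        Under this reading nothing stops degree and grad from both being
        "different from 1"; what it gives up is any compile-time record that
        a value was meant as an angle in the first place.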

        > So, in summary, I am willing to accept Amount<> as dimensionful, and I
        > think it is better to not call angular measure a unit, since it violates
        > the algorithmic rules for units, but not the algorithm for tags. Or so
        > I think now.

        I think this is right, though I want to see how an actual
        implementation plays out. Time to go to work! I'm going to see if I can
        play a little with SIUnits and get something that does what we are
        talking about integrated with it.

        George Heintzelman
        georgeh@...