Loading ...
Sorry, an error occurred while loading the content.
 

Vector storage efficiency

Expand Messages
  • Robbie Vogt
    Hi all, I have recently been experimenting with getting uBLAS to play nicely with some legacy code that uses simple C-arrays to represent vectors and matrices,
    Message 1 of 7 , Jun 21, 2004
      Hi all,

      I have recently been experimenting with getting uBLAS to play nicely
      with some legacy code that uses simple C-arrays to represent vectors and
      matrices, etc. The basic idea was to use plain C pointers for the
      interface to the outside world and wrap these into uBLAS vectors
      internally for performing calculations (with maintenance and readability
      benefits that result).

      As a result of trying to maintain reasonably comparable performance to
      hand-written code, I have been looking pretty closely at the
      (undocumented) array_adaptor<T> storage class and compiler
      optimisations. This has lead to some interesting (to me) insights.

      Firstly, and as a general comment on the relationship between
      ublas::vector<T,A> and all of the storage classes, I can't understand
      why both store the size of the array/vector. This seems unnecessary and
      wasteful. Unless I am missing something, it would seem that a more
      efficient and simple implementation would result if vector relied on its
      storage to provide and keep track of all size information.

      Secondly, it would seem that the idiom used to raise exceptions in uBLAS
      is a major hurdle for inlining optimisations (at least on VC7.1). This
      is demonstrated in one of the constructors for array_adaptor,

      BOOST_UBLAS_INLINE
      array_adaptor (const array_adaptor &a):
      size_ (a.size_), own_ (a.own_), data_ (a.data_) {
      if (own_)
      // Raising exceptions abstracted as requested during review.
      // throw std::bad_alloc ();
      external_logic ().raise ();
      }

      VC7.1 refuses to inline this function due to the external_logic ().raise
      () statement. The problem is that the compiler chooses to inline this
      statement (a constructor call followed by a method call, both of which
      are relatively costly) which produces code it seemingly decides is too
      big to inline. This is a noticeable hit in performance if many
      temporaries have to be created and destroyed, particularly if (own_) is
      guaranteed to be false. By removing the statement, the constructor
      inlines beautifully (as expected).

      Although I haven't checked in any detail, I suspect that any uBLAS
      function that uses this idiom of raising an exception will become
      ineligible for further inlining, regardless of how infrequently the
      exception is actually thrown. Combining this with the extra overhead of
      the stack unwinding code, it seems to me that uBLAS needs to be very
      careful (and minimal) in its use of exceptions. Would it be feasible to
      have a configuration option to prevent uBLAS from using any exceptions?
      (Maybe by requiring all uses of exceptions to be in BOOST_UBLAS_CHECK
      macros?)

      While these are seemingly little issues, I remind you that uBLAS relies
      heavily on compiler optimisations to be efficient, and to help the
      compiler do its job effectively the little details can become important.

      TIA,
      Robbie Vogt.
    • Ian McCulloch
      Hi, ... [...] ... The raise() member functions are declared inline (by virtue of being defined inside the class definition), which is probably a bug - there
      Message 2 of 7 , Jun 22, 2004
        Hi,

        Robbie Vogt wrote:

        > Hi all,
        >
        [...]
        > Secondly, it would seem that the idiom used to raise exceptions in uBLAS
        > is a major hurdle for inlining optimisations (at least on VC7.1). This
        > is demonstrated in one of the constructors for array_adaptor,
        >
        > BOOST_UBLAS_INLINE
        > array_adaptor (const array_adaptor &a):
        > size_ (a.size_), own_ (a.own_), data_ (a.data_) {
        > if (own_)
        > // Raising exceptions abstracted as requested during
        > review. // throw std::bad_alloc ();
        > external_logic ().raise ();
        > }
        >
        > VC7.1 refuses to inline this function due to the external_logic ().raise
        > () statement. The problem is that the compiler chooses to inline this
        > statement (a constructor call followed by a method call, both of which
        > are relatively costly) which produces code it seemingly decides is too
        > big to inline. This is a noticeable hit in performance if many
        > temporaries have to be created and destroyed, particularly if (own_) is
        > guaranteed to be false. By removing the statement, the constructor
        > inlines beautifully (as expected).
        >
        > Although I haven't checked in any detail, I suspect that any uBLAS
        > function that uses this idiom of raising an exception will become
        > ineligible for further inlining, regardless of how infrequently the
        > exception is actually thrown. Combining this with the extra overhead of
        > the stack unwinding code, it seems to me that uBLAS needs to be very
        > careful (and minimal) in its use of exceptions. Would it be feasible to
        > have a configuration option to prevent uBLAS from using any exceptions?
        > (Maybe by requiring all uses of exceptions to be in BOOST_UBLAS_CHECK
        > macros?)

        The raise() member functions are declared inline (by virtue of being defined
        inside the class definition), which is probably a bug - there are several
        compilers where throw statements are big and there isn't really any reason
        to inline a throw.

        There is BOOST_NO_EXCEPTIONS, but I think that is intended for the compiler
        configuration, not a user setting? Anyway, mostly uBLAS uses exceptions
        for serious errors where you would probably want to end the program anyway;
        if BOOST_NO_EXCEPTIONS is defined, then it will call abort() instead. So,
        (aside from the inlining problem) there is no overhead issues with
        exceptions.

        >
        > While these are seemingly little issues, I remind you that uBLAS relies
        > heavily on compiler optimisations to be efficient, and to help the
        > compiler do its job effectively the little details can become important.

        Yes, little things do matter. Recently I've been trying to help a colleague
        solve a problem where operator*(std::complex<double>, std::complex<double>)
        is not being inlined in a hand-rolled matrix-matrix multiply. It has quite
        a catastrophic effect!

        Cheers,
        Ian
      • Michael Stevens
        Hi Robbie, ... Sounds like a lot of hard work! Does array_adaptor really give more performance then bounded_array or did you need it for C array compatibility?
        Message 3 of 7 , Jun 22, 2004
          Hi Robbie,

          On Tuesday 22 June 2004 08:56, Robbie Vogt wrote:
          >
          > As a result of trying to maintain reasonably comparable performance to
          > hand-written code, I have been looking pretty closely at the
          > (undocumented) array_adaptor<T> storage class and compiler
          > optimisations. This has lead to some interesting (to me) insights.
          >
          Sounds like a lot of hard work! Does array_adaptor really give more
          performance then bounded_array or did you need it for C array compatibility?

          > Firstly, and as a general comment on the relationship between
          > ublas::vector<T,A> and all of the storage classes, I can't understand
          > why both store the size of the array/vector. This seems unnecessary and
          > wasteful. Unless I am missing something, it would seem that a more
          > efficient and simple implementation would result if vector relied on its
          > storage to provide and keep track of all size information.
          I have never understood why vector maintains its own size either! It was
          probably intended that underling storage concept would not need to store
          size. I assume it is now redundant but I have never tried making any changes
          so I am not sure of the consequences. If anyone would like to work this
          through and suggest a patch that would be good!
          >
          > Secondly, it would seem that the idiom used to raise exceptions in uBLAS
          > is a major hurdle for inlining optimisations (at least on VC7.1). This
          > is demonstrated in one of the constructors for array_adaptor,
          >
          > BOOST_UBLAS_INLINE
          > array_adaptor (const array_adaptor &a):
          > size_ (a.size_), own_ (a.own_), data_ (a.data_) {
          > if (own_)
          > // Raising exceptions abstracted as requested during
          > review. // throw std::bad_alloc ();
          > external_logic ().raise ();
          > }
          >
          > VC7.1 refuses to inline this function due to the external_logic ().raise
          > () statement. The problem is that the compiler chooses to inline this
          > statement (a constructor call followed by a method call, both of which
          > are relatively costly) which produces code it seemingly decides is too
          > big to inline. This is a noticeable hit in performance if many
          > temporaries have to be created and destroyed, particularly if (own_) is
          > guaranteed to be false. By removing the statement, the constructor
          > inlines beautifully (as expected).
          This is really interesting. I would be good to know if other compilers also
          have difficulty with this idiom. Did you have a look to see what happends i
          the exception is thrown directly (with throw). Can VC7.1 inline in this case?
          >
          > Although I haven't checked in any detail, I suspect that any uBLAS
          > function that uses this idiom of raising an exception will become
          > ineligible for further inlining, regardless of how infrequently the
          > exception is actually thrown. Combining this with the extra overhead of
          > the stack unwinding code, it seems to me that uBLAS needs to be very
          > careful (and minimal) in its use of exceptions. Would it be feasible to
          > have a configuration option to prevent uBLAS from using any exceptions?
          > (Maybe by requiring all uses of exceptions to be in BOOST_UBLAS_CHECK
          > macros?)
          The whole external_logic ().raise () mechanism was added to provide a clean
          way to use uBLAS with and without exceptions. Have a look at exception.hpp
          which is controlled by the config macro BOOST_NO_EXCEPTIONS or
          BOOST_UBLAS_USE_EXCEPTIONS. It may be worth checking the effect of disabling
          the exceptions. One further possibility is that raise() is defined as virtual
          and this may be preventing the compiler inlining.


          >
          > While these are seemingly little issues, I remind you that uBLAS relies
          > heavily on compiler optimisations to be efficient, and to help the
          > compiler do its job effectively the little details can become important.
          Agreed. It is often surprising what small changes can result in serious
          compiler problems!

          Michael
        • Robbie Vogt
          Hi Michael, ... The C array compatibility was the main reason, but yes, array_adaptor is definitely better performing than bounded_array. I may have been a
          Message 4 of 7 , Jun 23, 2004
            Hi Michael,

            Michael Stevens wrote:

            >Hi Robbie,
            >
            >On Tuesday 22 June 2004 08:56, Robbie Vogt wrote:
            > >
            > > As a result of trying to maintain reasonably comparable performance to
            > > hand-written code, I have been looking pretty closely at the
            > > (undocumented) array_adaptor<T> storage class and compiler
            > > optimisations. This has lead to some interesting (to me) insights.
            > >
            >Sounds like a lot of hard work! Does array_adaptor really give more
            >performance then bounded_array or did you need it for C array compatibility?
            >
            >
            The C array compatibility was the main reason, but yes, array_adaptor is
            definitely better performing than bounded_array. I may have been a
            little misleading in that I am using array_adaptor only for wrapping
            existing C arrays, for temporaries, etc in calculations I am using
            standard unbounded_arrays. For my purposes, even array_adaptor is a
            little too heavy weight as the memory is never owned by the vector and
            will never be resized, etc. I am thinking of implementing a very basic
            storage class for my purposes.

            > > Firstly, and as a general comment on the relationship between
            > > ublas::vector<T,A> and all of the storage classes, I can't understand
            > > why both store the size of the array/vector. This seems unnecessary and
            > > wasteful. Unless I am missing something, it would seem that a more
            > > efficient and simple implementation would result if vector relied on its
            > > storage to provide and keep track of all size information.
            >I have never understood why vector maintains its own size either! It was
            >probably intended that underling storage concept would not need to store
            >size. I assume it is now redundant but I have never tried making any changes
            >so I am not sure of the consequences. If anyone would like to work this
            >through and suggest a patch that would be good!
            >
            >
            I would like to investigate this further also; I will let you know how
            this progresses. Since posting, I suspect problems may arise for
            matrices and for some sparse types (at least if the size is only
            recorded in the storage classes).

            > >
            > > Secondly, it would seem that the idiom used to raise exceptions in uBLAS
            > > is a major hurdle for inlining optimisations (at least on VC7.1). This
            > > is demonstrated in one of the constructors for array_adaptor,
            > >
            > > BOOST_UBLAS_INLINE
            > > array_adaptor (const array_adaptor &a):
            > > size_ (a.size_), own_ (a.own_), data_ (a.data_) {
            > > if (own_)
            > > // Raising exceptions abstracted as requested during
            > > review. // throw std::bad_alloc ();
            > > external_logic ().raise ();
            > > }
            > >
            > > VC7.1 refuses to inline this function due to the external_logic ().raise
            > > () statement. The problem is that the compiler chooses to inline this
            > > statement (a constructor call followed by a method call, both of which
            > > are relatively costly) which produces code it seemingly decides is too
            > > big to inline. This is a noticeable hit in performance if many
            > > temporaries have to be created and destroyed, particularly if (own_) is
            > > guaranteed to be false. By removing the statement, the constructor
            > > inlines beautifully (as expected).
            >This is really interesting. I would be good to know if other compilers also
            >have difficulty with this idiom. Did you have a look to see what happends i
            >the exception is thrown directly (with throw). Can VC7.1 inline in this case?
            >
            >
            It would seem not. I may be able to experiment with gcc soon to see how
            it responds (I expect it to fair a little better, without any particular
            reasons/justifications).

            >[...]
            >The whole external_logic ().raise () mechanism was added to provide a clean
            >way to use uBLAS with and without exceptions. Have a look at exception.hpp
            >which is controlled by the config macro BOOST_NO_EXCEPTIONS or
            >BOOST_UBLAS_USE_EXCEPTIONS. It may be worth checking the effect of disabling
            >the exceptions. One further possibility is that raise() is defined as virtual
            >and this may be preventing the compiler inlining.
            >
            >
            >
            BOOST_NO_EXCEPTIONS definitely helps (thanks also to Ian for suggesting
            this) but mostly when I move the throwing logic to a standalone
            function, i.e.

            void external_logic_raise ()
            {
            external_logic().raise();
            }

            and calling this. This function ends up too big to inline, so the
            storage_adaptor constructor calling *it* is small enough to inline. It
            would seem the facto that raise() is virtual doesn't make a difference,
            which makes sense as it is not called polymorphically.

            Looking at the produced code more closely the biggest problem actually
            seems to be the construction of a temporary std::string to pass as a
            default parameter to the external_logic constructor. Removing this
            temporary however does not seem as effective as the stand alone function...

            Robbie.
          • Michael Stevens
            Hi Robbie, ... std::string I think you have hit the nail on the head. Makes sense that its construction cannot be easily inlined! I think the best solution
            Message 5 of 7 , Jun 23, 2004
              Hi Robbie,
              >
              > Looking at the produced code more closely the biggest problem actually
              > seems to be the construction of a temporary std::string to pass as a
              > default parameter to the external_logic constructor. Removing this
              > temporary however does not seem as effective as the stand alone function...

              "std::string" I think you have hit the nail on the head. Makes sense that its
              construction cannot be easily inlined!

              I think the best solution would be a change to the exception types so they are
              based on std::exception directly rather then std::runtime_error and
              std::logic_error. This would break the std::string dependency completely.

              Michael
            • Ian McCulloch
              Hi, On Tue, 22 Jun 2004, Michael Stevens wrote: [...] ... The only compiler I have investigated this thorougly is the (now defunct) KAI compiler, where a throw
              Message 6 of 7 , Jun 23, 2004
                Hi,

                On Tue, 22 Jun 2004, Michael Stevens wrote:

                [...]

                > > Secondly, it would seem that the idiom used to raise exceptions in uBLAS
                > > is a major hurdle for inlining optimisations (at least on VC7.1). This
                > > is demonstrated in one of the constructors for array_adaptor,
                > >
                > > BOOST_UBLAS_INLINE
                > > array_adaptor (const array_adaptor &a):
                > > size_ (a.size_), own_ (a.own_), data_ (a.data_) {
                > > if (own_)
                > > // Raising exceptions abstracted as requested during
                > > review. // throw std::bad_alloc ();
                > > external_logic ().raise ();
                > > }
                > >
                > > VC7.1 refuses to inline this function due to the external_logic ().raise
                > > () statement. The problem is that the compiler chooses to inline this
                > > statement (a constructor call followed by a method call, both of which
                > > are relatively costly) which produces code it seemingly decides is too
                > > big to inline. This is a noticeable hit in performance if many
                > > temporaries have to be created and destroyed, particularly if (own_) is
                > > guaranteed to be false. By removing the statement, the constructor
                > > inlines beautifully (as expected).
                > This is really interesting. I would be good to know if other compilers also
                > have difficulty with this idiom. Did you have a look to see what happends i
                > the exception is thrown directly (with throw). Can VC7.1 inline in this case?

                The only compiler I have investigated this thorougly is the (now defunct)
                KAI compiler, where a throw statement is quite expensive (although it will
                still inline it if you specify aggressive compiler options).

                Why not simply make raise() a non-inline function?

                > >
                > > Although I haven't checked in any detail, I suspect that any uBLAS
                > > function that uses this idiom of raising an exception will become
                > > ineligible for further inlining, regardless of how infrequently the
                > > exception is actually thrown. Combining this with the extra overhead of
                > > the stack unwinding code, it seems to me that uBLAS needs to be very
                > > careful (and minimal) in its use of exceptions. Would it be feasible to
                > > have a configuration option to prevent uBLAS from using any exceptions?
                > > (Maybe by requiring all uses of exceptions to be in BOOST_UBLAS_CHECK
                > > macros?)
                > The whole external_logic ().raise () mechanism was added to provide a clean
                > way to use uBLAS with and without exceptions. Have a look at exception.hpp
                > which is controlled by the config macro BOOST_NO_EXCEPTIONS or
                > BOOST_UBLAS_USE_EXCEPTIONS. It may be worth checking the effect of disabling
                > the exceptions. One further possibility is that raise() is defined as virtual
                > and this may be preventing the compiler inlining.

                Interesting. A call to a virtual function shouldn't prevent inlining (it
                is simply an indirect call via the vtable, only a couple of
                additional instructions). My guess is the compiler is actually looking
                through the virtual function and inlining raise() itself (which it can do
                if it knows the dynamic type - which it trivially does in this case), but
                choking on the throw().

                Cheers,
                Ian
              • Michael Stevens
                After Robbies incite regarding std:string I have commited a changed to exception.hpp I have changed the constructor interface to using the const char* This is
                Message 7 of 7 , Jun 23, 2004
                  After Robbies incite regarding std:string I have commited a changed to
                  exception.hpp

                  I have changed the constructor interface to using the const char*
                  This is a less drastic change then the change to using std::exception I
                  suggested before.

                  In the normal case there is no longer any dependancy on std::string. If
                  BOOST_UBLAS_USE_EXCEPTIONS then a temporary std::string still needs to be
                  constructed.

                  The change passes regression tests but does mean the exception interface is a
                  little different. What do people think?

                  Michael
                Your message has been successfully submitted and would be delivered to recipients shortly.