Re: [ublas-dev] ublas dev meeting

  • Ian McCulloch
    Message 1 of 14, Oct 8, 2004
      Hi,

      On Fri, 8 Oct 2004, Toon Knapen wrote:

      >
      > Hi all,
      >
      > The ublas dev meeting is scheduled from the 25th to the 27th of October. Currently
      > only 4 persons have confirmed (Michael Stevens, Gunther Winkler, Karl
      > Meerbergen and Toon Knapen). If you would like to join us, please
      > confirm quickly.

      That week is unfortunately looking to be quite busy, but I think I will be
      able to make it, at least for one or two days. I might need to be back in
      Aachen on the 27th.

      If you like, I can prepare a talk on my current project, which is an
      application whose data structures are well fitted by nested
      sparse and/or dense matrices. Although it doesn't use uBLAS, some of the
      techniques might be of interest, including BLAS integration, interfaces,
      generics, etc. And especially, what works well versus what I would do
      differently if I could.

      Cheers,
      Ian
    • Karl Meerbergen
      Message 2 of 14, Oct 8, 2004
        Ian McCulloch wrote:

        >Hi,
        >
        >On Fri, 8 Oct 2004, Toon Knapen wrote:
        >
        >>Hi all,
        >>
        >>The ublas dev meeting is scheduled from the 25th to the 27th of October. Currently
        >>only 4 persons have confirmed (Michael Stevens, Gunther Winkler, Karl
        >>Meerbergen and Toon Knapen). If you would like to join us, please
        >>confirm quickly.
        >
        >That week is unfortunately looking to be quite busy, but I think I will be
        >able to make it, at least for one or two days. I might need to be back in
        >Aachen on the 27th.
        >
        >If you like, I can prepare a talk on my current project, which is an
        >application whose data structures are well fitted by nested
        >sparse and/or dense matrices. Although it doesn't use uBLAS, some of the
        >techniques might be of interest, including BLAS integration, interfaces,
        >generics, etc. And especially, what works well versus what I would do
        >differently if I could.
        >
        >Cheers,
        >Ian

        Hi Ian,

        Welcome to the meeting!

        Any topic that may make ublas easier to use, easier to extend, or more
        efficient is welcome, as is any suggestion on new algorithms or
        techniques. BLAS interfaces are of particular interest, since they have
        been mentioned several times. I will also talk about this.

        Try to avoid dwelling too deeply into applications and get straight to
        the point, i.e. give a clear message from which we can all learn and
        make ublas better.

        Thanks for your interest,

        Karl
      • Ian McCulloch
        Message 3 of 14, Oct 8, 2004
          Peter Schmitteckert wrote:

          >
          > Salut,
          >
          > On Fri, 8 Oct 2004, Matthias Troyer wrote:
          >
          >> In my experience there is only one magic word: performance. As long as
          >> ublas is not competitive with blas on dense vectors and matrices it will
          >> not be adopted widely. One way to achieve this could be to just call
          >> blas where possible.
          >
          > There's another magic word: simplicity.
          > As long as it is simple to start with,
          > there will be the usual 'vendor lock-in'
          > you have in the software industry. :)
          >
          > I've chosen ublas not for performance reasons, it was
          > for the "easy to use" reason. For performance I just
          > wrote my own BLAS wrappers.
          >
          > The next magic word is "flexibility". There are a lot of
          > BLAS-type operations that are just not part of BLAS,
          > e.g. A * conj(B), or A<double> * B<std::complex<double>>.

          A * conj(B) is part of BLAS, although not for all possible variants of
          row/column storage of A and B. A<double> * B<std::complex<double>> is
          not, but except for small matrices it would be faster to construct a
          temporary matrix of std::complex and call zgemm.
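
          For illustration, a minimal sketch of that workaround (the function
          name is hypothetical, and I assume the CBLAS interface and
          column-major storage throughout):

              #include <cblas.h>
              #include <complex>
              #include <vector>
              #include <cstddef>

              // Sketch: C = A * B with real A (m x k) and complex B (k x n).
              // There is no mixed real/complex gemm in BLAS, so widen A into
              // a complex temporary and call zgemm.
              std::vector<std::complex<double> >
              real_times_complex(const std::vector<double>& A,
                                 const std::vector<std::complex<double> >& B,
                                 int m, int k, int n)
              {
                  // the extra copy: re(Atmp) = A, im(Atmp) = 0
                  std::vector<std::complex<double> > Atmp(A.begin(), A.end());
                  std::vector<std::complex<double> > C(std::size_t(m) * n);
                  const std::complex<double> one(1.0), zero(0.0);
                  cblas_zgemm(CblasColMajor, CblasNoTrans, CblasNoTrans,
                              m, n, k, &one, &Atmp[0], m, &B[0], k,
                              &zero, &C[0], m);
                  return C;
              }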

          >
          > Best wishes
          > Peter
          >
          > P.S. Is there any hope to include the ATLAS machinery into the uBLAS
          > source code? Any expert?

          Well, I am far from an expert on this, but IMHO that would be either a huge
          duplication of effort (to reimplement ATLAS in C++) or a maintenance
          nightmare (to import the existing C source as is and keep it in sync with
          future versions of ATLAS). Better would be to encourage ATLAS to implement
          some BLAS extensions for things like complex*real multiply. These are
          already implemented in reference form in the extended BLAS
          (http://crd.lbl.gov/~xiaoye/XBLAS/), but these are a few years old now
          and I never heard of any follow-up plan.

          Cheers,
          Ian
        • Kresimir Fresl
          Message 4 of 14, Oct 8, 2004
            Matthias Troyer wrote:

            > It seems that ublas dense matrix multiplies are substantially slower
            > than calls to the blas function dgemm, and this limits the performance
            > of some of our codes (unless we call blas directly). Will it be
            > possible that the ublas prod() function and similar functions dispatch
            > to blas whenever possible?

            Some time ago Joerg began to play with this idea using the
            ATLAS bindings. You can find traces of it in functional.hpp.
            Only the ATLAS/BLAS function dot() was used, for the vector inner product.

            ublas::prod() is a bit more convoluted. Namely, prod() 'returns' a
            matrix which is the result of the multiplication, while in dgemm()
            this matrix is a parameter. Therefore dgemm() must in fact
            be called in matrix<>::operator=.
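
            A minimal sketch of that idea (hypothetical types, not ublas's
            actual expression machinery): prod() only builds a lightweight
            expression object, and the gemm call is issued in the assignment
            operator, where the result matrix is finally available:

                #include <cblas.h>
                #include <vector>
                #include <cstddef>

                struct matrix;

                struct prod_expr { const matrix& a; const matrix& b; };  // deferred A * B

                struct matrix {
                    int rows, cols;
                    std::vector<double> data;              // column-major storage
                    matrix() : rows(0), cols(0) {}
                    matrix(int r, int c) : rows(r), cols(c), data(std::size_t(r) * c) {}

                    // Only here is the destination known, so only here can dgemm run.
                    matrix& operator=(const prod_expr& e) {
                        rows = e.a.rows; cols = e.b.cols;
                        data.assign(std::size_t(rows) * cols, 0.0);
                        cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans,
                                    rows, cols, e.a.cols,
                                    1.0, &e.a.data[0], e.a.rows,
                                    &e.b.data[0], e.b.rows,
                                    0.0, &data[0], rows);
                        return *this;
                    }
                };

                inline prod_expr prod(const matrix& a, const matrix& b)
                {
                    prod_expr e = { a, b };
                    return e;
                }

                // usage: matrix C; C = prod(A, B);  // dgemm fires here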

            Regards,

            fres
          • Peter Schmitteckert
            Message 5 of 14, Oct 11, 2004
              Salut,

              On Fri, 8 Oct 2004, Ian McCulloch wrote:

              > A * conj(B) is part of BLAS, although not for all possible variants of row


              As far as I understand it, A * conj(B) is not part of BLAS; there
              are only 'N', 'T', and 'C' (Hermitian conjugate).
              One can simulate a conjugated BLAS if one is willing to interchange
              column- vs. row-major storage:

              C^T = (A * conj(B))^T = B^H * A^T,

              but then

              C^T = (A^H * B)^T = B^T * conj(A)

              fails, because the plain conjugate conj(A) is again not a BLAS
              option. This is annoying, at least for complex DMRG.
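
              For the case that does work, the call might look like this (a
              sketch assuming the CBLAS interface; all inputs column-major,
              with the result delivered row-major, i.e. as C^T column-major):

                  #include <cblas.h>
                  #include <complex>

                  // Sketch: C = A * conj(B), A (m x k) and B (k x n) column-major.
                  // An n x m column-major buffer holding C^T is bit-identical to C
                  // stored row-major, so one zgemm computes C^T = B^H * A^T:
                  // 'C' on B supplies the conjugation, 'T' on A the transpose.
                  void times_conj(const std::complex<double>* A,
                                  const std::complex<double>* B,
                                  std::complex<double>* C_rowmajor,
                                  int m, int k, int n)
                  {
                      const std::complex<double> one(1.0), zero(0.0);
                      cblas_zgemm(CblasColMajor, CblasConjTrans, CblasTrans,
                                  n, m, k, &one, B, k, A, m,
                                  &zero, C_rowmajor, n);
                  }

              The analogous call for C = A^H * B would need a plain-conjugate
              flag on A, which is exactly the hole described above.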

              > column storage of A and B. A<double> * B<std::complex<double>> is not, but
              > except for small matrices it would be faster to construct a temporary
              > matrix of std::complex and call zgemm.

              But a native BLAS implementation wouldn't need this extra copy, and
              there is an arithmetic overhead of a factor of 2, since the imaginary
              part of the real matrix is zero, plus unneeded load/store memory
              operations for transferring all these unneeded zeroes.

              > Well, I am far from an expert on this, but IMHO that would be either a huge
              > duplication of effort (to reimplement ATLAS in C++) or a maintenance
              > nightmare (to import the existing C source as is and keep it in sync with
              > future versions of ATLAS). Better would be to encourage ATLAS to implement
              > some BLAS extensions for things like complex*real multiply. These are
              > already implemented in reference form in the extended BLAS
              > (http://crd.lbl.gov/~xiaoye/XBLAS/), but these are a few years old
              > now and I never heard of any follow-up plan.

              I hoped that there would be a way to incorporate the ATLAS kernels
              as building blocks, but my only ATLAS knowledge comes from
              looking at a few ATLAS web pages.
              I guess that for small matrices there could be a huge performance win
              due to avoided overhead, and increased flexibility.


              Best wishes,
              Peter
            • Karl Meerbergen
              Message 6 of 14, Oct 11, 2004
                Peter Schmitteckert wrote:

                >
                >>Well, I am far from an expert on this, but IMHO that would be either a huge
                >>duplication of effort (to reimplement ATLAS in C++) or a maintenance
                >>nightmare (to import the existing C source as is and keep it in sync with
                >>future versions of ATLAS). Better would be to encourage ATLAS to implement
                >>some BLAS extensions for things like complex*real multiply. These are
                >>already implemented in reference form in the extended BLAS
                >>(http://crd.lbl.gov/~xiaoye/XBLAS/), but these are a few years
                >>old now and I never heard of any follow-up plan.
                >
                >I hoped that there would be a way to incorporate the ATLAS kernels
                >as building blocks, but my only ATLAS knowledge comes from
                >looking at a few ATLAS web pages.
                >I guess that for small matrices there could be a huge performance win
                >due to avoided overhead, and increased flexibility.
                >
                >Best wishes,
                >Peter

                There are the BLAS bindings in boost-sandbox/numeric/bindings/..
                These can be used to link with ATLAS. It is possible to (almost)
                automatically map some ublas expressions to BLAS calls. I want to
                talk about this at the ublas dev meeting.

                Best regards,

                Karl
              • Peter Schmitteckert
                Message 7 of 14, Oct 11, 2004
                  Salut,

                  On Mon, 11 Oct 2004, Karl Meerbergen wrote:


                  > There are the blas bindings in boost-sandbox/numeric/bindings/..
                  > These can be used to link with atlas. It is possible to (almost)
                  > automatically map some ublas expressions to blas calls. I want to talk
                  > about this in the ublas dev-meeting.

                  Of course, I call BLAS whenever possible in production runs,
                  but there are simply BLAS-like operations which are not part
                  of the BLAS standard. For those it would be nice to have
                  decent performance.

                  Best wishes,
                  Peter

                  _______________________________________

                  Dr. Peter Schmitteckert
                  TKM / CFN University of Karlsruhe
                  ++49 721 608 3363
                • Ian McCulloch
                  Message 8 of 14, Oct 11, 2004
                    Peter Schmitteckert wrote:

                    >
                    > Salut,
                    >
                    > On Fri, 8 Oct 2004, Ian McCulloch wrote:
                    >
                    >> A * conj(B) is part of BLAS, although not for all possible variants of
                    >> row
                    >
                    > As far as I understand it, A * conj(B) is not part of BLAS; there
                    > are only 'N', 'T', and 'C' (Hermitian conjugate).
                    > One can simulate a conjugated BLAS if one is willing to interchange
                    > column- vs. row-major storage:
                    >
                    > C^T = (A * conj(B))^T = B^H * A^T,
                    >
                    > but then
                    >
                    > C^T = (A^H * B)^T = B^T * conj(A)
                    >
                    > fails, because the plain conjugate conj(A) is again not a
                    > BLAS option. This is annoying, at least for complex DMRG.

                    Yes, of the 16 possible variants of normal/transposed/conjugate/hermitian
                    for the two right-hand parameters, only 9 are natively supported. Another
                    6 can be treated at the cost of an in-place conjugation of the result
                    matrix, and the last variant requires an in-place transpose. I don't know
                    how large the crossover size is at which these operations become
                    efficient; probably quite small (especially for the conjugation).
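
                    As a concrete instance of the conjugation trick (a sketch,
                    assuming the CBLAS interface and column-major storage):
                    C = conj(A) * conj(B) has no native flag combination, but
                    conjugating both sides gives conj(C) = A * B, i.e. one
                    native zgemm followed by an O(m*n) in-place conjugation:

                        #include <cblas.h>
                        #include <complex>

                        // Sketch: C = conj(A) * conj(B) with A (m x k), B (k x n),
                        // C (m x n).  conj(C) = A * B is native, so call zgemm
                        // and then conjugate the result in place.
                        void conj_times_conj(const std::complex<double>* A,
                                             const std::complex<double>* B,
                                             std::complex<double>* C,
                                             int m, int k, int n)
                        {
                            const std::complex<double> one(1.0), zero(0.0);
                            cblas_zgemm(CblasColMajor, CblasNoTrans, CblasNoTrans,
                                        m, n, k, &one, A, m, B, k, &zero, C, m);
                            for (long i = 0; i < (long)m * n; ++i)
                                C[i] = std::conj(C[i]);   // the extra conjugation pass
                        }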

                    >
                    >> column storage of A and B. A<double> * B<std::complex<double>> is not,
                    >> but except for small matrices it would be faster to construct a temporary
                    >> matrix of std::complex and call zgemm.
                    >
                    > But a native BLAS implementation wouldn't need this extra copy, and
                    > there is an arithmetic overhead of a factor of 2, since the imaginary
                    > part of the real matrix is zero, plus unneeded load/store memory
                    > operations for transferring all these unneeded zeroes.
                    >
                    >> Well, I am far from an expert on this, but IMHO that would be either a
                    >> huge duplication of effort (to reimplement ATLAS in C++) or a maintenance
                    >> nightmare (to import the existing C source as is and keep it in sync with
                    >> future versions of ATLAS). Better would be to encourage ATLAS to
                    >> implement
                    >> some BLAS extensions for things like complex*real multiply. These are
                    >> already implemented in reference form in extended BLAS
                    >> (http://crd.lbl.gov/~xiaoye/XBLAS/), but these are a few years old now
                    >> and I never heard of any follow-up plan.
                    >
                    > I hoped that there would be a way to incorporate the ATLAS kernels
                    > as building blocks, but my only ATLAS knowledge comes from
                    > looking at a few ATLAS web pages.
                    > I guess that for small matrices there could be a huge performance win
                    > due to avoided overhead, and increased flexibility.

                    ATLAS defines a block size which is used internally. For large enough
                    matrices, they are copied into a blocked format which is more cache
                    efficient. A nice idea for a high-performance library would be to use
                    this already-optimized ATLAS format for temporary matrices. But ATLAS
                    doesn't document this format, and they probably wouldn't want to. My
                    suspicion is that the performance win from saving a copy is too small
                    to bother with. But maybe I'm wrong... does anyone here know any of
                    the ATLAS developers?

                    Cheers,
                    Ian