[Cheetahtemplate-discuss] Unicode woes

  • Jean-Baptiste Quenot
    Message 1 of 8, Jan 12, 2009
      Dear all,

      I'm using Cheetah 2.0.1 with Python 2.5.2. In my Python code I set
      template variables that reference other template instances. When the
      nested template contains Unicode strings, Cheetah complains.

      Here is a detailed example with a set of three files: testFailing.py
      contains the Python code, main.tmpl is the main template, and
      other.tmpl is the nested template:

      -------------------------------------------------------------------------
      testFailing.py
      -----------------------------------------------------------------------
      # -*- coding: utf8 -*-

      """
      This Python snippet shows that Cheetah calls str(t) even when t is a
      Cheetah Template containing unicode chunks
      """

      from Cheetah.Template import Template

      t = Template.compile(file="main.tmpl")
      otherT = Template.compile(file="other.tmpl")
      other = otherT()
      t.other = other

      print "------------------------------------------------------------------------"
      t.v = u'Unicode String'
      t.other.v = u'Unicode String'
      print unicode(t())

      print "------------------------------------------------------------------------"
      t.v = u'Unicode String with eacute é'
      t.other.v = u'Unicode String'
      print unicode(t())

      print "------------------------------------------------------------------------"
      t.v = u'Unicode String with eacute é'
      t.other.v = u'Unicode String with eacute é'
      print unicode(t())


      -------------------------------------------------------------------------
      main.tmpl
      -----------------------------------------------------------------------
      Main file with $v

      $other


      -------------------------------------------------------------------------
      other.tmpl
      -----------------------------------------------------------------------
      Other file with $v



      Here is the execution output:

      $ python testFailing.py
      ------------------------------------------------------------------------
      Main file with Unicode String

      Other file with Unicode String


      ------------------------------------------------------------------------
      Main file with Unicode String with eacute é

      Other file with Unicode String


      ------------------------------------------------------------------------
      Traceback (most recent call last):
      File "testFailing.py", line 27, in <module>
      print unicode(t())
      File "/var/lib/python-support/python2.5/Cheetah/Template.py", line
      981, in __str__
      def __str__(self): return getattr(self, mainMethName)()
      File "main_tmpl.py", line 93, in respond
      File "/var/lib/python-support/python2.5/Cheetah/Filters.py", line
      51, in filter
      filtered = str(val)
      UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in
      position 43: ordinal not in range(128)
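
The UnicodeEncodeError above can be reproduced without Cheetah. The following is a minimal sketch written for Python 3, where the conversion has to be spelled out. It assumes that Python 2's str() on a unicode object behaves like an encode with the default codec ('ascii' unless sitecustomize.py changes it). The helper name py2_str is made up for illustration.

```python
def py2_str(value, default_encoding="ascii"):
    """Emulate Python 2's str() on a unicode object: an implicit encode
    with the interpreter's default codec (hypothetical helper)."""
    return value.encode(default_encoding)


# Pure ASCII text survives the implicit conversion.
print(py2_str(u"Unicode String"))

# Text containing an e-acute (U+00E9) does not.
try:
    py2_str(u"Unicode String with eacute \xe9")
except UnicodeEncodeError as exc:
    print("failed:", exc.reason)
```

Passing "utf-8" as default_encoding makes the second call succeed, which is in effect what the sys.setdefaultencoding() workaround does globally.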


      As you can see, Cheetah.Filters.Filter is trying to convert the template
      instance of "other.tmpl", which has Unicode content, to a str. If I
      convert the template to unicode beforehand, it works:

      -------------------------------------------------------------------------
      testWorking.py
      -----------------------------------------------------------------------
      # -*- coding: utf8 -*-

      from Cheetah.Template import Template

      t = Template.compile(file="main.tmpl")
      otherT = Template.compile(file="other.tmpl")
      other = otherT()

      print "------------------------------------------------------------------------"
      t.v = u'Unicode String'
      other.v = u'Unicode String'
      t.other = unicode(other)
      print unicode(t())

      print "------------------------------------------------------------------------"
      t.v = u'Unicode String with eacute é'
      other.v = u'Unicode String'
      t.other = unicode(other)
      print unicode(t())

      print "------------------------------------------------------------------------"
      t.v = u'Unicode String with eacute é'
      other.v = u'Unicode String with eacute é'
      t.other = unicode(other)
      print unicode(t())


      Execution output:

      $ python testWorking.py
      ------------------------------------------------------------------------
      Main file with Unicode String

      Other file with Unicode String


      ------------------------------------------------------------------------
      Main file with Unicode String with eacute é

      Other file with Unicode String


      ------------------------------------------------------------------------
      Main file with Unicode String with eacute é

      Other file with Unicode String with eacute é


      However, if I add a non-ASCII character to main.tmpl, there is another failure:


      ------------------------------------------------------------------------
      main.tmpl
      ------------------------------------------------------------------------
      Main file with $v and another eacute é

      $other


      Execution output:

      $ python testWorking.py
      ------------------------------------------------------------------------
      Traceback (most recent call last):
      File "testWorking.py", line 13, in <module>
      print unicode(t())
      File "/var/lib/python-support/python2.5/Cheetah/Template.py", line
      981, in __str__
      def __str__(self): return getattr(self, mainMethName)()
      File "main_tmpl.py", line 100, in respond
      File "/var/lib/python-support/python2.5/Cheetah/DummyTransaction.py",
      line 31, in getvalue
      return ''.join(outputChunks)
      UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position
      20: ordinal not in range(128)
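
This second traceback comes from the opposite direction. Assuming the template text was compiled as a UTF-8 byte string while $v is a unicode object, Python 2's ''.join() promotes the result to unicode and decodes every byte-string chunk with the default 'ascii' codec. A rough Python 3 sketch of that promotion follows; py2_join is a hypothetical helper, not part of Cheetah.

```python
def py2_join(chunks, default_encoding="ascii"):
    """Emulate ''.join(chunks) in Python 2 when at least one chunk is unicode:
    byte-string chunks are implicitly decoded with the default codec."""
    decoded = []
    for chunk in chunks:
        if isinstance(chunk, bytes):
            chunk = chunk.decode(default_encoding)
        decoded.append(chunk)
    return u"".join(decoded)


template_head = u"Main file with ".encode("utf-8")            # ASCII-only bytes
template_tail = u" and another eacute \xe9".encode("utf-8")   # contains a non-ASCII byte
value = u"Unicode String"

try:
    py2_join([template_head, value, template_tail])
except UnicodeDecodeError as exc:
    print("failed on byte", hex(exc.object[exc.start]))
```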


      Maybe something more elaborate could be done in Filter to avoid this
      nasty side effect.
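
One shape such a filter could take, as a sketch only: keep unicode values as they are, decode byte strings explicitly, and never fall back to the default codec. This is written for Python 3 and does not subclass Cheetah.Filters.Filter. The function name and the utf-8 default are assumptions, not the behaviour of any Cheetah release.

```python
def filter_value(val, encoding="utf-8"):
    """Convert a placeholder value to text without an implicit ASCII round trip."""
    if val is None:
        return u""
    if isinstance(val, bytes):
        # Byte strings are decoded explicitly with a caller-chosen codec.
        return val.decode(encoding)
    if isinstance(val, str):
        # Already text: return it unchanged.
        return val
    # Anything else (numbers, nested objects, ...) goes through str(),
    # which yields text in Python 3.
    return str(val)


print(filter_value(u"Unicode String with eacute \xe9"))
```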

      If possible I'd like to avoid hardcoding UTF-8 in sitecustomize.py;
      that is an ugly hack!

      Thanks in advance for your help!
      --
      Jean-Baptiste Quenot
      http://jbq.caraldi.com/

      ------------------------------------------------------------------------------
      This SF.net email is sponsored by:
      SourcForge Community
      SourceForge wants to tell your story.
      http://p.sf.net/sfu/sf-spreadtheword
      _______________________________________________
      Cheetahtemplate-discuss mailing list
      Cheetahtemplate-discuss@...
      https://lists.sourceforge.net/lists/listinfo/cheetahtemplate-discuss
    • R. Tyler Ballance
      Message 2 of 8, Mar 19, 2009
        On Mon, Jan 12, 2009 at 10:07:25PM +0100, Jean-Baptiste Quenot wrote:
        > Dear all,
        >
        > I'm using Cheetah 2.0.1 with Python 2.5.2. In my Python code I set
        > template variables that reference other template instances. When the
        > nested template contains Unicode strings, Cheetah complains.
        >
        > Here is a detailed example with a set of three files: testFailing.py
        > contains the Python code, main.tmpl is the main template, and
        > other.tmpl is the nested template:

        Interesting, to say the least. Can you zip up your templates and your
        test cases and either email them to me directly or send them to the
        list?

        I'm not sure if I can fix this in time for v2.1.1, but some of the
        unicode woes of Cheetah have come back to bite us in the ass at Slide as
        well, so I'd like to squash those that we know about if possible.

        Nice work tracking down the issue though, impressive (I'll buy you a
        beer if you stop by downtown San Francisco some time :))


        Cheers


        --
        -R. Tyler Ballance
        Slide, Inc.
      • Jean-Baptiste Quenot
        Message 3 of 8, Mar 20, 2009
          2009/3/20 R. Tyler Ballance <tyler@...>:
          > On Mon, Jan 12, 2009 at 10:07:25PM +0100, Jean-Baptiste Quenot wrote:
          >> Here is a detailed example with a set of three files: testFailing.py
          >> contains the Python code, main.tmpl is the main template, and
          >> other.tmpl is the nested template:
          >
          > Interesting to say the least, can you zip up your templates and your
          > test cases and either email them to me directly or pass it along to the
          > list.

          I reworked the tests to be self-contained, and to change only one
          thing at a time between tests.

          Here is how I run them:

          1) Make sure sitecustomize.py does not call sys.setdefaultencoding(),
          otherwise all the tests pass. sys.getdefaultencoding() must return
          'ascii'.

          2) for i in * ; do echo; echo; echo "Running $i"; env - LC_CTYPE=en_US.UTF-8 python $i ; done

          Please find attached both the test files, and the output of the above
          command recorded with "script". Don't ask me why, if I redirect
          output to a file, I get other error messages, maybe because in that
          case the LC environment variables are ignored.
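
A likely cause, offered as an assumption rather than something established in this thread: under Python 2, sys.stdout.encoding is derived from the locale only when standard output is a terminal. When output is redirected to a file or pipe it is None, and printing a unicode object falls back to the 'ascii' codec. A small diagnostic along these lines (describe_stream is a hypothetical helper) shows which case applies:

```python
import io
import sys


def describe_stream(stream):
    """Report whether a stream is a terminal and which encoding it advertises."""
    return {
        "isatty": stream.isatty(),
        "encoding": getattr(stream, "encoding", None),
    }


print(describe_stream(sys.stdout))     # depends on how the script is run
print(describe_stream(io.StringIO()))  # an in-memory stream is never a terminal
```

On Python 3 the encoding is set even when output is redirected, so the None case is specific to Python 2.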

          > Nice work tracking down the issue though, impressive (I'll buy you a
          > beer if you stop by downtown San Francisco some time :))

          That's nice of you, I love this city, I wish I could go! Last time I
          went was in 1990 :-|

          Cheers,
          --
          Jean-Baptiste Quenot
          http://jbq.caraldi.com/
        • R. Tyler Ballance
          Message 4 of 8, Mar 23, 2009
            On Fri, Mar 20, 2009 at 01:00:46PM +0100, Jean-Baptiste Quenot wrote:
            > 2009/3/20 R. Tyler Ballance <tyler@...>:
            > > On Mon, Jan 12, 2009 at 10:07:25PM +0100, Jean-Baptiste Quenot wrote:
            > >> Here is a detailed example with a set of three files: testFailing.py
            > >> contains the Python code, main.tmpl is the main template, and
            > >> other.tmpl is the nested template:
            > >
            > > Interesting to say the least, can you zip up your templates and your
            > > test cases and either email them to me directly or pass it along to the
            > > list.
            >
            > I reworked the tests to be self-contained, and to change only one
            > thing at a time between tests.

            The tests work great! I reformatted them into "proper" PyUnit tests and
            added them to src/Tests/Unicode.py in the "next" branch of the Git
            repository: http://github.com/rtyler/cheetah/tree/next

            I haven't fixed them yet, but I will update this thread when I do (the
            bug doesn't look like it will be a fun one to squash)


            Cheers
            --
            -R. Tyler Ballance
            Slide, Inc.
          • R. Tyler Ballance
            Message 5 of 8, Mar 25, 2009
              I've included far more explanation in the patch than I likely should have, but I feel it's important
              to share the how and why of the issue at hand.

              Anyways, this failing test has been fixed:

                  class JPQ_UTF8_Test3(unittest.TestCase):
                      def runTest(self):
                          t = Template.compile(source="""Main file with |$v|

                          $other""")

                          otherT = Template.compile(source="Other template with |$v|")
                          other = otherT()
                          t.other = other

                          t.v = u'Unicode String with eacute é'
                          t.other.v = u'Unicode String and an eacute é'

                          assert unicode(t())


              See the patch for more details, but the fix is relatively logical and should be safe for everybody (famous last words!).

              Cheers,
              -R. Tyler Ballance

            • R. Tyler Ballance
              Message 6 of 8, Mar 26, 2009
                On Mon, Jan 12, 2009 at 10:07:25PM +0100, Jean-Baptiste Quenot wrote:
                > I'm using Cheetah 2.0.1 with Python 2.5.2. In my Python code I set
                > template variables that reference other template instances. When the
                > nested template contains Unicode strings, Cheetah complains.
                >
                > Here is a detailed example with a set of three files: testFailing.py
                > contains the Python code, main.tmpl is the main template, and
                > other.tmpl is the nested template:
                (Please see my important questions below)

                I've been pondering the last test you provided (#5) for a few days,
                testing my understanding of Unicode, and the "magic" and insanity of
                string objects in Python.

                The conclusion I've come to is that the *only* way we will ever have
                sane support for Unicode is by re-defining what the "#encoding"
                directive means and treating all source code within Cheetah as unicode.


                What #encoding *currently* means is that the generated Python will have a
                "-*- encoding" directive at the top of the module (nothing else).

                What I *want* is for ALL source within Cheetah to be handled as
                unicode() objects instead of a mix of unicode() and str() objects. This
                means the #encoding directive would be repurposed to specify the
                text encoding to decode the source from, e.g.:
                        source = '''#encoding utf-8\n Oh hello there'''
                Would result in:
                        source = source.decode('utf-8')
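
As a rough illustration of the proposed meaning, the following Python 3 sketch looks for an #encoding directive at the start of a byte-string source and decodes the whole source with it. The regular expression and the function name are assumptions made for this example; Cheetah's own encodingDirectiveRE may accept a different syntax.

```python
import re

# Hypothetical pattern, not copied from Cheetah.
ENCODING_DIRECTIVE = re.compile(rb"#encoding[ \t]+([-\w.]+)")


def decode_source(source, default="ascii"):
    """Return template source as text, honouring a leading #encoding directive."""
    if isinstance(source, str):
        return source  # already decoded
    match = ENCODING_DIRECTIVE.match(source)
    encoding = match.group(1).decode("ascii") if match else default
    return source.decode(encoding)


raw = u"#encoding utf-8\n Oh hello there \xe9".encode("utf-8")
print(repr(decode_source(raw)))
```

Without a directive the sketch falls back to 'ascii', so non-ASCII bytes in an undeclared source still raise UnicodeDecodeError instead of being guessed at.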


                The patch at the bottom of this email implements this functionality, but
                as my luck would have it, fixes Jean-Baptiste's issue while breaking the
                #encoding unit tests :)


                The main questions for you all are:
                * If/how you use the #encoding directive
                * If/how you use the #unicode directive
                * Would you be willing to help beta test this change?


                If you are interested in checking out this change and how it affects
                your code, I've committed and pushed it to the "unicode" branch
                (http://github.com/rtyler/cheetah/tree/unicode) which, pending community
                review, will likely get folded down into "next" for a v2.2 release.


                Please give me some feedback :D



                diff --git a/src/Compiler.py b/src/Compiler.py
                index 1d00ff6..33c8811 100644
                --- a/src/Compiler.py
                +++ b/src/Compiler.py
                @@ -1551,7 +1551,7 @@ class ModuleCompiler(SettingsManager, GenUtils):

                         if source and file:
                             raise TypeError("Cannot compile from a source string AND file.")
                -        elif isinstance(file, (str, unicode)): # it's a filename.
                +        elif isinstance(file, basestring): # it's a filename.
                             f = open(file) # Raises IOError.
                             source = f.read()
                             f.close()
                @@ -1578,8 +1578,9 @@ class ModuleCompiler(SettingsManager, GenUtils):

                         else:
                             unicodeMatch = unicodeDirectiveRE.search(source)
                +            encodingMatch = encodingDirectiveRE.match(source)
                             if unicodeMatch:
                -                if encodingDirectiveRE.match(source):
                +                if encodingMatch:
                                     raise ParseError(
                                         self, "#encoding and #unicode are mutually exclusive! "
                                         "Use one or the other.")
                @@ -1587,8 +1588,13 @@ class ModuleCompiler(SettingsManager, GenUtils):
                                 if isinstance(source, str):
                                     encoding = unicodeMatch.group(1) or 'ascii'
                                     source = unicode(source, encoding)
                -
                -                #print encoding
                +            elif encodingMatch:
                +                encodings = encodingMatch.groups()
                +                if len(encodings):
                +                    encoding = encodings[0]
                +                    source = source.decode(encoding)
                +            else:
                +                source = unicode(source)

                         if source.find('#indent') != -1: #@@TR: undocumented hack
                             source = indentize(source)
                @@ -1807,6 +1813,8 @@ class ModuleCompiler(SettingsManager, GenUtils):
                         self._moduleShBang = shBang

                     def setModuleEncoding(self, encoding):
                +        #print ('setModuleEncodiing

                --
                -R. Tyler Ballance
                Slide, Inc.
              • Jean-Baptiste Quenot
                Message 7 of 8, Mar 27, 2009
                  Excellent, I'm using the unicode branch now in my application,
                  without the insane sys.setdefaultencoding() in sitecustomize.py.
                  Works great for me.

                  --
                  Jean-Baptiste Quenot
                  http://jbq.caraldi.com/

                  ------------------------------------------------------------------------------
                  _______________________________________________
                  Cheetahtemplate-discuss mailing list
                  Cheetahtemplate-discuss@...
                  https://lists.sourceforge.net/lists/listinfo/cheetahtemplate-discuss
                • R. Tyler Ballance
                  Message 8 of 8 , Mar 29, 2009
                    On Fri, Mar 27, 2009 at 02:52:57PM +0100, Jean-Baptiste Quenot wrote:
                    > Excellent, I'm now using the unicode branch in my application,
                    > without the insane sys.setdefaultencoding() in sitecustomize.py.
                    > Works great for me.

                    That's wonderful :)

                    I'm curious, however, what sort of schedule we should put the unicode
                    branch on in order to have it properly tested by as many people as
                    possible before we release such a widespread change.

                    I fixed one of the issues in the "next" branch so that will go out on
                    April 16th, but the other will have to wait for the "unicode" branch.

                    I'm wondering if those Red Hat chaps could test it? Our test suite is by
                    no means conclusive and I fear it's not enough to properly vet a
                    release prior to packaging.


                    Cheers

                    > 2009/3/27 R. Tyler Ballance <tyler@...>:
                    > > On Mon, Jan 12, 2009 at 10:07:25PM +0100, Jean-Baptiste Quenot wrote:
                    > >> I'm using Cheetah 2.0.1 with Python 2.5.2. In my Python code I use to
                    > >> set template variables that reference other template instances. And
                    > >> when the nested template contains Unicode strings, Cheetah complains.
                    > >>
                    > >> Here is a detailed example with a set of three files: testFailing.py
                    > >> contains the Python code, main.tmpl is the main template, and
                    > >> other.tmpl is the nested template:
                    > > (Please see my important questions below)
                    > >
                    > > I've been pondering the last test you provided (#5) for a few days,
                    > > testing my understanding of Unicode, and the "magic" and insanity of
                    > > string objects in Python.
                    > >
                    > > The conclusion I've come to is that the *only* way we will ever have
                    > > sane support for Unicode is by re-defining what the "#encoding"
                    > > directive means and treating all source code within Cheetah as unicode.
                    > >
                    > >
                    > > What #encoding *currently* means, is that the generated Python will have a
                    > > "-*- encoding" directive at the top of the module (nothing else).
                    > >
                    > > What I *want* is for ALL source within Cheetah to be handled as
                    > > unicode() objects instead of a mix of unicode() and str() objects. This
                    > > means the #encoding directive would be repurposed to specify the
                    > > text encoding to decode the source from, i.e.:
                    > >         source = '''#encoding utf-8\n Oh hello there'''
                    > > Would result in:
                    > >         source = source.decode('utf-8')
                    > >
                    > >
                    > > The patch at the bottom of this email implements this functionality, but
                    > > as my luck would have it, fixes Jean-Baptiste's issue while breaking the
                    > > #encoding unit tests :)
                    > >
                    > >
                    > > The main questions for you all are:
                    > >         * If/how you use the #encoding directive
                    > >         * If/how you use the #unicode directive
                    > >         * Would you be willing to help beta test this change?
                    > >
                    > >
                    > > If you are interested in checking out this change and how it affects
                    > > your code, I've committed and pushed it to the "unicode" branch
                    > > (http://github.com/rtyler/cheetah/tree/unicode) which, pending community
                    > > review, will likely get folded down into "next" for a v2.2 release.
                    > >
                    > >
                    > > Please give me some feedback :D
                    > >
                    > >
                    > >
                    > > diff --git a/src/Compiler.py b/src/Compiler.py
                    > > index 1d00ff6..33c8811 100644
                    > > --- a/src/Compiler.py
                    > > +++ b/src/Compiler.py
                    > > @@ -1551,7 +1551,7 @@ class ModuleCompiler(SettingsManager, GenUtils):
                    > >
                    > >         if source and file:
                    > >             raise TypeError("Cannot compile from a source string AND file.")
                    > > -        elif isinstance(file, (str, unicode)): # it's a filename.
                    > > +        elif isinstance(file, basestring): # it's a filename.
                    > >             f = open(file) # Raises IOError.
                    > >             source = f.read()
                    > >             f.close()
                    > > @@ -1578,8 +1578,9 @@ class ModuleCompiler(SettingsManager, GenUtils):
                    > >
                    > >         else:
                    > >             unicodeMatch = unicodeDirectiveRE.search(source)
                    > > +            encodingMatch = encodingDirectiveRE.match(source)
                    > >             if unicodeMatch:
                    > > -                if encodingDirectiveRE.match(source):
                    > > +                if encodingMatch:
                    > >                     raise ParseError(
                    > >                         self, "#encoding and #unicode are mutually exclusive! "
                    > >                         "Use one or the other.")
                    > > @@ -1587,8 +1588,13 @@ class ModuleCompiler(SettingsManager, GenUtils):
                    > >                 if isinstance(source, str):
                    > >                     encoding = unicodeMatch.group(1) or 'ascii'
                    > >                     source = unicode(source, encoding)
                    > > -
                    > > -                #print encoding
                    > > +            elif encodingMatch:
                    > > +                encodings = encodingMatch.groups()
                    > > +                if len(encodings):
                    > > +                    encoding = encodings[0]
                    > > +                    source = source.decode(encoding)
                    > > +            else:
                    > > +                source = unicode(source)
                    > >
                    > >         if source.find('#indent') != -1: #@@TR: undocumented hack
                    > >             source = indentize(source)
                    > > @@ -1807,6 +1813,8 @@ class ModuleCompiler(SettingsManager, GenUtils):
                    > >         self._moduleShBang = shBang
                    > >
                    > >     def setModuleEncoding(self, encoding):
                    > > +        #print ('setModuleEncodiing
                    > >
                    > > --
                    > > -R. Tyler Ballance
                    > > Slide, Inc.
                    > >
                    >
                    >
                    >
                    > --
                    > Jean-Baptiste Quenot
                    > http://jbq.caraldi.com/

                    --
                    -R. Tyler Ballance
                    Slide, Inc.
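
For readers following the patch quoted above, the idea of using "#encoding" to decide how the template source is decoded can be sketched roughly as below. This is only an illustration written for Python 3 (where bytes and str play the roles of Python 2's str and unicode); it is not Cheetah's code. The pattern ENCODING_DIRECTIVE and the helper decode_template_source are hypothetical names, and Cheetah's real encodingDirectiveRE may well differ.

```python
import re

# Hypothetical pattern for illustration; Cheetah's own encodingDirectiveRE may differ.
ENCODING_DIRECTIVE = re.compile(rb"^[ \t]*#encoding[ \t]+([-\w.]+)", re.MULTILINE)


def decode_template_source(source, default="ascii"):
    """Return the template source as text.

    Byte input is decoded with the codec named by its #encoding directive,
    falling back to `default` when no directive is present.
    """
    if isinstance(source, str):
        return source  # already text, nothing to decode
    match = ENCODING_DIRECTIVE.search(source)
    encoding = match.group(1).decode("ascii") if match else default
    return source.decode(encoding)


raw = "#encoding utf-8\nUnicode String with eacute \u00e9".encode("utf-8")
text = decode_template_source(raw)
```

With this shape, everything after the decoding step only ever sees text, which is the property the thread is after. Byte input containing non-ASCII characters and no directive fails early with a UnicodeDecodeError instead of surfacing later during rendering.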