
Re: I have a one question at a method.(changes to source)

  • enventa2000
    Message 1 of 5 , Aug 21, 2004
      > I have made a small "patch".

      Errrr, my bad. This "patch" was throwing webalizer into an infinite
      loop every time it found a '%' character. Not quite the intended
      result. I have now converted the "else if" block into an "if" block
      and moved it a bit below. Now it works correctly:


      diff webalizer-2.01-10/webalizer.c webalizer-2.01-10_bueno/webalizer.c
      170,172d169
      < int chars_unicode = 0; /* counter for unicode strings */
      < int is_unicode = 0; /* Boolean for unicode strings */
      <
      1823d1819
      < is_unicode=0;
      1833,1839c1829
      < if (*cp1=='%')
      < {
      < is_unicode=1;
      < chars_unicode=0;
      < }
      < chars_unicode++;
      < if ( chars_unicode!=3 && is_unicode!=0 ) *cp2++=tolower(*cp1); /* normal character if not unicode */
      ---
      > *cp2++=tolower(*cp1); /* normal character */
      diff webalizer-2.01-10/webalizer.h webalizer-2.01-10_bueno/webalizer.h
      222,224d221
      < extern int chars_unicode; /* counter for unicode strings */
      < extern int is_unicode; /* Boolean for unicode strings */
      <
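
The infinite loop described at the top of this message appears to come from the "else if" branch for '%' in the quoted listing: it sets the two flags but never advances cp1, so the while loop keeps looking at the same '%' forever. Here is a minimal sketch of that effect. The function scan_iterations, its advance_on_percent switch and the iteration limit are hypothetical names made up for this illustration; none of this is webalizer code.

```c
#include <stddef.h>

/* Hypothetical model of the scan loop, not webalizer code.
 * Returns the number of loop iterations, or `limit` if the loop
 * does not finish within that many iterations. */
size_t scan_iterations(const char *cp1, int advance_on_percent, size_t limit)
{
    size_t iterations = 0;

    while (*cp1 != '&' && *cp1 != '\0') {
        if (++iterations >= limit) return limit;   /* safety net for the demo */
        if (*cp1 == '%') {
            /* the first patch set its flags here but did not move cp1 */
            if (advance_on_percent) cp1++;
            continue;
        }
        cp1++;                                     /* ordinary character */
    }
    return iterations;
}
```

With advance_on_percent set to 0 the call only returns because of the limit, which is the hang in miniature. With it set to 1 the loop ends after one iteration per character.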



      --- In webalizer@yahoogroups.com, Enric Naval <enventa2000@y...>
      wrote:
      > Hello:
      >
      > The URL encoding (using "%") doesn't distinguish upper
      > case from lower case. So, "%2B" is the same as "%2b".
      > Changing everything to lower case doesn't change
      > anything. Mr. Barrett probably does this to make
      > translation faster, but the encoding remains the same.
      >
      >
      > The problem comes with this encoding, am I right?
      > "%8eo" --- lower o "o"
      > "%8eO" --- upper o "O"
      >
      > I have made a small "patch". I have added some lines,
      > so the line numbers are wrong. You can add the
      > modifications to your source by hand.
      >
      > The first added line defines an int variable
      > (chars_unicode) that counts how many characters we
      > are past the last "%" character found. The second
      > added line defines an int variable (is_unicode) that
      > is used as a boolean. By default the compiler sets
      > it to zero (false). The "else if" block is only
      > entered when a "%" character is found in the string.
      > It sets is_unicode to 1 (true) and resets the unicode
      > counter to zero. Before line 1829 I increase the
      > unicode counter, and in line 1829 itself I have added
      > a condition to prevent lowercasing of the character
      > three positions after "%". This way the code performs
      > the following transformation, where "%", "8" and "E"
      > have had tolower() applied to them, but "O" has not,
      > because chars_unicode is equal to 3:
      >
      > "%8EO" --- "%8eO"
      >
      > This allows the URL encoding to be lowercased, but
      > prevents the raw characters of these unicode strings
      > from being lowercased too. Please let me know if it
      > worked correctly for you!
      >
      > You can make the changes by hand. I have also made a
      > small patch that makes things a little bit better,
      > defining variables in webalizer.h, so the program
      > doesn't have to define them every time it executes
      > srch_string:
      >
      > http://griho.udl.es/webalizer/unicode.patch.txt
      >
      >
      > 1800 void srch_string(char *ptr)
      > 1801 {
      > int chars_unicode;
      > int is_unicode;
      > 1820 while (*cp1!='&' && *cp1!=0)
      > 1821 {
      > 1822 if (*cp1=='"' || *cp1==',' || *cp1=='?')
      > 1823 { cp1++; continue; }
      >
      > else if (*cp1=='%')
      > {
      > is_unicode=1;
      > chars_unicode=0;
      > }
      > 1824 else
      > 1825 {
      > 1826 if (*cp1=='+') *cp1=' ';
      >
      > 1827 if (sp_flg && *cp1==' ') { cp1++; continue; }
      > 1828 if (*cp1==' ') sp_flg=1; else sp_flg=0;
      >
      > chars_unicode++;
      > 1829 if ( chars_unicode!=3 && !is_unicode )
      > *cp2++=tolower(*cp1);
      > 1830 cp1++;
      > 1831 }
      > 1832 }
      >
      >
      >
      > Here is the patch if you want to copy and paste:
      >
      > diff webalizer-2.01-10/webalizer.c webalizer-2.01-10_original/webalizer.c
      > 170,172d169
      > < int chars_unicode =0; /* counter for unicode strings */
      > < int is_unicode =0; /* Boolean for unicode strings */
      > <
      > 1823d1819
      > < is_unicode=0;
      > 1828,1832d1823
      > < else if (*cp1=='%')
      > < {
      > < is_unicode=1;
      > < chars_unicode=0;
      > < }
      > 1838,1839c1829
      > < chars_unicode++;
      > < if ( chars_unicode!=3 && !is_unicode ) *cp2++=tolower(*cp1); /* normal character if not unicode */
      > ---
      > > *cp2++=tolower(*cp1); /* normal character */
      > diff webalizer-2.01-10/webalizer.h webalizer-2.01-10_original/webalizer.h
      > 222,224d221
      > < extern int chars_unicode; /* counter for unicode strings */
      > < extern int is_unicode; /* Boolean for unicode strings */
      > <
      >
      >
      >
      >
      >
      >
      > --- hideyuki nakano <hnakano@f...> wrote:
      >
      > > Hi, all
      > >
      > > I have one question.
      > >
      > > The search string is URL-encoded in access_log as
      > > follows.
      > >
      > > A) %83%8b%83p%83%93%8eO%90%a2
      > >
      > > The above string is changed by webalizer as
      > > follows.
      > >
      > > B) %83%8b%83p%83%93%8eo%90%a2
      > >
      > > The difference between A and B is "O" versus "o".
      > >
      > > Webalizer changes A-Z to a-z.
      > >
      > > Is this by design? Or is there a way around it?
      > >
      > > Help.
      > >
      > >
      > >
      > >
      > > ================ webalizer.c(V2.01) ====================
      > > 1796 /*********************************************/
      > > 1797 /* SRCH_STRING - get search strings from ref */
      > > 1798 /*********************************************/
      > > 1799
      > > 1800 void srch_string(char *ptr)
      > > 1801 {
      > > 1820 while (*cp1!='&' && *cp1!=0)
      > > 1821 {
      > > 1822 if (*cp1=='"' || *cp1==',' || *cp1=='?')
      > > 1823 { cp1++; continue; }
      > >
      > > 1824 else
      > > 1825 {
      > > 1826 if (*cp1=='+') *cp1=' ';
      > >
      > > 1827 if (sp_flg && *cp1==' ') { cp1++; continue; }
      > > 1828 if (*cp1==' ') sp_flg=1; else sp_flg=0;
      > > ★★★1829 *cp2++=tolower(*cp1);
      > >
      > > 1830 cp1++;
      > > 1831 }
      > > 1832 }
      > >
      > > ========================================================
      > >
      > >
      > >
      > >
      > >
      >
      >
      > =====
      > Enric Naval
      > Student of Management Informatics at the UdL (Lleida)
      > GRIHO webalizer.conf
      > http://griho.udl.es/webalizer/webalizer.conf.txt
    • enventa2000
      Message 2 of 5 , Aug 21, 2004
        Sorry again. This time the patch works correctly, and I have tested it
        in several different logs. This is the last message about this.

        I have been able to see that half the people search for "AIPO" while
        the other half search for "aipo".


        http://griho.udl.es/webalizer/unicode.patch.txt


        diff webalizer-2.01-10_bueno/webalizer.c webalizer-2.01-10_unicode/webalizer.c
        169a170,172
        > int chars_unicode = 0; /* counter for unicode strings */
        > int is_unicode = 0; /* Boolean for unicode strings */
        >
        1819a1823,1824
        > is_unicode=0;
        > chars_unicode=0;
        1829c1834,1841
        < *cp2++=tolower(*cp1); /* normal character */
        ---
        > if (*cp1=='%')
        > {
        > is_unicode=1;
        > chars_unicode=0;
        > }
        > if ( chars_unicode!=3 && is_unicode!=0 ) { *cp2++=tolower(*cp1); } /* normal character if not unicode */
        > else *cp2++=*cp1;
        > chars_unicode++;
        Only in webalizer-2.01-10_unicode/: webalizer.c~
        diff webalizer-2.01-10_bueno/webalizer.h webalizer-2.01-10_unicode/webalizer.h
        221a222,224
        > extern int chars_unicode; /* counter for unicode strings */
        > extern int is_unicode; /* Boolean for unicode strings */
        >
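
For anyone who wants to try the idea outside of webalizer, here is a small standalone sketch of the counting approach: lowercase everything except the one character that sits three positions after a '%'. It is not the patch itself. The function name lower_keep_trail, the since_pct counter and the output-buffer handling are assumptions made for this illustration, and, unlike the diff above, it also lowercases text that comes before the first '%'.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of webalizer: copy src into dst,
 * lowercasing every character except the one that directly follows
 * a %XX escape (i.e. the character three positions after '%'). */
void lower_keep_trail(const char *src, char *dst, size_t dstlen)
{
    int since_pct = -1;      /* characters since the last '%'; -1 = none seen */
    size_t n = 0;

    if (dstlen == 0) return;
    while (*src != '\0' && n + 1 < dstlen) {
        if (*src == '%') since_pct = 0;          /* start of a new escape */
        if (since_pct == 3)
            dst[n++] = *src;                     /* character after %XX: keep case */
        else
            dst[n++] = (char)tolower((unsigned char)*src);
        if (since_pct >= 0 && since_pct < 4) since_pct++;   /* capped counter */
        src++;
    }
    dst[n] = '\0';
}
```

Called on "%8EO" this should produce "%8eO", and on "AIPO" it should produce "aipo". Note that it keeps the case of the character after any escape, including ones like %20, so treat it as a starting point rather than a finished fix.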