Loading ...
Sorry, an error occurred while loading the content.

question regarding content search in array with PHP

Expand Messages
  • ik
    Hello list, I have an array that for the example will look as follows: linux, windows, mac os, beos, os/2, freebsd, netbsd, irix .... And a string: linux
    Message 1 of 5 , Dec 18, 2007
    • 0 Attachment
      Hello list,

      I have an array that for the example will look as follows:

      linux, windows, mac os, beos, os/2, freebsd, netbsd, irix ....

      And a string:

      "linux kernel 2.6.24"

      Now I wish to find the closest value of the string in the array -> "linux".
      But the problem as you can see, is that the string contain more
      content then the array value.
      I can think on many unefficient ways to find the part of string as an
      array value, but I'm looking for an efficient way to do it. specially
      if I need to serve a lot of strings on after the other...
      The problem is, that I must use PHP (because I can think on an answer
      using Perl, that I haven't tested).

      Any ideas how I can do it in an efficient way ?

      Thanks,
      Ido
      --
      http://ik.homelinux.org/
    • Omer Zak
      Since you want to use PHP, I assume that you want to do this when processing Web requests. Therefore, I would store the array as a DB (such as MySQL) table.
      Message 2 of 5 , Dec 18, 2007
      • 0 Attachment
        Since you want to use PHP, I assume that you want to do this when
        processing Web requests.
        Therefore, I would store the array as a DB (such as MySQL) table.

        The table will have two columns:
        - Word (indexed, non-unique key)
        - FullString

        For 'linux', you'll have
        'linux','linux'

        For 'mac os', you'll have
        'mac','mac os'
        'os','mac os'

        Then, when you get the string from the user, split it into words and
        query for full strings corresponding to each word (using, for example:
        SELECT FullString FROM StringsTable WHERE Word='os';
        Of course, you should use the appropriate prepared queries to guard
        against SQL injection vulnerabilities).

        Now you'll have small number of full strings to be considered, and can
        apply your complicated and inefficient heuristic algorithm to determine
        which full string corresponds best to the user's string.

        --- Omer

        On Tue, 2007-12-18 at 19:06 +0200, ik wrote:
        > Hello list,
        >
        > I have an array that for the example will look as follows:
        >
        > linux, windows, mac os, beos, os/2, freebsd, netbsd, irix ....
        >
        > And a string:
        >
        > "linux kernel 2.6.24"
        >
        > Now I wish to find the closest value of the string in the array -> "linux".
        > But the problem as you can see, is that the string contain more
        > content then the array value.
        > I can think on many unefficient ways to find the part of string as an
        > array value, but I'm looking for an efficient way to do it. specially
        > if I need to serve a lot of strings on after the other...
        > The problem is, that I must use PHP (because I can think on an answer
        > using Perl, that I haven't tested).
        >
        > Any ideas how I can do it in an efficient way ?

        --
        MS-Windows is the Pal-Kal of the PC world.
        My own blog is at http://www.zak.co.il/tddpirate/

        My opinions, as expressed in this E-mail message, are mine alone.
        They do not represent the official policy of any organization with which
        I may be affiliated in any way.
        WARNING TO SPAMMERS: at http://www.zak.co.il/spamwarning.html
      • Chen Shapira
        If you don t need persistence for the array, don t use a DB such as MySQL. DBs write everything to the disk. They are highly optimized to never ever lose data,
        Message 3 of 5 , Dec 18, 2007
        • 0 Attachment
          If you don't need persistence for the array, don't use a DB such as
          MySQL. DBs write everything to the disk. They are highly optimized to
          never ever lose data, so they write to disk a lot. Then they read from
          disk. IO operations are slow. So slow that you are probably better off
          with the old algorithm.
          If you want to go the DB way (and I certainly hope you'll find a
          better way), try an in-memory database. As an Oracle DBA, I'll
          recommend TimesTen - Oracle's in-memory database, but I'm sure you can
          find an open-source solution as well.

          Chen


          On Dec 18, 2007 9:45 AM, Omer Zak <w1@...> wrote:
          >
          >
          >
          >
          >
          >
          > Since you want to use PHP, I assume that you want to do this when
          > processing Web requests.
          > Therefore, I would store the array as a DB (such as MySQL) table.
          >
          > The table will have two columns:
          > - Word (indexed, non-unique key)
          > - FullString
          >
          > For 'linux', you'll have
          > 'linux','linux'
          >
          > For 'mac os', you'll have
          > 'mac','mac os'
          > 'os','mac os'
          >
          > Then, when you get the string from the user, split it into words and
          > query for full strings corresponding to each word (using, for example:
          > SELECT FullString FROM StringsTable WHERE Word='os';
          > Of course, you should use the appropriate prepared queries to guard
          > against SQL injection vulnerabilities).
          >
          > Now you'll have small number of full strings to be considered, and can
          > apply your complicated and inefficient heuristic algorithm to determine
          > which full string corresponds best to the user's string.
          >
          > --- Omer
          >
          >
          > On Tue, 2007-12-18 at 19:06 +0200, ik wrote:
          > > Hello list,
          > >
          > > I have an array that for the example will look as follows:
          > >
          > > linux, windows, mac os, beos, os/2, freebsd, netbsd, irix ....
          > >
          > > And a string:
          > >
          > > "linux kernel 2.6.24"
          > >
          > > Now I wish to find the closest value of the string in the array ->
          > "linux".
          > > But the problem as you can see, is that the string contain more
          > > content then the array value.
          > > I can think on many unefficient ways to find the part of string as an
          > > array value, but I'm looking for an efficient way to do it. specially
          > > if I need to serve a lot of strings on after the other...
          > > The problem is, that I must use PHP (because I can think on an answer
          > > using Perl, that I haven't tested).
          > >
          > > Any ideas how I can do it in an efficient way ?
          >
          > --
          > MS-Windows is the Pal-Kal of the PC world.
          > My own blog is at http://www.zak.co.il/tddpirate/
          >
          > My opinions, as expressed in this E-mail message, are mine alone.
          > They do not represent the official policy of any organization with which
          > I may be affiliated in any way.
          > WARNING TO SPAMMERS: at http://www.zak.co.il/spamwarning.html
          >
          >
        • ik
          First of all thank you for the answers. I found a solution thanks to Oded Arbel, without using database, and it is O(2*N) (if I remember the subject of
          Message 4 of 5 , Dec 19, 2007
          • 0 Attachment
            First of all thank you for the answers.

            I found a solution thanks to Oded Arbel, without using database, and
            it is O(2*N) (if I remember the subject of complexity correctly).

            It works as follows:

            foreach ($names as $name_index => $name_value) {
            foreach ($namelist as $list_index => $list_value) {
            if (strncmp($list_value, $name_value, strlen($list_value)) == 0) {
            $name_found[] = $name_value;
            }
            }
            }

            Ido

            On Dec 19, 2007 6:20 AM, Chen Shapira <cshapi@...> wrote:
            >
            >
            >
            >
            >
            >
            > If you don't need persistence for the array, don't use a DB such as
            > MySQL. DBs write everything to the disk. They are highly optimized to
            > never ever lose data, so they write to disk a lot. Then they read from
            > disk. IO operations are slow. So slow that you are probably better off
            > with the old algorithm.
            > If you want to go the DB way (and I certainly hope you'll find a
            > better way), try an in-memory database. As an Oracle DBA, I'll
            > recommend TimesTen - Oracle's in-memory database, but I'm sure you can
            > find an open-source solution as well.
            >
            > Chen
            >
            >
            >
            > On Dec 18, 2007 9:45 AM, Omer Zak <w1@...> wrote:
            > >
            > >
            > >
            > >
            > >
            > >
            > > Since you want to use PHP, I assume that you want to do this when
            > > processing Web requests.
            > > Therefore, I would store the array as a DB (such as MySQL) table.
            > >
            > > The table will have two columns:
            > > - Word (indexed, non-unique key)
            > > - FullString
            > >
            > > For 'linux', you'll have
            > > 'linux','linux'
            > >
            > > For 'mac os', you'll have
            > > 'mac','mac os'
            > > 'os','mac os'
            > >
            > > Then, when you get the string from the user, split it into words and
            > > query for full strings corresponding to each word (using, for example:
            > > SELECT FullString FROM StringsTable WHERE Word='os';
            > > Of course, you should use the appropriate prepared queries to guard
            > > against SQL injection vulnerabilities).
            > >
            > > Now you'll have small number of full strings to be considered, and can
            > > apply your complicated and inefficient heuristic algorithm to determine
            > > which full string corresponds best to the user's string.
            > >
            > > --- Omer
            > >
            > >
            > > On Tue, 2007-12-18 at 19:06 +0200, ik wrote:
            > > > Hello list,
            > > >
            > > > I have an array that for the example will look as follows:
            > > >
            > > > linux, windows, mac os, beos, os/2, freebsd, netbsd, irix ....
            > > >
            > > > And a string:
            > > >
            > > > "linux kernel 2.6.24"
            > > >
            > > > Now I wish to find the closest value of the string in the array ->
            > > "linux".
            > > > But the problem as you can see, is that the string contain more
            > > > content then the array value.
            > > > I can think on many unefficient ways to find the part of string as an
            > > > array value, but I'm looking for an efficient way to do it. specially
            > > > if I need to serve a lot of strings on after the other...
            > > > The problem is, that I must use PHP (because I can think on an answer
            > > > using Perl, that I haven't tested).
            > > >
            > > > Any ideas how I can do it in an efficient way ?
            > >
            > > --
            > > MS-Windows is the Pal-Kal of the PC world.
            > > My own blog is at http://www.zak.co.il/tddpirate/
            > >
            > > My opinions, as expressed in this E-mail message, are mine alone.
            > > They do not represent the official policy of any organization with which
            > > I may be affiliated in any way.
            > > WARNING TO SPAMMERS: at http://www.zak.co.il/spamwarning.html
            > >
            > >
            >



            --
            http://ik.homelinux.org/
          • Shlomi Fish
            ... O(2*N) complexity is equivalent to O(N). Maybe you ve meant O(N^2)? Your code s complexity seems to be O(N^2) or even O(N*M) where N and M are different
            Message 5 of 5 , Dec 22, 2007
            • 0 Attachment
              On Wednesday 19 December 2007, ik wrote:
              > First of all thank you for the answers.
              >
              > I found a solution thanks to Oded Arbel, without using database, and
              > it is O(2*N) (if I remember the subject of complexity correctly).
              >

              O(2*N) complexity is equivalent to O(N). Maybe you've meant O(N^2)? Your
              code's complexity seems to be O(N^2) or even O(N*M) where N and M are
              different scales.

              Now for some comments:

              > It works as follows:
              >
              > foreach ($names as $name_index => $name_value) {

              You're not making use of $name_index, so I think you can simply use:

              <<<
              foreach ($names as $name_value) {
              >>>

              > foreach ($namelist as $list_index => $list_value) {
              > if (strncmp($list_value, $name_value, strlen($list_value)) == 0) {
              > $name_found[] = $name_value;
              > }
              > }

              This code will add $name_value to $name_found multiple times - once for each
              matching value in $namelist. If you don't want that, add a "break".

              BTW, I found this code a bit hard to understand. I would choose a better
              variable naming if I were you.

              Regards,

              Shlomi Fish

              > }
              >
              > Ido
              >
              > On Dec 19, 2007 6:20 AM, Chen Shapira <cshapi@...> wrote:
              > > If you don't need persistence for the array, don't use a DB such as
              > > MySQL. DBs write everything to the disk. They are highly optimized to
              > > never ever lose data, so they write to disk a lot. Then they read from
              > > disk. IO operations are slow. So slow that you are probably better off
              > > with the old algorithm.
              > > If you want to go the DB way (and I certainly hope you'll find a
              > > better way), try an in-memory database. As an Oracle DBA, I'll
              > > recommend TimesTen - Oracle's in-memory database, but I'm sure you can
              > > find an open-source solution as well.
              > >
              > > Chen
              > >
              > > On Dec 18, 2007 9:45 AM, Omer Zak <w1@...> wrote:
              > > > Since you want to use PHP, I assume that you want to do this when
              > > > processing Web requests.
              > > > Therefore, I would store the array as a DB (such as MySQL) table.
              > > >
              > > > The table will have two columns:
              > > > - Word (indexed, non-unique key)
              > > > - FullString
              > > >
              > > > For 'linux', you'll have
              > > > 'linux','linux'
              > > >
              > > > For 'mac os', you'll have
              > > > 'mac','mac os'
              > > > 'os','mac os'
              > > >
              > > > Then, when you get the string from the user, split it into words and
              > > > query for full strings corresponding to each word (using, for example:
              > > > SELECT FullString FROM StringsTable WHERE Word='os';
              > > > Of course, you should use the appropriate prepared queries to guard
              > > > against SQL injection vulnerabilities).
              > > >
              > > > Now you'll have small number of full strings to be considered, and can
              > > > apply your complicated and inefficient heuristic algorithm to
              > > > determine which full string corresponds best to the user's string.
              > > >
              > > > --- Omer
              > > >
              > > > On Tue, 2007-12-18 at 19:06 +0200, ik wrote:
              > > > > Hello list,
              > > > >
              > > > > I have an array that for the example will look as follows:
              > > > >
              > > > > linux, windows, mac os, beos, os/2, freebsd, netbsd, irix ....
              > > > >
              > > > > And a string:
              > > > >
              > > > > "linux kernel 2.6.24"
              > > > >
              > > > > Now I wish to find the closest value of the string in the array ->
              > > >
              > > > "linux".
              > > >
              > > > > But the problem as you can see, is that the string contain more
              > > > > content then the array value.
              > > > > I can think on many unefficient ways to find the part of string as
              > > > > an array value, but I'm looking for an efficient way to do it.
              > > > > specially if I need to serve a lot of strings on after the other...
              > > > > The problem is, that I must use PHP (because I can think on an
              > > > > answer using Perl, that I haven't tested).
              > > > >
              > > > > Any ideas how I can do it in an efficient way ?
              > > >
              > > > --
              > > > MS-Windows is the Pal-Kal of the PC world.
              > > > My own blog is at http://www.zak.co.il/tddpirate/
              > > >
              > > > My opinions, as expressed in this E-mail message, are mine alone.
              > > > They do not represent the official policy of any organization with
              > > > which I may be affiliated in any way.
              > > > WARNING TO SPAMMERS: at http://www.zak.co.il/spamwarning.html

              ---------------------------------------------------------------------
              Shlomi Fish shlomif@...
              Homepage: http://www.shlomifish.org/

              I'm not an actor - I just play one on T.V.
            Your message has been successfully submitted and would be delivered to recipients shortly.