[ai-geostats] Pareto vs Lognormal distribution

  • Beatrice Mare-Jones
    Message 1 of 6 , Aug 31, 2005
      Hello list

      I am a PhD student looking at developing a statistical model to predict
      the size-distribution of an area's oil and gas fields.

      It is clear that previous investigators prefer either a Pareto power law
      or a lognormal distribution to approximate field-size distributions.

      The data I am using do not look like they come from a Pareto distribution.
      I explain this as a result of undersampling, which previous investigators
      have also reported: small fields are not sampled or recorded. However,
      when I use basin-modelling software to simulate oil and gas fields (for
      the same basin that my discovered empirical data come from), I notice
      that this sample is also undersampled. Fields under a certain size are
      not being simulated, which is probably due to the resolution of my input
      data. What is interesting is that the undersampling actually occurs
      throughout all the size ranges, including the medium to larger sizes,
      which I would not have expected. Like the discovery dataset (n = 25), the
      simulated dataset (n = 140) looks more like it comes from a lognormal
      distribution than from a Pareto distribution.

      My conclusion is that, without being able to say that a Pareto is better
      than a lognormal or vice versa, it appears only logical to use both
      distributions.

      Geologically there does not seem to be a reason why a modal size (greater
      than what is detectable by exploration methods) of fields should exist,
      which would be the case if the data were from a lognormal distribution,
      unless the distribution is highly right-skewed (at the small field
      size) and the mode is actually just below the detection size.
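
      For reference, a lognormal with log-mean mu and log-standard-deviation
      sigma has mode exp(mu - sigma^2), median exp(mu) and mean
      exp(mu + sigma^2/2). The sketch below uses purely hypothetical parameters
      (a median size of 10 in arbitrary units) to show how far the mode drops
      below the median as sigma grows, which is the situation in which the mode
      could sit under a detection limit.

```python
import math

def lognormal_summary(mu, sigma):
    """Return (mode, median, mean) of a lognormal with log-mean mu and log-sd sigma."""
    mode = math.exp(mu - sigma ** 2)
    median = math.exp(mu)
    mean = math.exp(mu + 0.5 * sigma ** 2)
    return mode, median, mean

# Hypothetical parameters: median field size of 10 (arbitrary units).
for sigma in (0.5, 1.0, 2.0, 3.0):
    mode, median, mean = lognormal_summary(math.log(10.0), sigma)
    print(f"sigma={sigma:.1f}  mode={mode:.4g}  median={median:.4g}  mean={mean:.4g}")
```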

      Geologically there does seem to be a reason for fields to become so small
      that they become entities (that trap oil and gas), and this relationship
      may be better approximated by a Pareto.


      The Pareto and lognormal forms are similar, but maybe one approximates
      field sizes better than the other.
      My question is: do you think a Pareto distribution better approximates an
      oil and gas field-size distribution than a lognormal (or vice versa), and
      if so, why?


      I am currently working on goodness-of-fit tests to throw some more light
      on this, but if anyone has anything to say I'd appreciate some comments.
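
      One possible starting point for such a comparison is sketched below,
      assuming both models are fitted by maximum likelihood and compared by
      log-likelihood and AIC. The sample is synthetic (drawn from a lognormal
      with made-up parameters, n = 140), the Pareto lower bound is simply set
      to the smallest observation, and the function names are illustrative
      only. With n as small as 25 the comparison is likely to be weak, and AIC
      is only a rough guide when one parameter is a boundary estimate.

```python
import math
import random

def fit_lognormal(sizes):
    """Maximum-likelihood lognormal fit; returns (mu, sigma, log_likelihood)."""
    logs = [math.log(x) for x in sizes]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / n)
    loglik = sum(
        -math.log(x * sigma * math.sqrt(2.0 * math.pi))
        - (math.log(x) - mu) ** 2 / (2.0 * sigma ** 2)
        for x in sizes
    )
    return mu, sigma, loglik

def fit_pareto(sizes):
    """Maximum-likelihood Pareto fit with x_min set to the smallest observation."""
    x_min = min(sizes)
    n = len(sizes)
    alpha = n / sum(math.log(x / x_min) for x in sizes)
    loglik = sum(
        math.log(alpha) + alpha * math.log(x_min) - (alpha + 1.0) * math.log(x)
        for x in sizes
    )
    return x_min, alpha, loglik

def aic(loglik, n_params):
    """Akaike information criterion; lower values indicate a better trade-off."""
    return 2.0 * n_params - 2.0 * loglik

# Synthetic stand-in for a field-size sample (parameters are assumptions).
random.seed(0)
sizes = [random.lognormvariate(3.0, 1.2) for _ in range(140)]

mu, sigma, ll_lognormal = fit_lognormal(sizes)
x_min, alpha, ll_pareto = fit_pareto(sizes)
print(f"lognormal: mu={mu:.3f} sigma={sigma:.3f} AIC={aic(ll_lognormal, 2):.1f}")
print(f"Pareto:    x_min={x_min:.3f} alpha={alpha:.3f} AIC={aic(ll_pareto, 2):.1f}")
```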

      Thank you,

      Kind regards

      Beatrice

      Geological and Nuclear Sciences
      New Zealand
      www.gns.cri.nz
    • Syed Shibli
      Message 2 of 6 , Sep 1, 2005
        May I suggest that you look at other analogous datasets with n > 25, e.g. the North Sea basin or Gulf of Mexico, before drawing firm conclusions about whether Pareto or lognormal works best. A lot of this information is in the public domain: one can browse the websites of the UK DTI, the Norwegian Petroleum Directorate, or the Danish Energy Agency for public-domain field information. Certainly, at first blush, the lognormal seems to make more sense than the Pareto: you have your few giant fields (Brent, Statfjord, Ekofisk, etc.), lots of middle-sized fields, and many more small pimples in the North Sea that have yet to be developed.

        Cheers

        Syed

      • Chris Hlavka
        Message 3 of 6 , Sep 1, 2005
          Re: [ai-geostats] Pareto vs Lognormal distribution
          Beatrice - This is a vexing problem that I've tried to deal with in sizes of features in satellite imagery (Hlavka, C. A. and J. L. Dungan, 2002. Areal estimates of fragmented land cover - effects of pixel size and model-based corrections. International Journal of Remote Sensing 23(4): 711-724.) The affine (count versus continuous) nature of the digital imagery is at least part of the problem. I've used probability plots to assess type of distribution.

          In gas field work, there is evidence that the apparent lognormality of field-sizes is due to lower rates of discovery of smaller fields than larger fields - especially for older surveys.   It has been noted that newer field data was closer to Pareto than older data and thus inferred that the actual distribution is Pareto.  -- Chris
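
          A probability-plot check of the kind mentioned above can be sketched as follows. This is only an illustration on synthetic data, not the procedure from the cited paper: it assumes plotting positions (i + 0.5)/n and uses the straightness (Pearson correlation) of each plot as a rough indicator. For a lognormal, log(size) against standard normal quantiles should be close to a straight line; for a Pareto, log(size) against -log(1 - p) should be.

```python
import math
import random
from statistics import NormalDist

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def probability_plot_coords(sizes):
    """Return (sorted log sizes, normal quantiles, exponential quantiles)."""
    logs = sorted(math.log(x) for x in sizes)
    n = len(logs)
    positions = [(i + 0.5) / n for i in range(n)]  # assumed plotting positions
    normal_q = [NormalDist().inv_cdf(p) for p in positions]  # lognormal plot axis
    expo_q = [-math.log(1.0 - p) for p in positions]         # Pareto plot axis
    return logs, normal_q, expo_q

# Synthetic sample with made-up lognormal parameters.
random.seed(0)
sizes = [random.lognormvariate(3.0, 1.2) for _ in range(140)]
logs, normal_q, expo_q = probability_plot_coords(sizes)
print(f"lognormal plot correlation: {pearson(logs, normal_q):.4f}")
print(f"Pareto plot correlation:    {pearson(logs, expo_q):.4f}")
```

          Plotting the coordinate pairs is more informative than the single correlation figure, since curvature at the small-size end is exactly where undersampling would show up.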





          -- 
          
          ***************************************
          Chris Hlavka
          NASA/Ames Research Center 242-4
          Moffett Field, CA 94035-1000
          (650)604-3328  FAX 604-4680
          Christine.A.Hlavka@...
          ***************************************
        • Ted Harding
          Message 4 of 6 , Sep 2, 2005
            I'm intruding into foreign territory here, since I don't have
            experience in exploring for gas fields etc., (though I have had
            to deal with patches of contamination, which is the same sort
            of thing on a small scale). So apologies if I blunder around
            and tread on toes!

            Be that as it may, a point which has occurred to me in reading
            this thread is that the distribution being observed is the
            distribution of size conditional on being discovered, and the
            probability of being discovered may be expected to increase
            with size.

            So the frequency f(x) of occurrence of a size x in nature is
            attenuated by a factor equal to the probability that an item
            of size x will be discovered -- g(x) say. The frequency of x
            in the observed data is f(x | D), say, and so

            f(x | D) = f(x)*g(x)

            from which f(x) = f(x | D)/g(x). From this point, provided
            there is a reasonably justifiable model for g(x) (to within
            a constant of proportionality, e.g. simply g(x) = x), you can
            "demodulate" the observed data to infer the "wild" data.

            There has for many decades been a similar problem in classical
            geometrical probability (the ancestor of spatial statistics
            and morphometry), namely to infer the distribution of (e.g.)
            areas of cells given the observed distribution of the sizes
            of transects by lines, or of counts of sampling points intersecting
            them, leading to an integral equation.
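
            The demodulation step f(x) = f(x | D)/g(x) described above can be
            sketched numerically. The example below is a simulation under
            stated assumptions, not a recipe for real data: the "wild"
            population is Pareto with a made-up exponent, g(x) is taken to be
            proportional to size and capped at 1 (the cap size of 50 is
            hypothetical), and each discovered field is reweighted by 1/g(x)
            before estimating the exponent.

```python
import math
import random

def discovery_probability(size, full_detection_size=50.0):
    """Assumed g(x): proportional to size, capped at 1 (hypothetical choice)."""
    return min(1.0, size / full_detection_size)

def pareto_alpha(sizes, x_min, weights=None):
    """Weighted maximum-likelihood estimate of the Pareto exponent."""
    if weights is None:
        weights = [1.0] * len(sizes)
    total_weight = sum(weights)
    total_weighted_log = sum(w * math.log(x / x_min) for w, x in zip(weights, sizes))
    return total_weight / total_weighted_log

random.seed(42)
true_alpha, x_min = 1.2, 1.0  # assumed "wild" population parameters

# "Wild" population, then the subset that happens to be discovered.
population = [x_min * random.paretovariate(true_alpha) for _ in range(20000)]
discovered = [x for x in population if random.random() < discovery_probability(x)]

# Naive fit ignores the size-dependent discovery probability.
naive = pareto_alpha(discovered, x_min)

# Demodulated fit: weight each discovered field by 1 / g(x).
weights = [1.0 / discovery_probability(x) for x in discovered]
demodulated = pareto_alpha(discovered, x_min, weights)

print(f"discovered {len(discovered)} of {len(population)} fields")
print(f"naive alpha = {naive:.3f}, demodulated alpha = {demodulated:.3f}")
```

            The reweighting only works where g(x) > 0 and is reasonably well
            known; fields so small that they are never discovered cannot be
            recovered this way.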

            Maybe all this is old hat in the areas you are investigating,
            but since it did not seem to be even implicit in the discussion
            so far I thought I would bring it to the surface.

            Best wishes to all,
            Ted.



            --------------------------------------------------------------------
            E-Mail: (Ted Harding) <Ted.Harding@...>
            Fax-to-email: +44 (0)870 094 0861
            Date: 02-Sep-05 Time: 09:16:02
            ------------------------------ XFMail ------------------------------
          • Beatrice Mare-Jones
            Message 5 of 6 , Sep 2, 2005
              Hi Chris

              Thank you for your reply. And thank you for your paper reference - I'll
              take a look at your probability plots.

              Yes the apparent lognormality of oil and gas fields which moves more to a
              Pareto form with progressive and mature exploration is explained by
              undersampling at the low end that eventually gets sampled as the economics
              of an area and technology make smaller fields viable.

              However, I am surprised that my simulated oil and gas fields, which
              are based on basin modelling with no economic or exploration-process
              involvement, also produce lognormal populations of fields, and that
              the undersampling is obvious throughout most of the size ranges, not
              just at the small-size end.

              I think Syed's suggestion to use a larger dataset from a mature area
              is a good way of seeing what the distribution is more like.


              Kind regards


              Beatrice

              Hydrocarbons Group
              Institute of Geological and Nuclear Sciences Limited
              69 Gracefield Road, Lower Hutt, WELLINGTON
              NEW ZEALAND
              64 4 570 4821
              b.mare-jones@...
              www.gns.cri.nz
            • Beatrice Mare-Jones
              Message 6 of 6 , Sep 2, 2005
                Hi Ted

                Thanks for your reply.

                Yes, you are correct. The frequency with which a field of size x is
                observed is the product of f(x), its natural abundance in the
                parent population, and g(x), the probability of discovery as a
                function of its size. And yes, the larger the field is, the greater
                the probability that it is found.

                I looked at describing the probability that a field will be
                discovered conditional on its size for my empirical dataset as a
                discovery sequence. Although there is a first-order trend of
                decreasing size with increasing discovery sequence, there were lots
                of perturbations: for example, the 3rd largest field was discovered
                23rd in the discovery sequence. Therefore the discovery sequence
                may not be described by a straightforward function. I may start off
                with a simplistic model to demodulate the observed data, but may
                have more success with an integral equation, as you have mentioned
                for the early geometrical work.

                The small dataset I have is probably not suitable to describe the
                discovery sequence theoretically, and I will use an analogue area
                to establish the discovery sequence function.
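
                As a rough illustration of why such perturbations are not
                surprising, the sketch below simulates a discovery sequence under
                one simple assumption: fields are found one at a time, without
                replacement, with probability proportional to size. The basin of
                25 fields and its lognormal size parameters are hypothetical, and
                this is not claimed to be the discovery process in any real basin.

```python
import random

def simulate_discovery_order(sizes, rng):
    """Draw fields one at a time, without replacement, with probability proportional to size."""
    remaining = list(range(len(sizes)))
    order = []
    while remaining:
        weights = [sizes[i] for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        remaining.remove(pick)
        order.append(pick)
    return order

rng = random.Random(7)

# Hypothetical basin: 25 fields, sorted so that index 0 is the largest.
sizes = sorted((rng.lognormvariate(3.0, 1.2) for _ in range(25)), reverse=True)

# One simulated discovery sequence; index 2 is the 3rd largest field.
order = simulate_discovery_order(sizes, rng)
print("3rd largest field discovered at position", order.index(2) + 1)

# Average discovery position of each field over many simulated sequences.
runs = 500
mean_position = [0.0] * len(sizes)
for _ in range(runs):
    for position, field in enumerate(simulate_discovery_order(sizes, rng), start=1):
        mean_position[field] += position / runs

for rank in (0, 2, 12, 24):
    print(f"size rank {rank + 1}: mean discovery position {mean_position[rank]:.1f}")
```

                Individual sequences scatter widely around the average trend, so a
                single basin with n = 25 gives only weak information about the
                form of g(x).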

                Kind regards


                Beatrice


                Hydrocarbons Group
                Institute of Geological and Nuclear Sciences Limited
                69 Gracefield Road, Lower Hutt, WELLINGTON
                NEW ZEALAND
                64 4 570 4821
                b.mare-jones@...
                www.gns.cri.nz






                * By using the ai-geostats mailing list you agree to follow its rules
                ( see http://www.ai-geostats.org/help_ai-geostats.htm )

                * To unsubscribe to ai-geostats, send the following in the subject or in
                the body (plain text format) of an email message to sympa@...

                Signoff ai-geostats