Re: Open science software/experiment repository

  • Ken
    Message 1 of 18, Jan 18, 2013
      By the way, I want to emphasize that I am very much in favor of releasing code voluntarily and agree with Jean-Baptiste that it is often very helpful. I just don't want a universal requirement.

      --- In neat@yahoogroups.com, "Ken" wrote:
      >
      >
      >
      > Maybe I can elaborate a little more on why I don't think code submission should be a requirement. I appear to be on the minority side of this issue so I know it's an uphill battle but here's my reasoning:
      >
      > As soon as you make something a requirement, you are adding to bureaucracy. Inevitably, someone will be charged with checking that you satisfied the requirement. Even if we start out saying you "only" have to submit the code but no one has to run it, eventually an incident will be exposed when someone submitted incorrect or even fraudulent code, and then the code checks will become increasingly onerous. And it's likely those code checks will be done by reviewers, which means that people who already don't spend enough time reading our papers will now spend even less time as they have to dedicate new effort to looking at a giant impenetrable volume of spaghetti code.
      >
      > I've sat on enough executive committees and editorial boards at this point to know that things will only get stricter and more out of control. Requirements never get lighter, they only get heavier. These committees often make decisions with broad implications after short, rancorous arguments that ratchet up the paranoia about how someone might cheat the system, leading to more and more burdensome checks and requirements. As Jeff points out, as if having to write a 15-page project description wasn't enough, the NSF now requires proposals to include an additional 2-page data management plan. Not to mention the new proliferation of endless plagiarism checks throughout the academic world. So once we open this Pandora's box, don't think that someone won't ultimately be trying to run your code and blaming you if it doesn't work for them.
      >
      > As far as the upsides, such as Jeff's example of how helpful it is to have the code to reproduce results, if you really think about it, it's not so clear cut. In fact, Jeff, it's conceivable that you (or someone like you) would not have made your discovery had the code been available to begin with, because you would have simply run it and seen (misleadingly) that it works. It's only because it wasn't available and you thereby were forced to try to reproduce the concept for yourself that you learned the concept doesn't work as advertised. It may even be the case that having specific experimental code universally available leads the community to more deception (and thus to be more naive) because we would trust results that ultimately depend on obscure idiosyncrasies hidden in arcane code bases rather than on the supposed main idea. After all, we all know you can get virtually anything to "work" if you finesse it enough.
      >
      > Not only that, but more deeply it seems like making code a requirement is wishful thinking: We're trying to get a quick fix for a problem (i.e. deciding whether an idea is promising) that is only really solved by good old fashioned time and effort. That is, what really validates a method is not the one time it was run to give a published result, but rather the numerous reimplementations by independent parties over many subsequent years. The main reason you can trust NEAT is not because you can download my original pole balancing experiment and run it, but because of all the excellent independent packages like SharpNEAT, ANJI, and many others in which I had no hand at all.
      >
      > Moreover, why should we even care about the code if the idea is uninspiring anyway? By forcing everyone to submit code we are implicitly suggesting that what matters is simply that something worked rather than that it is an interesting idea. In my view the biggest problem in reviewing today is not some kind of systematic flaw in experimental design but rather an unhealthy obsession with experimental results to the complete exclusion of the main idea. Jeff, you say that "reviewers tend to be pretty good about this" when it comes to unfairly demanding excessive comparisons, but if that's true, it's only in our own community. In the larger machine learning community, reviewers are biased against EC and would love another excuse simply to say, "Why didn't you compare to X" as a way of avoiding actually needing to understand the central idea. We will be perpetuating that kind of culture and hence our own ostracization and defunding.
      >
      > Let's leave it voluntary and stop the senseless proliferation of endless requirements that technology tempts us to perpetuate upon ourselves. We are ultimately only hurting ourselves as we forget that well-intended requirements steal from the precious little time in life we have to actually sit and think, which is the most important thing we do as scientists.
      >
      > ken
      >
      >
      > --- In neat@yahoogroups.com, Jeff Clune wrote:
      > >
      > > Hello all,
      > >
      > > I agree with JBM that code should be submitted, and I don't mind it being a requirement. Submitting the code does not mean it has to compile the day after submission, be maintained, or even be clean/commented. It's just a written record of what happened in the experiment. If you are publishing the experiment and trying to describe exactly what you did, the code is the equivalent of a very detailed explanation of the experiment that allows others to replicate it if they wish.
      > >
      > > For example, Jean-Baptiste and I tried and failed to replicate the results of a famous, high-impact study. It was hard for us to know if there were bugs in our code, if we had the wrong parameters, or if the result itself was suspicious (e.g. a bug in their code). Eventually we got the code from the authors and we suddenly were able to replicate the results and learn under which conditions the effect in question occurs. That never would have happened without their code. In our case the authors were still alive and willing to share their code, but both of those conditions don't hold in a lot of cases (especially with older work). I think science is better off if we're able to see exactly how a result came to be.
      > >
      > > I do agree that there may be some disincentive for businesses to publish. But how many businesses publish papers about things they don't really want to share completely? I don't run into too many influential papers like that. And maybe those businesses can publish in other journals, or get an exception from the editors.
      > >
      > > As for the accumulation of a lot of worthless code, is that really a big problem? If the papers are worthless, they're already accumulating…this is just a link in the paper that can be ignored, no?
      > >
      > > I do agree that it may create a pressure on future papers to include previous algorithms more as controls, because the reviewers can say "the code is posted, why didn't you use it?" I think we just have to resist that as reviewers…because everyone knows what a pain it is to get someone else's code running in their domain. Reviewers tend to be pretty good about this.
      > >
      > > In any case, it's an interesting discussion….and likely the future whether we like it or not. My guess is that a lot of journals will follow the lead of the top journals (plus the NSF now requires such archiving, which is what is really driving it in a lot of cases).
      > >
      > > Best regards,
      > > Jeff Clune
      > >
      > > Assistant Professor
      > > Computer Science
      > > University of Wyoming
      > > jclune@
      > > jeffclune.com
      > >
      > > On Jan 18, 2013, at 7:21 AM, wrote:
      > >
      > > > Hi Ken,
      > > >
      > > > > More practically,
      > > > > it will mean that reviewers will start rejecting papers based on supposed
      > > > > code glitches in required code submissions. That will give reviewers a
      > > > > whole new justification for completely ignoring the main idea of the
      > > > > paper.
      > > > >
      > > > > Students will also become overwhelmed with code emergencies (since
      > > > > publication will depend now upon strangers running your code) and code
      > > > > cleanups demanded by random strangers who may themselves be at fault or
      > > > > simply ignorant. Furthermore, in a few years we will have massive
      > > > > repositories of junk code (think of the hundreds of papers published only
      > > > > at GECCO in one year) that is so out of date that it requires deprecated
      > > > > compilers and computing platforms even to run. But we will be required to
      > > > > maintain these repositories, which will become increasingly ignored,
      > > > > simply to maintain the appearance of rigor.
      > > >
      > > > I think there is a large difference between reviewing the code and
      > > > requesting that the authors make it available. I don't expect reviewers to
      > > > run the code when it's available. First, re-launching experiments may
      > > > require weeks of computations on large clusters. Second, I fully
      > > > understand that releasing "clean code" is a huge job and not our main job
      > > > (Science).
      > > >
      > > > However, I view the code as the detailed, technical description of a
      > > > paper. It contains all the parameters and all the implementation details,
      > > > including those that the authors thought were not important (but that may
      > > > be actually important). We don't need to be able to compile the code in
      > > > the future, we only need to be able to read it.
      > > >
      > > > When I reuse an algorithm described in a paper, I often re-implement it.
      > > > It's a good exercise and it helps keep my framework clean and
      > > > consistent. But I always find it very helpful to have access to the
      > > > source code so that I can understand the small details and see the
      > > > "tricks" used by the authors. For instance, I'm very happy that Sebastian
      > > > and you released the source code of the node placement algorithm for
      > > > HyperNEAT: thanks to it, we were able to reimplement it in our framework
      > > > in a few hours/days. Otherwise, it would have been much more difficult.
      > > >
      > > > Best regards,
      > > > -- JBM
      > > >
      > > >
      > >
      >
    • Oliver Coleman
      Message 2 of 18, Feb 13, 2013
        Hi all, just to keep flogging this debate for all it's worth, I wrote a blog post that attempts to boil down the arguments and counter-arguments (presented here and my own) for and against (requiring) submission of code along with papers: http://ojcoleman.com/content/open-science-what-about-source-code

        Let me know if you feel I've ripped off any statements you've made in the discussion in this group and want to be acknowledged for them in the post. :) Most of the arguments against are from Ken, and they're the most blatantly copied. (I've linked to this thread from the post so people can easily find the source of an argument and read it in the author's words.)




      • Ken
        Message 3 of 18, Feb 13, 2013
          Hi Oliver, I enjoyed what you put together there. It's a nice summary of a debate that many people probably would not even view as a debatable topic. I appreciate my view getting a fair hearing.

          While I also appreciate that you take a neutral point of view in the article, I still wanted to comment a bit on some of the points in favor of code sharing that you present. One thing I notice in your summary is that some of the "for" arguments seem more like arguments for sharing code as opposed to arguments for *requiring* sharing code. Of course no one can be against sharing code (leaving aside whether to require it or not) and explaining why it's a good idea is easy.

          For example, I think just about everyone is for eating vegetables. But very few people would want the government to *require* us to eat vegetables. Yet no matter how persuasively you counter with evidence that vegetables are good for us, it doesn't really address the real locus of the debate, which is the unintended consequences of being forced by the government to eat vegetables.

          If you look at your counter-arguments to the "against" points, they often hinge on the idea that the bad stuff need not come true. For example:

          "the code does not need to be able to be run by a reviewer."

          "It doesn't even need to be particularly "clean" or commented"

          "There needn't be any requirement to maintain the old code."

          And you acknowledge: "Re-implementing the model in a new piece of code provides a lot more certainty that the idea works as described in the paper."

          But with all this stuff that need not happen (and presumably that you are therefore saying should not happen), and given that reimplementation remains important even with all this code being provided, it again raises the question: what is the point of making this whole exercise a requirement anyway? Life will hardly be better for the community if all you are required to do is send a zip file with some incomprehensible and uncompilable filler inside it to pass the "code submission" requirement. Why fool ourselves into thinking otherwise?

          I like how it is now, where you can tell how serious authors are about their code by the fact that they released it voluntarily. If authors don't release code, it gives us a useful signal that someone should try reimplementing the idea if it sounds at all interesting, or perhaps that the authors didn't think their own idea significant enough to warrant releasing it. Such useful signals will be gone with the new system. Instead, everyone will be able to pretend they cared enough to "share" their code.

          Anyway, when it comes to bureaucracy, I think slippery slopes look more and more slippery the older you get. When I was younger I wouldn't have thought about it as much, but now that I've been on so many committees and boards I can see that adding new rules is almost a force of nature. Just in my fallible opinion, if you ever decide to implement a rule but feel a need to preface it with a number of caveats about what it "need not" imply, get ready for all those need nots to come true. But please forgive my cynicism - I'm usually an optimist - just not on the subject of bureaucracy. I plan to continue eating vegetables and releasing code in any case. :)

          Best,

          ken
