
1683 Cross Generational Simulating/Comparisons

  • bchaikin@aol.com
    Jan 23, 2003
      Over at apbr_analysis, we've discussed the cross generational
      comparison of players and teams and I think we reached the conclusion
      that such comparisons are difficult or impossible to ground-truth. 
      There were some dissenters who we may just kill off. 

      now why would you want to go and do that? if everyone agreed with you, what fun would these discussion groups be?.....

      and who is "we"? i certainly don't agree with any conclusion stating that directly comparing players over the past 40-45 years in the NBA is "...difficult or impossible...". on the contrary, i find it fairly easy......

      i've read every single post on both discussion groups since this discussion on "...are the players of today better than the players of yesterday..." and, more exactly, "...how could we simulate cross-generational pro basketball..." began a number of weeks ago, patiently waiting for any kind of proof or evidence in terms of some discussion stating why players cannot be compared directly over the past decades, and while i have heard a number of people state it can't, or shouldn't be done, i haven't seen any definitive reasons why. i remain wholly unconvinced by the arguments presented so far, and would love to debate any discussion that actually tries to explain why it can't be done accurately using the actual stats...

      i've posted a number of similar questions to the group concerning this exact topic and they have to this point been completely ignored, such as:

      "...if players over the past 40-45 years (actually since the mid to late 1950s) cannot be compared directly, explain where the cutoff is, or cutoffs are, and why?..."

      "...can players be compared directly (or simulated directly against, using actual statistics) from the 1990s to the 1980s, or from the 1990s or 1980s to the 1970s, from any of those 3 decades to the 1960s, and from any of those four decades to the mid to late 1950s? if not, where does any cutoff exist and why?..."

      if players/teams from the 1990s can be compared directly with players/teams from the 1980s, why? if not, why not? what is the reasoning, or evidence? same for the other decades? can players from the mid to late 1950s be compared directly to the players of the 1960s? if so, why? if not why not?...

      someone mentioned in previous postings a study by paleontologist stephen jay gould in his book "full house" (1996) where he talked about the disappearance of the .400 hitter in baseball as being due to a decrease in variation of batting averages (whose mean he states was stable over that same time) over the 130 years of baseball, and that that decrease in variation implies a general improvement of performance in the game of baseball over time, and that person posed the question of whether a .260 hitter in i think 1890 was the same as a .260 hitter today (gould would say no). since gould did a somewhat statistical study using standard deviations, and that was attempted by others here in this discussion group to try to find if an analogous situation in pro basketball existed using FG%s, let's look at what gould did, keeping in mind what we want to use his statistical evidence for - to ask can we compare directly basketball players of today to yesterday, are the players of today better than those of yesterday, and if so why, and can we simulate games between teams/players of today to yesterday, and if not where is any cutoff point...

      also i do not wish to convert this basketball discussion group to a baseball discussion group, but as i have yet to see a written source for a discussion about historical basketball as we are attempting to do, i'm guessing it is a good starting point...

      i've read all of the chapters in gould's book about his baseball batting average study (chaps 7-11, pages 77-132), and for those of you that haven't, here are some excerpts. since he has quoted bill james, i will also add in quotes from james and my own comments. keep in mind that while i like what he has shown, and it is a nice approach using statistical analysis, it has some flaws as this pertains to our discussion:

      "...Where else can you find a system (i.e. pro baseball) that has operated with unchanged rules for a century (thus permitting meaningful comparison throughout), and has kept a complete record of all actions and achievements subject to numeration?..." (page 78)

      "...The extinction of .400 hitting really measures the general improvement of play in professional baseball...." (page 79)

      "....More ink has been spilled on the disappearance of .400 hitting than on any other statistical trend in baseball's history. The particular explanations have been varied as their authors, but all agree on one underlying proposition: that the extinction of .400 hitting measures the worsening of something in baseball, and that the problem will therefore be solved when we determine what has gone wrong...." (page 80)

      "....The extinction of .400 hitting measures the general improvement of play...." (page 81).

      "....For almost every sport, the improvement in absolute records follows a definitive pattern with presumed causes central to my developing argument about .400 hitting. Improvement does not follow a linear path at a constant rate. Rather, times and records fall more rapidly early in the sequence and then slow remarkably, sometimes reaching a plateau of no further advance. In other words, athletes eventually encounter some kind of barrier to future progress, and records stabilize. Statisticians call such a barrier an asymptote; vernacular language might speak of a limit. In the terminology of this book, athletes reach a "right wall" that stymies future improvement....." (page 93)

      "....For reasons never determined, batting averages declined steadily throughout the 1960s, reaching a nadir in the great pitchers' year of 1968....." (page 104)

      "…that .400 hitting might have disappeared as a consequence of shrinking variation around (a) stable mean (i.e. .260 hitting)…" (page 105)

      "....the higher the value of the standard deviation, the more extensive, or spread out, the variation…" (page 106)…

      ".....variation (in baseball) decreases steadily through time, leading to the disappearance of .400 hitting as a consequence of shrinkage at the right tail of the distribution…" (page 107)

      "....we note in particular that (while looking at a graph of yearly averages of standard deviations of batting averages for regular players - starters - from 1875-1980), while standard deviations have been steadily dropping and irreversibly (1875-1980), the decline itself has decelerated over the years as baseball has stabilized - rapidly during the 19th century, more slowly during the 20th, and reaching a plateau by about 1940…" (page 107)

      i know that as you read this you can't see his graph, but what it plots is standard deviation in batting averages versus year, from 1875-1980, and while the decline (decrease in variation) is evident from 1875 to the mid 1930s, there is no decline (no change in variation) from the mid 1930s to 1980, a span of some 45+ years, and if you extrapolate it to today (2003), i'm sure you'll see the same pattern, meaning the trend of no change in variation has held for 65+ years...
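
      to make the "shrinking right tail" idea concrete, here is a rough sketch in python. it assumes (my assumption, not gould's stated method) that batting averages of regulars are roughly normally distributed around a stable .260 mean, and the three standard deviations are made-up illustrative values, not numbers read off his graph. the function name fraction_above is just for this example:

```python
import math


def fraction_above(threshold: float, mean: float, sd: float) -> float:
    """Share of a normal(mean, sd) population that lies above threshold."""
    z = (threshold - mean) / (sd * math.sqrt(2.0))
    return 0.5 * math.erfc(z)


# Stable league mean taken from the discussion above.
LEAGUE_MEAN = 0.260

# Hypothetical spreads for illustration only, NOT gould's measured values.
for sd in (0.050, 0.040, 0.030):
    share = fraction_above(0.400, LEAGUE_MEAN, sd)
    print(f"sd={sd:.3f}  share of regulars at .400 or better: {share:.6f}")
```

      with the mean held fixed, the share of hitters at .400 or better drops quickly as the spread narrows, which is the mechanism gould describes. it says nothing by itself about why the spread narrowed.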

      "....this analysis has uncovered something general, something beyond the peculiarity of an idiosyncratic system, some rule or principle that should help us to understand why .400 hitting has become extinct in baseball....." (page 110)

      ".....400 hitting disappears as a consequence of shrinking variation around a stable mean batting average...." (page 111)

      ".....the shrinkage of variation must be measuring a general improvement of play...." (page 112)

      "....complex systems improve when the best performers play by the same rules over extended periods of time. As systems improve, they equilibrate and variation decreases...." (page 112)

      ".....as play improves and bell curves march toward the right wall, variation must shrink as the right tail...." (page 116)…

      "....the disappearance of .400 hitting marks the general improvement of play...." (page 120)

      "....a model that posits increasing excellence of play with decreasing variation when the best can no longer take such numerical advantage of the poorer quality in average performance..." (page 120)

      ".....I do recognize that some improvement might be attributed to changing conditions, rather than absolutely improving play…older infields were apparently lumpier and bumpier than the productions of good ground crews today - so some of the poorer fielding of early days may have resulted from lousy fields rather than lousy fielders. I also recognize that rising averages must be tied in large part to great improvements in the design of gloves, but better equipment represents a major theme of history, and one of the legitimate reasons underlying my claim for general improvement in play…" (page 121)

      up to this last quote i've given you an idea of what gould is saying, but re-read this last quote. basically what gould is saying is that, yes, i admit that while the rules of the game have remained essentially the same, the conditions in the game have indeed changed (not to mention changes in stadiums, favoring either the hitters or pitchers as bill james stresses but gould ignores) over the years, but guess what - i don't care, they actually legitimize my point, i.e. that doesn't change anything. he is saying that yes conditions of the game have changed, but he does not admit that that may be part of the reason, if not a major part, for players improving over the years....

      this is very important for our discussion of basketball, where the conditions of the game haven't changed very much - compared to baseball - over the same time period of the mid 1950s to today, and people like bill james show that the stats reveal that the players aren't any better (or worse) in baseball. example: basketball courts haven't changed size like fields have in baseball, and thus that avenue of skewing stats doesn't exist in basketball....

      remember above where gould's graph showed batting average standard deviation hadn't changed from the mid-1930s on? gould's arguments take into account the entire time period of 1875-1980, but his graph shows little if any difference from the mid 1950s to today. in basketball we are looking at the same time period, the mid to late 1950s to today, where the environment for athletes was the same for professional basketball players and professional baseball players. not many players in either sport lifted weights in the 1950s, travel in both sports was similar, mostly by train, blacks were just starting to be integrated into baseball and shortly after into basketball, treatments for physical injuries were the same, etc...

      so what gould is looking at is the entire process (the numbers), and not the players as a definitive group, i.e. are players better today than yesterday. he surmises this, but he is in fact looking at baseball as a whole, a single process, and not at how the conditions of the game over the years have affected the players (more accurately the players' statistical output as gould is looking at the stats), unlike bill james, who has looked at the changing conditions of the game thru the years and how this has affected player statistical output, and has attempted to normalize numbers based on his findings...

      gould quotes james a number of times in his book, but it's evident from reading these five chapters that he hasn't read james thoroughly. i think most would agree that bill james has probably done more for the understanding of baseball statistics over the years than has gould (and possibly more than anyone), especially when it comes to comparing players from different eras, and james has gone into detail about how the conditions of the game have changed (stadium size, lefty/righty, rule changes and outside influences) over the years...

      obviously gould hasn't read james' most recent historical baseball abstract book (an update to his mid-1980s book), kind of hard when six feet under, but the vast majority of the book is the same, so gould should have read all of james' mid-1980s edition.

      as an example, when discussing the difference in eras (as shown by the stats), james says (2001 historical abstract, page 481) when comparing eddie collins and rogers hornsby, "...the relationship between 1909 baseball and 1929 baseball is like the relationship between 1960s and 1990s baseball. many hitters in the 1990s have better numbers (stats) than any hitter posted during the 1960s, not because the players have gotten better, but because the conditions of the game have swung in the hitters' favor. same thing back then: many hitters from 1929 have better numbers than any hitter in 1909. this doesn't prove that they were better hitters..."

      when discussing the poor hitting numbers of the 1960s, james says (page 258, same book) "...the most important causes of the 1960s game (poor hitting great pitching), i believe were stadium architecture and the lack of an enforcement mechanism regarding the height of the pitcher's mound. the newer parks in that era moved the fans further away creating more foul territory (i.e. more foul outs and lower batting averages - my words). almost every change in ballparks between 1930 and 1968 took hits out of the league..."

      so how does this relate to basketball? i contend that unless someone can show me how the game is different, condition wise, between the years of the mid to late 1950s to today, as james has done for baseball, that like gould we can look at the numbers directly. gould only looked at the numbers and james has shown that this needs refinement based on how the conditions of the game have changed. i would love to see some arguments on how conditions in basketball have changed from the mid-to-late 1950s to today, thus having the stats affected in some way, such that the numbers cannot be compared directly, but i have yet to see this...

      in previous posts i have listed some of the major stats by five year periods since the mid 1950s, and shown similarities, again stating that the game has remained basically the same as a process over that time period. if you were to look at basketball from the late 1800s/early 1900s i would agree there are major differences, that the conditions of the game were vastly different, but unfortunately we can't even compare them because the stats are not available. baseball was quite different in the late 1800s compared to today...

      ".....Wade boggs would hit .400 every year against the pitching and fielding of the 1890s, while wee willie keeler would be lucky to crack .320 today...." (page 125)

      this is a preposterous statement. what would wade boggs have done when he got sick, or just a cold, back then? how about a pulled muscle? a broken finger? there were no antibiotics for simple cures for common ailments back then, no trainers, no x-rays for pictures even if you thought you had a broken bone. poorer nutrition, longer train rides (in terms of time spent) than plane rides, no weight training, you name it. gould is looking at this as being one single process and not taking into consideration the conditions of the game, which james has shown can be used to successfully normalize the eras (if you believe what james has done, which i do)...

      the bottom line is that boggs was one of the very best of his time and keeler was one of the very best of his time. like bill james points out, you look at how the people of the time, especially their peers, revered the players, and both were the very best of their time according to their peers. it's my contention that the players in the NBA in the mid to late 1950s were as good basketball players as the athletes of today - the best of their time playing at the highest level available. based on that i believe you can simulate the teams/players from the late 1950s and early 1960s against those of today directly, and challenge anyone to dispute this with some sort of statistical evidence...

      and as i state this remember that to simulate players of that time period of the late 1950s and early 1960s against the players of today, you simply can't say pettit and yardley and schayes couldn't play today because the players are bigger and stronger today, you also, if expected to be believed, have to consider today's players playing a season or two back then in that time period, where the players would not have the medical advances of today available, the scores of trainers, coaches, tv commercials and rap albums, whatever...

      until a study is done similar to what gould did, showing the standard deviation of not just FG% but a multitude of other stats have had convincingly shrinking variation thru time from the mid to late 1950s to today in pro hoops i certainly do not see any evidence that the game has changed to the point that the players cannot be compared directly based on their stats...
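
      for anyone who wants to try that kind of study, a minimal sketch in python follows. it computes the standard deviation of one stat among regulars for each season and then fits a straight-line trend to those spreads. the function names spread_by_season and trend_slope are mine, and the FG% values are made-up placeholders, not real NBA numbers. you would substitute real season data for regulars (however you choose to define "regular") and repeat for each stat of interest:

```python
from statistics import mean, pstdev


def spread_by_season(samples):
    """Map each season to the population standard deviation of its values."""
    return {season: pstdev(values) for season, values in sorted(samples.items())}


def trend_slope(points):
    """Least-squares slope of y against x for a {x: y} mapping (needs 2+ distinct x)."""
    xs = list(points)
    ys = [points[x] for x in xs]
    x_bar, y_bar = mean(xs), mean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


# Made-up placeholder FG% values for "regulars"; NOT real NBA data.
placeholder_fg_pct = {
    1960: [0.38, 0.41, 0.44, 0.47],
    1980: [0.45, 0.48, 0.50, 0.53],
    2000: [0.41, 0.44, 0.46, 0.49],
}

spreads = spread_by_season(placeholder_fg_pct)
for season, sd in spreads.items():
    print(f"{season}: sd of FG% among regulars = {sd:.4f}")
print(f"change in sd per year: {trend_slope(spreads):+.6f}")
```

      a slope near zero across many stats would be consistent with no shrinking variation over the period, while a clearly negative slope would be the basketball analogue of what gould reports for batting averages. a real study would also need enough seasons and players to tell a trend from noise.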

      ".....As the best batters sacrificed their .400 averages because variation declined while average play improved, the best pitchers lost their earned run averages below 1.50 because ordinary hitters became so good...." (page 126)

      pitchers of the 1900s and 1910s had extremely low ERAs because that was the style/condition of the game at that time, as discussed by james a number of times...

      ".....For some reason that no one understands, pitching took a dramatic upper hand that year....." (1968),

      again for someone who quotes bill james a lot it's apparent gould hasn't read him enough...

      "....Symmetrically shrinking variation in batting averages must record general improvement of play for two reasons - the first because systems manned by the best performers in competition, and working under the same rules through time, slowly discover optimal procedures and reduce their variation as all personnel learn and master the best ways: the second because the mean moves toward the right wall, thus leaving less space for the spread of variation…as variation shrinks because general play improves, .400 hitting disappears as a consequence of increasing excellence in play...." (page 128)

      this is gould's conclusion. but how does it stand up to the test of time with barry bonds' recent slugging percentage marks, and the complete repeated wipeout of babe ruth and roger maris's home run records by mcgwire, sosa, and bonds? as james would say, it's the conditions of the game that are different, not the players...

      If cross-generational comparisons are all you are concerned with,
      yeah, I agree, stats are difficult to use.  Team wins and losses are
      about the only things comparable through time (and there are
      arguments on even that having to do with strength of the league).

      i have yet to read bill james' book on win shares, but will soon, to try to see if a comparable system can be made for basketball. you would tend to think yes because there are so many fewer players that account for the results of the games in basketball, but that is merely an assumption...

      The better approach is to have some real team win-loss thing to
      calibrate against (MikeG phrased as "correlating contributions to
      team success").  That is difficult (not impossible) because, when the
      Bulls lost Jordan (the first time), they didn't decline much, but
      when the Spurs lost the Admiral, they dropped like a rock.  When MJ
      came back the Bulls were the best team in history, but they also got
      Dennis Rodman, so you have to account for him.  Doing the whole
      assessment of how a team does leads to an intractable problem of
      sorting out who is responsible for what by when certain players are
      in the lineup.  Jeff Sagarin tackled the problem from a pretty
      scientific angle and came up with Andrei Kirilenko as the 2nd best
      player in the league last year, which didn't quite pass the laugh (or
      smell) test.


      i haven't tried this yet, and haven't seen anything in print yet similar to what james does for baseball (win shares), so i don't know if it would be difficult. i haven't seen sagarin's system - is it online and available to be critiqued?

      Plus/minus in the NBA is, I think, somewhat valuable.  It gets at exactly what I'm interested in -- how well does a team perform with different individuals in the game. 
      There are definite correlation problems with it in that some decent players only play with other bums, so they don't look as good.  Also, the fact that Jeff Sagarin's method used this information to come out with individual plus/minus ratings (isolating effects of players as best as possible) that seemed implausible, makes me question them.  BobC has never liked them.

      it's not that i don't like them, it's that they are meaningless, like the ubiquitous AST/TO ratio which tells you next to nothing (there are TOs that have nothing to do with passing, this is used because it's simple to calculate, and some time ago someone gave it credence). the +/- ratio is an individual stat not comprised of anything an individual does. in hockey the supposition was that since the majority of the time players are on the same lines, and that the majority of skaters only play about 20 minutes or less of every 60 minute game, that the +/- could tell you something....

      well in basketball the best players play upwards of 75%-80% of each and every game and i don't care who you are - you could be an all NBA player or the best in the league, but if you play on a 20-62 team you will have a lousy +/- ratio, and even if bruce bowen has a great +/- ratio because his teammates are duncan, robinson, and the rest of the 58-24 spurs, it doesn't mean he is a great player. he could have a great +/- ratio but you put him on a different team and this one trick pony of a player would be worthless because he plays great individual defense but doesn't have to score, rebound, or pass on the spurs and he would have to do those on a lesser team for that team to win. you put duncan on pretty much any team and he will dramatically improve that team (i'm guessing everyone would agree with that, no?), but if his +/- ratio is close to bruce bowen's what does that tell you?...

      my DOS simulation uses the +/- ratio and all you need do is run a few full season sims to see it doesn't tell you anything of significance. i can even manipulate a team's sub patterns to show how a single player can play on the same team for 82 games, but have a vastly different +/- ratio depending on who he plays with (i.e. the first team or the 2nd team, the best of the team's players or the worst of that team's players). so what does that tell you about +/- ratio?...
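
      the sub-pattern point can be shown with a toy example in python. this is not the DOS simulation, just a sketch with made-up stints and placeholder player names. it treats +/- the usual way, as the team's net points while a player is on the floor, and the Stint type and plus_minus function are hypothetical names for this illustration:

```python
from typing import FrozenSet, Iterable, NamedTuple


class Stint(NamedTuple):
    """One stretch of play with a fixed five-man unit on the floor."""
    on_court: FrozenSet[str]
    points_for: int
    points_against: int


def plus_minus(player: str, stints: Iterable[Stint]) -> int:
    """Net team points over every stint in which the player was on the floor."""
    return sum(
        s.points_for - s.points_against for s in stints if player in s.on_court
    )


# Made-up stints: player X is the same player in both sub patterns.
with_starters = [
    Stint(frozenset({"X", "A", "B", "C", "D"}), 30, 22),
    Stint(frozenset({"X", "A", "B", "C", "D"}), 28, 24),
]
with_bench = [
    Stint(frozenset({"X", "E", "F", "G", "H"}), 20, 29),
    Stint(frozenset({"X", "E", "F", "G", "H"}), 18, 23),
]

print("X with the first team:", plus_minus("X", with_starters))
print("X with the 2nd team:  ", plus_minus("X", with_bench))
```

      nothing about player X changes between the two lists, only his teammates, yet his +/- changes sign. that is the dependence on lineup described above, in its simplest form.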

      bob chaikin
      bchaikin@...




