4463 Re: csv? Re: [baseball-databank] Proposed changes to master table

  • Ian Orr
    Dec 15, 2013
      I didn't have any problems when I imported the CSVs into PostgreSQL. I opened the 2013-12-10 Master.csv table in a text editor (instead of Excel), and it looks like most text entries don't have quotation marks, but those that contain a comma do. Below is Hank Aaron's row, for instance:

      1,aaronha01,,aaronha01h,1934,2,5,USA,AL,Mobile,,,,,,,Hank,Aaron,,Henry Louis,"Hammer,Hammerin' Hank,Bad Henry",180,72,R,R,4/13/1954,10/3/1976,,aaronha01,aaronha01,aaroh101,aaronha01,aaronha01
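      As a rough illustration of why the mixed quoting is usually harmless, here is a minimal sketch using Python's standard csv module on the row above. The names ROW and parse_row are made up for this example, and it assumes the reader's default dialect (comma delimiter, double-quote quote character) matches how the file was written.

```python
import csv
import io

# Hank Aaron's row, copied from the message above.
ROW = (
    "1,aaronha01,,aaronha01h,1934,2,5,USA,AL,Mobile,,,,,,,Hank,Aaron,,Henry Louis,"
    '"Hammer,Hammerin\' Hank,Bad Henry",'
    "180,72,R,R,4/13/1954,10/3/1976,,aaronha01,aaronha01,aaroh101,aaronha01,aaronha01"
)


def parse_row(line):
    """Parse one CSV line; quoted fields may contain commas."""
    return next(csv.reader(io.StringIO(line)))


fields = parse_row(ROW)

# A naive split on commas breaks the quoted nickname field into three pieces.
naive = ROW.split(",")

# The only field that still contains a comma after proper parsing.
with_commas = [f for f in fields if "," in f]
print(with_commas)
print(len(naive) - len(fields))
```

      A quote-aware parser keeps the nickname list as one field, whereas splitting on commas yields two extra pieces. Tools that split on commas without honoring quotes are the likely source of any import trouble.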

      I wonder if it might be easier for some people to parse the entries if all text entries were surrounded by quotation marks, but I can frankly say that I am hardly an authority in this matter.
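      For anyone who does need every field quoted, or a tab-delimited file, a conversion can be sketched with the same csv module. This is only an illustration: requote is a hypothetical helper, the sample line is a shortened subset of the row above rather than the real column layout, and csv.QUOTE_ALL quotes numeric fields as well as text.

```python
import csv
import io


def requote(line, delimiter=",", quoting=csv.QUOTE_ALL):
    """Re-emit one CSV line with a different delimiter or quoting rule."""
    fields = next(csv.reader(io.StringIO(line)))
    out = io.StringIO()
    writer = csv.writer(
        out, delimiter=delimiter, quoting=quoting, lineterminator="\n"
    )
    writer.writerow(fields)
    return out.getvalue()


# Shortened, illustrative subset of the Hank Aaron row (not the full schema).
sample = 'aaronha01,Hank,Aaron,"Hammer,Hammerin\' Hank,Bad Henry",180'

# Every field wrapped in double quotes.
quoted = requote(sample)

# Tab-delimited; quotes are added only where a field would be ambiguous.
tabbed = requote(sample, delimiter="\t", quoting=csv.QUOTE_MINIMAL)

print(quoted, end="")
print(tabbed, end="")
```

      In the tab-delimited output the nickname field needs no quotes at all, since its commas are no longer delimiters.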


      On Sun, Dec 15, 2013 at 3:48 PM, Sean Lahman <seanlahman@...> wrote:

      On Sun, Dec 15, 2013 at 4:05 PM, F. X. Flinn <fxflinn@...> wrote:


      It should be possible to supply a tab-delimited version or, even better, a strictly compliant csv file with " surrounding text fields.

      Either of these things is possible, of course. I'll confess that I'm not aware of a csv standard that requires all text fields to be enclosed. Are there use cases where that creates a problem? What about csv rather than tsv?

      I want to be accommodating, but I don't want to create multiple variations of file types unless it's necessary. The overriding goal of this project has been to make the data available in the most portable open source format that's available, which people could import/convert for use with any DBMS (MySQL, Oracle, dBase, SAP, etc.) or access using the most popular programming languages (python, php, perl, ruby, R, etc.)

      I have always made the database available in Access format because, generally, the folks who work in Access would struggle with building it themselves.  For more than a dozen years this was the most downloaded format, and it still gets more interest than most of you might expect.  I have made an SQL version available in recent years because it has been so frequently requested, although the majority of people I talk to that work in MySQL or PostgreSQL prefer to import the CSV files.

      But my preference would be to provide a single version that's extremely portable rather than providing downloads in ten different varieties.

      What say you all?  Are there applications where the current csv formats are problematic?


      Sean Lahman
