Just Another Blog

Are you thinking what I'm thinking?

Monday, July 24, 2006

CSV Parser for PHP

fgetcsv() sucks because it doesn't handle quoted newlines. So I wrote a simple CSV parser class for PHP myself, based on the JavaScript version that I wrote earlier (it doesn't handle quoted newlines either). Not prefect (e.g. newlines are standardized to \n, and string matching works only for ASCII or UTF-8), but does what I want it to do.

/**
 * CSV file parser
 * Currently the string matching doesn't work
 * if the output encoding is not ASCII or UTF-8
 */
class CsvFileParser
{
    var $delimiter;         // Field delimiter
    var $enclosure;         // Field enclosure character
    var $inputEncoding;     // Input character encoding
    var $outputEncoding;    // Output character encoding
    var $data;              // CSV data as 2D array

    /**
     * Constructor
     */
    function CsvFileParser()
    {
        $this->delimiter = ",";
        $this->enclosure = '"';
        $this->inputEncoding = "ISO-8859-1";
        $this->outputEncoding = "ISO-8859-1";
        $this->data = array();
    }

    /**
     * Parse CSV from file
     * @param   content     The CSV filename
     * @param   hasBOM      Using BOM or not
     * @return Success or not
     */
    function ParseFromFile( $filename, $hasBOM = false )
    {
        if ( !is_readable($filename) )
        {
            return false;
        }
        return $this->ParseFromString( file_get_contents($filename), $hasBOM );
    }

    /**
     * Parse CSV from string
     * @param   content     The CSV string
     * @param   hasBOM      Using BOM or not
     * @return Success or not
     */
    function ParseFromString( $content, $hasBOM = false )
    {
        $content = iconv($this->inputEncoding, $this->outputEncoding, $content );
        $content = str_replace( "\r\n", "\n", $content );
        $content = str_replace( "\r", "\n", $content );
        if ( $hasBOM )                                // Remove the BOM (first 3 bytes)
        {
            $content = substr( $content, 3 );
        }
        if ( $content[strlen($content)-1] != "\n" )   // Make sure it always end with a newline
        {
            $content .= "\n";
        }

        // Parse the content character by character
        $row = array( "" );
        $idx = 0;
        $quoted = false;
        for ( $i = 0; $i < strlen($content); $i++ )
        {
            $ch = $content[$i];
            if ( $ch == $this->enclosure )
            {
                $quoted = !$quoted;
            }

            // End of line
            if ( $ch == "\n" && !$quoted )
            {
                // Remove enclosure delimiters
                for ( $k = 0; $k < count($row); $k++ )
                {
                    if ( $row[$k] != "" && $row[$k][0] == $this->enclosure )
                    {
                        $row[$k] = substr( $row[$k], 1, strlen($row[$k]) - 2 );
                    }
                    $row[$k] = str_replace( str_repeat($this->enclosure, 2), $this->enclosure, $row[$k] );
                }

                // Append row into table
                $this->data[] = $row;
                $row = array( "" );
                $idx = 0;
            }

            // End of field
            else if ( $ch == $this->delimiter && !$quoted )
            {
                $row[++$idx] = "";
            }

            // Inside the field
            else
            {
                $row[$idx] .= $ch;
            }
        }

        return true;
    }
}

12 Comments:

Note that troll and spam comments will be deleted without any notification.

  • At 11/09/2006 09:31:00 PM, Anonymous Anonymous said…

    This was exactly the thing I was looking for. Thank you for saving me so much time!

     
  • At 11/23/2006 11:06:00 PM, Blogger Jim said…

    hey, good job on your class :)... i used the processing concept of your class to write my own which handles multiline quotes, and quotes within quoted values. hope you don't mind :)

    http://zhuoqe.org/blog/2006/11/23/code-yard-and-php-csv-parser/

     
  • At 12/20/2007 01:25:00 AM, Anonymous hippydream said…

    Hi,

    you made a great job with that class.

    I have added some code to optionally change the output format and use associative arrays or even dynamically created objects (PHP5 required).

    I hope you don't mind and someone finds the changes useful :D

    View source code

    --- hippydream ---

     
  • At 12/20/2007 01:36:00 AM, Blogger minghong said…

    Of course I don't mind…! ;-)

     
  • At 1/11/2008 01:39:00 AM, Anonymous Anonymous said…

    Dear Sir:

    Hi. I have seen your posts on Wikipedia and think they are great. I know you are very knowledgable and hope you don't mind this question. I was wondering if you might have any idea where an American might buy software to translate English into bo-po-mo-fo pronunciation of Chinse. Does such a thing exist for Palm Pilots?

    Thanks
    Diane Michel

     
  • At 9/10/2008 06:19:00 AM, Anonymous Blaze shopping cart said…

    Here is a much faster and simpler function to parse a CSV file. (I benchmarked the original class at around 9 seconds for a 3mb file with 10,000 lines. This function does the same file in 0.60 seconds).

    define('MAX_CSV_LINE_LENGTH', 7000);
    //Uses fgetcsv() to parse a CSV file.
    function csvParseFile($filename, $separator = "\t") {

    $fp = fopen($filename, 'r');
    if ($fp === false) {
    return false;
    }

    $result = array();
    while ($line = fgetcsv($fp, MAX_CSV_LINE_LENGTH, $separator)) {
    array_push($result, $line);
    }

    fclose($fp);

    return $result;
    }

    You can contact me at trusty_jim@hotmail.com

     
  • At 9/10/2008 09:36:00 AM, Blogger minghong said…

    The point is that fgetcsv doesn't handle quoted newlines. So it's not about the speed…

     
  • At 3/02/2009 03:12:00 PM, Anonymous Anonymous said…

    Alternatively you could use this package at PEAR
    http://pear.php.net/package/File_CSV_DataSource

     
  • At 10/13/2009 07:55:00 AM, Blogger greg said…

    Hi, firstly many thanks about your job! Secondly, i am a CS student from Greece and i need some help, if yoy pleased. In my application(not for commercial use), the tables and their fields will be created dynamically. I used csv's headers for that purpose an i added another field "id". After the creation of the table i need to import the data to table. I use the following script:

    $k=0;
    $j=1;

    //Insert Data to table
    foreach ($csv->data as $key => $row){
    $k=0;
    foreach ($row as $value) {
    echo "
    $value
    ";
    $sql= "INSERT INTO inouts ('id','$cols[$k]')
    VALUES ($j,$value)";

    // Execute query
    $result=mysql_query($sql,$link);

    if(mysql_error()){
    echo mysql_error();
    }
    $k++;
    }
    $j++;
    }

    And i get the following error:


    You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''id','DMUS')
    VALUES (1,Brighton)' at line 1


    4453
    You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''id','INPUT1')
    VALUES (1,4453)' at line 1


    5456
    You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''id','INPUT2')
    VALUES (1,5456)' at line 1

    etc...

    Thank you in advance, sorry about the huge post.

     
  • At 10/13/2009 10:23:00 AM, Blogger minghong said…

    You don't have single quotes around the string values.

    e.g. VALUES (1,Brighton) should be VALUES (1,'Brighton')

    Also, you should use mysql_real_escape_string() to escape the string values, so as to avoid SQL syntax error and SQL injection.

     
  • At 10/13/2009 08:50:00 PM, Blogger greg said…

    Thank you for your reply, i did some changes as you proposed but i still get the same error.
    &j=1;
    foreach ($csv->data as $key => $row){
    $k=0;
    foreach ($row as $value) {


    $sql= "INSERT INTO inouts ('id','$colmns')
    VALUES (
    '". mysql_real_escape_string($j) ."',
    '". mysql_real_escape_string($value) ."' )";

    // Execute query
    $result=mysql_query($sql,$link);
    if(mysql_error()){
    echo mysql_error(); }
    $k++;
    }
    $j++;
    }

     
  • At 10/13/2009 08:55:00 PM, Blogger greg said…

    I am sorry i missed that :

    $colmns=mysql_real_escape_string($cols[$k]);

    before the $sql statement.

     

Post a Comment

<< Home