CSV Parser for PHP
fgetcsv() sucks because it doesn't handle quoted newlines. So I wrote a simple CSV parser class for PHP myself, based on the JavaScript version that I wrote earlier (it doesn't handle quoted newlines either). Not prefect (e.g. newlines are standardized to \n, and string matching works only for ASCII or UTF-8), but does what I want it to do.
/**
* CSV file parser
* Currently the string matching doesn't work
* if the output encoding is not ASCII or UTF-8
*/
class CsvFileParser
{
var $delimiter; // Field delimiter
var $enclosure; // Field enclosure character
var $inputEncoding; // Input character encoding
var $outputEncoding; // Output character encoding
var $data; // CSV data as 2D array
/**
* Constructor
*/
function CsvFileParser()
{
$this->delimiter = ",";
$this->enclosure = '"';
$this->inputEncoding = "ISO-8859-1";
$this->outputEncoding = "ISO-8859-1";
$this->data = array();
}
/**
* Parse CSV from file
* @param content The CSV filename
* @param hasBOM Using BOM or not
* @return Success or not
*/
function ParseFromFile( $filename, $hasBOM = false )
{
if ( !is_readable($filename) )
{
return false;
}
return $this->ParseFromString( file_get_contents($filename), $hasBOM );
}
/**
* Parse CSV from string
* @param content The CSV string
* @param hasBOM Using BOM or not
* @return Success or not
*/
function ParseFromString( $content, $hasBOM = false )
{
$content = iconv($this->inputEncoding, $this->outputEncoding, $content );
$content = str_replace( "\r\n", "\n", $content );
$content = str_replace( "\r", "\n", $content );
if ( $hasBOM ) // Remove the BOM (first 3 bytes)
{
$content = substr( $content, 3 );
}
if ( $content[strlen($content)-1] != "\n" ) // Make sure it always end with a newline
{
$content .= "\n";
}
// Parse the content character by character
$row = array( "" );
$idx = 0;
$quoted = false;
for ( $i = 0; $i < strlen($content); $i++ )
{
$ch = $content[$i];
if ( $ch == $this->enclosure )
{
$quoted = !$quoted;
}
// End of line
if ( $ch == "\n" && !$quoted )
{
// Remove enclosure delimiters
for ( $k = 0; $k < count($row); $k++ )
{
if ( $row[$k] != "" && $row[$k][0] == $this->enclosure )
{
$row[$k] = substr( $row[$k], 1, strlen($row[$k]) - 2 );
}
$row[$k] = str_replace( str_repeat($this->enclosure, 2), $this->enclosure, $row[$k] );
}
// Append row into table
$this->data[] = $row;
$row = array( "" );
$idx = 0;
}
// End of field
else if ( $ch == $this->delimiter && !$quoted )
{
$row[++$idx] = "";
}
// Inside the field
else
{
$row[$idx] .= $ch;
}
}
return true;
}
}


12 Comments:
Note that troll and spam comments will be deleted without any notification.
At 11/09/2006 09:31:00 PM,
Anonymous said…
This was exactly the thing I was looking for. Thank you for saving me so much time!
At 11/23/2006 11:06:00 PM,
Jim said…
hey, good job on your class :)... i used the processing concept of your class to write my own which handles multiline quotes, and quotes within quoted values. hope you don't mind :)
http://zhuoqe.org/blog/2006/11/23/code-yard-and-php-csv-parser/
At 12/20/2007 01:25:00 AM,
hippydream said…
Hi,
you made a great job with that class.
I have added some code to optionally change the output format and use associative arrays or even dynamically created objects (PHP5 required).
I hope you don't mind and someone finds the changes useful :D
View source code
--- hippydream ---
At 12/20/2007 01:36:00 AM,
minghong said…
Of course I don't mind…! ;-)
At 1/11/2008 01:39:00 AM,
Anonymous said…
Dear Sir:
Hi. I have seen your posts on Wikipedia and think they are great. I know you are very knowledgable and hope you don't mind this question. I was wondering if you might have any idea where an American might buy software to translate English into bo-po-mo-fo pronunciation of Chinse. Does such a thing exist for Palm Pilots?
Thanks
Diane Michel
At 9/10/2008 06:19:00 AM,
Blaze shopping cart said…
Here is a much faster and simpler function to parse a CSV file. (I benchmarked the original class at around 9 seconds for a 3mb file with 10,000 lines. This function does the same file in 0.60 seconds).
define('MAX_CSV_LINE_LENGTH', 7000);
//Uses fgetcsv() to parse a CSV file.
function csvParseFile($filename, $separator = "\t") {
$fp = fopen($filename, 'r');
if ($fp === false) {
return false;
}
$result = array();
while ($line = fgetcsv($fp, MAX_CSV_LINE_LENGTH, $separator)) {
array_push($result, $line);
}
fclose($fp);
return $result;
}
You can contact me at trusty_jim@hotmail.com
At 9/10/2008 09:36:00 AM,
minghong said…
The point is that fgetcsv doesn't handle quoted newlines. So it's not about the speed…
At 3/02/2009 03:12:00 PM,
Anonymous said…
Alternatively you could use this package at PEAR
http://pear.php.net/package/File_CSV_DataSource
At 10/13/2009 07:55:00 AM,
greg said…
Hi, firstly many thanks about your job! Secondly, i am a CS student from Greece and i need some help, if yoy pleased. In my application(not for commercial use), the tables and their fields will be created dynamically. I used csv's headers for that purpose an i added another field "id". After the creation of the table i need to import the data to table. I use the following script:
$k=0;
$j=1;
//Insert Data to table
foreach ($csv->data as $key => $row){
$k=0;
foreach ($row as $value) {
echo "
$value
";
$sql= "INSERT INTO inouts ('id','$cols[$k]')
VALUES ($j,$value)";
// Execute query
$result=mysql_query($sql,$link);
if(mysql_error()){
echo mysql_error();
}
$k++;
}
$j++;
}
And i get the following error:
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''id','DMUS')
VALUES (1,Brighton)' at line 1
4453
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''id','INPUT1')
VALUES (1,4453)' at line 1
5456
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''id','INPUT2')
VALUES (1,5456)' at line 1
etc...
Thank you in advance, sorry about the huge post.
At 10/13/2009 10:23:00 AM,
minghong said…
You don't have single quotes around the string values.
e.g. VALUES (1,Brighton) should be VALUES (1,'Brighton')
Also, you should use mysql_real_escape_string() to escape the string values, so as to avoid SQL syntax error and SQL injection.
At 10/13/2009 08:50:00 PM,
greg said…
Thank you for your reply, i did some changes as you proposed but i still get the same error.
&j=1;
foreach ($csv->data as $key => $row){
$k=0;
foreach ($row as $value) {
$sql= "INSERT INTO inouts ('id','$colmns')
VALUES (
'". mysql_real_escape_string($j) ."',
'". mysql_real_escape_string($value) ."' )";
// Execute query
$result=mysql_query($sql,$link);
if(mysql_error()){
echo mysql_error(); }
$k++;
}
$j++;
}
At 10/13/2009 08:55:00 PM,
greg said…
I am sorry i missed that :
$colmns=mysql_real_escape_string($cols[$k]);
before the $sql statement.
Post a Comment
<< Home