5.3 KiB
PostgreSQL vs MySQL Parser Comparison Analysis
Overview
This document compares how the PostgreSQL and MySQL parsers in ChartDB handle SQL parsing, focusing on the differences that could cause the same SQL file to produce different results.
1. SQL Sanitization and Comment Handling
PostgreSQL Parser (postgresql-improved.ts)
Comment Removal Strategy:
- Order: Comments are removed FIRST, before any other processing
- Multi-line comments: Removed using regex:
/\/\*[\s\S]*?\*\//g - Single-line comments: Removed line-by-line, checking for
--while respecting string boundaries - String-aware: Preserves
--inside quoted strings
// PostgreSQL approach (lines 60-100)
// 1. First removes ALL multi-line comments
cleanedSQL = cleanedSQL.replace(/\/\*[\s\S]*?\*\//g, '');
// 2. Then processes single-line comments while respecting strings
for (let i = 0; i < line.length; i++) {
// Tracks if we're inside a string to avoid removing -- inside quotes
}
MySQL Parser (mysql-improved.ts)
Comment Removal Strategy:
- Order: Comments are sanitized but with special handling for problematic patterns
- Special handling: Specifically fixes multi-line comments that contain quotes or JSON
- Line-by-line: Processes comments line by line, removing lines that start with
--or#
// MySQL approach (lines 35-67)
// 1. First fixes specific problematic patterns
result = result.replace(/--\s*"[^"]*",?\s*\n\s*"[^"]*".*$/gm, function(match) {
return match.replace(/\n/g, ' ');
});
// 2. Then removes comment lines entirely
.map((line) => {
if (trimmed.startsWith('--') || trimmed.startsWith('#')) {
return '';
}
return line;
})
Key Difference: PostgreSQL removes ALL comments upfront, while MySQL tries to fix problematic comment patterns first, then removes comment lines.
2. Order of Operations
PostgreSQL Parser
- Preprocess SQL (removes all comments first)
- Split statements by semicolons (handles dollar quotes)
- Categorize statements (table, index, alter, etc.)
- Parse with node-sql-parser
- Fallback to regex if parser fails
- Extract relationships
MySQL Parser
- Validate syntax (checks for known issues)
- Sanitize SQL (fixes problematic patterns)
- Extract statements by semicolons
- Parse with node-sql-parser
- Fallback to regex if parser fails
- Process relationships
Key Difference: MySQL validates BEFORE sanitizing, while PostgreSQL sanitizes first. This means MySQL can detect and report issues that PostgreSQL might silently fix.
3. Multi-line Comment Handling
PostgreSQL
- Removes ALL multi-line comments using
[\s\S]*?pattern - No special handling for comments containing quotes or JSON
- Clean removal before any parsing
MySQL
- Specifically detects and fixes multi-line comments with quotes:
-- "Beliebt", "Empfohlen" -- This breaks MySQL parser - Detects JSON arrays in comments spanning lines:
-- [ "Ubuntu 22.04", "CentOS 8" ] -- This also breaks MySQL parser - Converts these to single-line comments before parsing
Key Difference: MySQL has specific handling for problematic comment patterns that PostgreSQL simply removes entirely.
4. Statement Splitting
PostgreSQL
- Handles PostgreSQL-specific dollar quotes (
$$ ... $$) - Tracks quote depth for proper splitting
- Supports function bodies with dollar quotes
MySQL
- Simple quote tracking (single, double, backtick)
- Handles escape sequences (
\) - No special quote constructs
5. Validation Approach
PostgreSQL
- No pre-validation
- Relies on parser and fallback regex
- Reports warnings for unsupported features
MySQL
- Pre-validates SQL before parsing
- Detects known problematic patterns:
- Multi-line comments with quotes
- JSON arrays in comments
- Inline REFERENCES (PostgreSQL syntax)
- Missing semicolons
- Can reject SQL before attempting to parse
6. Why Same SQL Gives Different Results
Example Problematic SQL:
CREATE TABLE products (
id INT PRIMARY KEY,
status VARCHAR(50), -- "active",
"inactive", "pending"
data JSON -- [
{"key": "value"},
{"key": "value2"}
]
);
PostgreSQL Result:
- Successfully parses (comments are removed entirely)
- Table created with proper columns
MySQL Result:
- Validation fails with errors:
- MULTILINE_COMMENT_QUOTE at line 3
- MULTILINE_JSON_COMMENT at line 5
- Import blocked unless validation is skipped
7. Recommendations
-
For Cross-Database Compatibility:
- Avoid multi-line comments with quotes or JSON
- Keep comments on single lines
- Use proper FOREIGN KEY syntax instead of inline REFERENCES
-
For MySQL Import:
- Fix validation errors before import
- Or use
skipValidation: trueoption if SQL is known to work
-
For PostgreSQL Import:
- Be aware that comments are stripped entirely
- Complex comments might hide syntax issues
Conclusion
The main difference is that PostgreSQL takes a "remove all comments first" approach, while MySQL tries to detect and handle problematic comment patterns. This makes PostgreSQL more forgiving but MySQL more explicit about potential issues. The same SQL file can succeed in PostgreSQL but fail in MySQL if it contains multi-line comments with special characters.