How to Clean Excel Data Before Running SQL Queries
Raw spreadsheet data is messy. Before you can use it in a database query, you need to clean it. Here's how to handle the common problems.
Problem 1: Duplicates
Your list has the same ID multiple times:
| SKU ID |
|---|
| SKU-001 |
| SKU-002 |
| SKU-001 |
| SKU-003 |
| SKU-002 |
Repeating a value in an IN clause doesn't change the result; it only makes the query longer and gives the database more to parse. Enable "Remove Duplicates" to get:
| SKU ID |
|---|
| SKU-001 |
| SKU-002 |
| SKU-003 |
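If you'd rather script this step, here is a minimal Python sketch of the same idea. It is not the implementation behind "Remove Duplicates"; the function name `remove_duplicates` is made up for illustration, and it assumes you want to keep the first occurrence of each ID in its original order.

```python
def remove_duplicates(ids):
    """Drop repeated IDs while keeping first-seen order."""
    # dict keys are unique and keep insertion order (Python 3.7+),
    # so round-tripping through a dict dedupes without sorting.
    return list(dict.fromkeys(ids))


skus = ["SKU-001", "SKU-002", "SKU-001", "SKU-003", "SKU-002"]
unique_skus = remove_duplicates(skus)
print(unique_skus)  # ['SKU-001', 'SKU-002', 'SKU-003']
```

Keeping the original order makes it easier to compare the cleaned list against the spreadsheet column it came from.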
Problem 2: Whitespace
Copy-paste from spreadsheets often adds hidden spaces:
| ID (with spaces) |
|---|
| 12345 |
| 67890 |
| 11223 |
These won't match your database records. "Trim Whitespace" fixes it:
| ID (trimmed) |
|---|
| 12345 |
| 67890 |
| 11223 |
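The trimming step can be sketched in a few lines of Python. This is an illustration rather than the tool's actual code: `trim_whitespace` is a hypothetical helper, and the sample values assume the stray characters are ordinary spaces and tabs around the ID.

```python
def trim_whitespace(ids):
    """Strip leading and trailing whitespace from each ID."""
    # str.strip() with no arguments removes whitespace from both ends;
    # whitespace inside the value is left untouched.
    return [value.strip() for value in ids]


pasted = [" 12345", "67890 ", "\t11223 "]
trimmed = trim_whitespace(pasted)
print(trimmed)  # ['12345', '67890', '11223']
```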
Problem 3: Leading Zeros
Part numbers and ZIP codes often have leading zeros:
| Part Number |
|---|
| 00123 |
| 00456 |
| 00789 |
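Excel sometimes drops them when it treats the cell as a number, while your database may store the value as text with the zeros intact. Pick whichever form your database uses. Two options: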
Trim Leading Zeros: Removes them for integer comparison
| ID (trimmed) |
|---|
| 123 |
| 456 |
| 789 |
Fill Leading Zeros: Pads to a consistent length (e.g., 5 digits)
| ID (padded) |
|---|
| 00123 |
| 00456 |
| 00789 |
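Both options are easy to sketch in Python. The helper names `trim_leading_zeros` and `fill_leading_zeros` are hypothetical, the 5-digit width is just the example from above, and the sketch assumes the IDs are handled as strings.

```python
def trim_leading_zeros(value):
    """Remove leading zeros, e.g. '00123' -> '123'."""
    # lstrip("0") would turn an all-zero value like "000" into "",
    # so fall back to a single "0" in that case.
    return value.lstrip("0") or "0"


def fill_leading_zeros(value, width=5):
    """Left-pad with zeros to a fixed width, e.g. '123' -> '00123'."""
    # zfill leaves values that are already `width` or longer unchanged.
    return value.zfill(width)


parts = ["00123", "00456", "00789"]
print([trim_leading_zeros(p) for p in parts])      # ['123', '456', '789']
print([fill_leading_zeros(p) for p in ["123", "456", "789"]])  # ['00123', '00456', '00789']
```

Padding is usually the safer choice for ZIP codes and part numbers stored as text, since trimming only matches when the database column is numeric.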
Problem 4: Trailing Punctuation
Exports sometimes include stray punctuation:
| ID (with punctuation) |
|---|
| 12345, |
| 67890. |
| 11223; |
Enable "Trim Punctuation" to clean it up:
| ID (cleaned) |
|---|
| 12345 |
| 67890 |
| 11223 |
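A rough Python equivalent of this step follows. It is a sketch, not the tool's own logic: `trim_punctuation` is a made-up name, and the set of characters to strip (comma, period, semicolon) is an assumption based on the examples above.

```python
def trim_punctuation(value, chars=",.;"):
    """Remove trailing punctuation characters from an ID."""
    # rstrip only touches the right-hand end, so punctuation inside
    # the value (like the hyphen in "SKU-001") is preserved.
    return value.rstrip(chars)


exported = ["12345,", "67890.", "11223;"]
cleaned = [trim_punctuation(v) for v in exported]
print(cleaned)  # ['12345', '67890', '11223']
```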
Putting It Together
Before:
| SKU (dirty) |
|---|
| SKU-001, |
| SKU-002 |
| SKU-001. |
| SKU-003 |
After (with all cleaning enabled):
IN ('SKU-001', 'SKU-002', 'SKU-003')
One duplicate removed (SKU-001 appeared twice), whitespace trimmed, punctuation stripped, and proper SQL formatting applied.
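The whole pipeline can be approximated in one short Python script. Treat it as a sketch under a few assumptions: `clean_ids` and `to_in_clause` are hypothetical helpers, the spaces around SKU-002 are added for illustration because a table cell can't show them, and quoting is done by doubling single quotes. If your database driver supports bound parameters, those are generally a safer choice than building the clause as a string.

```python
def clean_ids(raw_ids, dedupe=True):
    """Trim whitespace and trailing punctuation, then optionally dedupe."""
    cleaned = []
    for raw in raw_ids:
        # Strip spaces, then trailing , . ; then any space left behind.
        value = raw.strip().rstrip(",.;").strip()
        if value:  # skip cells that were empty or punctuation only
            cleaned.append(value)
    if dedupe:
        # Dedupe after cleaning so "SKU-001," and "SKU-001." collapse together.
        cleaned = list(dict.fromkeys(cleaned))
    return cleaned


def to_in_clause(ids):
    """Format string IDs as a SQL IN (...) list."""
    if not ids:
        raise ValueError("IN () with no values is not valid SQL")
    # Escape embedded single quotes by doubling them.
    quoted = ", ".join("'" + v.replace("'", "''") + "'" for v in ids)
    return f"IN ({quoted})"


dirty = ["SKU-001,", " SKU-002 ", "SKU-001.", "SKU-003"]
print(to_in_clause(clean_ids(dirty)))  # IN ('SKU-001', 'SKU-002', 'SKU-003')
```

Order matters here: deduplicating before stripping punctuation would leave both copies of SKU-001 in the list.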
When to Keep Duplicates
Sometimes duplicates matter:
- Counting occurrences in test data
- Verifying data integrity
- Reproducing specific scenarios
Toggle off "Remove Duplicates" when you need the original count.
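When the count itself is what you're after, a quick tally shows which IDs repeat before you decide whether to drop them. The sketch below uses Python's standard `collections.Counter`; the helper name `count_occurrences` and the sample list are illustrative only.

```python
from collections import Counter


def count_occurrences(ids):
    """Return how many times each ID appears."""
    return Counter(ids)


counts = count_occurrences(["SKU-001", "SKU-002", "SKU-001", "SKU-003"])
# Keep only the IDs that appear more than once.
repeated = {sku: n for sku, n in counts.items() if n > 1}
print(repeated)  # {'SKU-001': 2}
```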