# DenchClaw CSV Import: Handle Any Format
DenchClaw CSV import guide: handle messy headers, date normalization, deduplication, large files, and imports from LinkedIn, Apollo, HubSpot, and Typeform exports.
Every CRM migration, every lead list, every conference export lands as a CSV file. DenchClaw's import system handles most formats directly, but every real-world CSV has quirks — weird headers, mixed date formats, duplicates, extra columns you don't need.
Here's how to import any CSV file cleanly, fix common problems, and verify your data landed correctly.
## The CSV Import Workflow
The basic import command:
```bash
npx denchclaw import csv \
  --file leads.csv \
  --object people \
  --dedupe email
```

That's the minimum. For anything more complex, you'll add:

- `--field-map` to rename columns
- `--skip-fields` to ignore unwanted columns
- `--default-values` to set fields the CSV doesn't have
- `--batch-size` for large files
- `--dry-run` to preview without writing
### Full import command reference
```bash
npx denchclaw import csv \
  --file /path/to/file.csv \
  --object [people|companies|deals|activities] \
  --field-map "CSV Header:crm_field,Another Header:another_field" \
  --skip-fields "Unwanted Column,Another Useless Column" \
  --default-values '{"status": "Lead", "source": "CSV Import"}' \
  --dedupe [email|name|phone] \
  --batch-size 500 \
  --encoding utf-8 \
  --dry-run
```

## Handling Messy Headers
Real CSV files have headers like "First Name " (with trailing space), "E-Mail" (with hyphen), or "Phone #". DenchClaw's importer handles whitespace trimming automatically, but special characters need mapping.
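The importer's automatic trimming covers the whitespace case; if you'd rather normalize all the headers yourself before import, here's a small pandas sketch (the cleaning rules and sample data are illustrative, not DenchClaw behavior):

```python
import io
import re

import pandas as pd

# Sample data standing in for a messy export
raw = 'First Name ,E-Mail,Phone #\nAda,ada@example.com,555-0100\n'
df = pd.read_csv(io.StringIO(raw))

def clean_header(header: str) -> str:
    # Trim, lowercase, and collapse runs of non-alphanumerics to underscores
    cleaned = re.sub(r'[^a-z0-9]+', '_', header.strip().lower())
    return cleaned.strip('_')

df.columns = [clean_header(c) for c in df.columns]
print(df.columns.tolist())  # ['first_name', 'e_mail', 'phone']
```

With headers normalized this way, most `--field-map` entries reduce to simple renames or disappear entirely.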
### Field mapping examples
```bash
# Map messy headers to clean field names
npx denchclaw import csv \
  --file contacts.csv \
  --object people \
  --field-map \
    "First Name:first_name" \
    "Last Name:last_name" \
    "E-Mail:email" \
    "Phone #:phone" \
    "Job Title / Role:job_title" \
    "Company / Organization:company_name" \
    "LinkedIn Profile URL:linkedin_url"
```

If the CSV has a single combined "Name" column, map it directly (DenchClaw stores a single name field, so a combined column is fine):
```bash
--field-map "Full Name:name"
# or if the CSV has both:
--field-map "First:first_name,Last:last_name"
```

DenchClaw auto-concatenates `first_name` + `last_name` into `name` during import if both are provided.
### Preview headers before importing
```bash
# Quick preview of CSV structure
head -2 contacts.csv | tr ',' '\n'
```

Or with Python:
```python
import pandas as pd

df = pd.read_csv('contacts.csv', nrows=3)
print(df.columns.tolist())
print(df.head(2))
```

## Field Type Inference
DenchClaw attempts to infer field types from the data. Here's how inference works and when it fails:
| Data Pattern | Inferred Type | Potential Issues |
|---|---|---|
| `john@example.com` | email | ✅ Reliable |
| `+1-555-123-4567` | phone | ✅ Reliable |
| `https://...` | url | ✅ Reliable |
| `2024-03-15` | date (ISO) | ✅ Reliable |
| `March 15, 2024` | date (text) | ⚠️ May need normalization |
| `15/03/2024` | date (EU) | ⚠️ Ambiguous with US format |
| `true/false` | boolean | ✅ Reliable |
| `1500.00` | number | ✅ Reliable |
| `$1,500.00` | text (with symbols) | ⚠️ Strip symbols first |
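For the `$1,500.00` case in the last row, strip the symbols before import; a pandas sketch (the sample data and column names are illustrative):

```python
import io

import pandas as pd

raw = 'Deal,Value\nAcme,"$1,500.00"\nGlobex,"$980.50"\n'
df = pd.read_csv(io.StringIO(raw))

# Remove currency symbols and thousands separators, then cast to a number
df['Value'] = df['Value'].str.replace(r'[$,]', '', regex=True).astype(float)
print(df['Value'].tolist())  # [1500.0, 980.5]
```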
When inference fails or you want to be explicit, specify field types in the mapping:
```bash
--field-map "Close Date[date]:close_date,Value[number]:value,Active[boolean]:is_active"
```

## Date Format Normalization
Date formats are the #1 source of import failures. Here's how to handle the common ones.
### Normalize dates before import
```python
import pandas as pd
from dateutil import parser

df = pd.read_csv('contacts.csv')

# Handle multiple possible date formats in one column
def parse_date(val):
    if pd.isna(val) or str(val).strip() == '':
        return None
    try:
        return parser.parse(str(val)).strftime('%Y-%m-%d')
    except (ValueError, OverflowError):
        return None

df['Created Date'] = df['Created Date'].apply(parse_date)
df['Last Contact'] = df['Last Contact'].apply(parse_date)
df.to_csv('contacts_clean.csv', index=False)
```

### Common date format conversions
```python
import pandas as pd

df = pd.read_csv('contacts.csv')

# MM/DD/YYYY → YYYY-MM-DD
df['date_field'] = pd.to_datetime(df['date_field'], format='%m/%d/%Y').dt.strftime('%Y-%m-%d')

# DD-Mon-YY (e.g., 15-Mar-24) → YYYY-MM-DD
df['date_field'] = pd.to_datetime(df['date_field'], format='%d-%b-%y').dt.strftime('%Y-%m-%d')

# Unix timestamp in milliseconds → YYYY-MM-DD
df['date_field'] = pd.to_datetime(df['date_field'], unit='ms').dt.strftime('%Y-%m-%d')
```

## Deduplication Strategy
The `--dedupe` flag tells DenchClaw which field to use when checking for existing records.
```bash
# Dedupe by email (most reliable for people)
npx denchclaw import csv --file leads.csv --object people --dedupe email

# Dedupe by name + company (for contacts without emails)
npx denchclaw import csv --file leads.csv --object people --dedupe "name,company_name"

# Dedupe by website for companies
npx denchclaw import csv --file companies.csv --object companies --dedupe website
```

### Deduplication behavior
When a duplicate is found, DenchClaw's default behavior is `skip` (don't import the duplicate). You can change this:
```bash
# Skip duplicates (default)
--dedupe-strategy skip

# Update existing record with new CSV data
--dedupe-strategy update

# Update only empty fields (don't overwrite existing data)
--dedupe-strategy fill-empty
```

The `fill-empty` strategy is useful for enrichment imports: you have a base list, then import a second CSV with additional fields (phone numbers, LinkedIn URLs), and the import adds the missing data without overwriting what you already have.
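The same fill-only-empty merge can be reproduced offline with pandas if you want to inspect the result before importing (a sketch with inline sample data, assuming both files share an email column):

```python
import io

import pandas as pd

base = pd.read_csv(io.StringIO('email,name,phone\nada@example.com,Ada Lovelace,\n'))
enrich = pd.read_csv(io.StringIO('email,name,phone\nada@example.com,Ada L.,555-0100\n'))

# combine_first keeps base values and fills only the empty fields from enrich
merged = base.set_index('email').combine_first(enrich.set_index('email')).reset_index()
print(merged.loc[0, 'name'])   # Ada Lovelace  (existing value kept)
print(merged.loc[0, 'phone'])  # 555-0100      (missing value filled)
```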
### Post-import deduplication
If you skipped deduplication on import and need to clean up duplicates:
```sql
-- Find duplicate emails
SELECT email, COUNT(*) as count
FROM v_people
WHERE email IS NOT NULL
GROUP BY email
HAVING COUNT(*) > 1
ORDER BY count DESC;

-- Find the duplicate entry IDs
SELECT id, name, email, created_at
FROM v_people
WHERE email IN (
  SELECT email FROM v_people
  GROUP BY email HAVING COUNT(*) > 1
)
ORDER BY email, created_at;
```

Then delete the older duplicates:
```sql
-- Delete older duplicates (keep newest)
DELETE FROM entries
WHERE id IN (
  SELECT id FROM (
    SELECT id, email, created_at,
           ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at DESC) as rn
    FROM v_people
    WHERE email IS NOT NULL
  ) ranked
  WHERE rn > 1
);
```

## Importing to Specific Objects
### People import
```bash
npx denchclaw import csv \
  --file contacts.csv \
  --object people \
  --default-values '{"status": "Lead", "source": "Imported"}' \
  --dedupe email
```

### Companies import
```bash
npx denchclaw import csv \
  --file companies.csv \
  --object companies \
  --field-map "Company:name,URL:website,Industry:industry,Employees:employees" \
  --dedupe website
```

### Deals import
```bash
npx denchclaw import csv \
  --file deals.csv \
  --object deals \
  --field-map "Opportunity:name,Stage:status,Amount:value,Close:close_date" \
  --default-values '{"currency": "USD"}' \
  --dedupe name
```

## Error Handling
Common import errors and fixes:
**Error: "Field not found: xyz"**
The CSV header doesn't match any DenchClaw field. Fix: add a `--field-map` entry or create the field with `npx denchclaw field create`.
**Error: "Invalid date format"**
The date column has inconsistent formats. Fix: normalize dates in Python before import (see the date normalization section above).
**Error: "Encoding error on row 45"**
Non-UTF-8 characters in the file. Fix:
```bash
# Convert to UTF-8
iconv -f latin1 -t utf-8 input.csv > output.csv
# or
iconv -f windows-1252 -t utf-8 input.csv > output.csv
```

**Error: "Row count mismatch"**
The CSV has rows with different column counts (often due to unescaped commas in fields). Fix: ensure all text fields with commas are quoted. Use Python to validate:
```python
import csv

with open('contacts.csv') as f:
    reader = csv.reader(f)
    for i, row in enumerate(reader):
        if len(row) != 10:  # expected columns
            print(f"Row {i}: {len(row)} columns — {row}")
```

## Large File Imports
For files over 10,000 rows, stage the data with DuckDB and raise the import batch size for better performance:
```bash
# DuckDB direct import (fastest for large files)
duckdb workspace.duckdb << 'EOF'
-- Load CSV into a staging table
CREATE TABLE import_staging AS
SELECT * FROM read_csv_auto('/path/to/large-file.csv', header=true);

-- Check row count and preview
SELECT COUNT(*) FROM import_staging;
SELECT * FROM import_staging LIMIT 5;
EOF
```

Then run DenchClaw's import with an increased batch size:
```bash
npx denchclaw import csv \
  --file large-file.csv \
  --object people \
  --batch-size 1000 \
  --dedupe email
```

DuckDB's `read_csv_auto` handles format detection automatically and is extremely fast for large files.
## Importing from Common Sources
### LinkedIn Sales Navigator export
LinkedIn exports contacts as CSV with specific headers:
```bash
npx denchclaw import csv \
  --file linkedin-export.csv \
  --object people \
  --field-map \
    "First Name:first_name" \
    "Last Name:last_name" \
    "Email Address:email" \
    "Company:company_name" \
    "Title:job_title" \
    "Connected On:created_at" \
    "Profile URL:linkedin_url" \
  --default-values '{"source": "LinkedIn", "status": "Lead"}' \
  --dedupe email
```

### Apollo.io export
Apollo exports include these key columns:
```bash
npx denchclaw import csv \
  --file apollo-export.csv \
  --object people \
  --field-map \
    "First Name:first_name" \
    "Last Name:last_name" \
    "Email:email" \
    "Title:job_title" \
    "Company:company_name" \
    "Company Website:website" \
    "Phone:phone" \
    "City:city" \
    "State:state" \
    "Country:country" \
    "LinkedIn URL:linkedin_url" \
    "Technologies:tech_stack" \
  --default-values '{"source": "Apollo", "status": "Lead"}' \
  --dedupe email
```

### HubSpot export
```bash
npx denchclaw import csv \
  --file hubspot-contacts.csv \
  --object people \
  --field-map \
    "First Name:first_name" \
    "Last Name:last_name" \
    "Email:email" \
    "Phone Number:phone" \
    "Company Name:company_name" \
    "Job Title:job_title" \
    "Lifecycle Stage:status" \
    "Create Date:created_at" \
  --skip-fields "Record ID,Owner,HubSpot Score" \
  --dedupe email
```

### Typeform export
Typeform response exports need field mapping since the headers are your question text:
```bash
npx denchclaw import csv \
  --file typeform-responses.csv \
  --object people \
  --field-map \
    "What is your name?:name" \
    "What is your email address?:email" \
    "What company do you work for?:company_name" \
    "What is your phone number?:phone" \
    "How did you hear about us?:lead_source" \
    "Start Date:created_at" \
  --default-values '{"source": "Typeform", "status": "Lead"}' \
  --dedupe email
```

## Post-Import Cleanup with DuckDB
After any import, run these queries to verify data quality:
```sql
-- Data quality report
SELECT
  COUNT(*) as total,
  COUNT(*) FILTER (WHERE email IS NULL) as missing_email,
  COUNT(*) FILTER (WHERE name IS NULL OR name = '') as missing_name,
  COUNT(*) FILTER (WHERE phone IS NULL) as missing_phone,
  COUNT(*) FILTER (WHERE company_name IS NULL) as missing_company
FROM v_people
WHERE created_at::DATE = TODAY();

-- Check for obviously bad emails
SELECT email FROM v_people
WHERE email NOT LIKE '%@%.%'
  AND email IS NOT NULL;

-- Check status distribution
SELECT status, COUNT(*) as count
FROM v_people
GROUP BY status
ORDER BY count DESC;

-- Find entries that need enrichment (missing key fields)
SELECT name, email, company_name
FROM v_people
WHERE (phone IS NULL OR company_name IS NULL)
  AND status = 'Lead'
ORDER BY created_at DESC
LIMIT 20;
```

## FAQ
**What's the maximum CSV file size DenchClaw can handle?**
There's no hard limit; DuckDB is designed for analytical workloads and can handle millions of rows. For files over 100,000 rows, use `--batch-size 1000` and expect the import to take a few minutes.
**Can I import multiple CSV files at once?**
Not with a single command, but you can loop:

```bash
for f in exports/*.csv; do
  npx denchclaw import csv --file "$f" --object people --dedupe email
done
```

**What happens if a row fails during import?**
By default, DenchClaw logs the error and continues with the remaining rows. Check the import log with `npx denchclaw import log --last` to see any failures.
**Can I undo a CSV import?**
Not with a single command. Use DuckDB to delete the imported entries by date:

```sql
DELETE FROM entries
WHERE created_at::DATE = TODAY()
  AND id IN (
    SELECT entry_id FROM entry_fields
    WHERE field_name = 'source' AND value = 'Apollo'
  );
```

**Does DenchClaw support Excel files (.xlsx) directly?**
Not directly; convert to CSV first using Excel, Google Sheets, or Python's `pandas.read_excel()`.
Ready to try DenchClaw? Install in one command: `npx denchclaw`. Full setup guide →
