Merge multiple CSV/Excel files with intelligent column matching, data deduplication, and conflict resolution. Handles different schemas, formats, and combines data sources. Use when users need to merge spreadsheets, combine data exports, or consolidate multiple files into one.
Intelligently merge multiple CSV or Excel files with automatic column matching and data deduplication.
When a user needs to merge CSV or Excel files:
Analyze Input Files:
Inspect File Structures:
Create Merge Strategy:
Column Matching:
Conflict Resolution (when same record appears in multiple files):
Deduplication:
Perform Merge:
# Example merge logic
import pandas as pd
# Read files
df1 = pd.read_csv('file1.csv')
df2 = pd.read_csv('file2.csv')
# Normalize column names
df1.columns = df1.columns.str.lower().str.strip()
df2.columns = df2.columns.str.lower().str.strip()
# Map similar columns
column_mapping = {
'firstname': 'first_name',
'e_mail': 'email',
# ...
}
df2 = df2.rename(columns=column_mapping)
# Merge
merged = pd.concat([df1, df2], ignore_index=True)
# Deduplicate
merged = merged.drop_duplicates(subset=['email'], keep='last')
# Save
merged.to_csv('merged_output.csv', index=False)
Format Output:
📊 CSV/EXCEL MERGER REPORT
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📁 INPUT FILES
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
File 1: contacts_jan.csv
Rows: 1,245
Columns: 8 (name, email, phone, company, ...)
File 2: contacts_feb.csv
Rows: 987
Columns: 9 (firstname, lastname, email, mobile, ...)
File 3: leads_export.xlsx
Rows: 2,103
Columns: 12 (full_name, email_address, phone, ...)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🔄 COLUMN MAPPING
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Unified Schema:
• first_name ← [firstname, first name, fname]
• last_name ← [lastname, last name, lname]
• email ← [email, e-mail, email_address]
• phone ← [phone, mobile, phone_number, tel]
• company ← [company, organization, org]
• title ← [title, job_title, position]
• source ← [file origin tracking]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🔍 MERGE ANALYSIS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Total rows before merge: 4,335
Duplicate records found: 892
Conflicts detected: 47
Deduplication Strategy: Keep most recent (by source file date)
Primary Key: email
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⚠️ CONFLICTS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Record: [email protected]
File 1 phone: (555) 123-4567
File 2 phone: (555) 987-6543
Resolution: Kept most recent (File 2)
[List top 10 conflicts]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✅ MERGE RESULTS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Output File: merged_contacts.csv
Total Rows: 3,443
Columns: 7
Duplicates Removed: 892
Breakdown by Source:
• contacts_jan.csv: 1,245 rows (398 unique)
• contacts_feb.csv: 987 rows (521 unique)
• leads_export.xlsx: 2,103 rows (2,524 unique)
Data Quality:
• Email completeness: 98.2%
• Phone completeness: 87.5%
• Company completeness: 91.3%
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💡 RECOMMENDATIONS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
• Review 47 conflict records manually
• Standardize phone number format
• Fill missing company names (8.7% incomplete)
• Export conflicts to: conflicts_review.csv
Handle Special Cases:
Multiple Primary Keys:
Different Data Types:
Missing Columns:
Large Files:
Generate Code: Provide Python/pandas script that:
Export Options:
Column Matching:
Data Quality:
Performance:
Transparency:
Ensure merges:
Generate clean, deduplicated merged files with full transparency and data quality checks.