Foundation migration transforming flat customer records into hierarchical account structure. Each customer becomes Entity + Account + Node with ltree-based hierarchy. This is Phase 1 - all other migrations depend on this.
Migration of customer/account data from DTX 1.0 to DTX 2.0. This is Phase 1 β the foundation that everything else depends on.
Important: Full production data is not available. Migration scripts must be built defensively with self-validation, dry-run mode, and graceful error handling. All assumptions are based on a 100-row sample dataset.
| Table | Purpose |
|---|---|
customer |
Main customer/account record (47 fields) |
customer_audit |
BEFORE/AFTER change snapshots |
currency |
Currency reference table (100+ currencies) |
| Model | App | Purpose |
|---|---|---|
Entity |
docutrax_hierarchy |
Real-world entity (name, contacts, address) |
Account |
docutrax_hierarchy |
Account settings, config, emails |
Node |
docutrax_hierarchy |
Hierarchy node with ltree path |
Directory |
docutrax_hierarchy |
Tenant directory (for root accounts only) |
AccountEmail |
docutrax_hierarchy |
Multiple emails per account (COI, override, reviewer) |
EntityContact |
docutrax_hierarchy |
Multiple contacts per entity |
Contact |
docutrax_utils |
Contact details (name, phone, fax, etc.) |
ContactEmail |
docutrax_utils |
Multiple emails per contact |
Address |
docutrax_utils |
Physical address |
Each DTX 1.0 customer row becomes 3-5 DTX 2.0 records:
1. Entity (name, proper_ch_name, entity_type='account')
2. Account (all config fields, FK to Entity)
3. Node (ltree path, FK to Entity)
4. Directory (only for root-level customers, FK to root Account)
5. Address (if address fields populated)
6. Contact + ContactEmail (if contact fields populated)
7. EntityContact (links Contact to Entity)
8. AccountEmail (for each parsed email β COI, override, reviewer)
DTX 1.0 (customer) |
DTX 2.0 (Account) |
Notes |
|---|---|---|
status |
status |
Map: ACTIVE β active, INACTIVE β suspended |
division |
division |
Direct |
management_type |
management_type |
Direct |
business_intelligence |
business_intelligence |
Direct |
comments |
comments |
Direct |
coi_send_hold |
coi_send_hold |
Direct |
doc_send_hold |
doc_send_hold |
Direct |
domain |
domain |
Direct |
internal_id |
internal_id |
Direct |
landing_page_view |
landing_page_view |
Direct |
last_active |
last_active |
Direct |
letter_default |
letter_default |
Direct |
logo_image |
logo |
File migration needed (old storage β S3) |
show_fields |
show_fields |
Map: "yes" β True, else False |
stripe_customer_id |
stripe_customer |
Direct |
vendor_pay |
vendor_pay |
Map: truthy β True |
vendor_fee |
vendor_fee |
Direct |
document_mgr |
document_mgr |
Direct |
program_id |
program_id |
Direct (integer FK to program table) |
sender_name |
sender_name |
Direct |
sender_email |
sender_email |
Direct |
country |
country |
Direct |
create_time |
created (TimestampedModel) |
Direct |
update_time |
modified (TimestampedModel) |
Direct |
| DTX 1.0 | DTX 2.0 (Entity) |
Notes |
|---|---|---|
name1 (company name) |
name |
See Decision #3 (name vs name1) |
proper_ch_name |
proper_ch_name |
Direct |
active |
active |
Map: "yes" β True |
| DTX 1.0 | DTX 2.0 | Notes |
|---|---|---|
contact_name |
Contact.name |
Create Contact, link via EntityContact |
contact_title |
Contact.title |
|
phone |
Contact.phone |
|
phone_extension |
Contact.phone_ext |
|
fax |
Contact.fax |
|
mobile_phone |
Contact.mobile_phone |
|
email |
ContactEmail.email |
PHP serialized β parse first |
| DTX 1.0 | DTX 2.0 (Address) |
Notes |
|---|---|---|
address_1 |
street |
|
address_2 |
street2 |
|
city |
city |
|
state |
state_or_province |
|
zip |
postal_or_zip_code |
|
country |
country |
| DTX 1.0 | DTX 2.0 | Notes |
|---|---|---|
coi_email |
AccountEmail(email_type='coi') |
Direct (not serialized in sample) |
override_notice_email |
AccountEmail(email_type='override') |
PHP serialized β can contain multiple |
reviewer_email |
AccountEmail(email_type='reviewer') |
PHP serialized β can contain multiple |
email |
ContactEmail on the Contact |
PHP serialized |
| DTX 1.0 | Reason |
|---|---|
database |
Legacy MySQL database name ("wwwriskt_customers") β not relevant |
name (location name) |
See Decision #3 β may store as Entity.description or metadata |
type |
All "R" in sample β meaning unclear, investigate with full data |
| DTX 1.0 | Depends On | Notes |
|---|---|---|
broker_id |
Area 9 (Brokers) | FK to broker table β store as reference, link after broker migration |
parent_id |
This migration (tree build) | Determines hierarchy β see Decision #1 |
Risk Level: HIGH
DTX 1.0 uses parent_id for a simple FK tree. DTX 2.0 uses PostgreSQL ltree materialized paths via the Node model.
Sample data observation: All 100 sample records have parent_id=NULL (all roots). The full dataset may have children β we cannot assume flat structure.
1. Load all customers into memory
2. Build adjacency list from parent_id
3. Validate tree integrity:
- Detect orphans (parent_id β non-existent customer_id)
- Detect circular references
- Report tree depth statistics
4. Topological sort (parents before children)
5. For each customer in sorted order:
a. Create Entity
b. Create Account (FK to Entity)
c. If root (parent_id=NULL):
- Create Directory
- Create root Node with ltree path = generated_label
- Set Directory.root_account = Account
d. If child (parent_id set):
- Look up parent's Node path
- Create child Node with path = parent_path.child_label
e. Create Address, Contact, AccountEmail records
f. Record old_customer_id β new_account_id in mapping table
Labels must be valid ltree labels (alphanumeric + underscore, no dots). Strategy:
- Use account_{id} as the label (e.g., account_4094)
- Path example: account_4094.account_5001.account_5002
Risk Level: Medium
DTX 1.0 email fields use PHP serialize() format:
// Simple
a:1:{i:0;s:21:"ar@teboproperties.com";}
// Nested
a:1:{i:0;a:1:{i:0;s:18:"mamon@docutrax.com";}}
// Empty
a:1:{i:0;s:0:"";}
# Use phpserialize library or write a simple parser
# For each email field:
# 1. Try to parse PHP serialized format
# 2. Extract all email strings
# 3. Validate each with email regex
# 4. Create AccountEmail/ContactEmail records
# 5. First valid email = is_primary=True
# 6. Log failures to migration_issues.csv
name vs name1Risk Level: Medium
Sample data shows:
name: "12225 East 39th Ave." β property/location name
name1: "Tebo Properties" β company name
Approach:
- name1 β Entity.name (this is the company/account name displayed in UI)
- name β store as metadata or Entity.description if different from name1
- If name1 is NULL/empty, fall back to name
- Flag records where name and name1 are both populated and different
Cannot fully validate without full dataset. The migration script will log all records where name != name1 for review.
| DTX 1.0 | DTX 2.0 | Notes |
|---|---|---|
ACTIVE |
active |
Direct |
INACTIVE |
suspended |
Closest match |
| Unknown values | pending |
Flag for review |
The active field ("yes" / NULL) appears redundant with status. Use status as the primary source. If status=ACTIVE but active != "yes", flag for review.
Each root-level customer (parent_id=NULL) needs a Directory.
| Field | Value | Notes |
|---|---|---|
slug |
Generated from name1 (slugified, unique) |
e.g., "tebo-properties" |
directory_type |
docutrax_managed (default) |
DTX 1.0 customers were all Docutrax-managed |
timezone |
America/New_York (default) |
Can be updated later |
root_account |
FK to the created Account |
If division='docutrax', set directory_type='docutrax_managed'. Otherwise, may need mapping logic based on other fields.
Approach: Don't migrate
The customer_audit table contains BEFORE/AFTER snapshots of customer record changes. This is historical audit data from the old system.
DTX 2.0 uses Django's built-in audit mechanisms. Migrating old audit trails would require recreating the old schema context, which provides no value.
Resolution: Drop. Keep CSV as archive for compliance if needed.
Approach: Use code directly, seed reference data separately
DTX 1.0 has a currency reference table with 100+ rows. DTX 2.0 stores currency_code as a CharField on Account (default 'USD').
Resolution:
- Don't migrate the currency table as a model
- Map each customer's currency setting to Account.currency_code
- If DTX 2.0 needs a currency dropdown, seed a reference table independently (ISO 4217 standard data)
Approach: Migrate files to S3, update Account.logo
DTX 1.0 stores logo_image as a file path (e.g., "schnippers.jpg"). These files need to be:
1. Located in the old file storage
2. Uploaded to DTX 2.0's S3 bucket using org_prefixed_upload_path('account_logos')
3. Account.logo field updated with the new S3 path
Dependency: Requires access to DTX 1.0 file storage. Logos are stored at /efsdata/logos/ (200 MB total). The S3 upload will use the shared file migration utility built in Area 4 (Decision #5). This area maps the Account.logo field; Area 4 handles the actual S3 upload.
Note: The so_property table contains key-value metadata (Account Source, Industry Sector, Account Manager) that maps to Account.account_source, Account.industry_sector, Account.account_manager. These fields will be populated during the Area 3 (Vendors) migration since so_property is listed under that area's table inventory.
Since full production data is not available, the migration script must handle unknowns gracefully.
Before any data is written, the script runs in dry-run mode and produces a validation report:
=== DTX 1.0 Customer Migration - Dry Run Report ===
Total records: ???
Root customers: ???
Child customers: ???
Max tree depth: ???
Status distribution: ACTIVE: ???, INACTIVE: ???, other: ???
Type distribution: R: ???, other: ???
PHP email parse results:
- email field: ??? success, ??? failed
- override_notice: ??? success, ??? failed
- reviewer_email: ??? success, ??? failed
Name analysis:
- name == name1: ???
- name != name1: ???
- name1 is NULL: ???
Tree integrity:
- Orphan records: ??? (parent_id β missing customer_id)
- Circular references: ???
- Max depth: ???
Issues flagged: ???
Ready to migrate: ???
| Scenario | Behavior |
|---|---|
| PHP email parse fails | Log to migration_issues.csv, skip email, continue |
| Orphan parent_id | Log issue, create as root with flag |
| Circular reference | Log issue, break cycle, create as root with flag |
| Duplicate slug | Append numeric suffix (tebo-properties-2) |
| Missing required fields | Log issue, skip record |
| Unknown status/type values | Log issue, use default, flag for review |
| File | Purpose |
|---|---|
id_mapping.csv |
old_customer_id β new_account_id β new_entity_id β new_directory_id |
migration_issues.csv |
All flagged records with issue description |
migration_report.txt |
Summary statistics and validation results |
customer_id used in other tables exists in mapping1. Validate and report (dry-run)
2. Create Address records (no dependencies)
3. Create Contact + ContactEmail records (no dependencies)
4. Create Entity records (links to Address, Contact via EntityContact)
5. Create Account records (FK to Entity)
6. Create Node records with ltree paths (FK to Entity, requires parent ordering)
7. Create Directory records (FK to root Account, only for roots)
8. Create AccountEmail records (FK to Account, from parsed PHP emails)
9. Generate and export ID mapping table
10. Verify
| Depends On | What's Needed |
|---|---|
| Nothing | This is Phase 1 β no prerequisites |
| Depended On By | For |
|---|---|
| Area 1: Users | User.current_account, AccountUserAssignment |
| Area 3: Vendors | Vendor β Account relationships |
| Area 4: Certificates & Documents | Certificate/Document β Account |
| Area 5: Coverage | Coverage β Account |
| Area 6: Requirements | Requirements β Account |
| Area 8: Notifications | Notification schedules β Account |
| Area 9: Brokers | broker_id FK on customer |
| Area 11: File Repo | File β Account |
| # | Decision | Status | Approach |
|---|---|---|---|
| 1 | parent_id β ltree hierarchy | Resolved | Topological sort, validate tree, generate ltree paths |
| 2 | PHP serialized emails | Resolved | Parse with fallback + error logging |
| 3 | name vs name1 | Resolved | name1 β Entity.name, name β metadata/description, flag mismatches |
| 4 | Status mapping | Resolved | ACTIVE β active, INACTIVE β suspended, unknown β flag |
| 5 | Directory creation | Resolved | Auto-create for root customers, slugify name1 |
| 6 | customer_audit | Resolved | Don't migrate β keep CSV archive |
| 7 | currency table | Resolved | Use code directly on Account.currency_code |
| 8 | Logo file migration | Resolved | Migrate files to S3, needs old file storage access |