πŸ“‹ Migration Summary

Source Tables: 3
Target Models: 9
Decisions: 8

Foundation migration transforming flat customer records into hierarchical account structure. Each customer becomes Entity + Account + Node with ltree-based hierarchy. This is Phase 1 - all other migrations depend on this.

🎯 Key Actions

⚠️ Key Challenges

DTX 1.0 β†’ DTX 2.0 Migration Plan: Customers & Accounts

Overview

Migration of customer/account data from DTX 1.0 to DTX 2.0. This is Phase 1 β€” the foundation that everything else depends on.

Important: Full production data is not available. Migration scripts must be built defensively with self-validation, dry-run mode, and graceful error handling. All assumptions are based on a 100-row sample dataset.

DTX 1.0 Source Tables

Table Purpose
customer Main customer/account record (47 fields)
customer_audit BEFORE/AFTER change snapshots
currency Currency reference table (100+ currencies)

DTX 2.0 Target Models

Model App Purpose
Entity docutrax_hierarchy Real-world entity (name, contacts, address)
Account docutrax_hierarchy Account settings, config, emails
Node docutrax_hierarchy Hierarchy node with ltree path
Directory docutrax_hierarchy Tenant directory (for root accounts only)
AccountEmail docutrax_hierarchy Multiple emails per account (COI, override, reviewer)
EntityContact docutrax_hierarchy Multiple contacts per entity
Contact docutrax_utils Contact details (name, phone, fax, etc.)
ContactEmail docutrax_utils Multiple emails per contact
Address docutrax_utils Physical address

Records Created Per Customer

Each DTX 1.0 customer row becomes 3-5 DTX 2.0 records:

1. Entity (name, proper_ch_name, entity_type='account')
2. Account (all config fields, FK to Entity)
3. Node (ltree path, FK to Entity)
4. Directory (only for root-level customers, FK to root Account)
5. Address (if address fields populated)
6. Contact + ContactEmail (if contact fields populated)
7. EntityContact (links Contact to Entity)
8. AccountEmail (for each parsed email β€” COI, override, reviewer)

Field Mapping

Direct Matches β†’ Account

DTX 1.0 (customer) DTX 2.0 (Account) Notes
status status Map: ACTIVE β†’ active, INACTIVE β†’ suspended
division division Direct
management_type management_type Direct
business_intelligence business_intelligence Direct
comments comments Direct
coi_send_hold coi_send_hold Direct
doc_send_hold doc_send_hold Direct
domain domain Direct
internal_id internal_id Direct
landing_page_view landing_page_view Direct
last_active last_active Direct
letter_default letter_default Direct
logo_image logo File migration needed (old storage β†’ S3)
show_fields show_fields Map: "yes" β†’ True, else False
stripe_customer_id stripe_customer Direct
vendor_pay vendor_pay Map: truthy β†’ True
vendor_fee vendor_fee Direct
document_mgr document_mgr Direct
program_id program_id Direct (integer FK to program table)
sender_name sender_name Direct
sender_email sender_email Direct
country country Direct
create_time created (TimestampedModel) Direct
update_time modified (TimestampedModel) Direct

Direct Matches β†’ Entity

DTX 1.0 DTX 2.0 (Entity) Notes
name1 (company name) name See Decision #3 (name vs name1)
proper_ch_name proper_ch_name Direct
active active Map: "yes" β†’ True

Contact Fields β†’ Contact + EntityContact

DTX 1.0 DTX 2.0 Notes
contact_name Contact.name Create Contact, link via EntityContact
contact_title Contact.title
phone Contact.phone
phone_extension Contact.phone_ext
fax Contact.fax
mobile_phone Contact.mobile_phone
email ContactEmail.email PHP serialized β€” parse first

Address Fields β†’ Address

DTX 1.0 DTX 2.0 (Address) Notes
address_1 street
address_2 street2
city city
state state_or_province
zip postal_or_zip_code
country country

Email Fields β†’ AccountEmail (PHP serialized)

DTX 1.0 DTX 2.0 Notes
coi_email AccountEmail(email_type='coi') Direct (not serialized in sample)
override_notice_email AccountEmail(email_type='override') PHP serialized β€” can contain multiple
reviewer_email AccountEmail(email_type='reviewer') PHP serialized β€” can contain multiple
email ContactEmail on the Contact PHP serialized

Fields to Drop

DTX 1.0 Reason
database Legacy MySQL database name ("wwwriskt_customers") β€” not relevant
name (location name) See Decision #3 β€” may store as Entity.description or metadata
type All "R" in sample β€” meaning unclear, investigate with full data

Fields Requiring Downstream Migration

DTX 1.0 Depends On Notes
broker_id Area 9 (Brokers) FK to broker table β€” store as reference, link after broker migration
parent_id This migration (tree build) Determines hierarchy β€” see Decision #1

Decisions

Decision #1: parent_id β†’ ltree Hierarchy

Risk Level: HIGH

DTX 1.0 uses parent_id for a simple FK tree. DTX 2.0 uses PostgreSQL ltree materialized paths via the Node model.

Sample data observation: All 100 sample records have parent_id=NULL (all roots). The full dataset may have children β€” we cannot assume flat structure.

Migration approach

1. Load all customers into memory
2. Build adjacency list from parent_id
3. Validate tree integrity:
   - Detect orphans (parent_id β†’ non-existent customer_id)
   - Detect circular references
   - Report tree depth statistics
4. Topological sort (parents before children)
5. For each customer in sorted order:
   a. Create Entity
   b. Create Account (FK to Entity)
   c. If root (parent_id=NULL):
      - Create Directory
      - Create root Node with ltree path = generated_label
      - Set Directory.root_account = Account
   d. If child (parent_id set):
      - Look up parent's Node path
      - Create child Node with path = parent_path.child_label
   e. Create Address, Contact, AccountEmail records
   f. Record old_customer_id β†’ new_account_id in mapping table

ltree label generation

Labels must be valid ltree labels (alphanumeric + underscore, no dots). Strategy: - Use account_{id} as the label (e.g., account_4094) - Path example: account_4094.account_5001.account_5002

Decision #2: PHP Serialized Email Parsing

Risk Level: Medium

DTX 1.0 email fields use PHP serialize() format:

// Simple
a:1:{i:0;s:21:"ar@teboproperties.com";}

// Nested
a:1:{i:0;a:1:{i:0;s:18:"mamon@docutrax.com";}}

// Empty
a:1:{i:0;s:0:"";}

Migration approach

# Use phpserialize library or write a simple parser
# For each email field:
#   1. Try to parse PHP serialized format
#   2. Extract all email strings
#   3. Validate each with email regex
#   4. Create AccountEmail/ContactEmail records
#   5. First valid email = is_primary=True
#   6. Log failures to migration_issues.csv

Fallback for unparseable values

Decision #3: name vs name1

Risk Level: Medium

Sample data shows:

name:  "12225 East 39th Ave."     ← property/location name
name1: "Tebo Properties"          ← company name

Approach: - name1 β†’ Entity.name (this is the company/account name displayed in UI) - name β†’ store as metadata or Entity.description if different from name1 - If name1 is NULL/empty, fall back to name - Flag records where name and name1 are both populated and different

Cannot fully validate without full dataset. The migration script will log all records where name != name1 for review.

Decision #4: Status Mapping

DTX 1.0 DTX 2.0 Notes
ACTIVE active Direct
INACTIVE suspended Closest match
Unknown values pending Flag for review

The active field ("yes" / NULL) appears redundant with status. Use status as the primary source. If status=ACTIVE but active != "yes", flag for review.

Decision #5: Directory Creation

Each root-level customer (parent_id=NULL) needs a Directory.

Field Value Notes
slug Generated from name1 (slugified, unique) e.g., "tebo-properties"
directory_type docutrax_managed (default) DTX 1.0 customers were all Docutrax-managed
timezone America/New_York (default) Can be updated later
root_account FK to the created Account

If division='docutrax', set directory_type='docutrax_managed'. Otherwise, may need mapping logic based on other fields.

Decision #6: customer_audit

Approach: Don't migrate

The customer_audit table contains BEFORE/AFTER snapshots of customer record changes. This is historical audit data from the old system.

DTX 2.0 uses Django's built-in audit mechanisms. Migrating old audit trails would require recreating the old schema context, which provides no value.

Resolution: Drop. Keep CSV as archive for compliance if needed.

Decision #7: currency Table

Approach: Use code directly, seed reference data separately

DTX 1.0 has a currency reference table with 100+ rows. DTX 2.0 stores currency_code as a CharField on Account (default 'USD').

Resolution: - Don't migrate the currency table as a model - Map each customer's currency setting to Account.currency_code - If DTX 2.0 needs a currency dropdown, seed a reference table independently (ISO 4217 standard data)

Decision #8: Logo/Image File Migration

Approach: Migrate files to S3, update Account.logo

DTX 1.0 stores logo_image as a file path (e.g., "schnippers.jpg"). These files need to be: 1. Located in the old file storage 2. Uploaded to DTX 2.0's S3 bucket using org_prefixed_upload_path('account_logos') 3. Account.logo field updated with the new S3 path

Dependency: Requires access to DTX 1.0 file storage. Logos are stored at /efsdata/logos/ (200 MB total). The S3 upload will use the shared file migration utility built in Area 4 (Decision #5). This area maps the Account.logo field; Area 4 handles the actual S3 upload.

Note: The so_property table contains key-value metadata (Account Source, Industry Sector, Account Manager) that maps to Account.account_source, Account.industry_sector, Account.account_manager. These fields will be populated during the Area 3 (Vendors) migration since so_property is listed under that area's table inventory.


Defensive Migration Strategy

Since full production data is not available, the migration script must handle unknowns gracefully.

Dry-Run Mode

Before any data is written, the script runs in dry-run mode and produces a validation report:

=== DTX 1.0 Customer Migration - Dry Run Report ===

Total records:           ???
Root customers:          ???
Child customers:         ???
Max tree depth:          ???

Status distribution:     ACTIVE: ???, INACTIVE: ???, other: ???
Type distribution:       R: ???, other: ???

PHP email parse results:
  - email field:         ??? success, ??? failed
  - override_notice:     ??? success, ??? failed
  - reviewer_email:      ??? success, ??? failed

Name analysis:
  - name == name1:       ???
  - name != name1:       ???
  - name1 is NULL:       ???

Tree integrity:
  - Orphan records:      ??? (parent_id β†’ missing customer_id)
  - Circular references: ???
  - Max depth:           ???

Issues flagged:          ???
Ready to migrate:        ???

Error Handling

Scenario Behavior
PHP email parse fails Log to migration_issues.csv, skip email, continue
Orphan parent_id Log issue, create as root with flag
Circular reference Log issue, break cycle, create as root with flag
Duplicate slug Append numeric suffix (tebo-properties-2)
Missing required fields Log issue, skip record
Unknown status/type values Log issue, use default, flag for review

Output Files

File Purpose
id_mapping.csv old_customer_id β†’ new_account_id β†’ new_entity_id β†’ new_directory_id
migration_issues.csv All flagged records with issue description
migration_report.txt Summary statistics and validation results

Verification Steps (post-migration)

  1. Count check: DTX 1.0 customer count == DTX 2.0 Account count (minus skipped)
  2. ID mapping completeness: every customer_id used in other tables exists in mapping
  3. Tree integrity: all Node paths are valid ltree, no orphans
  4. Email check: spot-check parsed emails against originals
  5. Name check: spot-check Entity.name values against original name1/name

Migration Order (within this area)

1. Validate and report (dry-run)
2. Create Address records (no dependencies)
3. Create Contact + ContactEmail records (no dependencies)
4. Create Entity records (links to Address, Contact via EntityContact)
5. Create Account records (FK to Entity)
6. Create Node records with ltree paths (FK to Entity, requires parent ordering)
7. Create Directory records (FK to root Account, only for roots)
8. Create AccountEmail records (FK to Account, from parsed PHP emails)
9. Generate and export ID mapping table
10. Verify

Dependencies

Depends On What's Needed
Nothing This is Phase 1 β€” no prerequisites
Depended On By For
Area 1: Users User.current_account, AccountUserAssignment
Area 3: Vendors Vendor ↔ Account relationships
Area 4: Certificates & Documents Certificate/Document ↔ Account
Area 5: Coverage Coverage ↔ Account
Area 6: Requirements Requirements ↔ Account
Area 8: Notifications Notification schedules ↔ Account
Area 9: Brokers broker_id FK on customer
Area 11: File Repo File ↔ Account

Decision Summary

# Decision Status Approach
1 parent_id β†’ ltree hierarchy Resolved Topological sort, validate tree, generate ltree paths
2 PHP serialized emails Resolved Parse with fallback + error logging
3 name vs name1 Resolved name1 β†’ Entity.name, name β†’ metadata/description, flag mismatches
4 Status mapping Resolved ACTIVE β†’ active, INACTIVE β†’ suspended, unknown β†’ flag
5 Directory creation Resolved Auto-create for root customers, slugify name1
6 customer_audit Resolved Don't migrate β€” keep CSV archive
7 currency table Resolved Use code directly on Account.currency_code
8 Logo file migration Resolved Migrate files to S3, needs old file storage access