πŸ“‹ Migration Summary

Source Tables: 8
Target Models: 8
Decisions: 13

Largest single-table migration: DTX 1.0 vendor table (65 fields, ~104K production records) is a denormalized mega-table. Splitting into Entity + Record + Node + Assignment. Deduplication deferred to post-migration.

🎯 Key Actions

⚠️ Key Challenges

DTX 1.0 β†’ DTX 2.0 Migration Plan: Vendors & Records

Overview

Migration of vendor/record data from DTX 1.0 to DTX 2.0. This is Phase 3 β€” depends on Area 2 (Customers/Accounts). This is the largest and most complex single table migration β€” the DTX 1.0 vendor table has 65 fields and is a denormalized mega-table combining entity data, relationships, compliance status, and notification state.

Important: Full production data is not available. The so_summary table shows ~103,991 total vendor records in production. Migration scripts must handle this volume and be built defensively.

DTX 1.0 Source Tables

Table Purpose
vendor Main vendor record (65 fields β€” entity, relationship, compliance, notifications)
vendor_status Per-field compliance status (LIMIT, TERMS checks with min/actual values)
vendor_audit BEFORE/AFTER change snapshots
vendor_coi_cache Cached COI compliance data (massive PHP-serialized blobs)
vendor_display_cache Cached display/rendering data (massive PHP-serialized blobs)
vendor_create_time_audit Vendor creation timestamp audit
so_summary Global aggregate stats (1 row: totals for all vendors)
so_property Key-value metadata on customers (Account Source, Industry Sector, etc.)

DTX 2.0 Target Models

Model App Purpose
Entity docutrax_hierarchy Real-world entity (name, contacts, address)
Record docutrax_hierarchy Vendor-specific fields (tax_id, wc_exempt, risk profiles)
Node docutrax_hierarchy Position in hierarchy (ltree path under an Account)
Assignment docutrax_hierarchy Entity↔Node relationship with contact/address overrides
Contact docutrax_utils Contact details
ContactEmail docutrax_utils Multiple emails per contact
Address docutrax_utils Physical address
EntityContact docutrax_hierarchy Links contacts to entities

Records Created Per Vendor

Each DTX 1.0 vendor row becomes 4-7 DTX 2.0 records:

1. Entity (name, proper_ph_name, entity_type='record')
2. Record (FK to Entity, tax_id, wc_exempt, compliance fields)
3. Node (ltree path under the vendor's customer/Account tree)
4. Assignment (Entity β†’ Node, with contact/address overrides)
5. Address (if address fields populated)
6. Contact + ContactEmail (if contact fields populated)
7. EntityContact (links Contact to Entity)

Key Decision: Vendor Deduplication Strategy

Approach: Migrate 1:1, merge later. Do NOT deduplicate during migration.

Why deduplication during migration is a trap

In DTX 1.0, the same real-world vendor can appear as multiple rows β€” one per customer they work for. For example, "ABC Plumbing" working for 5 customers = 5 vendor rows.

DTX 2.0's Entity model is designed to be a single representation of a real-world thing. However, deduplicating during migration is too risky:

  1. Fuzzy matching is error-prone β€” matching "ABC Plumbing" vs "ABC Plumbing Inc" vs "A.B.C. Plumbing LLC" risks merging different companies (irreversible data loss) or missing duplicates.

  2. Per-customer data differs β€” even for the "same" vendor, each row has different contacts, addresses, requirements, compliance status, and notification state. All of this per-customer data is meaningful.

  3. Migration complexity β€” the migration is already complex enough. Adding fuzzy matching multiplies risk for zero immediate benefit.

  4. DTX 2.0 is designed for this β€” the Entity model docs explicitly state: "When Non-Docutrax end-users create new Entities, we will need a way to merge them with existing ones if they represent the same 'real world' thing." Merge is a planned future feature, not a migration concern.

Migration approach

Each vendor row becomes its own Entity + Record + Assignment:

DTX 1.0:  vendor row (vendor_id=6254, customer_id=713)
    ↓
DTX 2.0:
    Entity (name="Partners Advantage Insurance Services", entity_type='record')
    Record (FK to Entity, tax_id, wc_exempt, etc.)
    Node (under Account 713's hierarchy tree)
    Assignment (Entity β†’ Node, with contact/address)

Same vendor appearing under 3 customers = 3 Entities, 3 Records, 3 Assignments under 3 different Account trees.

Post-migration deduplication (separate project)

Once data is safely migrated and verified, run a dedup pass using:

Match strategy Confidence Action
Exact tax_id match High Auto-merge candidates
Exact name + exact address High Auto-merge candidates
Fuzzy name match (>90% similarity) Medium Flag for manual review
Name + city + state match Medium Flag for manual review
Everything else Low Leave as separate entities

This can be done incrementally, account by account, with no migration risk.


Field Mapping

Vendor Identity β†’ Entity

DTX 1.0 (vendor) DTX 2.0 (Entity) Notes
name1 (company name) name Same pattern as customer β€” name1 is the company
name description or metadata If different from name1, store as secondary name
proper_ph_name proper_ch_name Direct
active active Map: "yes" β†’ True
website website Direct
create_time created Direct
update_time modified Direct

Vendor-Specific β†’ Record

DTX 1.0 DTX 2.0 (Record) Notes
tax_id tax_id Direct
internal_id internal_id Direct
internal_id1 internal_id1 Direct
allow_email_adds allow_email_adds Map: "yes" β†’ True
invalid_email invalid_email Map: "yes" β†’ True
WCExempt wc_exempt Map: "yes" β†’ True / "no" β†’ False
comments comments Direct
proper_ph_name proper_ph_name Direct
requirements_id coi_risk_profile Requires Requirements migration (Area 6) β€” store as ref
document_set_id document_risk_profile Requires Requirements migration (Area 6) β€” store as ref
country Entity's Address.country Via Address model

Contact Fields β†’ Contact + EntityContact

DTX 1.0 DTX 2.0 Notes
contact_name Contact.name Create Contact, link via EntityContact
contact_title Contact.title
phone Contact.phone
phone_extension Contact.phone_ext
fax Contact.fax
mobile_phone Contact.mobile_phone
email ContactEmail.email PHP serialized β€” parse (can contain multiple)

Address Fields β†’ Address

DTX 1.0 DTX 2.0 (Address) Notes
address_1 street
address_2 street2
city city
state state_or_province
zip postal_or_zip_code
country country

Compliance Status Fields

DTX 1.0 stores 12+ compliance status/date columns directly on the vendor row:

DTX 1.0 Category Notes
vendor_status COI overall status COMPLIANT, NON-COMPLIANT, ALERT, etc.
vendor_expiration COI expiration date
vendor_request COI request status
vendor_request_date COI request date
vendor_expired COI expired status ALERT, SECOND, etc.
vendor_expired_date COI expired date
vendor_compliance COI compliance status
vendor_compliance_date COI compliance date
HolderLegacyDeficiency Holder check status
OpsDescLegacyDeficiency Ops description check status
document_status Document overall status COMPLIANT, OVERRIDE, etc.
document_expiration Document expiration date
document_request Document request status
document_request_date Document request date
document_expired Document expired status
document_expired_date Document expired date
document_compliance Document compliance status
document_compliance_date Document compliance date

Decision: Store as metadata or compute from source data.

DTX 2.0 computes compliance status dynamically from certificates and documents rather than storing it as static fields on the vendor. Options:

Approach Details
A. Don't migrate status fields DTX 2.0 recalculates compliance from actual certificates/documents after they're migrated. Most accurate.
B. Store as initial snapshot Import as Record metadata (JSON field) for historical reference. DTX 2.0 recalculates going forward.

Resolution: Option B β€” store as a JSON snapshot on Record for reference, but DTX 2.0 will recalculate compliance status from actual certificate/document data. This preserves history while ensuring 2.0 compliance is based on real data.

Notification/Ping Fields

DTX 1.0 Purpose Decision
ping_date Last notification sent date Drop β€” DTX 2.0 tracks via NotificationQueue
ping_to Notification recipients (PHP serialized) Drop β€” DTX 2.0 uses NotificationSchedule
ping_cc CC recipients (PHP serialized) Drop
ping_subject Last notification subject Drop
ping_message Last notification message Drop
ping_note Notification notes Drop
additional_language Extra notification language text Store as ref if needed for notification templates
custom_message_2 Custom notification message Store as ref if needed

Relationship Fields

DTX 1.0 DTX 2.0 Notes
customer_id Assignment → Node (under Account's tree) Via customer→account ID mapping
requirements_id Record.coi_risk_profile Store as ref. Area 4 creates RequirementsGroup from (customer_id, requirements_id, document_set_id). Area 6 backfills COIRiskProfile.
requirements_name Reference only
document_set_id Record.document_risk_profile Store as ref. Area 4 creates DocumentRiskProfile from document_set + RequirementsGroup.
document_set_name Reference only
old_requirements_id Drop Legacy reference
old_document_set_id Drop Legacy reference
user_id Entity.user (creator) Via user ID mapping

Fields to Drop

DTX 1.0 Reason
last_active Can be computed from activity
type Unclear meaning β€” investigate with full data
old_requirements_id Legacy reference
old_document_set_id Legacy reference

vendor_status β€” Per-field compliance checks

Field Purpose
vendor_id FK to vendor
customer_id FK to customer
status_code LIMIT, TERMS
field_name e.g., UMBLimAgg, UMBLimEaOcc, AutoAddlInsd
minimum Required minimum value
actual Actual value found
create_time When check was performed

Decision: Store as reference data. DTX 2.0 recalculates compliance from actual certificates/documents. This table is a historical record of past compliance checks. Store as a JSON snapshot or reference table for audit purposes.

vendor_coi_cache β€” Cached COI compliance data

Massive PHP-serialized blobs containing: - validation_data β€” all COI field values (policy numbers, limits, dates, carrier info) - waived_policies β€” policies with waived requirements - errors β€” compliance errors by status - policy_set β€” which certificates satisfy which policy types - compliant_policy_set β€” subset of compliant policies

Decision: Don't migrate. This is a computed cache derived from certificates and compliance rules. DTX 2.0 will rebuild this from migrated certificates + requirements. Migrating serialized PHP cache data into a different system's cache format provides no value.

vendor_display_cache β€” Cached display data

Similar to vendor_coi_cache but formatted for UI rendering. Contains PHP-serialized display objects.

Decision: Don't migrate. Same reasoning β€” computed cache, will be regenerated.

vendor_audit β€” Change audit trail

BEFORE/AFTER snapshots of vendor records.

Decision: Don't migrate. Historical audit data from old system. DTX 2.0 uses its own audit mechanisms. Keep CSV as archive.

vendor_create_time_audit β€” Creation timestamp audit

Simple audit of vendor_id, user_id, create_time, change_time.

Decision: Don't migrate. The create_time is preserved in Entity.created / Record timestamps. Keep CSV as archive.

so_summary β€” Global aggregate stats

Single row with global counts: total=103,991 vendors, compliant=53,781, non_compliant=50,210, etc.

Decision: Don't migrate. This is a dashboard cache that DTX 2.0 will compute dynamically. Useful only for validating migration counts post-migration.

Useful for: Verification β€” after migration, compare DTX 2.0 record counts against these numbers.

so_property β€” Customer key-value metadata

Key Example Value DTX 2.0 Mapping
Account Source "ProSight" Account.account_source
Industry Sector "Construction" Account.industry_sector
Account Manager "Construction Div" Account.account_manager

Decision: Migrate directly. These map to existing fields on the DTX 2.0 Account model. This is actually part of the Customer/Account migration (Area 2) but listed here since it's in the so_property table.

Migration: For each so_property row, look up the Account via customer_id mapping and set the corresponding field.


PHP Serialized Fields (same strategy as Area 2)

The following vendor fields use PHP serialize() format:

Field Example Action
email a:1:{i:0;s:22:"dina@morrischamber.org";} Parse β†’ ContactEmail records
ping_to a:0:{} Drop β€” notification data
ping_cc a:0:{} Drop β€” notification data

Parser handles: simple arrays, nested arrays, empty arrays, and malformed strings (log + skip).


Defensive Migration Strategy

Volume

Production has ~103,991 vendor records (per so_summary). At 4-7 DTX 2.0 records per vendor, this means ~400,000-700,000 records to create. Migration must be: - Batched β€” process in chunks of 1,000 to avoid memory issues - Transaction-wrapped β€” each batch in a transaction for rollback safety - Resumable β€” track progress so a failed run can resume from last successful batch

Dry-Run Report

=== DTX 1.0 Vendor Migration - Dry Run Report ===

Total vendors:           ???
Unique customer_ids:     ???
Customer IDs not in mapping: ??? (CRITICAL β€” these vendors can't be placed)

Status distribution:     COMPLIANT: ???, NON-COMPLIANT: ???, ALERT: ???, etc.
Active distribution:     yes: ???, no: ???

PHP email parse results:
  - email field:         ??? success, ??? failed

Name analysis:
  - name == name1:       ???
  - name != name1:       ???
  - name1 is NULL:       ???

Requirements/doc set references:
  - Unique requirements_ids: ???
  - Unique document_set_ids: ???

Issues flagged:          ???
Ready to migrate:        ???

Error Handling

Scenario Behavior
customer_id not in mapping CRITICAL β€” log to issues, skip vendor (can't place without account)
PHP email parse fails Log, skip email, continue with other fields
Missing required fields (name) Log, skip record
Duplicate vendor within same account Allow (dedup is post-migration)

Output Files

File Purpose
vendor_id_mapping.csv old_vendor_id β†’ new_entity_id β†’ new_record_id β†’ new_node_id β†’ new_assignment_id
migration_issues.csv All flagged records with issue description
migration_report.txt Summary statistics
compliance_snapshot.json Historical compliance status data (from vendor status fields)

Migration Steps

1. Validate prerequisites:
   - Customer β†’ Account ID mapping must exist and be verified
   - Account hierarchy (Nodes) must exist

2. Dry-run validation report

3. For each vendor (batched, 1000 at a time):
   a. Look up Account via customer_id mapping
   b. Create Entity (name=name1, entity_type='record')
   c. Create Record (FK to Entity, vendor-specific fields)
   d. Create Node under Account's hierarchy tree
   e. Create Assignment (Entity β†’ Node)
   f. Create Address (if address fields populated)
   g. Create Contact + ContactEmail (if contact fields, parse PHP email)
   h. Create EntityContact (link Contact β†’ Entity)
   i. Store compliance status as JSON snapshot on Record
   j. Record vendor_id β†’ new IDs in mapping table

4. Migrate so_property β†’ Account fields (Account Source, Industry Sector, Account Manager)

5. Verify:
   - Count check against so_summary totals
   - ID mapping completeness
   - Spot-check vendor data against originals

Dependencies

Depends On What's Needed
Area 2: Customers/Accounts Customer β†’ Account ID mapping, Account hierarchy
Depended On By For
Area 4: Certificates & Documents Certificate/Document β†’ vendor (via vendor_id mapping). Also: Area 4 creates RequirementsGroupMember to link each vendor's Assignment to its RequirementsGroup.
Area 5: Coverage Coverage records β†’ vendor
Area 6: Requirements Requirements assignment β†’ vendor (coi_risk_profile, document_risk_profile)
Area 7: Extraction/OCR Extraction results β†’ vendor's certificates
Area 8: Notifications Notification state β†’ vendor

Decision Summary

# Decision Status Approach
1 Vendor deduplication Resolved Don't deduplicate during migration. Migrate 1:1, merge later as separate project.
2 Compliance status fields Resolved Store as JSON snapshot on Record. DTX 2.0 recalculates from actual data.
3 vendor_coi_cache Resolved Don't migrate β€” computed cache, will be regenerated.
4 vendor_display_cache Resolved Don't migrate β€” computed cache, will be regenerated.
5 so_summary Resolved Don't migrate β€” use for post-migration count verification.
6 so_property Resolved Migrate to Account fields (account_source, industry_sector, account_manager).
7 vendor_audit Resolved Don't migrate β€” keep CSV archive.
8 vendor_create_time_audit Resolved Don't migrate β€” timestamps preserved in Entity/Record.
9 PHP serialized emails Resolved Parse with fallback + error logging (same as Area 2).
10 name vs name1 Resolved name1 β†’ Entity.name, name β†’ description/metadata, flag mismatches.
11 Notification/ping fields Resolved Drop β€” DTX 2.0 uses NotificationSchedule/Queue.
12 Requirements/doc set FKs Deferred Store as reference. Link after Area 6 (Requirements) migration.
13 Volume handling Resolved Batched (1000/batch), transaction-wrapped, resumable.

Post-Migration: Deduplication Project

After migration is verified, run vendor deduplication as a separate project:

Phase Action
1. Analysis Generate duplicate candidate report
2. Auto-merge (high confidence) Exact tax_id match, exact name + address match
3. Review (medium confidence) Fuzzy name match (>90%), name + city + state match
4. Ignore (low confidence) Leave as separate entities

Merge operation: Keep one Entity, reassign all Assignments/Nodes to the surviving Entity, archive the duplicates.