Largest single-table migration: DTX 1.0 vendor table (65 fields, ~104K production records) is a denormalized mega-table. Splitting into Entity + Record + Node + Assignment. Deduplication deferred to post-migration.
Migration of vendor/record data from DTX 1.0 to DTX 2.0. This is Phase 3 β depends on Area 2 (Customers/Accounts). This is the largest and most complex single table migration β the DTX 1.0 vendor table has 65 fields and is a denormalized mega-table combining entity data, relationships, compliance status, and notification state.
Important: Full production data is not available. The so_summary table shows ~103,991 total vendor records in production. Migration scripts must handle this volume and be built defensively.
| Table | Purpose |
|---|---|
vendor |
Main vendor record (65 fields β entity, relationship, compliance, notifications) |
vendor_status |
Per-field compliance status (LIMIT, TERMS checks with min/actual values) |
vendor_audit |
BEFORE/AFTER change snapshots |
vendor_coi_cache |
Cached COI compliance data (massive PHP-serialized blobs) |
vendor_display_cache |
Cached display/rendering data (massive PHP-serialized blobs) |
vendor_create_time_audit |
Vendor creation timestamp audit |
so_summary |
Global aggregate stats (1 row: totals for all vendors) |
so_property |
Key-value metadata on customers (Account Source, Industry Sector, etc.) |
| Model | App | Purpose |
|---|---|---|
Entity |
docutrax_hierarchy |
Real-world entity (name, contacts, address) |
Record |
docutrax_hierarchy |
Vendor-specific fields (tax_id, wc_exempt, risk profiles) |
Node |
docutrax_hierarchy |
Position in hierarchy (ltree path under an Account) |
Assignment |
docutrax_hierarchy |
EntityβNode relationship with contact/address overrides |
Contact |
docutrax_utils |
Contact details |
ContactEmail |
docutrax_utils |
Multiple emails per contact |
Address |
docutrax_utils |
Physical address |
EntityContact |
docutrax_hierarchy |
Links contacts to entities |
Each DTX 1.0 vendor row becomes 4-7 DTX 2.0 records:
1. Entity (name, proper_ph_name, entity_type='record')
2. Record (FK to Entity, tax_id, wc_exempt, compliance fields)
3. Node (ltree path under the vendor's customer/Account tree)
4. Assignment (Entity β Node, with contact/address overrides)
5. Address (if address fields populated)
6. Contact + ContactEmail (if contact fields populated)
7. EntityContact (links Contact to Entity)
Approach: Migrate 1:1, merge later. Do NOT deduplicate during migration.
In DTX 1.0, the same real-world vendor can appear as multiple rows β one per customer they work for. For example, "ABC Plumbing" working for 5 customers = 5 vendor rows.
DTX 2.0's Entity model is designed to be a single representation of a real-world thing. However, deduplicating during migration is too risky:
Fuzzy matching is error-prone β matching "ABC Plumbing" vs "ABC Plumbing Inc" vs "A.B.C. Plumbing LLC" risks merging different companies (irreversible data loss) or missing duplicates.
Per-customer data differs β even for the "same" vendor, each row has different contacts, addresses, requirements, compliance status, and notification state. All of this per-customer data is meaningful.
Migration complexity β the migration is already complex enough. Adding fuzzy matching multiplies risk for zero immediate benefit.
DTX 2.0 is designed for this β the Entity model docs explicitly state: "When Non-Docutrax end-users create new Entities, we will need a way to merge them with existing ones if they represent the same 'real world' thing." Merge is a planned future feature, not a migration concern.
Each vendor row becomes its own Entity + Record + Assignment:
DTX 1.0: vendor row (vendor_id=6254, customer_id=713)
β
DTX 2.0:
Entity (name="Partners Advantage Insurance Services", entity_type='record')
Record (FK to Entity, tax_id, wc_exempt, etc.)
Node (under Account 713's hierarchy tree)
Assignment (Entity β Node, with contact/address)
Same vendor appearing under 3 customers = 3 Entities, 3 Records, 3 Assignments under 3 different Account trees.
Once data is safely migrated and verified, run a dedup pass using:
| Match strategy | Confidence | Action |
|---|---|---|
Exact tax_id match |
High | Auto-merge candidates |
| Exact name + exact address | High | Auto-merge candidates |
| Fuzzy name match (>90% similarity) | Medium | Flag for manual review |
| Name + city + state match | Medium | Flag for manual review |
| Everything else | Low | Leave as separate entities |
This can be done incrementally, account by account, with no migration risk.
DTX 1.0 (vendor) |
DTX 2.0 (Entity) |
Notes |
|---|---|---|
name1 (company name) |
name |
Same pattern as customer β name1 is the company |
name |
description or metadata |
If different from name1, store as secondary name |
proper_ph_name |
proper_ch_name |
Direct |
active |
active |
Map: "yes" β True |
website |
website |
Direct |
create_time |
created |
Direct |
update_time |
modified |
Direct |
| DTX 1.0 | DTX 2.0 (Record) |
Notes |
|---|---|---|
tax_id |
tax_id |
Direct |
internal_id |
internal_id |
Direct |
internal_id1 |
internal_id1 |
Direct |
allow_email_adds |
allow_email_adds |
Map: "yes" β True |
invalid_email |
invalid_email |
Map: "yes" β True |
WCExempt |
wc_exempt |
Map: "yes" β True / "no" β False |
comments |
comments |
Direct |
proper_ph_name |
proper_ph_name |
Direct |
requirements_id |
coi_risk_profile |
Requires Requirements migration (Area 6) β store as ref |
document_set_id |
document_risk_profile |
Requires Requirements migration (Area 6) β store as ref |
country |
Entity's Address.country | Via Address model |
| DTX 1.0 | DTX 2.0 | Notes |
|---|---|---|
contact_name |
Contact.name |
Create Contact, link via EntityContact |
contact_title |
Contact.title |
|
phone |
Contact.phone |
|
phone_extension |
Contact.phone_ext |
|
fax |
Contact.fax |
|
mobile_phone |
Contact.mobile_phone |
|
email |
ContactEmail.email |
PHP serialized β parse (can contain multiple) |
| DTX 1.0 | DTX 2.0 (Address) |
Notes |
|---|---|---|
address_1 |
street |
|
address_2 |
street2 |
|
city |
city |
|
state |
state_or_province |
|
zip |
postal_or_zip_code |
|
country |
country |
DTX 1.0 stores 12+ compliance status/date columns directly on the vendor row:
| DTX 1.0 | Category | Notes |
|---|---|---|
vendor_status |
COI overall status | COMPLIANT, NON-COMPLIANT, ALERT, etc. |
vendor_expiration |
COI expiration date | |
vendor_request |
COI request status | |
vendor_request_date |
COI request date | |
vendor_expired |
COI expired status | ALERT, SECOND, etc. |
vendor_expired_date |
COI expired date | |
vendor_compliance |
COI compliance status | |
vendor_compliance_date |
COI compliance date | |
HolderLegacyDeficiency |
Holder check status | |
OpsDescLegacyDeficiency |
Ops description check status | |
document_status |
Document overall status | COMPLIANT, OVERRIDE, etc. |
document_expiration |
Document expiration date | |
document_request |
Document request status | |
document_request_date |
Document request date | |
document_expired |
Document expired status | |
document_expired_date |
Document expired date | |
document_compliance |
Document compliance status | |
document_compliance_date |
Document compliance date |
Decision: Store as metadata or compute from source data.
DTX 2.0 computes compliance status dynamically from certificates and documents rather than storing it as static fields on the vendor. Options:
| Approach | Details |
|---|---|
| A. Don't migrate status fields | DTX 2.0 recalculates compliance from actual certificates/documents after they're migrated. Most accurate. |
| B. Store as initial snapshot | Import as Record metadata (JSON field) for historical reference. DTX 2.0 recalculates going forward. |
Resolution: Option B β store as a JSON snapshot on Record for reference, but DTX 2.0 will recalculate compliance status from actual certificate/document data. This preserves history while ensuring 2.0 compliance is based on real data.
| DTX 1.0 | Purpose | Decision |
|---|---|---|
ping_date |
Last notification sent date | Drop β DTX 2.0 tracks via NotificationQueue |
ping_to |
Notification recipients (PHP serialized) | Drop β DTX 2.0 uses NotificationSchedule |
ping_cc |
CC recipients (PHP serialized) | Drop |
ping_subject |
Last notification subject | Drop |
ping_message |
Last notification message | Drop |
ping_note |
Notification notes | Drop |
additional_language |
Extra notification language text | Store as ref if needed for notification templates |
custom_message_2 |
Custom notification message | Store as ref if needed |
| DTX 1.0 | DTX 2.0 | Notes |
|---|---|---|
customer_id |
Assignment β Node (under Account's tree) | Via customerβaccount ID mapping |
requirements_id |
Record.coi_risk_profile | Store as ref. Area 4 creates RequirementsGroup from (customer_id, requirements_id, document_set_id). Area 6 backfills COIRiskProfile. |
requirements_name |
Reference only | |
document_set_id |
Record.document_risk_profile | Store as ref. Area 4 creates DocumentRiskProfile from document_set + RequirementsGroup. |
document_set_name |
Reference only | |
old_requirements_id |
Drop | Legacy reference |
old_document_set_id |
Drop | Legacy reference |
user_id |
Entity.user (creator) | Via user ID mapping |
| DTX 1.0 | Reason |
|---|---|
last_active |
Can be computed from activity |
type |
Unclear meaning β investigate with full data |
old_requirements_id |
Legacy reference |
old_document_set_id |
Legacy reference |
vendor_status β Per-field compliance checks| Field | Purpose |
|---|---|
vendor_id |
FK to vendor |
customer_id |
FK to customer |
status_code |
LIMIT, TERMS |
field_name |
e.g., UMBLimAgg, UMBLimEaOcc, AutoAddlInsd |
minimum |
Required minimum value |
actual |
Actual value found |
create_time |
When check was performed |
Decision: Store as reference data. DTX 2.0 recalculates compliance from actual certificates/documents. This table is a historical record of past compliance checks. Store as a JSON snapshot or reference table for audit purposes.
vendor_coi_cache β Cached COI compliance dataMassive PHP-serialized blobs containing:
- validation_data β all COI field values (policy numbers, limits, dates, carrier info)
- waived_policies β policies with waived requirements
- errors β compliance errors by status
- policy_set β which certificates satisfy which policy types
- compliant_policy_set β subset of compliant policies
Decision: Don't migrate. This is a computed cache derived from certificates and compliance rules. DTX 2.0 will rebuild this from migrated certificates + requirements. Migrating serialized PHP cache data into a different system's cache format provides no value.
vendor_display_cache β Cached display dataSimilar to vendor_coi_cache but formatted for UI rendering. Contains PHP-serialized display objects.
Decision: Don't migrate. Same reasoning β computed cache, will be regenerated.
vendor_audit β Change audit trailBEFORE/AFTER snapshots of vendor records.
Decision: Don't migrate. Historical audit data from old system. DTX 2.0 uses its own audit mechanisms. Keep CSV as archive.
vendor_create_time_audit β Creation timestamp auditSimple audit of vendor_id, user_id, create_time, change_time.
Decision: Don't migrate. The create_time is preserved in Entity.created / Record timestamps. Keep CSV as archive.
so_summary β Global aggregate statsSingle row with global counts: total=103,991 vendors, compliant=53,781, non_compliant=50,210, etc.
Decision: Don't migrate. This is a dashboard cache that DTX 2.0 will compute dynamically. Useful only for validating migration counts post-migration.
Useful for: Verification β after migration, compare DTX 2.0 record counts against these numbers.
so_property β Customer key-value metadata| Key | Example Value | DTX 2.0 Mapping |
|---|---|---|
Account Source |
"ProSight" |
Account.account_source |
Industry Sector |
"Construction" |
Account.industry_sector |
Account Manager |
"Construction Div" |
Account.account_manager |
Decision: Migrate directly. These map to existing fields on the DTX 2.0 Account model. This is actually part of the Customer/Account migration (Area 2) but listed here since it's in the so_property table.
Migration: For each so_property row, look up the Account via customer_id mapping and set the corresponding field.
The following vendor fields use PHP serialize() format:
| Field | Example | Action |
|---|---|---|
email |
a:1:{i:0;s:22:"dina@morrischamber.org";} |
Parse β ContactEmail records |
ping_to |
a:0:{} |
Drop β notification data |
ping_cc |
a:0:{} |
Drop β notification data |
Parser handles: simple arrays, nested arrays, empty arrays, and malformed strings (log + skip).
Production has ~103,991 vendor records (per so_summary). At 4-7 DTX 2.0 records per vendor, this means ~400,000-700,000 records to create. Migration must be:
- Batched β process in chunks of 1,000 to avoid memory issues
- Transaction-wrapped β each batch in a transaction for rollback safety
- Resumable β track progress so a failed run can resume from last successful batch
=== DTX 1.0 Vendor Migration - Dry Run Report ===
Total vendors: ???
Unique customer_ids: ???
Customer IDs not in mapping: ??? (CRITICAL β these vendors can't be placed)
Status distribution: COMPLIANT: ???, NON-COMPLIANT: ???, ALERT: ???, etc.
Active distribution: yes: ???, no: ???
PHP email parse results:
- email field: ??? success, ??? failed
Name analysis:
- name == name1: ???
- name != name1: ???
- name1 is NULL: ???
Requirements/doc set references:
- Unique requirements_ids: ???
- Unique document_set_ids: ???
Issues flagged: ???
Ready to migrate: ???
| Scenario | Behavior |
|---|---|
customer_id not in mapping |
CRITICAL β log to issues, skip vendor (can't place without account) |
| PHP email parse fails | Log, skip email, continue with other fields |
| Missing required fields (name) | Log, skip record |
| Duplicate vendor within same account | Allow (dedup is post-migration) |
| File | Purpose |
|---|---|
vendor_id_mapping.csv |
old_vendor_id β new_entity_id β new_record_id β new_node_id β new_assignment_id |
migration_issues.csv |
All flagged records with issue description |
migration_report.txt |
Summary statistics |
compliance_snapshot.json |
Historical compliance status data (from vendor status fields) |
1. Validate prerequisites:
- Customer β Account ID mapping must exist and be verified
- Account hierarchy (Nodes) must exist
2. Dry-run validation report
3. For each vendor (batched, 1000 at a time):
a. Look up Account via customer_id mapping
b. Create Entity (name=name1, entity_type='record')
c. Create Record (FK to Entity, vendor-specific fields)
d. Create Node under Account's hierarchy tree
e. Create Assignment (Entity β Node)
f. Create Address (if address fields populated)
g. Create Contact + ContactEmail (if contact fields, parse PHP email)
h. Create EntityContact (link Contact β Entity)
i. Store compliance status as JSON snapshot on Record
j. Record vendor_id β new IDs in mapping table
4. Migrate so_property β Account fields (Account Source, Industry Sector, Account Manager)
5. Verify:
- Count check against so_summary totals
- ID mapping completeness
- Spot-check vendor data against originals
| Depends On | What's Needed |
|---|---|
| Area 2: Customers/Accounts | Customer β Account ID mapping, Account hierarchy |
| Depended On By | For |
|---|---|
| Area 4: Certificates & Documents | Certificate/Document β vendor (via vendor_id mapping). Also: Area 4 creates RequirementsGroupMember to link each vendor's Assignment to its RequirementsGroup. |
| Area 5: Coverage | Coverage records β vendor |
| Area 6: Requirements | Requirements assignment β vendor (coi_risk_profile, document_risk_profile) |
| Area 7: Extraction/OCR | Extraction results β vendor's certificates |
| Area 8: Notifications | Notification state β vendor |
| # | Decision | Status | Approach |
|---|---|---|---|
| 1 | Vendor deduplication | Resolved | Don't deduplicate during migration. Migrate 1:1, merge later as separate project. |
| 2 | Compliance status fields | Resolved | Store as JSON snapshot on Record. DTX 2.0 recalculates from actual data. |
| 3 | vendor_coi_cache | Resolved | Don't migrate β computed cache, will be regenerated. |
| 4 | vendor_display_cache | Resolved | Don't migrate β computed cache, will be regenerated. |
| 5 | so_summary | Resolved | Don't migrate β use for post-migration count verification. |
| 6 | so_property | Resolved | Migrate to Account fields (account_source, industry_sector, account_manager). |
| 7 | vendor_audit | Resolved | Don't migrate β keep CSV archive. |
| 8 | vendor_create_time_audit | Resolved | Don't migrate β timestamps preserved in Entity/Record. |
| 9 | PHP serialized emails | Resolved | Parse with fallback + error logging (same as Area 2). |
| 10 | name vs name1 | Resolved | name1 β Entity.name, name β description/metadata, flag mismatches. |
| 11 | Notification/ping fields | Resolved | Drop β DTX 2.0 uses NotificationSchedule/Queue. |
| 12 | Requirements/doc set FKs | Deferred | Store as reference. Link after Area 6 (Requirements) migration. |
| 13 | Volume handling | Resolved | Batched (1000/batch), transaction-wrapped, resumable. |
After migration is verified, run vendor deduplication as a separate project:
| Phase | Action |
|---|---|
| 1. Analysis | Generate duplicate candidate report |
| 2. Auto-merge (high confidence) | Exact tax_id match, exact name + address match |
| 3. Review (medium confidence) | Fuzzy name match (>90%), name + city + state match |
| 4. Ignore (low confidence) | Leave as separate entities |
Merge operation: Keep one Entity, reassign all Assignments/Nodes to the surviving Entity, archive the duplicates.