πŸ“‹ Migration Summary

Source Tables: 10
Target Models: 12
Decisions: 16

Migrating certificates (COI) and documents from the denormalized, 69+ field certificate table to structured models. Includes file migration from EFS to S3 and building a 3-tier evaluation pipeline.

DTX 1.0 → DTX 2.0 Migration Plan: Certificates & Documents

Overview

Migration of certificate and document data from DTX 1.0 to DTX 2.0. This is Phase 3: it depends on Accounts (Area 2) and Vendors/Records (Area 3), and feeds into Coverage/Insurance (Area 5) and Requirements (Area 6).

The DTX 1.0 certificate table is a massive denormalized table (69+ fields) containing the full ACORD certificate of insurance data (producer, insured, holder contacts, review workflow, and coverage references) in a single row. DTX 2.0 splits this into a cleaner Certificate model with related coverage models and a 3-tier evaluation system.

Important: Full production data is not available. Migration scripts must be built defensively with dry-run mode, self-validation, and graceful error handling.

DTX 1.0 Source Tables

| Table | Purpose |
|---|---|
| certificate | Main COI record (69+ fields: producer, insured, holder, status, review) |
| certificate_status | Per-field compliance checks (status_code such as COMPLIANT, field_name, minimum, actual) |
| certificate_queue | Processing queue with GUIDs |
| document | Non-COI documents (23 fields: library refs, review status, DocuSign templates) |
| document_status | Vendor × document status tracking (NOT_PROVIDED, etc.) |
| document_library | Document template definitions per customer |
| document_requirements | Junction table: document_set × document |
| document_set | Named document requirement sets per customer |
| attachment | File attachments linked to certificates or documents |
| missing_documents | Aggregate tracking of missing docs per vendor |

DTX 2.0 Target Models

| Model | App | Purpose |
|---|---|---|
| Certificate | docutrax_profiles | COI record (agent, insured, holder, carrier A-F, status, file) |
| Document | docutrax_profiles | Uploaded document with file, status, content hash |
| DocumentType | docutrax_profiles | Global document type definitions (unique name) |
| DocRequirement | docutrax_profiles | Links RequirementsGroup to DocumentType (what's required) |
| DocumentRequirementFulfillment | docutrax_profiles | Junction: Document × DocRequirement (what's fulfilled) |
| DocumentRiskProfile | docutrax_profiles | Named collection of required document types per account |
| DocumentRiskProfileDocumentType | docutrax_profiles | Junction: profile × document type (with ordering) |
| Evaluation | docutrax_profiles | Certificate evaluation result (pass/fail/override/pending) |
| EvaluationCoverage | docutrax_profiles | Coverage-level evaluation result |
| EvaluationDetail | docutrax_profiles | Individual requirement evaluation result |
| RequirementsGroup | docutrax_hierarchy | Ties together COI profile + document profile + named entity |
| AuditHistory | docutrax_profiles | JSON snapshot of evaluations for compliance trail |

Records Created Per Certificate

Each DTX 1.0 certificate row fans out into records across up to five DTX 2.0 models (items 3 and 4 below can each produce multiple rows):

1. Certificate (core fields, agent/insured/holder, carrier info)
2. Evaluation (from certificate_status rows → pass/fail result)
3. EvaluationCoverage (one per coverage type found in certificate_status)
4. EvaluationDetail (one per field_name in certificate_status)
5. AuditHistory (optional: snapshot for compliance)

Records Created Per Document

Each DTX 1.0 document row becomes 1-3 DTX 2.0 records:

1. Document (core fields, file, status)
2. DocumentRequirementFulfillment (links document to its DocRequirement)
3. AuditHistory (optional: snapshot for compliance)

Field Mapping

Certificate: Direct Matches

| DTX 1.0 (certificate) | DTX 2.0 (Certificate) | Notes |
|---|---|---|
| CertNo | cert_number | Direct |
| certificate_expiration | expiration_date | Direct |
| OpsDesc | ops_desc | Direct |
| AuthRepName | auth_rep_name | Direct |
| AuthDate | auth_date | Direct |
| create_time | created (TimestampedModel) | Direct |
| update_time | modified (TimestampedModel) | Direct |

Certificate: Producer/Agent Fields

| DTX 1.0 | DTX 2.0 | Notes |
|---|---|---|
| ProdName | agent_name | Direct |
| ProdTitle | agent_title | Direct |
| ProdContact | agent_contact | Concatenate with phone/fax/email (see Decision #3) |
| ProdAddr1 + ProdAddr2 + ProdCity + ProdState + ProdZip | agent_address | Concatenate into single text field |
| ProdPhone, ProdPhoneExtension, ProdFax, ProdEmail | agent_contact | Pack into agent_contact (see Decision #3) |

Certificate: Insured Fields

| DTX 1.0 | DTX 2.0 | Notes |
|---|---|---|
| InsName | insured_name | Direct |
| InsName1 | insured_name | Append to insured_name if different (see Decision #15) |
| InsAddr1 | insured_address1 | Direct |
| InsAddr2 | insured_address2 | Direct |
| InsCity | insured_city | Direct |
| InsState | insured_state | Direct |
| InsZip | insured_zip | Direct |
| InsContact | insured_contact | Concatenate with phone/fax/email |
| InsTitle | insured_title | Direct |
| InsPhone, InsPhoneExtension, InsFax, InsMobilePhone, InsEmail | insured_contact | Pack into insured_contact |

Certificate: Holder Fields

| DTX 1.0 | DTX 2.0 | Notes |
|---|---|---|
| HolderName | certificate_holder_name | Direct |
| HolderAddr1 | certificate_holder_address1 | Direct |
| HolderAddr2 | certificate_holder_address2 | Direct |
| HolderCity | certificate_holder_city | Direct |
| HolderState | certificate_holder_state | Direct |
| HolderZip | certificate_holder_zip | Direct |
| HolderContact, HolderTitle, HolderPhone, etc. | (no direct field) | See Decision #3 |

Certificate: FK References

| DTX 1.0 | Maps To | Notes |
|---|---|---|
| customer_id | RequirementsGroup.account | Via Account mapping from Area 2 |
| vendor_id | RequirementsGroup → Assignment | Via Record mapping from Area 3 |
| requirements_id | RequirementsGroup.coi_risk_profile | Via Area 6 migration (deferred) |
| document_set_id | RequirementsGroup.document_risk_profile | Via this area's document_set migration |
| replacement_id | (dropped) | See Decision #10 |
| broker_id | (deferred) | Area 9: store as reference |
| attachment_id | Certificate.file | File migration to S3 (see Decision #5) |
| reviewer_id | (dropped) | Review workflow not migrated |
| user_id | OwnershipModel.created_by | Via User mapping from Area 1 |

Certificate: Fields to Drop

| DTX 1.0 | Reason |
|---|---|
| jws_Certificate_ID | Legacy integration ID, not relevant |
| page_name, starting_page | OCR-related (Area 7) |
| ocr | OCR flag (Area 7) |
| CertRevNo | Revision number; no equivalent in DTX 2.0, store in notes if needed |
| review_status, review_time | Review workflow; DTX 2.0 has a different workflow |
| OpsDescReviewStatus, OpsDescReason, OpsDescCorrectedBy | Review sub-workflow (see Decision #11) |
| HolderReviewStatus, HolderReason, HolderCorrectedBy | Review sub-workflow (see Decision #11) |
| override_reason | Old override system; DTX 2.0 uses the ManualOverride model |
| AMBestRatingOK | Migrates to CarrierRating in Area 5 (Coverage) |
| WCExempt | Migrates to WorkersCompensation in Area 5 |
| proper_ph_name, proper_ch_name | Entity-level; already mapped in Areas 2/3 |
| Customer1, Customer2, Customer3 | Unknown purpose; flag for review with full data |
| final_pol_expiration | Calculated from coverage expirations; DTX 2.0 recalculates |
| certificate_status | See Decision #2 (maps to Certificate.status) |
| AuthRepAffirm | Yes/no affirmation flag; no DTX 2.0 equivalent, drop |

Document: Field Mapping

| DTX 1.0 (document) | DTX 2.0 (Document) | Notes |
|---|---|---|
| name | name | Direct |
| expiration_date | expiry_date | Direct |
| comments | supervisor_notes | Direct |
| create_time | created (TimestampedModel) | Direct |
| update_time | modified (TimestampedModel) | Direct |
| attachment_id | file | File migration to S3 |
| document_status | status | See Decision #6 |
| vendor_id | uploaded_by | Via Record mapping from Area 3 |
| customer_id | via RequirementsGroup.account | Via Account mapping from Area 2 |
| document_library_id | via DocRequirement.document_type | See Decision #7 |
| reviewer_id | (dropped) | Review workflow not migrated |
| user_id | OwnershipModel.created_by | Via User mapping from Area 1 |
| docusign_template_id | (dropped) | See Decision #12 |
| parent_document_id | (dropped) | See Decision #9 |
| source_coi, source_broker, source_insured | (dropped) | Origin metadata; no equivalent |
| send_sample | (dropped) | Configuration flag; no equivalent |
| next_review_level, review_status, review_status_date, review_period | (dropped) | Old review workflow |

Decisions

Decision #1: RequirementsGroup Creation (Cross-Cutting)

Risk Level: HIGH

DTX 2.0 introduces RequirementsGroup as a central organizing concept that ties together:

- COIRiskProfile (what insurance is required)
- DocumentRiskProfile (what documents are required)
- NamedEntity (who should be on certificates)

In DTX 1.0, these are separate FK references on different tables:

- certificate.requirements_id → requirements table (Area 6 → COIRiskProfile)
- certificate.document_set_id → document_set table (this area → DocumentRiskProfile)
- No explicit named entity concept

Migration approach

1. Build a lookup of all unique (customer_id, requirements_id, document_set_id)
   combinations found in BOTH the certificate table AND the vendor table
   (vendors may have requirements without certificates)
2. For each unique combination:
   a. Look up the Account (from Area 2 mapping)
   b. Look up the COIRiskProfile (from Area 6 mapping β€” may be NULL initially)
   c. Look up the DocumentRiskProfile (from this area's document_set mapping)
   d. Create RequirementsGroup with these links
3. Record mapping: (customer_id, requirements_id, document_set_id) → requirements_group_id
4. When migrating certificates:
   - Look up the RequirementsGroup for each certificate's combination
   - Set Certificate.requirements_group = found group
5. When migrating vendor Assignments:
   - Create RequirementsGroupMember linking Assignment to RequirementsGroup

Note: RequirementsGroups that reference requirements_id will have coi_risk_profile=NULL until Area 6 migration fills them in. The migration script must support this phased linking.

Cross-area linkage: After creating RequirementsGroups, this step also creates RequirementsGroupMember records linking each vendor's Assignment (created in Area 3) to its RequirementsGroup. The lookup key is: vendor's (customer_id, requirements_id, document_set_id) → RequirementsGroup → RequirementsGroupMember(assignment=vendor's Assignment).
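A minimal sketch of steps 1-3, assuming Django ORM access to the DTX 2.0 models and in-memory mapping dicts (account_map, coi_profile_map, doc_profile_map) loaded from the other areas' CSV outputs; log_issue is a hypothetical helper that appends to migration_issues.csv:

def build_requirements_groups(combo_rows, account_map, coi_profile_map, doc_profile_map):
    """combo_rows: (customer_id, requirements_id, document_set_id) tuples pulled
    from both the certificate and vendor tables. Returns combo -> group id."""
    group_map = {}
    for combo in dict.fromkeys(combo_rows):  # dedupe while preserving order
        customer_id, requirements_id, document_set_id = combo
        account_id = account_map.get(customer_id)
        if account_id is None:
            log_issue('requirements_group', combo, 'missing Account mapping')
            continue
        group = RequirementsGroup.objects.create(
            account_id=account_id,
            # Area 6 has not run yet, so coi_risk_profile is usually NULL here
            coi_risk_profile_id=coi_profile_map.get(requirements_id),
            document_risk_profile_id=doc_profile_map.get(document_set_id),
        )
        group_map[combo] = group.id
    return group_map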

Decision #2: Certificate Status Mapping

Risk Level: Medium

DTX 1.0 has two status fields:

- certificate_status (e.g., "FINISHED"): processing status
- review_status: review workflow status

DTX 2.0 has:

class CertificateStatus(TextChoices):
    READY = 'ready'            # Parsing succeeded
    MANUAL_REVIEW = 'manual_review'  # Parsing failed, needs manual entry
    EVALUATED = 'evaluated'    # Has been evaluated

Mapping

| DTX 1.0 certificate_status | DTX 2.0 Certificate.status | Notes |
|---|---|---|
| FINISHED | evaluated | Certificate was processed and evaluated |
| PENDING | manual_review | Awaiting review; closest match |
| PROCESSING | ready | Being processed; treat as ready |
| Unknown values | manual_review | Flag for review, safe default |

All migrated certificates are historical data, so evaluated is the most appropriate status for any certificate that was fully processed in DTX 1.0. Use manual_review as fallback.

Cannot fully validate without full dataset. The migration script will log the distribution of certificate_status values.
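As a sketch, the status map and its logging fallback can be a plain dict plus a Counter (names here are illustrative, not the final script):

from collections import Counter

CERT_STATUS_MAP = {
    'FINISHED': 'evaluated',
    'PENDING': 'manual_review',
    'PROCESSING': 'ready',
}
status_distribution = Counter()

def map_certificate_status(legacy_status):
    status_distribution[legacy_status] += 1  # feeds the dry-run distribution report
    # Unknown values fall back to manual_review and get flagged for review
    return CERT_STATUS_MAP.get(legacy_status, 'manual_review')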

Decision #3: Contact Field Consolidation

Risk Level: Low

DTX 1.0 has separate fields for each contact's phone, fax, email, extension. DTX 2.0 has fewer, wider text fields.

Producer/Agent

DTX 2.0 has only 4 agent fields (agent_name, agent_title, agent_contact, agent_address).

# agent_address = concatenated address; join only populated parts so
# blank fields don't leave "None" or empty lines in the text
state_zip = " ".join(p for p in [ProdState, ProdZip] if p)
city_line = ", ".join(p for p in [ProdCity, state_zip] if p)
agent_address = "\n".join(p for p in [ProdAddr1, ProdAddr2, city_line] if p)

# agent_contact = concatenated contact info
parts = []
if ProdContact: parts.append(f"Contact: {ProdContact}")
if ProdPhone:
    phone = ProdPhone
    if ProdPhoneExtension: phone += f" x{ProdPhoneExtension}"
    parts.append(f"Phone: {phone}")
if ProdFax: parts.append(f"Fax: {ProdFax}")
if ProdEmail: parts.append(f"Email: {ProdEmail}")
agent_contact = " | ".join(parts)

Insured

DTX 2.0 has separate address fields (insured_address1, city, state, zip), so those map directly. Phone/fax/email are packed into insured_contact:

parts = []
if InsContact: parts.append(f"Contact: {InsContact}")
if InsPhone:
    phone = InsPhone
    if InsPhoneExtension: phone += f" x{InsPhoneExtension}"
    parts.append(f"Phone: {phone}")
if InsFax: parts.append(f"Fax: {InsFax}")
if InsMobilePhone: parts.append(f"Mobile: {InsMobilePhone}")
if InsEmail: parts.append(f"Email: {InsEmail}")
insured_contact = " | ".join(parts)

Holder

DTX 2.0 has separate address fields, which map directly. Holder contact info (phone, fax, email) has no direct target field. Store it in Certificate.notes with a prefix:

holder_extras = []
if HolderContact: holder_extras.append(f"Contact: {HolderContact}")
if HolderTitle: holder_extras.append(f"Title: {HolderTitle}")
if HolderPhone: holder_extras.append(f"Phone: {HolderPhone}")
if HolderFax: holder_extras.append(f"Fax: {HolderFax}")
if HolderEmail: holder_extras.append(f"Email: {HolderEmail}")
if holder_extras:
    notes = f"[Migrated Holder Contact] {' | '.join(holder_extras)}"

Decision #4: certificate_status → Evaluation Pipeline

Risk Level: HIGH

DTX 1.0's certificate_status table stores per-field compliance checks:

certificate_id | status_code | field_name       | minimum    | actual
5215          | COMPLIANT   | UMBPolExpDate    | 2018-04-01 | 2019-04-04
5215          | COMPLIANT   | WCPolExpDate     | 2018-04-01 | 2019-06-28
5215          | COMPLIANT   | AutoPolExpDate   | 2018-04-01 | 2019-04-04
5215          | COMPLIANT   | GLPolExpDate     | 2018-04-01 | 2019-02-06

DTX 2.0 has a 3-tier evaluation structure:

Evaluation (overall: pass/fail/override/pending)
  └── EvaluationCoverage (per coverage type: GeneralLiability, Auto, etc.)
       └── EvaluationDetail (per requirement: limit amounts, dates, endorsements)

Migration approach

1. Group certificate_status rows by certificate_id
2. For each certificate's status rows:
   a. Create Evaluation:
      - certificate = migrated Certificate
      - requirements_group = Certificate's RequirementsGroup
      - status = 'pass' if ALL rows are COMPLIANT, else 'fail'
      - evaluation_date = min(create_time) of status rows
   b. Map field_name to coverage_type:
      - GLPolExpDate → "GeneralLiability"
      - AutoPolExpDate → "AutoLiability"
      - WCPolExpDate → "WorkersCompensation"
      - UMBPolExpDate → "UmbrellaLiability"
      - Other field_names → log and map based on prefix
   c. Create EvaluationCoverage per coverage_type:
      - coverage_type = mapped type
      - status = 'pass' if all rows for this type are COMPLIANT, else 'fail'
   d. Create EvaluationDetail per field_name:
      - evaluation_coverage = parent coverage
      - requirement_name = field_name (human-readable)
      - status = 'pass' if COMPLIANT, else 'fail'
      - required_value = minimum
      - provided_value = actual
      - field_key = field_name (raw)
   e. Set Certificate.current_evaluation = created Evaluation

field_name → coverage_type mapping

| field_name prefix | coverage_type |
|---|---|
| GL* | GeneralLiability |
| Auto* | AutoLiability |
| WC* | WorkersCompensation |
| UMB* | UmbrellaLiability |
| LL* | LiquorLiability |
| Prop* | PropertyCoverage |
| Unknown | Log to issues, create as OtherCoverage |

Cannot fully validate mapping without full dataset. The migration script will log all unique field_name values and their frequency.
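A sketch of the prefix mapping and the grouping pass, assuming certificate_status rows arrive as dicts and log_issue is the same hypothetical issues-CSV helper:

from collections import defaultdict

PREFIX_TO_COVERAGE = [
    ('GL', 'GeneralLiability'),
    ('Auto', 'AutoLiability'),
    ('WC', 'WorkersCompensation'),
    ('UMB', 'UmbrellaLiability'),
    ('LL', 'LiquorLiability'),
    ('Prop', 'PropertyCoverage'),
]

def coverage_type_for(field_name):
    for prefix, coverage in PREFIX_TO_COVERAGE:
        if field_name.startswith(prefix):
            return coverage
    log_issue('certificate_status', field_name, 'unmapped field_name prefix')
    return 'OtherCoverage'

def group_status_rows(rows):
    """rows: dicts with certificate_id, status_code, field_name, minimum, actual.
    Returns {certificate_id: {coverage_type: [rows]}} ready for 3-tier creation."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        grouped[row['certificate_id']][coverage_type_for(row['field_name'])].append(row)
    return grouped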

Decision #5: File Migration (EFS to S3)

Risk Level: CRITICAL

This is the single largest operational task in the entire migration: roughly 2.2 TB of files need to move from DTX 1.0's EFS to DTX 2.0's S3.

Source: EFS Storage Architecture (from Docutrax EFS paths.docx and DTX 1.0 Data Sources and File Size & Count.xlsx)

| Storage Area | EFS Path | Size | Path Pattern |
|---|---|---|---|
| Certificates | /efsdata/attachments/certificate_management/ | 657.5 GB | <certificate_id>/<filename> |
| Documents | /efsdata/attachments/document_management/ | 1.3 TB | <customer_id>/<document_id>/<filename> |
| File Repository | /efsdata/filerepo/ | 258 GB | <customer_id>/<repo_file_id>/<filename> |
| Logos | /efsdata/logos/ | 200 MB | <filename> |

Total: ~2.22 TB (excluding CloudWatch/SFTP)

Path Evolution (three formats; the script must handle all three)

Files may exist under any of these path prefixes depending on when they were created:

Generation 1: /home/wwwriskt/public_html/attachments/...
Generation 2: /home/wwwriskt/public_html/www1/attachments/...
Generation 3: /efsdata/attachments/...

The repo_file table has the actual repo_filepath which tells us which generation each file uses. For attachment records without a full path, construct from the EFS pattern.

File Lookup Chain

repo_file.repo_filepath → actual file path (most reliable)
     ↓
repo_file.attachment_id → attachment table
     ↓
attachment.certificate_id OR attachment.document_id → target record

For files with a repo_file record, use repo_filepath directly. For files with only an attachment record, construct the path (a resolver is sketched below):

- Certificate: /efsdata/attachments/certificate_management/<certificate_id>/<attachment.name>
- Document: /efsdata/attachments/document_management/<customer_id>/<document_id>/<attachment.name>
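A sketch of a resolver for the three path generations, assuming prefix rewriting is enough to map Generation 1/2 paths onto the current EFS mount:

import os

PATH_PREFIXES = [
    '/efsdata/attachments/',                         # Generation 3 (current)
    '/home/wwwriskt/public_html/www1/attachments/',  # Generation 2
    '/home/wwwriskt/public_html/attachments/',       # Generation 1
]

def resolve_path(recorded_path):
    """Return an existing on-disk path for a DB-recorded path, or None if missing."""
    if os.path.exists(recorded_path):
        return recorded_path
    # Swap whichever legacy prefix the path carries for each other candidate
    for old in PATH_PREFIXES:
        if recorded_path.startswith(old):
            suffix = recorded_path[len(old):]
            for new in PATH_PREFIXES:
                if new != old and os.path.exists(new + suffix):
                    return new + suffix
    return None  # caller flags this in migration_issues.csv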

Target: DTX 2.0 S3 Storage

DTX 2.0 uses org_prefixed_upload_path() with SHA-256 content hashing and file_uid (compound unique ID: {content_hash}_{timestamp}).

Migration approach

Phase 1: Inventory (dry-run)
  1. Scan all repo_file records: extract repo_filepath, verify the file exists
  2. Scan all attachment records: construct the path, verify the file exists
  3. Report:
     - Total files found vs expected
     - Missing files (path exists in DB but file not on disk)
     - Orphan files (file on disk but no DB record)
     - Size distribution and total
     - Duplicate files (same content_hash)

Phase 2: Upload (batched, resumable)
  1. Process files in batches of 1000
  2. For each file:
     a. Read file from EFS
     b. Compute SHA-256 content_hash
     c. Generate file_uid: {content_hash}_{YYYYMMDDHHmmssffffff}
     d. Upload to S3 using org_prefixed_upload_path
      e. Record: old_path → S3_key → content_hash → file_uid
  3. Checkpoint after each batch (save progress to resume on failure)

Phase 3: Link
  1. For certificate files:
     - Set Certificate.file = S3 key
     - Set Certificate.original_filename = attachment.name
     - Set Certificate.content_hash = computed hash
     - Set Certificate.file_uid = generated UID
  2. For document files:
     - Set Document.file = S3 key
     - Set Document.original_filename = attachment.name
     - Set Document.content_hash = computed hash
     - Set Document.file_uid = generated UID
  3. For repo files (Area 11):
     - Store S3 key in mapping for Area 11 migration
  4. For logos (200 MB, `/efsdata/logos/`):
     - Upload to S3, set Account.logo = S3 key (from Area 2 mapping)
     - Note: Area 2 maps the field; this step does the actual S3 upload
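A minimal sketch of Phase 2's per-file work (hash, UID, upload), assuming bucket and key_prefix are supplied by the caller; the real S3 key layout would come from org_prefixed_upload_path(), so the key built here is a placeholder:

import hashlib
from datetime import datetime, timezone

import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client('s3')
transfer_config = TransferConfig(max_concurrency=10)  # parallel parts per upload

def migrate_one_file(local_path, bucket, key_prefix):
    """Hash, generate file_uid, upload; returns the manifest row for checkpointing."""
    sha256 = hashlib.sha256()
    with open(local_path, 'rb') as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b''):
            sha256.update(chunk)
    content_hash = sha256.hexdigest()
    stamp = datetime.now(timezone.utc).strftime('%Y%m%d%H%M%S%f')
    file_uid = f"{content_hash}_{stamp}"
    s3_key = f"{key_prefix}/{file_uid}"  # placeholder for org_prefixed_upload_path()
    s3.upload_file(local_path, bucket, s3_key, Config=transfer_config)
    return {'old_path': local_path, 's3_key': s3_key,
            'content_hash': content_hash, 'file_uid': file_uid}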

Operational considerations

| Concern | Plan |
|---|---|
| 2.2 TB transfer time | At 100 MB/s sustained: ~6 hours. At 25 MB/s: ~24 hours. Plan for a weekend migration window. |
| EFS → S3 same region | If both are in the same AWS region, use internal transfer (no egress cost, faster). |
| Resumability | Checkpoint after each batch. If the script crashes at batch 500 of 1000, restart from batch 501. |
| Duplicate files | Same content_hash = same file. Upload once, reference multiple times. Could save significant space. |
| Missing files | Create the DB record without a file, flag in migration_issues.csv. Don't block the migration. |
| File integrity | For single-part uploads the S3 ETag is the MD5, so compare it directly; multipart ETags are not a plain MD5, so re-download and verify content_hash instead. |
| Parallel uploads | Use boto3 with TransferConfig(max_concurrency=10) for parallel S3 uploads within each batch. |
| Cost | S3 Standard: ~$0.023/GB/month, so 2.2 TB ≈ $50/month. PUT requests: ~$0.005 per 1,000 requests. |

The repo_file table (cross-reference with Area 11)

The repo_file table is in Area 11 (File Repository & Misc) but is central to file migration. Key fields:

| Field | Purpose |
|---|---|
| repo_filepath | Full path to file on EFS; the most reliable source of truth |
| treatment | Processing type: COI_MANUAL, COI_OCR, DOC |
| status | SUBMITTED, DELETED, MANUAL_ENTRY, IN_REVIEW, etc. |
| upload_src | Upload source: DnD (drag-drop), Broker, Email |
| certificate_id | Link to certificate (if COI) |
| document_id | Link to document (if non-COI) |
| attachment_id | Link to attachment record |

Decision: File migration (the S3 upload) should be a shared utility used by both Area 4 and Area 11. Area 4 migrates certificate and document files. Area 11 migrates remaining repo files. Both use the same upload/hash/UID logic.

Dependency: Requires read access to DTX 1.0 EFS mount. If EFS is not mountable, an alternative is aws s3 sync from the EFS volume's EC2 instance, then process from a staging S3 bucket.

Decision #6: Document Status Mapping

Risk Level: Medium

DTX 1.0 document.document_status appears to use numeric codes (sample shows 0). The document_status table uses string statuses like NOT_PROVIDED.

DTX 2.0 uses a detailed DocumentStatus enum with 13 values representing a document workflow.

Mapping

| DTX 1.0 | DTX 2.0 DocumentStatus | Notes |
|---|---|---|
| 0 (or equivalent "new") | new | Initial state |
| NOT_PROVIDED | new | Vendor hasn't provided it yet |
| Reviewed/approved | submitted | Final submission complete |
| Rejected | ocr_reject | Rejection state |
| Unknown values | new | Safe default, flag for review |

Cannot fully validate without full dataset. The migration script will log all unique document_status values (both from document.document_status and document_status.document_status) and their frequency.

DocRequirementStatus mapping

The document_status table's per-vendor status needs mapping:

| DTX 1.0 | DTX 2.0 DocRequirementStatus | Notes |
|---|---|---|
| NOT_PROVIDED | In Review | Awaiting submission |
| PROVIDED / APPROVED | Pass | Document accepted |
| REJECTED | Fail | Document rejected |
| OVERRIDE | Override | Manual override |
| Unknown | In Review | Safe default |
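The same map-with-fallback pattern from Decision #2 applies to both levels; a sketch, where the legacy value spellings and enum member names are assumptions to confirm against the full dataset:

# document.document_status -> DTX 2.0 DocumentStatus (unknown -> 'new')
DOCUMENT_STATUS_MAP = {
    '0': 'new',
    'NOT_PROVIDED': 'new',
    'APPROVED': 'submitted',
    'REJECTED': 'ocr_reject',
}

# document_status table values -> DTX 2.0 DocRequirementStatus (unknown -> 'In Review')
DOC_REQUIREMENT_STATUS_MAP = {
    'NOT_PROVIDED': 'In Review',
    'PROVIDED': 'Pass',
    'APPROVED': 'Pass',
    'REJECTED': 'Fail',
    'OVERRIDE': 'Override',
}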

Decision #7: document_library → DocumentType

Risk Level: Medium

DTX 1.0 document_library is per-customer (has customer_id). DTX 2.0 DocumentType is global (unique name across all accounts).

Challenge: Different customers may have document libraries with the same name but different purposes.

Migration approach

1. Extract all document_library entries
2. Deduplicate by name:
   - If the name is unique across customers → create one DocumentType
   - If the name is shared across customers with the same meaning → create one DocumentType
   - If the name is shared with different meanings → append a customer identifier,
     e.g., "W-9 Form" (shared) vs "Custom Policy Doc [Customer ABC]" (unique)
3. Record mapping: old document_library_id → new DocumentType.id
4. Use DocumentType to create DocRequirement entries

Dedup strategy: Start by assuming same name = same type (since document libraries represent generic document types like "W-9", "Certificate of Good Standing", etc.). Flag duplicates for post-migration review.
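A sketch of the name-based dedup under that assumption; get_or_create is the Django idiom, and log_issue is the hypothetical issues-CSV helper again:

def migrate_document_libraries(library_rows):
    """library_rows: dicts with id, customer_id, name. Returns old id -> DocumentType id."""
    type_map, seen = {}, {}
    for row in library_rows:
        doc_type, created = DocumentType.objects.get_or_create(name=row['name'])
        type_map[row['id']] = doc_type.id
        if not created and seen.get(row['name']) != row['customer_id']:
            # Same name reused by a different customer: merged, but flagged for review
            log_issue('document_library', row['id'],
                      f"name '{row['name']}' shared across customers; merged")
        seen.setdefault(row['name'], row['customer_id'])
    return type_map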

Decision #8: document_set → DocumentRiskProfile + DocRequirement

Risk Level: HIGH

DTX 1.0 uses:

- document_set: named requirement sets per customer (e.g., "Standard - Policy and Endorsement (w/ UMB) - Construction")
- document_requirements: junction linking sets to specific documents

DTX 2.0 uses:

- DocumentRiskProfile: named collection of required document types per account
- DocRequirement: links RequirementsGroup to DocumentType with a status

Structural difference: DTX 1.0 links sets to specific document instances. DTX 2.0 links profiles to document types (categories).

Migration approach

1. For each document_set:
   a. Create DocumentRiskProfile:
      - account = mapped Account (from customer_id)
      - name = document_set.name
      - description = document_set.description
   b. Find all documents linked via document_requirements junction
   c. For each linked document, get its document_library_id
   d. Map document_library_id → DocumentType (from Decision #7)
   e. Create DocumentRiskProfileDocumentType entries for each unique DocumentType
   f. Record mapping: old document_set_id → new DocumentRiskProfile.id
2. For each RequirementsGroup (from Decision #1):
   a. Link to the mapped DocumentRiskProfile
   b. For each DocumentType in the profile:
      - Create DocRequirement(requirements_group, document_type)
      - Set status based on document_status table data for this vendor/customer
3. For each document linked to a set:
   a. Find or create the DocRequirement for (RequirementsGroup, DocumentType)
   b. Create DocumentRequirementFulfillment(document, doc_requirement)
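A sketch of step 1, assuming set_documents is a prebuilt dict (document_set_id → list of document_library_ids) from the document_requirements junction, type_map comes from Decision #7, and the ordering field on the junction model is named order (an assumption):

def migrate_document_sets(set_rows, set_documents, account_map, type_map):
    profile_map = {}
    for row in set_rows:
        profile = DocumentRiskProfile.objects.create(
            account_id=account_map[row['customer_id']],
            name=row['name'],
            description=row.get('description') or '',
        )
        # One junction row per unique DocumentType, ordered as first encountered
        library_ids = set_documents.get(row['id'], [])
        type_ids = dict.fromkeys(type_map[lid] for lid in library_ids if lid in type_map)
        for position, type_id in enumerate(type_ids):
            DocumentRiskProfileDocumentType.objects.create(
                document_risk_profile=profile, document_type_id=type_id, order=position)
        profile_map[row['id']] = profile.id
    return profile_map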

Decision #9: Document Hierarchy (parent_document_id)

Risk Level: Low

DTX 1.0 document has a parent_document_id self-reference for hierarchical grouping. DTX 2.0's Document model is flat, with no parent reference.

Sample data observation: All 9 sample document records have parent_document_id=NULL. The full dataset may have children.

Migration approach

Flatten the hierarchy: migrate every document as a standalone Document. Where parent_document_id is set, record the parent reference in supervisor_notes and log the relationship to migration_issues.csv.

Decision #10: Replacement Certificates (replacement_id)

Risk Level: Low

DTX 1.0 certificate has a replacement_id FK to another certificate, tracking when one certificate replaces another. DTX 2.0 has no explicit replacement chain: certificates belong to RequirementsGroups, and current_evaluation tracks the latest evaluation.

Migration approach

Don't migrate the replacement chain. All certificates are migrated as historical records; within each RequirementsGroup, the latest certificate by create_time gets current_evaluation set (step 8 of the migration order).

Decision #11: Review Workflow Fields

Risk Level: Low

DTX 1.0 has detailed review workflow tracking:

- review_status, review_time, reviewer_id: general review state
- OpsDescReviewStatus, OpsDescReason, OpsDescCorrectedBy: operations description review
- HolderReviewStatus, HolderReason, HolderCorrectedBy: holder review
- override_reason: override justification

DTX 2.0 has a completely different review workflow (DocumentStatus enum with 13 states, ManualOverride model, AuditHistory snapshots).

Migration approach

Drop the review workflow fields; they capture DTX 1.0's process state, not certificate data. Where override_reason is set, create a ManualOverride record so the override justification survives.

Decision #12: DocuSign Template References

Risk Level: Low

DTX 1.0 document has a docusign_template_id field. DTX 2.0 has no DocuSign integration visible in the models.

Migration approach

Drop the field. Log the template IDs to the migration report for reference.

Decision #13: missing_documents Table

Risk Level: Low

The missing_documents table is a denormalized reporting/cache table tracking missing document counts per vendor. DTX 2.0 calculates compliance status dynamically from DocRequirement statuses.

Migration approach

Don't migrate. This is a cache table; DTX 2.0 recalculates compliance dynamically from DocRequirement statuses.

Decision #14: certificate_queue Table

Risk Level: Low

The certificate_queue table is a processing queue with GUIDs for distributed processing. It represents work-in-progress, not historical data.

Migration approach

Don't migrate. The queue holds transient work-in-progress, not historical data.

Decision #15: InsName vs InsName1

Risk Level: Low

DTX 1.0 certificate has both InsName (col 37) and InsName1 (col 38). On the ACORD form, InsName is typically the primary insured name and InsName1 is a DBA or second named insured line.

Migration approach

if InsName and InsName1 and InsName != InsName1:
    insured_name = f"{InsName} / {InsName1}"  # Combine both
elif InsName:
    insured_name = InsName
elif InsName1:
    insured_name = InsName1
else:
    insured_name = None  # Flag for review

Log all records where both are populated and different.

Decision #16: Carrier A-F Backfill (Area 5 Dependency)

Risk Level: Medium

DTX 2.0 Certificate has carrier_a_name through carrier_f_naic (12 fields) for the ACORD form's INSURER A-F section. However, DTX 1.0's certificate table has NO carrier columns; carrier/insurer information lives in the coverage tables (general_liability, automobile_liability, workers_comp, etc.), which are Area 5.

Migration approach

Leave the carrier A-F fields NULL during this area's migration. After Area 5 migrates the coverage tables, backfill them from the linked coverage records' insurer data.

Coverage model → Certificate linkage

Each DTX 2.0 coverage model (GeneralLiability, AutoLiability, etc.) has certificate = ForeignKey('Certificate'). During Area 5 migration, the coverage tables' data needs to link to Certificate records created here. Area 5 will consume the id_mapping_certificates.csv to establish these links.
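A sketch of the mapping export Area 5 consumes; the column names mirror the Output Files table below:

import csv

def export_certificate_mapping(rows, path='id_mapping_certificates.csv'):
    """rows: one dict per migrated certificate, keyed like the header below."""
    header = ['old_certificate_id', 'new_certificate_id',
              'requirements_group_id', 'evaluation_id']
    with open(path, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=header)
        writer.writeheader()
        writer.writerows(rows)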


Difficulty Assessment

Overall: HARD (hardest area so far)

This area has more structural transformation than any previous area. Areas 1-3 were mostly field-level mappings with occasional data parsing. Area 4 involves architectural changes: flat tables becoming hierarchical structures, instance-based references becoming type-based, and a new cross-cutting RequirementsGroup concept.

What makes it hard

| Challenge | Difficulty | Why |
|---|---|---|
| RequirementsGroup creation | Hard | Cross-cutting concept that doesn't exist in DTX 1.0. Needs careful coordination with Areas 5 & 6. Phased population (coi_risk_profile filled later). |
| certificate_status → 3-tier Evaluation | Hard | Structural transformation from flat rows to a 3-level hierarchy. field_name → coverage_type mapping is guesswork without full data. |
| File migration to S3 | Hard (operationally) | Depends on EFS access. Potentially 100K+ files / TB of data. S3 upload bandwidth, content hashing, UID generation at scale. Multi-day operation. |
| document_set → DocumentRiskProfile | Hard | Paradigm shift: DTX 1.0 links sets to document instances, DTX 2.0 links profiles to document types (categories). Requires inference through the junction table. |
| Carrier A-F backfill | Medium | Deferred to Area 5; adds cross-area coordination complexity |
| DocumentType deduplication | Medium | Per-customer → global. The same name may mean different things for different customers. |
| 14-step migration order | Medium | Most steps of any area. Ordering constraints between steps. |
| Status mappings | Easy | Straightforward value mapping, but full data is needed to validate |
| Contact consolidation | Easy | Formulaic concatenation |
| Drop decisions (5 tables) | Easy | Just don't migrate |

What makes it doable

  1. DTX 2.0 models are well-designed: the target schema is clean and consistent
  2. Most certificate fields map directly: agent, insured, and holder fields have clear targets
  3. The 5 "don't migrate" decisions reduce scope significantly (certificate_queue, missing_documents, review workflow, DocuSign, replacement chains)
  4. Defensive strategy handles unknowns: dry-run mode catches problems before they become data corruption
  5. Phased linking works: NULL FKs during migration, backfilled later, is a proven pattern

Estimated volume (actual storage sizes from DTX 1.0 Data Sources and File Size & Count.xlsx)

| Item | Estimate | Source |
|---|---|---|
| Certificate files | 657.5 GB | EFS /efsdata/attachments/certificate_management/ |
| Document files | 1.3 TB | EFS /efsdata/attachments/document_management/ |
| Repo files | 258 GB | EFS /efsdata/filerepo/ (Area 11, shared migration) |
| Logos | 200 MB | EFS /efsdata/logos/ (Area 2) |
| Total file migration | ~2.2 TB | |
| Certificates (records) | 50K-200K | Accumulate over years, multiple per vendor |
| Certificate status rows | 200K-800K | ~4 rows per certificate |
| Documents (records) | 10K-50K | Fewer than certificates |
| DocumentTypes | 50-200 | After deduplication |
| DocumentRiskProfiles | 500-2K | Per-customer document sets |
| RequirementsGroups | 5K-50K | Unique (customer, req, doc_set) combos |
| DTX 2.0 records created | 300K-1.5M | Certificates + evaluations + documents + fulfillments |

Risk summary

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| 2.2 TB file transfer takes too long | Medium | High | Same-region EFS → S3 transfer, parallel uploads, weekend window. At 100 MB/s: ~6 hrs. |
| Legacy path formats (3 generations) | High | Medium | Script handles all 3 prefixes. repo_file.repo_filepath is the source of truth. |
| Missing files on EFS | Medium | High | Create DB records without files, flag in the issues CSV. Don't block the migration. |
| field_name values don't match prefix mapping | Medium | Medium | Dry run logs all unique values; fall back to OtherCoverage |
| DocumentType name collisions | Low | Medium | Dedup by name, flag for post-migration review |
| RequirementsGroup combos explode | Low | Medium | Dry run reports the count before creating |
| Coverage → Certificate linking breaks | Low | High | ID mapping CSV ensures traceability |
| Duplicate files waste S3 space | Medium | Low | Dedup by content_hash: upload once, reference many. Could save 10-30% of storage. |

Defensive Migration Strategy

Dry-Run Mode

Before any data is written, the script runs in dry-run mode and produces a validation report:

=== DTX 1.0 Certificates & Documents Migration - Dry Run Report ===

Certificates:
  Total records:              ???
  Status distribution:        FINISHED: ???, PENDING: ???, other: ???
  With attachment:            ???
  Without attachment:         ???
  With replacement_id:        ???
  With requirements_id:       ???
  With document_set_id:       ???

Certificate Status (compliance checks):
  Total rows:                 ???
  Unique certificates:        ???
  Unique field_names:         [list all]
  COMPLIANT count:            ???
  Non-COMPLIANT count:        ???

Documents:
  Total records:              ???
  Status distribution:        0: ???, other: ???
  With attachment:            ???
  With parent_document_id:    ???
  With docusign_template_id:  ???

Document Libraries:
  Total entries:              ???
  Unique names:               ???
  Duplicate names:            ???

Document Sets:
  Total entries:              ???
  Requirements per set:       min: ???, max: ???, avg: ???

Attachments:
  Total files:                ???
  With certificate_id:        ???
  With document_id:           ???
  Total file size:            ??? GB
  File types:                 [distribution]

RequirementsGroups to create:
  Unique (customer, requirements, doc_set) combos: ???

Issues flagged:               ???
Ready to migrate:             ???
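A sketch of how the dry run fills one block of that report, assuming read-only DB-API cursor access to DTX 1.0 (table and column names as documented above):

from collections import Counter

def dry_run_certificates(cursor):
    cursor.execute("SELECT certificate_status, attachment_id FROM certificate")
    statuses, with_file = Counter(), 0
    for status, attachment_id in cursor.fetchall():
        statuses[status] += 1
        with_file += attachment_id is not None
    total = sum(statuses.values())
    print(f"  Total records:       {total}")
    print(f"  Status distribution: {dict(statuses)}")
    print(f"  With attachment:     {with_file}")
    print(f"  Without attachment:  {total - with_file}")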

Error Handling

| Scenario | Behavior |
|---|---|
| Attachment file not found on EFS | Create record without file, flag in issues CSV |
| Unknown certificate_status value | Default to manual_review, log |
| Unknown document_status value | Default to new, log |
| field_name doesn't map to a coverage_type | Create as OtherCoverage, log |
| Duplicate DocumentType name | Merge (same name = same type), log |
| Orphan document_library_id | Create DocumentType, log as potentially orphaned |
| Missing customer_id in mapping | Skip record, log to issues |
| Missing vendor_id in mapping | Skip record, log to issues |
| certificate_status rows without a matching certificate | Log to issues, skip |

Output Files

| File | Purpose |
|---|---|
| id_mapping_certificates.csv | old_certificate_id → new_certificate_id → requirements_group_id → evaluation_id |
| id_mapping_documents.csv | old_document_id → new_document_id → doc_requirement_id |
| id_mapping_doc_types.csv | old_document_library_id → new_document_type_id |
| id_mapping_doc_profiles.csv | old_document_set_id → new_document_risk_profile_id |
| id_mapping_req_groups.csv | (customer_id, requirements_id, document_set_id) → new_requirements_group_id |
| migration_issues.csv | All flagged records with issue description |
| migration_report.txt | Summary statistics and validation results |

Verification Steps (post-migration)

  1. Count check: DTX 1.0 certificate count == DTX 2.0 Certificate count (minus skipped)
  2. Count check: DTX 1.0 document count == DTX 2.0 Document count (minus skipped)
  3. Evaluation check: every certificate with certificate_status rows has an Evaluation
  4. File check: every Certificate/Document with an attachment has a file in S3
  5. RequirementsGroup check: every Certificate has a RequirementsGroup
  6. DocRequirement check: every Document has a DocRequirement or is flagged as orphaned
  7. ID mapping completeness: every certificate_id and document_id used in other tables exists in mapping
  8. Spot-check: compare 10 random certificates' field values against originals
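A sketch of check 1, assuming a DB-API cursor on DTX 1.0, the Django ORM on DTX 2.0, and a skipped_count tallied from migration_issues.csv:

def verify_certificate_counts(dtx1_cursor, skipped_count):
    dtx1_cursor.execute("SELECT COUNT(*) FROM certificate")
    source_total = dtx1_cursor.fetchone()[0]
    migrated_total = Certificate.objects.count()
    expected = source_total - skipped_count
    assert migrated_total == expected, (
        f"Certificate count mismatch: expected {expected}, found {migrated_total}")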

Migration Order (within this area)

1. Validate and report (dry-run)
2. Create DocumentType records (from document_library; no dependencies)
3. Create DocumentRiskProfile + DocumentRiskProfileDocumentType records (from document_set)
4. Create RequirementsGroup records (from unique certificate combinations)
5. Migrate attachment files to S3 (parallel, batch)
6. Create Certificate records (with file references, agent/insured/holder mapping)
7. Create Evaluation + EvaluationCoverage + EvaluationDetail records (from certificate_status)
8. Set Certificate.current_evaluation on latest certificates
9. Create Document records (with file references)
10. Create DocRequirement records (from document_requirements + DocumentType mapping)
11. Create DocumentRequirementFulfillment records (link documents to requirements)
12. Create AuditHistory snapshots (optional, for the compliance trail)
13. Generate and export all ID mapping tables
14. Verify

Dependencies

| Depends On | What's Needed |
|---|---|
| Area 2: Customers → Accounts | Account IDs for RequirementsGroup, DocumentRiskProfile |
| Area 3: Vendors → Records | Record IDs for Document.uploaded_by, Assignment links |

| Depended On By | For |
|---|---|
| Area 5: Coverage / Insurance | Coverage models link to Certificate and COIRiskProfile |
| Area 6: Requirements & Compliance | RequirementsGroup.coi_risk_profile (backfill) |
| Area 7: Extraction / OCR | OCR data linked to certificates |
| Area 8: Notifications | Notification triggers from certificate/document events |

Decision Summary

| # | Decision | Status | Approach |
|---|---|---|---|
| 1 | RequirementsGroup creation | Resolved | Create from unique (customer, requirements, doc_set) combos; phase the coi_risk_profile link |
| 2 | Certificate status mapping | Resolved | FINISHED → evaluated, PENDING → manual_review, unknown → manual_review |
| 3 | Contact field consolidation | Resolved | Concatenate phone/fax/email into text fields; holder extras → notes |
| 4 | certificate_status → Evaluation | Resolved | Group by certificate, map field_name prefix to coverage_type, 3-tier creation |
| 5 | Attachment → S3 file migration | Resolved | Upload to S3, compute content_hash/file_uid, handle missing files gracefully |
| 6 | Document status mapping | Resolved | Numeric → DocumentStatus enum, NOT_PROVIDED → new, unknown → new |
| 7 | document_library → DocumentType | Resolved | Deduplicate by name, assume same name = same type, flag duplicates |
| 8 | document_set → DocumentRiskProfile | Resolved | Direct map, infer document types from the junction, create DocRequirement per type |
| 9 | Document hierarchy | Resolved | Flatten, store parent reference in supervisor_notes, log relationships |
| 10 | Replacement certificates | Resolved | Don't migrate the chain; latest by create_time gets current_evaluation |
| 11 | Review workflow fields | Resolved | Drop (historical); migrate override_reason → ManualOverride if set |
| 12 | DocuSign template references | Resolved | Drop, log for reference |
| 13 | missing_documents table | Resolved | Don't migrate: cache table, DTX 2.0 recalculates |
| 14 | certificate_queue table | Resolved | Don't migrate: transient processing queue |
| 15 | InsName vs InsName1 | Resolved | Combine with " / " separator if both populated and different |
| 16 | Carrier A-F backfill | Resolved | Leave NULL, backfill after Area 5 from coverage records |