Migrating certificates (COI) and documents from the denormalized 69+ field certificate table to structured models. Includes file migration from EFS to S3 and building the 3-tier evaluation pipeline.
Migration of certificate and document data from DTX 1.0 to DTX 2.0. This is Phase 3: it depends on Accounts (Area 2) and Vendors/Records (Area 3), and feeds into Coverage/Insurance (Area 5) and Requirements (Area 6).
The DTX 1.0 certificate table is a massive denormalized table (69+ fields) containing the full ACORD certificate of insurance data: producer, insured, holder contacts, review workflow, and coverage references all in one row. DTX 2.0 splits this into a cleaner Certificate model with related coverage models and a 3-tier evaluation system.
Important: Full production data is not available. Migration scripts must be built defensively with dry-run mode, self-validation, and graceful error handling.
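To make those properties concrete, here is a minimal sketch of a defensive harness the migration steps could run through. The flag name, issue schema, and file handling are illustrative, not an existing CLI; only `migration_issues.csv` comes from this plan.

```python
import argparse
import csv

def main() -> None:
    parser = argparse.ArgumentParser(
        description="DTX 1.0 -> 2.0 certificate/document migration")
    parser.add_argument("--dry-run", action="store_true",
                        help="validate and report only; write nothing to DTX 2.0")
    args = parser.parse_args()

    issues: list[dict] = []  # steps append here instead of raising on bad rows
    # ... run the migration steps, passing args.dry_run and issues ...

    with open("migration_issues.csv", "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["table", "record_id", "issue"])
        writer.writeheader()
        writer.writerows(issues)

if __name__ == "__main__":
    main()
```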
| Table | Purpose |
|---|---|
| certificate | Main COI record (69+ fields: producer, insured, holder, status, review) |
| certificate_status | Per-field compliance checks (status_code e.g. COMPLIANT, field_name, minimum, actual) |
| certificate_queue | Processing queue with GUIDs |
| document | Non-COI documents (23 fields: library refs, review status, DocuSign templates) |
| document_status | Vendor × document status tracking (NOT_PROVIDED, etc.) |
| document_library | Document template definitions per customer |
| document_requirements | Junction table: document_set × document |
| document_set | Named document requirement sets per customer |
| attachment | File attachments linked to certificates or documents |
| missing_documents | Aggregate tracking of missing docs per vendor |
| Model | App | Purpose |
|---|---|---|
| Certificate | docutrax_profiles | COI record (agent, insured, holder, carrier A-F, status, file) |
| Document | docutrax_profiles | Uploaded document with file, status, content hash |
| DocumentType | docutrax_profiles | Global document type definitions (unique name) |
| DocRequirement | docutrax_profiles | Links RequirementsGroup to DocumentType (what's required) |
| DocumentRequirementFulfillment | docutrax_profiles | Junction: Document × DocRequirement (what's fulfilled) |
| DocumentRiskProfile | docutrax_profiles | Named collection of required document types per account |
| DocumentRiskProfileDocumentType | docutrax_profiles | Junction: profile × document type (with ordering) |
| Evaluation | docutrax_profiles | Certificate evaluation result (pass/fail/override/pending) |
| EvaluationCoverage | docutrax_profiles | Coverage-level evaluation result |
| EvaluationDetail | docutrax_profiles | Individual requirement evaluation result |
| RequirementsGroup | docutrax_hierarchy | Ties together COI profile + document profile + named entity |
| AuditHistory | docutrax_profiles | JSON snapshot of evaluations for compliance trail |
Each DTX 1.0 certificate row becomes 2-5 DTX 2.0 records:
1. Certificate (core fields, agent/insured/holder, carrier info)
2. Evaluation (from certificate_status rows: pass/fail result)
3. EvaluationCoverage (one per coverage type found in certificate_status)
4. EvaluationDetail (one per field_name in certificate_status)
5. AuditHistory (optional snapshot for compliance)
Each DTX 1.0 document row becomes 1-3 DTX 2.0 records:
1. Document (core fields, file, status)
2. DocumentRequirementFulfillment (links document to its DocRequirement)
3. AuditHistory (optional snapshot for compliance)
| DTX 1.0 (certificate) | DTX 2.0 (Certificate) | Notes |
|---|---|---|
| CertNo | cert_number | Direct |
| certificate_expiration | expiration_date | Direct |
| OpsDesc | ops_desc | Direct |
| AuthRepName | auth_rep_name | Direct |
| AuthDate | auth_date | Direct |
| create_time | created (TimestampedModel) | Direct |
| update_time | modified (TimestampedModel) | Direct |
| DTX 1.0 | DTX 2.0 | Notes |
|---|---|---|
| ProdName | agent_name | Direct |
| ProdTitle | agent_title | Direct |
| ProdContact | agent_contact | Concatenate with phone/fax/email (see Decision #3) |
| ProdAddr1 + ProdAddr2 + ProdCity + ProdState + ProdZip | agent_address | Concatenate into single text field |
| ProdPhone, ProdPhoneExtension, ProdFax, ProdEmail | agent_contact | Pack into agent_contact (see Decision #3) |
| DTX 1.0 | DTX 2.0 | Notes |
|---|---|---|
| InsName | insured_name | Direct |
| InsName1 | insured_name | Append to insured_name if different (see Decision #15) |
| InsAddr1 | insured_address1 | Direct |
| InsAddr2 | insured_address2 | Direct |
| InsCity | insured_city | Direct |
| InsState | insured_state | Direct |
| InsZip | insured_zip | Direct |
| InsContact | insured_contact | Concatenate with phone/fax/email |
| InsTitle | insured_title | Direct |
| InsPhone, InsPhoneExtension, InsFax, InsMobilePhone, InsEmail | insured_contact | Pack into insured_contact |
| DTX 1.0 | DTX 2.0 | Notes |
|---|---|---|
| HolderName | certificate_holder_name | Direct |
| HolderAddr1 | certificate_holder_address1 | Direct |
| HolderAddr2 | certificate_holder_address2 | Direct |
| HolderCity | certificate_holder_city | Direct |
| HolderState | certificate_holder_state | Direct |
| HolderZip | certificate_holder_zip | Direct |
| HolderContact, HolderTitle, HolderPhone, etc. | (no direct field) | See Decision #3 |
| DTX 1.0 | Maps To | Notes |
|---|---|---|
| customer_id | RequirementsGroup.account | Via Account mapping from Area 2 |
| vendor_id | RequirementsGroup → Assignment | Via Record mapping from Area 3 |
| requirements_id | RequirementsGroup.coi_risk_profile | Via Area 6 migration (deferred) |
| document_set_id | RequirementsGroup.document_risk_profile | Via this area's document_set migration |
| replacement_id | (dropped) | See Decision #10 |
| broker_id | (deferred) | Area 9: store as reference |
| attachment_id | Certificate.file | File migration to S3 (see Decision #5) |
| reviewer_id | (dropped) | Review workflow not migrated |
| user_id | OwnershipModel.created_by | Via User mapping from Area 1 |
| DTX 1.0 | Reason |
|---|---|
| jws_Certificate_ID | Legacy integration ID, not relevant |
| page_name, starting_page | OCR-related (Area 7) |
| ocr | OCR flag (Area 7) |
| CertRevNo | Revision number, no equivalent in DTX 2.0; store in notes if needed |
| review_status, review_time | Review workflow; DTX 2.0 has a different workflow |
| OpsDescReviewStatus, OpsDescReason, OpsDescCorrectedBy | Review sub-workflow (see Decision #11) |
| HolderReviewStatus, HolderReason, HolderCorrectedBy | Review sub-workflow (see Decision #11) |
| override_reason | Old override system; DTX 2.0 uses ManualOverride model |
| AMBestRatingOK | Migrates to CarrierRating in Area 5 (Coverage) |
| WCExempt | Migrates to WorkersCompensation in Area 5 |
| proper_ph_name, proper_ch_name | Entity-level, already mapped in Areas 2/3 |
| Customer1, Customer2, Customer3 | Unknown purpose; flag for review with full data |
| final_pol_expiration | Calculated from coverage expirations; DTX 2.0 recalculates |
| certificate_status | See Decision #2 (maps to Certificate.status) |
| AuthRepAffirm | Yes/no affirmation flag, no DTX 2.0 equivalent; drop |
| DTX 1.0 (document) | DTX 2.0 (Document) | Notes |
|---|---|---|
| name | name | Direct |
| expiration_date | expiry_date | Direct |
| comments | supervisor_notes | Direct |
| create_time | created (TimestampedModel) | Direct |
| update_time | modified (TimestampedModel) | Direct |
| attachment_id | file | File migration to S3 |
| document_status | status | See Decision #6 |
| vendor_id | uploaded_by | Via Record mapping from Area 3 |
| customer_id | via RequirementsGroup.account | Via Account mapping from Area 2 |
| document_library_id | via DocRequirement.document_type | See Decision #7 |
| reviewer_id | (dropped) | Review workflow not migrated |
| user_id | OwnershipModel.created_by | Via User mapping from Area 1 |
| docusign_template_id | (dropped) | See Decision #12 |
| parent_document_id | (dropped) | See Decision #9 |
| source_coi, source_broker, source_insured | (dropped) | Origin metadata, no equivalent |
| send_sample | (dropped) | Configuration flag, no equivalent |
| next_review_level, review_status, review_status_date, review_period | (dropped) | Old review workflow |
Risk Level: HIGH
DTX 2.0 introduces RequirementsGroup as a central organizing concept that ties together:
- COIRiskProfile (what insurance is required)
- DocumentRiskProfile (what documents are required)
- NamedEntity (who should be on certificates)
In DTX 1.0, these are separate FK references on different tables:
- certificate.requirements_id → requirements table (Area 6 → COIRiskProfile)
- certificate.document_set_id → document_set table (this area → DocumentRiskProfile)
- No explicit named entity concept
1. Build a lookup of all unique (customer_id, requirements_id, document_set_id)
combinations found in BOTH the certificate table AND the vendor table
(vendors may have requirements without certificates)
2. For each unique combination:
a. Look up the Account (from Area 2 mapping)
   b. Look up the COIRiskProfile (from Area 6 mapping; may be NULL initially)
c. Look up the DocumentRiskProfile (from this area's document_set mapping)
d. Create RequirementsGroup with these links
3. Record mapping: (customer_id, requirements_id, document_set_id) → requirements_group_id
4. When migrating certificates:
- Look up the RequirementsGroup for each certificate's combination
- Set Certificate.requirements_group = found group
5. When migrating vendor Assignments:
- Create RequirementsGroupMember linking Assignment to RequirementsGroup
Note: RequirementsGroups that reference requirements_id will have coi_risk_profile=NULL until Area 6 migration fills them in. The migration script must support this phased linking.
Cross-area linkage: After creating RequirementsGroups, this step also creates RequirementsGroupMember records linking each vendor's Assignment (created in Area 3) to its RequirementsGroup. The lookup key is: vendor's (customer_id, requirements_id, document_set_id) → RequirementsGroup → RequirementsGroupMember(assignment=vendor's Assignment).
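A sketch of steps 1-4, assuming the ID maps from Areas 2/3/6 arrive as plain dicts and the source rows as dicts. Model and field names follow the model table above; everything else (function name, map names) is hypothetical.

```python
from docutrax_hierarchy.models import RequirementsGroup  # app per the model table

def build_requirements_groups(cert_rows, vendor_rows, account_map,
                              coi_profile_map, doc_profile_map):
    # Step 1: unique combos from BOTH the certificate and vendor tables
    combos = {
        (r["customer_id"], r["requirements_id"], r["document_set_id"])
        for r in [*cert_rows, *vendor_rows]
    }
    group_map = {}
    for customer_id, requirements_id, document_set_id in combos:
        group = RequirementsGroup.objects.create(
            account_id=account_map[customer_id],
            # stays NULL until the Area 6 migration backfills the COI profile
            coi_risk_profile_id=coi_profile_map.get(requirements_id),
            document_risk_profile_id=doc_profile_map.get(document_set_id),
        )
        group_map[(customer_id, requirements_id, document_set_id)] = group.id
    return group_map  # step 3: persisted to id_mapping_req_groups.csv
```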
Risk Level: Medium
DTX 1.0 has two status fields:
- certificate_status (e.g., "FINISHED"): processing status
- review_status: review workflow status
DTX 2.0 has:
```python
class CertificateStatus(TextChoices):
    READY = 'ready'                  # Parsing succeeded
    MANUAL_REVIEW = 'manual_review'  # Parsing failed, needs manual entry
    EVALUATED = 'evaluated'          # Has been evaluated
```
| DTX 1.0 certificate_status | DTX 2.0 Certificate.status | Notes |
|---|---|---|
| FINISHED | evaluated | Certificate was processed and evaluated |
| PENDING | manual_review | Awaiting review; closest match |
| PROCESSING | ready | Being processed; treat as ready |
| Unknown values | manual_review | Flag for review, safe default |
All migrated certificates are historical data, so evaluated is the most appropriate status for any certificate that was fully processed in DTX 1.0. Use manual_review as fallback.
Cannot fully validate without full dataset. The migration script will log the distribution of certificate_status values.
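As a sketch, the mapping plus logged fallback could be a small helper. Status values come from the table above; the `log` callable is a placeholder for the issue logger.

```python
CERT_STATUS_MAP = {
    "FINISHED": "evaluated",
    "PENDING": "manual_review",
    "PROCESSING": "ready",
}

def map_certificate_status(old_status: str, log) -> str:
    mapped = CERT_STATUS_MAP.get(old_status)
    if mapped is None:
        log(f"unknown certificate_status {old_status!r}; defaulting to manual_review")
        return "manual_review"  # safe default, flagged for review
    return mapped
```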
Risk Level: Low
DTX 1.0 has separate fields for each contact's phone, fax, email, extension. DTX 2.0 has fewer, wider text fields.
DTX 2.0 has only 4 agent fields (agent_name, agent_title, agent_contact, agent_address).
```python
# agent_address = concatenated address
agent_address = f"{ProdAddr1}\n{ProdAddr2}\n{ProdCity}, {ProdState} {ProdZip}".strip()

# agent_contact = concatenated contact info
parts = []
if ProdContact: parts.append(f"Contact: {ProdContact}")
if ProdPhone:
    phone = ProdPhone
    if ProdPhoneExtension: phone += f" x{ProdPhoneExtension}"
    parts.append(f"Phone: {phone}")
if ProdFax: parts.append(f"Fax: {ProdFax}")
if ProdEmail: parts.append(f"Email: {ProdEmail}")
agent_contact = " | ".join(parts)
```
DTX 2.0 has separate address fields (insured_address1, city, state, zip): direct mapping. Phone/fax/email are packed into insured_contact:
```python
parts = []
if InsContact: parts.append(f"Contact: {InsContact}")
if InsPhone:
    phone = InsPhone
    if InsPhoneExtension: phone += f" x{InsPhoneExtension}"
    parts.append(f"Phone: {phone}")
if InsFax: parts.append(f"Fax: {InsFax}")
if InsMobilePhone: parts.append(f"Mobile: {InsMobilePhone}")
if InsEmail: parts.append(f"Email: {InsEmail}")
insured_contact = " | ".join(parts)
```
DTX 2.0 has separate address fields: direct mapping. Holder contact info (phone, fax, email) has no direct target field. Store in Certificate.notes with a prefix:
```python
holder_extras = []
if HolderContact: holder_extras.append(f"Contact: {HolderContact}")
if HolderTitle: holder_extras.append(f"Title: {HolderTitle}")
if HolderPhone: holder_extras.append(f"Phone: {HolderPhone}")
if HolderFax: holder_extras.append(f"Fax: {HolderFax}")
if HolderEmail: holder_extras.append(f"Email: {HolderEmail}")
if holder_extras:
    notes = f"[Migrated Holder Contact] {' | '.join(holder_extras)}"
```
Risk Level: HIGH
DTX 1.0's certificate_status table stores per-field compliance checks:
```
certificate_id | status_code | field_name     | minimum    | actual
5215           | COMPLIANT   | UMBPolExpDate  | 2018-04-01 | 2019-04-04
5215           | COMPLIANT   | WCPolExpDate   | 2018-04-01 | 2019-06-28
5215           | COMPLIANT   | AutoPolExpDate | 2018-04-01 | 2019-04-04
5215           | COMPLIANT   | GLPolExpDate   | 2018-04-01 | 2019-02-06
```
DTX 2.0 has a 3-tier evaluation structure:
```
Evaluation (overall: pass/fail/override/pending)
└── EvaluationCoverage (per coverage type: GeneralLiability, Auto, etc.)
    └── EvaluationDetail (per requirement: limit amounts, dates, endorsements)
```
1. Group certificate_status rows by certificate_id
2. For each certificate's status rows:
a. Create Evaluation:
- certificate = migrated Certificate
- requirements_group = Certificate's RequirementsGroup
- status = 'pass' if ALL rows are COMPLIANT, else 'fail'
- evaluation_date = min(create_time) of status rows
b. Map field_name to coverage_type:
      - GLPolExpDate → "GeneralLiability"
      - AutoPolExpDate → "AutoLiability"
      - WCPolExpDate → "WorkersCompensation"
      - UMBPolExpDate → "UmbrellaLiability"
      - Other field_names → log and map based on prefix
c. Create EvaluationCoverage per coverage_type:
- coverage_type = mapped type
- status = 'pass' if all rows for this type are COMPLIANT, else 'fail'
d. Create EvaluationDetail per field_name:
- evaluation_coverage = parent coverage
- requirement_name = field_name (human-readable)
- status = 'pass' if COMPLIANT, else 'fail'
- required_value = minimum
- provided_value = actual
- field_key = field_name (raw)
e. Set Certificate.current_evaluation = created Evaluation
| field_name prefix | coverage_type |
|---|---|
| GL* | GeneralLiability |
| Auto* | AutoLiability |
| WC* | WorkersCompensation |
| UMB* | UmbrellaLiability |
| LL* | LiquorLiability |
| Prop* | PropertyCoverage |
| Unknown | Log to issues, create as OtherCoverage |
Cannot fully validate mapping without full dataset. The migration script will log all unique field_name values and their frequency.
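A sketch of the prefix lookup with the OtherCoverage fallback. The prefixes come from the table above; the function name and `log` parameter are placeholders.

```python
COVERAGE_PREFIXES = [
    ("GL", "GeneralLiability"),
    ("Auto", "AutoLiability"),
    ("WC", "WorkersCompensation"),
    ("UMB", "UmbrellaLiability"),
    ("LL", "LiquorLiability"),
    ("Prop", "PropertyCoverage"),
]

def coverage_type_for(field_name: str, log) -> str:
    for prefix, coverage in COVERAGE_PREFIXES:
        if field_name.startswith(prefix):
            return coverage
    log(f"unmapped field_name {field_name!r}; using OtherCoverage")
    return "OtherCoverage"
```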
Risk Level: CRITICAL
This is the single largest operational task in the entire migration: roughly 2.2 TB of files need to move from DTX 1.0's EFS to DTX 2.0's S3.
(Source: Docutrax EFS paths.docx and DTX 1.0 Data Sources and File Size & Count.xlsx)

| Storage Area | EFS Path | Size | Path Pattern |
|---|---|---|---|
| Certificates | /efsdata/attachments/certificate_management/ | 657.5 GB | `<certificate_id>/<filename>` |
| Documents | /efsdata/attachments/document_management/ | 1.3 TB | `<customer_id>/<document_id>/<filename>` |
| File Repository | /efsdata/filerepo/ | 258 GB | `<customer_id>/<repo_file_id>/<filename>` |
| Logos | /efsdata/logos/ | 200 MB | `<filename>` |
Total: ~2.22 TB (excluding CloudWatch/SFTP)
Files may exist under any of these path prefixes depending on when they were created:
```
Generation 1: /home/wwwriskt/public_html/attachments/...
Generation 2: /home/wwwriskt/public_html/www1/attachments/...
Generation 3: /efsdata/attachments/...
```
The repo_file table has the actual repo_filepath which tells us which generation each file uses. For attachment records without a full path, construct from the EFS pattern.
```
repo_file.repo_filepath → actual file path (most reliable)
        ↓
repo_file.attachment_id → attachment table
        ↓
attachment.certificate_id OR attachment.document_id → target record
```
For files with a repo_file record, use repo_filepath directly.
For files with only an attachment record, construct the path:
- Certificate: /efsdata/attachments/certificate_management/<certificate_id>/<attachment.name>
- Document: /efsdata/attachments/document_management/<customer_id>/<document_id>/<attachment.name>
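A sketch of that path resolution: try repo_filepath first, then construct candidates under each generation prefix. Function and parameter names are illustrative.

```python
from pathlib import Path

PATH_GENERATIONS = [
    "/efsdata/attachments",                         # Generation 3 (current)
    "/home/wwwriskt/public_html/www1/attachments",  # Generation 2
    "/home/wwwriskt/public_html/attachments",       # Generation 1
]

def resolve_attachment_path(kind, ids, filename, repo_filepath=None):
    """kind: 'certificate_management' or 'document_management';
    ids: (certificate_id,) or (customer_id, document_id)."""
    if repo_filepath and Path(repo_filepath).is_file():
        return Path(repo_filepath)  # repo_file.repo_filepath is most reliable
    for prefix in PATH_GENERATIONS:
        candidate = Path(prefix, kind, *map(str, ids), filename)
        if candidate.is_file():
            return candidate
    return None  # missing file: flag in migration_issues.csv, don't block
```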
DTX 2.0 uses org_prefixed_upload_path() with SHA-256 content hashing and file_uid (compound unique ID: {content_hash}_{timestamp}).
Phase 1: Inventory (dry-run)
1. Scan all repo_file records β extract repo_filepath, verify file exists
2. Scan all attachment records β construct path, verify file exists
3. Report:
- Total files found vs expected
- Missing files (path exists in DB but file not on disk)
- Orphan files (file on disk but no DB record)
- Size distribution and total
- Duplicate files (same content_hash)
Phase 2: Upload (batched, resumable)
1. Process files in batches of 1000
2. For each file:
a. Read file from EFS
b. Compute SHA-256 content_hash
c. Generate file_uid: {content_hash}_{YYYYMMDDHHmmssffffff}
d. Upload to S3 using org_prefixed_upload_path
   e. Record: old_path → S3_key → content_hash → file_uid
3. Checkpoint after each batch (save progress to resume on failure)
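A sketch of the Phase 2 loop under stated assumptions: boto3 for uploads, a local JSON file for checkpoints, and a `key_for` callable standing in for DTX 2.0's org_prefixed_upload_path(). All names besides the boto3 calls are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")
CONFIG = TransferConfig(max_concurrency=10)  # parallel uploads (see Risks below)
CHECKPOINT = Path("upload_checkpoint.json")  # resumability state

def sha256_of(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):  # stream 1 MiB chunks
            h.update(chunk)
    return h.hexdigest()

def upload_batch(bucket: str, files: list[Path], key_for) -> dict:
    """Upload one batch; returns {old_path: (s3_key, content_hash, file_uid)}."""
    done = json.loads(CHECKPOINT.read_text()) if CHECKPOINT.exists() else {}
    for path in files:
        if str(path) in done:
            continue  # resume: already uploaded in a previous run
        content_hash = sha256_of(path)
        stamp = datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S%f")
        file_uid = f"{content_hash}_{stamp}"
        key = key_for(path, content_hash)  # stand-in for org_prefixed_upload_path
        s3.upload_file(str(path), bucket, key, Config=CONFIG)
        done[str(path)] = (key, content_hash, file_uid)
    CHECKPOINT.write_text(json.dumps(done))  # checkpoint after the batch
    return done
```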
Phase 3: Link
1. For certificate files:
- Set Certificate.file = S3 key
- Set Certificate.original_filename = attachment.name
- Set Certificate.content_hash = computed hash
- Set Certificate.file_uid = generated UID
2. For document files:
- Set Document.file = S3 key
- Set Document.original_filename = attachment.name
- Set Document.content_hash = computed hash
- Set Document.file_uid = generated UID
3. For repo files (Area 11):
- Store S3 key in mapping for Area 11 migration
4. For logos (200 MB in `/efsdata/logos/`):
- Upload to S3, set Account.logo = S3 key (from Area 2 mapping)
- Note: Area 2 maps the field; this step does the actual S3 upload
| Concern | Plan |
|---|---|
| 2.2 TB transfer time | At 100 MB/s sustained: ~6 hours. At 25 MB/s: ~24 hours. Plan for a weekend migration window. |
| EFS → S3 same region | If both are in the same AWS region, use internal transfer (no egress cost, faster). |
| Resumability | Checkpoint after each batch. If script crashes at batch 500 of 1000, restart from batch 501. |
| Duplicate files | Same content_hash = same file. Upload once, reference multiple times. Could save significant space. |
| Missing files | Create DB record without file, flag in migration_issues.csv. Don't block the migration. |
| File integrity | After upload, compare S3 ETag (MD5) or re-download and verify content_hash matches. |
| Parallel uploads | Use boto3 with TransferConfig(max_concurrency=10) for parallel S3 uploads within each batch. |
| Cost | S3 Standard: ~$0.023/GB/month, so 2.2 TB ≈ $50/month. PUT requests: ~$0.005 per 1000 requests. |
The repo_file table (cross-reference with Area 11): repo_file belongs to Area 11 (File Repository & Misc) but is central to file migration. Key fields:

| Field | Purpose |
|---|---|
| repo_filepath | Full path to file on EFS; most reliable source of truth |
| treatment | Processing type: COI_MANUAL, COI_OCR, DOC |
| status | SUBMITTED, DELETED, MANUAL_ENTRY, IN_REVIEW, etc. |
| upload_src | Upload source: DnD (drag-drop), Broker, Email |
| certificate_id | Link to certificate (if COI) |
| document_id | Link to document (if non-COI) |
| attachment_id | Link to attachment record |
Decision: File migration (the S3 upload) should be a shared utility used by both Area 4 and Area 11. Area 4 migrates certificate and document files. Area 11 migrates remaining repo files. Both use the same upload/hash/UID logic.
Dependency: Requires read access to the DTX 1.0 EFS mount. If EFS is not mountable, an alternative is `aws s3 sync` from the EFS volume's EC2 instance, then processing from a staging S3 bucket.
Risk Level: Medium
DTX 1.0 document.document_status appears to use numeric codes (sample shows 0). The document_status table uses string statuses like NOT_PROVIDED.
DTX 2.0 uses a detailed DocumentStatus enum with 13 values representing a document workflow.
| DTX 1.0 | DTX 2.0 DocumentStatus | Notes |
|---|---|---|
| 0 (or equivalent "new") | new | Initial state |
| NOT_PROVIDED | new | Vendor hasn't provided it yet |
| Reviewed/approved | submitted | Final submission complete |
| Rejected | ocr_reject | Rejection state |
| Unknown values | new | Safe default, flag for review |
Cannot fully validate without full dataset. The migration script will log all unique document_status values (both from document.document_status and document_status.document_status) and their frequency.
The document_status table's per-vendor status needs mapping:
| DTX 1.0 | DTX 2.0 DocRequirementStatus | Notes |
|---|---|---|
| NOT_PROVIDED | In Review | Awaiting submission |
| PROVIDED / APPROVED | Pass | Document accepted |
| REJECTED | Fail | Document rejected |
| OVERRIDE | Override | Manual override |
| Unknown | In Review | Safe default |
Risk Level: Medium
DTX 1.0 document_library is per-customer (has customer_id). DTX 2.0 DocumentType is global (unique name across all accounts).
Challenge: Different customers may have document libraries with the same name but different purposes.
1. Extract all document_library entries
2. Deduplicate by name:
   - If name is unique across customers → create one DocumentType
   - If name is shared across customers with same meaning → create one DocumentType
   - If name is shared with different meanings → append customer identifier,
     e.g., "W-9 Form" (shared) vs "Custom Policy Doc [Customer ABC]" (unique)
3. Record mapping: old document_library_id → new DocumentType.id
4. Use DocumentType to create DocRequirement entries
Dedup strategy: Start by assuming same name = same type (since document libraries represent generic document types like "W-9", "Certificate of Good Standing", etc.). Flag duplicates for post-migration review.
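A sketch of the dedup pass under that same-name-same-type assumption: get_or_create enforces the global unique name, and merged duplicates go to the issue logger. Function name and row shape are illustrative.

```python
from docutrax_profiles.models import DocumentType  # app per the model table

def migrate_document_libraries(library_rows, log):
    """Returns {old document_library_id: DocumentType pk}, merging by name."""
    mapping = {}
    for row in library_rows:
        doc_type, created = DocumentType.objects.get_or_create(
            name=row["name"].strip())
        if not created:
            # same name seen for another customer: merge, but flag for review
            log(f"document_library {row['id']} merged into type {doc_type.name!r}")
        mapping[row["id"]] = doc_type.id
    return mapping
```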
Risk Level: HIGH
DTX 1.0 uses:
- document_set: named requirement sets per customer (e.g., "Standard - Policy and Endorsement (w/ UMB) - Construction")
- document_requirements: junction linking sets to specific documents
DTX 2.0 uses:
- DocumentRiskProfile: named collection of required document types per account
- DocRequirement: links RequirementsGroup to DocumentType with a status
Structural difference: DTX 1.0 links sets to specific document instances. DTX 2.0 links profiles to document types (categories).
1. For each document_set:
a. Create DocumentRiskProfile:
- account = mapped Account (from customer_id)
- name = document_set.name
- description = document_set.description
b. Find all documents linked via document_requirements junction
c. For each linked document, get its document_library_id
   d. Map document_library_id → DocumentType (from Decision #7)
   e. Create DocumentRiskProfileDocumentType entries for each unique DocumentType
   f. Record mapping: old document_set_id → new DocumentRiskProfile.id
2. For each RequirementsGroup (from Decision #1):
a. Link to the mapped DocumentRiskProfile
b. For each DocumentType in the profile:
- Create DocRequirement(requirements_group, document_type)
- Set status based on document_status table data for this vendor/customer
3. For each document linked to a set:
a. Find or create the DocRequirement for (RequirementsGroup, DocumentType)
b. Create DocumentRequirementFulfillment(document, doc_requirement)
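A condensed sketch of step 1, assuming dict rows plus the maps from Area 2 and Decision #7. The `order` field name on the junction is an assumption taken from "with ordering" in the model table; all other non-model names are hypothetical.

```python
from docutrax_profiles.models import (
    DocumentRiskProfile,
    DocumentRiskProfileDocumentType,
)

def migrate_document_sets(set_rows, junction_rows, document_rows,
                          account_map, type_map):
    lib_by_doc = {d["id"]: d["document_library_id"] for d in document_rows}
    profile_map = {}
    for s in set_rows:
        profile = DocumentRiskProfile.objects.create(
            account_id=account_map[s["customer_id"]],
            name=s["name"],
            description=s.get("description") or "",
        )
        doc_ids = [j["document_id"] for j in junction_rows
                   if j["document_set_id"] == s["id"]]
        # infer the set's document *types* through the junction (see above)
        type_pks = {type_map[lib_by_doc[d]] for d in doc_ids
                    if lib_by_doc.get(d) in type_map}
        for order, pk in enumerate(sorted(type_pks)):
            DocumentRiskProfileDocumentType.objects.create(
                document_risk_profile=profile, document_type_id=pk, order=order,
            )
        profile_map[s["id"]] = profile.id
    return profile_map  # old document_set_id -> new DocumentRiskProfile.id
```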
Risk Level: Low
DTX 1.0 document has a parent_document_id self-reference for hierarchical grouping. DTX 2.0's Document model is flat, with no parent reference.
Sample data observation: All 9 sample document records have parent_document_id=NULL. The full dataset may have children.
Decision: Flatten. If parent_document_id is set, store the parent reference in Document.supervisor_notes as metadata (`[Migrated] Parent document: {parent_document_name} (old_id: {parent_document_id})`) and log the relationship to migration_issues.csv for post-migration review.
Risk Level: Low
DTX 1.0 certificate has a replacement_id FK to another certificate, tracking when one certificate replaces another. DTX 2.0 has no explicit replacement chain: certificates belong to RequirementsGroups, and current_evaluation tracks the latest evaluation.
Decision: Don't migrate the replacement chain. The latest certificate (by create_time) within a RequirementsGroup naturally becomes the current one. Set Certificate.current_evaluation only on the most recent evaluated certificate per RequirementsGroup.
Risk Level: Low
DTX 1.0 has detailed review workflow tracking:
- review_status, review_time, reviewer_id: general review state
- OpsDescReviewStatus, OpsDescReason, OpsDescCorrectedBy: operations description review
- HolderReviewStatus, HolderReason, HolderCorrectedBy: holder review
- override_reason: override justification
DTX 2.0 has a completely different review workflow (DocumentStatus enum with 13 states, ManualOverride model, AuditHistory snapshots).
Decision: Drop the historical review workflow fields. The exception is the override_reason field: if the certificate was overridden, create a ManualOverride record:

```python
if certificate.override_reason:
    ManualOverride.objects.create(
        requirements_group=certificate.requirements_group,
        coi_risk_profile=certificate.requirements_group.coi_risk_profile,
        coverage_type='General',  # may need refinement
        reason=certificate.override_reason,
        approved_at=certificate.review_time,
    )
```

Risk Level: Low
DTX 1.0 document has a docusign_template_id field. DTX 2.0 has no DocuSign integration visible in the models.
Decision: Drop. Log documents with docusign_template_id set to the migration report for reference.
Risk Level: Low
The missing_documents table is a denormalized reporting/cache table tracking missing document counts per vendor. DTX 2.0 calculates compliance status dynamically from DocRequirement statuses. Decision: Don't migrate; DTX 2.0 recalculates.
Risk Level: Low
The certificate_queue table is a processing queue with GUIDs for distributed processing. It represents work-in-progress, not historical data. Decision: Don't migrate; the queue is transient.
Risk Level: Low
DTX 1.0 certificate has both InsName (col 37) and InsName1 (col 38). On the ACORD form, InsName is typically the primary insured name and InsName1 is a DBA or second named insured line.
```python
if InsName and InsName1 and InsName != InsName1:
    insured_name = f"{InsName} / {InsName1}"  # combine both names
elif InsName:
    insured_name = InsName
elif InsName1:
    insured_name = InsName1
else:
    insured_name = None  # flag for review
```
Log all records where both are populated and different.
Risk Level: Medium
DTX 2.0 Certificate has carrier_a_name through carrier_f_naic (12 fields) for the ACORD form's INSURER A-F section. However, DTX 1.0's certificate table has NO carrier columns: carrier/insurer information lives in the coverage tables (general_liability, automobile_liability, workers_comp, etc.), which are Area 5.
Each DTX 2.0 coverage model (GeneralLiability, AutoLiability, etc.) has certificate = ForeignKey('Certificate'). During Area 5 migration, the coverage tables' data needs to link to Certificate records created here. Area 5 will consume the id_mapping_certificates.csv to establish these links.
This area has more structural transformation than any previous area. Areas 1-3 were mostly field-level mappings with occasional data parsing. Area 4 involves architectural changes: flat tables becoming hierarchical structures, instance-based references becoming type-based, and a new cross-cutting RequirementsGroup concept.
| Challenge | Difficulty | Why |
|---|---|---|
| RequirementsGroup creation | Hard | Cross-cutting concept that doesn't exist in DTX 1.0. Needs careful coordination with Areas 5 & 6. Phased population (coi_risk_profile filled later). |
| certificate_status → 3-tier Evaluation | Hard | Structural transformation from flat rows to 3-level hierarchy. field_name → coverage_type mapping is guesswork without full data. |
| File migration to S3 | Hard (operationally) | Depends on EFS access. Potentially 100K+ files / TB of data. S3 upload bandwidth, content hashing, UID generation at scale. Multi-day operation. |
| document_set → DocumentRiskProfile | Hard | Paradigm shift: DTX 1.0 links sets to document instances, DTX 2.0 links profiles to document types (categories). Requires inference through junction table. |
| Carrier A-F backfill | Medium | Deferred to Area 5; adds cross-area coordination complexity |
| DocumentType deduplication | Medium | Per-customer → global. Same name may mean different things for different customers. |
| 14-step migration order | Medium | Most steps of any area. Ordering constraints between steps. |
| Status mappings | Easy | Straightforward value mapping, but need full data to validate |
| Contact consolidation | Easy | Formulaic concatenation |
| Drop decisions (5 tables) | Easy | Just don't migrate |
(Source: DTX 1.0 Data Sources and File Size & Count.xlsx)

| Item | Estimate | Source |
|---|---|---|
| Certificate files | 657.5 GB | EFS /efsdata/attachments/certificate_management/ |
| Document files | 1.3 TB | EFS /efsdata/attachments/document_management/ |
| Repo files | 258 GB | EFS /efsdata/filerepo/ (Area 11, shared migration) |
| Logos | 200 MB | EFS /efsdata/logos/ (Area 2) |
| Total file migration | ~2.2 TB | |
| Certificates (records) | 50K-200K | Accumulate over years, multiple per vendor |
| Certificate status rows | 200K-800K | ~4 rows per certificate |
| Documents (records) | 10K-50K | Fewer than certificates |
| DocumentTypes | 50-200 | After deduplication |
| DocumentRiskProfiles | 500-2K | Per-customer document sets |
| RequirementsGroups | 5K-50K | Unique (customer, req, doc_set) combos |
| DTX 2.0 records created | 300K-1.5M | Certificates + evaluations + documents + fulfillments |
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| 2.2 TB file transfer takes too long | Medium | High | Same-region EFS→S3 transfer, parallel uploads, weekend window. At 100 MB/s ≈ 6 hrs. |
| Legacy path formats (3 generations) | High | Medium | Script handles all 3 prefixes. repo_file.repo_filepath is source of truth. |
| Missing files on EFS | Medium | High | Create DB records without files, flag in issues CSV. Don't block migration. |
| field_name values don't match prefix mapping | Medium | Medium | Dry-run logs all unique values, fallback to OtherCoverage |
| DocumentType name collisions | Low | Medium | Dedup by name, flag for post-migration review |
| RequirementsGroup combos explode | Low | Medium | Dry-run reports count before creating |
| Coverage β Certificate linking breaks | Low | High | ID mapping CSV ensures traceability |
| Duplicate files waste S3 space | Medium | Low | Dedup by content_hash: upload once, reference many. Could save 10-30% storage. |
Before any data is written, the script runs in dry-run mode and produces a validation report:
```
=== DTX 1.0 Certificates & Documents Migration - Dry Run Report ===

Certificates:
  Total records: ???
  Status distribution: FINISHED: ???, PENDING: ???, other: ???
  With attachment: ???
  Without attachment: ???
  With replacement_id: ???
  With requirements_id: ???
  With document_set_id: ???

Certificate Status (compliance checks):
  Total rows: ???
  Unique certificates: ???
  Unique field_names: [list all]
  COMPLIANT count: ???
  Non-COMPLIANT count: ???

Documents:
  Total records: ???
  Status distribution: 0: ???, other: ???
  With attachment: ???
  With parent_document_id: ???
  With docusign_template_id: ???

Document Libraries:
  Total entries: ???
  Unique names: ???
  Duplicate names: ???

Document Sets:
  Total entries: ???
  Requirements per set: min: ???, max: ???, avg: ???

Attachments:
  Total files: ???
  With certificate_id: ???
  With document_id: ???
  Total file size: ??? GB
  File types: [distribution]

RequirementsGroups to create:
  Unique (customer, requirements, doc_set) combos: ???

Issues flagged: ???
Ready to migrate: ???
```
| Scenario | Behavior |
|---|---|
| Attachment file not found in EFS | Create record without file, flag in issues CSV |
| Unknown certificate_status value | Default to manual_review, log |
| Unknown document_status value | Default to new, log |
| field_name doesn't map to coverage_type | Create as OtherCoverage, log |
| Duplicate DocumentType name | Merge (same name = same type), log |
| Orphan document_library_id | Create DocumentType, log as potentially orphaned |
| Missing customer_id in mapping | Skip record, log to issues |
| Missing vendor_id in mapping | Skip record, log to issues |
| certificate_status rows without matching certificate | Log to issues, skip |
| File | Purpose |
|---|---|
| id_mapping_certificates.csv | old_certificate_id → new_certificate_id → requirements_group_id → evaluation_id |
| id_mapping_documents.csv | old_document_id → new_document_id → doc_requirement_id |
| id_mapping_doc_types.csv | old_document_library_id → new_document_type_id |
| id_mapping_doc_profiles.csv | old_document_set_id → new_document_risk_profile_id |
| id_mapping_req_groups.csv | (customer_id, requirements_id, document_set_id) → new_requirements_group_id |
| migration_issues.csv | All flagged records with issue description |
| migration_report.txt | Summary statistics and validation results |
Verification check: every certificate_id and document_id used in other tables must exist in the mapping.

1. Validate and report (dry-run)
2. Create DocumentType records (from document_library; no dependencies)
3. Create DocumentRiskProfile + DocumentRiskProfileDocumentType records (from document_set)
4. Create RequirementsGroup records (from unique certificate combinations)
5. Migrate attachment files to S3 (parallel, batch)
6. Create Certificate records (with file references, agent/insured/holder mapping)
7. Create Evaluation + EvaluationCoverage + EvaluationDetail records (from certificate_status)
8. Set Certificate.current_evaluation on latest certificates
9. Create Document records (with file references)
10. Create DocRequirement records (from document_requirements + DocumentType mapping)
11. Create DocumentRequirementFulfillment records (link documents to requirements)
12. Create AuditHistory snapshots (optional, for compliance trail)
13. Generate and export all ID mapping tables
14. Verify
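The ordering constraints suggest a simple sequential runner in which every step honors the dry-run flag. A sketch, with the step callables as placeholders:

```python
from typing import Callable

Step = tuple[str, Callable[[bool], None]]  # (name, fn taking dry_run)

def run_migration(steps: list[Step], dry_run: bool = True) -> None:
    """Run the 14 steps in order; each step must write nothing when dry_run."""
    for number, (name, step) in enumerate(steps, start=1):
        print(f"== step {number}: {name} (dry_run={dry_run}) ==")
        step(dry_run)  # steps raise only on fatal errors; issues go to the CSV
```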
| Depends On | What's Needed |
|---|---|
| Area 2: Customers → Accounts | Account IDs for RequirementsGroup, DocumentRiskProfile |
| Area 3: Vendors → Records | Record IDs for Document.uploaded_by, Assignment links |
| Depended On By | For |
|---|---|
| Area 5: Coverage / Insurance | Coverage models link to Certificate and COIRiskProfile |
| Area 6: Requirements & Compliance | RequirementsGroup.coi_risk_profile (backfill) |
| Area 7: Extraction / OCR | OCR data linked to certificates |
| Area 8: Notifications | Notification triggers from certificate/document events |
| # | Decision | Status | Approach |
|---|---|---|---|
| 1 | RequirementsGroup creation | Resolved | Create from unique (customer, requirements, doc_set) combos; phase coi_risk_profile link |
| 2 | Certificate status mapping | Resolved | FINISHED → evaluated, PENDING → manual_review, unknown → manual_review |
| 3 | Contact field consolidation | Resolved | Concatenate phone/fax/email into text fields, holder extras → notes |
| 4 | certificate_status β Evaluation | Resolved | Group by certificate, map field_name prefix to coverage_type, 3-tier creation |
| 5 | Attachment β S3 file migration | Resolved | Upload to S3, compute content_hash/file_uid, handle missing files gracefully |
| 6 | Document status mapping | Resolved | Numeric → DocumentStatus enum, NOT_PROVIDED → new, unknown → new |
| 7 | document_library β DocumentType | Resolved | Deduplicate by name, assume same name = same type, flag duplicates |
| 8 | document_set β DocumentRiskProfile | Resolved | Direct map, infer document_types from junction, create DocRequirement per type |
| 9 | Document hierarchy | Resolved | Flatten, store parent reference in supervisor_notes, log relationships |
| 10 | Replacement certificates | Resolved | Don't migrate chain, latest by create_time gets current_evaluation |
| 11 | Review workflow fields | Resolved | Drop (historical), migrate override_reason → ManualOverride if set |
| 12 | DocuSign template references | Resolved | Drop, log for reference |
| 13 | missing_documents table | Resolved | Don't migrate; cache table, DTX 2.0 recalculates |
| 14 | certificate_queue table | Resolved | Don't migrate; transient processing queue |
| 15 | InsName vs InsName1 | Resolved | Combine with " / " separator if both populated and different |
| 16 | Carrier A-F backfill | Resolved | Leave NULL, backfill after Area 5 from coverage records |