📋 Migration Summary

Source Tables: 14
Target Models: 9
Decisions: 11

Migrating file repository, IMAP uploads, form fields, keywords, exemptions, billing data, and miscellaneous tables.

Area 11: File Repository & Misc

Overview

Migration of file repository and miscellaneous data from DTX 1.0 to DTX 2.0. This is Phase 5 — depends on Areas 2, 3 (Customers/Accounts, Vendors/Records).

This area covers 14 tables that span four functional domains: file repository management, billing/revenue tracking, custom field configuration, and client-specific integrations. Most tables in this area are either denormalized reporting snapshots, historical audit trails, or client-specific customizations with limited or no direct DTX 2.0 equivalents.

The most operationally significant item is the repo_file table — the file storage layer connecting DTX 1.0's on-disk EFS files to certificate and document records. Area 4 (Certificates & Documents) already defined the file migration strategy (Decision #5: "File Migration — EFS to S3"), including the 258 GB file repo partition. This area resolves the repo_file metadata and remaining file repo tables.

DTX 2.0 has fundamentally replaced several subsystems: billing is now Stripe-based (docutrax_billing app), activity tracking uses UserActivityLog, and exemptions/overrides use ManualOverride + WaiverRequest + is_exempted flags.

Source Tables (14)

DTX 1.0 Table Rows (sample) Type DTX 2.0 Target
repo_file 100 Transactional Document / Certificate (file fields) — see Area 4
repo_file_audit 100 Audit trail — (historical)
repo_throughput 100 Reporting snapshot — (computed)
imap_upload 100 Transactional No target model
form_field 100 Configuration No target model (partial: AccountMeta, Record custom fields)
keyword 100 Configuration No target model
exemption 100 Transactional ManualOverride / DocRequirement.is_exempted
reviewer_address 100 Configuration AccountEmail (email_type='reviewer') — see Area 2
billing_data 100 Reporting snapshot — (replaced by Stripe billing)
inactivated_billing_data 100 Reporting snapshot — (replaced by Stripe billing)
lost_revenue 100 Reporting snapshot — (replaced by Stripe billing)
prepay_so_list 100 Flag list — (replaced by BillingProfile.payment_method)
work_log 100 Activity tracking — (replaced by UserActivityLog)
vornado_feed 100 Client-specific feed No target model

Relevant Target Models

DTX 2.0 Model App Purpose
Document docutrax_profiles File storage via S3 FileField
Certificate docutrax_profiles File storage via S3 FileField
ManualOverride docutrax_profiles Persistent coverage exemptions/overrides
WaiverRequest docutrax_profiles Waiver request workflow
DocRequirement docutrax_profiles Document requirements with is_exempted flag
AccountEmail docutrax_hierarchy Account-level email addresses
AccountMeta docutrax_hierarchy Key-value metadata per account
BillingProfile docutrax_billing Stripe-based billing config
UserActivityLog docutrax_utils HTTP request-level activity tracking

Dependencies

Depends On What's Needed
Area 2: Customers/Accounts repo_file.customer_id → Account mapping; AccountEmail for reviewer_address
Area 3: Vendors/Records repo_file.vendor_id, exemption.vendor_id → Record mapping
Area 4: Certificates/Documents repo_file.certificate_id/document_id → Certificate/Document mapping; shared file migration utility
Area 6: Requirements/Compliance exemption.requirements_id → RequirementsGroup mapping
Depended On By What It Provides
Area 10: Reporting Repo metrics, billing data for reports

Decisions

Decision #1: repo_file → Shared file migration utility (Area 4 coordination)

The repo_file table (27 columns) is the file storage layer connecting on-disk files to certificate and document records. Area 4 already defined the complete file migration strategy (Decision #5: "File Migration — EFS to S3"), including the 258 GB file repo partition.

Area 11's responsibility: Migrate remaining repo_file records that are NOT already linked to certificates or documents via Area 4's migration. These are "orphaned" or "unlinked" repo files — files uploaded but never processed into a certificate or document record.

Column mapping (for repo_file metadata not covered by Area 4):

DTX 1.0 DTX 2.0 Action
repo_file_id — Used as mapping key; not stored in 2.0
customer_id Account FK Via Area 2 Account mapping; used for linking unlinked files
name original_filename On Document or Certificate
treatment Drop — COI_MANUAL/COI_OCR/DOC processing type not in 2.0
vendor_id uploaded_by Via Record mapping
status Drop — SUBMITTED/DELETED/MANUAL_ENTRY; 2.0 has own status workflow
old_status Drop
certificate_id Certificate FK Via Area 4 mapping (already migrated)
document_id Document FK Via Area 4 mapping (already migrated)
attachment_id Drop — intermediate link table, resolved by Area 4
upload_src Drop — DnD/Broker/Email upload source not tracked in 2.0
upload_email Drop (100% NULL in sample)
upload_user_id created_by Via User mapping
upload_ip Drop (100% NULL in sample)
intake_date created Timestamp
repo_filepath — Used for file migration; source of truth for locating the file on EFS
ocr_event_id Drop — Area 7 resolved (no OcrEvent model)
ocr_qa Drop — PASS/FAIL QA flag not in 2.0
display Drop (always 'no')
supervisor Drop (always 'no' or NULL)
autoextractable Drop (100% NULL)
type Drop — MIME type not stored on Document/Certificate
size Drop — file size not stored (available from S3 metadata)
evt_userid, evt_sessionid Drop (100% NULL)
client_id Drop — denormalized FK

Approach:
1. During Area 4 file migration, mark which repo_file_id values have been processed.
2. After Area 4 completes, identify remaining repo_file records (those with no certificate_id AND no document_id, or with status=DELETED).
3. For deleted files: do not migrate (they were intentionally removed).
4. For unlinked but non-deleted files: upload to S3, create a Document record with status='new', and flag for manual review.
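The post-Area-4 sweep described in these steps can be sketched as a small classification helper. This is an illustrative sketch, not actual migration code: the function name and the dict-shaped row are assumptions, and the status labels returned are invented here.

```python
def classify_repo_file(row, area4_migrated_ids):
    """Decide how one repo_file row is handled after the Area 4
    certificate/document migration has completed.

    `row` is a dict of DTX 1.0 columns; `area4_migrated_ids` is the set
    of repo_file_id values already processed by Area 4's shared utility.
    """
    if row["repo_file_id"] in area4_migrated_ids:
        return "already_migrated"        # covered by Area 4, nothing to do
    if row["status"] == "DELETED":
        return "skip_deleted"            # intentionally removed; do not migrate
    if row["certificate_id"] is None and row["document_id"] is None:
        # Orphaned upload: push file to S3, create Document(status='new'),
        # and flag it for manual review.
        return "migrate_as_new_document"
    return "needs_review"                # linked but missed by Area 4
```

Running this over the full repo_file extract gives per-bucket counts that can be validated against the pre-migration orphan query before any files move.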


Decision #2: repo_file_audit → Don't migrate

The repo_file_audit table (29 columns) records BEFORE/AFTER snapshots of repo_file state changes. It is a historical audit trail for file processing operations in DTX 1.0.

Don't migrate. DTX 2.0 tracks document/certificate changes through AuditHistory (evaluation snapshots) and UserActivityLog (HTTP request tracking). The old file-level audit trail has no operational value in the new system.

Keep CSV as archive.


Decision #3: repo_throughput → Don't migrate

The repo_throughput table (19 columns) is a denormalized reporting snapshot — a single batch-generated snapshot of file processing metrics per service owner. Column names contain special characters (#, %, spaces). No primary key.

Don't migrate. This is a computed cache, not source data. DTX 2.0 will calculate throughput metrics dynamically from actual Document, Certificate, and Evaluation data.

Keep CSV as archive for validation comparison.


Decision #4: imap_upload → No target model

The imap_upload table (10 columns) tracks files received via email-based IMAP ingestion — vendors/brokers emailing certificates to a monitored inbox. Each row records the sender email, filename, filesize, and links to a repo_file_id.

DTX 2.0 does not have an IMAP email ingestion system. Documents are uploaded via the web portal (drag-and-drop or magic links).

The team must decide:
1. Archive and discard — Email ingestion is a deprecated workflow. Vendors now upload via the web portal.
2. Build IMAP ingestion in DTX 2.0 — If email-based upload is a desired feature, create an IMapUpload model and processing pipeline. The DTX 1.0 data provides a useful reference for the schema.
3. Migrate as reference data — Store imap_upload records as metadata on the corresponding Document records (via repo_file_id → Document mapping) to preserve upload provenance.


Decision #5: form_field → No target model (partial mapping)

The form_field table (17 columns) stores custom form field definitions per customer/vendor. In the sample, all rows are for form_name=vendor_view, model_name=Vendorhistory, property_name=custom_field_0, with per-vendor value entries.

DTX 2.0 handles custom fields differently:
- Record.internal_id and Record.internal_id1 — two custom text fields per Record
- AccountMeta — key-value metadata per Account (flexible but unstructured)
- Coverage models have hardcoded fields (not dynamic form fields)

The team must decide:
1. Map custom_field_0 values to Record.internal_id — If the custom field data is still needed per vendor, migrate form_field.value where property_name='custom_field_0' to Record.internal_id.
2. Store in AccountMeta — Migrate customer-level form field configurations as AccountMeta key-value pairs.
3. Archive and discard — If the custom form field system is replaced by DTX 2.0's fixed schema.

Note: The sample only shows one property_name. The full dataset likely has custom_field_1 through custom_field_4 (matching the keyword table's CUSTOM_FIELD_1 through CUSTOM_FIELD_4 labels). The full scope is unclear from sample data alone.

Pre-migration action: Query full DTX 1.0 for distinct form_name, model_name, property_name combinations to understand the full scope of custom fields.
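The scope query can be prototyped against any SQL engine before touching production. A minimal sqlite3 stand-in (the four-column table shape is assumed from the sample description; the real form_field table has 17 columns):

```python
import sqlite3

# In-memory stand-in for the DTX 1.0 database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE form_field (form_name TEXT, model_name TEXT, "
    "property_name TEXT, value TEXT)"
)
conn.executemany(
    "INSERT INTO form_field VALUES (?, ?, ?, ?)",
    [
        ("vendor_view", "Vendorhistory", "custom_field_0", "A-1"),
        ("vendor_view", "Vendorhistory", "custom_field_0", "B-2"),
        ("vendor_view", "Vendorhistory", "custom_field_1", "Lease 9"),
    ],
)
# Distinct combinations with row counts: the pre-migration scope query.
scope = conn.execute(
    "SELECT form_name, model_name, property_name, COUNT(*) AS n "
    "FROM form_field GROUP BY form_name, model_name, property_name "
    "ORDER BY n DESC"
).fetchall()
```

Run against the full DTX 1.0 dump, the result enumerates every form_name/model_name/property_name combination that Decision #5 has to account for.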


Decision #6: keyword → No target model

The keyword table (3 columns) is a UI label customization system. Customers can rename standard labels — e.g., TRACKED_ITEM_NAME → "Tenants", GROUP_NAME → "Centers", CUSTOM_FIELD_1 → "Lease #".

DTX 2.0 does not have a keyword/label customization model. The UI uses fixed labels.

The team must decide:
1. Store in AccountMeta — Migrate keyword mappings as AccountMeta entries (key=keyword, value=replacement) so the DTX 2.0 UI can eventually support label customization.
2. Store in Directory.settings — Use the JSONField on Directory for per-tenant label overrides.
3. Archive and discard — If DTX 2.0 will use fixed labels for all customers.

Data size: Small — likely hundreds of rows total (6 keywords × number of customers who customized).
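Option 1 is a simple row-to-key-value transform. A sketch, assuming the three keyword columns are customer_id, keyword, and replacement (the actual column names need confirming against the schema), with an illustrative "label:" key prefix that is not an existing convention:

```python
def keyword_rows_to_accountmeta(rows):
    """Turn DTX 1.0 keyword rows into (account_id, key, value) tuples
    suitable for bulk-creating AccountMeta entries.

    Note: customer_id still needs resolving to the 2.0 Account PK via
    the Area 2 mapping before insert; that step is omitted here.
    """
    return [
        (row["customer_id"], f"label:{row['keyword']}", row["replacement"])
        for row in rows
    ]

rows = [
    {"customer_id": 42, "keyword": "TRACKED_ITEM_NAME", "replacement": "Tenants"},
    {"customer_id": 42, "keyword": "CUSTOM_FIELD_1", "replacement": "Lease #"},
]
meta = keyword_rows_to_accountmeta(rows)
```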


Decision #7: exemption → ManualOverride + DocRequirement.is_exempted

The exemption table (11 columns) records vendor-level exemptions from specific coverage or document requirements — e.g., "Sole proprietor" (WC exempt), "WC NOT MANDATORY IN TX", "waived". Each row has a vendor_id, item_type (COI/DOCUMENT), item_name (coverage line or document type), expiry date, and reason.

Column mapping:

DTX 1.0 DTX 2.0 Action
exemption_id — No direct PK mapping (new auto-increment IDs)
vendor_id ManualOverride.requirements_group Via Record → RequirementsGroupMember → RequirementsGroup
item_type = 'COI' ManualOverride Coverage-level exemption
item_type = 'DOCUMENT' DocRequirement.is_exempted = True Document requirement exemption
item_name ManualOverride.coverage_type Map values (see below)
ExemptExpDate ManualOverride.expires_at Direct; map sentinel dates to NULL
ExemptReason ManualOverride.reason Direct (trim trailing whitespace)
ExemptReviewStatus — All 'COMPLIANT' in sample; implies approved status
create_user_id ManualOverride.approved_by Via User mapping
create_time ManualOverride.approved_at Direct
requirements_id ManualOverride.coi_risk_profile Via Area 6 requirements → COIRiskProfile mapping
src Drop — source reference (PH or certificate_id), not in 2.0

item_name → coverage_type mapping (COI exemptions):

DTX 1.0 item_name DTX 2.0 coverage_type
GL GeneralLiability
Auto AutoLiability
WC WorkersCompensation
UMB UmbrellaLiability
Professional Liability OtherCoverage (with coverage_subtype)
Boiler and Machinery PropertyCoverage (with coverage_subtype)
EPLI OtherCoverage (with coverage_subtype)
Business Personal Property PropertyCoverage (with coverage_subtype)
Business Income PropertyCoverage (with coverage_subtype)
Primary & Non-Contributory GeneralLiability (with item_key='primary_non_contributory')

Sentinel date handling:
- 3000-01-01 → expires_at = NULL (permanent/never expires)
- 2050-* dates → expires_at = NULL (permanent exemption)
- Normal dates → direct mapping

For DOCUMENT exemptions (item_type='DOCUMENT'):
- Find the matching DocRequirement for the vendor's RequirementsGroup + document type
- Set DocRequirement.is_exempted = True
- Store reason in a note or metadata field

Pre-migration action: Build complete item_name → coverage_type mapping from full DTX 1.0 data. The sample shows ~10 unique values; the full dataset may have more.
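The two mechanical pieces of this decision, the item_name lookup and the sentinel-date normalization, can be sketched as follows. The mapping dict is transcribed from the tables above and must be extended once the pre-migration query returns the full distinct item_name list; the coverage_subtype/item_key handling is omitted for brevity.

```python
from datetime import date

# Transcribed from the item_name → coverage_type table above (incomplete:
# the full dataset may contain more values).
ITEM_NAME_TO_COVERAGE = {
    "GL": "GeneralLiability",
    "Auto": "AutoLiability",
    "WC": "WorkersCompensation",
    "UMB": "UmbrellaLiability",
    "Professional Liability": "OtherCoverage",
    "EPLI": "OtherCoverage",
    "Boiler and Machinery": "PropertyCoverage",
    "Business Personal Property": "PropertyCoverage",
    "Business Income": "PropertyCoverage",
    "Primary & Non-Contributory": "GeneralLiability",
}

def normalize_expiry(d):
    """Map DTX 1.0 sentinel dates to None (permanent exemption):
    3000-01-01 and any 2050 date mean 'never expires'.
    Normal dates pass through unchanged."""
    if d is None or d == date(3000, 1, 1) or d.year == 2050:
        return None
    return d
```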


Decision #8: reviewer_address → Verify against Area 2 AccountEmail

The reviewer_address table (6 columns) stores reviewer email addresses per customer, with PHP-serialized email_string fields. Area 2 already mapped customer.reviewer_email → AccountEmail(email_type='reviewer').

Verify and supplement: Compare reviewer_address data against the AccountEmail records created by Area 2.
- If reviewer_address emails are already captured by customer.reviewer_email, no additional migration needed.
- If reviewer_address contains different or additional email addresses, create additional AccountEmail(email_type='reviewer') records.
- Deserialize the PHP-serialized email_string to extract email arrays.

Data quality note: type column is 100% empty (not NULL, just blank strings). level is always 0. These columns are effectively dead.
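Deserializing email_string needs a PHP unserializer; the phpserialize package handles the general case. As a dependency-free illustration, a minimal parser for the common case of an integer-keyed array of strings (the exact serialized shape of email_string is an assumption until the data is inspected):

```python
def parse_php_string_array(data):
    """Minimal parser for PHP-serialized integer-keyed arrays of strings,
    e.g. 'a:2:{i:0;s:7:"a@x.com";i:1;s:7:"b@y.com";}'.
    For production migration, prefer the phpserialize package."""
    if not data.startswith("a:"):
        raise ValueError("not a serialized PHP array")
    out = []
    i = data.index("{") + 1
    while data[i] != "}":
        # Skip the integer key, e.g. 'i:0;'.
        i = data.index(";", i) + 1
        # Parse the string value, e.g. 's:7:"a@x.com";'.
        if not data.startswith("s:", i):
            raise ValueError("expected string value")
        lenstart = i + 2
        lenend = data.index(":", lenstart)
        slen = int(data[lenstart:lenend])
        vstart = lenend + 2          # skip ':"'
        out.append(data[vstart:vstart + slen])
        i = vstart + slen + 2        # skip '";'
    return out
```

Each extracted address would then be compared against the Area 2 AccountEmail records before creating supplemental rows.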


Decision #9: Billing tables (4) → Don't migrate

Four tables relate to DTX 1.0's billing/revenue system:

Table Columns Structure Content
billing_data 21 Reporting snapshot, no PK Account-level billing metrics (active PHs, compliance %, etc.)
inactivated_billing_data 12 Reporting snapshot, no PK Inactivated vendor billing lifecycle
lost_revenue 9 Reporting snapshot, no PK Revenue tracking for churned vendors
prepay_so_list 1 Flag list Customer IDs on prepay billing plans

Don't migrate. DTX 2.0 has a completely different billing architecture:
- Stripe integration: BillingProfile, Invoice, Payment, BillingEvent models in the docutrax_billing app
- Payment methods: stripe, wire, check, ach_direct, invoice, complimentary, sponsored
- Plan tiers: starter, professional, enterprise
- prepay_so_list → BillingProfile.payment_method (no need for a separate flag table)

The DTX 1.0 billing tables are denormalized snapshots of a legacy billing system that bears no structural resemblance to DTX 2.0's Stripe-based architecture. Historical billing data would need to be archived separately if needed for accounting/audit purposes.

Keep CSVs as archive for financial reference.


Decision #10: work_log → Don't migrate

The work_log table (5 columns) tracks user sessions per customer: user_id, session_id (SHA-1 hash), customer_id, start_time, end_time. No primary key. ~50% of rows have NULL end_time.

Don't migrate. DTX 2.0 has UserActivityLog (docutrax_utils/models.py) which tracks HTTP requests with user, URL, IP, user agent, and payload. This is a fundamentally different (and more detailed) activity tracking approach.

The old session-per-customer logs are historical operational data with no value in the new system.

Keep CSV as archive.


Decision #11: vornado_feed → No target model

The vornado_feed table (41 columns) is a client-specific data import for Vornado Realty Trust — importing tenant/lease data from their ABSINSQRY export. It includes property codes, tenant IDs, lease dates, insurance requirement flags, and processing status.

DTX 2.0 does not have a vornado_feed model or any client-specific data import tables. Area 1 also noted Vornado-specific rights (IDs 1000, 1009) that were dropped as "per-client hacks."

The team must decide:
1. Archive and discard — This was a custom integration for one client. If Vornado is still a customer in DTX 2.0, a new import mechanism should be designed.
2. Build a generic data import framework — Create a reusable CSV/data import system in DTX 2.0 that could handle Vornado-style feeds as well as other client integrations.
3. Migrate as seed data — If Vornado's tenant data needs to be in DTX 2.0, extract the tenant → Record mappings (via PH_ID → vendor mapping) and create Records with the relevant metadata.

Data quality note: Sentinel dates (12/31/2050 for perpetual leases), PH_ID = -1 for rejected records, typos ("reokacnebt" for "replacement", "Noncontributary" for "Noncontributory"), and 100% NULL contact fields.


Known Issues

# Issue Impact Mitigation
1 PHP serialized data in reviewer_address.email_string Requires deserialization to extract email addresses. Use phpserialize Python package during migration.
2 Sentinel dates in exemption (3000-01-01, 2050-*) and vornado_feed (12/31/2050) Need normalization. Map to NULL (permanent/never expires) during migration.
3 7 tables lack primary keys: repo_throughput, billing_data, inactivated_billing_data, lost_revenue, prepay_so_list, work_log, keyword Cannot deduplicate or do incremental migration. Not migrated (Decisions #3, #9, #10) or use composite natural keys (keyword).
4 Column names with special characters (#, %, spaces, ', /, ?) in repo_throughput, billing_data, inactivated_billing_data, vornado_feed Would need normalization for any ORM usage. Not migrated (all are reporting snapshots or client-specific).
5 Exemption item_name mapping may be incomplete Sample shows ~10 coverage names; full dataset may have more. Pre-migration action: query full distinct item_name values.
6 form_field scope unknown from sample Sample only shows custom_field_0 for one customer. Full variety of form_name/model_name/property_name is unclear. Pre-migration action: query full DTX 1.0 for distinct combinations.
7 Orphaned repo_file records Files uploaded but never linked to a certificate or document (no certificate_id and no document_id). Decision #1: upload to S3 as Document with status='new', flag for review.
8 Exemption trailing whitespace in ExemptReason values Minor data quality. Trim during migration.
9 imap_upload duplicate uploads Same vendor sent same file multiple times (no deduplication). Only relevant if imap_upload is preserved (Decision #4).

Pre-Migration Actions

# Action Blocking?
1 Query full form_field scope: Get distinct form_name, model_name, property_name combinations from DTX 1.0 to understand the full custom field landscape. Yes — needed for Decision #5.
2 Query full exemption item_name values: Get distinct item_name values for both COI and DOCUMENT types to complete the coverage_type mapping. Yes — needed for Decision #7.
3 Cross-reference reviewer_address with customer.reviewer_email: Determine if reviewer_address contains data not already captured by Area 2's AccountEmail migration. Yes — needed for Decision #8.
4 Identify orphaned repo_file records: Count repo_file rows where certificate_id IS NULL AND document_id IS NULL AND status != 'DELETED' to gauge the scope of unlinked files. No — but affects Decision #1 scope.
5 Confirm exemption sentinel date values: Query full DTX 1.0 for distinct ExemptExpDate values to identify all sentinel patterns (3000-01-01, 2050-*, others?). No — but needed for accurate mapping.
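Pre-migration action #4 is a single query; an in-memory sqlite3 stand-in for the DTX 1.0 database shows its shape (column subset only, sqlite used purely for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE repo_file (
        repo_file_id INTEGER PRIMARY KEY,
        certificate_id INTEGER,
        document_id INTEGER,
        status TEXT)"""
)
conn.executemany(
    "INSERT INTO repo_file VALUES (?, ?, ?, ?)",
    [
        (1, 10, None, "SUBMITTED"),    # linked to a certificate: not orphaned
        (2, None, None, "DELETED"),    # deleted: excluded
        (3, None, None, "SUBMITTED"),  # orphaned: counted
    ],
)
# Count files never linked to a certificate or document and not deleted.
orphans = conn.execute(
    """SELECT COUNT(*) FROM repo_file
       WHERE certificate_id IS NULL
         AND document_id IS NULL
         AND status != 'DELETED'"""
).fetchone()[0]
```

The same WHERE clause, run against the real DTX 1.0 MySQL instance, gives the Decision #1 scope number.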

Migration Order

Step 1: Coordinate with Area 4 file migration utility (repo_file files → S3)
Step 2: Migrate exemption → ManualOverride + DocRequirement.is_exempted
Step 3: Verify reviewer_address against Area 2 AccountEmail (supplement if needed)
Step 4: Resolve form_field and keyword per team decisions
Step 5: Archive billing tables, work_log, repo_file_audit, repo_throughput (no migration)
Step 6: Resolve imap_upload and vornado_feed per team decisions

Cross-Area References

Area Reference Status
Area 2: Customers/Accounts reviewer_address → AccountEmail(email_type='reviewer') Verify — Decision #8
Area 3: Vendors/Records exemption.vendor_id → Record mapping Resolved — use Area 3 mapping
Area 4: Certificates/Documents repo_file file migration utility (shared); repo_file.certificate_id/document_id Resolved — Area 4 Decision #5 defines shared utility
Area 6: Requirements/Compliance exemption.requirements_id → RequirementsGroup/COIRiskProfile Resolved — use Area 6 mapping
Area 7: Extraction/OCR repo_file.ocr_event_id (dropped) Resolved — column not migrated
Area 9: Brokers/Programs program.charge_id FK to billing (dropped); program.envelope_id FK Resolved — program table not migrated

Decision Summary

# Decision Status Approach
1 repo_file → Shared file migration utility (Area 4 coordination) Documented See decision details above
2 repo_file_audit → Don't migrate Documented See decision details above
3 repo_throughput → Don't migrate Documented See decision details above
4 imap_upload → No target model Documented See decision details above
5 form_field → No target model (partial mapping) Documented See decision details above
6 keyword → No target model Documented See decision details above
7 exemption → ManualOverride + DocRequirement.is_exempted Documented See decision details above
8 reviewer_address → Verify against Area 2 AccountEmail Documented See decision details above
9 Billing tables (4) → Don't migrate Documented See decision details above
10 work_log → Don't migrate Documented See decision details above
11 vornado_feed → No target model Documented See decision details above