Migrating file repository, IMAP uploads, form fields, keywords, exemptions, billing data, and miscellaneous tables.
Migration of file repository and miscellaneous data from DTX 1.0 to DTX 2.0. This is Phase 5 — depends on Areas 2, 3 (Customers/Accounts, Vendors/Records).
This area covers 14 tables that span four functional domains: file repository management, billing/revenue tracking, custom field configuration, and client-specific integrations. Most tables in this area are either denormalized reporting snapshots, historical audit trails, or client-specific customizations with limited or no direct DTX 2.0 equivalents.
The most operationally significant item is the repo_file table — the file storage layer connecting DTX 1.0's on-disk EFS files to certificate and document records. Area 4 (Certificates & Documents) already defined the file migration strategy (Decision #5: "File Migration — EFS to S3"), including the 258 GB file repo partition. This area resolves the repo_file metadata and remaining file repo tables.
DTX 2.0 has fundamentally replaced several subsystems: billing is now Stripe-based (docutrax_billing app), activity tracking uses UserActivityLog, and exemptions/overrides use ManualOverride + WaiverRequest + is_exempted flags.
| DTX 1.0 Table | Rows (sample) | Type | DTX 2.0 Target |
|---|---|---|---|
| repo_file | 100 | Transactional | Document / Certificate (file fields) — see Area 4 |
| repo_file_audit | 100 | Audit trail | — (historical) |
| repo_throughput | 100 | Reporting snapshot | — (computed) |
| imap_upload | 100 | Transactional | No target model |
| form_field | 100 | Configuration | No target model (partial: AccountMeta, Record custom fields) |
| keyword | 100 | Configuration | No target model |
| exemption | 100 | Transactional | ManualOverride / DocRequirement.is_exempted |
| reviewer_address | 100 | Configuration | AccountEmail (email_type='reviewer') — see Area 2 |
| billing_data | 100 | Reporting snapshot | — (replaced by Stripe billing) |
| inactivated_billing_data | 100 | Reporting snapshot | — (replaced by Stripe billing) |
| lost_revenue | 100 | Reporting snapshot | — (replaced by Stripe billing) |
| prepay_so_list | 100 | Flag list | — (replaced by BillingProfile.payment_method) |
| work_log | 100 | Activity tracking | — (replaced by UserActivityLog) |
| vornado_feed | 100 | Client-specific feed | No target model |
| DTX 2.0 Model | App | Purpose |
|---|---|---|
| Document | docutrax_profiles | File storage via S3 FileField |
| Certificate | docutrax_profiles | File storage via S3 FileField |
| ManualOverride | docutrax_profiles | Persistent coverage exemptions/overrides |
| WaiverRequest | docutrax_profiles | Waiver request workflow |
| DocRequirement | docutrax_profiles | Document requirements with is_exempted flag |
| AccountEmail | docutrax_hierarchy | Account-level email addresses |
| AccountMeta | docutrax_hierarchy | Key-value metadata per account |
| BillingProfile | docutrax_billing | Stripe-based billing config |
| UserActivityLog | docutrax_utils | HTTP request-level activity tracking |
| Depends On | What's Needed |
|---|---|
| Area 2: Customers/Accounts | repo_file.customer_id → Account mapping; AccountEmail for reviewer_address |
| Area 3: Vendors/Records | repo_file.vendor_id, exemption.vendor_id → Record mapping |
| Area 4: Certificates/Documents | repo_file.certificate_id/document_id → Certificate/Document mapping; shared file migration utility |
| Area 6: Requirements/Compliance | exemption.requirements_id → RequirementsGroup mapping |
| Depended On By | What It Provides |
|---|---|
| Area 10: Reporting | Repo metrics, billing data for reports |
The repo_file table (27 columns) is the file storage layer connecting on-disk files to certificate and document records. Area 4 already defined the complete file migration strategy (Decision #5: "File Migration — EFS to S3"), including:
- /efsdata/filerepo/ (the repo_file partition)

Area 11's responsibility: migrate the remaining repo_file records that are NOT already linked to certificates or documents via Area 4's migration. These are "orphaned" or "unlinked" repo files — files uploaded but never processed into a certificate or document record.
Column mapping (for repo_file metadata not covered by Area 4):
| DTX 1.0 | DTX 2.0 | Action |
|---|---|---|
| repo_file_id | — | Used as mapping key; not stored in 2.0 |
| customer_id | via Account mapping | For linking unlinked files |
| name | original_filename | On Document or Certificate |
| treatment | — | Drop — COI_MANUAL/COI_OCR/DOC processing type not in 2.0 |
| vendor_id | uploaded_by | Via Record mapping |
| status | — | Drop — SUBMITTED/DELETED/MANUAL_ENTRY; 2.0 has own status workflow |
| old_status | — | Drop |
| certificate_id | Certificate FK | Via Area 4 mapping (already migrated) |
| document_id | Document FK | Via Area 4 mapping (already migrated) |
| attachment_id | — | Drop — intermediate link table, resolved by Area 4 |
| upload_src | — | Drop — DnD/Broker/Email upload source not tracked in 2.0 |
| upload_email | — | Drop (100% NULL in sample) |
| upload_user_id | created_by | Via User mapping |
| upload_ip | — | Drop (100% NULL in sample) |
| intake_date | created | Timestamp |
| repo_filepath | — | Used for file migration — source of truth for locating the file on EFS |
| ocr_event_id | — | Drop — Area 7 resolved (no OcrEvent model) |
| ocr_qa | — | Drop — PASS/FAIL QA flag not in 2.0 |
| display | — | Drop (always 'no') |
| supervisor | — | Drop (always 'no' or NULL) |
| autoextractable | — | Drop (100% NULL) |
| type | — | Drop — MIME type not stored on Document/Certificate |
| size | — | Drop — file size not stored (available from S3 metadata) |
| evt_userid, evt_sessionid | — | Drop (100% NULL) |
| client_id | — | Drop — denormalized FK |
Approach:
1. During Area 4 file migration, mark which repo_file_id values have been processed.
2. After Area 4 completes, identify remaining repo_file records (those with no certificate_id AND no document_id, or with status=DELETED).
3. For deleted files: do not migrate (they were intentionally removed).
4. For unlinked but non-deleted files: upload to S3, create a Document record with status='new', and flag it for manual review (see the sketch below).
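A minimal sketch of steps 2 through 4, assuming Area 4 exposes a shared EFS-to-S3 upload helper and the set of repo_file_id values it already processed, and that Area 2's customer_id → Account mapping is available. The Document field names (account, original_filename, file, status) are assumptions to confirm against the actual 2.0 model, not the final implementation.

```python
# Illustrative sketch only; helper names and Document field names are assumptions.
from docutrax_profiles.models import Document  # assumed model path

def migrate_orphaned_repo_files(dtx1_cursor, account_map, migrated_repo_ids,
                                upload_efs_file_to_s3):
    dtx1_cursor.execute("""
        SELECT repo_file_id, customer_id, name, repo_filepath, intake_date
        FROM repo_file
        WHERE certificate_id IS NULL
          AND document_id IS NULL
          AND status != 'DELETED'
    """)
    for repo_file_id, customer_id, name, repo_filepath, intake_date in dtx1_cursor.fetchall():
        if repo_file_id in migrated_repo_ids:
            continue  # already covered by Area 4's certificate/document migration
        s3_key = upload_efs_file_to_s3(repo_filepath)  # repo_filepath is the EFS source of truth
        Document.objects.create(
            account=account_map[customer_id],
            original_filename=name,
            file=s3_key,
            status="new",        # flagged for manual review
            created=intake_date,
        )
```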
The repo_file_audit table (29 columns) records BEFORE/AFTER snapshots of repo_file state changes. It is a historical audit trail for file processing operations in DTX 1.0.
Don't migrate. DTX 2.0 tracks document/certificate changes through AuditHistory (evaluation snapshots) and UserActivityLog (HTTP request tracking). The old file-level audit trail has no operational value in the new system.
Keep CSV as archive.
The repo_throughput table (19 columns) is a denormalized reporting snapshot — a single batch-generated snapshot of file processing metrics per service owner. Column names contain special characters (#, %, spaces). No primary key.
Don't migrate. This is a computed cache, not source data. DTX 2.0 will calculate throughput metrics dynamically from actual Document, Certificate, and Evaluation data.
Keep CSV as archive for validation comparison.
The imap_upload table (10 columns) tracks files received via email-based IMAP ingestion — vendors/brokers emailing certificates to a monitored inbox. Each row records the sender email, filename, filesize, and links to a repo_file_id.
DTX 2.0 does not have an IMAP email ingestion system. Documents are uploaded via the web portal (drag-and-drop or magic links).
The team must decide:
1. Archive and discard — Email ingestion is a deprecated workflow. Vendors now upload via the web portal.
2. Build IMAP ingestion in DTX 2.0 — If email-based upload is a desired feature, create an IMapUpload model and processing pipeline. The DTX 1.0 data provides a useful reference for the schema.
3. Migrate as reference data — Store imap_upload records as metadata on the corresponding Document records (via repo_file_id → Document mapping) to preserve upload provenance.
The form_field table (17 columns) stores custom form field definitions per customer/vendor. In the sample, all rows are for form_name=vendor_view, model_name=Vendorhistory, property_name=custom_field_0, with per-vendor value entries.
DTX 2.0 handles custom fields differently:
- Record.internal_id and Record.internal_id1 — two custom text fields per Record
- AccountMeta — key-value metadata per Account (flexible but unstructured)
- Coverage models have hardcoded fields (not dynamic form fields)
The team must decide:
1. Map custom_field_0 values to Record.internal_id — If the custom field data is still needed per vendor, migrate form_field.value where property_name='custom_field_0' to Record.internal_id.
2. Store in AccountMeta — Migrate customer-level form field configurations as AccountMeta key-value pairs.
3. Archive and discard — If the custom form field system is replaced by DTX 2.0's fixed schema.
Note: The sample only shows one property_name. The full dataset likely has custom_field_1 through custom_field_4 (matching the keyword table's CUSTOM_FIELD_1 through CUSTOM_FIELD_4 labels). The full scope is unclear from sample data alone.
Pre-migration action: Query full DTX 1.0 for distinct form_name, model_name, property_name combinations to understand the full scope of custom fields.
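If the team chooses option 1 above, the migration is a straightforward per-vendor copy. A minimal sketch, assuming the per-vendor form_field rows carry a vendor_id column (to be confirmed against the 17-column schema) and that Area 3's vendor_id → Record mapping is available:

```python
# Sketch of option 1 only; the vendor_id column and value handling are assumptions.
def migrate_custom_field_0(dtx1_cursor, record_map):
    dtx1_cursor.execute("""
        SELECT vendor_id, value
        FROM form_field
        WHERE property_name = 'custom_field_0'
          AND value IS NOT NULL AND value != ''
    """)
    for vendor_id, value in dtx1_cursor.fetchall():
        record = record_map.get(vendor_id)
        if record is None:
            continue  # vendor not migrated; log for follow-up
        record.internal_id = value.strip()
        record.save(update_fields=["internal_id"])
```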
The keyword table (3 columns) is a UI label customization system. Customers can rename standard labels — e.g., TRACKED_ITEM_NAME → "Tenants", GROUP_NAME → "Centers", CUSTOM_FIELD_1 → "Lease #".
DTX 2.0 does not have a keyword/label customization model. The UI uses fixed labels.
The team must decide:
1. Store in AccountMeta — Migrate keyword mappings as AccountMeta entries (key=keyword, value=replacement) so the DTX 2.0 UI can eventually support label customization.
2. Store in Directory.settings — Use the JSONField on Directory for per-tenant label overrides.
3. Archive and discard — If DTX 2.0 will use fixed labels for all customers.
Data size: Small — likely hundreds of rows total (6 keywords × number of customers who customized).
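If option 1 is chosen, the keyword rows collapse directly into AccountMeta entries. A minimal sketch, assuming the three keyword columns are customer_id, keyword, and value, and that AccountMeta exposes key/value fields (both assumptions to verify):

```python
# Sketch of option 1 (keyword -> AccountMeta); schema details are assumptions.
from docutrax_hierarchy.models import AccountMeta  # assumed model path

def migrate_keywords(dtx1_cursor, account_map):
    dtx1_cursor.execute("SELECT customer_id, keyword, value FROM keyword")
    for customer_id, keyword, value in dtx1_cursor.fetchall():
        AccountMeta.objects.update_or_create(
            account=account_map[customer_id],
            key=f"label_{keyword.lower()}",   # e.g. label_tracked_item_name -> "Tenants"
            defaults={"value": value},
        )
```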
The exemption table (11 columns) records vendor-level exemptions from specific coverage or document requirements — e.g., "Sole proprietor" (WC exempt), "WC NOT MANDATORY IN TX", "waived". Each row has a vendor_id, item_type (COI/DOCUMENT), item_name (coverage line or document type), expiry date, and reason.
Column mapping:
| DTX 1.0 | DTX 2.0 | Action |
|---|---|---|
| exemption_id | — | No direct PK mapping (new auto-increment IDs) |
| vendor_id | ManualOverride.requirements_group | Via Record → RequirementsGroupMember → RequirementsGroup |
| item_type = 'COI' | ManualOverride | Coverage-level exemption |
| item_type = 'DOCUMENT' | DocRequirement.is_exempted = True | Document requirement exemption |
| item_name | ManualOverride.coverage_type | Map values (see below) |
| ExemptExpDate | ManualOverride.expires_at | Direct; map sentinel dates to NULL |
| ExemptReason | ManualOverride.reason | Direct (trim trailing whitespace) |
| ExemptReviewStatus | — | All 'COMPLIANT' in sample → implies approved status |
| create_user_id | ManualOverride.approved_by | Via User mapping |
| create_time | ManualOverride.approved_at | Direct |
| requirements_id | ManualOverride.coi_risk_profile | Via Area 6 requirements → COIRiskProfile mapping |
| src | — | Drop — source reference (PH or certificate_id), not in 2.0 |
item_name → coverage_type mapping (COI exemptions):
| DTX 1.0 item_name | DTX 2.0 coverage_type |
|---|---|
| GL | GeneralLiability |
| Auto | AutoLiability |
| WC | WorkersCompensation |
| UMB | UmbrellaLiability |
| Professional Liability | OtherCoverage (with coverage_subtype) |
| Boiler and Machinery | PropertyCoverage (with coverage_subtype) |
| EPLI | OtherCoverage (with coverage_subtype) |
| Business Personal Property | PropertyCoverage (with coverage_subtype) |
| Business Income | PropertyCoverage (with coverage_subtype) |
| Primary & Non-Contributory | GeneralLiability (with item_key='primary_non_contributory') |
Sentinel date handling:
- 3000-01-01 → expires_at = NULL (permanent/never expires)
- 2050-* dates → expires_at = NULL (permanent exemption)
- Normal dates → direct mapping
For DOCUMENT exemptions (item_type='DOCUMENT'):
- Find the matching DocRequirement for the vendor's RequirementsGroup + document type
- Set DocRequirement.is_exempted = True
- Store reason in a note or metadata field
Pre-migration action: Build complete item_name → coverage_type mapping from full DTX 1.0 data. The sample shows ~10 unique values; the full dataset may have more.
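A minimal sketch of the COI and DOCUMENT branches, combining the column mapping, coverage_type mapping, and sentinel-date rules above. ManualOverride field names follow the mapping table; the DocRequirement lookup fields and doc_type_map are illustrative assumptions, not confirmed model details.

```python
# Illustrative only; extend COVERAGE_TYPE_MAP from the full distinct item_name query.
from datetime import date
from docutrax_profiles.models import ManualOverride, DocRequirement  # assumed paths

SENTINEL_CUTOFF = date(2050, 1, 1)   # 2050-* and 3000-01-01 mean "never expires"

COVERAGE_TYPE_MAP = {
    "GL": "GeneralLiability",
    "Auto": "AutoLiability",
    "WC": "WorkersCompensation",
    "UMB": "UmbrellaLiability",
    # ... remaining values from the mapping table and the full-data query
}

def normalize_expiry(exempt_exp_date):
    """Map sentinel dates to NULL (permanent exemption); keep normal dates."""
    if exempt_exp_date is None or exempt_exp_date >= SENTINEL_CUTOFF:
        return None
    return exempt_exp_date

def migrate_exemption(row, requirements_group_map, coi_risk_profile_map,
                      user_map, doc_type_map):
    if row["item_type"] == "COI":
        ManualOverride.objects.create(
            requirements_group=requirements_group_map[row["vendor_id"]],
            coverage_type=COVERAGE_TYPE_MAP[row["item_name"].strip()],
            expires_at=normalize_expiry(row["ExemptExpDate"]),
            reason=row["ExemptReason"].rstrip(),      # trim trailing whitespace
            approved_by=user_map[row["create_user_id"]],
            approved_at=row["create_time"],
            coi_risk_profile=coi_risk_profile_map[row["requirements_id"]],
        )
    else:  # item_type == 'DOCUMENT'
        req = DocRequirement.objects.get(
            requirements_group=requirements_group_map[row["vendor_id"]],
            document_type=doc_type_map[row["item_name"].strip()],  # lookup fields assumed
        )
        req.is_exempted = True
        req.save(update_fields=["is_exempted"])
```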
The reviewer_address table (6 columns) stores reviewer email addresses per customer with PHP serialized email_string fields. Area 2 already mapped customer.reviewer_email → AccountEmail(email_type='reviewer').
Verify and supplement: Compare reviewer_address data against the AccountEmail records created by Area 2.
- If reviewer_address emails are already captured by customer.reviewer_email, no additional migration needed.
- If reviewer_address contains different or additional email addresses, create additional AccountEmail(email_type='reviewer') records.
- Deserialize PHP serialized email_string to extract email arrays.
Data quality note: type column is 100% empty (not NULL, just blank strings). level is always 0. These columns are effectively dead.
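Deserializing email_string is the only non-trivial step. A minimal helper using the phpserialize package named in the risks table, assuming the field holds a PHP-serialized array of email strings (the exact array shape should be confirmed against real rows):

```python
# Shape assumptions apply; verify against actual reviewer_address.email_string values.
import phpserialize

def extract_reviewer_emails(email_string: bytes) -> list[str]:
    raw = phpserialize.loads(email_string, decode_strings=True)
    # PHP arrays come back as dicts keyed by index; keep non-empty string values only
    values = raw.values() if isinstance(raw, dict) else [raw]
    return [v.strip() for v in values if isinstance(v, str) and v.strip()]
```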
Four tables relate to DTX 1.0's billing/revenue system:
| Table | Columns | Structure | Content |
|---|---|---|---|
| billing_data | 21 | Reporting snapshot, no PK | Account-level billing metrics (active PHs, compliance %, etc.) |
| inactivated_billing_data | 12 | Reporting snapshot, no PK | Inactivated vendor billing lifecycle |
| lost_revenue | 9 | Reporting snapshot, no PK | Revenue tracking for churned vendors |
| prepay_so_list | 1 | Flag list | Customer IDs on prepay billing plans |
Don't migrate. DTX 2.0 has a completely different billing architecture:
- Stripe integration: BillingProfile, Invoice, Payment, BillingEvent models in docutrax_billing app
- Payment methods: stripe, wire, check, ach_direct, invoice, complimentary, sponsored
- Plan tiers: starter, professional, enterprise
- prepay_so_list → BillingProfile.payment_method (no need for a separate flag table)
The DTX 1.0 billing tables are denormalized snapshots of a legacy billing system that bears no structural resemblance to DTX 2.0's Stripe-based architecture. Historical billing data would need to be archived separately if needed for accounting/audit purposes.
Keep CSVs as archive for financial reference.
The work_log table (5 columns) tracks user sessions per customer: user_id, session_id (SHA-1 hash), customer_id, start_time, end_time. No primary key. ~50% of rows have NULL end_time.
Don't migrate. DTX 2.0 has UserActivityLog (docutrax_utils/models.py) which tracks HTTP requests with user, URL, IP, user agent, and payload. This is a fundamentally different (and more detailed) activity tracking approach.
The old session-per-customer logs are historical operational data with no value in the new system.
Keep CSV as archive.
The vornado_feed table (41 columns) is a client-specific data import for Vornado Realty Trust — importing tenant/lease data from their ABSINSQRY export. It includes property codes, tenant IDs, lease dates, insurance requirement flags, and processing status.
DTX 2.0 does not have a vornado_feed model or any client-specific data import tables. Area 1 also noted Vornado-specific rights (IDs 1000, 1009) that were dropped as "per-client hacks."
The team must decide:
1. Archive and discard — This was a custom integration for one client. If Vornado is still a customer in DTX 2.0, a new import mechanism should be designed.
2. Build a generic data import framework — Create a reusable CSV/data import system in DTX 2.0 that could handle Vornado-style feeds as well as other client integrations.
3. Migrate as seed data — If Vornado's tenant data needs to be in DTX 2.0, extract the tenant → Record mappings (via PH_ID → vendor mapping) and create Records with the relevant metadata.
Data quality note: Sentinel dates (12/31/2050 for perpetual leases), PH_ID = -1 for rejected records, typos ("reokacnebt" for "replacement", "Noncontributary" for "Noncontributory"), and 100% NULL contact fields.
| # | Issue | Impact | Mitigation |
|---|---|---|---|
| 1 | PHP serialized data in reviewer_address.email_string | Requires deserialization to extract email addresses. | Use phpserialize Python package during migration. |
| 2 | Sentinel dates in exemption (3000-01-01, 2050-*) and vornado_feed (12/31/2050) | Need normalization. | Map to NULL (permanent/never expires) during migration. |
| 3 | 7 tables lack primary keys: repo_throughput, billing_data, inactivated_billing_data, lost_revenue, prepay_so_list, work_log, keyword | Cannot deduplicate or do incremental migration. | Not migrated (Decisions #3, #9, #10) or use composite natural keys (keyword). |
| 4 | Column names with special characters (#, %, spaces, ', /, ?) in repo_throughput, billing_data, inactivated_billing_data, vornado_feed | Would need normalization for any ORM usage. | Not migrated (all are reporting snapshots or client-specific). |
| 5 | Exemption item_name mapping may be incomplete | Sample shows ~10 coverage names; full dataset may have more. | Pre-migration action: query full distinct item_name values. |
| 6 | form_field scope unknown from sample | Sample only shows custom_field_0 for one customer. Full variety of form_name/model_name/property_name is unclear. | Pre-migration action: query full DTX 1.0 for distinct combinations. |
| 7 | Orphaned repo_file records | Files uploaded but never linked to a certificate or document (no certificate_id and no document_id). | Decision #1: upload to S3 as Document with status='new', flag for review. |
| 8 | Exemption trailing whitespace in ExemptReason values | Minor data quality. | Trim during migration. |
| 9 | imap_upload duplicate uploads | Same vendor sent same file multiple times (no deduplication). | Only relevant if imap_upload is preserved (Decision #4). |
| # | Action | Blocking? |
|---|---|---|
| 1 | Query full form_field scope: Get distinct form_name, model_name, property_name combinations from DTX 1.0 to understand the full custom field landscape. | Yes — needed for Decision #5. |
| 2 | Query full exemption item_name values: Get distinct item_name values for both COI and DOCUMENT types to complete the coverage_type mapping. | Yes — needed for Decision #7. |
| 3 | Cross-reference reviewer_address with customer.reviewer_email: Determine if reviewer_address contains data not already captured by Area 2's AccountEmail migration. | Yes — needed for Decision #8. |
| 4 | Identify orphaned repo_file records: Count repo_file rows where certificate_id IS NULL AND document_id IS NULL AND status != 'DELETED' to gauge the scope of unlinked files. | No — but affects Decision #1 scope. |
| 5 | Confirm exemption sentinel date values: Query full DTX 1.0 for distinct ExemptExpDate values to identify all sentinel patterns (3000-01-01, 2050-*, others?). | No — but needed for accurate mapping. |
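These checks can be run read-only against DTX 1.0 before any migration code is written. An illustrative set covering actions 1, 2, 4, and 5 (connection setup omitted; table and column names follow the mappings above):

```python
# Read-only pre-migration checks against the DTX 1.0 database.
PRE_MIGRATION_QUERIES = {
    "form_field_scope": """
        SELECT DISTINCT form_name, model_name, property_name FROM form_field
    """,
    "exemption_item_names": """
        SELECT item_type, item_name, COUNT(*) AS n
        FROM exemption
        GROUP BY item_type, item_name
        ORDER BY item_type, n DESC
    """,
    "orphaned_repo_files": """
        SELECT COUNT(*) FROM repo_file
        WHERE certificate_id IS NULL AND document_id IS NULL AND status != 'DELETED'
    """,
    "exemption_sentinel_dates": """
        SELECT DISTINCT ExemptExpDate FROM exemption
        WHERE ExemptExpDate >= '2050-01-01'
    """,
}

def run_pre_migration_checks(dtx1_cursor):
    for name, sql in PRE_MIGRATION_QUERIES.items():
        dtx1_cursor.execute(sql)
        print(name, dtx1_cursor.fetchall())
```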
Step 1: Coordinate with Area 4 file migration utility (repo_file files → S3)
Step 2: Migrate exemption → ManualOverride + DocRequirement.is_exempted
Step 3: Verify reviewer_address against Area 2 AccountEmail (supplement if needed)
Step 4: Resolve form_field and keyword per team decisions
Step 5: Archive billing tables, work_log, repo_file_audit, repo_throughput (no migration)
Step 6: Resolve imap_upload and vornado_feed per team decisions
| Area | Reference | Status |
|---|---|---|
| Area 2: Customers/Accounts | reviewer_address → AccountEmail(email_type='reviewer') | Verify — Decision #8 |
| Area 3: Vendors/Records | exemption.vendor_id → Record mapping | Resolved — use Area 3 mapping |
| Area 4: Certificates/Documents | repo_file file migration utility (shared); repo_file.certificate_id/document_id | Resolved — Area 4 Decision #5 defines shared utility |
| Area 6: Requirements/Compliance | exemption.requirements_id → RequirementsGroup/COIRiskProfile | Resolved — use Area 6 mapping |
| Area 7: Extraction/OCR | repo_file.ocr_event_id (dropped) | Resolved — column not migrated |
| Area 9: Brokers/Programs | program.charge_id FK to billing (dropped); program.envelope_id FK | Resolved — program table not migrated |
| # | Decision | Status | Approach |
|---|---|---|---|
| 1 | repo_file → Shared file migration utility (Area 4 coordination) | Documented | See decision details below |
| 2 | repo_file_audit → Don't migrate | Documented | See decision details below |
| 3 | repo_throughput → Don't migrate | Documented | See decision details below |
| 4 | imap_upload → No target model | Documented | See decision details below |
| 5 | form_field → No target model (partial mapping) | Documented | See decision details below |
| 6 | keyword → No target model | Documented | See decision details below |
| 7 | exemption → ManualOverride + DocRequirement.is_exempted | Documented | See decision details below |
| 8 | reviewer_address → Verify against Area 2 AccountEmail | Documented | See decision details below |
| 9 | Billing tables (4) → Don't migrate | Documented | See decision details below |
| 10 | work_log → Don't migrate | Documented | See decision details below |
| 11 | vornado_feed → No target model | Documented | See decision details below |