ChemIQ Complete Schema Documentation
Overview
The ChemIQ module implements a comprehensive global repository architecture for chemical safety documents, consisting of three interconnected systems:
- Product Catalog - Master searchable product repository
- SDS (Safety Data Sheet) Repository - Global SDS document storage
- EPA Label Repository - Global EPA pesticide/fungicide label storage
This unified architecture enables multiple companies to efficiently search, share, and access chemical safety documents while maintaining data isolation and compliance.
Key Design Principles
- Global Repositories: Documents stored once, shared across companies via mapping tables
- Product Catalog Hub: Single search point linking to all document types
- S3 Integration: PDF files stored in AWS S3, metadata in PostgreSQL
- Deduplication: SHA-256 checksums prevent duplicate storage
- Version Tracking: Linked-list pattern for document revision history
- Full-Text Search: PostgreSQL TSVECTOR with GIN indexes
- Structured Data: JSONB for complex data, arrays for lists
- Company Isolation: Many-to-many mapping tables for multi-tenancy
- Scalability: Optimized indexes and partition-ready design
Architecture Overview
Four-Tier Document Management
┌─────────────────────────────────────────────────────────────┐
│ GLOBAL PRODUCT CATALOG │
│ (Master Searchable Repository - Shared) │
│ - Product identification (name, manufacturer, codes) │
│ - Multiple identifiers (CAS, UPC, GTIN, EPA Reg #) │
│ - PRIMARY SEARCH VECTOR (unified search) │
│ - Populated by: API validation, AI agent promotion │
└─────────────┬───────────────────────────────────────────────┘
│ (optional link for enrichment)
▼
┌─────────────────────────────────────────────────────────────┐
│ COMPANY PRODUCT CATALOG (NEW) │
│ (Company-Specific Product Entries) │
│ - Company's own product naming and codes │
│ - Links to global catalog (optional) │
│ - SDS and EPA label attachments │
│ - User contributions stay here (not global) │
└─────────────┬───────────────────────────────────────────────┘
│
├──────────────┬──────────────┬─────────────────┐
▼ ▼ ▼ ▼
┌─────────────┐ ┌──────────────┐ ┌──────┐ ┌──────────┐
│ SDS Docs │ │ EPA Labels │ │ COA │ │Tech Sheet│
└─────────────┘ └──────────────┘ └──────┘ └──────────┘
│ │
▼ ▼
┌──────────────────────────────────────────────────────────┐
│ COMPANY MAPPINGS │
│ (Company-specific isolation & tracking) │
└──────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────┐
│ INVENTORY ITEMS (Simplified) │
│ (Site/Location-specific inventory tracking) │
│ - Links to company product catalog │
│ - Container size, quantity, location only │
└──────────────────────────────────────────────────────────┘
User Workflow
When user adds inventory item (Barcode Search Flow):
- Check Company Product Catalog first - Search
chemiq_company_product_catalogby barcode/product name (company's own products) - If not found, check Global Product Catalog - Search
chemiq_product_catalogby barcode/product name - If found in global - Copy to company catalog, link to global_product_id
- If not found, query External APIs - Open Food Facts, UPC Database, Barcode Lookup
- If API returns data - Save to BOTH global catalog (api_validated) and company catalog
- If still not found - User enters manually → saves to company catalog ONLY (not global)
- Create inventory record linking to
chemiq_company_product_catalog
Key Design Principle:
- User-contributed products go to
chemiq_company_product_catalogONLY - Global
chemiq_product_catalogis populated by:- External API validated products (high confidence)
- AI Agent promotion (future): When 30%+ of companies have same product
Benefits:
- Single search eliminates need to query multiple repositories
- Product metadata stored once per company (no duplication across sites/locations)
- Company isolation - each company has their own product catalog
- Global enrichment - can link to global catalog for additional data
- Data quality tracking - confidence scores and verification status
- One product can have multiple document versions
- Easy to add new document types (COA, tech sheets)
Part 1: Product Catalog System
Table 1: chemiq_product_catalog
Purpose: Master searchable product repository that serves as the central hub for all chemical products.
Key Features:
- Stores master product identification across all companies
- Multiple identifier types (product codes, CAS, UPC, GTIN, EPA reg #)
- PRIMARY search vector for unified product search
- Links to all document types via junction table
- Tracks product lifecycle (active, discontinued, replacements)
Important Columns:
product_id(UUID, PK): Primary identifierproduct_name(String(255), indexed): Product namemanufacturer(String(255), indexed): Manufacturer namemanufacturer_id(String(100)): Manufacturer's internal IDbrand_name(String(255)): Brand/trade nameproduct_codes(JSONB): Array of codes with types:[{"type": "manufacturer", "code": "ABC-123"}]cas_numbers(Text[]): Primary CAS numbersupc_barcodes(Text[]): Multiple barcodesgtin(String(14)): Global Trade Item Numberproduct_category(String(100)): Chemical, Pesticide, etc.product_type(String(100)): Insecticide, Herbicide, Cleaner, etc.formulation_type(String(50)): Liquid, Powder, Granular, etc.epa_reg_number(String(50), indexed): EPA registration number (for pesticides)epa_product_type(String(100)): EPA product classificationis_restricted_use(Boolean, default=false): Restricted use pesticide flagis_active(Boolean, indexed, default=true): Active product flagis_discontinued(Boolean, default=false): Discontinued product flagdiscontinued_date(Date): When product was discontinuedreplacement_product_id(UUID, FK): Links to replacement productsearch_vector(TSVECTOR): PRIMARY SEARCH VECTOR for all product searchescontributed_by_company_id(UUID, FK): Company that contributed this productsource_type(String(20)): manufacturer, user_contributed, epa_pplscreated_at,updated_at(Timestamp)
Indexes:
- GIN index on
search_vector(PRIMARY SEARCH INDEX) - GIN index on
upc_barcodes(array search) - GIN index on
cas_numbers(array search) - B-tree indexes on: product_name, manufacturer, epa_reg_number, is_active
TSVECTOR Trigger:
CREATE TRIGGER tsvector_update_trigger BEFORE INSERT OR UPDATE
ON chemiq_product_catalog FOR EACH ROW EXECUTE FUNCTION
tsvector_update_trigger(search_vector, 'pg_catalog.english',
product_name, manufacturer, brand_name, product_category, product_type);
Relationships:
- Self-referencing: replacement_product (for discontinued products)
- One-to-many: product_documents (links to SDSs, EPA labels, etc.)
Use Cases:
- Unified Search: Single search across all products and identifiers
- Barcode Scanning: Quick lookup by UPC/GTIN
- Product Discovery: Find products by name, manufacturer, CAS, EPA reg #
- Product Lifecycle: Track discontinued products and replacements
- Multi-Company Contribution: Companies can add products to shared catalog
Table 2: chemiq_product_documents
Purpose: Polymorphic junction table linking products to various document types (SDSs, EPA labels, COAs, tech sheets).
Key Features:
- Links one product to multiple documents
- Supports multiple document types (polymorphic association)
- Tracks document validity periods
- Supports multiple languages and regions
- CHECK constraints enforce document type integrity
Important Columns:
product_document_id(UUID, PK): Primary identifierproduct_id(UUID, FK, indexed): Links to product catalogdocument_type(String(20), required): sds, epa_label, coa, tech_sheetsds_id(UUID, FK): Links to SDS (if document_type = 'sds')label_id(UUID, FK): Links to EPA label (if document_type = 'epa_label')is_primary(Boolean, default=true): Primary document flagdocument_language(String(10), default='en'): Document languagedocument_region(String(10)): US, CA, EU, etc.valid_from,valid_until(Date): Document validity periodis_current(Boolean, default=true): Current document flagcreated_at(Timestamp)
CHECK Constraint:
(document_type = 'sds' AND sds_id IS NOT NULL AND label_id IS NULL) OR
(document_type = 'epa_label' AND label_id IS NOT NULL AND sds_id IS NULL) OR
(document_type IN ('coa', 'tech_sheet'))
Relationships:
- Many-to-one: product (Product Catalog)
- Many-to-one: sds_document (SDS Documents)
- Many-to-one: epa_label (EPA Label Documents)
Use Cases:
- Link products to multiple SDSs (different languages, regions)
- Link products to EPA labels for pesticides
- Track document versions and validity periods
- Support multi-language document management
Table 3: chemiq_company_product_catalog (NEW)
Purpose: Company-specific product catalog that sits between the global product catalog and inventory. Eliminates product data duplication across sites/locations within a company.
Key Features:
- Company-specific product entries (one record per product per company)
- Optional link to global product catalog for enrichment
- Stores company's own product naming and codes
- SDS and EPA label attachments at product level (not inventory level)
- Data quality tracking (source, confidence, verification status)
- Full-text search via TSVECTOR
- Unique constraint per company on (product_name, manufacturer)
Problem Solved:
Previously, product data (name, manufacturer, SDS) was duplicated in chemiq_inventory for every site/location. Now:
- Product info stored ONCE per company in
chemiq_company_product_catalog - Inventory records only store site/location-specific data (quantity, container size)
- SDS/EPA labels attached to product, not individual inventory items
Important Columns:
company_product_id(UUID, PK): Primary identifiercompany_id(UUID, FK, indexed, required): Company ownershipglobal_product_id(UUID, FK): Optional link to global catalog for enrichment
Product Identification:
product_name(String(255), required): Product name (company's naming)manufacturer(String(255), required): Manufacturer nameproduct_code(String(100)): Company's internal product codecas_number(String(50)): Primary CAS numberbarcode_upc(String(100), indexed): UPC/EAN barcode (optional)barcode_source(String(20)): How barcode was obtained: scanned, label_extracted, manual, api_validatedinternal_sku(String(100), indexed): Company's internal product code/SKU (separate from UPC barcode)
Product Classification:
product_category(String(100)): Chemical, Pesticide, Cleaning, etc.product_type(String(100)): Insecticide, Herbicide, Solvent, etc.
Pesticide Identification (for EPA-regulated products):
is_pesticide(Boolean, default=false): Flag indicating product is EPA-regulated (insecticide, herbicide, fungicide, disinfectant, etc.)epa_registration_number(String(50), indexed): EPA Registration Number (e.g., 1234-567 or 1234-567-8901)signal_word(String(20)): Required warning level: DANGER, WARNING, or CAUTION
Document Links (company-specific SDS/Label selection):
current_sds_id(UUID, FK): Links to chemiq_sds_documentssds_attached_at(Timestamp): When SDS was attachedsds_attached_by_user_id(UUID, FK): User who attached SDSsds_missing(Boolean, default=true): SDS missing flagcurrent_epa_label_id(UUID, FK): Links to chemiq_epa_label_documentsepa_label_attached_at(Timestamp): When EPA label was attachedepa_label_attached_by_user_id(UUID, FK): User who attached EPA label
Data Quality Tracking:
data_source(String(50), default='user_contributed'): api_validated, user_contributed, label_extracted, admin_curatedconfidence_score(Numeric(3,2), default=0.5): 0.0-1.0 confidenceverification_status(String(20), default='unverified'): unverified, verified, needs_review, flaggedlast_verified_at(Timestamp): Last verification timestampsource_image_id(UUID): Link to label image if extracted via Vision LLM
Classification & Tagging:
hazard_categories(JSONB, default='[]', required): Denormalized array of GHS hazard categories derived from SDS (e.g.,["Flammable", "Acute Toxicity", "Corrosive"]). GIN-indexed for fast filtering.
Audit Fields:
is_active(Boolean, default=true): Active flagcreated_at,updated_at(Timestamp)created_by_user_id,updated_by_user_id(UUID, FK)
Full-Text Search:
search_vector(TSVECTOR): Auto-updated search index
Indexes:
- GIN index on
search_vector - GIN index on
hazard_categories(for JSONB contains queries) - B-tree indexes on: company_id, global_product_id, product_name, manufacturer, barcode_upc, internal_sku, epa_registration_number
- Composite index on (company_id, sds_missing) WHERE sds_missing = true
- Partial index on
is_pesticideWHERE is_pesticide = true (for efficient pesticide product queries) - Unique constraint on (company_id, product_name, manufacturer)
- Unique constraint on (company_id, barcode_upc) WHERE barcode_upc IS NOT NULL - prevents duplicate barcodes per company
TSVECTOR Trigger:
CREATE TRIGGER tsvector_company_product_trigger BEFORE INSERT OR UPDATE
ON chemiq_company_product_catalog FOR EACH ROW EXECUTE FUNCTION
tsvector_update_trigger(search_vector, 'pg_catalog.english',
product_name, manufacturer, product_code, cas_number);
Relationships:
- Many-to-one: company (Companies)
- Many-to-one: global_product (Product Catalog) - optional enrichment link
- Many-to-one: current_sds (SDS Documents)
- Many-to-one: current_epa_label (EPA Label Documents)
- One-to-many: inventory_items (Inventory)
Use Cases:
- Store company-specific product catalog (no duplication across sites)
- Link products to SDSs and EPA labels at company level
- Track data quality and verification status
- Support barcode scanning with company-specific products first
- Enable fuzzy matching and duplicate detection within company
- Identify pesticide products for EPA label search (via
is_pesticideflag andepa_registration_number) - Track barcode source to distinguish trusted (scanned, API-validated) vs manual entries
- Store company's internal SKU/reference codes separate from UPC barcodes
- Future: AI agent promotion to global catalog when 30%+ companies have same product
Example Data Flow:
User adds "Clorox Wipes" at Site 1:
→ Check chemiq_company_product_catalog (not found)
→ Check chemiq_product_catalog (not found)
→ Query external APIs (found!)
→ Create record in chemiq_company_product_catalog (company_product_id = prod-1)
→ Create record in chemiq_inventory (links to prod-1, site-1)
User adds same product at Site 2:
→ Check chemiq_company_product_catalog (FOUND: prod-1)
→ Create record in chemiq_inventory (links to prod-1, site-2)
→ NO product duplication!
Table 4: chemiq_product_images
Purpose: Stores product label images uploaded during inventory entry. Supports multiple images per product for curved/multi-sided labels where product information is spread across different angles.
Key Features:
- Polymorphic association - can link to either company product catalog OR global product catalog
- Multi-image support (1-5 images per product)
- S3 storage with presigned URL generation
- SHA-256 checksums for deduplication
- Tracks images used for AI vision extraction
- Image metadata (dimensions, MIME type, original filename)
Important Columns:
image_id(UUID, PK): Primary identifiercompany_product_id(UUID, FK, indexed): Links to company product catalog (polymorphic)global_product_id(UUID, FK, indexed): Links to global product catalog (polymorphic)
Image Order:
image_order(Integer, default=1): Order for multi-image displays
S3 Storage:
s3_bucket(String(100), required): AWS S3 buckets3_key(Text, required): S3 object keyfile_checksum(String(64), indexed, required): SHA-256 hash for deduplicationfile_size(BigInteger): File size in bytesmime_type(String(50), default='image/jpeg'): MIME type (image/jpeg, image/png)original_filename(String(255)): Original uploaded filename
Image Metadata:
image_width(Integer): Image width in pixelsimage_height(Integer): Image height in pixelsimage_type(String(50), default='label'): Image type (label, product, container, etc.)
Extraction Metadata (for Vision LLM):
used_for_extraction(Boolean, default=false): Flag if image was used for AI label extractionextraction_confidence(Numeric(3,2)): Extraction confidence score (0.00-1.00)
Audit Fields:
uploaded_by_company_id(UUID, FK): Company that uploaded the imageuploaded_by_user_id(UUID, FK): User who uploaded the imagecreated_at(Timestamp): Upload timestamp
CHECK Constraint (Polymorphic Enforcement):
(company_product_id IS NOT NULL AND global_product_id IS NULL) OR
(company_product_id IS NULL AND global_product_id IS NOT NULL)
This constraint ensures each image belongs to exactly one product type (company or global), never both or neither.
Indexes:
- B-tree indexes on: company_product_id, global_product_id, file_checksum
Relationships:
- Many-to-one: company_product (Company Product Catalog) - polymorphic
- Many-to-one: global_product (Product Catalog) - polymorphic
- Many-to-one: uploaded_by_company (Companies)
- Many-to-one: uploaded_by_user (Users)
Use Cases:
- Store product label images during inventory entry
- Support multi-image uploads for curved/multi-sided labels
- Track which images were used for AI vision extraction
- Enable image display when viewing products
- Deduplicate images across products using checksums
- Generate presigned S3 URLs for secure frontend display
Example Data Flow:
User uploads 3 label images during product entry:
→ LabelExtractionService extracts product info via Vision LLM
→ ProductImageService uploads images to S3
→ 3 records created in chemiq_product_images (linked to company_product_id)
→ used_for_extraction = true, extraction_confidence = 0.85
User opens product later:
→ API returns product with images
→ Presigned S3 URLs generated (1 hour expiration)
→ Frontend displays all 3 label images
Part 2: EPA Label Repository
Table 5: chemiq_epa_label_documents
Purpose: Global repository of EPA pesticide/fungicide labels with comprehensive label data.
Key Features:
- Global EPA label storage (shared across companies)
- Complete EPA label metadata (active ingredients, target pests, application instructions)
- S3 storage for label PDFs
- Version tracking with linked-list pattern
- Full-text search on label content
- EPA PPLS API integration support
Important Columns:
label_id(UUID, PK): Primary identifierepa_reg_number(String(50), indexed, required): EPA registration numberepa_est_number(String(50)): EPA establishment numberproduct_name(String(255), indexed, required): Product namemanufacturer(String(255), indexed, required): Manufacturer namemanufacturer_address(Text): Full addressemergency_phone(String(50)): Emergency contact number
Active Ingredients:
active_ingredients(JSONB): Structured array:[{"name": "Glyphosate", "cas": "1071-83-6", "percentage": 41.0}]inert_ingredients_percentage(Numeric(5,2)): Percentage of inert ingredients
Label Classification:
signal_word(String(20), indexed): DANGER, WARNING, CAUTIONepa_product_type(String(100)): Insecticide, Herbicide, Fungicide, etc.formulation_type(String(50)): Liquid, Granular, Wettable Powder, etc.
Hazard Information:
hazard_statements(Text[]): Array of hazard statementsprecautionary_statements(Text[]): Array of precautionary statementsfirst_aid_measures(JSONB): Structured first aid instructions
Use Instructions:
target_pests(JSONB): Structured pest/application data:[{"pest": "ants", "application_rate": "1-2 oz per gallon"}]application_sites(Text[]): [residential, commercial, agricultural]application_methods(Text[]): [spray, bait, dust]restricted_entry_interval(String(100)): REI in hourspreharvest_interval(String(100)): PHI in days
File Storage (S3):
s3_bucket(String(100), required): AWS S3 buckets3_key(Text, required): S3 object keyfile_checksum(String(64), indexed, required): SHA-256 hashfile_size(BigInteger): File size in bytesmime_type(String(50), default='application/pdf')
Parsing Metadata:
label_parsed(Boolean, default=false): Parsing complete flagparse_confidence(Numeric(5,2)): Parse confidence score (0-100)parse_errors(Text): Parse error messages
Version Tracking:
label_version(String(50)): Label version numberrevision_date(Date, indexed, required): Label revision dateexpiration_date(Date): Label expiration dateprevious_version_id(UUID, FK): Links to older versionsuperseded_by_id(UUID, FK): Links to newer versionis_current(Boolean, indexed, default=true): Current version flag
Source Tracking:
source_type(String(20)): ppls_api, epa_website, manual_uploadsource_url(Text): Original source URLppls_last_synced(Timestamp): Last EPA PPLS sync timecontributed_by_company_id(UUID, FK): Contributing company
Full-Text Search:
search_vector(TSVECTOR): Auto-updated search index
Timestamps:
created_at,updated_at(Timestamp)
Indexes:
- GIN index on
search_vector - GIN index on
active_ingredients(JSONB) - GIN index on
target_pests(JSONB) - Unique constraint on (epa_reg_number, revision_date)
- B-tree indexes on: epa_reg_number, product_name, manufacturer, signal_word, revision_date, is_current
TSVECTOR Trigger:
CREATE TRIGGER tsvector_update_trigger BEFORE INSERT OR UPDATE
ON chemiq_epa_label_documents FOR EACH ROW EXECUTE FUNCTION
tsvector_update_trigger(search_vector, 'pg_catalog.english',
product_name, manufacturer, epa_reg_number, epa_product_type);
Relationships:
- Self-referencing: previous_version, superseded_by (version chain)
- One-to-many: company_mappings, product_links, parse_jobs, change_log
Use Cases:
- Store EPA pesticide labels for lawn care, pest control companies
- Track active ingredients and target pests
- Search by EPA registration number
- Version tracking for label updates
- EPA PPLS API integration for automatic updates
- Company-specific label library management
Table 6: chemiq_company_label_mappings
Purpose: Many-to-many relationship between companies and EPA labels, enabling the global repository model.
Key Features:
- Links companies to EPA labels they use
- Company-specific review status and notes
- Compliance tracking (certifications, training)
- Usage statistics
- Alert management
Important Columns:
mapping_id(UUID, PK): Primary identifiercompany_id(UUID, FK, indexed): Company identifierlabel_id(UUID, FK, indexed): EPA label identifier
Usage Tracking:
mapped_to_inventory_count(Integer, default=0): Number of inventory items using this label
Company-Specific Metadata:
internal_notes(Text): Company-specific notesreview_status(String(20), indexed, default='pending'): pending, reviewed, approved, flaggedreviewed_by_user_id(UUID, FK): User who reviewedreviewed_at(Timestamp): Review timestamp
Compliance Tracking:
applicator_certification_required(Boolean, default=false): Certification requirement flagrestricted_use(Boolean, default=false): Restricted use flagcompany_training_completed(Boolean, default=false): Training completion flag
Alerts:
expiry_alert_sent(Boolean, default=false): Alert sent flaglast_alert_sent_at(Timestamp): Last alert timestamp
Timestamps:
first_mapped_at(Timestamp): When company first linked this labellast_accessed_at(Timestamp): Last access time
Indexes:
- Unique constraint on (company_id, label_id)
- B-tree index on review_status
Relationships:
- Many-to-one: label_document (EPA Labels)
- Many-to-one: reviewed_by_user (Users)
Use Cases:
- Company-specific label libraries
- Track which companies use which labels
- Company review workflows
- Compliance tracking for restricted use products
- Usage analytics
- Expiry alert management
Table 7: chemiq_label_parse_queue
Purpose: Job queue for asynchronous EPA label parsing operations.
Key Features:
- Priority-based queue (1-10 scale)
- Retry mechanism
- Section-specific parsing
- Error tracking
Important Columns:
job_id(UUID, PK): Job identifierlabel_id(UUID, FK, indexed): Label to parsejob_status(String(20), indexed, default='pending'): pending, processing, completed, failedpriority(Integer, default=5): 1-10, higher = more urgentparse_sections(Text[]): Sections to parse (active_ingredients, target_pests, etc.)started_at,completed_at(Timestamp): Job timingerror_message(Text): Failure detailsretry_count(Integer, default=0): Number of retry attemptscreated_at(Timestamp)
Indexes:
- Composite index on (job_status, priority DESC)
Relationships:
- Many-to-one: label_document (EPA Labels)
Use Cases:
- Background processing of uploaded labels
- Retry failed parsing jobs
- Prioritize parsing for high-priority products
- Monitor parsing performance
Table 8: chemiq_label_change_log
Purpose: Audit trail for all EPA label document changes.
Key Features:
- Complete audit history
- Tracks user and company making changes
- Stores diffs of changes
- Change type categorization
Important Columns:
log_id(UUID, PK): Log entry identifierlabel_id(UUID, FK, indexed): Label documentchange_type(String(50)): created, updated, superseded, expiredchanged_by_user_id(UUID, FK): User who made changechanged_by_company_id(UUID, FK): Company contextchanges(JSONB): Diff of changes madeprevious_values(JSONB): Previous statecreated_at(Timestamp, indexed): When change occurred
Relationships:
- Many-to-one: label_document (EPA Labels)
- Many-to-one: changed_by_user (Users)
Use Cases:
- Compliance audit trails
- Track label updates over time
- Investigate data quality issues
- Generate change reports for regulators
Part 3: SDS Repository System
Table 9: chemiq_sds_documents
Purpose: Global repository of SDS documents with core metadata and S3 storage references.
Key Features:
- Stores product identification (Section 1)
- S3 bucket and key for PDF storage
- SHA-256 checksum for deduplication
- Full-text search via
search_vector(TSVECTOR) - Version tracking with
previous_version_idandsuperseded_by_id - Unique constraint on (manufacturer, product_name, revision_date)
Important Columns:
sds_id(UUID, PK): Primary identifierproduct_name(String(255), indexed): Chemical product namemanufacturer(String(255), indexed): Manufacturer nameproduct_code(String(100)): Manufacturer's product codemanufacturer_address(Text): Full addressemergency_phone(String(50)): Emergency contact number
Document Metadata:
revision_date(Date, indexed, required): SDS revision daterevision_number(String(50)): Revision version numberdocument_language(String(10), default='en'): Document language
File Storage (S3):
s3_bucket(String(100), required): AWS S3 buckets3_key(Text, required): S3 object keyfile_checksum(String(64), indexed, required): SHA-256 hashfile_size(BigInteger): File size in bytesmime_type(String(50), default='application/pdf')
Parsing Metadata:
sds_parsed(Boolean, default=false): Parsing complete flagparse_confidence(Numeric(5,2)): Parse confidence scoreparse_errors(Text): Parse error messages (JSON array)parsed_json(JSONB): Complete LLM-parsed SDS response for reference and debugging
Version Tracking:
previous_version_id(UUID, FK): Links to older versionsuperseded_by_id(UUID, FK): Links to newer versionis_current(Boolean, indexed, default=true): Current version flag
Source Tracking:
source_type(String(20)): web_search, manual_upload, manufacturer_api, pplssource_url(Text): Original source URLcontributed_by_company_id(UUID, FK): Contributing company
Full-Text Search:
search_vector(TSVECTOR): Auto-updated search index
Timestamps:
created_at,updated_at(Timestamp)
Indexes:
- GIN index on
search_vector - Unique constraint on (manufacturer, product_name, revision_date)
- B-tree indexes on: product_name, manufacturer, revision_date, file_checksum, is_current
TSVECTOR Trigger:
CREATE TRIGGER tsvector_update_trigger BEFORE INSERT OR UPDATE
ON chemiq_sds_documents FOR EACH ROW EXECUTE FUNCTION
tsvector_update_trigger(search_vector, 'pg_catalog.english',
product_name, manufacturer, product_code);
Relationships:
- Self-referencing: previous_version, superseded_by (version chain)
- One-to-one: hazard_info
- One-to-many: composition, sections, company_mappings, parse_jobs, change_log
Use Cases:
- Global SDS storage shared across companies
- Version tracking for SDS updates
- Deduplication via checksums
- Full-text search across all SDSs
- S3 integration for PDF storage
Table 10: chemiq_sds_hazard_info
Purpose: Stores GHS (Global Harmonized System) hazard classification data from SDS Section 2.
Key Features:
- GHS pictograms and signal words
- H-codes (Hazard statements) as array
- P-codes (Precautionary statements) as array
- JSONB for structured hazard classes
- GIN indexes for fast array and JSONB searches
Important Columns:
hazard_id(UUID, PK): Primary identifiersds_id(UUID, FK, indexed): Links to SDS documentsignal_word(String(20), indexed): "Danger" or "Warning"pictograms(JSONB): Array of GHS pictogram codes (e.g., ["GHS02", "GHS07"])hazard_classes(JSONB): Structured hazard data:[{"class": "Flammable liquids", "category": "2", "code": "H225"}]hazard_statements(Text[]): Array of H-codes (e.g., ["H225", "H319"])precautionary_statements(Text[]): Array of P-codes (e.g., ["P210", "P233"])supplemental_hazards(Text): Additional hazard informationhazard_description_vector(TSVECTOR): Full-text search on hazard descriptionscreated_at(Timestamp)
Indexes:
- GIN indexes on pictograms, hazard_classes, hazard_statements
- B-tree index on signal_word
Relationships:
- One-to-one: sds_document (SDS Documents)
Use Cases:
- Search for chemicals by hazard class (flammable, corrosive, toxic)
- Find all SDSs with specific pictograms (e.g., flame, skull)
- Query by H-codes or P-codes
- Generate hazard summaries for locations
Table 11: chemiq_sds_composition
Purpose: Stores chemical composition and ingredients from SDS Section 3.
Key Features:
- Individual ingredient records with CAS numbers
- Concentration ranges and exact values
- Hazardous ingredient flagging
- Support for trade secrets
Important Columns:
composition_id(UUID, PK): Primary identifiersds_id(UUID, FK, indexed): Links to SDS documentchemical_name(String(255), required): Ingredient namecas_number(String(50), indexed): Chemical Abstracts Service numberec_number,reach_number(String(50)): European chemical identifiersconcentration_min,concentration_max(Numeric(5,2)): Concentration rangeconcentration_exact(Numeric(5,2)): Exact concentrationconcentration_text(String(100)): Text representation (e.g., "< 1%")is_trade_secret(Boolean, default=false): Protected ingredient flagis_hazardous(Boolean, indexed, default=false): Hazardous ingredient flagingredient_hazards(Text[]): H-codes specific to this ingredientcreated_at(Timestamp)
Indexes:
- B-tree index on cas_number
- B-tree index on is_hazardous
Relationships:
- Many-to-one: sds_document (SDS Documents)
Use Cases:
- Search for products containing specific chemicals (by CAS)
- Find all hazardous ingredients across inventory
- Calculate total hazardous chemical quantities
- Generate ingredient disclosure reports
Table 12: chemiq_sds_sections
Purpose: Stores SDS sections 4-16 with both structured (JSONB) and raw text content.
Key Features:
- Sections 4-16: First Aid, Fire-Fighting, Accidental Release, Handling & Storage, Exposure Controls, Physical/Chemical Properties, Stability, Toxicological Info, Ecological Info, Disposal, Transport, Regulatory, Other
- JSONB for structured data extraction
- Full-text search on section content
- Check constraint: section_number between 1 and 16
Important Columns:
section_id(UUID, PK): Primary identifiersds_id(UUID, FK, indexed): Links to SDS documentsection_number(Integer, indexed, required): Section number (1-16)section_title(String(255), required): Section titlesection_data(JSONB): Structured parsed datasection_text(Text): Raw text contentsection_text_vector(TSVECTOR): Full-text search indexparsed_at(Timestamp): When section was parsed
CHECK Constraint:
section_number >= 1 AND section_number <= 16
Indexes:
- GIN indexes on section_data and section_text_vector
- B-tree index on section_number
- Unique constraint on (sds_id, section_number)
Relationships:
- Many-to-one: sds_document (SDS Documents)
Use Cases:
- Search for first aid measures across all SDSs
- Query storage requirements by keyword
- Extract disposal information for waste management
- Generate emergency response guides
Table 13: chemiq_company_sds_mappings
Purpose: Many-to-many relationship between companies and SDSs, enabling the global repository model.
Key Features:
- Tracks which companies use which SDSs
- Company-specific review status
- Usage statistics
- Internal notes per company
Important Columns:
mapping_id(UUID, PK): Primary identifiercompany_id(UUID, FK, indexed): Company identifiersds_id(UUID, FK, indexed): SDS document identifiermapped_to_inventory_count(Integer, default=0): Number of inventory items using this SDSinternal_notes(Text): Company-specific notesreview_status(String(20), indexed, default='pending'): pending, reviewed, approved, flaggedreviewed_by_user_id(UUID, FK): User who reviewedreviewed_at(Timestamp): Review timestampexpiry_alert_sent(Boolean, default=false): Alert trackinglast_alert_sent_at(Timestamp): Last alert timestampfirst_mapped_at(Timestamp): When company first linked this SDSlast_accessed_at(Timestamp): Last access time
Indexes:
- Unique constraint on (company_id, sds_id)
- B-tree index on review_status
Relationships:
- Many-to-one: sds_document (SDS Documents)
- Many-to-one: reviewed_by_user (Users)
Use Cases:
- Isolate company data while sharing SDS repository
- Track company-specific review workflows
- Monitor SDS usage across companies
- Send expiry alerts per company
Table 14: chemiq_sds_parse_queue
Purpose: Job queue for asynchronous SDS parsing operations.
Key Features:
- Priority-based queue (1-10 scale)
- Retry mechanism
- Section-specific parsing
- Error tracking
Important Columns:
job_id(UUID, PK): Job identifiersds_id(UUID, FK, indexed): SDS to parsejob_status(String(20), indexed, default='pending'): pending, processing, completed, failedpriority(Integer, default=5): 1-10, higher = more urgentparse_sections(Integer[], default=[1-16]): Which sections to parsestarted_at,completed_at(Timestamp): Job timingerror_message(Text): Failure detailsretry_count(Integer, default=0): Number of retry attemptscreated_at(Timestamp)
Indexes:
- Composite index on (job_status, priority DESC)
Relationships:
- Many-to-one: sds_document (SDS Documents)
Use Cases:
- Background processing of uploaded SDSs
- Retry failed parsing jobs
- Prioritize parsing for high-priority chemicals
- Monitor parsing performance
Table 15: chemiq_sds_change_log
Purpose: Audit trail for all SDS document changes.
Key Features:
- Complete audit history
- Tracks user and company making changes
- Stores diffs of changes
- Change type categorization
Important Columns:
log_id(UUID, PK): Log entry identifiersds_id(UUID, FK, indexed): SDS documentchange_type(String(50)): created, updated, superseded, deprecatedchanged_by_user_id(UUID, FK): User who made changechanged_by_company_id(UUID, FK): Company contextchanges(JSONB): Diff of changes madeprevious_values(JSONB): Previous statecreated_at(Timestamp, indexed): When change occurred
Relationships:
- Many-to-one: sds_document (SDS Documents)
- Many-to-one: changed_by_user (Users)
Use Cases:
- Compliance audit trails
- Track SDS updates over time
- Investigate data quality issues
- Generate change reports for regulators
Table 16: chemiq_sds_web_search_cache
Purpose: Caches Perplexity API web search results for SDS documents to reduce API costs and improve response times.
Key Features:
- Company-scoped cache entries
- 7-day TTL for cache entries
- Normalized search criteria for matching
- Tracks cache hits and access patterns
- Links to imported SDS when search results are used
Important Columns:
cache_id(UUID, PK): Primary identifiercompany_id(UUID, FK, indexed, required): Company that performed the search
Search Criteria (normalized for matching):
product_name_normalized(String(255), required): Normalized product name (lowercase, common suffixes removed)manufacturer_normalized(String(255), required): Normalized manufacturer namebarcode_upc(String(100)): Optional barcode for more specific matchingsearch_type(String(20), default='sds', required): Type of search: sds, epa_label
Search Metadata:
search_query_used(Text, required): The actual query sent to Perplexity APIsearched_at(Timestamp with TZ, required): When the search was performed
Cached Results:
results(JSONB, required): Array of WebSDSResult objects:[
{
"title": "Product SDS - Manufacturer",
"url": "https://example.com/sds.pdf",
"snippet": "Safety Data Sheet for...",
"source_domain": "example.com",
"date": "2024-01-15",
"is_pdf": true,
"confidence_score": 0.85
}
]result_count(Integer, default=0): Number of results cached
Tracking:
created_by_user_id(UUID, FK): User who initiated the searchtimes_accessed(Integer, default=1): Number of times cache was accessedlast_accessed_at(Timestamp with TZ, required): Last access time
Cleanup Tracking:
sds_imported(Boolean, default=false): Whether an SDS was imported from these resultsimported_sds_id(UUID, FK): Links to the imported SDS document
Indexes:
- B-tree index on company_id
- B-tree index on searched_at
- B-tree index on sds_imported
- Unique constraint on (company_id, product_name_normalized, manufacturer_normalized)
Relationships:
- Many-to-one: company (Companies)
- Many-to-one: created_by_user (Users)
- Many-to-one: imported_sds (SDS Documents)
Use Cases:
- Cache Perplexity API search results to avoid repeated API calls
- Share cache within company (same product searches return cached results)
- Track which searches led to SDS imports
- Enable cache cleanup after successful imports
- Monitor API usage patterns
Cache Strategy:
- User searches for SDS on web
- Check cache for matching (company_id, product_name_normalized, manufacturer_normalized)
- If cache hit and not expired (< 7 days) and not imported, return cached results
- If cache miss, call Perplexity API and cache results
- When user imports SDS from results, mark cache as imported
Table 17: chemiq_sds_bulk_upload_jobs
Purpose: Tracks bulk upload jobs for ZIP files containing multiple SDS PDFs, enabling background processing of large-scale SDS imports.
Key Features:
- Asynchronous processing for large uploads (500+ files)
- Job status tracking (pending, extracting, processing, completed, failed)
- Progress metrics (total files, processed, successful, failed, duplicates)
- Detailed per-file results stored as JSONB
- S3 storage for uploaded ZIP files
- Company and user association for audit
Important Columns:
job_id(UUID, PK): Primary identifiercompany_id(UUID, FK, indexed, required): Company that initiated the uploaduploaded_by_user_id(UUID, FK): User who uploaded the ZIP
File Information:
s3_bucket(String(100), required): S3 bucket for uploaded ZIPs3_key(Text, required): S3 object keyfile_name(String(255), required): Original ZIP filenamefile_size(BigInteger): File size in bytes
Job Status:
job_status(String(20), indexed, default='pending'): pending, extracting, processing, completed, failed
Progress Tracking:
total_files(Integer, default=0): Total PDFs found in ZIPprocessed_files(Integer, default=0): Number processed so farsuccessful_uploads(Integer, default=0): Successful uploadsfailed_uploads(Integer, default=0): Failed uploadsduplicates_found(Integer, default=0): Duplicate files skipped
Results:
results(JSONB): Detailed per-file results array:[
{"filename": "product_a.pdf", "success": true, "sds_id": "uuid", "is_duplicate": false},
{"filename": "product_b.pdf", "success": false, "error_message": "Invalid PDF", "is_duplicate": false}
]error_message(Text): Job-level error message if failed
Timestamps:
created_at(Timestamp, required): When job was createdstarted_at(Timestamp): When processing startedcompleted_at(Timestamp): When processing completed
Indexes:
- B-tree index on company_id
- B-tree index on job_status
Relationships:
- Many-to-one: company (Companies)
- Many-to-one: uploaded_by_user (Users)
Use Cases:
- Large-scale SDS library migrations (500+ documents)
- Background processing that survives browser closure
- Progress tracking and status polling from frontend
- Detailed audit trail of bulk import operations
- Error tracking for failed file processing
Processing Flow:
- User uploads ZIP file via
/bulk-upload-zipendpoint - ZIP is uploaded to S3, job record created with
status='pending' - Background worker picks up pending jobs
- Worker extracts ZIP, processes each PDF
- Progress updated in real-time (processed_files, successful_uploads, etc.)
- Frontend polls
/bulk-upload-status/{job_id}every 5 seconds - Job marked
completedorfailedwhen done
Example Job Status Response:
{
"job_id": "123e4567-e89b-12d3-a456-426614174000",
"job_status": "processing",
"file_name": "sds_library_2024.zip",
"total_files": 500,
"processed_files": 250,
"successful_uploads": 240,
"failed_uploads": 5,
"duplicates_found": 5
}
Part 4: Inventory Integration
Table 18: chemiq_inventory (Simplified)
Purpose: Site/location-specific chemical inventory tracking. Now simplified to only store inventory-specific data, with product information moved to chemiq_company_product_catalog.
Schema Change Summary: After the company product catalog decomposition, this table is significantly simplified:
- REMOVED: product_name, manufacturer, product_code, cas_number, barcode_upc (moved to company product catalog)
- REMOVED: current_sds_id, sds_missing, sds_attached_at, sds_attached_by_user_id (moved to company product catalog)
- REMOVED: current_epa_label_id, epa_label_attached_at, epa_label_attached_by_user_id (moved to company product catalog)
- REMOVED: product_catalog_id (replaced by company_product_id)
- ADDED: company_product_id (links to chemiq_company_product_catalog)
Key Columns (Simplified Structure):
chemical_id(UUID, PK): Primary identifiercompany_product_id(UUID, FK, indexed, required): NEW - Links to chemiq_company_product_catalogcompany_id(UUID, FK, indexed): Company identifiersite_id(UUID, FK, indexed): Site identifierlocation_id(UUID, FK): Location identifier (optional)
Inventory-Specific Data (retained):
container_size(Numeric): Container size valuesize_unit(String(20)): Unit of measure (gal, oz, L, kg, etc.)quantity(Integer, default=1): Number of containers at this location
Note: batch_lot_number and expiration_date are stored in chemiq_inventory_attributes table, not directly in this table.
Audit Fields:
is_active(Boolean, default=true): Active flagcreated_at,updated_at(Timestamp)created_by_user_id,updated_by_user_id(UUID, FK)
Indexes:
- B-tree indexes on: company_product_id, company_id, site_id, location_id
- Composite index on (company_id, site_id, is_active)
- Unique constraint on (company_product_id, site_id, COALESCE(location_id, NULL_UUID)) WHERE is_active = true - prevents duplicate products at same location
Relationships:
- Many-to-one: company_product (Company Product Catalog) - PRIMARY LINK
- Many-to-one: company, site, location
- Many-to-one: created_by_user, updated_by_user
- One-to-many: attributes, reviews_pending, label_images
Benefits of Simplified Structure:
| Before (Denormalized) | After (Normalized) |
|---|---|
| Product data duplicated per site/location | Product data stored once per company |
| SDS attached per inventory item | SDS attached to product (shared across sites) |
| 15+ columns per inventory record | ~8 columns per inventory record |
| Inconsistent naming across sites | Single canonical name per product per company |
Example:
Before (denormalized):
| chemical_id | site_id | product_name | manufacturer | current_sds_id | quantity |
|-------------|---------|----------------|--------------|----------------|----------|
| inv-1 | site-1 | Clorox Wipes | Clorox | sds-123 | 5 |
| inv-2 | site-2 | Clorox Wipes | Clorox | sds-123 | 10 | ← Duplicate!
After (normalized):
chemiq_company_product_catalog:
| company_product_id | product_name | manufacturer | current_sds_id |
|-------------------|---------------|--------------|----------------|
| prod-1 | Clorox Wipes | Clorox | sds-123 |
chemiq_inventory:
| chemical_id | company_product_id | site_id | quantity |
|-------------|-------------------|---------|----------|
| inv-1 | prod-1 | site-1 | 5 |
| inv-2 | prod-1 | site-2 | 10 | ← No duplication!
Use Cases:
- Track inventory quantities per site/location
- Link to company product catalog for product details
- Support multi-site inventory management
- Container and quantity tracking only
- Simplified data model with no product duplication
Table 18b: chemiq_inventory_attributes
Purpose: Stores extended inventory item attributes such as batch/lot numbers, expiration dates, manufacturing dates, and storage conditions. Separated from the main inventory table to keep the core inventory record lightweight.
Key Features:
- One-to-one relationship with inventory items
- Batch/lot tracking for traceability
- Expiration and manufacture date tracking
- Storage condition notes
- Cascading delete with parent inventory record
Important Columns:
| Column | Type | Constraints | Description |
|---|---|---|---|
| attribute_id | UUID | PK | Primary identifier |
| chemical_id | UUID | FK, indexed, required | Links to chemiq_inventory |
| batch_lot_number | String(100) | Batch or lot number | |
| expiration_date | Date | Product expiration date | |
| manufacture_date | Date | Product manufacture date | |
| storage_conditions | Text | Storage condition notes | |
| notes | Text | General notes | |
| created_at | Timestamp | required, default=now | Creation time |
Indexes:
- B-tree index on
chemical_id
Relationships:
- Many-to-one:
chemical→ ChemIQInventory (CASCADE delete)
Use Cases:
- Track batch/lot numbers for product recalls
- Monitor expiration dates for shelf life management
- Record storage conditions for compliance
- Support traceability requirements
Table 18c: chemiq_sds_reviews_pending
Purpose: Tracks inventory items that need SDS review — typically created when chemicals are added without an SDS attached, or when bulk imports identify gaps.
Key Features:
- Tracks SDS review status per inventory item
- Company and site scoped
- Assignable to specific users for resolution
- Status workflow: pending → resolved
- Cascading delete with parent inventory, company, and site records
Important Columns:
| Column | Type | Constraints | Description |
|---|---|---|---|
| review_id | UUID | PK | Primary identifier |
| chemical_id | UUID | FK, required | Links to chemiq_inventory |
| company_id | UUID | FK, required | Company scope |
| site_id | UUID | FK, indexed, required | Site scope |
| review_reason | String(50) | required | Reason for review (e.g., missing_sds, expired_sds) |
| status | String(20) | required, default='pending' | pending, resolved |
| assigned_to_user_id | UUID | FK | User assigned to resolve |
| created_at | Timestamp | required, default=now | Creation time |
| resolved_at | Timestamp | When review was resolved |
Indexes:
- B-tree index on
site_id - B-tree index on
status
Relationships:
- Many-to-one:
chemical→ ChemIQInventory (CASCADE delete) - Many-to-one:
company→ Company (CASCADE delete) - Many-to-one:
site→ CompanySite (CASCADE delete) - Many-to-one:
assigned_to_user→ User (SET NULL on delete)
Use Cases:
- Track missing SDS documents after bulk imports
- Create review tasks for safety coordinators
- Monitor SDS compliance gaps per site
- Generate "Missing SDS" reports for Sentinel alerts
- Support fix-and-retry workflow for bulk imports
Search and Performance Optimization
Full-Text Search Strategy
Primary Search Flow (Updated for Company Product Catalog):
- User searches for product by name, manufacturer, barcode, CAS, EPA reg #
- First: Query
chemiq_company_product_catalog.search_vector(company's own products) - If not found: Query
chemiq_product_catalog.search_vector(global catalog) - If not found: Query external APIs (Open Food Facts, UPC Database)
- Product Catalog returns products with links to all documents
- Optional: Drill down to SDS/Label search if needed
TSVECTOR Columns with GIN Indexes:
chemiq_company_product_catalog.search_vector- COMPANY SEARCH (product name, manufacturer, product code, CAS)chemiq_product_catalog.search_vector- GLOBAL SEARCH (product name, manufacturer, brand, category)chemiq_epa_label_documents.search_vector- Label contentchemiq_sds_documents.search_vector- SDS metadatachemiq_sds_hazard_info.hazard_description_vector- Hazard descriptionschemiq_sds_sections.section_text_vector- Section content
Auto-Update Triggers:
- All
search_vectorcolumns have triggers for automatic updates on INSERT/UPDATE
Array and JSONB Search
GIN Indexes for Fast Queries:
chemiq_product_catalog.upc_barcodes(Text[])chemiq_product_catalog.cas_numbers(Text[])chemiq_epa_label_documents.active_ingredients(JSONB)chemiq_epa_label_documents.target_pests(JSONB)chemiq_sds_hazard_info.pictograms(JSONB)chemiq_sds_hazard_info.hazard_classes(JSONB)chemiq_sds_hazard_info.hazard_statements(Text[])chemiq_sds_sections.section_data(JSONB)
Deduplication Strategy
Product Catalog:
- No strict uniqueness (multiple sources can contribute same product)
- Merge/deduplication handled at application level
SDS Documents:
- Primary: SHA-256 checksum (
file_checksum) - Secondary: Unique constraint on (manufacturer, product_name, revision_date)
EPA Label Documents:
- Primary: SHA-256 checksum (
file_checksum) - Secondary: Unique constraint on (epa_reg_number, revision_date)
Version Tracking
Linked-List Pattern
All document types (Product Catalog, SDS, EPA Labels) use the same version tracking pattern:
Fields:
previous_version_id: Links to older versionsuperseded_by_id: Links to newer version (or NULL if current)is_current: Boolean flag for active version
Example Version Chain:
Doc v1.0 (2020) <- previous_version_id <- Doc v2.0 (2022) <- previous_version_id <- Doc v3.0 (2024)
| | |
superseded_by_id -> superseded_by_id -> is_current = true
Replacement Products (Product Catalog only):
replacement_product_id: Links to replacement product for discontinued items
Use Cases:
- Track document revision history
- Compare changes between versions
- Alert companies when new versions available
- Maintain compliance with version requirements
- Handle product discontinuation and replacements
Unified Search Architecture
Example Search Flow
User searches: "Roundup Pro Max"
-- Step 1: Search Product Catalog (PRIMARY SEARCH)
SELECT
p.product_id,
p.product_name,
p.manufacturer,
p.epa_reg_number,
p.cas_numbers,
p.upc_barcodes
FROM chemiq_product_catalog p
WHERE p.search_vector @@ to_tsquery('english', 'roundup & pro & max')
AND p.is_active = true
ORDER BY ts_rank(p.search_vector, to_tsquery('english', 'roundup & pro & max')) DESC
LIMIT 10;
-- Step 2: Get all documents for selected product
SELECT
pd.document_type,
pd.is_primary,
pd.document_language,
pd.document_region,
s.product_name AS sds_product_name,
s.revision_date AS sds_revision_date,
l.product_name AS label_product_name,
l.epa_reg_number,
l.revision_date AS label_revision_date
FROM chemiq_product_documents pd
LEFT JOIN chemiq_sds_documents s ON pd.sds_id = s.sds_id
LEFT JOIN chemiq_epa_label_documents l ON pd.label_id = l.label_id
WHERE pd.product_id = :product_id
AND pd.is_current = true;
-- Step 3: Check if company already has access
SELECT * FROM chemiq_company_sds_mappings
WHERE company_id = :company_id AND sds_id = :sds_id;
SELECT * FROM chemiq_company_label_mappings
WHERE company_id = :company_id AND label_id = :label_id;
-- Step 4: Create mappings if needed (application logic)
-- Step 5: Link to inventory item
Database Migrations
Migration 1: SDS Global Schema
File: 2025_12_03_1826-16a8bffcb66e_redesign_sds_global_schema.py
Tables Created:
- chemiq_sds_documents
- chemiq_sds_hazard_info
- chemiq_sds_composition
- chemiq_sds_sections
- chemiq_company_sds_mappings
- chemiq_sds_parse_queue
- chemiq_sds_change_log
Updates: chemiq_inventory (added current_sds_id, sds_attached_at, sds_attached_by_user_id)
Migration 2: Product Catalog and EPA Labels
File: 2025_12_03_2346-e15d8abd9d92_add_product_catalog_and_epa_labels.py
Tables Created:
- chemiq_product_catalog
- chemiq_epa_label_documents
- chemiq_product_documents
- chemiq_company_label_mappings
- chemiq_label_parse_queue
- chemiq_label_change_log
Updates: chemiq_inventory (added product_catalog_id, current_epa_label_id, epa_label_attached_at, epa_label_attached_by_user_id)
Migration 3: Product Images
File: 2025_12_17_2333-9a5f77cdd3a6_add_chemiq_product_images_table.py
Tables Created:
- chemiq_product_images
Key Features:
- Polymorphic association to company or global product catalog
- S3 storage for product label images
- Multi-image support for curved/multi-sided labels
- Tracks images used for Vision LLM extraction
Migration 4: Unique Constraints for Barcode and Inventory Location
File: 2025_12_25_1545-b7aa5678347e_add_unique_constraints_for_barcode_and_.py
Constraints Added:
uq_company_product_barcodeon chemiq_company_product_catalog (company_id, barcode_upc) WHERE barcode_upc IS NOT NULLuq_inventory_product_locationon chemiq_inventory (company_product_id, site_id, COALESCE(location_id, NULL_UUID)) WHERE is_active = true
Key Features:
- Prevents duplicate barcodes per company in product catalog
- Prevents same product appearing twice at same location in inventory
- Includes data cleanup logic to merge existing duplicates before applying constraints
Migration 5: Add search_type to SDS Web Search Cache
File: 2026_01_31_2019-8d1fcec31661_add_search_type_to_sds_web_search_cache.py
Changes:
- Added
search_typecolumn (String(20), default='sds') tochemiq_sds_web_search_cache
Migration 6: Classification & Tagging
File: 2026_01_31_2200-a3c7f1e2d456_add_classification_and_tagging.py
Tables Created:
- chemiq_company_tags - company-scoped custom tags
- chemiq_product_tags - product-to-tag junction
- chemiq_ai_hazard_summaries - AI hazard summaries
Column Added:
hazard_categories(JSONB, default='[]') onchemiq_company_product_catalogwith GIN index
SQLAlchemy Models
SDS Models
Location: tellus-ehs-hazcom-service/app/db/models/chemiq_sds.py
Models:
SDSDocument- Main SDS model (includesparsed_jsonJSONB column for LLM responses)SDSHazardInfo- Hazard informationSDSComposition- Chemical ingredientsSDSSection- SDS sections 4-16CompanySDSMapping- Company relationshipsSDSParseJob- Parse queueSDSChangeLog- Audit trailSDSWebSearchCache- Perplexity API search result cacheSDSBulkUploadJob- ZIP bulk upload job tracking
Product Catalog & EPA Label Models
Location: tellus-ehs-hazcom-service/app/db/models/chemiq_product_catalog.py
Models:
ProductCatalog- Master product repositoryCompanyProductCatalog- Company-specific product catalogProductImage- Product label images (S3 storage, polymorphic to company/global products)EPALabelDocument- EPA label repositoryProductDocument- Junction table (polymorphic)CompanyLabelMapping- Company-label relationshipsLabelParseJob- Parse queueLabelChangeLog- Audit trail
Key Relationships:
ProductCatalog
├── product_documents (one-to-many) → ProductDocument
│ ├── sds_document (many-to-one) → SDSDocument
│ └── epa_label (many-to-one) → EPALabelDocument
├── images (one-to-many) → ProductImage
├── replacement_product (self-referencing)
CompanyProductCatalog
├── current_sds (many-to-one) → SDSDocument
├── current_epa_label (many-to-one) → EPALabelDocument
├── images (one-to-many) → ProductImage
├── global_product (many-to-one) → ProductCatalog
├── inventory_items (one-to-many) → ChemIQInventory
├── product_tags (one-to-many) → ProductTag
├── ai_hazard_summary (one-to-one) → AIHazardSummary
ProductImage
├── company_product (many-to-one) → CompanyProductCatalog (polymorphic)
├── global_product (many-to-one) → ProductCatalog (polymorphic)
├── uploaded_by_company (many-to-one) → Company
├── uploaded_by_user (many-to-one) → User
EPALabelDocument
├── company_mappings (one-to-many) → CompanyLabelMapping
├── product_links (one-to-many) → ProductDocument
├── parse_jobs (one-to-many) → LabelParseJob
├── change_log (one-to-many) → LabelChangeLog
├── previous_version (self-referencing)
└── superseded_by (self-referencing)
SDSDocument
├── hazard_info (one-to-one) → SDSHazardInfo
├── composition (one-to-many) → SDSComposition
├── sections (one-to-many) → SDSSection
├── company_mappings (one-to-many) → CompanySDSMapping
├── parse_jobs (one-to-many) → SDSParseJob
├── change_log (one-to-many) → SDSChangeLog
├── previous_version (self-referencing)
└── superseded_by (self-referencing)
Example Queries
Search product by barcode
SELECT * FROM chemiq_product_catalog
WHERE '885126012345' = ANY(upc_barcodes);
Find product and all its documents
SELECT
p.product_name,
p.manufacturer,
pd.document_type,
COALESCE(s.product_name, l.product_name) AS doc_product_name,
COALESCE(s.revision_date, l.revision_date) AS doc_revision_date
FROM chemiq_product_catalog p
JOIN chemiq_product_documents pd ON p.product_id = pd.product_id
LEFT JOIN chemiq_sds_documents s ON pd.sds_id = s.sds_id
LEFT JOIN chemiq_epa_label_documents l ON pd.label_id = l.label_id
WHERE p.product_id = :product_id
AND pd.is_current = true;
Search EPA labels by target pest
SELECT * FROM chemiq_epa_label_documents
WHERE target_pests @> '[{"pest": "ants"}]'::jsonb
AND is_current = true;
Find all products containing a specific chemical (CAS)
SELECT
p.product_name,
p.manufacturer,
c.chemical_name,
c.cas_number,
c.concentration_max
FROM chemiq_product_catalog p
JOIN chemiq_product_documents pd ON p.product_id = pd.product_id
JOIN chemiq_sds_documents s ON pd.sds_id = s.sds_id
JOIN chemiq_sds_composition c ON s.sds_id = c.sds_id
WHERE c.cas_number = '67-64-1' -- Acetone
AND p.is_active = true
AND s.is_current = true;
Get company's label library with usage stats
SELECT
l.epa_reg_number,
l.product_name,
l.manufacturer,
l.signal_word,
m.mapped_to_inventory_count,
m.review_status,
m.restricted_use
FROM chemiq_epa_label_documents l
JOIN chemiq_company_label_mappings m ON l.label_id = m.label_id
WHERE m.company_id = :company_id
AND l.is_current = true
ORDER BY m.mapped_to_inventory_count DESC;
Compliance and Regulatory
OSHA Hazard Communication Standard (HazCom 2012)
Requirements Met:
- Section 1: Product Identification ✓
- Section 2: Hazard Classification (GHS) ✓
- Section 3: Composition/Ingredients ✓
- Sections 4-16: Required information ✓
- Version tracking for updates ✓
- Audit trail for changes ✓
EPA FIFRA (Pesticides)
Integration: chemiq_epa_label_documents table
PPLS API: Automatic updates for pesticide labels via ppls_last_synced
EPA Reg No: Tracked in both Product Catalog and EPA Labels
Compliance Tracking: Company-specific certifications and training via chemiq_company_label_mappings
Data Retention
Active Documents: Kept indefinitely Superseded Documents: Retained for 30+ years (OSHA/EPA requirement) Change Logs: Retained for 7+ years (audit requirement) Discontinued Products: Flagged but never deleted (regulatory requirement)
Part 5: Bulk Import and Supporting Tables
Table 19: chemiq_inventory_uploads
Purpose: Tracks bulk upload operations for chemical inventory, providing import history, metrics, and audit trails.
Key Features:
- Bulk import tracking for CSV and invoice PDF uploads
- Import summary with detailed row-level metrics
- Job status tracking (pending, processing, completed, failed)
- Error tracking and reporting
- SDS gap detection
- Compliance audit trail
Important Columns:
upload_id(UUID, PK): Primary identifiercompany_id(UUID, FK, indexed): Company that performed the uploadsite_id(UUID, FK, indexed): Target site for the importuploaded_by_user_id(UUID, FK): User who initiated the upload
File Information:
file_name(String(255), required): Original filenamefile_type(String(20)): csv, pdf_invoices3_bucket(String(100)): S3 bucket for uploaded files3_key(Text): S3 object keyfile_size(BigInteger): File size in bytes
Upload Status:
upload_status(String(20), indexed, default='pending'): pending, processing, completed, failed, partial_success
Import Metrics:
total_rows(Integer, default=0): Total rows in filerows_created(Integer, default=0): Successfully created recordsrows_updated(Integer, default=0): Updated existing recordsrows_skipped(Integer, default=0): Skipped (duplicates or validation failures)rows_errored(Integer, default=0): Failed to process
SDS Tracking:
unresolved_sds_count(Integer, default=0): Number of items without SDS attachedauto_matched_sds_count(Integer, default=0): Number of auto-matched SDSs
Processing Metadata:
processing_mode(String(20)): transactional, fix_and_retryvalidation_errors(JSONB): Array of row-level validation errors:[{"row": 5, "field": "product_name", "error": "Product name is required"}]error_summary(Text): Human-readable error summary
Timing:
started_at(Timestamp): When processing startedcompleted_at(Timestamp): When processing completedcreated_at(Timestamp): When upload was initiated
Indexes:
- B-tree indexes on: company_id, site_id, uploaded_by_user_id, upload_status
- Composite index on (company_id, created_at DESC) for history queries
Relationships:
- Many-to-one: company (Companies)
- Many-to-one: site (Sites)
- Many-to-one: uploaded_by_user (Users)
Use Cases:
- Track all bulk import operations for audit purposes
- Provide detailed feedback to users about import success/failures
- Monitor SDS gaps after bulk imports
- Generate import history reports
- Trigger Sentinel alerts for missing SDSs (Section 4.6.4)
- Support partial imports with fix-and-retry workflow
Example Import Summary:
{
"upload_id": "123e4567-e89b-12d3-a456-426614174000",
"file_name": "chemical_inventory_2024.csv",
"total_rows": 100,
"rows_created": 85,
"rows_updated": 10,
"rows_skipped": 3,
"rows_errored": 2,
"unresolved_sds_count": 15,
"upload_status": "completed"
}
Integration Points:
- Section 4.6 (Bulk Upload) in Chemical Inventory requirements
- Triggers Sentinel "Missing SDS Alert" when unresolved_sds_count > 0
- Creates records in
chemiq_sds_reviews_pendingfor unresolved items - Used by bulk import API endpoints for progress tracking
Table 22: background_job_logs
Purpose: Tracks execution of background jobs from tellus-ehs-background-service. Records job status, duration, processing counts, and any errors.
Key Features:
- Job identification (name and type)
- Status tracking with timestamps
- Optional company/site scope
- Processing statistics (records processed, success, failed)
- Error tracking and detailed JSONB storage
Important Columns:
log_id(UUID, PK): Primary identifierjob_name(String(100), indexed, required): Name of the job (e.g., "sds_bulk_upload_processor")job_type(String(50), required): Type of job (e.g., "scheduled", "triggered", "manual")status(String(20), indexed, default='running'): running, completed, failedstarted_at(Timestamp with TZ, indexed): When job startedcompleted_at(Timestamp with TZ): When job completedduration_seconds(Numeric(10,3)): Job duration in secondscompany_id(UUID, FK, optional): For company-scoped jobssite_id(UUID, FK, optional): For site-scoped jobsrecords_processed(Integer, default=0): Total records processedrecords_success(Integer, default=0): Successful recordsrecords_failed(Integer, default=0): Failed recordserror_message(Text): Error details if job faileddetails(JSONB): Additional job-specific details
Indexes:
- B-tree indexes on: job_name, status, started_at
Use Cases:
- Track background job execution history
- Monitor job performance and success rates
- Debug failed jobs with error messages
- Generate job execution reports
- Audit trail for scheduled processing
Table 23: chemiq_label_images (Legacy)
Purpose: Stores OCR-extracted product label images uploaded during inventory entry. This is a legacy table that predates chemiq_product_images.
Key Features:
- Links to inventory items
- Stores image path
- OCR extracted text and confidence score
- Legacy support for older inventory entries
Important Columns:
| Column | Type | Constraints | Description |
|---|---|---|---|
| image_id | UUID | PK | Primary identifier |
| chemical_id | UUID | FK, indexed | Links to inventory item |
| image_path | Text | required | Path to stored image |
| ocr_extracted_data | Text | OCR extracted text | |
| ocr_confidence | Numeric | OCR confidence score | |
| uploaded_at | Timestamp | Upload timestamp |
Note: New implementations should use chemiq_product_images instead, which supports the company product catalog architecture.
Relationships:
- Many-to-one: chemical (ChemIQ Inventory)
Use Cases:
- Legacy inventory label image storage
- OCR text extraction for product identification
- Backward compatibility with older inventory entries
Table 24: ref_chemical_registry
Purpose: Reference table containing a master registry of chemicals with CAS numbers and molecular information.
Key Features:
- Master chemical reference data
- CAS number registry
- Molecular formulas and weights
- UN hazard numbers
- State of matter classification
Important Columns:
| Column | Type | Constraints | Description |
|---|---|---|---|
| chemical_id | UUID | PK | Primary identifier |
| cas_number | Text | indexed | CAS registry number |
| primary_name | Text | Primary chemical name | |
| molecular_formula | Text | Chemical formula | |
| molecular_weight | Numeric | Molecular weight | |
| state_of_matter | Text | solid, liquid, gas | |
| un_number | Text | UN hazard classification number | |
| description | Text | Chemical description | |
| created_at | Timestamp | Creation time | |
| updated_at | Timestamp | Last update time |
Indexes:
- B-tree index on
cas_number
Relationships:
- One-to-many: synonyms (Chemical Synonyms)
Use Cases:
- Master chemical reference lookup
- CAS number validation
- Hazard classification reference
- Chemical property database
Table 25: ref_chemical_synonyms
Purpose: Stores alternative names and synonyms for chemicals in the registry.
Key Features:
- Multiple names per chemical
- Language support
- Preferred name flagging
- Synonym type categorization
Important Columns:
| Column | Type | Constraints | Description |
|---|---|---|---|
| synonym_id | UUID | PK | Primary identifier |
| chemical_id | UUID | FK, indexed | Links to chemical registry |
| synonym | Text | Alternative name | |
| synonym_type | Text | Type: trade_name, common_name, iupac, etc. | |
| language_code | Text | ISO language code | |
| is_preferred | Boolean | Preferred name flag | |
| created_at | Timestamp | Creation time |
Indexes:
- B-tree index on
chemical_id - B-tree index on
synonym
Relationships:
- Many-to-one: chemical (Chemical Registry)
Use Cases:
- Chemical name lookup by synonym
- Multi-language chemical names
- Trade name to CAS number mapping
- Search enhancement with synonyms
Part 6: Chemical Enrichment, Compliance & PPE Systems
This section documents the tables added for advanced chemical data enrichment, site quantity tracking, regulatory compliance checking, and PPE recommendation features.
Migration Files:
2026_01_15_1200-a1b2c3d4e5f7_add_chemical_enrichment_and_compliance_tables.py- 20 tables2026_01_15_1300-b2c3d4e5f6g8_add_ghs_hazard_codes_reference.py- 2 tables
Table 26: chemiq_pubchem_cache
Purpose: Caches chemical property data fetched from PubChem and other external sources to reduce API calls and improve response times.
Key Features:
- External chemical data cache (properties, exposure limits, GHS data)
- Multi-source support (PubChem, NIOSH, manufacturer)
- Priority-based refresh cycles (30/90/180 days)
- Tracks fetch statistics and cache performance
Important Columns:
cache_id(UUID, PK): Primary identifiercas_number(String(20), indexed, required): CAS registry numberpubchem_cid(Integer): PubChem Compound ID
Chemical Identification:
iupac_name(String(500)): IUPAC systematic namecommon_name(String(255)): Common chemical namemolecular_formula(String(100)): Chemical formulamolecular_weight(Numeric): Molecular weight in g/molcanonical_smiles(Text): Canonical SMILES notation for chemical structureinchi(Text): IUPAC International Chemical Identifier (InChI)inchi_key(String(27), indexed): InChIKey hash for fast structure lookupschemical_family(String(100)): Chemical family classificationchemical_class(String(100)): Chemical class
Physical Properties:
vapor_pressure_mmhg(Numeric): Vapor pressure at 25°Cboiling_point_c(Numeric): Boiling point in Celsiusflash_point_c(Numeric): Flash point in Celsiusspecific_gravity(Numeric): Specific gravitywater_solubility_mg_l(Numeric): Water solubility in mg/L
Exposure Limits:
osha_pel_ppm(Numeric): OSHA Permissible Exposure Limit (ppm)osha_pel_mg_m3(Numeric): OSHA PEL (mg/m³)niosh_rel_ppm(Numeric): NIOSH Recommended Exposure Limit (ppm)niosh_rel_mg_m3(Numeric): NIOSH REL (mg/m³)niosh_idlh_ppm(Numeric): NIOSH IDLH valueacgih_tlv_ppm(Numeric): ACGIH Threshold Limit Value
GHS Classification:
ghs_classification(JSONB): Structured GHS hazard data
Cache Metadata:
source(String(50)): Data source (pubchem, niosh, manufacturer)source_url(String(500)): Source URLfetched_at(Timestamp): When data was fetchedlast_refreshed_at(Timestamp): Last refresh timefetch_count(Integer): Number of times fetchedpriority_level(Integer, 1-3): Cache priority (1=high, 3=low)created_at,updated_at(Timestamp)
Indexes:
- Unique constraint on
cas_number - B-tree index on
pubchem_cid - B-tree index on
inchi_key(for structure-based lookups)
Use Cases:
- Cache PubChem API responses to avoid rate limiting
- Store NIOSH Pocket Guide data for exposure limits
- Enable offline chemical property lookup
- Pre-populate common industrial chemicals
- Structure-based chemical matching using InChIKey
Table 27: chemiq_chemical_fetch_queue
Purpose: Background job queue for fetching chemical data from external APIs (PubChem, NIOSH).
Key Features:
- Priority-based queue processing
- Retry mechanism with max attempts
- Tracks source context (SDS, inventory, manual)
- Company-scoped requests
Important Columns:
queue_id(UUID, PK): Primary identifiercas_number(String(20), indexed, required): CAS number to fetchsource_type(String(50), required): sds_parse, inventory_add, manual_requestsource_id(UUID): Reference to source recordcompany_id(UUID, FK): Requesting companypriority(String(20)): high, normal, lowstatus(String(20), indexed): pending, processing, completed, failedattempts(Integer): Current attempt countmax_attempts(Integer): Maximum retry attemptslast_attempt_at(Timestamp): Last attempt timeerror_message(Text): Error details if failedcreated_at(Timestamp)processed_at(Timestamp): When processing completed
Indexes:
- Composite index on (status, priority)
- B-tree index on
cas_number
Use Cases:
- Queue chemical enrichment requests from SDS parsing
- Batch process multiple chemical lookups
- Retry failed API requests with backoff
- Track enrichment request sources
Table 28: chemiq_cache_statistics
Purpose: Tracks cache performance metrics for monitoring and optimization.
Important Columns:
stat_id(UUID, PK): Primary identifierperiod_start(Date, required): Period start dateperiod_end(Date, required): Period end datetotal_lookups(Integer): Total cache lookupscache_hits(Integer): Successful cache hitscache_misses(Integer): Cache misses requiring API fetchfetch_successes(Integer): Successful API fetchesfetch_failures(Integer): Failed API fetchesavg_fetch_time_ms(Numeric): Average fetch latencytop_chemicals(JSONB): Most frequently accessed chemicalscreated_at(Timestamp)
Use Cases:
- Monitor cache hit rates (target: >95%)
- Track API performance and reliability
- Identify frequently accessed chemicals for pre-caching
- Generate performance reports
Table 29: chemiq_unit_conversions
Purpose: Reference table for unit conversion factors used in quantity aggregation.
Key Features:
- Standard units: lbs (weight), gal (volume)
- Bidirectional conversion factors
- Supports weight, volume, and count units
Important Columns:
conversion_id(UUID, PK): Primary identifierunit_code(String(20), required): Unit identifier (oz, gal, L, kg)unit_name(String(50), required): Full unit nameunit_type(String(20), required): weight, volumeto_standard_factor(Numeric, required): Conversion factor to standard unitstandard_unit(String(20), required): Standard unit (lbs, gal)abbreviation(String(10), required): Display abbreviationcreated_at(Timestamp)
Seed Data (14 records):
- Weight: oz, lbs, g, kg
- Volume: fl_oz, cup, pt, qt, gal, mL, L
Use Cases:
- Convert inventory quantities to standard units
- Support Tier II reporting (lbs required)
- Enable fire code threshold comparisons
Table 30: chemiq_product_density
Purpose: Stores density values for volume-to-weight conversion.
Important Columns:
density_id(UUID, PK): Primary identifiercas_number(String(20), indexed): CAS number for chemical-specific densityproduct_id(UUID, FK): Product-specific densitydensity_value(Numeric, required): Density valuedensity_unit(String(20), required): Density unit (g/mL, lb/gal)density_lbs_per_gal(Numeric, required): Normalized densitydata_source(String(100)): Source of density datacreated_at,updated_at(Timestamp)
Seed Data:
- Water baseline: 8.34 lbs/gal
Use Cases:
- Convert gallons to pounds for Tier II reporting
- Calculate weight from volume measurements
- Support fire code quantity calculations
Table 31: chemiq_site_chemical_summary
Purpose: Aggregated chemical quantities per site for Tier II reporting and threshold monitoring.
Key Features:
- Per-chemical totals (lbs, gallons, containers)
- Tracks contributing products
- Fire code classification
- Tier II threshold tracking
Important Columns:
summary_id(UUID, PK): Primary identifiercompany_id(UUID, FK, required): Company identifiersite_id(UUID, FK, required): Site identifiercas_number(String(20), required): Chemical CAS numberchemical_name(String(255), required): Chemical nametotal_quantity_lbs(Numeric): Total weight in poundstotal_quantity_gal(Numeric): Total volume in gallonstotal_containers(Integer): Container countproduct_count(Integer): Number of contributing productscontributing_products(JSONB): Product detailsis_hazardous(Boolean): Hazardous classificationhazard_classes(JSONB): GHS hazard classesphysical_state(String(20)): solid, liquid, gasmax_daily_quantity_lbs(Numeric): Max daily quantity (Tier II)avg_daily_quantity_lbs(Numeric): Average daily quantity (Tier II)days_on_site(Integer): Days present (Tier II)fire_code_category(String(50)): Fire code hazard categoryfire_code_class(String(20)): Fire code hazard classstorage_locations(JSONB): Storage location detailstier2_threshold_lbs(Integer): Tier II reporting thresholdfire_permit_threshold(Numeric): Fire permit thresholdexceeds_tier2_threshold(Boolean): Tier II threshold exceededexceeds_fire_threshold(Boolean): Fire threshold exceededlast_calculated_at(Timestamp): Last aggregation timecalculation_method(String(50)): Calculation method usedcreated_at,updated_at(Timestamp)
Indexes:
- Composite index on (company_id, site_id)
- B-tree index on
cas_number
Use Cases:
- Generate Tier II reports
- Monitor fire code compliance
- Track site-level chemical totals
- Alert on threshold exceedances
Table 32: chemiq_site_hazard_summary
Purpose: Aggregated quantities by hazard class per site for fire code compliance.
Important Columns:
summary_id(UUID, PK): Primary identifiercompany_id(UUID, FK, required): Company identifiersite_id(UUID, FK, required): Site identifierhazard_category(String(50), required): Hazard category (Flammable, Corrosive)hazard_class(String(100), required): Specific hazard classghs_code(String(20)): GHS hazard codefire_code_category(String(50)): IFC categoryfire_code_class(String(20)): IFC classtotal_quantity_lbs(Numeric): Total weighttotal_quantity_gal(Numeric): Total volumetotal_containers(Integer): Container countchemical_count(Integer): Number of chemicalschemicals(JSONB): Contributing chemical detailsfire_permit_limit(Numeric): Fire permit thresholdexceeds_permit_limit(Boolean): Threshold exceededpercent_of_limit(Numeric): Percentage of limit usedlast_calculated_at(Timestamp)created_at,updated_at(Timestamp)
Use Cases:
- Fire department permit applications
- Hazard class quantity tracking
- Threshold monitoring by hazard type
Table 33: chemiq_site_quantity_history
Purpose: Daily snapshots of chemical quantities for Tier II reporting (max/avg daily calculations).
Key Features:
- 7-year retention for regulatory compliance
- Daily automated snapshots
- Supports Tier II max/avg calculations
Important Columns:
history_id(UUID, PK): Primary identifiercompany_id(UUID, FK, required): Company identifiersite_id(UUID, FK, required): Site identifiercas_number(String(20), required): Chemical CAS numberchemical_name(String(255), required): Chemical namesnapshot_date(Date, indexed, required): Snapshot datequantity_lbs(Numeric): Quantity in poundsquantity_gal(Numeric): Quantity in gallonscontainer_count(Integer): Container countis_present(Boolean): Chemical present on this datesource(String(50)): Snapshot source (daily_snapshot, manual)created_at(Timestamp)
Indexes:
- Composite index on (company_id, site_id, snapshot_date)
- B-tree index on
cas_number
Use Cases:
- Calculate Tier II max daily quantity
- Calculate Tier II average daily quantity
- Calculate days on site
- Historical quantity tracking
Table 34: chemiq_fire_code_thresholds
Purpose: Reference table for fire code quantity limits by hazard category.
Important Columns:
threshold_id(UUID, PK): Primary identifierhazard_category(String(50), required): Hazard categoryhazard_class(String(20)): Specific classstorage_type(String(50), required): Storage typemax_quantity_gal(Numeric): Maximum gallonsmax_quantity_lbs(Numeric): Maximum poundsmax_quantity_cu_ft(Numeric): Maximum cubic feetcode_reference(String(100)): Code citationjurisdiction(String(50)): Jurisdiction (IFC, state)notes(Text): Additional notescreated_at(Timestamp)
Seed Data (13 records): Based on IFC Table 5003.1.1(1) for flammable liquids, corrosives, oxidizers, etc.
Use Cases:
- Fire code compliance checking
- Permit threshold calculations
- Automated threshold alerts
Table 35: chemiq_regulatory_lists
Purpose: Master regulatory list tracking chemicals on OSHA, EPA, IARC, and state lists.
Key Features:
- OSHA PEL/specific standards tracking
- EPA SARA 313 (TRI) and CERCLA RQ
- California Prop 65
- IARC/NTP carcinogen classifications
Important Columns:
list_id(UUID, PK): Primary identifiercas_number(String(20), indexed, required): CAS numberchemical_name(String(255)): Chemical name
OSHA Fields:
is_osha_pel_listed(Boolean): Has OSHA PELhas_osha_specific_standard(Boolean): Has substance-specific standardosha_standard_citation(String(50)): Standard citation (e.g., 1910.1028)is_niosh_rel_listed(Boolean): Has NIOSH RELis_acgih_tlv_listed(Boolean): Has ACGIH TLV
EPA SARA 313 (TRI):
is_epa_sara_313(Boolean): On TRI listsara_313_threshold_lbs(Integer): Reporting threshold (default 10,000)sara_313_category(String(50)): Categorysara_313_pbt(Boolean): Persistent bioaccumulative toxic
EPA CERCLA:
is_epa_cercla(Boolean): On CERCLA listcercla_rq_lbs(Integer): Reportable quantity
California Prop 65:
is_california_prop65(Boolean): On Prop 65 listprop65_type(String(20)): cancer, reproductive, bothprop65_listing_date(Date): When listedprop65_nsrl_ug(Numeric): No Significant Risk Levelprop65_madl_ug(Numeric): Maximum Allowable Dose Level
Carcinogen Classification:
is_carcinogen(Boolean): Known/suspected carcinogencarcinogen_source(String(20)): IARC, NTP, OSHAcarcinogen_classification(String(20)): Group 1, 2A, 2B, etc.
EU REACH:
is_eu_reach(Boolean): On REACH listreach_status(String(50)): REACH status
Metadata:
last_verified_at(Timestamp): Last verification datesource_urls(Text[]): Source URLsnotes(Text): Additional notescreated_at,updated_at(Timestamp)
Indexes:
- Unique constraint on
cas_number
Use Cases:
- Compliance checking against regulatory lists
- TRI reporting determination
- Prop 65 warning requirements
- Carcinogen identification
Table 36: chemiq_company_compliance_settings
Purpose: Company-specific compliance settings for customized checking.
Important Columns:
setting_id(UUID, PK): Primary identifiercompany_id(UUID, FK, required): Company identifierships_to_california(Boolean): Ships to CA (triggers Prop 65)employee_count(Integer): Employee count (affects TRI)naics_code(String(10)): NAICS code (affects TRI)use_niosh_rel_over_pel(Boolean): Use NIOSH REL instead of PELexposure_monitoring_frequency(String(20)): Monitoring frequencynotify_on_compliance_issues(Boolean): Enable notificationscompliance_notification_emails(Text[]): Notification recipientscreated_at,updated_at(Timestamp)
Indexes:
- Unique constraint on
company_id
Use Cases:
- Configure Prop 65 requirements
- Set TRI reporting thresholds
- Customize exposure monitoring
- Enable compliance alerts
Table 37: chemiq_compliance_results
Purpose: Stores compliance check results per product/chemical.
Important Columns:
result_id(UUID, PK): Primary identifiercompany_id(UUID, FK, required): Company identifiersite_id(UUID, FK): Site scopeinventory_id(UUID, FK): Inventory itemsds_id(UUID, FK): SDS documentcheck_date(Timestamp): When check was performedcompliance_score(Integer): 0-100 scoreis_compliant(Boolean): Overall compliance status
Check Results (JSONB):
hazcom_result: HazCom compliance detailsexposure_monitoring_result: Exposure monitoring requirementstri_reporting_result: TRI reporting determinationcercla_result: CERCLA RQ statusprop65_result: Prop 65 requirementscarcinogen_result: Carcinogen classification
Action Items:
required_actions(Text[]): Required actionswarnings(Text[]): Warning messageschecked_by(String(50)): Check method (auto, manual)created_at(Timestamp)
Indexes:
- B-tree indexes on company_id, site_id, sds_id
Use Cases:
- Store compliance check results
- Track compliance history
- Generate compliance reports
- Identify action items
Table 38: ppe_glove_chemical_resistance
Purpose: Chemical-to-glove-material compatibility ratings.
Important Columns:
resistance_id(UUID, PK): Primary identifiercas_number(String(20), indexed, required): Chemical CAS numberchemical_name(String(255), required): Chemical namechemical_family(String(100)): Chemical family
Material Ratings (VG/G/F/P/NR):
nitrile_rating(String(10)): Nitrile rubber ratingbutyl_rating(String(10)): Butyl rubber ratingneoprene_rating(String(10)): Neoprene ratingpvc_rating(String(10)): PVC ratingviton_rating(String(10)): Viton ratinglatex_rating(String(10)): Natural latex rating
Breakthrough Times (minutes):
nitrile_breakthrough_min(Integer)butyl_breakthrough_min(Integer)neoprene_breakthrough_min(Integer)viton_breakthrough_min(Integer)
Recommendations:
recommended_thickness_mil(Integer): Recommended thicknesssource(String(255)): Data sourcenotes(Text): Additional notescreated_at,updated_at(Timestamp)
Seed Data (41 records): Common industrial chemicals with material ratings from Ansell Chemical Resistance Guide:
- Solvents: Acetone, MEK, Toluene, Xylene, Hexane, Heptane, etc.
- Alcohols: Methanol, Ethanol, Isopropanol, Butanol
- Acids: Hydrochloric, Sulfuric, Nitric, Phosphoric, Acetic
- Bases: Sodium Hydroxide, Potassium Hydroxide, Ammonia
- Aldehydes: Formaldehyde, Glutaraldehyde
- Chlorinated: Methylene Chloride, Trichloroethylene, Perchloroethylene
Use Cases:
- Glove material selection
- Breakthrough time calculations
- PPE recommendation generation
Table 39: ppe_eye_protection_reference
Purpose: Hazard type to eye protection mapping.
Important Columns:
reference_id(UUID, PK): Primary identifierhazard_type(String(50), required): Hazard typeghs_hazard_code(String(10)): GHS codeminimum_protection(String(100), required): Minimum requiredrecommended_protection(String(100)): Recommended levelansi_marking_required(String(20)): ANSI marking (Z87.1)face_shield_recommended(Boolean): Face shield neededfull_face_respirator_option(Boolean): Can use full-face respiratorselection_reasoning(Text): Selection rationalecreated_at(Timestamp)
Seed Data (8 records): Hazard types mapped to eye protection requirements:
- chemical_splash: Safety goggles with indirect venting (Z87.1+D3)
- dust: Safety goggles or glasses with side shields (Z87.1)
- vapor: Safety goggles with indirect venting (Z87.1+D3)
- corrosive_splash: Face shield + safety goggles (Z87.1+D3)
- severe_eye_damage: Face shield + safety goggles (Z87.1+D3)
- eye_irritation: Safety glasses with side shields (Z87.1)
- general_chemical: Safety goggles (Z87.1+D3)
- uv_radiation: Tinted safety glasses (Z87.1+U6)
Use Cases:
- Eye protection selection
- GHS-based recommendations
- ANSI compliance
Table 40: ppe_respiratory_reference
Purpose: Chemical to respirator cartridge requirements.
Important Columns:
reference_id(UUID, PK): Primary identifiercas_number(String(20), indexed): CAS numberchemical_name(String(255)): Chemical namechemical_family(String(100)): Chemical family
Exposure Limits:
osha_pel_ppm(Numeric): OSHA PELosha_pel_mgm3(Numeric): OSHA PEL (mg/m³)niosh_rel_ppm(Numeric): NIOSH RELniosh_rel_mgm3(Numeric): NIOSH REL (mg/m³)niosh_idlh_ppm(Numeric): IDLH value
Cartridge Selection:
cartridge_type(String(50)): OV, AG, OV/AG, etc.cartridge_color(String(50)): Black, yellow, etc.particulate_filter(String(20)): N95, P100, etc.
Physical Properties:
vapor_pressure_mmhg(Numeric): Vapor pressureis_particulate(Boolean): Particulate hazardoil_present(Boolean): Oil present
Respirator Requirements:
min_respirator_type(String(50)): APR, PAPR, SAR, SCBAmin_apf(Integer): Minimum APF requiredrequires_supplied_air(Boolean): Requires SAR/SCBAspecial_considerations(Text): Special notessource(String(100)): Data sourcecreated_at,updated_at(Timestamp)
Seed Data (19 records): Common chemicals with respiratory protection requirements from NIOSH Pocket Guide:
- Organic Solvents: Acetone, Toluene, Xylene, MEK, Hexane (OV cartridge, black)
- Acid Gases: HCl, Sulfuric acid mist, Nitric acid (AG cartridge, white/yellow)
- Ammonia: Ammonia (AM cartridge, green)
- Formaldehyde: Formaldehyde (Formaldehyde cartridge, olive)
- Particulates: Silica dust, Metal fumes (N95/P100 filters)
- Multi-gas: Combination OV/AG cartridges for mixed exposures
- High-hazard: IDLH chemicals requiring supplied air (SCBA/SAR)
Use Cases:
- Respirator cartridge selection
- APF calculations
- IDLH determinations
Table 41: ppe_body_protection_reference
Purpose: Hazard to body protection level mapping.
Important Columns:
reference_id(UUID, PK): Primary identifierhazard_type(String(50), required): Hazard typeghs_hazard_code(String(10)): GHS codeminimum_protection(String(100), required): Minimum protectionrecommended_protection(String(100)): Recommended levelprotection_level(String(5)): EPA level (A, B, C, D)min_material_type(String(50)): Minimum materialselection_reasoning(Text): Selection rationalecreated_at(Timestamp)
Seed Data (8 records): Hazard severity to EPA protection level mapping:
- Level D: General work, minimal hazard (standard coveralls)
- Level C: Known contaminants, APR adequate (Tyvek + APR)
- Level B: High concentrations, splash hazard (Tychem + SCBA)
- Level A: Highest protection, IDLH/unknown (fully encapsulated + SCBA)
- Plus specific hazard types: corrosive, oxidizer, flammable, toxic
Use Cases:
- Body protection selection
- EPA protection levels
- Chemical suit requirements
Table 42: ppe_body_material_resistance
Purpose: Chemical-to-body-protection material compatibility.
Important Columns:
resistance_id(UUID, PK): Primary identifiercas_number(String(20), indexed, required): CAS numberchemical_name(String(255), required): Chemical name
Material Ratings:
tyvek_rating(String(10)): Tyvek ratingtychem_qc_rating(String(10)): Tychem QC ratingtychem_2000_rating(String(10)): Tychem 2000 ratingtychem_4000_rating(String(10)): Tychem 4000 ratingtychem_6000_rating(String(10)): Tychem 6000 ratingsaranex_rating(String(10)): Saranex rating
Breakthrough Times:
tychem_2000_breakthrough_min(Integer)tychem_4000_breakthrough_min(Integer)source(String(100)): Data sourcecreated_at(Timestamp)
Seed Data (8 records): Common chemicals with DuPont Tyvek/Tychem material ratings:
- Solvents: Acetone, Toluene, Xylene (Tychem 2000/4000 required)
- Acids: Sulfuric acid, Hydrochloric acid (Tychem 4000/6000 required)
- Bases: Sodium Hydroxide (Tychem QC/2000 adequate)
- Water-based: General aqueous solutions (Tyvek adequate)
- High-hazard: Concentrated acids/bases (Tychem 6000/Saranex required)
Use Cases:
- Chemical suit material selection
- Breakthrough time calculations
Table 43: ppe_products
Purpose: Catalog of available PPE products with specifications.
Important Columns:
product_id(UUID, PK): Primary identifierproduct_name(String(255), required): Product namemanufacturer(String(100), required): Manufacturerproduct_code(String(100)): SKU/model numberppe_category(String(50), required): hand, eye, respiratory, bodyppe_type(String(50), required): gloves, goggles, respirator, coverallppe_subtype(String(50)): Specific subtype
Glove Fields:
glove_material(String(50)): Material typeglove_thickness_mil(Numeric): Thickness in milsglove_length_inches(Numeric): Lengthglove_ansi_level(Integer): Cut resistance level
Eye Protection Fields:
eye_ansi_marking(String(20)): ANSI markingeye_lens_type(String(50)): Lens typeeye_has_side_shields(Boolean): Side shieldseye_indirect_vent(Boolean): Indirect venting
Respirator Fields:
respirator_type(String(50)): APR, PAPR, etc.respirator_apf(Integer): Assigned Protection Factorrespirator_cartridge_type(String(50)): Cartridge typerespirator_filter_rating(String(10)): N95, P100, etc.respirator_niosh_approved(Boolean): NIOSH approved
Body Protection Fields:
body_coverage(String(50)): Full, partial, etc.body_material(String(50)): Material typebody_protection_level(String(5)): EPA levelbody_seam_type(String(50)): Seam construction
Common Fields:
certifications(Text[]): Certifications listsize_options(String(100)): Available sizespurchase_url(String(500)): Purchase linkprice_range(String(50)): Price rangepack_quantity(Integer): Units per packis_active(Boolean): Active flagcreated_at,updated_at(Timestamp)
Indexes:
- B-tree indexes on ppe_category, ppe_type, manufacturer
Seed Data (20 records): Representative PPE products across all categories:
- Gloves (5): Nitrile (Ansell Sol-Vex 37-175), Butyl (North B-174), Neoprene (Ansell Neox 09-430), PVC (Ansell Snorkel), Viton (North F-091)
- Eye Protection (5): Safety glasses with side shields, Chemical splash goggles (indirect vent), Face shields, Full-face respirator visors, UV-protective glasses
- Respirators (5): Half-face APR (3M 6000), Full-face APR (3M 6800), PAPR (3M Versaflo), N95 disposable (3M 8210), P100 cartridge set
- Body Protection (5): Tyvek coverall, Tychem QC coverall, Tychem 2000 suit, Tychem 4000 suit, Apron (chemical resistant)
Use Cases:
- PPE product catalog
- Product recommendations
- Specification matching
Table 44: ppe_sds_recommendations
Purpose: Generated PPE recommendations per SDS document.
Important Columns:
recommendation_id(UUID, PK): Primary identifiersds_id(UUID, FK, required): SDS document
Glove Recommendation:
glove_material(String(50)): Recommended materialglove_thickness_mil(Integer): Recommended thicknessglove_ansi_level(Integer): Cut resistanceglove_reasoning(Text): Selection rationaleglove_confidence(Numeric): Confidence score
Eye Protection:
eye_protection_type(String(50)): Type recommendedeye_ansi_marking(String(20)): ANSI markingeye_face_shield_required(Boolean): Face shield neededeye_reasoning(Text): Selection rationaleeye_confidence(Numeric): Confidence score
Respiratory:
respirator_type(String(50)): Type recommendedrespirator_cartridge(String(50)): Cartridge typerespirator_filter(String(10)): Filter ratingrespirator_apf_required(Integer): Minimum APFrespirator_reasoning(Text): Selection rationalerespirator_confidence(Numeric): Confidence score
Body Protection:
body_protection_type(String(50)): Type recommendedbody_material(String(50)): Material typebody_protection_level(String(5)): EPA levelbody_reasoning(Text): Selection rationalebody_confidence(Numeric): Confidence score
Overall:
overall_confidence(Numeric): Overall confidencegenerated_at(Timestamp): Generation timegenerated_by(String(50)): Generation method (auto, manual)last_reviewed_at(Timestamp): Last reviewreviewed_by(UUID, FK): Reviewercreated_at(Timestamp)
Indexes:
- Unique constraint on
sds_id
Use Cases:
- Store PPE recommendations per SDS
- Track recommendation confidence
- Enable human review workflow
Table 45: ppe_ghs_hazard_mapping
Purpose: Maps GHS hazard codes to PPE requirements (PPE-triggering codes only).
Important Columns:
mapping_id(UUID, PK): Primary identifierghs_code(String(10), required): GHS hazard codehazard_statement(Text): Hazard statement texthazard_category(String(50)): Hazard category
PPE Requirements:
eye_protection_min(String(50)): Minimum eye protectioneye_protection_recommended(String(50)): Recommended eye protectionhand_protection_min(String(50)): Minimum hand protectionrespirator_required(Boolean): Respirator requiredrespirator_type_min(String(50)): Minimum respiratorbody_protection_min(String(50)): Minimum body protectionnotes(Text): Additional notescreated_at(Timestamp)
Seed Data (13 codes): H310, H311, H312, H314, H315, H317, H318, H319, H330, H331, H332, H334, H335
Use Cases:
- Quick PPE lookup by GHS code
- SDS-to-PPE mapping
- Automated recommendations
Table 46: chemiq_ghs_hazard_codes
Purpose: Complete reference table for all GHS hazard codes (H-codes).
Key Features:
- All physical hazards (H200 series)
- All health hazards (H300 series)
- All environmental hazards (H400 series)
- Combined hazard statements
- PPE triggering flags
Important Columns:
hazard_id(UUID, PK): Primary identifiercode(String(30), indexed, required): Hazard codecode_type(String(20), required): hazard, precautionary, combinedhazard_class(String(100)): Hazard class namehazard_category(String(100)): Category designationstatement(Text, required): Hazard statement textsignal_word(String(20)): Danger, Warningpictogram_codes(Text[]): GHS pictograms (GHS01-GHS09)category_number(Integer): Category numberis_physical(Boolean): Physical hazard flagis_health(Boolean): Health hazard flagis_environmental(Boolean): Environmental hazard flagtriggers_ppe(Boolean): Triggers PPE requirementppe_categories(Text[]): PPE categories requirednotes(Text): Additional notescreated_at(Timestamp)
Seed Data (94 codes):
- 29 physical hazards (H200 series)
- 57 health hazards (H300 series)
- 8 environmental hazards (H400 series)
- 13 combined hazard statements
Indexes:
- Unique constraint on
code - B-tree indexes on code_type, hazard_class
Use Cases:
- GHS code validation
- Hazard statement lookup
- Pictogram determination
- PPE requirement lookup
Table 47: chemiq_ghs_pictograms
Purpose: Reference table for GHS pictograms (GHS01-GHS09).
Important Columns:
pictogram_id(UUID, PK): Primary identifiercode(String(10), required): Pictogram code (GHS01-GHS09)name(String(100), required): Pictogram namesymbol_description(Text, required): Symbol descriptionhazard_types(Text[]): Associated hazard typescreated_at(Timestamp)
Seed Data (9 pictograms):
- GHS01: Exploding Bomb
- GHS02: Flame
- GHS03: Flame Over Circle
- GHS04: Gas Cylinder
- GHS05: Corrosion
- GHS06: Skull and Crossbones
- GHS07: Exclamation Mark
- GHS08: Health Hazard
- GHS09: Environment
Use Cases:
- Pictogram lookup
- SDS label generation
- Hazard visualization
Part 7: Classification & Tagging System
This section documents the tables added for chemical classification (GHS hazard categories), custom company tags, and AI-generated hazard summaries.
Migration File:
2026_01_31_2200-a3c7f1e2d456_add_classification_and_tagging.py
Table 48: chemiq_company_tags
Purpose: Company-scoped custom tags for organizing and categorizing chemicals.
Key Features:
- Company-specific tag definitions with color coding
- Unique tag names per company
- Cascading delete when company is removed
Important Columns:
tag_id(UUID, PK): Primary identifiercompany_id(UUID, FK, indexed, required): Company that owns this tagname(String(50), required): Tag display namecolor(String(7), default='#6B7280', required): Hex color code for displaydescription(String(255)): Optional tag descriptioncreated_by_user_id(UUID, FK): User who created the tagcreated_at(Timestamp with TZ, default=now): Creation timeupdated_at(Timestamp with TZ): Last update time
Indexes:
- B-tree index on
company_id - Unique constraint on (company_id, name)
Relationships:
- Many-to-one: company → Company (CASCADE delete)
- Many-to-one: created_by_user → User (SET NULL on delete)
- One-to-many: product_tags → ProductTag
Use Cases:
- Create custom organizational tags (e.g., "High Priority", "Needs Review", "Lab Only")
- Color-coded tag display in inventory list and details page
- Company-specific categorization beyond GHS hazard classes
Table 49: chemiq_product_tags
Purpose: Junction table linking company products to tags (many-to-many).
Key Features:
- Links products to company tags
- Tracks who assigned the tag
- Cascading deletes from both product and tag sides
Important Columns:
id(UUID, PK): Primary identifiercompany_product_id(UUID, FK, indexed, required): Links to company product catalogtag_id(UUID, FK, indexed, required): Links to company tagassigned_by_user_id(UUID, FK): User who assigned the tagassigned_at(Timestamp with TZ, default=now): Assignment time
Indexes:
- B-tree index on
company_product_id - B-tree index on
tag_id - Unique constraint on (company_product_id, tag_id)
Relationships:
- Many-to-one: company_product → CompanyProductCatalog (CASCADE delete)
- Many-to-one: tag → CompanyTag (CASCADE delete)
- Many-to-one: assigned_by_user → User (SET NULL on delete)
Use Cases:
- Assign multiple tags to a product
- Filter inventory by tag
- Track tag assignment history
- Dashboard tag distribution charts
Table 50: chemiq_ai_hazard_summaries
Purpose: Stores AI-generated plain-English hazard summaries for products with parsed SDS data.
Key Features:
- One summary per product (unique index on company_product_id)
- Risk level classification (low/moderate/high/extreme)
- Key hazards as JSONB array
- Tracks which SDS and AI model were used for generation
Important Columns:
id(UUID, PK): Primary identifiercompany_product_id(UUID, FK, unique index, required): Links to company product catalogsummary_text(Text, required): AI-generated plain-English hazard summaryhandling_guidance(Text): AI-generated handling and storage guidancerisk_level(String(20)): Overall risk level: low, moderate, high, extremekey_hazards(JSONB): Array of top hazard phrases (e.g.,["Highly flammable", "Causes skin irritation"])model_used(String(50)): AI model used for generation (e.g., "gpt-4o", "claude-3")generated_at(Timestamp with TZ, default=now): When summary was generatedsds_document_id(UUID, FK): SDS document used as source for generation
Indexes:
- Unique index on
company_product_id(one summary per product)
Relationships:
- Many-to-one: company_product → CompanyProductCatalog (CASCADE delete)
- Many-to-one: sds_document → SDSDocument (SET NULL on delete)
Use Cases:
- Display AI-generated hazard summary on product details page
- Risk level badges in inventory list
- Handling guidance for employees without chemistry background
- Regenerate summary when SDS is updated
Summary
The ChemIQ Complete Schema provides a comprehensive, unified architecture for managing chemical safety documents across multiple companies:
Key Benefits
- Unified Search: Two-tier product catalog (company → global) for fast, relevant results
- No Product Duplication: Company product catalog eliminates duplicate product data across sites/locations
- Document Flexibility: One product can link to multiple document types (SDS, EPA Label, COA, tech sheets)
- Global Repositories: SDSs and EPA Labels shared across companies (no duplication)
- Company Isolation: Company product catalog and mapping tables maintain data separation
- Data Quality Tracking: Confidence scores, verification status, and data source tracking
- Performance: Extensive indexing (GIN, B-tree) for fast search across large datasets
- Compliance: Complete audit trails, version tracking, and retention policies
- Scalability: Partition-ready design for millions of documents
- Integration: S3 storage, async parsing queues, EPA PPLS API, external barcode APIs
Total Tables: 50
Product Catalog System (4 tables):
- chemiq_product_catalog (global shared catalog)
- chemiq_product_documents (polymorphic junction table)
- chemiq_company_product_catalog - company-specific products
- chemiq_product_images - product label images for curved/multi-sided labels
EPA Label Repository (4 tables): 5. chemiq_epa_label_documents 6. chemiq_company_label_mappings 7. chemiq_label_parse_queue 8. chemiq_label_change_log
SDS Repository (9 tables): 9. chemiq_sds_documents 10. chemiq_sds_hazard_info 11. chemiq_sds_composition 12. chemiq_sds_sections 13. chemiq_company_sds_mappings 14. chemiq_sds_parse_queue 15. chemiq_sds_change_log 16. chemiq_sds_web_search_cache - caches Perplexity API web search results 17. chemiq_sds_bulk_upload_jobs - tracks ZIP bulk upload background jobs
Supporting Tables (8 tables): 18. chemiq_inventory (simplified - links to company product catalog) 19. chemiq_inventory_attributes - stores batch/lot numbers and expiration dates 20. chemiq_sds_reviews_pending 21. chemiq_inventory_uploads 22. background_job_logs - tracks background service job execution 23. chemiq_label_images - legacy label images (use chemiq_product_images for new implementations) 24. ref_chemical_registry - master chemical CAS number registry 25. ref_chemical_synonyms - chemical name synonyms and aliases
Chemical Enrichment (3 tables): 26. chemiq_pubchem_cache - PubChem/NIOSH chemical data cache 27. chemiq_chemical_fetch_queue - background fetch job queue 28. chemiq_cache_statistics - cache performance metrics
Site Quantity Aggregation (6 tables): 29. chemiq_unit_conversions - unit conversion factors 30. chemiq_product_density - volume-to-weight conversion 31. chemiq_site_chemical_summary - per-chemical site totals 32. chemiq_site_hazard_summary - per-hazard-class site totals 33. chemiq_site_quantity_history - daily snapshots for Tier II 34. chemiq_fire_code_thresholds - fire code limits
Compliance Checking (3 tables): 35. chemiq_regulatory_lists - OSHA/EPA/IARC/Prop65 lists 36. chemiq_company_compliance_settings - company compliance config 37. chemiq_compliance_results - compliance check results
PPE Recommendation (8 tables): 38. ppe_glove_chemical_resistance - glove material ratings 39. ppe_eye_protection_reference - eye protection mapping 40. ppe_respiratory_reference - respirator requirements 41. ppe_body_protection_reference - body protection levels 42. ppe_body_material_resistance - suit material ratings 43. ppe_products - PPE product catalog 44. ppe_sds_recommendations - generated PPE recommendations 45. ppe_ghs_hazard_mapping - GHS to PPE mapping
GHS Reference (2 tables): 46. chemiq_ghs_hazard_codes - complete GHS H-codes (94 codes) 47. chemiq_ghs_pictograms - GHS pictograms (9 pictograms)
Classification & Tagging (3 tables): 48. chemiq_company_tags - company-scoped custom tags with color coding 49. chemiq_product_tags - product-to-tag junction table 50. chemiq_ai_hazard_summaries - AI-generated plain-English hazard summaries
Seed Data Summary
| Table | Records | Source | Description |
|---|---|---|---|
chemiq_unit_conversions | 14 | Standard | Unit conversion factors (oz→lbs, gal→lbs, etc.) |
chemiq_fire_code_thresholds | 13 | IFC 5003.1.1 | Fire code quantity limits by hazard class |
chemiq_product_density | 1 | Standard | Water baseline density |
chemiq_ghs_hazard_codes | 94 | GHS Rev 9 | Complete GHS H-codes (physical/health/environmental) |
chemiq_ghs_pictograms | 9 | GHS Rev 9 | All GHS pictograms (GHS01-GHS09) |
chemiq_pubchem_cache | ~300 | NIOSH/PubChem | Pre-seeded common industrial chemicals with exposure limits |
ppe_glove_chemical_resistance | 41 | Ansell Guide | Glove material ratings for common chemicals |
ppe_eye_protection_reference | 8 | OSHA/ANSI | Hazard type to eye protection mapping |
ppe_respiratory_reference | 19 | NIOSH Guide | Respirator cartridge requirements by chemical |
ppe_body_protection_reference | 8 | EPA/OSHA | Hazard severity to protection level mapping |
ppe_body_material_resistance | 8 | DuPont | Tyvek/Tychem material compatibility ratings |
ppe_products | 20 | Various | Representative PPE product catalog |
ppe_ghs_hazard_mapping | 13 | GHS/OSHA | GHS hazard codes to PPE requirements |
Note: Seed data is populated via Alembic migrations in tellus-ehs-hazcom-service/alembic/versions/.
Schema Evolution Note
The addition of chemiq_company_product_catalog represents a key architectural improvement:
- Before: Product data was denormalized in
chemiq_inventory, causing duplication across sites/locations - After: Product data is normalized in company product catalog, inventory only stores site/location-specific quantities
This change supports the incremental product catalog building strategy outlined in BARCODE_PRODUCT_CATALOG_STRATEGY.md.
This unified architecture supports the core mission of the Tellus EHS platform: storing chemical inventory documents and generating actionable insights based on composition, hazards, and safety information.