Skip to main content

ChemIQ Complete Schema Documentation

Overview

The ChemIQ module implements a comprehensive global repository architecture for chemical safety documents, consisting of three interconnected systems:

  1. Product Catalog - Master searchable product repository
  2. SDS (Safety Data Sheet) Repository - Global SDS document storage
  3. EPA Label Repository - Global EPA pesticide/fungicide label storage

This unified architecture enables multiple companies to efficiently search, share, and access chemical safety documents while maintaining data isolation and compliance.

Key Design Principles

  • Global Repositories: Documents stored once, shared across companies via mapping tables
  • Product Catalog Hub: Single search point linking to all document types
  • S3 Integration: PDF files stored in AWS S3, metadata in PostgreSQL
  • Deduplication: SHA-256 checksums prevent duplicate storage
  • Version Tracking: Linked-list pattern for document revision history
  • Full-Text Search: PostgreSQL TSVECTOR with GIN indexes
  • Structured Data: JSONB for complex data, arrays for lists
  • Company Isolation: Many-to-many mapping tables for multi-tenancy
  • Scalability: Optimized indexes and partition-ready design

Architecture Overview

Four-Tier Document Management

┌─────────────────────────────────────────────────────────────┐
│ GLOBAL PRODUCT CATALOG │
│ (Master Searchable Repository - Shared) │
│ - Product identification (name, manufacturer, codes) │
│ - Multiple identifiers (CAS, UPC, GTIN, EPA Reg #) │
│ - PRIMARY SEARCH VECTOR (unified search) │
│ - Populated by: API validation, AI agent promotion │
└─────────────┬───────────────────────────────────────────────┘
│ (optional link for enrichment)

┌─────────────────────────────────────────────────────────────┐
│ COMPANY PRODUCT CATALOG (NEW) │
│ (Company-Specific Product Entries) │
│ - Company's own product naming and codes │
│ - Links to global catalog (optional) │
│ - SDS and EPA label attachments │
│ - User contributions stay here (not global) │
└─────────────┬───────────────────────────────────────────────┘

├──────────────┬──────────────┬─────────────────┐
▼ ▼ ▼ ▼
┌─────────────┐ ┌──────────────┐ ┌──────┐ ┌──────────┐
│ SDS Docs │ │ EPA Labels │ │ COA │ │Tech Sheet│
└─────────────┘ └──────────────┘ └──────┘ └──────────┘
│ │
▼ ▼
┌──────────────────────────────────────────────────────────┐
│ COMPANY MAPPINGS │
│ (Company-specific isolation & tracking) │
└──────────────────────────────────────────────────────────┘


┌──────────────────────────────────────────────────────────┐
│ INVENTORY ITEMS (Simplified) │
│ (Site/Location-specific inventory tracking) │
│ - Links to company product catalog │
│ - Container size, quantity, location only │
└──────────────────────────────────────────────────────────┘

User Workflow

When user adds inventory item (Barcode Search Flow):

  1. Check Company Product Catalog first - Search chemiq_company_product_catalog by barcode/product name (company's own products)
  2. If not found, check Global Product Catalog - Search chemiq_product_catalog by barcode/product name
  3. If found in global - Copy to company catalog, link to global_product_id
  4. If not found, query External APIs - Open Food Facts, UPC Database, Barcode Lookup
  5. If API returns data - Save to BOTH global catalog (api_validated) and company catalog
  6. If still not found - User enters manually → saves to company catalog ONLY (not global)
  7. Create inventory record linking to chemiq_company_product_catalog

Key Design Principle:

  • User-contributed products go to chemiq_company_product_catalog ONLY
  • Global chemiq_product_catalog is populated by:
    • External API validated products (high confidence)
    • AI Agent promotion (future): When 30%+ of companies have same product

Benefits:

  • Single search eliminates need to query multiple repositories
  • Product metadata stored once per company (no duplication across sites/locations)
  • Company isolation - each company has their own product catalog
  • Global enrichment - can link to global catalog for additional data
  • Data quality tracking - confidence scores and verification status
  • One product can have multiple document versions
  • Easy to add new document types (COA, tech sheets)

Part 1: Product Catalog System

Table 1: chemiq_product_catalog

Purpose: Master searchable product repository that serves as the central hub for all chemical products.

Key Features:

  • Stores master product identification across all companies
  • Multiple identifier types (product codes, CAS, UPC, GTIN, EPA reg #)
  • PRIMARY search vector for unified product search
  • Links to all document types via junction table
  • Tracks product lifecycle (active, discontinued, replacements)

Important Columns:

  • product_id (UUID, PK): Primary identifier
  • product_name (String(255), indexed): Product name
  • manufacturer (String(255), indexed): Manufacturer name
  • manufacturer_id (String(100)): Manufacturer's internal ID
  • brand_name (String(255)): Brand/trade name
  • product_codes (JSONB): Array of codes with types: [{"type": "manufacturer", "code": "ABC-123"}]
  • cas_numbers (Text[]): Primary CAS numbers
  • upc_barcodes (Text[]): Multiple barcodes
  • gtin (String(14)): Global Trade Item Number
  • product_category (String(100)): Chemical, Pesticide, etc.
  • product_type (String(100)): Insecticide, Herbicide, Cleaner, etc.
  • formulation_type (String(50)): Liquid, Powder, Granular, etc.
  • epa_reg_number (String(50), indexed): EPA registration number (for pesticides)
  • epa_product_type (String(100)): EPA product classification
  • is_restricted_use (Boolean, default=false): Restricted use pesticide flag
  • is_active (Boolean, indexed, default=true): Active product flag
  • is_discontinued (Boolean, default=false): Discontinued product flag
  • discontinued_date (Date): When product was discontinued
  • replacement_product_id (UUID, FK): Links to replacement product
  • search_vector (TSVECTOR): PRIMARY SEARCH VECTOR for all product searches
  • contributed_by_company_id (UUID, FK): Company that contributed this product
  • source_type (String(20)): manufacturer, user_contributed, epa_ppls
  • created_at, updated_at (Timestamp)

Indexes:

  • GIN index on search_vector (PRIMARY SEARCH INDEX)
  • GIN index on upc_barcodes (array search)
  • GIN index on cas_numbers (array search)
  • B-tree indexes on: product_name, manufacturer, epa_reg_number, is_active

TSVECTOR Trigger:

CREATE TRIGGER tsvector_update_trigger BEFORE INSERT OR UPDATE
ON chemiq_product_catalog FOR EACH ROW EXECUTE FUNCTION
tsvector_update_trigger(search_vector, 'pg_catalog.english',
product_name, manufacturer, brand_name, product_category, product_type);

Relationships:

  • Self-referencing: replacement_product (for discontinued products)
  • One-to-many: product_documents (links to SDSs, EPA labels, etc.)

Use Cases:

  • Unified Search: Single search across all products and identifiers
  • Barcode Scanning: Quick lookup by UPC/GTIN
  • Product Discovery: Find products by name, manufacturer, CAS, EPA reg #
  • Product Lifecycle: Track discontinued products and replacements
  • Multi-Company Contribution: Companies can add products to shared catalog

Table 2: chemiq_product_documents

Purpose: Polymorphic junction table linking products to various document types (SDSs, EPA labels, COAs, tech sheets).

Key Features:

  • Links one product to multiple documents
  • Supports multiple document types (polymorphic association)
  • Tracks document validity periods
  • Supports multiple languages and regions
  • CHECK constraints enforce document type integrity

Important Columns:

  • product_document_id (UUID, PK): Primary identifier
  • product_id (UUID, FK, indexed): Links to product catalog
  • document_type (String(20), required): sds, epa_label, coa, tech_sheet
  • sds_id (UUID, FK): Links to SDS (if document_type = 'sds')
  • label_id (UUID, FK): Links to EPA label (if document_type = 'epa_label')
  • is_primary (Boolean, default=true): Primary document flag
  • document_language (String(10), default='en'): Document language
  • document_region (String(10)): US, CA, EU, etc.
  • valid_from, valid_until (Date): Document validity period
  • is_current (Boolean, default=true): Current document flag
  • created_at (Timestamp)

CHECK Constraint:

(document_type = 'sds' AND sds_id IS NOT NULL AND label_id IS NULL) OR
(document_type = 'epa_label' AND label_id IS NOT NULL AND sds_id IS NULL) OR
(document_type IN ('coa', 'tech_sheet'))

Relationships:

  • Many-to-one: product (Product Catalog)
  • Many-to-one: sds_document (SDS Documents)
  • Many-to-one: epa_label (EPA Label Documents)

Use Cases:

  • Link products to multiple SDSs (different languages, regions)
  • Link products to EPA labels for pesticides
  • Track document versions and validity periods
  • Support multi-language document management

Table 3: chemiq_company_product_catalog (NEW)

Purpose: Company-specific product catalog that sits between the global product catalog and inventory. Eliminates product data duplication across sites/locations within a company.

Key Features:

  • Company-specific product entries (one record per product per company)
  • Optional link to global product catalog for enrichment
  • Stores company's own product naming and codes
  • SDS and EPA label attachments at product level (not inventory level)
  • Data quality tracking (source, confidence, verification status)
  • Full-text search via TSVECTOR
  • Unique constraint per company on (product_name, manufacturer)

Problem Solved: Previously, product data (name, manufacturer, SDS) was duplicated in chemiq_inventory for every site/location. Now:

  • Product info stored ONCE per company in chemiq_company_product_catalog
  • Inventory records only store site/location-specific data (quantity, container size)
  • SDS/EPA labels attached to product, not individual inventory items

Important Columns:

  • company_product_id (UUID, PK): Primary identifier
  • company_id (UUID, FK, indexed, required): Company ownership
  • global_product_id (UUID, FK): Optional link to global catalog for enrichment

Product Identification:

  • product_name (String(255), required): Product name (company's naming)
  • manufacturer (String(255), required): Manufacturer name
  • product_code (String(100)): Company's internal product code
  • cas_number (String(50)): Primary CAS number
  • barcode_upc (String(100), indexed): UPC/EAN barcode (optional)
  • barcode_source (String(20)): How barcode was obtained: scanned, label_extracted, manual, api_validated
  • internal_sku (String(100), indexed): Company's internal product code/SKU (separate from UPC barcode)

Product Classification:

  • product_category (String(100)): Chemical, Pesticide, Cleaning, etc.
  • product_type (String(100)): Insecticide, Herbicide, Solvent, etc.

Pesticide Identification (for EPA-regulated products):

  • is_pesticide (Boolean, default=false): Flag indicating product is EPA-regulated (insecticide, herbicide, fungicide, disinfectant, etc.)
  • epa_registration_number (String(50), indexed): EPA Registration Number (e.g., 1234-567 or 1234-567-8901)
  • signal_word (String(20)): Required warning level: DANGER, WARNING, or CAUTION

Document Links (company-specific SDS/Label selection):

  • current_sds_id (UUID, FK): Links to chemiq_sds_documents
  • sds_attached_at (Timestamp): When SDS was attached
  • sds_attached_by_user_id (UUID, FK): User who attached SDS
  • sds_missing (Boolean, default=true): SDS missing flag
  • current_epa_label_id (UUID, FK): Links to chemiq_epa_label_documents
  • epa_label_attached_at (Timestamp): When EPA label was attached
  • epa_label_attached_by_user_id (UUID, FK): User who attached EPA label

Data Quality Tracking:

  • data_source (String(50), default='user_contributed'): api_validated, user_contributed, label_extracted, admin_curated
  • confidence_score (Numeric(3,2), default=0.5): 0.0-1.0 confidence
  • verification_status (String(20), default='unverified'): unverified, verified, needs_review, flagged
  • last_verified_at (Timestamp): Last verification timestamp
  • source_image_id (UUID): Link to label image if extracted via Vision LLM

Classification & Tagging:

  • hazard_categories (JSONB, default='[]', required): Denormalized array of GHS hazard categories derived from SDS (e.g., ["Flammable", "Acute Toxicity", "Corrosive"]). GIN-indexed for fast filtering.

Audit Fields:

  • is_active (Boolean, default=true): Active flag
  • created_at, updated_at (Timestamp)
  • created_by_user_id, updated_by_user_id (UUID, FK)

Full-Text Search:

  • search_vector (TSVECTOR): Auto-updated search index

Indexes:

  • GIN index on search_vector
  • GIN index on hazard_categories (for JSONB contains queries)
  • B-tree indexes on: company_id, global_product_id, product_name, manufacturer, barcode_upc, internal_sku, epa_registration_number
  • Composite index on (company_id, sds_missing) WHERE sds_missing = true
  • Partial index on is_pesticide WHERE is_pesticide = true (for efficient pesticide product queries)
  • Unique constraint on (company_id, product_name, manufacturer)
  • Unique constraint on (company_id, barcode_upc) WHERE barcode_upc IS NOT NULL - prevents duplicate barcodes per company

TSVECTOR Trigger:

CREATE TRIGGER tsvector_company_product_trigger BEFORE INSERT OR UPDATE
ON chemiq_company_product_catalog FOR EACH ROW EXECUTE FUNCTION
tsvector_update_trigger(search_vector, 'pg_catalog.english',
product_name, manufacturer, product_code, cas_number);

Relationships:

  • Many-to-one: company (Companies)
  • Many-to-one: global_product (Product Catalog) - optional enrichment link
  • Many-to-one: current_sds (SDS Documents)
  • Many-to-one: current_epa_label (EPA Label Documents)
  • One-to-many: inventory_items (Inventory)

Use Cases:

  • Store company-specific product catalog (no duplication across sites)
  • Link products to SDSs and EPA labels at company level
  • Track data quality and verification status
  • Support barcode scanning with company-specific products first
  • Enable fuzzy matching and duplicate detection within company
  • Identify pesticide products for EPA label search (via is_pesticide flag and epa_registration_number)
  • Track barcode source to distinguish trusted (scanned, API-validated) vs manual entries
  • Store company's internal SKU/reference codes separate from UPC barcodes
  • Future: AI agent promotion to global catalog when 30%+ companies have same product

Example Data Flow:

User adds "Clorox Wipes" at Site 1:
→ Check chemiq_company_product_catalog (not found)
→ Check chemiq_product_catalog (not found)
→ Query external APIs (found!)
→ Create record in chemiq_company_product_catalog (company_product_id = prod-1)
→ Create record in chemiq_inventory (links to prod-1, site-1)

User adds same product at Site 2:
→ Check chemiq_company_product_catalog (FOUND: prod-1)
→ Create record in chemiq_inventory (links to prod-1, site-2)
→ NO product duplication!

Table 4: chemiq_product_images

Purpose: Stores product label images uploaded during inventory entry. Supports multiple images per product for curved/multi-sided labels where product information is spread across different angles.

Key Features:

  • Polymorphic association - can link to either company product catalog OR global product catalog
  • Multi-image support (1-5 images per product)
  • S3 storage with presigned URL generation
  • SHA-256 checksums for deduplication
  • Tracks images used for AI vision extraction
  • Image metadata (dimensions, MIME type, original filename)

Important Columns:

  • image_id (UUID, PK): Primary identifier
  • company_product_id (UUID, FK, indexed): Links to company product catalog (polymorphic)
  • global_product_id (UUID, FK, indexed): Links to global product catalog (polymorphic)

Image Order:

  • image_order (Integer, default=1): Order for multi-image displays

S3 Storage:

  • s3_bucket (String(100), required): AWS S3 bucket
  • s3_key (Text, required): S3 object key
  • file_checksum (String(64), indexed, required): SHA-256 hash for deduplication
  • file_size (BigInteger): File size in bytes
  • mime_type (String(50), default='image/jpeg'): MIME type (image/jpeg, image/png)
  • original_filename (String(255)): Original uploaded filename

Image Metadata:

  • image_width (Integer): Image width in pixels
  • image_height (Integer): Image height in pixels
  • image_type (String(50), default='label'): Image type (label, product, container, etc.)

Extraction Metadata (for Vision LLM):

  • used_for_extraction (Boolean, default=false): Flag if image was used for AI label extraction
  • extraction_confidence (Numeric(3,2)): Extraction confidence score (0.00-1.00)

Audit Fields:

  • uploaded_by_company_id (UUID, FK): Company that uploaded the image
  • uploaded_by_user_id (UUID, FK): User who uploaded the image
  • created_at (Timestamp): Upload timestamp

CHECK Constraint (Polymorphic Enforcement):

(company_product_id IS NOT NULL AND global_product_id IS NULL) OR
(company_product_id IS NULL AND global_product_id IS NOT NULL)

This constraint ensures each image belongs to exactly one product type (company or global), never both or neither.

Indexes:

  • B-tree indexes on: company_product_id, global_product_id, file_checksum

Relationships:

  • Many-to-one: company_product (Company Product Catalog) - polymorphic
  • Many-to-one: global_product (Product Catalog) - polymorphic
  • Many-to-one: uploaded_by_company (Companies)
  • Many-to-one: uploaded_by_user (Users)

Use Cases:

  • Store product label images during inventory entry
  • Support multi-image uploads for curved/multi-sided labels
  • Track which images were used for AI vision extraction
  • Enable image display when viewing products
  • Deduplicate images across products using checksums
  • Generate presigned S3 URLs for secure frontend display

Example Data Flow:

User uploads 3 label images during product entry:
→ LabelExtractionService extracts product info via Vision LLM
→ ProductImageService uploads images to S3
→ 3 records created in chemiq_product_images (linked to company_product_id)
→ used_for_extraction = true, extraction_confidence = 0.85

User opens product later:
→ API returns product with images
→ Presigned S3 URLs generated (1 hour expiration)
→ Frontend displays all 3 label images

Part 2: EPA Label Repository

Table 5: chemiq_epa_label_documents

Purpose: Global repository of EPA pesticide/fungicide labels with comprehensive label data.

Key Features:

  • Global EPA label storage (shared across companies)
  • Complete EPA label metadata (active ingredients, target pests, application instructions)
  • S3 storage for label PDFs
  • Version tracking with linked-list pattern
  • Full-text search on label content
  • EPA PPLS API integration support

Important Columns:

  • label_id (UUID, PK): Primary identifier
  • epa_reg_number (String(50), indexed, required): EPA registration number
  • epa_est_number (String(50)): EPA establishment number
  • product_name (String(255), indexed, required): Product name
  • manufacturer (String(255), indexed, required): Manufacturer name
  • manufacturer_address (Text): Full address
  • emergency_phone (String(50)): Emergency contact number

Active Ingredients:

  • active_ingredients (JSONB): Structured array:
    [{"name": "Glyphosate", "cas": "1071-83-6", "percentage": 41.0}]
  • inert_ingredients_percentage (Numeric(5,2)): Percentage of inert ingredients

Label Classification:

  • signal_word (String(20), indexed): DANGER, WARNING, CAUTION
  • epa_product_type (String(100)): Insecticide, Herbicide, Fungicide, etc.
  • formulation_type (String(50)): Liquid, Granular, Wettable Powder, etc.

Hazard Information:

  • hazard_statements (Text[]): Array of hazard statements
  • precautionary_statements (Text[]): Array of precautionary statements
  • first_aid_measures (JSONB): Structured first aid instructions

Use Instructions:

  • target_pests (JSONB): Structured pest/application data:
    [{"pest": "ants", "application_rate": "1-2 oz per gallon"}]
  • application_sites (Text[]): [residential, commercial, agricultural]
  • application_methods (Text[]): [spray, bait, dust]
  • restricted_entry_interval (String(100)): REI in hours
  • preharvest_interval (String(100)): PHI in days

File Storage (S3):

  • s3_bucket (String(100), required): AWS S3 bucket
  • s3_key (Text, required): S3 object key
  • file_checksum (String(64), indexed, required): SHA-256 hash
  • file_size (BigInteger): File size in bytes
  • mime_type (String(50), default='application/pdf')

Parsing Metadata:

  • label_parsed (Boolean, default=false): Parsing complete flag
  • parse_confidence (Numeric(5,2)): Parse confidence score (0-100)
  • parse_errors (Text): Parse error messages

Version Tracking:

  • label_version (String(50)): Label version number
  • revision_date (Date, indexed, required): Label revision date
  • expiration_date (Date): Label expiration date
  • previous_version_id (UUID, FK): Links to older version
  • superseded_by_id (UUID, FK): Links to newer version
  • is_current (Boolean, indexed, default=true): Current version flag

Source Tracking:

  • source_type (String(20)): ppls_api, epa_website, manual_upload
  • source_url (Text): Original source URL
  • ppls_last_synced (Timestamp): Last EPA PPLS sync time
  • contributed_by_company_id (UUID, FK): Contributing company

Full-Text Search:

  • search_vector (TSVECTOR): Auto-updated search index

Timestamps:

  • created_at, updated_at (Timestamp)

Indexes:

  • GIN index on search_vector
  • GIN index on active_ingredients (JSONB)
  • GIN index on target_pests (JSONB)
  • Unique constraint on (epa_reg_number, revision_date)
  • B-tree indexes on: epa_reg_number, product_name, manufacturer, signal_word, revision_date, is_current

TSVECTOR Trigger:

CREATE TRIGGER tsvector_update_trigger BEFORE INSERT OR UPDATE
ON chemiq_epa_label_documents FOR EACH ROW EXECUTE FUNCTION
tsvector_update_trigger(search_vector, 'pg_catalog.english',
product_name, manufacturer, epa_reg_number, epa_product_type);

Relationships:

  • Self-referencing: previous_version, superseded_by (version chain)
  • One-to-many: company_mappings, product_links, parse_jobs, change_log

Use Cases:

  • Store EPA pesticide labels for lawn care, pest control companies
  • Track active ingredients and target pests
  • Search by EPA registration number
  • Version tracking for label updates
  • EPA PPLS API integration for automatic updates
  • Company-specific label library management

Table 6: chemiq_company_label_mappings

Purpose: Many-to-many relationship between companies and EPA labels, enabling the global repository model.

Key Features:

  • Links companies to EPA labels they use
  • Company-specific review status and notes
  • Compliance tracking (certifications, training)
  • Usage statistics
  • Alert management

Important Columns:

  • mapping_id (UUID, PK): Primary identifier
  • company_id (UUID, FK, indexed): Company identifier
  • label_id (UUID, FK, indexed): EPA label identifier

Usage Tracking:

  • mapped_to_inventory_count (Integer, default=0): Number of inventory items using this label

Company-Specific Metadata:

  • internal_notes (Text): Company-specific notes
  • review_status (String(20), indexed, default='pending'): pending, reviewed, approved, flagged
  • reviewed_by_user_id (UUID, FK): User who reviewed
  • reviewed_at (Timestamp): Review timestamp

Compliance Tracking:

  • applicator_certification_required (Boolean, default=false): Certification requirement flag
  • restricted_use (Boolean, default=false): Restricted use flag
  • company_training_completed (Boolean, default=false): Training completion flag

Alerts:

  • expiry_alert_sent (Boolean, default=false): Alert sent flag
  • last_alert_sent_at (Timestamp): Last alert timestamp

Timestamps:

  • first_mapped_at (Timestamp): When company first linked this label
  • last_accessed_at (Timestamp): Last access time

Indexes:

  • Unique constraint on (company_id, label_id)
  • B-tree index on review_status

Relationships:

  • Many-to-one: label_document (EPA Labels)
  • Many-to-one: reviewed_by_user (Users)

Use Cases:

  • Company-specific label libraries
  • Track which companies use which labels
  • Company review workflows
  • Compliance tracking for restricted use products
  • Usage analytics
  • Expiry alert management

Table 7: chemiq_label_parse_queue

Purpose: Job queue for asynchronous EPA label parsing operations.

Key Features:

  • Priority-based queue (1-10 scale)
  • Retry mechanism
  • Section-specific parsing
  • Error tracking

Important Columns:

  • job_id (UUID, PK): Job identifier
  • label_id (UUID, FK, indexed): Label to parse
  • job_status (String(20), indexed, default='pending'): pending, processing, completed, failed
  • priority (Integer, default=5): 1-10, higher = more urgent
  • parse_sections (Text[]): Sections to parse (active_ingredients, target_pests, etc.)
  • started_at, completed_at (Timestamp): Job timing
  • error_message (Text): Failure details
  • retry_count (Integer, default=0): Number of retry attempts
  • created_at (Timestamp)

Indexes:

  • Composite index on (job_status, priority DESC)

Relationships:

  • Many-to-one: label_document (EPA Labels)

Use Cases:

  • Background processing of uploaded labels
  • Retry failed parsing jobs
  • Prioritize parsing for high-priority products
  • Monitor parsing performance

Table 8: chemiq_label_change_log

Purpose: Audit trail for all EPA label document changes.

Key Features:

  • Complete audit history
  • Tracks user and company making changes
  • Stores diffs of changes
  • Change type categorization

Important Columns:

  • log_id (UUID, PK): Log entry identifier
  • label_id (UUID, FK, indexed): Label document
  • change_type (String(50)): created, updated, superseded, expired
  • changed_by_user_id (UUID, FK): User who made change
  • changed_by_company_id (UUID, FK): Company context
  • changes (JSONB): Diff of changes made
  • previous_values (JSONB): Previous state
  • created_at (Timestamp, indexed): When change occurred

Relationships:

  • Many-to-one: label_document (EPA Labels)
  • Many-to-one: changed_by_user (Users)

Use Cases:

  • Compliance audit trails
  • Track label updates over time
  • Investigate data quality issues
  • Generate change reports for regulators

Part 3: SDS Repository System

Table 9: chemiq_sds_documents

Purpose: Global repository of SDS documents with core metadata and S3 storage references.

Key Features:

  • Stores product identification (Section 1)
  • S3 bucket and key for PDF storage
  • SHA-256 checksum for deduplication
  • Full-text search via search_vector (TSVECTOR)
  • Version tracking with previous_version_id and superseded_by_id
  • Unique constraint on (manufacturer, product_name, revision_date)

Important Columns:

  • sds_id (UUID, PK): Primary identifier
  • product_name (String(255), indexed): Chemical product name
  • manufacturer (String(255), indexed): Manufacturer name
  • product_code (String(100)): Manufacturer's product code
  • manufacturer_address (Text): Full address
  • emergency_phone (String(50)): Emergency contact number

Document Metadata:

  • revision_date (Date, indexed, required): SDS revision date
  • revision_number (String(50)): Revision version number
  • document_language (String(10), default='en'): Document language

File Storage (S3):

  • s3_bucket (String(100), required): AWS S3 bucket
  • s3_key (Text, required): S3 object key
  • file_checksum (String(64), indexed, required): SHA-256 hash
  • file_size (BigInteger): File size in bytes
  • mime_type (String(50), default='application/pdf')

Parsing Metadata:

  • sds_parsed (Boolean, default=false): Parsing complete flag
  • parse_confidence (Numeric(5,2)): Parse confidence score
  • parse_errors (Text): Parse error messages (JSON array)
  • parsed_json (JSONB): Complete LLM-parsed SDS response for reference and debugging

Version Tracking:

  • previous_version_id (UUID, FK): Links to older version
  • superseded_by_id (UUID, FK): Links to newer version
  • is_current (Boolean, indexed, default=true): Current version flag

Source Tracking:

  • source_type (String(20)): web_search, manual_upload, manufacturer_api, ppls
  • source_url (Text): Original source URL
  • contributed_by_company_id (UUID, FK): Contributing company

Full-Text Search:

  • search_vector (TSVECTOR): Auto-updated search index

Timestamps:

  • created_at, updated_at (Timestamp)

Indexes:

  • GIN index on search_vector
  • Unique constraint on (manufacturer, product_name, revision_date)
  • B-tree indexes on: product_name, manufacturer, revision_date, file_checksum, is_current

TSVECTOR Trigger:

CREATE TRIGGER tsvector_update_trigger BEFORE INSERT OR UPDATE
ON chemiq_sds_documents FOR EACH ROW EXECUTE FUNCTION
tsvector_update_trigger(search_vector, 'pg_catalog.english',
product_name, manufacturer, product_code);

Relationships:

  • Self-referencing: previous_version, superseded_by (version chain)
  • One-to-one: hazard_info
  • One-to-many: composition, sections, company_mappings, parse_jobs, change_log

Use Cases:

  • Global SDS storage shared across companies
  • Version tracking for SDS updates
  • Deduplication via checksums
  • Full-text search across all SDSs
  • S3 integration for PDF storage

Table 10: chemiq_sds_hazard_info

Purpose: Stores GHS (Global Harmonized System) hazard classification data from SDS Section 2.

Key Features:

  • GHS pictograms and signal words
  • H-codes (Hazard statements) as array
  • P-codes (Precautionary statements) as array
  • JSONB for structured hazard classes
  • GIN indexes for fast array and JSONB searches

Important Columns:

  • hazard_id (UUID, PK): Primary identifier
  • sds_id (UUID, FK, indexed): Links to SDS document
  • signal_word (String(20), indexed): "Danger" or "Warning"
  • pictograms (JSONB): Array of GHS pictogram codes (e.g., ["GHS02", "GHS07"])
  • hazard_classes (JSONB): Structured hazard data:
    [{"class": "Flammable liquids", "category": "2", "code": "H225"}]
  • hazard_statements (Text[]): Array of H-codes (e.g., ["H225", "H319"])
  • precautionary_statements (Text[]): Array of P-codes (e.g., ["P210", "P233"])
  • supplemental_hazards (Text): Additional hazard information
  • hazard_description_vector (TSVECTOR): Full-text search on hazard descriptions
  • created_at (Timestamp)

Indexes:

  • GIN indexes on pictograms, hazard_classes, hazard_statements
  • B-tree index on signal_word

Relationships:

  • One-to-one: sds_document (SDS Documents)

Use Cases:

  • Search for chemicals by hazard class (flammable, corrosive, toxic)
  • Find all SDSs with specific pictograms (e.g., flame, skull)
  • Query by H-codes or P-codes
  • Generate hazard summaries for locations

Table 11: chemiq_sds_composition

Purpose: Stores chemical composition and ingredients from SDS Section 3.

Key Features:

  • Individual ingredient records with CAS numbers
  • Concentration ranges and exact values
  • Hazardous ingredient flagging
  • Support for trade secrets

Important Columns:

  • composition_id (UUID, PK): Primary identifier
  • sds_id (UUID, FK, indexed): Links to SDS document
  • chemical_name (String(255), required): Ingredient name
  • cas_number (String(50), indexed): Chemical Abstracts Service number
  • ec_number, reach_number (String(50)): European chemical identifiers
  • concentration_min, concentration_max (Numeric(5,2)): Concentration range
  • concentration_exact (Numeric(5,2)): Exact concentration
  • concentration_text (String(100)): Text representation (e.g., "< 1%")
  • is_trade_secret (Boolean, default=false): Protected ingredient flag
  • is_hazardous (Boolean, indexed, default=false): Hazardous ingredient flag
  • ingredient_hazards (Text[]): H-codes specific to this ingredient
  • created_at (Timestamp)

Indexes:

  • B-tree index on cas_number
  • B-tree index on is_hazardous

Relationships:

  • Many-to-one: sds_document (SDS Documents)

Use Cases:

  • Search for products containing specific chemicals (by CAS)
  • Find all hazardous ingredients across inventory
  • Calculate total hazardous chemical quantities
  • Generate ingredient disclosure reports

Table 12: chemiq_sds_sections

Purpose: Stores SDS sections 4-16 with both structured (JSONB) and raw text content.

Key Features:

  • Sections 4-16: First Aid, Fire-Fighting, Accidental Release, Handling & Storage, Exposure Controls, Physical/Chemical Properties, Stability, Toxicological Info, Ecological Info, Disposal, Transport, Regulatory, Other
  • JSONB for structured data extraction
  • Full-text search on section content
  • Check constraint: section_number between 1 and 16

Important Columns:

  • section_id (UUID, PK): Primary identifier
  • sds_id (UUID, FK, indexed): Links to SDS document
  • section_number (Integer, indexed, required): Section number (1-16)
  • section_title (String(255), required): Section title
  • section_data (JSONB): Structured parsed data
  • section_text (Text): Raw text content
  • section_text_vector (TSVECTOR): Full-text search index
  • parsed_at (Timestamp): When section was parsed

CHECK Constraint:

section_number >= 1 AND section_number <= 16

Indexes:

  • GIN indexes on section_data and section_text_vector
  • B-tree index on section_number
  • Unique constraint on (sds_id, section_number)

Relationships:

  • Many-to-one: sds_document (SDS Documents)

Use Cases:

  • Search for first aid measures across all SDSs
  • Query storage requirements by keyword
  • Extract disposal information for waste management
  • Generate emergency response guides

Table 13: chemiq_company_sds_mappings

Purpose: Many-to-many relationship between companies and SDSs, enabling the global repository model.

Key Features:

  • Tracks which companies use which SDSs
  • Company-specific review status
  • Usage statistics
  • Internal notes per company

Important Columns:

  • mapping_id (UUID, PK): Primary identifier
  • company_id (UUID, FK, indexed): Company identifier
  • sds_id (UUID, FK, indexed): SDS document identifier
  • mapped_to_inventory_count (Integer, default=0): Number of inventory items using this SDS
  • internal_notes (Text): Company-specific notes
  • review_status (String(20), indexed, default='pending'): pending, reviewed, approved, flagged
  • reviewed_by_user_id (UUID, FK): User who reviewed
  • reviewed_at (Timestamp): Review timestamp
  • expiry_alert_sent (Boolean, default=false): Alert tracking
  • last_alert_sent_at (Timestamp): Last alert timestamp
  • first_mapped_at (Timestamp): When company first linked this SDS
  • last_accessed_at (Timestamp): Last access time

Indexes:

  • Unique constraint on (company_id, sds_id)
  • B-tree index on review_status

Relationships:

  • Many-to-one: sds_document (SDS Documents)
  • Many-to-one: reviewed_by_user (Users)

Use Cases:

  • Isolate company data while sharing SDS repository
  • Track company-specific review workflows
  • Monitor SDS usage across companies
  • Send expiry alerts per company

Table 14: chemiq_sds_parse_queue

Purpose: Job queue for asynchronous SDS parsing operations.

Key Features:

  • Priority-based queue (1-10 scale)
  • Retry mechanism
  • Section-specific parsing
  • Error tracking

Important Columns:

  • job_id (UUID, PK): Job identifier
  • sds_id (UUID, FK, indexed): SDS to parse
  • job_status (String(20), indexed, default='pending'): pending, processing, completed, failed
  • priority (Integer, default=5): 1-10, higher = more urgent
  • parse_sections (Integer[], default=[1-16]): Which sections to parse
  • started_at, completed_at (Timestamp): Job timing
  • error_message (Text): Failure details
  • retry_count (Integer, default=0): Number of retry attempts
  • created_at (Timestamp)

Indexes:

  • Composite index on (job_status, priority DESC)

Relationships:

  • Many-to-one: sds_document (SDS Documents)

Use Cases:

  • Background processing of uploaded SDSs
  • Retry failed parsing jobs
  • Prioritize parsing for high-priority chemicals
  • Monitor parsing performance

Table 15: chemiq_sds_change_log

Purpose: Audit trail for all SDS document changes.

Key Features:

  • Complete audit history
  • Tracks user and company making changes
  • Stores diffs of changes
  • Change type categorization

Important Columns:

  • log_id (UUID, PK): Log entry identifier
  • sds_id (UUID, FK, indexed): SDS document
  • change_type (String(50)): created, updated, superseded, deprecated
  • changed_by_user_id (UUID, FK): User who made change
  • changed_by_company_id (UUID, FK): Company context
  • changes (JSONB): Diff of changes made
  • previous_values (JSONB): Previous state
  • created_at (Timestamp, indexed): When change occurred

Relationships:

  • Many-to-one: sds_document (SDS Documents)
  • Many-to-one: changed_by_user (Users)

Use Cases:

  • Compliance audit trails
  • Track SDS updates over time
  • Investigate data quality issues
  • Generate change reports for regulators

Table 16: chemiq_sds_web_search_cache

Purpose: Caches Perplexity API web search results for SDS documents to reduce API costs and improve response times.

Key Features:

  • Company-scoped cache entries
  • 7-day TTL for cache entries
  • Normalized search criteria for matching
  • Tracks cache hits and access patterns
  • Links to imported SDS when search results are used

Important Columns:

  • cache_id (UUID, PK): Primary identifier
  • company_id (UUID, FK, indexed, required): Company that performed the search

Search Criteria (normalized for matching):

  • product_name_normalized (String(255), required): Normalized product name (lowercase, common suffixes removed)
  • manufacturer_normalized (String(255), required): Normalized manufacturer name
  • barcode_upc (String(100)): Optional barcode for more specific matching
  • search_type (String(20), default='sds', required): Type of search: sds, epa_label

Search Metadata:

  • search_query_used (Text, required): The actual query sent to Perplexity API
  • searched_at (Timestamp with TZ, required): When the search was performed

Cached Results:

  • results (JSONB, required): Array of WebSDSResult objects:
    [
    {
    "title": "Product SDS - Manufacturer",
    "url": "https://example.com/sds.pdf",
    "snippet": "Safety Data Sheet for...",
    "source_domain": "example.com",
    "date": "2024-01-15",
    "is_pdf": true,
    "confidence_score": 0.85
    }
    ]
  • result_count (Integer, default=0): Number of results cached

Tracking:

  • created_by_user_id (UUID, FK): User who initiated the search
  • times_accessed (Integer, default=1): Number of times cache was accessed
  • last_accessed_at (Timestamp with TZ, required): Last access time

Cleanup Tracking:

  • sds_imported (Boolean, default=false): Whether an SDS was imported from these results
  • imported_sds_id (UUID, FK): Links to the imported SDS document

Indexes:

  • B-tree index on company_id
  • B-tree index on searched_at
  • B-tree index on sds_imported
  • Unique constraint on (company_id, product_name_normalized, manufacturer_normalized)

Relationships:

  • Many-to-one: company (Companies)
  • Many-to-one: created_by_user (Users)
  • Many-to-one: imported_sds (SDS Documents)

Use Cases:

  • Cache Perplexity API search results to avoid repeated API calls
  • Share cache within company (same product searches return cached results)
  • Track which searches led to SDS imports
  • Enable cache cleanup after successful imports
  • Monitor API usage patterns

Cache Strategy:

  1. User searches for SDS on web
  2. Check cache for matching (company_id, product_name_normalized, manufacturer_normalized)
  3. If cache hit and not expired (< 7 days) and not imported, return cached results
  4. If cache miss, call Perplexity API and cache results
  5. When user imports SDS from results, mark cache as imported

Table 17: chemiq_sds_bulk_upload_jobs

Purpose: Tracks bulk upload jobs for ZIP files containing multiple SDS PDFs, enabling background processing of large-scale SDS imports.

Key Features:

  • Asynchronous processing for large uploads (500+ files)
  • Job status tracking (pending, extracting, processing, completed, failed)
  • Progress metrics (total files, processed, successful, failed, duplicates)
  • Detailed per-file results stored as JSONB
  • S3 storage for uploaded ZIP files
  • Company and user association for audit

Important Columns:

  • job_id (UUID, PK): Primary identifier
  • company_id (UUID, FK, indexed, required): Company that initiated the upload
  • uploaded_by_user_id (UUID, FK): User who uploaded the ZIP

File Information:

  • s3_bucket (String(100), required): S3 bucket for uploaded ZIP
  • s3_key (Text, required): S3 object key
  • file_name (String(255), required): Original ZIP filename
  • file_size (BigInteger): File size in bytes

Job Status:

  • job_status (String(20), indexed, default='pending'): pending, extracting, processing, completed, failed

Progress Tracking:

  • total_files (Integer, default=0): Total PDFs found in ZIP
  • processed_files (Integer, default=0): Number processed so far
  • successful_uploads (Integer, default=0): Successful uploads
  • failed_uploads (Integer, default=0): Failed uploads
  • duplicates_found (Integer, default=0): Duplicate files skipped

Results:

  • results (JSONB): Detailed per-file results array:
    [
    {"filename": "product_a.pdf", "success": true, "sds_id": "uuid", "is_duplicate": false},
    {"filename": "product_b.pdf", "success": false, "error_message": "Invalid PDF", "is_duplicate": false}
    ]
  • error_message (Text): Job-level error message if failed

Timestamps:

  • created_at (Timestamp, required): When job was created
  • started_at (Timestamp): When processing started
  • completed_at (Timestamp): When processing completed

Indexes:

  • B-tree index on company_id
  • B-tree index on job_status

Relationships:

  • Many-to-one: company (Companies)
  • Many-to-one: uploaded_by_user (Users)

Use Cases:

  • Large-scale SDS library migrations (500+ documents)
  • Background processing that survives browser closure
  • Progress tracking and status polling from frontend
  • Detailed audit trail of bulk import operations
  • Error tracking for failed file processing

Processing Flow:

  1. User uploads ZIP file via /bulk-upload-zip endpoint
  2. ZIP is uploaded to S3, job record created with status='pending'
  3. Background worker picks up pending jobs
  4. Worker extracts ZIP, processes each PDF
  5. Progress updated in real-time (processed_files, successful_uploads, etc.)
  6. Frontend polls /bulk-upload-status/{job_id} every 5 seconds
  7. Job marked completed or failed when done

Example Job Status Response:

{
"job_id": "123e4567-e89b-12d3-a456-426614174000",
"job_status": "processing",
"file_name": "sds_library_2024.zip",
"total_files": 500,
"processed_files": 250,
"successful_uploads": 240,
"failed_uploads": 5,
"duplicates_found": 5
}

Part 4: Inventory Integration

Table 18: chemiq_inventory (Simplified)

Purpose: Site/location-specific chemical inventory tracking. Now simplified to only store inventory-specific data, with product information moved to chemiq_company_product_catalog.

Schema Change Summary: After the company product catalog decomposition, this table is significantly simplified:

  • REMOVED: product_name, manufacturer, product_code, cas_number, barcode_upc (moved to company product catalog)
  • REMOVED: current_sds_id, sds_missing, sds_attached_at, sds_attached_by_user_id (moved to company product catalog)
  • REMOVED: current_epa_label_id, epa_label_attached_at, epa_label_attached_by_user_id (moved to company product catalog)
  • REMOVED: product_catalog_id (replaced by company_product_id)
  • ADDED: company_product_id (links to chemiq_company_product_catalog)

Key Columns (Simplified Structure):

  • chemical_id (UUID, PK): Primary identifier
  • company_product_id (UUID, FK, indexed, required): NEW - Links to chemiq_company_product_catalog
  • company_id (UUID, FK, indexed): Company identifier
  • site_id (UUID, FK, indexed): Site identifier
  • location_id (UUID, FK): Location identifier (optional)

Inventory-Specific Data (retained):

  • container_size (Numeric): Container size value
  • size_unit (String(20)): Unit of measure (gal, oz, L, kg, etc.)
  • quantity (Integer, default=1): Number of containers at this location

Note: batch_lot_number and expiration_date are stored in chemiq_inventory_attributes table, not directly in this table.

Audit Fields:

  • is_active (Boolean, default=true): Active flag
  • created_at, updated_at (Timestamp)
  • created_by_user_id, updated_by_user_id (UUID, FK)

Indexes:

  • B-tree indexes on: company_product_id, company_id, site_id, location_id
  • Composite index on (company_id, site_id, is_active)
  • Unique constraint on (company_product_id, site_id, COALESCE(location_id, NULL_UUID)) WHERE is_active = true - prevents duplicate products at same location

Relationships:

  • Many-to-one: company_product (Company Product Catalog) - PRIMARY LINK
  • Many-to-one: company, site, location
  • Many-to-one: created_by_user, updated_by_user
  • One-to-many: attributes, reviews_pending, label_images

Benefits of Simplified Structure:

Before (Denormalized)After (Normalized)
Product data duplicated per site/locationProduct data stored once per company
SDS attached per inventory itemSDS attached to product (shared across sites)
15+ columns per inventory record~8 columns per inventory record
Inconsistent naming across sitesSingle canonical name per product per company

Example:

Before (denormalized):
| chemical_id | site_id | product_name | manufacturer | current_sds_id | quantity |
|-------------|---------|----------------|--------------|----------------|----------|
| inv-1 | site-1 | Clorox Wipes | Clorox | sds-123 | 5 |
| inv-2 | site-2 | Clorox Wipes | Clorox | sds-123 | 10 | ← Duplicate!

After (normalized):
chemiq_company_product_catalog:
| company_product_id | product_name | manufacturer | current_sds_id |
|-------------------|---------------|--------------|----------------|
| prod-1 | Clorox Wipes | Clorox | sds-123 |

chemiq_inventory:
| chemical_id | company_product_id | site_id | quantity |
|-------------|-------------------|---------|----------|
| inv-1 | prod-1 | site-1 | 5 |
| inv-2 | prod-1 | site-2 | 10 | ← No duplication!

Use Cases:

  • Track inventory quantities per site/location
  • Link to company product catalog for product details
  • Support multi-site inventory management
  • Container and quantity tracking only
  • Simplified data model with no product duplication

Table 18b: chemiq_inventory_attributes

Purpose: Stores extended inventory item attributes such as batch/lot numbers, expiration dates, manufacturing dates, and storage conditions. Separated from the main inventory table to keep the core inventory record lightweight.

Key Features:

  • One-to-one relationship with inventory items
  • Batch/lot tracking for traceability
  • Expiration and manufacture date tracking
  • Storage condition notes
  • Cascading delete with parent inventory record

Important Columns:

ColumnTypeConstraintsDescription
attribute_idUUIDPKPrimary identifier
chemical_idUUIDFK, indexed, requiredLinks to chemiq_inventory
batch_lot_numberString(100)Batch or lot number
expiration_dateDateProduct expiration date
manufacture_dateDateProduct manufacture date
storage_conditionsTextStorage condition notes
notesTextGeneral notes
created_atTimestamprequired, default=nowCreation time

Indexes:

  • B-tree index on chemical_id

Relationships:

  • Many-to-one: chemical → ChemIQInventory (CASCADE delete)

Use Cases:

  • Track batch/lot numbers for product recalls
  • Monitor expiration dates for shelf life management
  • Record storage conditions for compliance
  • Support traceability requirements

Table 18c: chemiq_sds_reviews_pending

Purpose: Tracks inventory items that need SDS review — typically created when chemicals are added without an SDS attached, or when bulk imports identify gaps.

Key Features:

  • Tracks SDS review status per inventory item
  • Company and site scoped
  • Assignable to specific users for resolution
  • Status workflow: pending → resolved
  • Cascading delete with parent inventory, company, and site records

Important Columns:

ColumnTypeConstraintsDescription
review_idUUIDPKPrimary identifier
chemical_idUUIDFK, requiredLinks to chemiq_inventory
company_idUUIDFK, requiredCompany scope
site_idUUIDFK, indexed, requiredSite scope
review_reasonString(50)requiredReason for review (e.g., missing_sds, expired_sds)
statusString(20)required, default='pending'pending, resolved
assigned_to_user_idUUIDFKUser assigned to resolve
created_atTimestamprequired, default=nowCreation time
resolved_atTimestampWhen review was resolved

Indexes:

  • B-tree index on site_id
  • B-tree index on status

Relationships:

  • Many-to-one: chemical → ChemIQInventory (CASCADE delete)
  • Many-to-one: company → Company (CASCADE delete)
  • Many-to-one: site → CompanySite (CASCADE delete)
  • Many-to-one: assigned_to_user → User (SET NULL on delete)

Use Cases:

  • Track missing SDS documents after bulk imports
  • Create review tasks for safety coordinators
  • Monitor SDS compliance gaps per site
  • Generate "Missing SDS" reports for Sentinel alerts
  • Support fix-and-retry workflow for bulk imports

Search and Performance Optimization

Full-Text Search Strategy

Primary Search Flow (Updated for Company Product Catalog):

  1. User searches for product by name, manufacturer, barcode, CAS, EPA reg #
  2. First: Query chemiq_company_product_catalog.search_vector (company's own products)
  3. If not found: Query chemiq_product_catalog.search_vector (global catalog)
  4. If not found: Query external APIs (Open Food Facts, UPC Database)
  5. Product Catalog returns products with links to all documents
  6. Optional: Drill down to SDS/Label search if needed

TSVECTOR Columns with GIN Indexes:

  1. chemiq_company_product_catalog.search_vector - COMPANY SEARCH (product name, manufacturer, product code, CAS)
  2. chemiq_product_catalog.search_vector - GLOBAL SEARCH (product name, manufacturer, brand, category)
  3. chemiq_epa_label_documents.search_vector - Label content
  4. chemiq_sds_documents.search_vector - SDS metadata
  5. chemiq_sds_hazard_info.hazard_description_vector - Hazard descriptions
  6. chemiq_sds_sections.section_text_vector - Section content

Auto-Update Triggers:

  • All search_vector columns have triggers for automatic updates on INSERT/UPDATE

GIN Indexes for Fast Queries:

  1. chemiq_product_catalog.upc_barcodes (Text[])
  2. chemiq_product_catalog.cas_numbers (Text[])
  3. chemiq_epa_label_documents.active_ingredients (JSONB)
  4. chemiq_epa_label_documents.target_pests (JSONB)
  5. chemiq_sds_hazard_info.pictograms (JSONB)
  6. chemiq_sds_hazard_info.hazard_classes (JSONB)
  7. chemiq_sds_hazard_info.hazard_statements (Text[])
  8. chemiq_sds_sections.section_data (JSONB)

Deduplication Strategy

Product Catalog:

  • No strict uniqueness (multiple sources can contribute same product)
  • Merge/deduplication handled at application level

SDS Documents:

  • Primary: SHA-256 checksum (file_checksum)
  • Secondary: Unique constraint on (manufacturer, product_name, revision_date)

EPA Label Documents:

  • Primary: SHA-256 checksum (file_checksum)
  • Secondary: Unique constraint on (epa_reg_number, revision_date)

Version Tracking

Linked-List Pattern

All document types (Product Catalog, SDS, EPA Labels) use the same version tracking pattern:

Fields:

  • previous_version_id: Links to older version
  • superseded_by_id: Links to newer version (or NULL if current)
  • is_current: Boolean flag for active version

Example Version Chain:

Doc v1.0 (2020) <- previous_version_id <- Doc v2.0 (2022) <- previous_version_id <- Doc v3.0 (2024)
| | |
superseded_by_id -> superseded_by_id -> is_current = true

Replacement Products (Product Catalog only):

  • replacement_product_id: Links to replacement product for discontinued items

Use Cases:

  • Track document revision history
  • Compare changes between versions
  • Alert companies when new versions available
  • Maintain compliance with version requirements
  • Handle product discontinuation and replacements

Unified Search Architecture

Example Search Flow

User searches: "Roundup Pro Max"

-- Step 1: Search Product Catalog (PRIMARY SEARCH)
SELECT
p.product_id,
p.product_name,
p.manufacturer,
p.epa_reg_number,
p.cas_numbers,
p.upc_barcodes
FROM chemiq_product_catalog p
WHERE p.search_vector @@ to_tsquery('english', 'roundup & pro & max')
AND p.is_active = true
ORDER BY ts_rank(p.search_vector, to_tsquery('english', 'roundup & pro & max')) DESC
LIMIT 10;

-- Step 2: Get all documents for selected product
SELECT
pd.document_type,
pd.is_primary,
pd.document_language,
pd.document_region,
s.product_name AS sds_product_name,
s.revision_date AS sds_revision_date,
l.product_name AS label_product_name,
l.epa_reg_number,
l.revision_date AS label_revision_date
FROM chemiq_product_documents pd
LEFT JOIN chemiq_sds_documents s ON pd.sds_id = s.sds_id
LEFT JOIN chemiq_epa_label_documents l ON pd.label_id = l.label_id
WHERE pd.product_id = :product_id
AND pd.is_current = true;

-- Step 3: Check if company already has access
SELECT * FROM chemiq_company_sds_mappings
WHERE company_id = :company_id AND sds_id = :sds_id;

SELECT * FROM chemiq_company_label_mappings
WHERE company_id = :company_id AND label_id = :label_id;

-- Step 4: Create mappings if needed (application logic)
-- Step 5: Link to inventory item

Database Migrations

Migration 1: SDS Global Schema

File: 2025_12_03_1826-16a8bffcb66e_redesign_sds_global_schema.py

Tables Created:

  1. chemiq_sds_documents
  2. chemiq_sds_hazard_info
  3. chemiq_sds_composition
  4. chemiq_sds_sections
  5. chemiq_company_sds_mappings
  6. chemiq_sds_parse_queue
  7. chemiq_sds_change_log

Updates: chemiq_inventory (added current_sds_id, sds_attached_at, sds_attached_by_user_id)

Migration 2: Product Catalog and EPA Labels

File: 2025_12_03_2346-e15d8abd9d92_add_product_catalog_and_epa_labels.py

Tables Created:

  1. chemiq_product_catalog
  2. chemiq_epa_label_documents
  3. chemiq_product_documents
  4. chemiq_company_label_mappings
  5. chemiq_label_parse_queue
  6. chemiq_label_change_log

Updates: chemiq_inventory (added product_catalog_id, current_epa_label_id, epa_label_attached_at, epa_label_attached_by_user_id)

Migration 3: Product Images

File: 2025_12_17_2333-9a5f77cdd3a6_add_chemiq_product_images_table.py

Tables Created:

  1. chemiq_product_images

Key Features:

  • Polymorphic association to company or global product catalog
  • S3 storage for product label images
  • Multi-image support for curved/multi-sided labels
  • Tracks images used for Vision LLM extraction

Migration 4: Unique Constraints for Barcode and Inventory Location

File: 2025_12_25_1545-b7aa5678347e_add_unique_constraints_for_barcode_and_.py

Constraints Added:

  1. uq_company_product_barcode on chemiq_company_product_catalog (company_id, barcode_upc) WHERE barcode_upc IS NOT NULL
  2. uq_inventory_product_location on chemiq_inventory (company_product_id, site_id, COALESCE(location_id, NULL_UUID)) WHERE is_active = true

Key Features:

  • Prevents duplicate barcodes per company in product catalog
  • Prevents same product appearing twice at same location in inventory
  • Includes data cleanup logic to merge existing duplicates before applying constraints

Migration 5: Add search_type to SDS Web Search Cache

File: 2026_01_31_2019-8d1fcec31661_add_search_type_to_sds_web_search_cache.py

Changes:

  • Added search_type column (String(20), default='sds') to chemiq_sds_web_search_cache

Migration 6: Classification & Tagging

File: 2026_01_31_2200-a3c7f1e2d456_add_classification_and_tagging.py

Tables Created:

  1. chemiq_company_tags - company-scoped custom tags
  2. chemiq_product_tags - product-to-tag junction
  3. chemiq_ai_hazard_summaries - AI hazard summaries

Column Added:

  • hazard_categories (JSONB, default='[]') on chemiq_company_product_catalog with GIN index

SQLAlchemy Models

SDS Models

Location: tellus-ehs-hazcom-service/app/db/models/chemiq_sds.py

Models:

  1. SDSDocument - Main SDS model (includes parsed_json JSONB column for LLM responses)
  2. SDSHazardInfo - Hazard information
  3. SDSComposition - Chemical ingredients
  4. SDSSection - SDS sections 4-16
  5. CompanySDSMapping - Company relationships
  6. SDSParseJob - Parse queue
  7. SDSChangeLog - Audit trail
  8. SDSWebSearchCache - Perplexity API search result cache
  9. SDSBulkUploadJob - ZIP bulk upload job tracking

Product Catalog & EPA Label Models

Location: tellus-ehs-hazcom-service/app/db/models/chemiq_product_catalog.py

Models:

  1. ProductCatalog - Master product repository
  2. CompanyProductCatalog - Company-specific product catalog
  3. ProductImage - Product label images (S3 storage, polymorphic to company/global products)
  4. EPALabelDocument - EPA label repository
  5. ProductDocument - Junction table (polymorphic)
  6. CompanyLabelMapping - Company-label relationships
  7. LabelParseJob - Parse queue
  8. LabelChangeLog - Audit trail

Key Relationships:

ProductCatalog
├── product_documents (one-to-many) → ProductDocument
│ ├── sds_document (many-to-one) → SDSDocument
│ └── epa_label (many-to-one) → EPALabelDocument
├── images (one-to-many) → ProductImage
├── replacement_product (self-referencing)

CompanyProductCatalog
├── current_sds (many-to-one) → SDSDocument
├── current_epa_label (many-to-one) → EPALabelDocument
├── images (one-to-many) → ProductImage
├── global_product (many-to-one) → ProductCatalog
├── inventory_items (one-to-many) → ChemIQInventory
├── product_tags (one-to-many) → ProductTag
├── ai_hazard_summary (one-to-one) → AIHazardSummary

ProductImage
├── company_product (many-to-one) → CompanyProductCatalog (polymorphic)
├── global_product (many-to-one) → ProductCatalog (polymorphic)
├── uploaded_by_company (many-to-one) → Company
├── uploaded_by_user (many-to-one) → User

EPALabelDocument
├── company_mappings (one-to-many) → CompanyLabelMapping
├── product_links (one-to-many) → ProductDocument
├── parse_jobs (one-to-many) → LabelParseJob
├── change_log (one-to-many) → LabelChangeLog
├── previous_version (self-referencing)
└── superseded_by (self-referencing)

SDSDocument
├── hazard_info (one-to-one) → SDSHazardInfo
├── composition (one-to-many) → SDSComposition
├── sections (one-to-many) → SDSSection
├── company_mappings (one-to-many) → CompanySDSMapping
├── parse_jobs (one-to-many) → SDSParseJob
├── change_log (one-to-many) → SDSChangeLog
├── previous_version (self-referencing)
└── superseded_by (self-referencing)

Example Queries

Search product by barcode

SELECT * FROM chemiq_product_catalog
WHERE '885126012345' = ANY(upc_barcodes);

Find product and all its documents

SELECT
p.product_name,
p.manufacturer,
pd.document_type,
COALESCE(s.product_name, l.product_name) AS doc_product_name,
COALESCE(s.revision_date, l.revision_date) AS doc_revision_date
FROM chemiq_product_catalog p
JOIN chemiq_product_documents pd ON p.product_id = pd.product_id
LEFT JOIN chemiq_sds_documents s ON pd.sds_id = s.sds_id
LEFT JOIN chemiq_epa_label_documents l ON pd.label_id = l.label_id
WHERE p.product_id = :product_id
AND pd.is_current = true;

Search EPA labels by target pest

SELECT * FROM chemiq_epa_label_documents
WHERE target_pests @> '[{"pest": "ants"}]'::jsonb
AND is_current = true;

Find all products containing a specific chemical (CAS)

SELECT
p.product_name,
p.manufacturer,
c.chemical_name,
c.cas_number,
c.concentration_max
FROM chemiq_product_catalog p
JOIN chemiq_product_documents pd ON p.product_id = pd.product_id
JOIN chemiq_sds_documents s ON pd.sds_id = s.sds_id
JOIN chemiq_sds_composition c ON s.sds_id = c.sds_id
WHERE c.cas_number = '67-64-1' -- Acetone
AND p.is_active = true
AND s.is_current = true;

Get company's label library with usage stats

SELECT
l.epa_reg_number,
l.product_name,
l.manufacturer,
l.signal_word,
m.mapped_to_inventory_count,
m.review_status,
m.restricted_use
FROM chemiq_epa_label_documents l
JOIN chemiq_company_label_mappings m ON l.label_id = m.label_id
WHERE m.company_id = :company_id
AND l.is_current = true
ORDER BY m.mapped_to_inventory_count DESC;

Compliance and Regulatory

OSHA Hazard Communication Standard (HazCom 2012)

Requirements Met:

  • Section 1: Product Identification ✓
  • Section 2: Hazard Classification (GHS) ✓
  • Section 3: Composition/Ingredients ✓
  • Sections 4-16: Required information ✓
  • Version tracking for updates ✓
  • Audit trail for changes ✓

EPA FIFRA (Pesticides)

Integration: chemiq_epa_label_documents table PPLS API: Automatic updates for pesticide labels via ppls_last_synced EPA Reg No: Tracked in both Product Catalog and EPA Labels Compliance Tracking: Company-specific certifications and training via chemiq_company_label_mappings

Data Retention

Active Documents: Kept indefinitely Superseded Documents: Retained for 30+ years (OSHA/EPA requirement) Change Logs: Retained for 7+ years (audit requirement) Discontinued Products: Flagged but never deleted (regulatory requirement)


Part 5: Bulk Import and Supporting Tables

Table 19: chemiq_inventory_uploads

Purpose: Tracks bulk upload operations for chemical inventory, providing import history, metrics, and audit trails.

Key Features:

  • Bulk import tracking for CSV and invoice PDF uploads
  • Import summary with detailed row-level metrics
  • Job status tracking (pending, processing, completed, failed)
  • Error tracking and reporting
  • SDS gap detection
  • Compliance audit trail

Important Columns:

  • upload_id (UUID, PK): Primary identifier
  • company_id (UUID, FK, indexed): Company that performed the upload
  • site_id (UUID, FK, indexed): Target site for the import
  • uploaded_by_user_id (UUID, FK): User who initiated the upload

File Information:

  • file_name (String(255), required): Original filename
  • file_type (String(20)): csv, pdf_invoice
  • s3_bucket (String(100)): S3 bucket for uploaded file
  • s3_key (Text): S3 object key
  • file_size (BigInteger): File size in bytes

Upload Status:

  • upload_status (String(20), indexed, default='pending'): pending, processing, completed, failed, partial_success

Import Metrics:

  • total_rows (Integer, default=0): Total rows in file
  • rows_created (Integer, default=0): Successfully created records
  • rows_updated (Integer, default=0): Updated existing records
  • rows_skipped (Integer, default=0): Skipped (duplicates or validation failures)
  • rows_errored (Integer, default=0): Failed to process

SDS Tracking:

  • unresolved_sds_count (Integer, default=0): Number of items without SDS attached
  • auto_matched_sds_count (Integer, default=0): Number of auto-matched SDSs

Processing Metadata:

  • processing_mode (String(20)): transactional, fix_and_retry
  • validation_errors (JSONB): Array of row-level validation errors:
    [{"row": 5, "field": "product_name", "error": "Product name is required"}]
  • error_summary (Text): Human-readable error summary

Timing:

  • started_at (Timestamp): When processing started
  • completed_at (Timestamp): When processing completed
  • created_at (Timestamp): When upload was initiated

Indexes:

  • B-tree indexes on: company_id, site_id, uploaded_by_user_id, upload_status
  • Composite index on (company_id, created_at DESC) for history queries

Relationships:

  • Many-to-one: company (Companies)
  • Many-to-one: site (Sites)
  • Many-to-one: uploaded_by_user (Users)

Use Cases:

  • Track all bulk import operations for audit purposes
  • Provide detailed feedback to users about import success/failures
  • Monitor SDS gaps after bulk imports
  • Generate import history reports
  • Trigger Sentinel alerts for missing SDSs (Section 4.6.4)
  • Support partial imports with fix-and-retry workflow

Example Import Summary:

{
"upload_id": "123e4567-e89b-12d3-a456-426614174000",
"file_name": "chemical_inventory_2024.csv",
"total_rows": 100,
"rows_created": 85,
"rows_updated": 10,
"rows_skipped": 3,
"rows_errored": 2,
"unresolved_sds_count": 15,
"upload_status": "completed"
}

Integration Points:

  • Section 4.6 (Bulk Upload) in Chemical Inventory requirements
  • Triggers Sentinel "Missing SDS Alert" when unresolved_sds_count > 0
  • Creates records in chemiq_sds_reviews_pending for unresolved items
  • Used by bulk import API endpoints for progress tracking

Table 22: background_job_logs

Purpose: Tracks execution of background jobs from tellus-ehs-background-service. Records job status, duration, processing counts, and any errors.

Key Features:

  • Job identification (name and type)
  • Status tracking with timestamps
  • Optional company/site scope
  • Processing statistics (records processed, success, failed)
  • Error tracking and detailed JSONB storage

Important Columns:

  • log_id (UUID, PK): Primary identifier
  • job_name (String(100), indexed, required): Name of the job (e.g., "sds_bulk_upload_processor")
  • job_type (String(50), required): Type of job (e.g., "scheduled", "triggered", "manual")
  • status (String(20), indexed, default='running'): running, completed, failed
  • started_at (Timestamp with TZ, indexed): When job started
  • completed_at (Timestamp with TZ): When job completed
  • duration_seconds (Numeric(10,3)): Job duration in seconds
  • company_id (UUID, FK, optional): For company-scoped jobs
  • site_id (UUID, FK, optional): For site-scoped jobs
  • records_processed (Integer, default=0): Total records processed
  • records_success (Integer, default=0): Successful records
  • records_failed (Integer, default=0): Failed records
  • error_message (Text): Error details if job failed
  • details (JSONB): Additional job-specific details

Indexes:

  • B-tree indexes on: job_name, status, started_at

Use Cases:

  • Track background job execution history
  • Monitor job performance and success rates
  • Debug failed jobs with error messages
  • Generate job execution reports
  • Audit trail for scheduled processing

Table 23: chemiq_label_images (Legacy)

Purpose: Stores OCR-extracted product label images uploaded during inventory entry. This is a legacy table that predates chemiq_product_images.

Key Features:

  • Links to inventory items
  • Stores image path
  • OCR extracted text and confidence score
  • Legacy support for older inventory entries

Important Columns:

ColumnTypeConstraintsDescription
image_idUUIDPKPrimary identifier
chemical_idUUIDFK, indexedLinks to inventory item
image_pathTextrequiredPath to stored image
ocr_extracted_dataTextOCR extracted text
ocr_confidenceNumericOCR confidence score
uploaded_atTimestampUpload timestamp

Note: New implementations should use chemiq_product_images instead, which supports the company product catalog architecture.

Relationships:

  • Many-to-one: chemical (ChemIQ Inventory)

Use Cases:

  • Legacy inventory label image storage
  • OCR text extraction for product identification
  • Backward compatibility with older inventory entries

Table 24: ref_chemical_registry

Purpose: Reference table containing a master registry of chemicals with CAS numbers and molecular information.

Key Features:

  • Master chemical reference data
  • CAS number registry
  • Molecular formulas and weights
  • UN hazard numbers
  • State of matter classification

Important Columns:

ColumnTypeConstraintsDescription
chemical_idUUIDPKPrimary identifier
cas_numberTextindexedCAS registry number
primary_nameTextPrimary chemical name
molecular_formulaTextChemical formula
molecular_weightNumericMolecular weight
state_of_matterTextsolid, liquid, gas
un_numberTextUN hazard classification number
descriptionTextChemical description
created_atTimestampCreation time
updated_atTimestampLast update time

Indexes:

  • B-tree index on cas_number

Relationships:

  • One-to-many: synonyms (Chemical Synonyms)

Use Cases:

  • Master chemical reference lookup
  • CAS number validation
  • Hazard classification reference
  • Chemical property database

Table 25: ref_chemical_synonyms

Purpose: Stores alternative names and synonyms for chemicals in the registry.

Key Features:

  • Multiple names per chemical
  • Language support
  • Preferred name flagging
  • Synonym type categorization

Important Columns:

ColumnTypeConstraintsDescription
synonym_idUUIDPKPrimary identifier
chemical_idUUIDFK, indexedLinks to chemical registry
synonymTextAlternative name
synonym_typeTextType: trade_name, common_name, iupac, etc.
language_codeTextISO language code
is_preferredBooleanPreferred name flag
created_atTimestampCreation time

Indexes:

  • B-tree index on chemical_id
  • B-tree index on synonym

Relationships:

  • Many-to-one: chemical (Chemical Registry)

Use Cases:

  • Chemical name lookup by synonym
  • Multi-language chemical names
  • Trade name to CAS number mapping
  • Search enhancement with synonyms

Part 6: Chemical Enrichment, Compliance & PPE Systems

This section documents the tables added for advanced chemical data enrichment, site quantity tracking, regulatory compliance checking, and PPE recommendation features.

Migration Files:

  • 2026_01_15_1200-a1b2c3d4e5f7_add_chemical_enrichment_and_compliance_tables.py - 20 tables
  • 2026_01_15_1300-b2c3d4e5f6g8_add_ghs_hazard_codes_reference.py - 2 tables

Table 26: chemiq_pubchem_cache

Purpose: Caches chemical property data fetched from PubChem and other external sources to reduce API calls and improve response times.

Key Features:

  • External chemical data cache (properties, exposure limits, GHS data)
  • Multi-source support (PubChem, NIOSH, manufacturer)
  • Priority-based refresh cycles (30/90/180 days)
  • Tracks fetch statistics and cache performance

Important Columns:

  • cache_id (UUID, PK): Primary identifier
  • cas_number (String(20), indexed, required): CAS registry number
  • pubchem_cid (Integer): PubChem Compound ID

Chemical Identification:

  • iupac_name (String(500)): IUPAC systematic name
  • common_name (String(255)): Common chemical name
  • molecular_formula (String(100)): Chemical formula
  • molecular_weight (Numeric): Molecular weight in g/mol
  • canonical_smiles (Text): Canonical SMILES notation for chemical structure
  • inchi (Text): IUPAC International Chemical Identifier (InChI)
  • inchi_key (String(27), indexed): InChIKey hash for fast structure lookups
  • chemical_family (String(100)): Chemical family classification
  • chemical_class (String(100)): Chemical class

Physical Properties:

  • vapor_pressure_mmhg (Numeric): Vapor pressure at 25°C
  • boiling_point_c (Numeric): Boiling point in Celsius
  • flash_point_c (Numeric): Flash point in Celsius
  • specific_gravity (Numeric): Specific gravity
  • water_solubility_mg_l (Numeric): Water solubility in mg/L

Exposure Limits:

  • osha_pel_ppm (Numeric): OSHA Permissible Exposure Limit (ppm)
  • osha_pel_mg_m3 (Numeric): OSHA PEL (mg/m³)
  • niosh_rel_ppm (Numeric): NIOSH Recommended Exposure Limit (ppm)
  • niosh_rel_mg_m3 (Numeric): NIOSH REL (mg/m³)
  • niosh_idlh_ppm (Numeric): NIOSH IDLH value
  • acgih_tlv_ppm (Numeric): ACGIH Threshold Limit Value

GHS Classification:

  • ghs_classification (JSONB): Structured GHS hazard data

Cache Metadata:

  • source (String(50)): Data source (pubchem, niosh, manufacturer)
  • source_url (String(500)): Source URL
  • fetched_at (Timestamp): When data was fetched
  • last_refreshed_at (Timestamp): Last refresh time
  • fetch_count (Integer): Number of times fetched
  • priority_level (Integer, 1-3): Cache priority (1=high, 3=low)
  • created_at, updated_at (Timestamp)

Indexes:

  • Unique constraint on cas_number
  • B-tree index on pubchem_cid
  • B-tree index on inchi_key (for structure-based lookups)

Use Cases:

  • Cache PubChem API responses to avoid rate limiting
  • Store NIOSH Pocket Guide data for exposure limits
  • Enable offline chemical property lookup
  • Pre-populate common industrial chemicals
  • Structure-based chemical matching using InChIKey

Table 27: chemiq_chemical_fetch_queue

Purpose: Background job queue for fetching chemical data from external APIs (PubChem, NIOSH).

Key Features:

  • Priority-based queue processing
  • Retry mechanism with max attempts
  • Tracks source context (SDS, inventory, manual)
  • Company-scoped requests

Important Columns:

  • queue_id (UUID, PK): Primary identifier
  • cas_number (String(20), indexed, required): CAS number to fetch
  • source_type (String(50), required): sds_parse, inventory_add, manual_request
  • source_id (UUID): Reference to source record
  • company_id (UUID, FK): Requesting company
  • priority (String(20)): high, normal, low
  • status (String(20), indexed): pending, processing, completed, failed
  • attempts (Integer): Current attempt count
  • max_attempts (Integer): Maximum retry attempts
  • last_attempt_at (Timestamp): Last attempt time
  • error_message (Text): Error details if failed
  • created_at (Timestamp)
  • processed_at (Timestamp): When processing completed

Indexes:

  • Composite index on (status, priority)
  • B-tree index on cas_number

Use Cases:

  • Queue chemical enrichment requests from SDS parsing
  • Batch process multiple chemical lookups
  • Retry failed API requests with backoff
  • Track enrichment request sources

Table 28: chemiq_cache_statistics

Purpose: Tracks cache performance metrics for monitoring and optimization.

Important Columns:

  • stat_id (UUID, PK): Primary identifier
  • period_start (Date, required): Period start date
  • period_end (Date, required): Period end date
  • total_lookups (Integer): Total cache lookups
  • cache_hits (Integer): Successful cache hits
  • cache_misses (Integer): Cache misses requiring API fetch
  • fetch_successes (Integer): Successful API fetches
  • fetch_failures (Integer): Failed API fetches
  • avg_fetch_time_ms (Numeric): Average fetch latency
  • top_chemicals (JSONB): Most frequently accessed chemicals
  • created_at (Timestamp)

Use Cases:

  • Monitor cache hit rates (target: >95%)
  • Track API performance and reliability
  • Identify frequently accessed chemicals for pre-caching
  • Generate performance reports

Table 29: chemiq_unit_conversions

Purpose: Reference table for unit conversion factors used in quantity aggregation.

Key Features:

  • Standard units: lbs (weight), gal (volume)
  • Bidirectional conversion factors
  • Supports weight, volume, and count units

Important Columns:

  • conversion_id (UUID, PK): Primary identifier
  • unit_code (String(20), required): Unit identifier (oz, gal, L, kg)
  • unit_name (String(50), required): Full unit name
  • unit_type (String(20), required): weight, volume
  • to_standard_factor (Numeric, required): Conversion factor to standard unit
  • standard_unit (String(20), required): Standard unit (lbs, gal)
  • abbreviation (String(10), required): Display abbreviation
  • created_at (Timestamp)

Seed Data (14 records):

  • Weight: oz, lbs, g, kg
  • Volume: fl_oz, cup, pt, qt, gal, mL, L

Use Cases:

  • Convert inventory quantities to standard units
  • Support Tier II reporting (lbs required)
  • Enable fire code threshold comparisons

Table 30: chemiq_product_density

Purpose: Stores density values for volume-to-weight conversion.

Important Columns:

  • density_id (UUID, PK): Primary identifier
  • cas_number (String(20), indexed): CAS number for chemical-specific density
  • product_id (UUID, FK): Product-specific density
  • density_value (Numeric, required): Density value
  • density_unit (String(20), required): Density unit (g/mL, lb/gal)
  • density_lbs_per_gal (Numeric, required): Normalized density
  • data_source (String(100)): Source of density data
  • created_at, updated_at (Timestamp)

Seed Data:

  • Water baseline: 8.34 lbs/gal

Use Cases:

  • Convert gallons to pounds for Tier II reporting
  • Calculate weight from volume measurements
  • Support fire code quantity calculations

Table 31: chemiq_site_chemical_summary

Purpose: Aggregated chemical quantities per site for Tier II reporting and threshold monitoring.

Key Features:

  • Per-chemical totals (lbs, gallons, containers)
  • Tracks contributing products
  • Fire code classification
  • Tier II threshold tracking

Important Columns:

  • summary_id (UUID, PK): Primary identifier
  • company_id (UUID, FK, required): Company identifier
  • site_id (UUID, FK, required): Site identifier
  • cas_number (String(20), required): Chemical CAS number
  • chemical_name (String(255), required): Chemical name
  • total_quantity_lbs (Numeric): Total weight in pounds
  • total_quantity_gal (Numeric): Total volume in gallons
  • total_containers (Integer): Container count
  • product_count (Integer): Number of contributing products
  • contributing_products (JSONB): Product details
  • is_hazardous (Boolean): Hazardous classification
  • hazard_classes (JSONB): GHS hazard classes
  • physical_state (String(20)): solid, liquid, gas
  • max_daily_quantity_lbs (Numeric): Max daily quantity (Tier II)
  • avg_daily_quantity_lbs (Numeric): Average daily quantity (Tier II)
  • days_on_site (Integer): Days present (Tier II)
  • fire_code_category (String(50)): Fire code hazard category
  • fire_code_class (String(20)): Fire code hazard class
  • storage_locations (JSONB): Storage location details
  • tier2_threshold_lbs (Integer): Tier II reporting threshold
  • fire_permit_threshold (Numeric): Fire permit threshold
  • exceeds_tier2_threshold (Boolean): Tier II threshold exceeded
  • exceeds_fire_threshold (Boolean): Fire threshold exceeded
  • last_calculated_at (Timestamp): Last aggregation time
  • calculation_method (String(50)): Calculation method used
  • created_at, updated_at (Timestamp)

Indexes:

  • Composite index on (company_id, site_id)
  • B-tree index on cas_number

Use Cases:

  • Generate Tier II reports
  • Monitor fire code compliance
  • Track site-level chemical totals
  • Alert on threshold exceedances

Table 32: chemiq_site_hazard_summary

Purpose: Aggregated quantities by hazard class per site for fire code compliance.

Important Columns:

  • summary_id (UUID, PK): Primary identifier
  • company_id (UUID, FK, required): Company identifier
  • site_id (UUID, FK, required): Site identifier
  • hazard_category (String(50), required): Hazard category (Flammable, Corrosive)
  • hazard_class (String(100), required): Specific hazard class
  • ghs_code (String(20)): GHS hazard code
  • fire_code_category (String(50)): IFC category
  • fire_code_class (String(20)): IFC class
  • total_quantity_lbs (Numeric): Total weight
  • total_quantity_gal (Numeric): Total volume
  • total_containers (Integer): Container count
  • chemical_count (Integer): Number of chemicals
  • chemicals (JSONB): Contributing chemical details
  • fire_permit_limit (Numeric): Fire permit threshold
  • exceeds_permit_limit (Boolean): Threshold exceeded
  • percent_of_limit (Numeric): Percentage of limit used
  • last_calculated_at (Timestamp)
  • created_at, updated_at (Timestamp)

Use Cases:

  • Fire department permit applications
  • Hazard class quantity tracking
  • Threshold monitoring by hazard type

Table 33: chemiq_site_quantity_history

Purpose: Daily snapshots of chemical quantities for Tier II reporting (max/avg daily calculations).

Key Features:

  • 7-year retention for regulatory compliance
  • Daily automated snapshots
  • Supports Tier II max/avg calculations

Important Columns:

  • history_id (UUID, PK): Primary identifier
  • company_id (UUID, FK, required): Company identifier
  • site_id (UUID, FK, required): Site identifier
  • cas_number (String(20), required): Chemical CAS number
  • chemical_name (String(255), required): Chemical name
  • snapshot_date (Date, indexed, required): Snapshot date
  • quantity_lbs (Numeric): Quantity in pounds
  • quantity_gal (Numeric): Quantity in gallons
  • container_count (Integer): Container count
  • is_present (Boolean): Chemical present on this date
  • source (String(50)): Snapshot source (daily_snapshot, manual)
  • created_at (Timestamp)

Indexes:

  • Composite index on (company_id, site_id, snapshot_date)
  • B-tree index on cas_number

Use Cases:

  • Calculate Tier II max daily quantity
  • Calculate Tier II average daily quantity
  • Calculate days on site
  • Historical quantity tracking

Table 34: chemiq_fire_code_thresholds

Purpose: Reference table for fire code quantity limits by hazard category.

Important Columns:

  • threshold_id (UUID, PK): Primary identifier
  • hazard_category (String(50), required): Hazard category
  • hazard_class (String(20)): Specific class
  • storage_type (String(50), required): Storage type
  • max_quantity_gal (Numeric): Maximum gallons
  • max_quantity_lbs (Numeric): Maximum pounds
  • max_quantity_cu_ft (Numeric): Maximum cubic feet
  • code_reference (String(100)): Code citation
  • jurisdiction (String(50)): Jurisdiction (IFC, state)
  • notes (Text): Additional notes
  • created_at (Timestamp)

Seed Data (13 records): Based on IFC Table 5003.1.1(1) for flammable liquids, corrosives, oxidizers, etc.

Use Cases:

  • Fire code compliance checking
  • Permit threshold calculations
  • Automated threshold alerts

Table 35: chemiq_regulatory_lists

Purpose: Master regulatory list tracking chemicals on OSHA, EPA, IARC, and state lists.

Key Features:

  • OSHA PEL/specific standards tracking
  • EPA SARA 313 (TRI) and CERCLA RQ
  • California Prop 65
  • IARC/NTP carcinogen classifications

Important Columns:

  • list_id (UUID, PK): Primary identifier
  • cas_number (String(20), indexed, required): CAS number
  • chemical_name (String(255)): Chemical name

OSHA Fields:

  • is_osha_pel_listed (Boolean): Has OSHA PEL
  • has_osha_specific_standard (Boolean): Has substance-specific standard
  • osha_standard_citation (String(50)): Standard citation (e.g., 1910.1028)
  • is_niosh_rel_listed (Boolean): Has NIOSH REL
  • is_acgih_tlv_listed (Boolean): Has ACGIH TLV

EPA SARA 313 (TRI):

  • is_epa_sara_313 (Boolean): On TRI list
  • sara_313_threshold_lbs (Integer): Reporting threshold (default 10,000)
  • sara_313_category (String(50)): Category
  • sara_313_pbt (Boolean): Persistent bioaccumulative toxic

EPA CERCLA:

  • is_epa_cercla (Boolean): On CERCLA list
  • cercla_rq_lbs (Integer): Reportable quantity

California Prop 65:

  • is_california_prop65 (Boolean): On Prop 65 list
  • prop65_type (String(20)): cancer, reproductive, both
  • prop65_listing_date (Date): When listed
  • prop65_nsrl_ug (Numeric): No Significant Risk Level
  • prop65_madl_ug (Numeric): Maximum Allowable Dose Level

Carcinogen Classification:

  • is_carcinogen (Boolean): Known/suspected carcinogen
  • carcinogen_source (String(20)): IARC, NTP, OSHA
  • carcinogen_classification (String(20)): Group 1, 2A, 2B, etc.

EU REACH:

  • is_eu_reach (Boolean): On REACH list
  • reach_status (String(50)): REACH status

Metadata:

  • last_verified_at (Timestamp): Last verification date
  • source_urls (Text[]): Source URLs
  • notes (Text): Additional notes
  • created_at, updated_at (Timestamp)

Indexes:

  • Unique constraint on cas_number

Use Cases:

  • Compliance checking against regulatory lists
  • TRI reporting determination
  • Prop 65 warning requirements
  • Carcinogen identification

Table 36: chemiq_company_compliance_settings

Purpose: Company-specific compliance settings for customized checking.

Important Columns:

  • setting_id (UUID, PK): Primary identifier
  • company_id (UUID, FK, required): Company identifier
  • ships_to_california (Boolean): Ships to CA (triggers Prop 65)
  • employee_count (Integer): Employee count (affects TRI)
  • naics_code (String(10)): NAICS code (affects TRI)
  • use_niosh_rel_over_pel (Boolean): Use NIOSH REL instead of PEL
  • exposure_monitoring_frequency (String(20)): Monitoring frequency
  • notify_on_compliance_issues (Boolean): Enable notifications
  • compliance_notification_emails (Text[]): Notification recipients
  • created_at, updated_at (Timestamp)

Indexes:

  • Unique constraint on company_id

Use Cases:

  • Configure Prop 65 requirements
  • Set TRI reporting thresholds
  • Customize exposure monitoring
  • Enable compliance alerts

Table 37: chemiq_compliance_results

Purpose: Stores compliance check results per product/chemical.

Important Columns:

  • result_id (UUID, PK): Primary identifier
  • company_id (UUID, FK, required): Company identifier
  • site_id (UUID, FK): Site scope
  • inventory_id (UUID, FK): Inventory item
  • sds_id (UUID, FK): SDS document
  • check_date (Timestamp): When check was performed
  • compliance_score (Integer): 0-100 score
  • is_compliant (Boolean): Overall compliance status

Check Results (JSONB):

  • hazcom_result: HazCom compliance details
  • exposure_monitoring_result: Exposure monitoring requirements
  • tri_reporting_result: TRI reporting determination
  • cercla_result: CERCLA RQ status
  • prop65_result: Prop 65 requirements
  • carcinogen_result: Carcinogen classification

Action Items:

  • required_actions (Text[]): Required actions
  • warnings (Text[]): Warning messages
  • checked_by (String(50)): Check method (auto, manual)
  • created_at (Timestamp)

Indexes:

  • B-tree indexes on company_id, site_id, sds_id

Use Cases:

  • Store compliance check results
  • Track compliance history
  • Generate compliance reports
  • Identify action items

Table 38: ppe_glove_chemical_resistance

Purpose: Chemical-to-glove-material compatibility ratings.

Important Columns:

  • resistance_id (UUID, PK): Primary identifier
  • cas_number (String(20), indexed, required): Chemical CAS number
  • chemical_name (String(255), required): Chemical name
  • chemical_family (String(100)): Chemical family

Material Ratings (VG/G/F/P/NR):

  • nitrile_rating (String(10)): Nitrile rubber rating
  • butyl_rating (String(10)): Butyl rubber rating
  • neoprene_rating (String(10)): Neoprene rating
  • pvc_rating (String(10)): PVC rating
  • viton_rating (String(10)): Viton rating
  • latex_rating (String(10)): Natural latex rating

Breakthrough Times (minutes):

  • nitrile_breakthrough_min (Integer)
  • butyl_breakthrough_min (Integer)
  • neoprene_breakthrough_min (Integer)
  • viton_breakthrough_min (Integer)

Recommendations:

  • recommended_thickness_mil (Integer): Recommended thickness
  • source (String(255)): Data source
  • notes (Text): Additional notes
  • created_at, updated_at (Timestamp)

Seed Data (41 records): Common industrial chemicals with material ratings from Ansell Chemical Resistance Guide:

  • Solvents: Acetone, MEK, Toluene, Xylene, Hexane, Heptane, etc.
  • Alcohols: Methanol, Ethanol, Isopropanol, Butanol
  • Acids: Hydrochloric, Sulfuric, Nitric, Phosphoric, Acetic
  • Bases: Sodium Hydroxide, Potassium Hydroxide, Ammonia
  • Aldehydes: Formaldehyde, Glutaraldehyde
  • Chlorinated: Methylene Chloride, Trichloroethylene, Perchloroethylene

Use Cases:

  • Glove material selection
  • Breakthrough time calculations
  • PPE recommendation generation

Table 39: ppe_eye_protection_reference

Purpose: Hazard type to eye protection mapping.

Important Columns:

  • reference_id (UUID, PK): Primary identifier
  • hazard_type (String(50), required): Hazard type
  • ghs_hazard_code (String(10)): GHS code
  • minimum_protection (String(100), required): Minimum required
  • recommended_protection (String(100)): Recommended level
  • ansi_marking_required (String(20)): ANSI marking (Z87.1)
  • face_shield_recommended (Boolean): Face shield needed
  • full_face_respirator_option (Boolean): Can use full-face respirator
  • selection_reasoning (Text): Selection rationale
  • created_at (Timestamp)

Seed Data (8 records): Hazard types mapped to eye protection requirements:

  • chemical_splash: Safety goggles with indirect venting (Z87.1+D3)
  • dust: Safety goggles or glasses with side shields (Z87.1)
  • vapor: Safety goggles with indirect venting (Z87.1+D3)
  • corrosive_splash: Face shield + safety goggles (Z87.1+D3)
  • severe_eye_damage: Face shield + safety goggles (Z87.1+D3)
  • eye_irritation: Safety glasses with side shields (Z87.1)
  • general_chemical: Safety goggles (Z87.1+D3)
  • uv_radiation: Tinted safety glasses (Z87.1+U6)

Use Cases:

  • Eye protection selection
  • GHS-based recommendations
  • ANSI compliance

Table 40: ppe_respiratory_reference

Purpose: Chemical to respirator cartridge requirements.

Important Columns:

  • reference_id (UUID, PK): Primary identifier
  • cas_number (String(20), indexed): CAS number
  • chemical_name (String(255)): Chemical name
  • chemical_family (String(100)): Chemical family

Exposure Limits:

  • osha_pel_ppm (Numeric): OSHA PEL
  • osha_pel_mgm3 (Numeric): OSHA PEL (mg/m³)
  • niosh_rel_ppm (Numeric): NIOSH REL
  • niosh_rel_mgm3 (Numeric): NIOSH REL (mg/m³)
  • niosh_idlh_ppm (Numeric): IDLH value

Cartridge Selection:

  • cartridge_type (String(50)): OV, AG, OV/AG, etc.
  • cartridge_color (String(50)): Black, yellow, etc.
  • particulate_filter (String(20)): N95, P100, etc.

Physical Properties:

  • vapor_pressure_mmhg (Numeric): Vapor pressure
  • is_particulate (Boolean): Particulate hazard
  • oil_present (Boolean): Oil present

Respirator Requirements:

  • min_respirator_type (String(50)): APR, PAPR, SAR, SCBA
  • min_apf (Integer): Minimum APF required
  • requires_supplied_air (Boolean): Requires SAR/SCBA
  • special_considerations (Text): Special notes
  • source (String(100)): Data source
  • created_at, updated_at (Timestamp)

Seed Data (19 records): Common chemicals with respiratory protection requirements from NIOSH Pocket Guide:

  • Organic Solvents: Acetone, Toluene, Xylene, MEK, Hexane (OV cartridge, black)
  • Acid Gases: HCl, Sulfuric acid mist, Nitric acid (AG cartridge, white/yellow)
  • Ammonia: Ammonia (AM cartridge, green)
  • Formaldehyde: Formaldehyde (Formaldehyde cartridge, olive)
  • Particulates: Silica dust, Metal fumes (N95/P100 filters)
  • Multi-gas: Combination OV/AG cartridges for mixed exposures
  • High-hazard: IDLH chemicals requiring supplied air (SCBA/SAR)

Use Cases:

  • Respirator cartridge selection
  • APF calculations
  • IDLH determinations

Table 41: ppe_body_protection_reference

Purpose: Hazard to body protection level mapping.

Important Columns:

  • reference_id (UUID, PK): Primary identifier
  • hazard_type (String(50), required): Hazard type
  • ghs_hazard_code (String(10)): GHS code
  • minimum_protection (String(100), required): Minimum protection
  • recommended_protection (String(100)): Recommended level
  • protection_level (String(5)): EPA level (A, B, C, D)
  • min_material_type (String(50)): Minimum material
  • selection_reasoning (Text): Selection rationale
  • created_at (Timestamp)

Seed Data (8 records): Hazard severity to EPA protection level mapping:

  • Level D: General work, minimal hazard (standard coveralls)
  • Level C: Known contaminants, APR adequate (Tyvek + APR)
  • Level B: High concentrations, splash hazard (Tychem + SCBA)
  • Level A: Highest protection, IDLH/unknown (fully encapsulated + SCBA)
  • Plus specific hazard types: corrosive, oxidizer, flammable, toxic

Use Cases:

  • Body protection selection
  • EPA protection levels
  • Chemical suit requirements

Table 42: ppe_body_material_resistance

Purpose: Chemical-to-body-protection material compatibility.

Important Columns:

  • resistance_id (UUID, PK): Primary identifier
  • cas_number (String(20), indexed, required): CAS number
  • chemical_name (String(255), required): Chemical name

Material Ratings:

  • tyvek_rating (String(10)): Tyvek rating
  • tychem_qc_rating (String(10)): Tychem QC rating
  • tychem_2000_rating (String(10)): Tychem 2000 rating
  • tychem_4000_rating (String(10)): Tychem 4000 rating
  • tychem_6000_rating (String(10)): Tychem 6000 rating
  • saranex_rating (String(10)): Saranex rating

Breakthrough Times:

  • tychem_2000_breakthrough_min (Integer)
  • tychem_4000_breakthrough_min (Integer)
  • source (String(100)): Data source
  • created_at (Timestamp)

Seed Data (8 records): Common chemicals with DuPont Tyvek/Tychem material ratings:

  • Solvents: Acetone, Toluene, Xylene (Tychem 2000/4000 required)
  • Acids: Sulfuric acid, Hydrochloric acid (Tychem 4000/6000 required)
  • Bases: Sodium Hydroxide (Tychem QC/2000 adequate)
  • Water-based: General aqueous solutions (Tyvek adequate)
  • High-hazard: Concentrated acids/bases (Tychem 6000/Saranex required)

Use Cases:

  • Chemical suit material selection
  • Breakthrough time calculations

Table 43: ppe_products

Purpose: Catalog of available PPE products with specifications.

Important Columns:

  • product_id (UUID, PK): Primary identifier
  • product_name (String(255), required): Product name
  • manufacturer (String(100), required): Manufacturer
  • product_code (String(100)): SKU/model number
  • ppe_category (String(50), required): hand, eye, respiratory, body
  • ppe_type (String(50), required): gloves, goggles, respirator, coverall
  • ppe_subtype (String(50)): Specific subtype

Glove Fields:

  • glove_material (String(50)): Material type
  • glove_thickness_mil (Numeric): Thickness in mils
  • glove_length_inches (Numeric): Length
  • glove_ansi_level (Integer): Cut resistance level

Eye Protection Fields:

  • eye_ansi_marking (String(20)): ANSI marking
  • eye_lens_type (String(50)): Lens type
  • eye_has_side_shields (Boolean): Side shields
  • eye_indirect_vent (Boolean): Indirect venting

Respirator Fields:

  • respirator_type (String(50)): APR, PAPR, etc.
  • respirator_apf (Integer): Assigned Protection Factor
  • respirator_cartridge_type (String(50)): Cartridge type
  • respirator_filter_rating (String(10)): N95, P100, etc.
  • respirator_niosh_approved (Boolean): NIOSH approved

Body Protection Fields:

  • body_coverage (String(50)): Full, partial, etc.
  • body_material (String(50)): Material type
  • body_protection_level (String(5)): EPA level
  • body_seam_type (String(50)): Seam construction

Common Fields:

  • certifications (Text[]): Certifications list
  • size_options (String(100)): Available sizes
  • purchase_url (String(500)): Purchase link
  • price_range (String(50)): Price range
  • pack_quantity (Integer): Units per pack
  • is_active (Boolean): Active flag
  • created_at, updated_at (Timestamp)

Indexes:

  • B-tree indexes on ppe_category, ppe_type, manufacturer

Seed Data (20 records): Representative PPE products across all categories:

  • Gloves (5): Nitrile (Ansell Sol-Vex 37-175), Butyl (North B-174), Neoprene (Ansell Neox 09-430), PVC (Ansell Snorkel), Viton (North F-091)
  • Eye Protection (5): Safety glasses with side shields, Chemical splash goggles (indirect vent), Face shields, Full-face respirator visors, UV-protective glasses
  • Respirators (5): Half-face APR (3M 6000), Full-face APR (3M 6800), PAPR (3M Versaflo), N95 disposable (3M 8210), P100 cartridge set
  • Body Protection (5): Tyvek coverall, Tychem QC coverall, Tychem 2000 suit, Tychem 4000 suit, Apron (chemical resistant)

Use Cases:

  • PPE product catalog
  • Product recommendations
  • Specification matching

Table 44: ppe_sds_recommendations

Purpose: Generated PPE recommendations per SDS document.

Important Columns:

  • recommendation_id (UUID, PK): Primary identifier
  • sds_id (UUID, FK, required): SDS document

Glove Recommendation:

  • glove_material (String(50)): Recommended material
  • glove_thickness_mil (Integer): Recommended thickness
  • glove_ansi_level (Integer): Cut resistance
  • glove_reasoning (Text): Selection rationale
  • glove_confidence (Numeric): Confidence score

Eye Protection:

  • eye_protection_type (String(50)): Type recommended
  • eye_ansi_marking (String(20)): ANSI marking
  • eye_face_shield_required (Boolean): Face shield needed
  • eye_reasoning (Text): Selection rationale
  • eye_confidence (Numeric): Confidence score

Respiratory:

  • respirator_type (String(50)): Type recommended
  • respirator_cartridge (String(50)): Cartridge type
  • respirator_filter (String(10)): Filter rating
  • respirator_apf_required (Integer): Minimum APF
  • respirator_reasoning (Text): Selection rationale
  • respirator_confidence (Numeric): Confidence score

Body Protection:

  • body_protection_type (String(50)): Type recommended
  • body_material (String(50)): Material type
  • body_protection_level (String(5)): EPA level
  • body_reasoning (Text): Selection rationale
  • body_confidence (Numeric): Confidence score

Overall:

  • overall_confidence (Numeric): Overall confidence
  • generated_at (Timestamp): Generation time
  • generated_by (String(50)): Generation method (auto, manual)
  • last_reviewed_at (Timestamp): Last review
  • reviewed_by (UUID, FK): Reviewer
  • created_at (Timestamp)

Indexes:

  • Unique constraint on sds_id

Use Cases:

  • Store PPE recommendations per SDS
  • Track recommendation confidence
  • Enable human review workflow

Table 45: ppe_ghs_hazard_mapping

Purpose: Maps GHS hazard codes to PPE requirements (PPE-triggering codes only).

Important Columns:

  • mapping_id (UUID, PK): Primary identifier
  • ghs_code (String(10), required): GHS hazard code
  • hazard_statement (Text): Hazard statement text
  • hazard_category (String(50)): Hazard category

PPE Requirements:

  • eye_protection_min (String(50)): Minimum eye protection
  • eye_protection_recommended (String(50)): Recommended eye protection
  • hand_protection_min (String(50)): Minimum hand protection
  • respirator_required (Boolean): Respirator required
  • respirator_type_min (String(50)): Minimum respirator
  • body_protection_min (String(50)): Minimum body protection
  • notes (Text): Additional notes
  • created_at (Timestamp)

Seed Data (13 codes): H310, H311, H312, H314, H315, H317, H318, H319, H330, H331, H332, H334, H335

Use Cases:

  • Quick PPE lookup by GHS code
  • SDS-to-PPE mapping
  • Automated recommendations

Table 46: chemiq_ghs_hazard_codes

Purpose: Complete reference table for all GHS hazard codes (H-codes).

Key Features:

  • All physical hazards (H200 series)
  • All health hazards (H300 series)
  • All environmental hazards (H400 series)
  • Combined hazard statements
  • PPE triggering flags

Important Columns:

  • hazard_id (UUID, PK): Primary identifier
  • code (String(30), indexed, required): Hazard code
  • code_type (String(20), required): hazard, precautionary, combined
  • hazard_class (String(100)): Hazard class name
  • hazard_category (String(100)): Category designation
  • statement (Text, required): Hazard statement text
  • signal_word (String(20)): Danger, Warning
  • pictogram_codes (Text[]): GHS pictograms (GHS01-GHS09)
  • category_number (Integer): Category number
  • is_physical (Boolean): Physical hazard flag
  • is_health (Boolean): Health hazard flag
  • is_environmental (Boolean): Environmental hazard flag
  • triggers_ppe (Boolean): Triggers PPE requirement
  • ppe_categories (Text[]): PPE categories required
  • notes (Text): Additional notes
  • created_at (Timestamp)

Seed Data (94 codes):

  • 29 physical hazards (H200 series)
  • 57 health hazards (H300 series)
  • 8 environmental hazards (H400 series)
  • 13 combined hazard statements

Indexes:

  • Unique constraint on code
  • B-tree indexes on code_type, hazard_class

Use Cases:

  • GHS code validation
  • Hazard statement lookup
  • Pictogram determination
  • PPE requirement lookup

Table 47: chemiq_ghs_pictograms

Purpose: Reference table for GHS pictograms (GHS01-GHS09).

Important Columns:

  • pictogram_id (UUID, PK): Primary identifier
  • code (String(10), required): Pictogram code (GHS01-GHS09)
  • name (String(100), required): Pictogram name
  • symbol_description (Text, required): Symbol description
  • hazard_types (Text[]): Associated hazard types
  • created_at (Timestamp)

Seed Data (9 pictograms):

  • GHS01: Exploding Bomb
  • GHS02: Flame
  • GHS03: Flame Over Circle
  • GHS04: Gas Cylinder
  • GHS05: Corrosion
  • GHS06: Skull and Crossbones
  • GHS07: Exclamation Mark
  • GHS08: Health Hazard
  • GHS09: Environment

Use Cases:

  • Pictogram lookup
  • SDS label generation
  • Hazard visualization

Part 7: Classification & Tagging System

This section documents the tables added for chemical classification (GHS hazard categories), custom company tags, and AI-generated hazard summaries.

Migration File:

  • 2026_01_31_2200-a3c7f1e2d456_add_classification_and_tagging.py

Table 48: chemiq_company_tags

Purpose: Company-scoped custom tags for organizing and categorizing chemicals.

Key Features:

  • Company-specific tag definitions with color coding
  • Unique tag names per company
  • Cascading delete when company is removed

Important Columns:

  • tag_id (UUID, PK): Primary identifier
  • company_id (UUID, FK, indexed, required): Company that owns this tag
  • name (String(50), required): Tag display name
  • color (String(7), default='#6B7280', required): Hex color code for display
  • description (String(255)): Optional tag description
  • created_by_user_id (UUID, FK): User who created the tag
  • created_at (Timestamp with TZ, default=now): Creation time
  • updated_at (Timestamp with TZ): Last update time

Indexes:

  • B-tree index on company_id
  • Unique constraint on (company_id, name)

Relationships:

  • Many-to-one: company → Company (CASCADE delete)
  • Many-to-one: created_by_user → User (SET NULL on delete)
  • One-to-many: product_tags → ProductTag

Use Cases:

  • Create custom organizational tags (e.g., "High Priority", "Needs Review", "Lab Only")
  • Color-coded tag display in inventory list and details page
  • Company-specific categorization beyond GHS hazard classes

Table 49: chemiq_product_tags

Purpose: Junction table linking company products to tags (many-to-many).

Key Features:

  • Links products to company tags
  • Tracks who assigned the tag
  • Cascading deletes from both product and tag sides

Important Columns:

  • id (UUID, PK): Primary identifier
  • company_product_id (UUID, FK, indexed, required): Links to company product catalog
  • tag_id (UUID, FK, indexed, required): Links to company tag
  • assigned_by_user_id (UUID, FK): User who assigned the tag
  • assigned_at (Timestamp with TZ, default=now): Assignment time

Indexes:

  • B-tree index on company_product_id
  • B-tree index on tag_id
  • Unique constraint on (company_product_id, tag_id)

Relationships:

  • Many-to-one: company_product → CompanyProductCatalog (CASCADE delete)
  • Many-to-one: tag → CompanyTag (CASCADE delete)
  • Many-to-one: assigned_by_user → User (SET NULL on delete)

Use Cases:

  • Assign multiple tags to a product
  • Filter inventory by tag
  • Track tag assignment history
  • Dashboard tag distribution charts

Table 50: chemiq_ai_hazard_summaries

Purpose: Stores AI-generated plain-English hazard summaries for products with parsed SDS data.

Key Features:

  • One summary per product (unique index on company_product_id)
  • Risk level classification (low/moderate/high/extreme)
  • Key hazards as JSONB array
  • Tracks which SDS and AI model were used for generation

Important Columns:

  • id (UUID, PK): Primary identifier
  • company_product_id (UUID, FK, unique index, required): Links to company product catalog
  • summary_text (Text, required): AI-generated plain-English hazard summary
  • handling_guidance (Text): AI-generated handling and storage guidance
  • risk_level (String(20)): Overall risk level: low, moderate, high, extreme
  • key_hazards (JSONB): Array of top hazard phrases (e.g., ["Highly flammable", "Causes skin irritation"])
  • model_used (String(50)): AI model used for generation (e.g., "gpt-4o", "claude-3")
  • generated_at (Timestamp with TZ, default=now): When summary was generated
  • sds_document_id (UUID, FK): SDS document used as source for generation

Indexes:

  • Unique index on company_product_id (one summary per product)

Relationships:

  • Many-to-one: company_product → CompanyProductCatalog (CASCADE delete)
  • Many-to-one: sds_document → SDSDocument (SET NULL on delete)

Use Cases:

  • Display AI-generated hazard summary on product details page
  • Risk level badges in inventory list
  • Handling guidance for employees without chemistry background
  • Regenerate summary when SDS is updated

Summary

The ChemIQ Complete Schema provides a comprehensive, unified architecture for managing chemical safety documents across multiple companies:

Key Benefits

  1. Unified Search: Two-tier product catalog (company → global) for fast, relevant results
  2. No Product Duplication: Company product catalog eliminates duplicate product data across sites/locations
  3. Document Flexibility: One product can link to multiple document types (SDS, EPA Label, COA, tech sheets)
  4. Global Repositories: SDSs and EPA Labels shared across companies (no duplication)
  5. Company Isolation: Company product catalog and mapping tables maintain data separation
  6. Data Quality Tracking: Confidence scores, verification status, and data source tracking
  7. Performance: Extensive indexing (GIN, B-tree) for fast search across large datasets
  8. Compliance: Complete audit trails, version tracking, and retention policies
  9. Scalability: Partition-ready design for millions of documents
  10. Integration: S3 storage, async parsing queues, EPA PPLS API, external barcode APIs

Total Tables: 50

Product Catalog System (4 tables):

  1. chemiq_product_catalog (global shared catalog)
  2. chemiq_product_documents (polymorphic junction table)
  3. chemiq_company_product_catalog - company-specific products
  4. chemiq_product_images - product label images for curved/multi-sided labels

EPA Label Repository (4 tables): 5. chemiq_epa_label_documents 6. chemiq_company_label_mappings 7. chemiq_label_parse_queue 8. chemiq_label_change_log

SDS Repository (9 tables): 9. chemiq_sds_documents 10. chemiq_sds_hazard_info 11. chemiq_sds_composition 12. chemiq_sds_sections 13. chemiq_company_sds_mappings 14. chemiq_sds_parse_queue 15. chemiq_sds_change_log 16. chemiq_sds_web_search_cache - caches Perplexity API web search results 17. chemiq_sds_bulk_upload_jobs - tracks ZIP bulk upload background jobs

Supporting Tables (8 tables): 18. chemiq_inventory (simplified - links to company product catalog) 19. chemiq_inventory_attributes - stores batch/lot numbers and expiration dates 20. chemiq_sds_reviews_pending 21. chemiq_inventory_uploads 22. background_job_logs - tracks background service job execution 23. chemiq_label_images - legacy label images (use chemiq_product_images for new implementations) 24. ref_chemical_registry - master chemical CAS number registry 25. ref_chemical_synonyms - chemical name synonyms and aliases

Chemical Enrichment (3 tables): 26. chemiq_pubchem_cache - PubChem/NIOSH chemical data cache 27. chemiq_chemical_fetch_queue - background fetch job queue 28. chemiq_cache_statistics - cache performance metrics

Site Quantity Aggregation (6 tables): 29. chemiq_unit_conversions - unit conversion factors 30. chemiq_product_density - volume-to-weight conversion 31. chemiq_site_chemical_summary - per-chemical site totals 32. chemiq_site_hazard_summary - per-hazard-class site totals 33. chemiq_site_quantity_history - daily snapshots for Tier II 34. chemiq_fire_code_thresholds - fire code limits

Compliance Checking (3 tables): 35. chemiq_regulatory_lists - OSHA/EPA/IARC/Prop65 lists 36. chemiq_company_compliance_settings - company compliance config 37. chemiq_compliance_results - compliance check results

PPE Recommendation (8 tables): 38. ppe_glove_chemical_resistance - glove material ratings 39. ppe_eye_protection_reference - eye protection mapping 40. ppe_respiratory_reference - respirator requirements 41. ppe_body_protection_reference - body protection levels 42. ppe_body_material_resistance - suit material ratings 43. ppe_products - PPE product catalog 44. ppe_sds_recommendations - generated PPE recommendations 45. ppe_ghs_hazard_mapping - GHS to PPE mapping

GHS Reference (2 tables): 46. chemiq_ghs_hazard_codes - complete GHS H-codes (94 codes) 47. chemiq_ghs_pictograms - GHS pictograms (9 pictograms)

Classification & Tagging (3 tables): 48. chemiq_company_tags - company-scoped custom tags with color coding 49. chemiq_product_tags - product-to-tag junction table 50. chemiq_ai_hazard_summaries - AI-generated plain-English hazard summaries

Seed Data Summary

TableRecordsSourceDescription
chemiq_unit_conversions14StandardUnit conversion factors (oz→lbs, gal→lbs, etc.)
chemiq_fire_code_thresholds13IFC 5003.1.1Fire code quantity limits by hazard class
chemiq_product_density1StandardWater baseline density
chemiq_ghs_hazard_codes94GHS Rev 9Complete GHS H-codes (physical/health/environmental)
chemiq_ghs_pictograms9GHS Rev 9All GHS pictograms (GHS01-GHS09)
chemiq_pubchem_cache~300NIOSH/PubChemPre-seeded common industrial chemicals with exposure limits
ppe_glove_chemical_resistance41Ansell GuideGlove material ratings for common chemicals
ppe_eye_protection_reference8OSHA/ANSIHazard type to eye protection mapping
ppe_respiratory_reference19NIOSH GuideRespirator cartridge requirements by chemical
ppe_body_protection_reference8EPA/OSHAHazard severity to protection level mapping
ppe_body_material_resistance8DuPontTyvek/Tychem material compatibility ratings
ppe_products20VariousRepresentative PPE product catalog
ppe_ghs_hazard_mapping13GHS/OSHAGHS hazard codes to PPE requirements

Note: Seed data is populated via Alembic migrations in tellus-ehs-hazcom-service/alembic/versions/.

Schema Evolution Note

The addition of chemiq_company_product_catalog represents a key architectural improvement:

  • Before: Product data was denormalized in chemiq_inventory, causing duplication across sites/locations
  • After: Product data is normalized in company product catalog, inventory only stores site/location-specific quantities

This change supports the incremental product catalog building strategy outlined in BARCODE_PRODUCT_CATALOG_STRATEGY.md.

This unified architecture supports the core mission of the Tellus EHS platform: storing chemical inventory documents and generating actionable insights based on composition, hazards, and safety information.