ChemIQ Complete Schema Documentation

Overview

The ChemIQ module implements a comprehensive global repository architecture for chemical safety documents, consisting of three interconnected systems:

Product Catalog - Master searchable product repository
SDS (Safety Data Sheet) Repository - Global SDS document storage
EPA Label Repository - Global EPA pesticide/fungicide label storage

This unified architecture enables multiple companies to efficiently search, share, and access chemical safety documents while maintaining data isolation and compliance.

Key Design Principles

Global Repositories: Documents stored once, shared across companies via mapping tables
Product Catalog Hub: Single search point linking to all document types
S3 Integration: PDF files stored in AWS S3, metadata in PostgreSQL
Deduplication: SHA-256 checksums prevent duplicate storage
Version Tracking: Linked-list pattern for document revision history
Full-Text Search: PostgreSQL TSVECTOR with GIN indexes
Structured Data: JSONB for complex data, arrays for lists
Company Isolation: Many-to-many mapping tables for multi-tenancy
Scalability: Optimized indexes and partition-ready design

Architecture Overview

Four-Tier Document Management

┌─────────────────────────────────────────────────────────────┐
│            GLOBAL PRODUCT CATALOG                            │
│         (Master Searchable Repository - Shared)              │
│  - Product identification (name, manufacturer, codes)        │
│  - Multiple identifiers (CAS, UPC, GTIN, EPA Reg #)         │
│  - PRIMARY SEARCH VECTOR (unified search)                    │
│  - Populated by: API validation, AI agent promotion          │
└─────────────┬───────────────────────────────────────────────┘
              │ (optional link for enrichment)
              ▼
┌─────────────────────────────────────────────────────────────┐
│          COMPANY PRODUCT CATALOG (NEW)                       │
│         (Company-Specific Product Entries)                   │
│  - Company's own product naming and codes                    │
│  - Links to global catalog (optional)                        │
│  - SDS and EPA label attachments                             │
│  - User contributions stay here (not global)                 │
└─────────────┬───────────────────────────────────────────────┘
              │
              ├──────────────┬──────────────┬─────────────────┐
              ▼              ▼              ▼                 ▼
    ┌─────────────┐  ┌──────────────┐  ┌──────┐      ┌──────────┐
    │ SDS Docs    │  │ EPA Labels   │  │ COA  │      │Tech Sheet│
    └─────────────┘  └──────────────┘  └──────┘      └──────────┘
              │              │
              ▼              ▼
    ┌──────────────────────────────────────────────────────────┐
    │              COMPANY MAPPINGS                             │
    │      (Company-specific isolation & tracking)              │
    └──────────────────────────────────────────────────────────┘
              │
              ▼
    ┌──────────────────────────────────────────────────────────┐
    │            INVENTORY ITEMS (Simplified)                   │
    │    (Site/Location-specific inventory tracking)            │
    │    - Links to company product catalog                     │
    │    - Container size, quantity, location only              │
    └──────────────────────────────────────────────────────────┘

User Workflow

When user adds inventory item (Barcode Search Flow):

Check Company Product Catalog first - Search chemiq_company_product_catalog by barcode/product name (company's own products)
If not found, check Global Product Catalog - Search chemiq_product_catalog by barcode/product name
If found in global - Copy to company catalog, link to global_product_id
If not found, query External APIs - Open Food Facts, UPC Database, Barcode Lookup
If API returns data - Save to BOTH global catalog (api_validated) and company catalog
If still not found - User enters manually → saves to company catalog ONLY (not global)
Create inventory record linking to chemiq_company_product_catalog

Key Design Principle:

User-contributed products go to chemiq_company_product_catalog ONLY
Global chemiq_product_catalog is populated by:
- External API validated products (high confidence)
- AI Agent promotion (future): When 30%+ of companies have same product

Benefits:

Single search eliminates need to query multiple repositories
Product metadata stored once per company (no duplication across sites/locations)
Company isolation - each company has their own product catalog
Global enrichment - can link to global catalog for additional data
Data quality tracking - confidence scores and verification status
One product can have multiple document versions
Easy to add new document types (COA, tech sheets)

Part 1: Product Catalog System

Table 1: chemiq_product_catalog

Purpose: Master searchable product repository that serves as the central hub for all chemical products.

Key Features:

Stores master product identification across all companies
Multiple identifier types (product codes, CAS, UPC, GTIN, EPA reg #)
PRIMARY search vector for unified product search
Links to all document types via junction table
Tracks product lifecycle (active, discontinued, replacements)

Important Columns:

product_id (UUID, PK): Primary identifier
product_name (String(255), indexed): Product name
manufacturer (String(255), indexed): Manufacturer name
manufacturer_id (String(100)): Manufacturer's internal ID
brand_name (String(255)): Brand/trade name
product_codes (JSONB): Array of codes with types: [{"type": "manufacturer", "code": "ABC-123"}]
cas_numbers (Text[]): Primary CAS numbers
upc_barcodes (Text[]): Multiple barcodes
gtin (String(14)): Global Trade Item Number
product_category (String(100)): Chemical, Pesticide, etc.
product_type (String(100)): Insecticide, Herbicide, Cleaner, etc.
formulation_type (String(50)): Liquid, Powder, Granular, etc.
epa_reg_number (String(50), indexed): EPA registration number (for pesticides)
epa_product_type (String(100)): EPA product classification
is_restricted_use (Boolean, default=false): Restricted use pesticide flag
is_active (Boolean, indexed, default=true): Active product flag
is_discontinued (Boolean, default=false): Discontinued product flag
discontinued_date (Date): When product was discontinued
replacement_product_id (UUID, FK): Links to replacement product
search_vector (TSVECTOR): PRIMARY SEARCH VECTOR for all product searches
contributed_by_company_id (UUID, FK): Company that contributed this product
source_type (String(20)): manufacturer, user_contributed, epa_ppls
created_at, updated_at (Timestamp)

Indexes:

GIN index on search_vector (PRIMARY SEARCH INDEX)
GIN index on upc_barcodes (array search)
GIN index on cas_numbers (array search)
B-tree indexes on: product_name, manufacturer, epa_reg_number, is_active

TSVECTOR Trigger:

CREATE TRIGGER tsvector_update_trigger BEFORE INSERT OR UPDATE
ON chemiq_product_catalog FOR EACH ROW EXECUTE FUNCTION
tsvector_update_trigger(search_vector, 'pg_catalog.english',
    product_name, manufacturer, brand_name, product_category, product_type);

Relationships:

Self-referencing: replacement_product (for discontinued products)
One-to-many: product_documents (links to SDSs, EPA labels, etc.)

Use Cases:

Unified Search: Single search across all products and identifiers
Barcode Scanning: Quick lookup by UPC/GTIN
Product Discovery: Find products by name, manufacturer, CAS, EPA reg #
Product Lifecycle: Track discontinued products and replacements
Multi-Company Contribution: Companies can add products to shared catalog

Table 2: chemiq_product_documents

Purpose: Polymorphic junction table linking products to various document types (SDSs, EPA labels, COAs, tech sheets).

Key Features:

Links one product to multiple documents
Supports multiple document types (polymorphic association)
Tracks document validity periods
Supports multiple languages and regions
CHECK constraints enforce document type integrity

Important Columns:

product_document_id (UUID, PK): Primary identifier
product_id (UUID, FK, indexed): Links to product catalog
document_type (String(20), required): sds, epa_label, coa, tech_sheet
sds_id (UUID, FK): Links to SDS (if document_type = 'sds')
label_id (UUID, FK): Links to EPA label (if document_type = 'epa_label')
is_primary (Boolean, default=true): Primary document flag
document_language (String(10), default='en'): Document language
document_region (String(10)): US, CA, EU, etc.
valid_from, valid_until (Date): Document validity period
is_current (Boolean, default=true): Current document flag
created_at (Timestamp)

CHECK Constraint:

(document_type = 'sds' AND sds_id IS NOT NULL AND label_id IS NULL) OR
(document_type = 'epa_label' AND label_id IS NOT NULL AND sds_id IS NULL) OR
(document_type IN ('coa', 'tech_sheet'))

Relationships:

Many-to-one: product (Product Catalog)
Many-to-one: sds_document (SDS Documents)
Many-to-one: epa_label (EPA Label Documents)

Use Cases:

Link products to multiple SDSs (different languages, regions)
Link products to EPA labels for pesticides
Track document versions and validity periods
Support multi-language document management

Table 3: chemiq_company_product_catalog (NEW)

Purpose: Company-specific product catalog that sits between the global product catalog and inventory. Eliminates product data duplication across sites/locations within a company.

Key Features:

Company-specific product entries (one record per product per company)
Optional link to global product catalog for enrichment
Stores company's own product naming and codes
SDS and EPA label attachments at product level (not inventory level)
Data quality tracking (source, confidence, verification status)
Full-text search via TSVECTOR
Unique constraint per company on (product_name, manufacturer)

Problem Solved: Previously, product data (name, manufacturer, SDS) was duplicated in chemiq_inventory for every site/location. Now:

Product info stored ONCE per company in chemiq_company_product_catalog
Inventory records only store site/location-specific data (quantity, container size)
SDS/EPA labels attached to product, not individual inventory items

Important Columns:

company_product_id (UUID, PK): Primary identifier
company_id (UUID, FK, indexed, required): Company ownership
global_product_id (UUID, FK): Optional link to global catalog for enrichment

Product Identification:

product_name (String(255), required): Product name (company's naming)
manufacturer (String(255), required): Manufacturer name
product_code (String(100)): Company's internal product code
cas_number (String(50)): Primary CAS number
barcode_upc (String(100), indexed): UPC/EAN barcode (optional)
barcode_source (String(20)): How barcode was obtained: scanned, label_extracted, manual, api_validated
internal_sku (String(100), indexed): Company's internal product code/SKU (separate from UPC barcode)

Product Classification:

product_category (String(100)): Chemical, Pesticide, Cleaning, etc.
product_type (String(100)): Insecticide, Herbicide, Solvent, etc.

Pesticide Identification (for EPA-regulated products):

is_pesticide (Boolean, default=false): Flag indicating product is EPA-regulated (insecticide, herbicide, fungicide, disinfectant, etc.)
epa_registration_number (String(50), indexed): EPA Registration Number (e.g., 1234-567 or 1234-567-8901)
signal_word (String(20)): Required warning level: DANGER, WARNING, or CAUTION

Document Links (company-specific SDS/Label selection):

current_sds_id (UUID, FK): Links to chemiq_sds_documents
sds_attached_at (Timestamp): When SDS was attached
sds_attached_by_user_id (UUID, FK): User who attached SDS
sds_missing (Boolean, default=true): SDS missing flag
current_epa_label_id (UUID, FK): Links to chemiq_epa_label_documents
epa_label_attached_at (Timestamp): When EPA label was attached
epa_label_attached_by_user_id (UUID, FK): User who attached EPA label

Data Quality Tracking:

data_source (String(50), default='user_contributed'): api_validated, user_contributed, label_extracted, admin_curated
confidence_score (Numeric(3,2), default=0.5): 0.0-1.0 confidence
verification_status (String(20), default='unverified'): unverified, verified, needs_review, flagged
last_verified_at (Timestamp): Last verification timestamp
source_image_id (UUID): Link to label image if extracted via Vision LLM

Classification & Tagging:

hazard_categories (JSONB, default='[]', required): Denormalized array of GHS hazard categories derived from SDS (e.g., ["Flammable", "Acute Toxicity", "Corrosive"]). GIN-indexed for fast filtering.

Audit Fields:

is_active (Boolean, default=true): Active flag
created_at, updated_at (Timestamp)
created_by_user_id, updated_by_user_id (UUID, FK)

Full-Text Search:

search_vector (TSVECTOR): Auto-updated search index

Indexes:

GIN index on search_vector
GIN index on hazard_categories (for JSONB contains queries)
B-tree indexes on: company_id, global_product_id, product_name, manufacturer, barcode_upc, internal_sku, epa_registration_number
Composite index on (company_id, sds_missing) WHERE sds_missing = true
Partial index on is_pesticide WHERE is_pesticide = true (for efficient pesticide product queries)
Unique constraint on (company_id, product_name, manufacturer)
Unique constraint on (company_id, barcode_upc) WHERE barcode_upc IS NOT NULL - prevents duplicate barcodes per company

TSVECTOR Trigger:

CREATE TRIGGER tsvector_company_product_trigger BEFORE INSERT OR UPDATE
ON chemiq_company_product_catalog FOR EACH ROW EXECUTE FUNCTION
tsvector_update_trigger(search_vector, 'pg_catalog.english',
    product_name, manufacturer, product_code, cas_number);

Relationships:

Many-to-one: company (Companies)
Many-to-one: global_product (Product Catalog) - optional enrichment link
Many-to-one: current_sds (SDS Documents)
Many-to-one: current_epa_label (EPA Label Documents)
One-to-many: inventory_items (Inventory)

Use Cases:

Store company-specific product catalog (no duplication across sites)
Link products to SDSs and EPA labels at company level
Track data quality and verification status
Support barcode scanning with company-specific products first
Enable fuzzy matching and duplicate detection within company
Identify pesticide products for EPA label search (via is_pesticide flag and epa_registration_number)
Track barcode source to distinguish trusted (scanned, API-validated) vs manual entries
Store company's internal SKU/reference codes separate from UPC barcodes
Future: AI agent promotion to global catalog when 30%+ companies have same product

Example Data Flow:

User adds "Clorox Wipes" at Site 1:
  → Check chemiq_company_product_catalog (not found)
  → Check chemiq_product_catalog (not found)
  → Query external APIs (found!)
  → Create record in chemiq_company_product_catalog (company_product_id = prod-1)
  → Create record in chemiq_inventory (links to prod-1, site-1)

User adds same product at Site 2:
  → Check chemiq_company_product_catalog (FOUND: prod-1)
  → Create record in chemiq_inventory (links to prod-1, site-2)
  → NO product duplication!

Table 4: chemiq_product_images

Purpose: Stores product label images uploaded during inventory entry. Supports multiple images per product for curved/multi-sided labels where product information is spread across different angles.

Key Features:

Polymorphic association - can link to either company product catalog OR global product catalog
Multi-image support (1-5 images per product)
S3 storage with presigned URL generation
SHA-256 checksums for deduplication
Tracks images used for AI vision extraction
Image metadata (dimensions, MIME type, original filename)

Important Columns:

image_id (UUID, PK): Primary identifier
company_product_id (UUID, FK, indexed): Links to company product catalog (polymorphic)
global_product_id (UUID, FK, indexed): Links to global product catalog (polymorphic)

Image Order:

image_order (Integer, default=1): Order for multi-image displays

S3 Storage:

s3_bucket (String(100), required): AWS S3 bucket
s3_key (Text, required): S3 object key
file_checksum (String(64), indexed, required): SHA-256 hash for deduplication
file_size (BigInteger): File size in bytes
mime_type (String(50), default='image/jpeg'): MIME type (image/jpeg, image/png)
original_filename (String(255)): Original uploaded filename

Image Metadata:

image_width (Integer): Image width in pixels
image_height (Integer): Image height in pixels
image_type (String(50), default='label'): Image type (label, product, container, etc.)

Extraction Metadata (for Vision LLM):

used_for_extraction (Boolean, default=false): Flag if image was used for AI label extraction
extraction_confidence (Numeric(3,2)): Extraction confidence score (0.00-1.00)

Audit Fields:

uploaded_by_company_id (UUID, FK): Company that uploaded the image
uploaded_by_user_id (UUID, FK): User who uploaded the image
created_at (Timestamp): Upload timestamp

CHECK Constraint (Polymorphic Enforcement):

(company_product_id IS NOT NULL AND global_product_id IS NULL) OR
(company_product_id IS NULL AND global_product_id IS NOT NULL)

This constraint ensures each image belongs to exactly one product type (company or global), never both or neither.

Indexes:

B-tree indexes on: company_product_id, global_product_id, file_checksum

Relationships:

Many-to-one: company_product (Company Product Catalog) - polymorphic
Many-to-one: global_product (Product Catalog) - polymorphic
Many-to-one: uploaded_by_company (Companies)
Many-to-one: uploaded_by_user (Users)

Use Cases:

Store product label images during inventory entry
Support multi-image uploads for curved/multi-sided labels
Track which images were used for AI vision extraction
Enable image display when viewing products
Deduplicate images across products using checksums
Generate presigned S3 URLs for secure frontend display

Example Data Flow:

User uploads 3 label images during product entry:
  → LabelExtractionService extracts product info via Vision LLM
  → ProductImageService uploads images to S3
  → 3 records created in chemiq_product_images (linked to company_product_id)
  → used_for_extraction = true, extraction_confidence = 0.85

User opens product later:
  → API returns product with images
  → Presigned S3 URLs generated (1 hour expiration)
  → Frontend displays all 3 label images

Part 2: EPA Label Repository

Table 5: chemiq_epa_label_documents

Purpose: Global repository of EPA pesticide/fungicide labels with comprehensive label data.

Key Features:

Global EPA label storage (shared across companies)
Complete EPA label metadata (active ingredients, target pests, application instructions)
S3 storage for label PDFs
Version tracking with linked-list pattern
Full-text search on label content
EPA PPLS API integration support

Important Columns:

label_id (UUID, PK): Primary identifier
epa_reg_number (String(50), indexed, required): EPA registration number
epa_est_number (String(50)): EPA establishment number
product_name (String(255), indexed, required): Product name
manufacturer (String(255), indexed, required): Manufacturer name
manufacturer_address (Text): Full address
emergency_phone (String(50)): Emergency contact number

Active Ingredients:

active_ingredients (JSONB): Structured array:

[{"name": "Glyphosate", "cas": "1071-83-6", "percentage": 41.0}]

inert_ingredients_percentage (Numeric(5,2)): Percentage of inert ingredients

Label Classification:

signal_word (String(20), indexed): DANGER, WARNING, CAUTION
epa_product_type (String(100)): Insecticide, Herbicide, Fungicide, etc.
formulation_type (String(50)): Liquid, Granular, Wettable Powder, etc.

Hazard Information:

hazard_statements (Text[]): Array of hazard statements
precautionary_statements (Text[]): Array of precautionary statements
first_aid_measures (JSONB): Structured first aid instructions

Use Instructions:

target_pests (JSONB): Structured pest/application data:

[{"pest": "ants", "application_rate": "1-2 oz per gallon"}]

application_sites (Text[]): [residential, commercial, agricultural]
application_methods (Text[]): [spray, bait, dust]
restricted_entry_interval (String(100)): REI in hours
preharvest_interval (String(100)): PHI in days

File Storage (S3):

s3_bucket (String(100), required): AWS S3 bucket
s3_key (Text, required): S3 object key
file_checksum (String(64), indexed, required): SHA-256 hash
file_size (BigInteger): File size in bytes
mime_type (String(50), default='application/pdf')

Parsing Metadata:

label_parsed (Boolean, default=false): Parsing complete flag
parse_confidence (Numeric(5,2)): Parse confidence score (0-100)
parse_errors (Text): Parse error messages

Version Tracking:

label_version (String(50)): Label version number
revision_date (Date, indexed, required): Label revision date
expiration_date (Date): Label expiration date
previous_version_id (UUID, FK): Links to older version
superseded_by_id (UUID, FK): Links to newer version
is_current (Boolean, indexed, default=true): Current version flag

Source Tracking:

source_type (String(20)): ppls_api, epa_website, manual_upload
source_url (Text): Original source URL
ppls_last_synced (Timestamp): Last EPA PPLS sync time
contributed_by_company_id (UUID, FK): Contributing company

Full-Text Search:

search_vector (TSVECTOR): Auto-updated search index

Timestamps:

created_at, updated_at (Timestamp)

Indexes:

GIN index on search_vector
GIN index on active_ingredients (JSONB)
GIN index on target_pests (JSONB)
Unique constraint on (epa_reg_number, revision_date)
B-tree indexes on: epa_reg_number, product_name, manufacturer, signal_word, revision_date, is_current

TSVECTOR Trigger:

CREATE TRIGGER tsvector_update_trigger BEFORE INSERT OR UPDATE
ON chemiq_epa_label_documents FOR EACH ROW EXECUTE FUNCTION
tsvector_update_trigger(search_vector, 'pg_catalog.english',
    product_name, manufacturer, epa_reg_number, epa_product_type);

Relationships:

Self-referencing: previous_version, superseded_by (version chain)
One-to-many: company_mappings, product_links, parse_jobs, change_log

Use Cases:

Store EPA pesticide labels for lawn care, pest control companies
Track active ingredients and target pests
Search by EPA registration number
Version tracking for label updates
EPA PPLS API integration for automatic updates
Company-specific label library management

Table 6: chemiq_company_label_mappings

Purpose: Many-to-many relationship between companies and EPA labels, enabling the global repository model.

Key Features:

Links companies to EPA labels they use
Company-specific review status and notes
Compliance tracking (certifications, training)
Usage statistics
Alert management

Important Columns:

mapping_id (UUID, PK): Primary identifier
company_id (UUID, FK, indexed): Company identifier
label_id (UUID, FK, indexed): EPA label identifier

Usage Tracking:

mapped_to_inventory_count (Integer, default=0): Number of inventory items using this label

Company-Specific Metadata:

internal_notes (Text): Company-specific notes
review_status (String(20), indexed, default='pending'): pending, reviewed, approved, flagged
reviewed_by_user_id (UUID, FK): User who reviewed
reviewed_at (Timestamp): Review timestamp

Compliance Tracking:

applicator_certification_required (Boolean, default=false): Certification requirement flag
restricted_use (Boolean, default=false): Restricted use flag
company_training_completed (Boolean, default=false): Training completion flag

Alerts:

expiry_alert_sent (Boolean, default=false): Alert sent flag
last_alert_sent_at (Timestamp): Last alert timestamp

Timestamps:

first_mapped_at (Timestamp): When company first linked this label
last_accessed_at (Timestamp): Last access time

Indexes:

Unique constraint on (company_id, label_id)
B-tree index on review_status

Relationships:

Many-to-one: label_document (EPA Labels)
Many-to-one: reviewed_by_user (Users)

Use Cases:

Company-specific label libraries
Track which companies use which labels
Company review workflows
Compliance tracking for restricted use products
Usage analytics
Expiry alert management

Table 7: chemiq_label_parse_queue

Purpose: Job queue for asynchronous EPA label parsing operations.

Key Features:

Priority-based queue (1-10 scale)
Retry mechanism
Section-specific parsing
Error tracking

Important Columns:

job_id (UUID, PK): Job identifier
label_id (UUID, FK, indexed): Label to parse
job_status (String(20), indexed, default='pending'): pending, processing, completed, failed
priority (Integer, default=5): 1-10, higher = more urgent
parse_sections (Text[]): Sections to parse (active_ingredients, target_pests, etc.)
started_at, completed_at (Timestamp): Job timing
error_message (Text): Failure details
retry_count (Integer, default=0): Number of retry attempts
created_at (Timestamp)

Indexes:

Composite index on (job_status, priority DESC)

Relationships:

Many-to-one: label_document (EPA Labels)

Use Cases:

Background processing of uploaded labels
Retry failed parsing jobs
Prioritize parsing for high-priority products
Monitor parsing performance

Table 8: chemiq_label_change_log

Purpose: Audit trail for all EPA label document changes.

Key Features:

Complete audit history
Tracks user and company making changes
Stores diffs of changes
Change type categorization

Important Columns:

log_id (UUID, PK): Log entry identifier
label_id (UUID, FK, indexed): Label document
change_type (String(50)): created, updated, superseded, expired
changed_by_user_id (UUID, FK): User who made change
changed_by_company_id (UUID, FK): Company context
changes (JSONB): Diff of changes made
previous_values (JSONB): Previous state
created_at (Timestamp, indexed): When change occurred

Relationships:

Many-to-one: label_document (EPA Labels)
Many-to-one: changed_by_user (Users)

Use Cases:

Compliance audit trails
Track label updates over time
Investigate data quality issues
Generate change reports for regulators

Part 3: SDS Repository System

Table 9: chemiq_sds_documents

Purpose: Global repository of SDS documents with core metadata and S3 storage references.

Key Features:

Stores product identification (Section 1)
S3 bucket and key for PDF storage
SHA-256 checksum for deduplication
Full-text search via search_vector (TSVECTOR)
Version tracking with previous_version_id and superseded_by_id
Unique constraint on (manufacturer, product_name, revision_date)

Important Columns:

sds_id (UUID, PK): Primary identifier
product_name (String(255), indexed): Chemical product name
manufacturer (String(255), indexed): Manufacturer name
product_code (String(100)): Manufacturer's product code
manufacturer_address (Text): Full address
emergency_phone (String(50)): Emergency contact number

Document Metadata:

revision_date (Date, indexed, required): SDS revision date
revision_number (String(50)): Revision version number
document_language (String(10), default='en'): Document language

File Storage (S3):

s3_bucket (String(100), required): AWS S3 bucket
s3_key (Text, required): S3 object key
file_checksum (String(64), indexed, required): SHA-256 hash
file_size (BigInteger): File size in bytes
mime_type (String(50), default='application/pdf')

Parsing Metadata:

sds_parsed (Boolean, default=false): Parsing complete flag
parse_confidence (Numeric(5,2)): Parse confidence score
parse_errors (Text): Parse error messages (JSON array)
parsed_json (JSONB): Complete LLM-parsed SDS response for reference and debugging

Version Tracking:

previous_version_id (UUID, FK): Links to older version
superseded_by_id (UUID, FK): Links to newer version
is_current (Boolean, indexed, default=true): Current version flag

Source Tracking:

source_type (String(20)): web_search, manual_upload, manufacturer_api, ppls
source_url (Text): Original source URL
contributed_by_company_id (UUID, FK): Contributing company

Full-Text Search:

search_vector (TSVECTOR): Auto-updated search index

Timestamps:

created_at, updated_at (Timestamp)

Indexes:

GIN index on search_vector
Unique constraint on (manufacturer, product_name, revision_date)
B-tree indexes on: product_name, manufacturer, revision_date, file_checksum, is_current

TSVECTOR Trigger:

CREATE TRIGGER tsvector_update_trigger BEFORE INSERT OR UPDATE
ON chemiq_sds_documents FOR EACH ROW EXECUTE FUNCTION
tsvector_update_trigger(search_vector, 'pg_catalog.english',
    product_name, manufacturer, product_code);

Relationships:

Self-referencing: previous_version, superseded_by (version chain)
One-to-one: hazard_info
One-to-many: composition, sections, company_mappings, parse_jobs, change_log

Use Cases:

Global SDS storage shared across companies
Version tracking for SDS updates
Deduplication via checksums
Full-text search across all SDSs
S3 integration for PDF storage

Table 10: chemiq_sds_hazard_info

Purpose: Stores GHS (Global Harmonized System) hazard classification data from SDS Section 2.

Key Features:

GHS pictograms and signal words
H-codes (Hazard statements) as array
P-codes (Precautionary statements) as array
JSONB for structured hazard classes
GIN indexes for fast array and JSONB searches

Important Columns:

hazard_id (UUID, PK): Primary identifier
sds_id (UUID, FK, indexed): Links to SDS document
signal_word (String(20), indexed): "Danger" or "Warning"
pictograms (JSONB): Array of GHS pictogram codes (e.g., ["GHS02", "GHS07"])

hazard_classes (JSONB): Structured hazard data:

[{"class": "Flammable liquids", "category": "2", "code": "H225"}]

hazard_statements (Text[]): Array of H-codes (e.g., ["H225", "H319"])
precautionary_statements (Text[]): Array of P-codes (e.g., ["P210", "P233"])
supplemental_hazards (Text): Additional hazard information
hazard_description_vector (TSVECTOR): Full-text search on hazard descriptions
created_at (Timestamp)

Indexes:

GIN indexes on pictograms, hazard_classes, hazard_statements
B-tree index on signal_word

Relationships:

One-to-one: sds_document (SDS Documents)

Use Cases:

Search for chemicals by hazard class (flammable, corrosive, toxic)
Find all SDSs with specific pictograms (e.g., flame, skull)
Query by H-codes or P-codes
Generate hazard summaries for locations

Table 11: chemiq_sds_composition

Purpose: Stores chemical composition and ingredients from SDS Section 3.

Key Features:

Individual ingredient records with CAS numbers
Concentration ranges and exact values
Hazardous ingredient flagging
Support for trade secrets

Important Columns:

composition_id (UUID, PK): Primary identifier
sds_id (UUID, FK, indexed): Links to SDS document
chemical_name (String(255), required): Ingredient name
cas_number (String(50), indexed): Chemical Abstracts Service number
ec_number, reach_number (String(50)): European chemical identifiers
concentration_min, concentration_max (Numeric(5,2)): Concentration range
concentration_exact (Numeric(5,2)): Exact concentration
concentration_text (String(100)): Text representation (e.g., "< 1%")
is_trade_secret (Boolean, default=false): Protected ingredient flag
is_hazardous (Boolean, indexed, default=false): Hazardous ingredient flag
ingredient_hazards (Text[]): H-codes specific to this ingredient
created_at (Timestamp)

Indexes:

B-tree index on cas_number
B-tree index on is_hazardous

Relationships:

Many-to-one: sds_document (SDS Documents)

Use Cases:

Search for products containing specific chemicals (by CAS)
Find all hazardous ingredients across inventory
Calculate total hazardous chemical quantities
Generate ingredient disclosure reports

Table 12: chemiq_sds_sections

Purpose: Stores SDS sections 4-16 with both structured (JSONB) and raw text content.

Key Features:

Sections 4-16: First Aid, Fire-Fighting, Accidental Release, Handling & Storage, Exposure Controls, Physical/Chemical Properties, Stability, Toxicological Info, Ecological Info, Disposal, Transport, Regulatory, Other
JSONB for structured data extraction
Full-text search on section content
Check constraint: section_number between 1 and 16

Important Columns:

section_id (UUID, PK): Primary identifier
sds_id (UUID, FK, indexed): Links to SDS document
section_number (Integer, indexed, required): Section number (1-16)
section_title (String(255), required): Section title
section_data (JSONB): Structured parsed data
section_text (Text): Raw text content
section_text_vector (TSVECTOR): Full-text search index
parsed_at (Timestamp): When section was parsed

CHECK Constraint:

section_number >= 1 AND section_number <= 16

Indexes:

GIN indexes on section_data and section_text_vector
B-tree index on section_number
Unique constraint on (sds_id, section_number)

Relationships:

Many-to-one: sds_document (SDS Documents)

Use Cases:

Search for first aid measures across all SDSs
Query storage requirements by keyword
Extract disposal information for waste management
Generate emergency response guides

Table 13: chemiq_company_sds_mappings

Purpose: Many-to-many relationship between companies and SDSs, enabling the global repository model.

Key Features:

Tracks which companies use which SDSs
Company-specific review status
Usage statistics
Internal notes per company

Important Columns:

mapping_id (UUID, PK): Primary identifier
company_id (UUID, FK, indexed): Company identifier
sds_id (UUID, FK, indexed): SDS document identifier
mapped_to_inventory_count (Integer, default=0): Number of inventory items using this SDS
internal_notes (Text): Company-specific notes
review_status (String(20), indexed, default='pending'): pending, reviewed, approved, flagged
reviewed_by_user_id (UUID, FK): User who reviewed
reviewed_at (Timestamp): Review timestamp
expiry_alert_sent (Boolean, default=false): Alert tracking
last_alert_sent_at (Timestamp): Last alert timestamp
first_mapped_at (Timestamp): When company first linked this SDS
last_accessed_at (Timestamp): Last access time

Indexes:

Unique constraint on (company_id, sds_id)
B-tree index on review_status

Relationships:

Many-to-one: sds_document (SDS Documents)
Many-to-one: reviewed_by_user (Users)

Use Cases:

Isolate company data while sharing SDS repository
Track company-specific review workflows
Monitor SDS usage across companies
Send expiry alerts per company

Table 14: chemiq_sds_parse_queue

Purpose: Job queue for asynchronous SDS parsing operations.

Key Features:

Priority-based queue (1-10 scale)
Retry mechanism
Section-specific parsing
Error tracking

Important Columns:

job_id (UUID, PK): Job identifier
sds_id (UUID, FK, indexed): SDS to parse
job_status (String(20), indexed, default='pending'): pending, processing, completed, failed
priority (Integer, default=5): 1-10, higher = more urgent
parse_sections (Integer[], default=[1-16]): Which sections to parse
started_at, completed_at (Timestamp): Job timing
error_message (Text): Failure details
retry_count (Integer, default=0): Number of retry attempts
created_at (Timestamp)

Indexes:

Composite index on (job_status, priority DESC)

Relationships:

Many-to-one: sds_document (SDS Documents)

Use Cases:

Background processing of uploaded SDSs
Retry failed parsing jobs
Prioritize parsing for high-priority chemicals
Monitor parsing performance

Table 15: chemiq_sds_change_log

Purpose: Audit trail for all SDS document changes.

Key Features:

Complete audit history
Tracks user and company making changes
Stores diffs of changes
Change type categorization

Important Columns:

log_id (UUID, PK): Log entry identifier
sds_id (UUID, FK, indexed): SDS document
change_type (String(50)): created, updated, superseded, deprecated
changed_by_user_id (UUID, FK): User who made change
changed_by_company_id (UUID, FK): Company context
changes (JSONB): Diff of changes made
previous_values (JSONB): Previous state
created_at (Timestamp, indexed): When change occurred

Relationships:

Many-to-one: sds_document (SDS Documents)
Many-to-one: changed_by_user (Users)

Use Cases:

Compliance audit trails
Track SDS updates over time
Investigate data quality issues
Generate change reports for regulators

Table 16: chemiq_sds_web_search_cache

Purpose: Caches Perplexity API web search results for SDS documents to reduce API costs and improve response times.

Key Features:

Company-scoped cache entries
7-day TTL for cache entries
Normalized search criteria for matching
Tracks cache hits and access patterns
Links to imported SDS when search results are used

Important Columns:

cache_id (UUID, PK): Primary identifier
company_id (UUID, FK, indexed, required): Company that performed the search

Search Criteria (normalized for matching):

product_name_normalized (String(255), required): Normalized product name (lowercase, common suffixes removed)
manufacturer_normalized (String(255), required): Normalized manufacturer name
barcode_upc (String(100)): Optional barcode for more specific matching
search_type (String(20), default='sds', required): Type of search: sds, epa_label

Search Metadata:

search_query_used (Text, required): The actual query sent to Perplexity API
searched_at (Timestamp with TZ, required): When the search was performed

Cached Results:

results (JSONB, required): Array of WebSDSResult objects:

[
  {
    "title": "Product SDS - Manufacturer",
    "url": "https://example.com/sds.pdf",
    "snippet": "Safety Data Sheet for...",
    "source_domain": "example.com",
    "date": "2024-01-15",
    "is_pdf": true,
    "confidence_score": 0.85
  }
]

result_count (Integer, default=0): Number of results cached

Tracking:

created_by_user_id (UUID, FK): User who initiated the search
times_accessed (Integer, default=1): Number of times cache was accessed
last_accessed_at (Timestamp with TZ, required): Last access time

Cleanup Tracking:

sds_imported (Boolean, default=false): Whether an SDS was imported from these results
imported_sds_id (UUID, FK): Links to the imported SDS document

Indexes:

B-tree index on company_id
B-tree index on searched_at
B-tree index on sds_imported
Unique constraint on (company_id, product_name_normalized, manufacturer_normalized)

Relationships:

Many-to-one: company (Companies)
Many-to-one: created_by_user (Users)
Many-to-one: imported_sds (SDS Documents)

Use Cases:

Cache Perplexity API search results to avoid repeated API calls
Share cache within company (same product searches return cached results)
Track which searches led to SDS imports
Enable cache cleanup after successful imports
Monitor API usage patterns

Cache Strategy:

User searches for SDS on web
Check cache for matching (company_id, product_name_normalized, manufacturer_normalized)
If cache hit and not expired (< 7 days) and not imported, return cached results
If cache miss, call Perplexity API and cache results
When user imports SDS from results, mark cache as imported

Table 17: chemiq_sds_bulk_upload_jobs

Purpose: Tracks bulk upload jobs for ZIP files containing multiple SDS PDFs, enabling background processing of large-scale SDS imports.

Key Features:

Asynchronous processing for large uploads (500+ files)
Job status tracking (pending, extracting, processing, completed, failed)
Progress metrics (total files, processed, successful, failed, duplicates)
Detailed per-file results stored as JSONB
S3 storage for uploaded ZIP files
Company and user association for audit

Important Columns:

job_id (UUID, PK): Primary identifier
company_id (UUID, FK, indexed, required): Company that initiated the upload
uploaded_by_user_id (UUID, FK): User who uploaded the ZIP

File Information:

s3_bucket (String(100), required): S3 bucket for uploaded ZIP
s3_key (Text, required): S3 object key
file_name (String(255), required): Original ZIP filename
file_size (BigInteger): File size in bytes

Job Status:

job_status (String(20), indexed, default='pending'): pending, extracting, processing, completed, failed

Progress Tracking:

total_files (Integer, default=0): Total PDFs found in ZIP
processed_files (Integer, default=0): Number processed so far
successful_uploads (Integer, default=0): Successful uploads
failed_uploads (Integer, default=0): Failed uploads
duplicates_found (Integer, default=0): Duplicate files skipped

Results:

results (JSONB): Detailed per-file results array:

[
  {"filename": "product_a.pdf", "success": true, "sds_id": "uuid", "is_duplicate": false},
  {"filename": "product_b.pdf", "success": false, "error_message": "Invalid PDF", "is_duplicate": false}
]

error_message (Text): Job-level error message if failed

Timestamps:

created_at (Timestamp, required): When job was created
started_at (Timestamp): When processing started
completed_at (Timestamp): When processing completed

Indexes:

B-tree index on company_id
B-tree index on job_status

Relationships:

Many-to-one: company (Companies)
Many-to-one: uploaded_by_user (Users)

Use Cases:

Large-scale SDS library migrations (500+ documents)
Background processing that survives browser closure
Progress tracking and status polling from frontend
Detailed audit trail of bulk import operations
Error tracking for failed file processing

Processing Flow:

User uploads ZIP file via /bulk-upload-zip endpoint
ZIP is uploaded to S3, job record created with status='pending'
Background worker picks up pending jobs
Worker extracts ZIP, processes each PDF
Progress updated in real-time (processed_files, successful_uploads, etc.)
Frontend polls /bulk-upload-status/{job_id} every 5 seconds
Job marked completed or failed when done

Example Job Status Response:

{
  "job_id": "123e4567-e89b-12d3-a456-426614174000",
  "job_status": "processing",
  "file_name": "sds_library_2024.zip",
  "total_files": 500,
  "processed_files": 250,
  "successful_uploads": 240,
  "failed_uploads": 5,
  "duplicates_found": 5
}

Part 4: Inventory Integration

Table 18: chemiq_inventory (Simplified)

Purpose: Site/location-specific chemical inventory tracking. Now simplified to only store inventory-specific data, with product information moved to chemiq_company_product_catalog.

Schema Change Summary: After the company product catalog decomposition, this table is significantly simplified:

REMOVED: product_name, manufacturer, product_code, cas_number, barcode_upc (moved to company product catalog)
REMOVED: current_sds_id, sds_missing, sds_attached_at, sds_attached_by_user_id (moved to company product catalog)
REMOVED: current_epa_label_id, epa_label_attached_at, epa_label_attached_by_user_id (moved to company product catalog)
REMOVED: product_catalog_id (replaced by company_product_id)
ADDED: company_product_id (links to chemiq_company_product_catalog)

Key Columns (Simplified Structure):

chemical_id (UUID, PK): Primary identifier
company_product_id (UUID, FK, indexed, required): NEW - Links to chemiq_company_product_catalog
company_id (UUID, FK, indexed): Company identifier
site_id (UUID, FK, indexed): Site identifier
location_id (UUID, FK): Location identifier (optional)

Inventory-Specific Data (retained):

container_size (Numeric): Container size value
size_unit (String(20)): Unit of measure (gal, oz, L, kg, etc.)
quantity (Integer, default=1): Number of containers at this location

Note: batch_lot_number and expiration_date are stored in chemiq_inventory_attributes table, not directly in this table.

Audit Fields:

is_active (Boolean, default=true): Active flag
created_at, updated_at (Timestamp)
created_by_user_id, updated_by_user_id (UUID, FK)

Indexes:

B-tree indexes on: company_product_id, company_id, site_id, location_id
Composite index on (company_id, site_id, is_active)
Unique constraint on (company_product_id, site_id, COALESCE(location_id, NULL_UUID)) WHERE is_active = true - prevents duplicate products at same location

Relationships:

Many-to-one: company_product (Company Product Catalog) - PRIMARY LINK
Many-to-one: company, site, location
Many-to-one: created_by_user, updated_by_user
One-to-many: attributes, reviews_pending, label_images

Benefits of Simplified Structure:

Before (Denormalized)	After (Normalized)
Product data duplicated per site/location	Product data stored once per company
SDS attached per inventory item	SDS attached to product (shared across sites)
15+ columns per inventory record	~8 columns per inventory record
Inconsistent naming across sites	Single canonical name per product per company

Example:

Before (denormalized):
| chemical_id | site_id | product_name   | manufacturer | current_sds_id | quantity |
|-------------|---------|----------------|--------------|----------------|----------|
| inv-1       | site-1  | Clorox Wipes   | Clorox       | sds-123        | 5        |
| inv-2       | site-2  | Clorox Wipes   | Clorox       | sds-123        | 10       |  ← Duplicate!

After (normalized):
chemiq_company_product_catalog:
| company_product_id | product_name  | manufacturer | current_sds_id |
|-------------------|---------------|--------------|----------------|
| prod-1            | Clorox Wipes  | Clorox       | sds-123        |

chemiq_inventory:
| chemical_id | company_product_id | site_id | quantity |
|-------------|-------------------|---------|----------|
| inv-1       | prod-1            | site-1  | 5        |
| inv-2       | prod-1            | site-2  | 10       |  ← No duplication!

Use Cases:

Track inventory quantities per site/location
Link to company product catalog for product details
Support multi-site inventory management
Container and quantity tracking only
Simplified data model with no product duplication

Table 18b: chemiq_inventory_attributes

Purpose: Stores extended inventory item attributes such as batch/lot numbers, expiration dates, manufacturing dates, and storage conditions. Separated from the main inventory table to keep the core inventory record lightweight.

Key Features:

One-to-one relationship with inventory items
Batch/lot tracking for traceability
Expiration and manufacture date tracking
Storage condition notes
Cascading delete with parent inventory record

Important Columns:

Column	Type	Constraints	Description
attribute_id	UUID	PK	Primary identifier
chemical_id	UUID	FK, indexed, required	Links to chemiq_inventory
batch_lot_number	String(100)		Batch or lot number
expiration_date	Date		Product expiration date
manufacture_date	Date		Product manufacture date
storage_conditions	Text		Storage condition notes
notes	Text		General notes
created_at	Timestamp	required, default=now	Creation time

Indexes:

B-tree index on chemical_id

Relationships:

Many-to-one: chemical → ChemIQInventory (CASCADE delete)

Use Cases:

Track batch/lot numbers for product recalls
Monitor expiration dates for shelf life management
Record storage conditions for compliance
Support traceability requirements

Table 18c: chemiq_sds_reviews_pending

Purpose: Tracks inventory items that need SDS review — typically created when chemicals are added without an SDS attached, or when bulk imports identify gaps.

Key Features:

Tracks SDS review status per inventory item
Company and site scoped
Assignable to specific users for resolution
Status workflow: pending → resolved
Cascading delete with parent inventory, company, and site records

Important Columns:

Column	Type	Constraints	Description
review_id	UUID	PK	Primary identifier
chemical_id	UUID	FK, required	Links to chemiq_inventory
company_id	UUID	FK, required	Company scope
site_id	UUID	FK, indexed, required	Site scope
review_reason	String(50)	required	Reason for review (e.g., missing_sds, expired_sds)
status	String(20)	required, default='pending'	pending, resolved
assigned_to_user_id	UUID	FK	User assigned to resolve
created_at	Timestamp	required, default=now	Creation time
resolved_at	Timestamp		When review was resolved

Indexes:

B-tree index on site_id
B-tree index on status

Relationships:

Many-to-one: chemical → ChemIQInventory (CASCADE delete)
Many-to-one: company → Company (CASCADE delete)
Many-to-one: site → CompanySite (CASCADE delete)
Many-to-one: assigned_to_user → User (SET NULL on delete)

Use Cases:

Track missing SDS documents after bulk imports
Create review tasks for safety coordinators
Monitor SDS compliance gaps per site
Generate "Missing SDS" reports for Sentinel alerts
Support fix-and-retry workflow for bulk imports

Search and Performance Optimization

Full-Text Search Strategy

Primary Search Flow (Updated for Company Product Catalog):

User searches for product by name, manufacturer, barcode, CAS, EPA reg #
First: Query chemiq_company_product_catalog.search_vector (company's own products)
If not found: Query chemiq_product_catalog.search_vector (global catalog)
If not found: Query external APIs (Open Food Facts, UPC Database)
Product Catalog returns products with links to all documents
Optional: Drill down to SDS/Label search if needed

TSVECTOR Columns with GIN Indexes:

chemiq_company_product_catalog.search_vector - COMPANY SEARCH (product name, manufacturer, product code, CAS)
chemiq_product_catalog.search_vector - GLOBAL SEARCH (product name, manufacturer, brand, category)
chemiq_epa_label_documents.search_vector - Label content
chemiq_sds_documents.search_vector - SDS metadata
chemiq_sds_hazard_info.hazard_description_vector - Hazard descriptions
chemiq_sds_sections.section_text_vector - Section content

Auto-Update Triggers:

All search_vector columns have triggers for automatic updates on INSERT/UPDATE

Array and JSONB Search

GIN Indexes for Fast Queries:

chemiq_product_catalog.upc_barcodes (Text[])
chemiq_product_catalog.cas_numbers (Text[])
chemiq_epa_label_documents.active_ingredients (JSONB)
chemiq_epa_label_documents.target_pests (JSONB)
chemiq_sds_hazard_info.pictograms (JSONB)
chemiq_sds_hazard_info.hazard_classes (JSONB)
chemiq_sds_hazard_info.hazard_statements (Text[])
chemiq_sds_sections.section_data (JSONB)

Deduplication Strategy

Product Catalog:

No strict uniqueness (multiple sources can contribute same product)
Merge/deduplication handled at application level

SDS Documents:

Primary: SHA-256 checksum (file_checksum)
Secondary: Unique constraint on (manufacturer, product_name, revision_date)

EPA Label Documents:

Primary: SHA-256 checksum (file_checksum)
Secondary: Unique constraint on (epa_reg_number, revision_date)

Version Tracking

Linked-List Pattern

All document types (Product Catalog, SDS, EPA Labels) use the same version tracking pattern:

Fields:

previous_version_id: Links to older version
superseded_by_id: Links to newer version (or NULL if current)
is_current: Boolean flag for active version

Example Version Chain:

Doc v1.0 (2020) <- previous_version_id <- Doc v2.0 (2022) <- previous_version_id <- Doc v3.0 (2024)
    |                                          |                                          |
    superseded_by_id ->                  superseded_by_id ->                      is_current = true

Replacement Products (Product Catalog only):

replacement_product_id: Links to replacement product for discontinued items

Use Cases:

Track document revision history
Compare changes between versions
Alert companies when new versions available
Maintain compliance with version requirements
Handle product discontinuation and replacements

Unified Search Architecture

Example Search Flow

User searches: "Roundup Pro Max"

-- Step 1: Search Product Catalog (PRIMARY SEARCH)
SELECT
    p.product_id,
    p.product_name,
    p.manufacturer,
    p.epa_reg_number,
    p.cas_numbers,
    p.upc_barcodes
FROM chemiq_product_catalog p
WHERE p.search_vector @@ to_tsquery('english', 'roundup & pro & max')
  AND p.is_active = true
ORDER BY ts_rank(p.search_vector, to_tsquery('english', 'roundup & pro & max')) DESC
LIMIT 10;

-- Step 2: Get all documents for selected product
SELECT
    pd.document_type,
    pd.is_primary,
    pd.document_language,
    pd.document_region,
    s.product_name AS sds_product_name,
    s.revision_date AS sds_revision_date,
    l.product_name AS label_product_name,
    l.epa_reg_number,
    l.revision_date AS label_revision_date
FROM chemiq_product_documents pd
LEFT JOIN chemiq_sds_documents s ON pd.sds_id = s.sds_id
LEFT JOIN chemiq_epa_label_documents l ON pd.label_id = l.label_id
WHERE pd.product_id = :product_id
  AND pd.is_current = true;

-- Step 3: Check if company already has access
SELECT * FROM chemiq_company_sds_mappings
WHERE company_id = :company_id AND sds_id = :sds_id;

SELECT * FROM chemiq_company_label_mappings
WHERE company_id = :company_id AND label_id = :label_id;

-- Step 4: Create mappings if needed (application logic)
-- Step 5: Link to inventory item

Database Migrations

Migration 1: SDS Global Schema

File: 2025_12_03_1826-16a8bffcb66e_redesign_sds_global_schema.py

Tables Created:

chemiq_sds_documents
chemiq_sds_hazard_info
chemiq_sds_composition
chemiq_sds_sections
chemiq_company_sds_mappings
chemiq_sds_parse_queue
chemiq_sds_change_log

Updates: chemiq_inventory (added current_sds_id, sds_attached_at, sds_attached_by_user_id)

Migration 2: Product Catalog and EPA Labels

File: 2025_12_03_2346-e15d8abd9d92_add_product_catalog_and_epa_labels.py

Tables Created:

chemiq_product_catalog
chemiq_epa_label_documents
chemiq_product_documents
chemiq_company_label_mappings
chemiq_label_parse_queue
chemiq_label_change_log

Updates: chemiq_inventory (added product_catalog_id, current_epa_label_id, epa_label_attached_at, epa_label_attached_by_user_id)

Migration 3: Product Images

File: 2025_12_17_2333-9a5f77cdd3a6_add_chemiq_product_images_table.py

Tables Created:

chemiq_product_images

Key Features:

Polymorphic association to company or global product catalog
S3 storage for product label images
Multi-image support for curved/multi-sided labels
Tracks images used for Vision LLM extraction

Migration 4: Unique Constraints for Barcode and Inventory Location

File: 2025_12_25_1545-b7aa5678347e_add_unique_constraints_for_barcode_and_.py

Constraints Added:

uq_company_product_barcode on chemiq_company_product_catalog (company_id, barcode_upc) WHERE barcode_upc IS NOT NULL
uq_inventory_product_location on chemiq_inventory (company_product_id, site_id, COALESCE(location_id, NULL_UUID)) WHERE is_active = true

Key Features:

Prevents duplicate barcodes per company in product catalog
Prevents same product appearing twice at same location in inventory
Includes data cleanup logic to merge existing duplicates before applying constraints

Migration 5: Add search_type to SDS Web Search Cache

File: 2026_01_31_2019-8d1fcec31661_add_search_type_to_sds_web_search_cache.py

Changes:

Added search_type column (String(20), default='sds') to chemiq_sds_web_search_cache

Migration 6: Classification & Tagging

File: 2026_01_31_2200-a3c7f1e2d456_add_classification_and_tagging.py

Tables Created:

chemiq_company_tags - company-scoped custom tags
chemiq_product_tags - product-to-tag junction
chemiq_ai_hazard_summaries - AI hazard summaries

Column Added:

hazard_categories (JSONB, default='[]') on chemiq_company_product_catalog with GIN index

SQLAlchemy Models

SDS Models

Location: tellus-ehs-hazcom-service/app/db/models/chemiq_sds.py

Models:

SDSDocument - Main SDS model (includes parsed_json JSONB column for LLM responses)
SDSHazardInfo - Hazard information
SDSComposition - Chemical ingredients
SDSSection - SDS sections 4-16
CompanySDSMapping - Company relationships
SDSParseJob - Parse queue
SDSChangeLog - Audit trail
SDSWebSearchCache - Perplexity API search result cache
SDSBulkUploadJob - ZIP bulk upload job tracking

Product Catalog & EPA Label Models

Location: tellus-ehs-hazcom-service/app/db/models/chemiq_product_catalog.py

Models:

ProductCatalog - Master product repository
CompanyProductCatalog - Company-specific product catalog
ProductImage - Product label images (S3 storage, polymorphic to company/global products)
EPALabelDocument - EPA label repository
ProductDocument - Junction table (polymorphic)
CompanyLabelMapping - Company-label relationships
LabelParseJob - Parse queue
LabelChangeLog - Audit trail

Key Relationships:

ProductCatalog
├── product_documents (one-to-many) → ProductDocument
│   ├── sds_document (many-to-one) → SDSDocument
│   └── epa_label (many-to-one) → EPALabelDocument
├── images (one-to-many) → ProductImage
├── replacement_product (self-referencing)

CompanyProductCatalog
├── current_sds (many-to-one) → SDSDocument
├── current_epa_label (many-to-one) → EPALabelDocument
├── images (one-to-many) → ProductImage
├── global_product (many-to-one) → ProductCatalog
├── inventory_items (one-to-many) → ChemIQInventory
├── product_tags (one-to-many) → ProductTag
├── ai_hazard_summary (one-to-one) → AIHazardSummary

ProductImage
├── company_product (many-to-one) → CompanyProductCatalog (polymorphic)
├── global_product (many-to-one) → ProductCatalog (polymorphic)
├── uploaded_by_company (many-to-one) → Company
├── uploaded_by_user (many-to-one) → User

EPALabelDocument
├── company_mappings (one-to-many) → CompanyLabelMapping
├── product_links (one-to-many) → ProductDocument
├── parse_jobs (one-to-many) → LabelParseJob
├── change_log (one-to-many) → LabelChangeLog
├── previous_version (self-referencing)
└── superseded_by (self-referencing)

SDSDocument
├── hazard_info (one-to-one) → SDSHazardInfo
├── composition (one-to-many) → SDSComposition
├── sections (one-to-many) → SDSSection
├── company_mappings (one-to-many) → CompanySDSMapping
├── parse_jobs (one-to-many) → SDSParseJob
├── change_log (one-to-many) → SDSChangeLog
├── previous_version (self-referencing)
└── superseded_by (self-referencing)

Example Queries

Search product by barcode

SELECT * FROM chemiq_product_catalog
WHERE '885126012345' = ANY(upc_barcodes);

Find product and all its documents

SELECT
    p.product_name,
    p.manufacturer,
    pd.document_type,
    COALESCE(s.product_name, l.product_name) AS doc_product_name,
    COALESCE(s.revision_date, l.revision_date) AS doc_revision_date
FROM chemiq_product_catalog p
JOIN chemiq_product_documents pd ON p.product_id = pd.product_id
LEFT JOIN chemiq_sds_documents s ON pd.sds_id = s.sds_id
LEFT JOIN chemiq_epa_label_documents l ON pd.label_id = l.label_id
WHERE p.product_id = :product_id
  AND pd.is_current = true;

Search EPA labels by target pest

SELECT * FROM chemiq_epa_label_documents
WHERE target_pests @> '[{"pest": "ants"}]'::jsonb
  AND is_current = true;

Find all products containing a specific chemical (CAS)

SELECT
    p.product_name,
    p.manufacturer,
    c.chemical_name,
    c.cas_number,
    c.concentration_max
FROM chemiq_product_catalog p
JOIN chemiq_product_documents pd ON p.product_id = pd.product_id
JOIN chemiq_sds_documents s ON pd.sds_id = s.sds_id
JOIN chemiq_sds_composition c ON s.sds_id = c.sds_id
WHERE c.cas_number = '67-64-1'  -- Acetone
  AND p.is_active = true
  AND s.is_current = true;

Get company's label library with usage stats

SELECT
    l.epa_reg_number,
    l.product_name,
    l.manufacturer,
    l.signal_word,
    m.mapped_to_inventory_count,
    m.review_status,
    m.restricted_use
FROM chemiq_epa_label_documents l
JOIN chemiq_company_label_mappings m ON l.label_id = m.label_id
WHERE m.company_id = :company_id
  AND l.is_current = true
ORDER BY m.mapped_to_inventory_count DESC;

Compliance and Regulatory

OSHA Hazard Communication Standard (HazCom 2012)

Requirements Met:

Section 1: Product Identification ✓
Section 2: Hazard Classification (GHS) ✓
Section 3: Composition/Ingredients ✓
Sections 4-16: Required information ✓
Version tracking for updates ✓
Audit trail for changes ✓

EPA FIFRA (Pesticides)

Integration: chemiq_epa_label_documents table PPLS API: Automatic updates for pesticide labels via ppls_last_synced EPA Reg No: Tracked in both Product Catalog and EPA Labels Compliance Tracking: Company-specific certifications and training via chemiq_company_label_mappings

Data Retention

Active Documents: Kept indefinitely Superseded Documents: Retained for 30+ years (OSHA/EPA requirement) Change Logs: Retained for 7+ years (audit requirement) Discontinued Products: Flagged but never deleted (regulatory requirement)

Part 5: Bulk Import and Supporting Tables

Table 19: chemiq_inventory_uploads

Purpose: Tracks bulk upload operations for chemical inventory, providing import history, metrics, and audit trails.

Key Features:

Bulk import tracking for CSV and invoice PDF uploads
Import summary with detailed row-level metrics
Job status tracking (pending, processing, completed, failed)
Error tracking and reporting
SDS gap detection
Compliance audit trail

Important Columns:

upload_id (UUID, PK): Primary identifier
company_id (UUID, FK, indexed): Company that performed the upload
site_id (UUID, FK, indexed): Target site for the import
uploaded_by_user_id (UUID, FK): User who initiated the upload

File Information:

file_name (String(255), required): Original filename
file_type (String(20)): csv, pdf_invoice
s3_bucket (String(100)): S3 bucket for uploaded file
s3_key (Text): S3 object key
file_size (BigInteger): File size in bytes

Upload Status:

upload_status (String(20), indexed, default='pending'): pending, processing, completed, failed, partial_success

Import Metrics:

total_rows (Integer, default=0): Total rows in file
rows_created (Integer, default=0): Successfully created records
rows_updated (Integer, default=0): Updated existing records
rows_skipped (Integer, default=0): Skipped (duplicates or validation failures)
rows_errored (Integer, default=0): Failed to process

SDS Tracking:

unresolved_sds_count (Integer, default=0): Number of items without SDS attached
auto_matched_sds_count (Integer, default=0): Number of auto-matched SDSs

Processing Metadata:

processing_mode (String(20)): transactional, fix_and_retry

validation_errors (JSONB): Array of row-level validation errors:

[{"row": 5, "field": "product_name", "error": "Product name is required"}]

error_summary (Text): Human-readable error summary

Timing:

started_at (Timestamp): When processing started
completed_at (Timestamp): When processing completed
created_at (Timestamp): When upload was initiated

Indexes:

B-tree indexes on: company_id, site_id, uploaded_by_user_id, upload_status
Composite index on (company_id, created_at DESC) for history queries

Relationships:

Many-to-one: company (Companies)
Many-to-one: site (Sites)
Many-to-one: uploaded_by_user (Users)

Use Cases:

Track all bulk import operations for audit purposes
Provide detailed feedback to users about import success/failures
Monitor SDS gaps after bulk imports
Generate import history reports
Trigger Sentinel alerts for missing SDSs (Section 4.6.4)
Support partial imports with fix-and-retry workflow

Example Import Summary:

{
  "upload_id": "123e4567-e89b-12d3-a456-426614174000",
  "file_name": "chemical_inventory_2024.csv",
  "total_rows": 100,
  "rows_created": 85,
  "rows_updated": 10,
  "rows_skipped": 3,
  "rows_errored": 2,
  "unresolved_sds_count": 15,
  "upload_status": "completed"
}

Integration Points:

Section 4.6 (Bulk Upload) in Chemical Inventory requirements
Triggers Sentinel "Missing SDS Alert" when unresolved_sds_count > 0
Creates records in chemiq_sds_reviews_pending for unresolved items
Used by bulk import API endpoints for progress tracking

Table 22: background_job_logs

Purpose: Tracks execution of background jobs from tellus-ehs-background-service. Records job status, duration, processing counts, and any errors.

Key Features:

Job identification (name and type)
Status tracking with timestamps
Optional company/site scope
Processing statistics (records processed, success, failed)
Error tracking and detailed JSONB storage

Important Columns:

log_id (UUID, PK): Primary identifier
job_name (String(100), indexed, required): Name of the job (e.g., "sds_bulk_upload_processor")
job_type (String(50), required): Type of job (e.g., "scheduled", "triggered", "manual")
status (String(20), indexed, default='running'): running, completed, failed
started_at (Timestamp with TZ, indexed): When job started
completed_at (Timestamp with TZ): When job completed
duration_seconds (Numeric(10,3)): Job duration in seconds
company_id (UUID, FK, optional): For company-scoped jobs
site_id (UUID, FK, optional): For site-scoped jobs
records_processed (Integer, default=0): Total records processed
records_success (Integer, default=0): Successful records
records_failed (Integer, default=0): Failed records
error_message (Text): Error details if job failed
details (JSONB): Additional job-specific details

Indexes:

B-tree indexes on: job_name, status, started_at

Use Cases:

Track background job execution history
Monitor job performance and success rates
Debug failed jobs with error messages
Generate job execution reports
Audit trail for scheduled processing

Table 23: chemiq_label_images (Legacy)

Purpose: Stores OCR-extracted product label images uploaded during inventory entry. This is a legacy table that predates chemiq_product_images.

Key Features:

Links to inventory items
Stores image path
OCR extracted text and confidence score
Legacy support for older inventory entries

Important Columns:

Column	Type	Constraints	Description
image_id	UUID	PK	Primary identifier
chemical_id	UUID	FK, indexed	Links to inventory item
image_path	Text	required	Path to stored image
ocr_extracted_data	Text		OCR extracted text
ocr_confidence	Numeric		OCR confidence score
uploaded_at	Timestamp		Upload timestamp

Note: New implementations should use chemiq_product_images instead, which supports the company product catalog architecture.

Relationships:

Many-to-one: chemical (ChemIQ Inventory)

Use Cases:

Legacy inventory label image storage
OCR text extraction for product identification
Backward compatibility with older inventory entries

Table 24: ref_chemical_registry

Purpose: Reference table containing a master registry of chemicals with CAS numbers and molecular information.

Key Features:

Master chemical reference data
CAS number registry
Molecular formulas and weights
UN hazard numbers
State of matter classification

Important Columns:

Column	Type	Constraints	Description
chemical_id	UUID	PK	Primary identifier
cas_number	Text	indexed	CAS registry number
primary_name	Text		Primary chemical name
molecular_formula	Text		Chemical formula
molecular_weight	Numeric		Molecular weight
state_of_matter	Text		solid, liquid, gas
un_number	Text		UN hazard classification number
description	Text		Chemical description
created_at	Timestamp		Creation time
updated_at	Timestamp		Last update time

Indexes:

B-tree index on cas_number

Relationships:

One-to-many: synonyms (Chemical Synonyms)

Use Cases:

Master chemical reference lookup
CAS number validation
Hazard classification reference
Chemical property database

Table 25: ref_chemical_synonyms

Purpose: Stores alternative names and synonyms for chemicals in the registry.

Key Features:

Multiple names per chemical
Language support
Preferred name flagging
Synonym type categorization

Important Columns:

Column	Type	Constraints	Description
synonym_id	UUID	PK	Primary identifier
chemical_id	UUID	FK, indexed	Links to chemical registry
synonym	Text		Alternative name
synonym_type	Text		Type: trade_name, common_name, iupac, etc.
language_code	Text		ISO language code
is_preferred	Boolean		Preferred name flag
created_at	Timestamp		Creation time

Indexes:

B-tree index on chemical_id
B-tree index on synonym

Relationships:

Many-to-one: chemical (Chemical Registry)

Use Cases:

Chemical name lookup by synonym
Multi-language chemical names
Trade name to CAS number mapping
Search enhancement with synonyms

Part 6: Chemical Enrichment, Compliance & PPE Systems

This section documents the tables added for advanced chemical data enrichment, site quantity tracking, regulatory compliance checking, and PPE recommendation features.

Migration Files:

2026_01_15_1200-a1b2c3d4e5f7_add_chemical_enrichment_and_compliance_tables.py - 20 tables
2026_01_15_1300-b2c3d4e5f6g8_add_ghs_hazard_codes_reference.py - 2 tables

Table 26: chemiq_pubchem_cache

Purpose: Caches chemical property data fetched from PubChem and other external sources to reduce API calls and improve response times.

Key Features:

External chemical data cache (properties, exposure limits, GHS data)
Multi-source support (PubChem, NIOSH, manufacturer)
Priority-based refresh cycles (30/90/180 days)
Tracks fetch statistics and cache performance

Important Columns:

cache_id (UUID, PK): Primary identifier
cas_number (String(20), indexed, required): CAS registry number
pubchem_cid (Integer): PubChem Compound ID

Chemical Identification:

iupac_name (String(500)): IUPAC systematic name
common_name (String(255)): Common chemical name
molecular_formula (String(100)): Chemical formula
molecular_weight (Numeric): Molecular weight in g/mol
canonical_smiles (Text): Canonical SMILES notation for chemical structure
inchi (Text): IUPAC International Chemical Identifier (InChI)
inchi_key (String(27), indexed): InChIKey hash for fast structure lookups
chemical_family (String(100)): Chemical family classification
chemical_class (String(100)): Chemical class

Physical Properties:

vapor_pressure_mmhg (Numeric): Vapor pressure at 25°C
boiling_point_c (Numeric): Boiling point in Celsius
flash_point_c (Numeric): Flash point in Celsius
specific_gravity (Numeric): Specific gravity
water_solubility_mg_l (Numeric): Water solubility in mg/L

Exposure Limits:

osha_pel_ppm (Numeric): OSHA Permissible Exposure Limit (ppm)
osha_pel_mg_m3 (Numeric): OSHA PEL (mg/m³)
niosh_rel_ppm (Numeric): NIOSH Recommended Exposure Limit (ppm)
niosh_rel_mg_m3 (Numeric): NIOSH REL (mg/m³)
niosh_idlh_ppm (Numeric): NIOSH IDLH value
acgih_tlv_ppm (Numeric): ACGIH Threshold Limit Value

GHS Classification:

ghs_classification (JSONB): Structured GHS hazard data

Cache Metadata:

source (String(50)): Data source (pubchem, niosh, manufacturer)
source_url (String(500)): Source URL
fetched_at (Timestamp): When data was fetched
last_refreshed_at (Timestamp): Last refresh time
fetch_count (Integer): Number of times fetched
priority_level (Integer, 1-3): Cache priority (1=high, 3=low)
created_at, updated_at (Timestamp)

Indexes:

Unique constraint on cas_number
B-tree index on pubchem_cid
B-tree index on inchi_key (for structure-based lookups)

Use Cases:

Cache PubChem API responses to avoid rate limiting
Store NIOSH Pocket Guide data for exposure limits
Enable offline chemical property lookup
Pre-populate common industrial chemicals
Structure-based chemical matching using InChIKey

Table 27: chemiq_chemical_fetch_queue

Purpose: Background job queue for fetching chemical data from external APIs (PubChem, NIOSH).

Key Features:

Priority-based queue processing
Retry mechanism with max attempts
Tracks source context (SDS, inventory, manual)
Company-scoped requests

Important Columns:

queue_id (UUID, PK): Primary identifier
cas_number (String(20), indexed, required): CAS number to fetch
source_type (String(50), required): sds_parse, inventory_add, manual_request
source_id (UUID): Reference to source record
company_id (UUID, FK): Requesting company
priority (String(20)): high, normal, low
status (String(20), indexed): pending, processing, completed, failed
attempts (Integer): Current attempt count
max_attempts (Integer): Maximum retry attempts
last_attempt_at (Timestamp): Last attempt time
error_message (Text): Error details if failed
created_at (Timestamp)
processed_at (Timestamp): When processing completed

Indexes:

Composite index on (status, priority)
B-tree index on cas_number

Use Cases:

Queue chemical enrichment requests from SDS parsing
Batch process multiple chemical lookups
Retry failed API requests with backoff
Track enrichment request sources

Table 28: chemiq_cache_statistics

Purpose: Tracks cache performance metrics for monitoring and optimization.

Important Columns:

stat_id (UUID, PK): Primary identifier
period_start (Date, required): Period start date
period_end (Date, required): Period end date
total_lookups (Integer): Total cache lookups
cache_hits (Integer): Successful cache hits
cache_misses (Integer): Cache misses requiring API fetch
fetch_successes (Integer): Successful API fetches
fetch_failures (Integer): Failed API fetches
avg_fetch_time_ms (Numeric): Average fetch latency
top_chemicals (JSONB): Most frequently accessed chemicals
created_at (Timestamp)

Use Cases:

Monitor cache hit rates (target: >95%)
Track API performance and reliability
Identify frequently accessed chemicals for pre-caching
Generate performance reports

Table 29: chemiq_unit_conversions

Purpose: Reference table for unit conversion factors used in quantity aggregation.

Key Features:

Standard units: lbs (weight), gal (volume)
Bidirectional conversion factors
Supports weight, volume, and count units

Important Columns:

conversion_id (UUID, PK): Primary identifier
unit_code (String(20), required): Unit identifier (oz, gal, L, kg)
unit_name (String(50), required): Full unit name
unit_type (String(20), required): weight, volume
to_standard_factor (Numeric, required): Conversion factor to standard unit
standard_unit (String(20), required): Standard unit (lbs, gal)
abbreviation (String(10), required): Display abbreviation
created_at (Timestamp)

Seed Data (14 records):

Weight: oz, lbs, g, kg
Volume: fl_oz, cup, pt, qt, gal, mL, L

Use Cases:

Convert inventory quantities to standard units
Support Tier II reporting (lbs required)
Enable fire code threshold comparisons

Table 30: chemiq_product_density

Purpose: Stores density values for volume-to-weight conversion.

Important Columns:

density_id (UUID, PK): Primary identifier
cas_number (String(20), indexed): CAS number for chemical-specific density
product_id (UUID, FK): Product-specific density
density_value (Numeric, required): Density value
density_unit (String(20), required): Density unit (g/mL, lb/gal)
density_lbs_per_gal (Numeric, required): Normalized density
data_source (String(100)): Source of density data
created_at, updated_at (Timestamp)

Seed Data:

Water baseline: 8.34 lbs/gal

Use Cases:

Convert gallons to pounds for Tier II reporting
Calculate weight from volume measurements
Support fire code quantity calculations

Table 31: chemiq_site_chemical_summary

Purpose: Aggregated chemical quantities per site for Tier II reporting and threshold monitoring.

Key Features:

Per-chemical totals (lbs, gallons, containers)
Tracks contributing products
Fire code classification
Tier II threshold tracking

Important Columns:

summary_id (UUID, PK): Primary identifier
company_id (UUID, FK, required): Company identifier
site_id (UUID, FK, required): Site identifier
cas_number (String(20), required): Chemical CAS number
chemical_name (String(255), required): Chemical name
total_quantity_lbs (Numeric): Total weight in pounds
total_quantity_gal (Numeric): Total volume in gallons
total_containers (Integer): Container count
product_count (Integer): Number of contributing products
contributing_products (JSONB): Product details
is_hazardous (Boolean): Hazardous classification
hazard_classes (JSONB): GHS hazard classes
physical_state (String(20)): solid, liquid, gas
max_daily_quantity_lbs (Numeric): Max daily quantity (Tier II)
avg_daily_quantity_lbs (Numeric): Average daily quantity (Tier II)
days_on_site (Integer): Days present (Tier II)
fire_code_category (String(50)): Fire code hazard category
fire_code_class (String(20)): Fire code hazard class
storage_locations (JSONB): Storage location details
tier2_threshold_lbs (Integer): Tier II reporting threshold
fire_permit_threshold (Numeric): Fire permit threshold
exceeds_tier2_threshold (Boolean): Tier II threshold exceeded
exceeds_fire_threshold (Boolean): Fire threshold exceeded
last_calculated_at (Timestamp): Last aggregation time
calculation_method (String(50)): Calculation method used
created_at, updated_at (Timestamp)

Indexes:

Composite index on (company_id, site_id)
B-tree index on cas_number

Use Cases:

Generate Tier II reports
Monitor fire code compliance
Track site-level chemical totals
Alert on threshold exceedances

Table 32: chemiq_site_hazard_summary

Purpose: Aggregated quantities by hazard class per site for fire code compliance.

Important Columns:

summary_id (UUID, PK): Primary identifier
company_id (UUID, FK, required): Company identifier
site_id (UUID, FK, required): Site identifier
hazard_category (String(50), required): Hazard category (Flammable, Corrosive)
hazard_class (String(100), required): Specific hazard class
ghs_code (String(20)): GHS hazard code
fire_code_category (String(50)): IFC category
fire_code_class (String(20)): IFC class
total_quantity_lbs (Numeric): Total weight
total_quantity_gal (Numeric): Total volume
total_containers (Integer): Container count
chemical_count (Integer): Number of chemicals
chemicals (JSONB): Contributing chemical details
fire_permit_limit (Numeric): Fire permit threshold
exceeds_permit_limit (Boolean): Threshold exceeded
percent_of_limit (Numeric): Percentage of limit used
last_calculated_at (Timestamp)
created_at, updated_at (Timestamp)

Use Cases:

Fire department permit applications
Hazard class quantity tracking
Threshold monitoring by hazard type

Table 33: chemiq_site_quantity_history

Purpose: Daily snapshots of chemical quantities for Tier II reporting (max/avg daily calculations).

Key Features:

7-year retention for regulatory compliance
Daily automated snapshots
Supports Tier II max/avg calculations

Important Columns:

history_id (UUID, PK): Primary identifier
company_id (UUID, FK, required): Company identifier
site_id (UUID, FK, required): Site identifier
cas_number (String(20), required): Chemical CAS number
chemical_name (String(255), required): Chemical name
snapshot_date (Date, indexed, required): Snapshot date
quantity_lbs (Numeric): Quantity in pounds
quantity_gal (Numeric): Quantity in gallons
container_count (Integer): Container count
is_present (Boolean): Chemical present on this date
source (String(50)): Snapshot source (daily_snapshot, manual)
created_at (Timestamp)

Indexes:

Composite index on (company_id, site_id, snapshot_date)
B-tree index on cas_number

Use Cases:

Calculate Tier II max daily quantity
Calculate Tier II average daily quantity
Calculate days on site
Historical quantity tracking

Table 34: chemiq_fire_code_thresholds

Purpose: Reference table for fire code quantity limits by hazard category.

Important Columns:

threshold_id (UUID, PK): Primary identifier
hazard_category (String(50), required): Hazard category
hazard_class (String(20)): Specific class
storage_type (String(50), required): Storage type
max_quantity_gal (Numeric): Maximum gallons
max_quantity_lbs (Numeric): Maximum pounds
max_quantity_cu_ft (Numeric): Maximum cubic feet
code_reference (String(100)): Code citation
jurisdiction (String(50)): Jurisdiction (IFC, state)
notes (Text): Additional notes
created_at (Timestamp)

Seed Data (13 records): Based on IFC Table 5003.1.1(1) for flammable liquids, corrosives, oxidizers, etc.

Use Cases:

Fire code compliance checking
Permit threshold calculations
Automated threshold alerts

Table 35: chemiq_regulatory_lists

Purpose: Master regulatory list tracking chemicals on OSHA, EPA, IARC, and state lists.

Key Features:

OSHA PEL/specific standards tracking
EPA SARA 313 (TRI) and CERCLA RQ
California Prop 65
IARC/NTP carcinogen classifications

Important Columns:

list_id (UUID, PK): Primary identifier
cas_number (String(20), indexed, required): CAS number
chemical_name (String(255)): Chemical name

OSHA Fields:

is_osha_pel_listed (Boolean): Has OSHA PEL
has_osha_specific_standard (Boolean): Has substance-specific standard
osha_standard_citation (String(50)): Standard citation (e.g., 1910.1028)
is_niosh_rel_listed (Boolean): Has NIOSH REL
is_acgih_tlv_listed (Boolean): Has ACGIH TLV

EPA SARA 313 (TRI):

is_epa_sara_313 (Boolean): On TRI list
sara_313_threshold_lbs (Integer): Reporting threshold (default 10,000)
sara_313_category (String(50)): Category
sara_313_pbt (Boolean): Persistent bioaccumulative toxic

EPA CERCLA:

is_epa_cercla (Boolean): On CERCLA list
cercla_rq_lbs (Integer): Reportable quantity

California Prop 65:

is_california_prop65 (Boolean): On Prop 65 list
prop65_type (String(20)): cancer, reproductive, both
prop65_listing_date (Date): When listed
prop65_nsrl_ug (Numeric): No Significant Risk Level
prop65_madl_ug (Numeric): Maximum Allowable Dose Level

Carcinogen Classification:

is_carcinogen (Boolean): Known/suspected carcinogen
carcinogen_source (String(20)): IARC, NTP, OSHA
carcinogen_classification (String(20)): Group 1, 2A, 2B, etc.

EU REACH:

is_eu_reach (Boolean): On REACH list
reach_status (String(50)): REACH status

Metadata:

last_verified_at (Timestamp): Last verification date
source_urls (Text[]): Source URLs
notes (Text): Additional notes
created_at, updated_at (Timestamp)

Indexes:

Unique constraint on cas_number

Use Cases:

Compliance checking against regulatory lists
TRI reporting determination
Prop 65 warning requirements
Carcinogen identification

Table 36: chemiq_company_compliance_settings

Purpose: Company-specific compliance settings for customized checking.

Important Columns:

setting_id (UUID, PK): Primary identifier
company_id (UUID, FK, required): Company identifier
ships_to_california (Boolean): Ships to CA (triggers Prop 65)
employee_count (Integer): Employee count (affects TRI)
naics_code (String(10)): NAICS code (affects TRI)
use_niosh_rel_over_pel (Boolean): Use NIOSH REL instead of PEL
exposure_monitoring_frequency (String(20)): Monitoring frequency
notify_on_compliance_issues (Boolean): Enable notifications
compliance_notification_emails (Text[]): Notification recipients
created_at, updated_at (Timestamp)

Indexes:

Unique constraint on company_id

Use Cases:

Configure Prop 65 requirements
Set TRI reporting thresholds
Customize exposure monitoring
Enable compliance alerts

Table 37: chemiq_compliance_results

Purpose: Stores compliance check results per product/chemical.

Important Columns:

result_id (UUID, PK): Primary identifier
company_id (UUID, FK, required): Company identifier
site_id (UUID, FK): Site scope
inventory_id (UUID, FK): Inventory item
sds_id (UUID, FK): SDS document
check_date (Timestamp): When check was performed
compliance_score (Integer): 0-100 score
is_compliant (Boolean): Overall compliance status

Check Results (JSONB):

hazcom_result: HazCom compliance details
exposure_monitoring_result: Exposure monitoring requirements
tri_reporting_result: TRI reporting determination
cercla_result: CERCLA RQ status
prop65_result: Prop 65 requirements
carcinogen_result: Carcinogen classification

Action Items:

required_actions (Text[]): Required actions
warnings (Text[]): Warning messages
checked_by (String(50)): Check method (auto, manual)
created_at (Timestamp)

Indexes:

B-tree indexes on company_id, site_id, sds_id

Use Cases:

Store compliance check results
Track compliance history
Generate compliance reports
Identify action items

Table 38: ppe_glove_chemical_resistance

Purpose: Chemical-to-glove-material compatibility ratings.

Important Columns:

resistance_id (UUID, PK): Primary identifier
cas_number (String(20), indexed, required): Chemical CAS number
chemical_name (String(255), required): Chemical name
chemical_family (String(100)): Chemical family

Material Ratings (VG/G/F/P/NR):

nitrile_rating (String(10)): Nitrile rubber rating
butyl_rating (String(10)): Butyl rubber rating
neoprene_rating (String(10)): Neoprene rating
pvc_rating (String(10)): PVC rating
viton_rating (String(10)): Viton rating
latex_rating (String(10)): Natural latex rating

Breakthrough Times (minutes):

nitrile_breakthrough_min (Integer)
butyl_breakthrough_min (Integer)
neoprene_breakthrough_min (Integer)
viton_breakthrough_min (Integer)

Recommendations:

recommended_thickness_mil (Integer): Recommended thickness
source (String(255)): Data source
notes (Text): Additional notes
created_at, updated_at (Timestamp)

Seed Data (41 records): Common industrial chemicals with material ratings from Ansell Chemical Resistance Guide:

Solvents: Acetone, MEK, Toluene, Xylene, Hexane, Heptane, etc.
Alcohols: Methanol, Ethanol, Isopropanol, Butanol
Acids: Hydrochloric, Sulfuric, Nitric, Phosphoric, Acetic
Bases: Sodium Hydroxide, Potassium Hydroxide, Ammonia
Aldehydes: Formaldehyde, Glutaraldehyde
Chlorinated: Methylene Chloride, Trichloroethylene, Perchloroethylene

Use Cases:

Glove material selection
Breakthrough time calculations
PPE recommendation generation

Table 39: ppe_eye_protection_reference

Purpose: Hazard type to eye protection mapping.

Important Columns:

reference_id (UUID, PK): Primary identifier
hazard_type (String(50), required): Hazard type
ghs_hazard_code (String(10)): GHS code
minimum_protection (String(100), required): Minimum required
recommended_protection (String(100)): Recommended level
ansi_marking_required (String(20)): ANSI marking (Z87.1)
face_shield_recommended (Boolean): Face shield needed
full_face_respirator_option (Boolean): Can use full-face respirator
selection_reasoning (Text): Selection rationale
created_at (Timestamp)

Seed Data (8 records): Hazard types mapped to eye protection requirements:

chemical_splash: Safety goggles with indirect venting (Z87.1+D3)
dust: Safety goggles or glasses with side shields (Z87.1)
vapor: Safety goggles with indirect venting (Z87.1+D3)
corrosive_splash: Face shield + safety goggles (Z87.1+D3)
severe_eye_damage: Face shield + safety goggles (Z87.1+D3)
eye_irritation: Safety glasses with side shields (Z87.1)
general_chemical: Safety goggles (Z87.1+D3)
uv_radiation: Tinted safety glasses (Z87.1+U6)

Use Cases:

Eye protection selection
GHS-based recommendations
ANSI compliance

Table 40: ppe_respiratory_reference

Purpose: Chemical to respirator cartridge requirements.

Important Columns:

reference_id (UUID, PK): Primary identifier
cas_number (String(20), indexed): CAS number
chemical_name (String(255)): Chemical name
chemical_family (String(100)): Chemical family

Exposure Limits:

osha_pel_ppm (Numeric): OSHA PEL
osha_pel_mgm3 (Numeric): OSHA PEL (mg/m³)
niosh_rel_ppm (Numeric): NIOSH REL
niosh_rel_mgm3 (Numeric): NIOSH REL (mg/m³)
niosh_idlh_ppm (Numeric): IDLH value

Cartridge Selection:

cartridge_type (String(50)): OV, AG, OV/AG, etc.
cartridge_color (String(50)): Black, yellow, etc.
particulate_filter (String(20)): N95, P100, etc.

Physical Properties:

vapor_pressure_mmhg (Numeric): Vapor pressure
is_particulate (Boolean): Particulate hazard
oil_present (Boolean): Oil present

Respirator Requirements:

min_respirator_type (String(50)): APR, PAPR, SAR, SCBA
min_apf (Integer): Minimum APF required
requires_supplied_air (Boolean): Requires SAR/SCBA
special_considerations (Text): Special notes
source (String(100)): Data source
created_at, updated_at (Timestamp)

Seed Data (19 records): Common chemicals with respiratory protection requirements from NIOSH Pocket Guide:

Organic Solvents: Acetone, Toluene, Xylene, MEK, Hexane (OV cartridge, black)
Acid Gases: HCl, Sulfuric acid mist, Nitric acid (AG cartridge, white/yellow)
Ammonia: Ammonia (AM cartridge, green)
Formaldehyde: Formaldehyde (Formaldehyde cartridge, olive)
Particulates: Silica dust, Metal fumes (N95/P100 filters)
Multi-gas: Combination OV/AG cartridges for mixed exposures
High-hazard: IDLH chemicals requiring supplied air (SCBA/SAR)

Use Cases:

Respirator cartridge selection
APF calculations
IDLH determinations

Table 41: ppe_body_protection_reference

Purpose: Hazard to body protection level mapping.

Important Columns:

reference_id (UUID, PK): Primary identifier
hazard_type (String(50), required): Hazard type
ghs_hazard_code (String(10)): GHS code
minimum_protection (String(100), required): Minimum protection
recommended_protection (String(100)): Recommended level
protection_level (String(5)): EPA level (A, B, C, D)
min_material_type (String(50)): Minimum material
selection_reasoning (Text): Selection rationale
created_at (Timestamp)

Seed Data (8 records): Hazard severity to EPA protection level mapping:

Level D: General work, minimal hazard (standard coveralls)
Level C: Known contaminants, APR adequate (Tyvek + APR)
Level B: High concentrations, splash hazard (Tychem + SCBA)
Level A: Highest protection, IDLH/unknown (fully encapsulated + SCBA)
Plus specific hazard types: corrosive, oxidizer, flammable, toxic

Use Cases:

Body protection selection
EPA protection levels
Chemical suit requirements

Table 42: ppe_body_material_resistance

Purpose: Chemical-to-body-protection material compatibility.

Important Columns:

resistance_id (UUID, PK): Primary identifier
cas_number (String(20), indexed, required): CAS number
chemical_name (String(255), required): Chemical name

Material Ratings:

tyvek_rating (String(10)): Tyvek rating
tychem_qc_rating (String(10)): Tychem QC rating
tychem_2000_rating (String(10)): Tychem 2000 rating
tychem_4000_rating (String(10)): Tychem 4000 rating
tychem_6000_rating (String(10)): Tychem 6000 rating
saranex_rating (String(10)): Saranex rating

Breakthrough Times:

tychem_2000_breakthrough_min (Integer)
tychem_4000_breakthrough_min (Integer)
source (String(100)): Data source
created_at (Timestamp)

Seed Data (8 records): Common chemicals with DuPont Tyvek/Tychem material ratings:

Solvents: Acetone, Toluene, Xylene (Tychem 2000/4000 required)
Acids: Sulfuric acid, Hydrochloric acid (Tychem 4000/6000 required)
Bases: Sodium Hydroxide (Tychem QC/2000 adequate)
Water-based: General aqueous solutions (Tyvek adequate)
High-hazard: Concentrated acids/bases (Tychem 6000/Saranex required)

Use Cases:

Chemical suit material selection
Breakthrough time calculations

Table 43: ppe_products

Purpose: Catalog of available PPE products with specifications.

Important Columns:

product_id (UUID, PK): Primary identifier
product_name (String(255), required): Product name
manufacturer (String(100), required): Manufacturer
product_code (String(100)): SKU/model number
ppe_category (String(50), required): hand, eye, respiratory, body
ppe_type (String(50), required): gloves, goggles, respirator, coverall
ppe_subtype (String(50)): Specific subtype

Glove Fields:

glove_material (String(50)): Material type
glove_thickness_mil (Numeric): Thickness in mils
glove_length_inches (Numeric): Length
glove_ansi_level (Integer): Cut resistance level

Eye Protection Fields:

eye_ansi_marking (String(20)): ANSI marking
eye_lens_type (String(50)): Lens type
eye_has_side_shields (Boolean): Side shields
eye_indirect_vent (Boolean): Indirect venting

Respirator Fields:

respirator_type (String(50)): APR, PAPR, etc.
respirator_apf (Integer): Assigned Protection Factor
respirator_cartridge_type (String(50)): Cartridge type
respirator_filter_rating (String(10)): N95, P100, etc.
respirator_niosh_approved (Boolean): NIOSH approved

Body Protection Fields:

body_coverage (String(50)): Full, partial, etc.
body_material (String(50)): Material type
body_protection_level (String(5)): EPA level
body_seam_type (String(50)): Seam construction

Common Fields:

certifications (Text[]): Certifications list
size_options (String(100)): Available sizes
purchase_url (String(500)): Purchase link
price_range (String(50)): Price range
pack_quantity (Integer): Units per pack
is_active (Boolean): Active flag
created_at, updated_at (Timestamp)

Indexes:

B-tree indexes on ppe_category, ppe_type, manufacturer

Seed Data (20 records): Representative PPE products across all categories:

Gloves (5): Nitrile (Ansell Sol-Vex 37-175), Butyl (North B-174), Neoprene (Ansell Neox 09-430), PVC (Ansell Snorkel), Viton (North F-091)
Eye Protection (5): Safety glasses with side shields, Chemical splash goggles (indirect vent), Face shields, Full-face respirator visors, UV-protective glasses
Respirators (5): Half-face APR (3M 6000), Full-face APR (3M 6800), PAPR (3M Versaflo), N95 disposable (3M 8210), P100 cartridge set
Body Protection (5): Tyvek coverall, Tychem QC coverall, Tychem 2000 suit, Tychem 4000 suit, Apron (chemical resistant)

Use Cases:

PPE product catalog
Product recommendations
Specification matching

Table 44: ppe_sds_recommendations

Purpose: Generated PPE recommendations per SDS document.

Important Columns:

recommendation_id (UUID, PK): Primary identifier
sds_id (UUID, FK, required): SDS document

Glove Recommendation:

glove_material (String(50)): Recommended material
glove_thickness_mil (Integer): Recommended thickness
glove_ansi_level (Integer): Cut resistance
glove_reasoning (Text): Selection rationale
glove_confidence (Numeric): Confidence score

Eye Protection:

eye_protection_type (String(50)): Type recommended
eye_ansi_marking (String(20)): ANSI marking
eye_face_shield_required (Boolean): Face shield needed
eye_reasoning (Text): Selection rationale
eye_confidence (Numeric): Confidence score

Respiratory:

respirator_type (String(50)): Type recommended
respirator_cartridge (String(50)): Cartridge type
respirator_filter (String(10)): Filter rating
respirator_apf_required (Integer): Minimum APF
respirator_reasoning (Text): Selection rationale
respirator_confidence (Numeric): Confidence score

Body Protection:

body_protection_type (String(50)): Type recommended
body_material (String(50)): Material type
body_protection_level (String(5)): EPA level
body_reasoning (Text): Selection rationale
body_confidence (Numeric): Confidence score

Overall:

overall_confidence (Numeric): Overall confidence
generated_at (Timestamp): Generation time
generated_by (String(50)): Generation method (auto, manual)
last_reviewed_at (Timestamp): Last review
reviewed_by (UUID, FK): Reviewer
created_at (Timestamp)

Indexes:

Unique constraint on sds_id

Use Cases:

Store PPE recommendations per SDS
Track recommendation confidence
Enable human review workflow

Table 45: ppe_ghs_hazard_mapping

Purpose: Maps GHS hazard codes to PPE requirements (PPE-triggering codes only).

Important Columns:

mapping_id (UUID, PK): Primary identifier
ghs_code (String(10), required): GHS hazard code
hazard_statement (Text): Hazard statement text
hazard_category (String(50)): Hazard category

PPE Requirements:

eye_protection_min (String(50)): Minimum eye protection
eye_protection_recommended (String(50)): Recommended eye protection
hand_protection_min (String(50)): Minimum hand protection
respirator_required (Boolean): Respirator required
respirator_type_min (String(50)): Minimum respirator
body_protection_min (String(50)): Minimum body protection
notes (Text): Additional notes
created_at (Timestamp)

Seed Data (13 codes): H310, H311, H312, H314, H315, H317, H318, H319, H330, H331, H332, H334, H335

Use Cases:

Quick PPE lookup by GHS code
SDS-to-PPE mapping
Automated recommendations

Table 46: chemiq_ghs_hazard_codes

Purpose: Complete reference table for all GHS hazard codes (H-codes).

Key Features:

All physical hazards (H200 series)
All health hazards (H300 series)
All environmental hazards (H400 series)
Combined hazard statements
PPE triggering flags

Important Columns:

hazard_id (UUID, PK): Primary identifier
code (String(30), indexed, required): Hazard code
code_type (String(20), required): hazard, precautionary, combined
hazard_class (String(100)): Hazard class name
hazard_category (String(100)): Category designation
statement (Text, required): Hazard statement text
signal_word (String(20)): Danger, Warning
pictogram_codes (Text[]): GHS pictograms (GHS01-GHS09)
category_number (Integer): Category number
is_physical (Boolean): Physical hazard flag
is_health (Boolean): Health hazard flag
is_environmental (Boolean): Environmental hazard flag
triggers_ppe (Boolean): Triggers PPE requirement
ppe_categories (Text[]): PPE categories required
notes (Text): Additional notes
created_at (Timestamp)

Seed Data (94 codes):

29 physical hazards (H200 series)
57 health hazards (H300 series)
8 environmental hazards (H400 series)
13 combined hazard statements

Indexes:

Unique constraint on code
B-tree indexes on code_type, hazard_class

Use Cases:

GHS code validation
Hazard statement lookup
Pictogram determination
PPE requirement lookup

Table 47: chemiq_ghs_pictograms

Purpose: Reference table for GHS pictograms (GHS01-GHS09).

Important Columns:

pictogram_id (UUID, PK): Primary identifier
code (String(10), required): Pictogram code (GHS01-GHS09)
name (String(100), required): Pictogram name
symbol_description (Text, required): Symbol description
hazard_types (Text[]): Associated hazard types
created_at (Timestamp)

Seed Data (9 pictograms):

GHS01: Exploding Bomb
GHS02: Flame
GHS03: Flame Over Circle
GHS04: Gas Cylinder
GHS05: Corrosion
GHS06: Skull and Crossbones
GHS07: Exclamation Mark
GHS08: Health Hazard
GHS09: Environment

Use Cases:

Pictogram lookup
SDS label generation
Hazard visualization

Part 7: Classification & Tagging System

This section documents the tables added for chemical classification (GHS hazard categories), custom company tags, and AI-generated hazard summaries.

Migration File:

2026_01_31_2200-a3c7f1e2d456_add_classification_and_tagging.py

Table 48: chemiq_company_tags

Purpose: Company-scoped custom tags for organizing and categorizing chemicals.

Key Features:

Company-specific tag definitions with color coding
Unique tag names per company
Cascading delete when company is removed

Important Columns:

tag_id (UUID, PK): Primary identifier
company_id (UUID, FK, indexed, required): Company that owns this tag
name (String(50), required): Tag display name
color (String(7), default='#6B7280', required): Hex color code for display
description (String(255)): Optional tag description
created_by_user_id (UUID, FK): User who created the tag
created_at (Timestamp with TZ, default=now): Creation time
updated_at (Timestamp with TZ): Last update time

Indexes:

B-tree index on company_id
Unique constraint on (company_id, name)

Relationships:

Many-to-one: company → Company (CASCADE delete)
Many-to-one: created_by_user → User (SET NULL on delete)
One-to-many: product_tags → ProductTag

Use Cases:

Create custom organizational tags (e.g., "High Priority", "Needs Review", "Lab Only")
Color-coded tag display in inventory list and details page
Company-specific categorization beyond GHS hazard classes

Table 49: chemiq_product_tags

Purpose: Junction table linking company products to tags (many-to-many).

Key Features:

Links products to company tags
Tracks who assigned the tag
Cascading deletes from both product and tag sides

Important Columns:

id (UUID, PK): Primary identifier
company_product_id (UUID, FK, indexed, required): Links to company product catalog
tag_id (UUID, FK, indexed, required): Links to company tag
assigned_by_user_id (UUID, FK): User who assigned the tag
assigned_at (Timestamp with TZ, default=now): Assignment time

Indexes:

B-tree index on company_product_id
B-tree index on tag_id
Unique constraint on (company_product_id, tag_id)

Relationships:

Many-to-one: company_product → CompanyProductCatalog (CASCADE delete)
Many-to-one: tag → CompanyTag (CASCADE delete)
Many-to-one: assigned_by_user → User (SET NULL on delete)

Use Cases:

Assign multiple tags to a product
Filter inventory by tag
Track tag assignment history
Dashboard tag distribution charts

Table 50: chemiq_ai_hazard_summaries

Purpose: Stores AI-generated plain-English hazard summaries for products with parsed SDS data.

Key Features:

One summary per product (unique index on company_product_id)
Risk level classification (low/moderate/high/extreme)
Key hazards as JSONB array
Tracks which SDS and AI model were used for generation

Important Columns:

id (UUID, PK): Primary identifier
company_product_id (UUID, FK, unique index, required): Links to company product catalog
summary_text (Text, required): AI-generated plain-English hazard summary
handling_guidance (Text): AI-generated handling and storage guidance
risk_level (String(20)): Overall risk level: low, moderate, high, extreme
key_hazards (JSONB): Array of top hazard phrases (e.g., ["Highly flammable", "Causes skin irritation"])
model_used (String(50)): AI model used for generation (e.g., "gpt-4o", "claude-3")
generated_at (Timestamp with TZ, default=now): When summary was generated
sds_document_id (UUID, FK): SDS document used as source for generation

Indexes:

Unique index on company_product_id (one summary per product)

Relationships:

Many-to-one: company_product → CompanyProductCatalog (CASCADE delete)
Many-to-one: sds_document → SDSDocument (SET NULL on delete)

Use Cases:

Display AI-generated hazard summary on product details page
Risk level badges in inventory list
Handling guidance for employees without chemistry background
Regenerate summary when SDS is updated

Summary

The ChemIQ Complete Schema provides a comprehensive, unified architecture for managing chemical safety documents across multiple companies:

Key Benefits

Unified Search: Two-tier product catalog (company → global) for fast, relevant results
No Product Duplication: Company product catalog eliminates duplicate product data across sites/locations
Document Flexibility: One product can link to multiple document types (SDS, EPA Label, COA, tech sheets)
Global Repositories: SDSs and EPA Labels shared across companies (no duplication)
Company Isolation: Company product catalog and mapping tables maintain data separation
Data Quality Tracking: Confidence scores, verification status, and data source tracking
Performance: Extensive indexing (GIN, B-tree) for fast search across large datasets
Compliance: Complete audit trails, version tracking, and retention policies
Scalability: Partition-ready design for millions of documents
Integration: S3 storage, async parsing queues, EPA PPLS API, external barcode APIs

Total Tables: 50

Product Catalog System (4 tables):

chemiq_product_catalog (global shared catalog)
chemiq_product_documents (polymorphic junction table)
chemiq_company_product_catalog - company-specific products
chemiq_product_images - product label images for curved/multi-sided labels

EPA Label Repository (4 tables): 5. chemiq_epa_label_documents 6. chemiq_company_label_mappings 7. chemiq_label_parse_queue 8. chemiq_label_change_log

SDS Repository (9 tables): 9. chemiq_sds_documents 10. chemiq_sds_hazard_info 11. chemiq_sds_composition 12. chemiq_sds_sections 13. chemiq_company_sds_mappings 14. chemiq_sds_parse_queue 15. chemiq_sds_change_log 16. chemiq_sds_web_search_cache - caches Perplexity API web search results 17. chemiq_sds_bulk_upload_jobs - tracks ZIP bulk upload background jobs

Supporting Tables (8 tables): 18. chemiq_inventory (simplified - links to company product catalog) 19. chemiq_inventory_attributes - stores batch/lot numbers and expiration dates 20. chemiq_sds_reviews_pending 21. chemiq_inventory_uploads 22. background_job_logs - tracks background service job execution 23. chemiq_label_images - legacy label images (use chemiq_product_images for new implementations) 24. ref_chemical_registry - master chemical CAS number registry 25. ref_chemical_synonyms - chemical name synonyms and aliases

Chemical Enrichment (3 tables): 26. chemiq_pubchem_cache - PubChem/NIOSH chemical data cache 27. chemiq_chemical_fetch_queue - background fetch job queue 28. chemiq_cache_statistics - cache performance metrics

Site Quantity Aggregation (6 tables): 29. chemiq_unit_conversions - unit conversion factors 30. chemiq_product_density - volume-to-weight conversion 31. chemiq_site_chemical_summary - per-chemical site totals 32. chemiq_site_hazard_summary - per-hazard-class site totals 33. chemiq_site_quantity_history - daily snapshots for Tier II 34. chemiq_fire_code_thresholds - fire code limits

Compliance Checking (3 tables): 35. chemiq_regulatory_lists - OSHA/EPA/IARC/Prop65 lists 36. chemiq_company_compliance_settings - company compliance config 37. chemiq_compliance_results - compliance check results

PPE Recommendation (8 tables): 38. ppe_glove_chemical_resistance - glove material ratings 39. ppe_eye_protection_reference - eye protection mapping 40. ppe_respiratory_reference - respirator requirements 41. ppe_body_protection_reference - body protection levels 42. ppe_body_material_resistance - suit material ratings 43. ppe_products - PPE product catalog 44. ppe_sds_recommendations - generated PPE recommendations 45. ppe_ghs_hazard_mapping - GHS to PPE mapping

GHS Reference (2 tables): 46. chemiq_ghs_hazard_codes - complete GHS H-codes (94 codes) 47. chemiq_ghs_pictograms - GHS pictograms (9 pictograms)

Classification & Tagging (3 tables): 48. chemiq_company_tags - company-scoped custom tags with color coding 49. chemiq_product_tags - product-to-tag junction table 50. chemiq_ai_hazard_summaries - AI-generated plain-English hazard summaries

Seed Data Summary

Table	Records	Source	Description
`chemiq_unit_conversions`	14	Standard	Unit conversion factors (oz→lbs, gal→lbs, etc.)
`chemiq_fire_code_thresholds`	13	IFC 5003.1.1	Fire code quantity limits by hazard class
`chemiq_product_density`	1	Standard	Water baseline density
`chemiq_ghs_hazard_codes`	94	GHS Rev 9	Complete GHS H-codes (physical/health/environmental)
`chemiq_ghs_pictograms`	9	GHS Rev 9	All GHS pictograms (GHS01-GHS09)
`chemiq_pubchem_cache`	~300	NIOSH/PubChem	Pre-seeded common industrial chemicals with exposure limits
`ppe_glove_chemical_resistance`	41	Ansell Guide	Glove material ratings for common chemicals
`ppe_eye_protection_reference`	8	OSHA/ANSI	Hazard type to eye protection mapping
`ppe_respiratory_reference`	19	NIOSH Guide	Respirator cartridge requirements by chemical
`ppe_body_protection_reference`	8	EPA/OSHA	Hazard severity to protection level mapping
`ppe_body_material_resistance`	8	DuPont	Tyvek/Tychem material compatibility ratings
`ppe_products`	20	Various	Representative PPE product catalog
`ppe_ghs_hazard_mapping`	13	GHS/OSHA	GHS hazard codes to PPE requirements

Note: Seed data is populated via Alembic migrations in tellus-ehs-hazcom-service/alembic/versions/.

Schema Evolution Note

The addition of chemiq_company_product_catalog represents a key architectural improvement:

Before: Product data was denormalized in chemiq_inventory, causing duplication across sites/locations
After: Product data is normalized in company product catalog, inventory only stores site/location-specific quantities

This change supports the incremental product catalog building strategy outlined in BARCODE_PRODUCT_CATALOG_STRATEGY.md.

This unified architecture supports the core mission of the Tellus EHS platform: storing chemical inventory documents and generating actionable insights based on composition, hazards, and safety information.

Overview​

Key Design Principles​

Architecture Overview​

Four-Tier Document Management​

User Workflow​

Part 1: Product Catalog System​

Table 1: chemiq_product_catalog​

Table 2: chemiq_product_documents​

Table 3: chemiq_company_product_catalog (NEW)​

Table 4: chemiq_product_images​

Part 2: EPA Label Repository​

Table 5: chemiq_epa_label_documents​

Table 6: chemiq_company_label_mappings​

Table 7: chemiq_label_parse_queue​

Table 8: chemiq_label_change_log​

Part 3: SDS Repository System​

Table 9: chemiq_sds_documents​

Table 10: chemiq_sds_hazard_info​

Table 11: chemiq_sds_composition​

Table 12: chemiq_sds_sections​

Table 13: chemiq_company_sds_mappings​

Table 14: chemiq_sds_parse_queue​

Table 15: chemiq_sds_change_log​

Table 16: chemiq_sds_web_search_cache​

Table 17: chemiq_sds_bulk_upload_jobs​

Part 4: Inventory Integration​

Table 18: chemiq_inventory (Simplified)​

Table 18b: chemiq_inventory_attributes​

Table 18c: chemiq_sds_reviews_pending​

Search and Performance Optimization​

Full-Text Search Strategy​

Array and JSONB Search​

Deduplication Strategy​

Version Tracking​

Linked-List Pattern​

Unified Search Architecture​

Example Search Flow​

Database Migrations​

Migration 1: SDS Global Schema​

Migration 2: Product Catalog and EPA Labels​

Migration 3: Product Images​

Migration 4: Unique Constraints for Barcode and Inventory Location​

Migration 5: Add search_type to SDS Web Search Cache​

Migration 6: Classification & Tagging​

SQLAlchemy Models​

SDS Models​

Product Catalog & EPA Label Models​

Example Queries​

Search product by barcode​

Find product and all its documents​

Search EPA labels by target pest​

Find all products containing a specific chemical (CAS)​

Get company's label library with usage stats​

Compliance and Regulatory​

OSHA Hazard Communication Standard (HazCom 2012)​

EPA FIFRA (Pesticides)​

Data Retention​

Part 5: Bulk Import and Supporting Tables​

Table 19: chemiq_inventory_uploads​

Table 22: background_job_logs​

Table 23: chemiq_label_images (Legacy)​

Table 24: ref_chemical_registry​

Table 25: ref_chemical_synonyms​

Part 6: Chemical Enrichment, Compliance & PPE Systems​

Table 26: chemiq_pubchem_cache​

Table 27: chemiq_chemical_fetch_queue​

Table 28: chemiq_cache_statistics​

Table 29: chemiq_unit_conversions​

Table 30: chemiq_product_density​

Table 31: chemiq_site_chemical_summary​

Table 32: chemiq_site_hazard_summary​

Table 33: chemiq_site_quantity_history​

Table 34: chemiq_fire_code_thresholds​

Table 35: chemiq_regulatory_lists​

Table 36: chemiq_company_compliance_settings​

Table 37: chemiq_compliance_results​

Table 38: ppe_glove_chemical_resistance​

Table 39: ppe_eye_protection_reference​

Table 40: ppe_respiratory_reference​

Table 41: ppe_body_protection_reference​

Overview

Key Design Principles

Architecture Overview

Four-Tier Document Management

User Workflow

Part 1: Product Catalog System

Table 1: chemiq_product_catalog

Table 2: chemiq_product_documents

Table 3: chemiq_company_product_catalog (NEW)

Table 4: chemiq_product_images

Part 2: EPA Label Repository

Table 5: chemiq_epa_label_documents

Table 6: chemiq_company_label_mappings

Table 7: chemiq_label_parse_queue

Table 8: chemiq_label_change_log

Part 3: SDS Repository System

Table 9: chemiq_sds_documents

Table 10: chemiq_sds_hazard_info

Table 11: chemiq_sds_composition

Table 12: chemiq_sds_sections

Table 13: chemiq_company_sds_mappings

Table 14: chemiq_sds_parse_queue

Table 15: chemiq_sds_change_log

Table 16: chemiq_sds_web_search_cache

Table 17: chemiq_sds_bulk_upload_jobs

Part 4: Inventory Integration

Table 18: chemiq_inventory (Simplified)

Table 18b: chemiq_inventory_attributes

Table 18c: chemiq_sds_reviews_pending

Search and Performance Optimization

Full-Text Search Strategy

Array and JSONB Search

Deduplication Strategy

Version Tracking

Linked-List Pattern

Unified Search Architecture

Example Search Flow

Database Migrations

Migration 1: SDS Global Schema

Migration 2: Product Catalog and EPA Labels

Migration 3: Product Images

Migration 4: Unique Constraints for Barcode and Inventory Location

Migration 5: Add search_type to SDS Web Search Cache

Migration 6: Classification & Tagging

SQLAlchemy Models

SDS Models

Product Catalog & EPA Label Models

Example Queries

Search product by barcode

Find product and all its documents

Search EPA labels by target pest

Find all products containing a specific chemical (CAS)

Get company's label library with usage stats

Compliance and Regulatory

OSHA Hazard Communication Standard (HazCom 2012)

EPA FIFRA (Pesticides)

Data Retention

Part 5: Bulk Import and Supporting Tables

Table 19: chemiq_inventory_uploads

Table 22: background_job_logs

Table 23: chemiq_label_images (Legacy)

Table 24: ref_chemical_registry

Table 25: ref_chemical_synonyms

Part 6: Chemical Enrichment, Compliance & PPE Systems

Table 26: chemiq_pubchem_cache

Table 27: chemiq_chemical_fetch_queue

Table 28: chemiq_cache_statistics

Table 29: chemiq_unit_conversions

Table 30: chemiq_product_density

Table 31: chemiq_site_chemical_summary

Table 32: chemiq_site_hazard_summary

Table 33: chemiq_site_quantity_history

Table 34: chemiq_fire_code_thresholds

Table 35: chemiq_regulatory_lists

Table 36: chemiq_company_compliance_settings

Table 37: chemiq_compliance_results

Table 38: ppe_glove_chemical_resistance

Table 39: ppe_eye_protection_reference

Table 40: ppe_respiratory_reference

Table 41: ppe_body_protection_reference