S3 Storage Strategy
This document describes the S3 storage architecture for the Tellus EHS platform, following industry best practices for multi-tenant SaaS applications.
Overview
The platform uses a single S3 bucket per environment, with key prefixes providing both organization and tenant isolation. This layout is a common pattern for multi-tenant applications.
Why Single Bucket with Prefixes?
| Factor | Multiple Buckets | Single Bucket + Prefixes |
|---|---|---|
| AWS Limits | Buckets are capped by a per-account quota (historically 100 by default) | No limit on the number of prefixes |
| IAM Complexity | Separate policies per bucket | Single policy with prefix conditions |
| Cost | Same (per-GB pricing) | Same |
| Performance | No difference | Request rates scale per prefix (at least 3,500 writes and 5,500 reads per second per prefix) |
| Operations | Complex backup/replication | Simple single-bucket operations |
| Monitoring | Multiple metrics streams | Unified metrics with prefix filtering |
Bucket Naming
Environment-specific buckets ensure complete isolation between environments:
tellus-ehs-documents-dev # Development
tellus-ehs-documents-staging # Staging
tellus-ehs-documents-prod # Production
Directory Structure
tellus-ehs-documents-{env}/
│
├── chemiq/
│   │
│   ├── product-images/              # User-uploaded label photos
│   │   └── {company_id}/
│   │       └── {product_id}/
│   │           └── {image_id}.jpg
│   │
│   ├── sds/
│   │   ├── global/                  # Shared SDS repository (verified)
│   │   │   └── {sds_id}/
│   │   │       └── {revision_date}_{checksum}.pdf
│   │   │
│   │   └── company/                 # Company-uploaded (unverified)
│   │       └── {company_id}/
│   │           └── {sds_id}_{checksum}.pdf
│   │
│   └── epa-labels/
│       ├── global/                  # EPA label repository (verified)
│       │   └── {label_id}/
│       │       └── {version}_{checksum}.pdf
│       │
│       └── company/                 # Company-uploaded
│           └── {company_id}/
│               └── {label_id}_{checksum}.pdf
│
└── temp/                            # Short-lived processing files
    └── uploads/
        └── {company_id}/
            └── {upload_id}/
                └── {original_filename}
Key Patterns
Product Images
chemiq/product-images/{company_id}/{product_id}/{image_id}.jpg
- Use case: Label photos uploaded during inventory entry
- Isolation: By company_id prefix
- Deduplication: Via file_checksum in database (not filename)
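The deduplication check above depends on a checksum of the file contents. The document does not say which hash is used, so the following is a minimal sketch that assumes SHA-256; `file_checksum` and `is_duplicate` are hypothetical helper names, and the set lookup stands in for a query against the file_checksum column.

```python
import hashlib
from typing import BinaryIO, Set


def file_checksum(stream: BinaryIO, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA-256 digest of a binary stream (assumed algorithm)."""
    digest = hashlib.sha256()
    # Read fixed-size chunks so large uploads are never held fully in memory.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def is_duplicate(checksum: str, known_checksums: Set[str]) -> bool:
    """Stand-in for a database lookup on the file_checksum column."""
    return checksum in known_checksums
```

Because the image key uses {image_id} rather than the checksum, two uploads of the same photo get different keys; only a content hash stored alongside the record can detect them.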
SDS Documents (Global)
chemiq/sds/global/{sds_id}/{revision_date}_{checksum}.pdf
- Use case: Verified SDS documents shared across all companies
- Versioning: revision_date in path supports multiple versions
- Deduplication: Checksum in filename prevents duplicates
SDS Documents (Company)
chemiq/sds/company/{company_id}/{sds_id}_{checksum}.pdf
- Use case: Company-uploaded SDSs before verification
- Isolation: Full company_id isolation
- Workflow: Can be promoted to global after verification
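Promotion to the global repository implies mapping a company key onto the global key pattern. The sketch below shows one way that mapping could be derived from the two patterns in this document; `promoted_sds_key` is a hypothetical helper, and it assumes the checksum contains no underscore and that the revision date is supplied by the verification step.

```python
COMPANY_SDS_PREFIX = "chemiq/sds/company/"
GLOBAL_SDS_PREFIX = "chemiq/sds/global/"


def promoted_sds_key(company_key: str, revision_date: str) -> str:
    """Map a company SDS key to the global key it would have after promotion."""
    if not company_key.startswith(COMPANY_SDS_PREFIX) or not company_key.endswith(".pdf"):
        raise ValueError(f"not a company SDS key: {company_key!r}")
    # The remainder has the form "{company_id}/{sds_id}_{checksum}.pdf".
    parts = company_key[len(COMPANY_SDS_PREFIX):].split("/")
    if len(parts) != 2:
        raise ValueError(f"unexpected key depth: {company_key!r}")
    stem = parts[1][: -len(".pdf")]
    # Split on the last underscore so an sds_id containing underscores survives.
    sds_id, sep, checksum = stem.rpartition("_")
    if not sep or not sds_id or not checksum:
        raise ValueError(f"missing sds_id or checksum: {company_key!r}")
    return f"{GLOBAL_SDS_PREFIX}{sds_id}/{revision_date}_{checksum}.pdf"
```

For example, `promoted_sds_key("chemiq/sds/company/abc123/sds123_a1b2c3d4e5f6.pdf", "2024-01-15")` yields the global key with the same sds_id and checksum under the given revision date.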
EPA Labels (Global)
chemiq/epa-labels/global/{label_id}/{version}_{checksum}.pdf
- Use case: EPA label repository (from EPA PPLS API or verified uploads)
- Versioning: Label version in path
EPA Labels (Company)
chemiq/epa-labels/company/{company_id}/{label_id}_{checksum}.pdf
- Use case: Company-uploaded labels before verification
Temporary Files
temp/uploads/{company_id}/{upload_id}/{original_filename}
- Use case: Processing uploads before final storage
- Lifecycle: Expired by an S3 lifecycle rule after 1 day; S3 removes expired objects asynchronously, so deletion can lag the 24-hour mark
Configuration
Python Configuration
The S3 configuration is centralized in app/core/s3_config.py:
from app.core.s3_config import (
    S3_BUCKET,
    S3Prefixes,
    S3KeyGenerator,
    S3PresignedConfig
)

# Generate a product image key
key = S3KeyGenerator.product_image(
    company_id="abc123",
    product_id="prod456",
    image_id="img789",
    extension="jpg"
)
# Result: chemiq/product-images/abc123/prod456/img789.jpg

# Generate an SDS key
key = S3KeyGenerator.sds_global(
    sds_id="sds123",
    revision_date="2024-01-15",
    checksum="a1b2c3d4e5f6"
)
# Result: chemiq/sds/global/sds123/2024-01-15_a1b2c3d4e5f6.pdf
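The contents of app/core/s3_config.py are not reproduced in this document. As an illustration only, a key generator with the two methods used above might look like the following sketch; `SketchKeyGenerator` and the segment validation are assumptions, not the project's actual code.

```python
import re

# Assumption: IDs are plain tokens. Rejecting "/" and ".." keeps a caller
# from building a key that escapes its company prefix.
_SAFE_SEGMENT = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def _segment(value: str) -> str:
    if not _SAFE_SEGMENT.fullmatch(value) or ".." in value:
        raise ValueError(f"unsafe key segment: {value!r}")
    return value


class SketchKeyGenerator:
    """Hypothetical stand-in for S3KeyGenerator, for illustration only."""

    @staticmethod
    def product_image(company_id: str, product_id: str, image_id: str,
                      extension: str = "jpg") -> str:
        return (
            f"chemiq/product-images/{_segment(company_id)}/"
            f"{_segment(product_id)}/{_segment(image_id)}.{_segment(extension)}"
        )

    @staticmethod
    def sds_global(sds_id: str, revision_date: str, checksum: str) -> str:
        return (
            f"chemiq/sds/global/{_segment(sds_id)}/"
            f"{_segment(revision_date)}_{_segment(checksum)}.pdf"
        )
```

Centralizing key construction in one place means the prefix layout can change without touching callers, and validation happens before any key reaches S3.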
Environment Variables
# Optional: Override the bucket name
S3_BUCKET=custom-bucket-name
# Required: AWS credentials (or use IAM roles)
AWS_ACCESS_KEY_ID=xxx
AWS_SECRET_ACCESS_KEY=xxx
AWS_REGION=us-east-1
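One way the optional S3_BUCKET override could combine with the per-environment bucket names is sketched below. The `APP_ENV` variable and the `resolve_bucket` helper are assumptions made for the example; the document does not say how the environment is selected.

```python
import os
from typing import Mapping, Optional

DEFAULT_BUCKETS = {
    "dev": "tellus-ehs-documents-dev",
    "staging": "tellus-ehs-documents-staging",
    "prod": "tellus-ehs-documents-prod",
}


def resolve_bucket(environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the bucket name, honoring an S3_BUCKET override if present."""
    env = os.environ if environ is None else environ
    override = env.get("S3_BUCKET")
    if override:
        return override
    # APP_ENV is an assumed variable name for the deployment environment.
    app_env = env.get("APP_ENV", "dev")
    if app_env not in DEFAULT_BUCKETS:
        raise ValueError(f"unknown environment: {app_env!r}")
    return DEFAULT_BUCKETS[app_env]
```

Failing loudly on an unknown environment avoids silently writing production data to a development bucket, or the reverse.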
Security
IAM Policy Example
Company-scoped access (for direct S3 access if needed):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::tellus-ehs-documents-prod/chemiq/*/company/{company_id}/*",
        "arn:aws:s3:::tellus-ehs-documents-prod/chemiq/product-images/{company_id}/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": [
        "arn:aws:s3:::tellus-ehs-documents-prod/chemiq/sds/global/*",
        "arn:aws:s3:::tellus-ehs-documents-prod/chemiq/epa-labels/global/*"
      ]
    }
  ]
}
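Since {company_id} in the policy is a placeholder, the policy has to be rendered per company. A minimal sketch of that rendering follows; `company_policy` is a hypothetical helper, and the restriction on ID characters is an assumption meant to keep IAM wildcards such as `*` and `?` out of the resource ARNs.

```python
import json
import re

# Assumption: company IDs are simple tokens without IAM wildcard characters.
_COMPANY_ID = re.compile(r"[A-Za-z0-9_-]+")


def company_policy(bucket: str, company_id: str) -> dict:
    """Render the company-scoped policy for one company."""
    if not _COMPANY_ID.fullmatch(company_id):
        raise ValueError(f"invalid company_id: {company_id!r}")
    arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read/write access limited to the company's own prefixes.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [
                    f"{arn}/chemiq/*/company/{company_id}/*",
                    f"{arn}/chemiq/product-images/{company_id}/*",
                ],
            },
            {
                # Read-only access to the shared, verified repositories.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [
                    f"{arn}/chemiq/sds/global/*",
                    f"{arn}/chemiq/epa-labels/global/*",
                ],
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(company_policy("tellus-ehs-documents-prod", "abc123"), indent=2))
```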
Access Patterns
| Document Type | Write Access | Read Access |
|---|---|---|
| Product Images | Own company only | Own company only |
| SDS (Global) | Admin/System | All companies |
| SDS (Company) | Own company only | Own company only |
| EPA Labels (Global) | Admin/System | All companies |
| EPA Labels (Company) | Own company only | Own company only |
| Temp Files | Own company only | Own company only |
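The access table can also be enforced in the backend before a presigned URL is issued. The following sketch expresses the table as prefix checks; the function names and the `is_admin` flag are hypothetical, and it assumes the key layout described earlier in this document.

```python
from typing import Tuple

GLOBAL_PREFIXES: Tuple[str, ...] = (
    "chemiq/sds/global/",
    "chemiq/epa-labels/global/",
)


def company_prefixes(company_id: str) -> Tuple[str, ...]:
    """Prefixes a company owns. The trailing slash stops 'abc' matching 'abc123'."""
    return (
        f"chemiq/product-images/{company_id}/",
        f"chemiq/sds/company/{company_id}/",
        f"chemiq/epa-labels/company/{company_id}/",
        f"temp/uploads/{company_id}/",
    )


def can_read(key: str, company_id: str) -> bool:
    # Global documents are readable by every company.
    return key.startswith(GLOBAL_PREFIXES) or key.startswith(company_prefixes(company_id))


def can_write(key: str, company_id: str, is_admin: bool = False) -> bool:
    # Global documents are writable only by admin/system callers.
    if key.startswith(GLOBAL_PREFIXES):
        return is_admin
    return key.startswith(company_prefixes(company_id))
```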
Presigned URLs
All frontend access uses presigned URLs generated by the backend:
from app.core.s3_config import S3PresignedConfig
# Default expiration times
S3PresignedConfig.PRODUCT_IMAGE_EXPIRY = 3600 # 1 hour
S3PresignedConfig.SDS_DOCUMENT_EXPIRY = 3600 # 1 hour
S3PresignedConfig.UPLOAD_URL_EXPIRY = 900 # 15 minutes
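How the backend generates these URLs is not shown in this document. A minimal sketch using the boto3 S3 client method `generate_presigned_url` is given below; the wrapper names are hypothetical, and the client is passed in (for example `boto3.client("s3")`) so the functions stay easy to test.

```python
from typing import Any

# Expiry values mirror the defaults listed above.
PRODUCT_IMAGE_EXPIRY = 3600  # 1 hour
UPLOAD_URL_EXPIRY = 900      # 15 minutes


def presigned_download_url(s3_client: Any, bucket: str, key: str,
                           expires_in: int = PRODUCT_IMAGE_EXPIRY) -> str:
    """Return a time-limited GET URL for one object."""
    return s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )


def presigned_upload_url(s3_client: Any, bucket: str, key: str, content_type: str,
                         expires_in: int = UPLOAD_URL_EXPIRY) -> str:
    """Return a time-limited PUT URL; the client must send the same Content-Type."""
    return s3_client.generate_presigned_url(
        "put_object",
        Params={"Bucket": bucket, "Key": key, "ContentType": content_type},
        ExpiresIn=expires_in,
    )
```

The shorter upload expiry limits how long a leaked URL could be used to write into the bucket.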
S3 Bucket Configuration
Recommended Settings
- Versioning: Enabled (for audit trail and accidental deletion recovery)
- Encryption: SSE-S3 or SSE-KMS (at-rest encryption)
- Public Access: Blocked (all access via presigned URLs)
- CORS: Configured for frontend domain
Lifecycle Rules
{
  "Rules": [
    {
      "ID": "DeleteTempFiles",
      "Status": "Enabled",
      "Filter": {
        "Prefix": "temp/"
      },
      "Expiration": {
        "Days": 1
      }
    },
    {
      "ID": "TransitionOldVersions",
      "Status": "Enabled",
      "Filter": {
        "Prefix": ""
      },
      "NoncurrentVersionExpiration": {
        "NoncurrentDays": 90
      }
    }
  ]
}
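These rules could be applied from Python with the boto3 S3 client method `put_bucket_lifecycle_configuration`. The sketch below is one possible shape; `apply_lifecycle` and `check_rules` are hypothetical helpers, and the empty-prefix Filter on the second rule reflects that the API expects every rule to carry a Filter (or the legacy Prefix) element.

```python
from typing import Any, Dict

LIFECYCLE_CONFIGURATION: Dict[str, Any] = {
    "Rules": [
        {
            "ID": "DeleteTempFiles",
            "Status": "Enabled",
            "Filter": {"Prefix": "temp/"},
            "Expiration": {"Days": 1},
        },
        {
            "ID": "TransitionOldVersions",
            "Status": "Enabled",
            # An empty prefix applies the rule to the whole bucket.
            "Filter": {"Prefix": ""},
            "NoncurrentVersionExpiration": {"NoncurrentDays": 90},
        },
    ]
}


def check_rules(configuration: Dict[str, Any]) -> None:
    """Fail early if a rule lacks the fields this sketch relies on."""
    for rule in configuration["Rules"]:
        for field in ("ID", "Status", "Filter"):
            if field not in rule:
                raise ValueError(f"rule {rule.get('ID', '?')} is missing {field}")


def apply_lifecycle(s3_client: Any, bucket: str) -> None:
    """Replace the bucket's lifecycle configuration with the rules above."""
    check_rules(LIFECYCLE_CONFIGURATION)
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=LIFECYCLE_CONFIGURATION,
    )
```

Note that this call replaces the whole configuration, so any rule not listed here would be removed from the bucket.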
CORS Configuration
[
  {
    "AllowedHeaders": ["*"],
    "AllowedMethods": ["GET", "PUT", "POST"],
    "AllowedOrigins": [
      "https://app.tellus-ehs.com",
      "https://staging.tellus-ehs.com",
      "http://localhost:5174"
    ],
    "ExposeHeaders": ["ETag"],
    "MaxAgeSeconds": 3600
  }
]
Cost Optimization
- Storage Classes:
  - Standard: Active documents (< 30 days)
  - Intelligent-Tiering: SDSs and EPA labels (automatically optimized)
- Cost Allocation Tags:
  - company_id: For per-customer cost tracking
  - document_type: For cost analysis by document type
- Request Optimization:
  - Use multipart uploads for files > 5 MB
  - Implement client-side caching with presigned URL expiry
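For the multipart recommendation, a small planning helper can make the size rules concrete. The sketch below assumes an 8 MiB part size (an arbitrary choice for the example) and uses two S3 limits: every part except the last must be at least 5 MiB, and an upload can have at most 10,000 parts. `plan_parts` is a hypothetical name.

```python
import math
from typing import List, Tuple

MIN_PART_SIZE = 5 * 1024 * 1024  # S3 minimum for every part except the last
MAX_PARTS = 10_000               # S3 maximum number of parts per upload


def plan_parts(file_size: int, part_size: int = 8 * 1024 * 1024) -> List[Tuple[int, int]]:
    """Return (offset, length) per part; an empty list means use a single PUT."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB minimum")
    if file_size <= MIN_PART_SIZE:
        return []  # small file: a plain PutObject is enough
    if math.ceil(file_size / part_size) > MAX_PARTS:
        raise ValueError("too many parts; increase part_size")
    return [
        (offset, min(part_size, file_size - offset))
        for offset in range(0, file_size, part_size)
    ]
```

A 20 MiB file with the default part size is split into two 8 MiB parts followed by one 4 MiB part.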
Monitoring
CloudWatch Metrics to Track
- BucketSizeBytes - Total storage
- NumberOfObjects - Object count
- 4xxErrors - Client errors (access denied, etc.)
- 5xxErrors - Server errors
Alerts
- Storage threshold: Alert when bucket size exceeds threshold
- Error rate: Alert on elevated 4xx/5xx error rates
- Unusual access patterns: Alert on access from unexpected regions
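As an illustration of the storage-threshold alert, the sketch below builds parameters for the boto3 CloudWatch client method `put_metric_alarm`. The alarm name, the SNS topic, and the helper names are assumptions; the StorageType dimension shown covers only the Standard storage class, so other classes would need their own alarms.

```python
from typing import Any, Dict


def storage_alarm_params(bucket: str, threshold_bytes: int,
                         sns_topic_arn: str) -> Dict[str, Any]:
    """Build put_metric_alarm arguments for a bucket-size threshold alarm."""
    return {
        "AlarmName": f"{bucket}-size-threshold",  # hypothetical naming scheme
        "Namespace": "AWS/S3",
        "MetricName": "BucketSizeBytes",
        "Dimensions": [
            {"Name": "BucketName", "Value": bucket},
            {"Name": "StorageType", "Value": "StandardStorage"},
        ],
        "Statistic": "Average",
        # BucketSizeBytes is reported once per day, so use a one-day period.
        "Period": 86400,
        "EvaluationPeriods": 1,
        "Threshold": float(threshold_bytes),
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }


def create_storage_alarm(cloudwatch_client: Any, bucket: str,
                         threshold_bytes: int, sns_topic_arn: str) -> None:
    """Create or update the alarm, e.g. with boto3.client('cloudwatch')."""
    cloudwatch_client.put_metric_alarm(
        **storage_alarm_params(bucket, threshold_bytes, sns_topic_arn)
    )
```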
Migration Notes
If migrating from a different structure:
- Create new keys following the pattern
- Copy objects to new locations
- Update database references (s3_key columns)
- Delete old objects after verification
- Update lifecycle rules
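The copy, update, verify, and delete steps above can be sketched per object as follows. This is an assumption-laden outline rather than the project's migration tooling: `migrate_object` and the `update_db` callback are hypothetical, verification is reduced to a size comparison, and the boto3 `copy_object` call used here only handles objects up to 5 GB.

```python
from typing import Any, Callable


def migrate_object(s3_client: Any, bucket: str, old_key: str, new_key: str,
                   update_db: Callable[[str, str], None]) -> None:
    """Move one object to its new key, deleting the old copy only at the end."""
    # 1. Copy to the new location within the same bucket.
    s3_client.copy_object(
        Bucket=bucket,
        Key=new_key,
        CopySource={"Bucket": bucket, "Key": old_key},
    )
    # 2. Verify the copy before touching references (size check only here).
    old_size = s3_client.head_object(Bucket=bucket, Key=old_key)["ContentLength"]
    new_size = s3_client.head_object(Bucket=bucket, Key=new_key)["ContentLength"]
    if old_size != new_size:
        raise RuntimeError(f"size mismatch copying {old_key} to {new_key}")
    # 3. Point the s3_key column at the new key.
    update_db(old_key, new_key)
    # 4. Remove the old object once everything else has succeeded.
    s3_client.delete_object(Bucket=bucket, Key=old_key)
```

Deleting last means a failure at any earlier step leaves the original object and its database reference intact.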
Related Files
- Configuration: tellus-ehs-hazcom-service/app/core/s3_config.py
- Product Image Service: tellus-ehs-hazcom-service/app/services/chemiq/product_image_service.py
- Database Schema: ChemIQ Schema (Table 4: chemiq_product_images)