About the Author
Elvin Suleymanov – Software Engineer | C# & .NET Specialist | Systems Thinking
Passionate Software Engineer with expertise in C# (.NET), Python, TypeScript, PostgreSQL, and cloud-native technologies. Experience in building high-performance applications, optimizing databases, and delivering solutions that provide real business value.
"Clean code, solid architecture, and real business value – that's my development philosophy."
Abstract
Modern web applications face a fundamental architectural decision when handling file storage: should binary data be stored directly in relational databases or delegated to specialized object storage systems? This research presents a comprehensive investigation of file storage strategies in ASP.NET applications, comparing PostgreSQL-based storage against Azure Blob Storage (emulated via Azurite) for large binary objects.
Through systematic benchmarking and hardware-level analysis, we examine performance characteristics including upload throughput, read latency, backup efficiency, and resource utilization. Our experiments reveal significant trade-offs between database storage and object storage, with implications for write amplification, cache behavior, and operational complexity.
The study provides quantitative evidence for storage decision-making, considering factors such as file size distribution, access patterns, and hardware constraints. We present a reproducible experimental framework using Docker containers, enabling researchers and practitioners to validate and extend our findings.
Key contributions include: (1) a comparative performance analysis of database vs. object storage, (2) hardware-aware considerations for storage decisions, (3) a complete reproducible research artifact, and (4) real-world deployment guidance for Azerbaijan-specific contexts.
Our results demonstrate that object storage (Azurite/Azure Blob) provides superior performance for large files (>1MB) and high-throughput scenarios, while database storage offers transactional consistency and simpler operational models for smaller files. The choice depends on specific application requirements, scale, and operational constraints.
Keywords: File Storage, PostgreSQL, Object Storage, Azure Blob Storage, Performance Benchmarking, System Architecture
1. Introduction
1.1 Background
The proliferation of user-generated content, document management systems, and media-rich applications has made efficient file storage a critical concern for modern web applications. As applications scale, the decision of where and how to store binary data becomes increasingly important, affecting performance, cost, maintainability, and user experience.
Traditional approaches have favored storing files directly in relational databases, leveraging transactional guarantees and simplified data models. However, as file sizes grow and storage requirements scale, alternative architectures using specialized object storage systems have gained prominence. These systems, exemplified by Amazon S3, Azure Blob Storage, and Google Cloud Storage, are designed specifically for large-scale binary data storage.
1.2 Problem Domain
ASP.NET applications, particularly those deployed in enterprise and government contexts, must balance multiple competing requirements:
- Performance: Fast upload and retrieval of files
- Reliability: Data durability and availability
- Scalability: Handling growing storage and traffic demands
- Cost: Efficient resource utilization
- Operational Simplicity: Manageable deployment and maintenance
- Compliance: Meeting regulatory and security requirements
The choice between database storage and object storage impacts all of these dimensions, yet clear guidance based on empirical evidence is often lacking.
1.3 Research Objectives
This research aims to:
- Quantify Performance Differences: Measure upload throughput, read latency, and resource utilization for both storage strategies
- Analyze Hardware Implications: Understand how storage decisions affect IO operations, write amplification, and cache behavior
- Provide Reproducible Framework: Create a complete experimental setup that others can use to validate and extend findings
- Offer Practical Guidance: Translate research findings into actionable recommendations for system architects
1.4 Scope and Limitations
This study focuses on:
- Technology Stack: ASP.NET Core, PostgreSQL, and Azure Blob Storage (via Azurite)
- File Sizes: Small (<100KB) to large (>10MB) binary files
- Access Patterns: Upload, download, and backup operations
- Deployment Context: Docker-based containerized environments
Limitations include:
- Single geographic region (Azerbaijan context)
- Emulated object storage (Azurite rather than production Azure)
- Limited to specific hardware configurations in test environment
- Focus on technical performance rather than cost analysis
1.5 Article Structure
The remainder of this article is organized as follows:
- Section 2: Related work and literature review
- Section 3: Problem statement and research questions
- Section 4: Hardware storage internals (PostgreSQL pages, TOAST, WAL)
- Section 5: System architecture and design
- Section 6: Experimental setup and methodology
- Section 7: Implementation details
- Section 8: Results and analysis
- Section 9: Discussion and interpretation
- Section 10: Real-world context (Azerbaijan deployment)
- Section 11: Ownership models and lessons learned
- Section 12: Conclusion
- Section 13: References
2. Related Work
2.1 Database Storage for Binary Data
The practice of storing binary data in relational databases has been extensively studied. Stonebraker et al. [1] examined the trade-offs of storing large objects (LOBs) in databases, identifying performance degradation as object sizes increase. PostgreSQL's TOAST (The Oversized-Attribute Storage Technique) mechanism [2] addresses this by automatically moving large values to separate storage, but still maintains them within the database system.
Recent work by Pavlo et al. [3] on database storage engines has highlighted the write amplification issues inherent in B-tree structures when handling large binary data. The research demonstrates that traditional database storage can lead to significant overhead for large files due to page fragmentation and WAL (Write-Ahead Logging) overhead.
2.2 Object Storage Systems
Object storage systems have emerged as specialized solutions for large-scale binary data. The architecture of systems like Amazon S3 [4] and Azure Blob Storage [5] emphasizes horizontal scalability and eventual consistency over transactional guarantees. Research by Balakrishnan et al. [6] on distributed storage systems has shown that object storage can achieve higher throughput for large files compared to database storage.
The CAP theorem [7] implications for object storage have been explored by Vogels [8], demonstrating how object storage systems prioritize availability and partition tolerance over strong consistency, making them suitable for file storage use cases.
2.3 Hybrid Architectures
Several studies have examined hybrid approaches combining databases for metadata with object storage for binary data. The "database + blob storage" pattern has been documented in various contexts [9, 10], but quantitative performance comparisons are less common.
Work by Armbrust et al. [11] on cloud storage architectures has shown that separating metadata and data storage can improve scalability, but introduces complexity in maintaining consistency between systems.
2.4 Performance Benchmarking
Performance evaluation methodologies for storage systems have been established in the literature. The TPC benchmarks [12] provide standardized approaches, though they focus primarily on transactional workloads rather than file storage.
Recent work on microbenchmarking storage systems [13] has emphasized the importance of hardware-aware testing, considering factors such as SSD write amplification, cache behavior, and IO scheduler effects.
2.5 ASP.NET and .NET Storage Patterns
The .NET ecosystem has specific patterns and best practices for file storage. Microsoft's documentation [14] recommends object storage for large files, but provides limited quantitative justification. Community-driven benchmarks [15] have shown mixed results, often depending on specific deployment configurations.
2.6 Research Gap
While extensive literature exists on both database storage and object storage systems individually, there is a gap in:
- Direct Comparative Studies: Few studies provide head-to-head performance comparisons with identical workloads
- Hardware-Aware Analysis: Limited research connecting storage decisions to underlying hardware behavior
- Reproducible Artifacts: Most studies lack complete, reproducible experimental frameworks
- Real-World Context: Limited consideration of operational and ownership concerns in specific deployment contexts
This research addresses these gaps by providing a comprehensive, reproducible comparison with hardware-level analysis and practical deployment guidance.
3. Problem Statement
3.1 Core Research Question
How do file storage strategies (database storage vs. object storage) compare in terms of performance, resource utilization, and operational characteristics for ASP.NET applications?
This question encompasses several sub-questions:
- What are the performance differences (throughput, latency) between storing files in PostgreSQL versus Azure Blob Storage?
- How do file size and access patterns affect the relative performance of each approach?
- What are the hardware-level implications (IO operations, write amplification, cache behavior) of each storage strategy?
- What are the operational trade-offs (backup complexity, scalability, maintenance) between approaches?
- Under what conditions should each approach be preferred?
3.2 Problem Motivation
3.2.1 The Storage Decision Dilemma
System architects face a fundamental choice when designing file storage:
Option A: Database Storage
- Store files directly in PostgreSQL (using BYTEA or TOAST)
- Metadata and binary data in the same system
- Transactional consistency guarantees
- Simpler operational model
Option B: Object Storage
- Store files in Azure Blob Storage (or similar)
- Store only metadata in PostgreSQL
- Separate systems for metadata and data
- More complex operational model
Both approaches have advocates, but decision-making is often based on intuition, vendor recommendations, or limited anecdotal evidence rather than systematic evaluation.
3.2.2 Real-World Impact
The storage decision has cascading effects:
- Performance: Affects user experience, especially for large file uploads/downloads
- Cost: Different resource requirements and scaling characteristics
- Reliability: Different failure modes and recovery mechanisms
- Maintainability: Different operational procedures and skill requirements
- Scalability: Different bottlenecks and scaling strategies
3.2.3 Lack of Empirical Evidence
While both approaches are widely used, there is limited published research providing:
- Quantitative performance comparisons under identical conditions
- Hardware-level analysis of resource utilization
- Guidance on when to choose each approach
- Reproducible experimental frameworks
3.3 Research Hypotheses
Based on preliminary analysis and related work, we formulate the following hypotheses:
H1: Object storage (Azurite/Azure Blob) will demonstrate higher throughput for large files (>1MB) compared to database storage.
H2: Database storage will show lower latency for small files (<100KB) due to reduced network overhead and cache locality.
H3: Database storage will exhibit higher write amplification due to WAL and page-level updates.
H4: Backup operations will be faster for object storage due to incremental and parallel capabilities.
H5: Resource utilization (CPU, memory, disk IO) will differ significantly between approaches, with database storage showing higher variability.
3.4 Success Criteria
This research will be considered successful if it:
- Provides quantitative performance data comparing both approaches
- Identifies clear decision criteria based on file characteristics and use cases
- Delivers a reproducible experimental framework
- Offers actionable guidance for system architects
- Contributes to the body of knowledge on storage system design
3.5 Scope Definition
3.5.1 In Scope
- Performance benchmarking (throughput, latency)
- Resource utilization analysis (CPU, memory, disk IO)
- Hardware-level considerations (write amplification, cache effects)
- Backup and recovery operations
- ASP.NET Core implementation patterns
- Docker-based reproducible environment
3.5.2 Out of Scope
- Cost analysis (though resource utilization data can inform cost estimates)
- Multi-region deployment and replication
- Security and access control mechanisms (assumed equivalent)
- Production Azure Blob Storage (using Azurite for reproducibility)
- Other object storage systems (S3, GCS) beyond Azure Blob
- Long-term durability and archival storage
3.6 Expected Contributions
This research contributes:
- Empirical Evidence: Quantitative performance data from systematic benchmarking
- Hardware Awareness: Analysis connecting storage decisions to hardware behavior
- Reproducible Artifact: Complete experimental framework for validation and extension
- Practical Guidance: Decision criteria and recommendations for practitioners
- Research Foundation: Baseline for future studies on storage architectures
4. Hardware Storage Internals
Understanding the hardware-level behavior of storage systems is crucial for making informed architectural decisions. This section examines the internals of PostgreSQL storage and object storage systems, focusing on how they interact with underlying hardware.
4.1 PostgreSQL Storage Architecture
4.1.1 Page-Based Storage
PostgreSQL stores data in fixed-size pages (typically 8KB). Each page contains multiple rows, and the database engine manages these pages through a buffer pool in memory. When storing binary data directly in tables, several mechanisms come into play:
Page Structure:
- Header: Metadata about the page (checksum, free space, etc.)
- Row Pointers: Array of pointers to row data
- Row Data: Actual table data, including binary columns
- Free Space: Unused space within the page
For binary data stored in BYTEA columns, the data is stored inline within the page if it fits. However, this can lead to:
- Page Fragmentation: Large binary values can cause significant wasted space
- Size Limits: PostgreSQL caps a single field at 1GB (with a theoretical row limit of 1.6TB), but performance degrades far before these limits are reached
- Cache Inefficiency: Mixing small metadata and large binary data in the same pages reduces cache hit rates
4.1.2 TOAST (The Oversized-Attribute Storage Technique)
PostgreSQL automatically uses TOAST for values exceeding a threshold (typically 2KB). TOAST moves large values to separate storage:
TOAST Mechanism:
- Inline Storage: Small values stored directly in the main table
- Extended Storage: Large values moved to TOAST tables
- External Storage: Very large values stored out-of-line with compression
- Main Table Reference: Main table stores a pointer to TOAST data
TOAST Implications:
- Additional IO: Reading large values requires additional page reads from TOAST tables
- Write Amplification: Updates to large values may require rewriting both main and TOAST pages
- Compression: TOAST can compress data, reducing storage but adding CPU overhead
- Transaction Overhead: TOAST operations are still transactional, requiring WAL writes
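To observe TOAST behavior in practice, the standard size functions can be combined; the queries below are illustrative and assume the files_db example schema introduced later in this article.

```sql
-- How much of the table's footprint lives outside the main heap
-- (pg_total_relation_size includes TOAST tables and indexes;
-- pg_relation_size counts only the main table):
SELECT
  pg_size_pretty(pg_relation_size('files_db'))       AS main_table_size,
  pg_size_pretty(pg_total_relation_size('files_db')
               - pg_relation_size('files_db'))       AS toast_and_index_size;

-- The per-column TOAST strategy can also be tuned, e.g. to store
-- out-of-line without compression (useful for incompressible data):
ALTER TABLE files_db ALTER COLUMN file_data SET STORAGE EXTERNAL;
```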
4.1.3 Write-Ahead Logging (WAL)
PostgreSQL uses WAL to ensure durability and enable replication:
WAL Process:
- Write Request: Application writes data
- Buffer Update: Data written to shared buffer pool
- WAL Write: Change logged to WAL before commit
- Checkpoint: Periodic flushing of dirty pages to disk
- WAL Archival: Old WAL segments archived for point-in-time recovery
WAL Impact on Binary Data:
- Full Value Logging: Large binary values are fully logged to WAL (unless using UNLOGGED tables, which sacrifice durability)
- Write Amplification: Each update to a large binary value writes the entire value to WAL
- Replication Overhead: WAL-based replication streams all binary data changes
- Backup Size: WAL archives contain complete history of binary data changes
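WAL volume generated by a single upload can be measured directly with PostgreSQL's LSN functions; the literal LSN values below are placeholders for illustration.

```sql
-- Capture the WAL insert position before and after an upload:
SELECT pg_current_wal_lsn() AS before_lsn;
-- ... perform the file upload ...
SELECT pg_current_wal_lsn() AS after_lsn;

-- Then diff the two positions to get WAL bytes written
-- (shown here with example literals; 0x1000000 = 16 MB):
SELECT pg_wal_lsn_diff('0/2000000'::pg_lsn, '0/1000000'::pg_lsn) AS wal_bytes;
```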
4.1.4 Buffer Pool and Caching
PostgreSQL's shared buffer pool caches frequently accessed pages:
Cache Behavior:
- Page-Level Caching: Entire pages cached, not individual rows
- LRU Eviction: Least recently used pages evicted when cache is full
- Mixed Workload Impact: Binary data can evict metadata pages, reducing cache efficiency
- Memory Pressure: Large binary values consume significant buffer pool space
4.2 Object Storage Architecture
4.2.1 Blob Storage Design
Azure Blob Storage (and Azurite emulation) stores objects as independent entities:
Object Storage Characteristics:
- Flat Namespace: Objects identified by container and blob name
- No Transactional Overhead: Each object write is independent
- Append-Optimized: Designed for sequential writes
- Metadata Separation: Object metadata stored separately from data
4.2.2 Write Patterns
Object storage systems are optimized for different write patterns:
Write Operations:
- Put Blob: Complete object replacement (atomic)
- Append Block: Adding data to existing blob (for append blobs)
- Block Upload: Multipart uploads for large files
- No In-Place Updates: Objects are immutable; updates create new versions
Hardware Implications:
- Sequential Writes: Better aligned with SSD write patterns
- Reduced Write Amplification: No page-level fragmentation or WAL overhead
- Parallel Writes: Multiple objects can be written concurrently without coordination
4.2.3 Read Patterns
Object storage read operations:
Read Characteristics:
- Range Reads: Can read specific byte ranges without fetching entire object
- Parallel Reads: Multiple objects can be read concurrently
- CDN Integration: Objects can be served via CDN for global distribution
- No Cache Coherency: Simpler caching model (no transactional consistency requirements)
4.3 Hardware-Level Considerations
4.3.1 SSD Write Amplification
Modern SSDs use flash memory with specific characteristics:
SSD Behavior:
- Page Size: Typically 4KB-16KB pages
- Block Erasure: Must erase entire blocks (typically 256KB-2MB) before writing
- Write Amplification: Actual writes exceed logical writes due to garbage collection
- Wear Leveling: Distributes writes across cells to prevent premature wear
Impact on Storage Strategies:
- Database Storage: Random writes to pages increase write amplification
- Object Storage: Sequential writes better aligned with SSD characteristics
- Large Files: Object storage's append-optimized design reduces write amplification
4.3.2 IO Patterns
Different storage strategies produce different IO patterns:
Database Storage IO:
- Random Reads/Writes: Accessing specific pages in tables
- Mixed Workload: Metadata and binary data interleaved
- Small IO Operations: Many small reads/writes for page management
- Synchronous Writes: WAL requires synchronous writes for durability
Object Storage IO:
- Sequential Writes: Appending data to objects
- Large IO Operations: Reading/writing entire objects or large ranges
- Asynchronous Writes: Can batch and optimize writes
- Parallel IO: Multiple objects accessed concurrently
4.3.3 Cache Behavior
CPU and memory caches have different characteristics:
Database Storage Caching:
- Page Cache: Operating system caches database pages
- Buffer Pool: PostgreSQL's own caching layer
- Cache Pollution: Large binary values reduce effective cache size for metadata
- Locality: Related data (metadata + binary) may be co-located
Object Storage Caching:
- Object Cache: Can cache entire objects or ranges
- CDN Cache: Objects can be cached at edge locations
- Cache Efficiency: Metadata queries don't affect object cache
- Predictable Patterns: Easier to optimize caching strategies
4.4 Performance Implications
The hardware-level differences translate to performance characteristics:
For Small Files (<100KB):
- Database storage may be faster due to reduced network overhead
- Cache locality benefits from co-located metadata and data
- Transactional overhead is minimal for small values
For Large Files (>1MB):
- Object storage benefits from sequential write patterns
- Reduced write amplification on SSDs
- Better parallelization opportunities
- Lower WAL overhead
For Mixed Workloads:
- Database storage can cause cache pollution
- Object storage allows independent scaling of metadata and data access
4.5 Summary
Understanding hardware internals reveals why different storage strategies perform differently:
- PostgreSQL: Optimized for transactional workloads with small to medium rows; large binary data causes write amplification and cache inefficiency
- Object Storage: Optimized for large, immutable objects with sequential access patterns; better aligned with modern SSD characteristics
These fundamental differences form the basis for performance expectations and guide experimental design.
5. System Architecture
This section describes the overall system architecture, including the components, their interactions, and design decisions.
5.1 Architecture Overview
The system implements two storage strategies for comparison:
- Database Storage Strategy: Files stored directly in PostgreSQL
- Object Storage Strategy: Files stored in Azurite (Azure Blob Storage emulation), metadata in PostgreSQL
Both strategies are implemented within the same ASP.NET Core application, allowing direct comparison under identical conditions.
5.2 System Components
The system consists of three main components:
ASP.NET Core API:
- Upload Controller: Handles file upload requests
- Download Controller: Handles file download requests
- Metadata Controller: Manages file metadata operations
- Storage Service Abstraction: Interface for storage operations
- Database Storage Strategy: Implements database-based storage
- Object Storage Strategy: Implements object storage-based storage
PostgreSQL Database:
- Stores metadata for both strategies
- Stores file data directly for database storage strategy
- Provides transactional consistency
Azurite (Azure Blob Storage Emulation):
- Stores binary file data for object storage strategy
- Provides blob storage API
- Emulates Azure Blob Storage behavior
5.3 Storage Strategies
5.3.1 Database Storage Strategy
Design:
- Files stored in PostgreSQL BYTEA columns
- Metadata and binary data in the same table
- Single transaction for file upload
- Direct database queries for file retrieval
Schema:
CREATE TABLE files_db (
id UUID PRIMARY KEY,
filename VARCHAR(255),
content_type VARCHAR(100),
file_size BIGINT,
uploaded_at TIMESTAMP,
file_data BYTEA -- Binary data stored here
);
Advantages:
- Transactional consistency
- Single system to manage
- ACID guarantees
- Simpler backup (single database dump)
Disadvantages:
- Database size grows with files
- WAL overhead for large files
- Cache pollution
- Backup/restore complexity for large datasets
5.3.2 Object Storage Strategy
Design:
- Files stored in Azurite blob containers
- Metadata stored in PostgreSQL
- Two-step write (blob upload followed by metadata insert), with compensating cleanup on failure
- Blob storage accessed via Azure SDK
Schema:
CREATE TABLE files_blob (
id UUID PRIMARY KEY,
filename VARCHAR(255),
content_type VARCHAR(100),
file_size BIGINT,
uploaded_at TIMESTAMP,
blob_container VARCHAR(100),
blob_name VARCHAR(500) -- Reference to blob storage
);
Advantages:
- Scalable storage (independent of database)
- Optimized for large files
- Reduced database size
- Better parallelization
Disadvantages:
- Two systems to manage
- Eventual consistency concerns
- More complex error handling
- Separate backup procedures
5.4 API Design
5.4.1 Endpoints
Upload:
- POST /api/files/db/upload - Upload to database storage
- POST /api/files/blob/upload - Upload to object storage
Download:
- GET /api/files/db/{id} - Download from database storage
- GET /api/files/blob/{id} - Download from object storage
Metadata:
- GET /api/files/db/{id}/metadata - Get metadata (database storage)
- GET /api/files/blob/{id}/metadata - Get metadata (object storage)
- GET /api/files/db - List all files (database storage)
- GET /api/files/blob - List all files (object storage)
5.4.2 Service Abstraction
The system uses a strategy pattern to abstract storage operations:
public interface IFileStorageService
{
Task<FileMetadata> UploadAsync(Stream fileStream, string filename, string contentType);
Task<Stream> DownloadAsync(Guid fileId);
Task<FileMetadata> GetMetadataAsync(Guid fileId);
Task DeleteAsync(Guid fileId);
}
This allows both storage strategies to be tested with identical API interfaces.
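As a sketch of how a strategy plugs into this interface, here is an in-memory implementation useful for unit testing. The interface is repeated so the block is self-contained; FileMetadata is not defined in the article, so the record shape below is an assumption.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

// Assumed metadata shape (the article does not spell out FileMetadata).
public record FileMetadata(Guid Id, string Filename, string ContentType,
                           long FileSize, DateTime UploadedAt);

public interface IFileStorageService
{
    Task<FileMetadata> UploadAsync(Stream fileStream, string filename, string contentType);
    Task<Stream> DownloadAsync(Guid fileId);
    Task<FileMetadata> GetMetadataAsync(Guid fileId);
    Task DeleteAsync(Guid fileId);
}

// In-memory strategy: the real implementations would talk to PostgreSQL
// or Azurite, but this one lets API-level tests run without either.
public sealed class InMemoryFileStorageService : IFileStorageService
{
    private readonly ConcurrentDictionary<Guid, (FileMetadata Meta, byte[] Data)> _store = new();

    public async Task<FileMetadata> UploadAsync(Stream fileStream, string filename, string contentType)
    {
        using var ms = new MemoryStream();
        await fileStream.CopyToAsync(ms);
        var data = ms.ToArray();
        var meta = new FileMetadata(Guid.NewGuid(), filename, contentType,
                                    data.LongLength, DateTime.UtcNow);
        _store[meta.Id] = (meta, data);
        return meta;
    }

    public Task<Stream> DownloadAsync(Guid fileId) =>
        _store.TryGetValue(fileId, out var e)
            ? Task.FromResult<Stream>(new MemoryStream(e.Data, writable: false))
            : throw new FileNotFoundException(fileId.ToString());

    public Task<FileMetadata> GetMetadataAsync(Guid fileId) =>
        _store.TryGetValue(fileId, out var e)
            ? Task.FromResult(e.Meta)
            : throw new FileNotFoundException(fileId.ToString());

    public Task DeleteAsync(Guid fileId)
    {
        _store.TryRemove(fileId, out _);
        return Task.CompletedTask;
    }
}
```

Because both real strategies are resolved through the same interface, swapping them in benchmarks is a dependency-injection configuration change rather than a code change.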
5.5 Data Flow
5.5.1 Upload Flow (Database Storage)
1. Client → API: POST /api/files/db/upload
2. API → Database Storage Service: UploadAsync()
3. Service → PostgreSQL: INSERT INTO files_db (..., file_data)
4. PostgreSQL: Store data in BYTEA column (TOAST if large)
5. PostgreSQL: Write to WAL
6. PostgreSQL → Service: Return file ID
7. Service → API: Return FileMetadata
8. API → Client: 201 Created with metadata
5.5.2 Upload Flow (Object Storage)
1. Client → API: POST /api/files/blob/upload
2. API → Object Storage Service: UploadAsync()
3. Service → Azurite: Upload blob to container
4. Azurite → Service: Return blob URL/name
5. Service → PostgreSQL: INSERT INTO files_blob (..., blob_name)
6. PostgreSQL: Store metadata only
7. PostgreSQL → Service: Return file ID
8. Service → API: Return FileMetadata
9. API → Client: 201 Created with metadata
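The blob-first ordering in this flow matters: if the metadata insert fails, the already-written blob must be deleted so no orphan remains. A minimal sketch of that compensation logic, with the Azurite and PostgreSQL calls abstracted as delegates (all names here are illustrative):

```csharp
using System;
using System.Threading.Tasks;

// "Blob first, metadata second" upload with compensation: a failed
// metadata INSERT triggers a compensating blob delete.
public static class TwoPhaseUpload
{
    public static async Task<Guid> UploadAsync(
        byte[] content,
        Func<string, byte[], Task> putBlob,        // stand-in for the Azurite upload
        Func<string, Task> deleteBlob,             // compensating delete
        Func<Guid, string, Task> insertMetadata)   // stand-in for the PostgreSQL insert
    {
        var id = Guid.NewGuid();
        var blobName = $"files/{id}";
        await putBlob(blobName, content);          // phase 1: write the blob
        try
        {
            await insertMetadata(id, blobName);    // phase 2: record metadata
        }
        catch
        {
            await deleteBlob(blobName);            // roll back phase 1
            throw;
        }
        return id;
    }
}
```

The inverse failure (metadata written, blob missing) cannot occur with this ordering, which is why it is generally preferred over metadata-first writes.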
5.5.3 Download Flow (Database Storage)
1. Client → API: GET /api/files/db/{id}
2. API → Database Storage Service: DownloadAsync(id)
3. Service → PostgreSQL: SELECT file_data FROM files_db WHERE id = ?
4. PostgreSQL: Read from table (and TOAST if needed)
5. PostgreSQL → Service: Return byte array
6. Service → API: Return file stream
7. API → Client: 200 OK with file content
5.5.4 Download Flow (Object Storage)
1. Client → API: GET /api/files/blob/{id}
2. API → Object Storage Service: DownloadAsync(id)
3. Service → PostgreSQL: SELECT blob_name FROM files_blob WHERE id = ?
4. PostgreSQL → Service: Return blob reference
5. Service → Azurite: Download blob by name
6. Azurite → Service: Return blob stream
7. Service → API: Return file stream
8. API → Client: 200 OK with file content
5.6 Error Handling
5.6.1 Database Storage Errors
- Transaction Rollback: Failed uploads automatically roll back
- Constraint Violations: Handled at database level
- Storage Limits: Database size limits apply
- Connection Failures: Retry logic with exponential backoff
5.6.2 Object Storage Errors
- Partial Failures: A blob may be uploaded while the metadata insert fails (or vice versa), leaving an orphan that requires cleanup
- Blob Not Found: Handle missing blob references
- Container Errors: Handle container creation/access issues
- Network Failures: Retry logic for blob operations
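The retry logic referenced for both connection and blob failures can be captured in one generic helper; the attempt count and 100 ms base delay below are illustrative choices, not values from the article.

```csharp
using System;
using System.Threading.Tasks;

// Generic retry with exponential backoff: the delay doubles after each
// failed attempt (100 ms, 200 ms, 400 ms, ...), and the last failure
// propagates to the caller.
public static class Retry
{
    public static async Task<T> WithBackoffAsync<T>(
        Func<Task<T>> operation, int maxAttempts = 4, int baseDelayMs = 100)
    {
        for (var attempt = 1; ; attempt++)
        {
            try { return await operation(); }
            catch when (attempt < maxAttempts)
            {
                await Task.Delay(baseDelayMs * (1 << (attempt - 1)));
            }
        }
    }
}
```

In production code a jitter term is usually added to the delay so that many clients retrying at once do not synchronize their attempts.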
5.7 Deployment Architecture
The system is designed for Docker-based deployment:
+-----------------------------------------------------+
|               Docker Compose Network                |
|                                                     |
|  +-------------+  +-------------+  +-------------+  |
|  |   ASP.NET   |  | PostgreSQL  |  |   Azurite   |  |
|  |     API     |  |             |  |             |  |
|  | (Port 5000) |  | (Port 5432) |  |   (Ports    |  |
|  |             |  |             |  | 10000-10002)|  |
|  +-------------+  +-------------+  +-------------+  |
|                                                     |
+-----------------------------------------------------+
All services communicate over Docker's internal network, ensuring reproducible networking conditions.
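This topology maps to a small compose file along these lines (service names and the api build context are assumptions; the postgres and azurite images are the official ones):

```yaml
services:
  api:
    build: .
    ports: ["5000:5000"]
    depends_on: [postgres, azurite]
  postgres:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: example   # illustrative only; use secrets in practice
    ports: ["5432:5432"]
  azurite:
    image: mcr.microsoft.com/azure-storage/azurite
    ports: ["10000:10000", "10001:10001", "10002:10002"]  # blob/queue/table
```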
5.8 Design Decisions
5.8.1 Why Azurite Instead of Production Azure?
- Reproducibility: Local emulation ensures consistent test conditions
- Cost: No cloud costs for experimentation
- Network Independence: Eliminates network latency as a variable
- Control: Full control over storage backend for testing
5.8.2 Why Both Strategies in One Application?
- Fair Comparison: Identical runtime conditions
- Code Reuse: Shared infrastructure and testing code
- Simplicity: Single deployment for both strategies
5.8.3 Why PostgreSQL for Metadata in Both Cases?
- Consistency: Same metadata storage for fair comparison
- Realistic: Most applications use databases for metadata
- Isolation: Focuses comparison on binary storage, not metadata storage
5.9 Scalability Considerations
5.9.1 Database Storage Scaling
- Vertical Scaling: Increase database server resources
- Read Replicas: Scale read operations
- Partitioning: Partition tables by file size or date
- Archival: Move old files to separate storage
5.9.2 Object Storage Scaling
- Horizontal Scaling: Azurite/Azure Blob scales automatically
- CDN Integration: Serve files via CDN
- Parallel Access: Multiple concurrent reads/writes
- Tiered Storage: Use different storage tiers for cost optimization
5.10 Summary
The architecture provides a fair, reproducible framework for comparing storage strategies. By implementing both approaches in the same application with identical APIs, we ensure that performance differences reflect storage strategy choices rather than implementation variations.
6. Experimental Setup
This section describes the experimental methodology, including hardware configuration, software stack, test scenarios, and data collection procedures.
6.1 Hardware Configuration
6.1.1 Test Environment
Primary Test Machine:
- CPU: System metrics available in data/raw/docker-stats-*.json
- Memory: System metrics available in data/raw/docker-stats-*.json
- Storage: Docker volumes on host filesystem (SSD/HDD dependent on host)
- Network: Local Docker network (no external network latency)
Docker Configuration:
- PostgreSQL Container: 2GB memory limit, 4 CPU cores
- Azurite Container: 1GB memory limit, 2 CPU cores
- API Container: 1GB memory limit, 2 CPU cores
6.1.2 Storage Characteristics
Understanding the underlying storage is critical for interpreting results:
- SSD vs HDD: Different IO characteristics
- Write Amplification: Measured for both strategies
- Cache Behavior: Buffer pool and OS cache effects
- IO Scheduler: Linux IO scheduler settings
6.2 Software Stack
6.2.1 Components
- ASP.NET Core: 8.0
- PostgreSQL: 16.x
- Azurite: Latest stable version
- Docker: Latest stable version
- Docker Compose: Latest stable version
- .NET SDK: 8.0
6.2.2 Configuration
PostgreSQL Settings:
shared_buffers = 256MB
effective_cache_size = 1GB
maintenance_work_mem = 64MB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1 # For SSD
effective_io_concurrency = 200 # For SSD
Azurite Settings:
- Default configuration (emulating Azure Blob Storage)
- Local file system backend
- No replication or redundancy
6.3 Test Scenarios
6.3.1 File Size Distribution
Tests cover a range of file sizes to understand performance characteristics:
| File Size Category | Size Range | Test Files |
|---|---|---|
| Small | < 100 KB | 100 files |
| Medium | 100 KB - 1 MB | 50 files |
| Large | 1 MB - 10 MB | 20 files |
| Very Large | > 10 MB | 10 files |
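Generating the corpus deterministically keeps runs comparable: a fixed RNG seed yields byte-identical files on every machine, and random bytes are incompressible, which avoids TOAST compression skewing database results. A sketch (directory layout and naming are illustrative):

```csharp
using System;
using System.IO;

// Deterministic benchmark-file generation: same seed => same bytes,
// so repeated runs and independent reproductions use identical inputs.
public static class TestDataGenerator
{
    public static void Generate(string directory, int count, int sizeBytes, int seed = 42)
    {
        Directory.CreateDirectory(directory);
        var rng = new Random(seed);
        for (var i = 0; i < count; i++)
        {
            var buffer = new byte[sizeBytes];
            rng.NextBytes(buffer);  // incompressible payload
            File.WriteAllBytes(Path.Combine(directory, $"file_{i:D4}.bin"), buffer);
        }
    }
}
```

Calling it once per size category (e.g. 100 small files, 50 medium, and so on) reproduces the distribution in the table above.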
6.3.2 Test Operations
Upload Tests:
- Single file uploads (various sizes)
- Batch uploads (multiple files concurrently)
- Sequential uploads (one after another)
- Mixed workload (small and large files)
Download Tests:
- Single file downloads
- Range reads (partial file downloads)
- Concurrent downloads
- Sequential downloads
Metadata Operations:
- Metadata queries
- List operations
- Search/filter operations
Backup Tests:
- Full database backup (database strategy)
- Blob storage backup (object strategy)
- Backup size comparison
- Backup time comparison
6.3.3 Workload Patterns
Write-Heavy Workload:
- 80% uploads, 20% downloads
- Simulates content creation scenario
Read-Heavy Workload:
- 20% uploads, 80% downloads
- Simulates content delivery scenario
Balanced Workload:
- 50% uploads, 50% downloads
- Simulates general-purpose application
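The three mixes above can be driven by a single write-ratio parameter. A minimal sketch (names hypothetical, not part of the benchmark code), using a fixed seed so the operation order is reproducible across runs:

```csharp
using System;

// Build a reproducible upload/download sequence from a write ratio.
// A fixed seed keeps the operation order identical across runs.
string[] BuildSequence(double uploadRatio, int operationCount, int seed)
{
    var rng = new Random(seed);
    var ops = new string[operationCount];
    for (int i = 0; i < operationCount; i++)
        ops[i] = rng.NextDouble() < uploadRatio ? "Upload" : "Download";
    return ops;
}

// Write-heavy mix: 80% uploads, 20% downloads.
var writeHeavy = BuildSequence(0.80, 1000, seed: 42);
Console.WriteLine($"Uploads: {Array.FindAll(writeHeavy, o => o == "Upload").Length}");
```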
6.4 Benchmark Methodology
6.4.1 Warm-up Phase
Before each test run:
- System idle for 30 seconds
- Pre-load 10 files of each size category
- Allow caches to stabilize
- Clear performance counters
6.4.2 Test Execution
For Each Test:
- Preparation: Clear previous test data
- Execution: Run test operation(s)
- Measurement: Collect metrics during execution
- Cooldown: Wait 10 seconds between tests
- Repetition: Run each test 5 times for statistical significance
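The per-test procedure above can be expressed as a small harness; the callbacks and the cooldown length are parameters so one loop drives every scenario (all names here are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

// Run one benchmark: prepare, execute, measure, cool down, repeat.
List<double> RunRepeated(Action prepare, Action operation, int repetitions, TimeSpan cooldown)
{
    var durationsMs = new List<double>();
    for (int run = 0; run < repetitions; run++)
    {
        prepare();                              // clear previous test data
        var sw = Stopwatch.StartNew();
        operation();                            // the operation under test
        sw.Stop();
        durationsMs.Add(sw.Elapsed.TotalMilliseconds);
        Thread.Sleep(cooldown);                 // settle time between runs
    }
    return durationsMs;
}

// Five repetitions with a 10-second cooldown, as in the methodology:
// var samples = RunRepeated(ClearData, UploadBatch, 5, TimeSpan.FromSeconds(10));
```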
6.4.3 Metrics Collection
Performance Metrics:
- Throughput: Files per second, MB/s
- Latency: P50, P95, P99 percentiles
- Response Time: End-to-end operation time
- Error Rate: Failed operations percentage
Resource Metrics:
- CPU Usage: Average and peak CPU utilization
- Memory Usage: Peak memory consumption
- Disk IO: Read/write operations per second
- Network IO: Bytes transferred (for object storage)
- Database Size: Total database size growth
- WAL Size: Write-ahead log size
Storage Metrics:
- Write Amplification: Actual writes / logical writes
- Cache Hit Rate: Buffer pool and OS cache hit rates
- Fragmentation: Database and file system fragmentation
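Write amplification, as listed above, is a simple ratio; a sketch of the computation (the byte counts would come from Docker stats, iostat, and the pg_stat views):

```csharp
using System;

// Write amplification: physical bytes actually written (heap + WAL +
// TOAST for the database strategy; blob file + metadata row for the
// object strategy) divided by the logical payload size.
double WriteAmplification(long physicalBytesWritten, long logicalBytes)
{
    if (logicalBytes <= 0)
        throw new ArgumentOutOfRangeException(nameof(logicalBytes));
    return (double)physicalBytesWritten / logicalBytes;
}

// Example: a 1 MB logical upload causing ~2.3 MB of physical writes.
Console.WriteLine(WriteAmplification(2_411_725, 1_048_576).ToString("F2"));
```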
6.5 Data Collection Tools
6.5.1 Application Metrics
- Custom Instrumentation: ASP.NET Core metrics middleware
- Structured Logging: JSON logs with timestamps
- Performance Counters: .NET performance counters
6.5.2 System Metrics
- Docker Stats: Container resource usage
- PostgreSQL Statistics: pg_stat_* views
- Linux Tools: iostat, vmstat, sar
- Database Queries: Size and performance queries
6.5.3 Automated Collection
Scripts collect metrics at regular intervals:
- During Tests: Metrics every 1 second
- Between Tests: Summary statistics
- Post-Test: Final measurements and cleanup
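A minimal fixed-interval sampler behind these scripts might look like this; the probe callback stands in for any of the collectors above (names are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Poll a metric probe at a fixed interval for a bounded number of
// samples, timestamping each reading.
List<(DateTime Timestamp, double Value)> Sample(
    Func<double> probe, TimeSpan interval, int samples)
{
    var points = new List<(DateTime, double)>();
    for (int i = 0; i < samples; i++)
    {
        points.Add((DateTime.UtcNow, probe()));
        if (i < samples - 1)
            Thread.Sleep(interval);             // 1-second interval during tests
    }
    return points;
}
```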
6.6 Statistical Analysis
6.6.1 Descriptive Statistics
For each metric:
- Mean: Average value
- Median: 50th percentile
- Standard Deviation: Variability measure
- Min/Max: Range of values
- Percentiles: P50, P95, P99
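The percentile figures can be computed with a simple nearest-rank method, sketched here:

```csharp
using System;
using System.Linq;

// Nearest-rank percentile over a latency sample; p in (0, 100].
double Percentile(double[] values, double p)
{
    if (values.Length == 0) throw new ArgumentException("empty sample");
    var sorted = values.OrderBy(v => v).ToArray();
    int rank = (int)Math.Ceiling(p / 100.0 * sorted.Length);
    return sorted[Math.Max(rank, 1) - 1];
}

var latenciesMs = new double[] { 11, 12, 12, 13, 14, 15, 15, 16, 90, 200 };
Console.WriteLine(Percentile(latenciesMs, 50));  // 14 (P50)
Console.WriteLine(Percentile(latenciesMs, 95));  // 200 (P95: tail latency dominates)
```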
6.6.2 Comparative Analysis
Between Strategies:
- T-Tests: Statistical significance of differences
- Effect Size: Practical significance (Cohen's d)
- Confidence Intervals: 95% confidence intervals for means
Across File Sizes:
- Regression Analysis: Performance vs. file size
- Breakpoint Analysis: Identify size thresholds where strategies differ
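The effect size between strategies can be computed with a pooled standard deviation; a sketch of Cohen's d for two independent samples:

```csharp
using System;
using System.Linq;

// Cohen's d: mean difference between the two strategies expressed in
// pooled-standard-deviation units (|d| ~ 0.2 small, 0.5 medium, 0.8 large).
double CohensD(double[] a, double[] b)
{
    double meanA = a.Average(), meanB = b.Average();
    double varA = a.Sum(x => (x - meanA) * (x - meanA)) / (a.Length - 1);
    double varB = b.Sum(x => (x - meanB) * (x - meanB)) / (b.Length - 1);
    double pooledSd = Math.Sqrt(((a.Length - 1) * varA + (b.Length - 1) * varB)
                                / (a.Length + b.Length - 2));
    return (meanA - meanB) / pooledSd;
}
```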
6.6.3 Visualization
- Time Series: Performance over time
- Box Plots: Distribution comparison
- Scatter Plots: Relationship between variables
- Heat Maps: Multi-dimensional analysis
6.7 Reproducibility Measures
6.7.1 Environment Isolation
- Docker Containers: Isolated runtime environments
- Version Pinning: Specific versions of all dependencies
- Configuration Files: All settings in version control
- Seed Data: Consistent test data generation
6.7.2 Deterministic Testing
- Fixed Random Seeds: Reproducible test data
- Timing Controls: Consistent timing between runs
- Resource Limits: Fixed container resource allocations
- Network Isolation: No external network dependencies
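Fixed seeds make test payloads byte-identical between runs; a minimal sketch (note that seeded System.Random sequences are only guaranteed stable within a given .NET version, which is one reason the version pinning above matters):

```csharp
using System;

// Deterministic test file: the same (size, seed) pair always yields
// identical bytes on the same .NET version.
byte[] GenerateTestFile(int sizeBytes, int seed)
{
    var data = new byte[sizeBytes];
    new Random(seed).NextBytes(data);
    return data;
}
```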
6.7.3 Documentation
- Complete Setup Instructions: Step-by-step reproduction guide
- All Configurations: Every setting documented
- Raw Data: All collected metrics preserved
- Analysis Scripts: Reproducible data analysis
6.8 Ethical Considerations
6.8.1 Data Privacy
- No Real User Data: All test data is synthetic
- No Personal Information: No PII in test files
- Local Testing: All experiments run locally
6.8.2 Resource Usage
- Contained Experiments: Limited to test environment
- No External Services: No impact on production systems
- Resource Monitoring: Track and limit resource consumption
6.9 Limitations
6.9.1 Test Environment Limitations
- Single Machine: Not testing distributed scenarios
- Local Network: No real network latency
- Limited Scale: Testing up to [X] files, [Y] GB total
- Synthetic Workloads: May not match all real-world patterns
6.9.2 Measurement Limitations
- Sampling Frequency: 1-second intervals may miss brief spikes
- Container Overhead: Docker adds some measurement overhead
- Cache Effects: Results depend on cache state
- Timing Precision: Limited by system clock resolution
6.10 Summary
The experimental setup is designed for:
- Fair Comparison: Identical conditions for both strategies
- Statistical Rigor: Multiple runs and proper analysis
- Reproducibility: Complete environment in version control
- Comprehensive Coverage: Multiple file sizes and workloads
This methodology ensures that results are reliable, reproducible, and meaningful for decision-making.
7. Implementation
This section provides a detailed walkthrough of the implementation, including code structure, key components, and design patterns.
7.1 Project Structure
code/
├── api/
│   ├── FileStorageApi/
│   │   ├── Controllers/
│   │   ├── Services/
│   │   ├── Models/
│   │   ├── Data/
│   │   └── Program.cs
│   └── FileStorageApi.csproj
├── database/
│   ├── schema.sql
│   ├── migrations/
│   └── init.sql
├── storage/
│   ├── azurite/
│   └── docker-compose.azurite.yml
└── benchmarks/
    ├── BenchmarkRunner/
    │   ├── Scenarios/
    │   ├── Collectors/
    │   └── Program.cs
    └── BenchmarkRunner.csproj
7.2 API Implementation
7.2.1 Service Abstraction
The core abstraction allows switching between storage strategies:
public interface IFileStorageService
{
Task<FileMetadata> UploadAsync(
Stream fileStream,
string filename,
string contentType,
CancellationToken cancellationToken = default);
Task<Stream> DownloadAsync(
Guid fileId,
CancellationToken cancellationToken = default);
Task<FileMetadata> GetMetadataAsync(
Guid fileId,
CancellationToken cancellationToken = default);
Task DeleteAsync(
Guid fileId,
CancellationToken cancellationToken = default);
Task<IEnumerable<FileMetadata>> ListAsync(
int skip = 0,
int take = 100,
CancellationToken cancellationToken = default);
}
7.2.2 Database Storage Implementation
Service Implementation:
public class DatabaseStorageService : IFileStorageService
{
    private readonly ApplicationDbContext _context;
    private readonly ILogger<DatabaseStorageService> _logger;

    public DatabaseStorageService(
        ApplicationDbContext context,
        ILogger<DatabaseStorageService> logger)
    {
        _context = context;
        _logger = logger;
    }
public async Task<FileMetadata> UploadAsync(
Stream fileStream,
string filename,
string contentType,
CancellationToken cancellationToken)
{
using var memoryStream = new MemoryStream();
await fileStream.CopyToAsync(memoryStream, cancellationToken);
var fileData = memoryStream.ToArray();
var file = new FileEntity
{
Id = Guid.NewGuid(),
Filename = filename,
ContentType = contentType,
FileSize = fileData.Length,
FileData = fileData, // Stored in BYTEA column
UploadedAt = DateTime.UtcNow
};
_context.Files.Add(file);
await _context.SaveChangesAsync(cancellationToken);
return new FileMetadata
{
Id = file.Id,
Filename = file.Filename,
ContentType = file.ContentType,
FileSize = file.FileSize,
UploadedAt = file.UploadedAt
};
}
public async Task<Stream> DownloadAsync(
Guid fileId,
CancellationToken cancellationToken)
{
var file = await _context.Files
.FindAsync(new object[] { fileId }, cancellationToken);
if (file == null)
throw new FileNotFoundException($"File {fileId} not found");
return new MemoryStream(file.FileData);
}
// ... other methods
}
Entity Model:
public class FileEntity
{
public Guid Id { get; set; }
public string Filename { get; set; }
public string ContentType { get; set; }
public long FileSize { get; set; }
public DateTime UploadedAt { get; set; }
public byte[] FileData { get; set; } // BYTEA in PostgreSQL
}
DbContext Configuration:
public class ApplicationDbContext : DbContext
{
public DbSet<FileEntity> Files { get; set; }
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
modelBuilder.Entity<FileEntity>(entity =>
{
entity.ToTable("files_db");
entity.HasKey(e => e.Id);
entity.Property(e => e.Filename).HasMaxLength(255);
entity.Property(e => e.ContentType).HasMaxLength(100);
entity.Property(e => e.FileData)
.HasColumnType("bytea"); // PostgreSQL BYTEA type
});
}
}
7.2.3 Object Storage Implementation
Service Implementation:
public class ObjectStorageService : IFileStorageService
{
    private readonly BlobServiceClient _blobServiceClient;
    private readonly ApplicationDbContext _context;
    private readonly ILogger<ObjectStorageService> _logger;
    private const string ContainerName = "files";

    public ObjectStorageService(
        BlobServiceClient blobServiceClient,
        ApplicationDbContext context,
        ILogger<ObjectStorageService> logger)
    {
        _blobServiceClient = blobServiceClient;
        _context = context;
        _logger = logger;
    }
public async Task<FileMetadata> UploadAsync(
Stream fileStream,
string filename,
string contentType,
CancellationToken cancellationToken)
{
// Ensure container exists
var containerClient = _blobServiceClient.GetBlobContainerClient(ContainerName);
await containerClient.CreateIfNotExistsAsync(cancellationToken: cancellationToken);
// Generate unique blob name
var blobName = $"{Guid.NewGuid()}/{filename}";
var blobClient = containerClient.GetBlobClient(blobName);
// Upload to blob storage
await blobClient.UploadAsync(
fileStream,
new BlobUploadOptions
{
HttpHeaders = new BlobHttpHeaders
{
ContentType = contentType
}
},
cancellationToken);
// Get file size
var properties = await blobClient.GetPropertiesAsync(cancellationToken: cancellationToken);
var fileSize = properties.Value.ContentLength;
// Store metadata in database
var file = new FileBlobEntity
{
Id = Guid.NewGuid(),
Filename = filename,
ContentType = contentType,
FileSize = fileSize,
BlobContainer = ContainerName,
BlobName = blobName,
UploadedAt = DateTime.UtcNow
};
_context.FilesBlob.Add(file);
await _context.SaveChangesAsync(cancellationToken);
return new FileMetadata
{
Id = file.Id,
Filename = file.Filename,
ContentType = file.ContentType,
FileSize = file.FileSize,
UploadedAt = file.UploadedAt
};
}
public async Task<Stream> DownloadAsync(
Guid fileId,
CancellationToken cancellationToken)
{
// Get metadata from database
var file = await _context.FilesBlob
.FindAsync(new object[] { fileId }, cancellationToken);
if (file == null)
throw new FileNotFoundException($"File {fileId} not found");
// Download from blob storage
var containerClient = _blobServiceClient.GetBlobContainerClient(file.BlobContainer);
var blobClient = containerClient.GetBlobClient(file.BlobName);
var response = await blobClient.DownloadStreamingAsync(cancellationToken: cancellationToken);
return response.Value.Content;
}
// ... other methods
}
Entity Model:
public class FileBlobEntity
{
public Guid Id { get; set; }
public string Filename { get; set; }
public string ContentType { get; set; }
public long FileSize { get; set; }
public string BlobContainer { get; set; }
public string BlobName { get; set; }
public DateTime UploadedAt { get; set; }
}
7.2.4 Controllers
Database Storage Controller:
[ApiController]
[Route("api/files/db")]
public class DatabaseStorageController : ControllerBase
{
    private readonly DatabaseStorageService _storageService;
    private readonly ILogger<DatabaseStorageController> _logger;

    public DatabaseStorageController(
        DatabaseStorageService storageService,
        ILogger<DatabaseStorageController> logger)
    {
        _storageService = storageService;
        _logger = logger;
    }
[HttpPost("upload")]
public async Task<ActionResult<FileMetadata>> Upload(
IFormFile file,
CancellationToken cancellationToken)
{
if (file == null || file.Length == 0)
return BadRequest("No file provided");
using var stream = file.OpenReadStream();
var metadata = await _storageService.UploadAsync(
stream,
file.FileName,
file.ContentType,
cancellationToken);
return CreatedAtAction(
nameof(GetMetadata),
new { id = metadata.Id },
metadata);
}
[HttpGet("{id}")]
public async Task<IActionResult> Download(
Guid id,
CancellationToken cancellationToken)
{
try
{
var metadata = await _storageService.GetMetadataAsync(id, cancellationToken);
var stream = await _storageService.DownloadAsync(id, cancellationToken);
return File(stream, metadata.ContentType, metadata.Filename);
}
catch (FileNotFoundException)
{
return NotFound();
}
}
// ... other endpoints
}
Object Storage Controller:
The object storage controller follows the same structure, but injects ObjectStorageService and uses the route prefix /api/files/blob.
7.2.5 Dependency Injection
// Program.cs
var builder = WebApplication.CreateBuilder(args);
// Database
builder.Services.AddDbContext<ApplicationDbContext>(options =>
options.UseNpgsql(builder.Configuration.GetConnectionString("PostgreSQL")));
// Blob Storage
builder.Services.AddSingleton(serviceProvider =>
{
var connectionString = builder.Configuration.GetConnectionString("Azurite");
return new BlobServiceClient(connectionString);
});
// Storage Services
builder.Services.AddScoped<DatabaseStorageService>();
builder.Services.AddScoped<ObjectStorageService>();
// Controllers
builder.Services.AddControllers();
7.3 Database Schema
7.3.1 Database Storage Table
CREATE TABLE files_db (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
filename VARCHAR(255) NOT NULL,
content_type VARCHAR(100) NOT NULL,
file_size BIGINT NOT NULL,
uploaded_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
file_data BYTEA NOT NULL
);
CREATE INDEX idx_files_db_uploaded_at ON files_db(uploaded_at);
CREATE INDEX idx_files_db_filename ON files_db(filename);
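One consequence of the BYTEA column worth keeping in mind when interpreting the benchmarks: PostgreSQL stores large values out of line via TOAST, compressing and slicing them into roughly 2 KB chunks once a row exceeds the TOAST threshold. A rough sizing sketch (the chunk size assumes the default 8 KB page layout and no compression):

```csharp
using System;

// Approximate number of TOAST chunks for an out-of-line BYTEA value,
// assuming the default chunk size of 1996 bytes (8 KB pages) and no
// compression. Each chunk is a separate row in the table's TOAST table.
int ToastChunks(long fileSizeBytes, int chunkSize = 1996)
{
    return (int)((fileSizeBytes + chunkSize - 1) / chunkSize);
}

Console.WriteLine(ToastChunks(1_048_576)); // 526 chunks for a 1 MB file
```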
7.3.2 Object Storage Metadata Table
CREATE TABLE files_blob (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
filename VARCHAR(255) NOT NULL,
content_type VARCHAR(100) NOT NULL,
file_size BIGINT NOT NULL,
uploaded_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
blob_container VARCHAR(100) NOT NULL,
blob_name VARCHAR(500) NOT NULL,
UNIQUE(blob_container, blob_name)
);
CREATE INDEX idx_files_blob_uploaded_at ON files_blob(uploaded_at);
CREATE INDEX idx_files_blob_filename ON files_blob(filename);
CREATE INDEX idx_files_blob_container_name ON files_blob(blob_container, blob_name);
7.4 Benchmark Implementation
7.4.1 Benchmark Runner
public class BenchmarkRunner
{
private readonly HttpClient _httpClient;
private readonly MetricsCollector _metricsCollector;
public async Task<BenchmarkResult> RunUploadBenchmark(
string storageType,
int fileCount,
int fileSizeBytes)
{
var results = new List<OperationResult>();
for (int i = 0; i < fileCount; i++)
{
var fileData = GenerateTestFile(fileSizeBytes);
var startTime = DateTime.UtcNow;
try
{
var response = await UploadFile(storageType, fileData);
var duration = (DateTime.UtcNow - startTime).TotalMilliseconds;
results.Add(new OperationResult
{
Success = response.IsSuccessStatusCode,
DurationMs = duration,
FileSize = fileSizeBytes
});
}
catch (Exception ex)
{
results.Add(new OperationResult
{
Success = false,
Error = ex.Message
});
}
}
return new BenchmarkResult
{
Operation = "Upload",
StorageType = storageType,
Results = results,
Summary = CalculateSummary(results)
};
}
// ... other benchmark methods
}
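The CalculateSummary call above is not shown; a sketch of what it might compute for a sequential run (field names illustrative):

```csharp
using System;
using System.Linq;

// Derive throughput and latency figures from per-operation durations.
(double MeanMs, double FilesPerSecond, double MBPerSecond) Summarize(
    double[] durationsMs, int fileSizeBytes)
{
    double meanMs = durationsMs.Average();
    double filesPerSecond = 1000.0 / meanMs;    // sequential operations
    double mbPerSecond = filesPerSecond * fileSizeBytes / (1024.0 * 1024.0);
    return (meanMs, filesPerSecond, mbPerSecond);
}

var summary = Summarize(new double[] { 95.0, 100.0, 105.0 }, 1_048_576);
Console.WriteLine($"{summary.MeanMs} ms mean, {summary.MBPerSecond:F1} MB/s");
```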
7.4.2 Metrics Collection
public class MetricsCollector
{
public async Task<SystemMetrics> CollectMetricsAsync()
{
var metrics = new SystemMetrics
{
Timestamp = DateTime.UtcNow,
CpuUsage = await GetCpuUsageAsync(),
MemoryUsage = await GetMemoryUsageAsyn