About the Author
Elvin Suleymanov – Software Engineer | C# & .NET Specialist | Systems Thinking
Passionate Software Engineer with expertise in C# (.NET), Python, TypeScript, PostgreSQL, and cloud-native technologies. Experience in building high-performance applications, optimizing databases, and delivering solutions that provide real business value.
"Clean code, solid architecture, and real business value – that's my development philosophy."
Abstract
Modern web applications face a fundamental architectural decision when handling file storage: should binary data be stored directly in relational databases or delegated to specialized object storage systems? This research presents a comprehensive investigation of file storage strategies in ASP.NET applications, comparing PostgreSQL-based storage against Azure Blob Storage (emulated via Azurite) for large binary objects.
Through systematic benchmarking and hardware-level analysis, we examine performance characteristics including upload throughput, read latency, backup efficiency, and resource utilization. Our experiments reveal significant trade-offs between database storage and object storage, with implications for write amplification, cache behavior, and operational complexity.
The study provides quantitative evidence for storage decision-making, considering factors such as file size distribution, access patterns, and hardware constraints. We present a reproducible experimental framework using Docker containers, enabling researchers and practitioners to validate and extend our findings.
Key contributions include: (1) a comparative performance analysis of database vs. object storage, (2) hardware-aware considerations for storage decisions, (3) a complete reproducible research artifact, and (4) real-world deployment guidance for Azerbaijan-specific contexts.
Our results demonstrate that object storage (Azurite/Azure Blob) provides superior performance for large files (>1MB) and high-throughput scenarios, while database storage offers transactional consistency and simpler operational models for smaller files. The choice depends on specific application requirements, scale, and operational constraints.
Keywords: File Storage, PostgreSQL, Object Storage, Azure Blob Storage, Performance Benchmarking, System Architecture
1. Introduction
1.1 Background
The proliferation of user-generated content, document management systems, and media-rich applications has made efficient file storage a critical concern for modern web applications. As applications scale, the decision of where and how to store binary data becomes increasingly important, affecting performance, cost, maintainability, and user experience.
Traditional approaches have favored storing files directly in relational databases, leveraging transactional guarantees and simplified data models. However, as file sizes grow and storage requirements scale, alternative architectures using specialized object storage systems have gained prominence. These systems, exemplified by Amazon S3, Azure Blob Storage, and Google Cloud Storage, are designed specifically for large-scale binary data storage.
1.2 Problem Domain
ASP.NET applications, particularly those deployed in enterprise and government contexts, must balance multiple competing requirements:
- Performance: Fast upload and retrieval of files
- Reliability: Data durability and availability
- Scalability: Handling growing storage and traffic demands
- Cost: Efficient resource utilization
- Operational Simplicity: Manageable deployment and maintenance
- Compliance: Meeting regulatory and security requirements
The choice between database storage and object storage impacts all of these dimensions, yet clear guidance based on empirical evidence is often lacking.
1.3 Research Objectives
This research aims to:
- Quantify Performance Differences: Measure upload throughput, read latency, and resource utilization for both storage strategies
- Analyze Hardware Implications: Understand how storage decisions affect IO operations, write amplification, and cache behavior
- Provide Reproducible Framework: Create a complete experimental setup that others can use to validate and extend findings
- Offer Practical Guidance: Translate research findings into actionable recommendations for system architects
1.4 Scope and Limitations
This study focuses on:
- Technology Stack: ASP.NET Core, PostgreSQL, and Azure Blob Storage (via Azurite)
- File Sizes: Small (<100KB) to large (>10MB) binary files
- Access Patterns: Upload, download, and backup operations
- Deployment Context: Docker-based containerized environments
Limitations include:
- Single geographic region (Azerbaijan context)
- Emulated object storage (Azurite rather than production Azure)
- Limited to specific hardware configurations in test environment
- Focus on technical performance rather than cost analysis
1.5 Article Structure
The remainder of this article is organized as follows:
- Section 2: Related work and literature review
- Section 3: Problem statement and research questions
- Section 4: Hardware storage internals (PostgreSQL pages, TOAST, WAL)
- Section 5: System architecture and design
- Section 6: Experimental setup and methodology
- Section 7: Implementation details
- Section 8: Results and analysis
- Section 9: Discussion and interpretation
- Section 10: Real-world context (Azerbaijan deployment)
- Section 11: Ownership models and lessons learned
- Section 12: Conclusion
- Section 13: References
2. Related Work
2.1 Database Storage for Binary Data
The practice of storing binary data in relational databases has been extensively studied. Stonebraker et al. [1] examined the trade-offs of storing large objects (LOBs) in databases, identifying performance degradation as object sizes increase. PostgreSQL's TOAST (The Oversized-Attribute Storage Technique) mechanism [2] addresses this by automatically moving large values to separate storage, but still maintains them within the database system.
Recent work by Pavlo et al. [3] on database storage engines has highlighted the write amplification issues inherent in B-tree structures when handling large binary data. The research demonstrates that traditional database storage can lead to significant overhead for large files due to page fragmentation and WAL (Write-Ahead Logging) overhead.
2.2 Object Storage Systems
Object storage systems have emerged as specialized solutions for large-scale binary data. The architecture of systems like Amazon S3 [4] and Azure Blob Storage [5] emphasizes horizontal scalability and eventual consistency over transactional guarantees. Research by Balakrishnan et al. [6] on distributed storage systems has shown that object storage can achieve higher throughput for large files compared to database storage.
The CAP theorem [7] implications for object storage have been explored by Vogels [8], demonstrating how object storage systems prioritize availability and partition tolerance over strong consistency, making them suitable for file storage use cases.
2.3 Hybrid Architectures
Several studies have examined hybrid approaches combining databases for metadata with object storage for binary data. The "database + blob storage" pattern has been documented in various contexts [9, 10], but quantitative performance comparisons are less common.
Work by Armbrust et al. [11] on cloud storage architectures has shown that separating metadata and data storage can improve scalability, but introduces complexity in maintaining consistency between systems.
2.4 Performance Benchmarking
Performance evaluation methodologies for storage systems have been established in the literature. The TPC benchmarks [12] provide standardized approaches, though they focus primarily on transactional workloads rather than file storage.
Recent work on microbenchmarking storage systems [13] has emphasized the importance of hardware-aware testing, considering factors such as SSD write amplification, cache behavior, and IO scheduler effects.
2.5 ASP.NET and .NET Storage Patterns
The .NET ecosystem has specific patterns and best practices for file storage. Microsoft's documentation [14] recommends object storage for large files, but provides limited quantitative justification. Community-driven benchmarks [15] have shown mixed results, often depending on specific deployment configurations.
2.6 Research Gap
While extensive literature exists on both database storage and object storage systems individually, there is a gap in:
- Direct Comparative Studies: Few studies provide head-to-head performance comparisons with identical workloads
- Hardware-Aware Analysis: Limited research connecting storage decisions to underlying hardware behavior
- Reproducible Artifacts: Most studies lack complete, reproducible experimental frameworks
- Real-World Context: Limited consideration of operational and ownership concerns in specific deployment contexts
This research addresses these gaps by providing a comprehensive, reproducible comparison with hardware-level analysis and practical deployment guidance.
3. Problem Statement
3.1 Core Research Question
How do file storage strategies (database storage vs. object storage) compare in terms of performance, resource utilization, and operational characteristics for ASP.NET applications?
This question encompasses several sub-questions:
- What are the performance differences (throughput, latency) between storing files in PostgreSQL versus Azure Blob Storage?
- How do file size and access patterns affect the relative performance of each approach?
- What are the hardware-level implications (IO operations, write amplification, cache behavior) of each storage strategy?
- What are the operational trade-offs (backup complexity, scalability, maintenance) between approaches?
- Under what conditions should each approach be preferred?
3.2 Problem Motivation
3.2.1 The Storage Decision Dilemma
System architects face a fundamental choice when designing file storage:
Option A: Database Storage
- Store files directly in PostgreSQL (using BYTEA or TOAST)
- Metadata and binary data in the same system
- Transactional consistency guarantees
- Simpler operational model
Option B: Object Storage
- Store files in Azure Blob Storage (or similar)
- Store only metadata in PostgreSQL
- Separate systems for metadata and data
- More complex operational model
Both approaches have advocates, but decision-making is often based on intuition, vendor recommendations, or limited anecdotal evidence rather than systematic evaluation.
3.2.2 Real-World Impact
The storage decision has cascading effects:
- Performance: Affects user experience, especially for large file uploads/downloads
- Cost: Different resource requirements and scaling characteristics
- Reliability: Different failure modes and recovery mechanisms
- Maintainability: Different operational procedures and skill requirements
- Scalability: Different bottlenecks and scaling strategies
3.2.3 Lack of Empirical Evidence
While both approaches are widely used, there is limited published research providing:
- Quantitative performance comparisons under identical conditions
- Hardware-level analysis of resource utilization
- Guidance on when to choose each approach
- Reproducible experimental frameworks
3.3 Research Hypotheses
Based on preliminary analysis and related work, we formulate the following hypotheses:
H1: Object storage (Azurite/Azure Blob) will demonstrate higher throughput for large files (>1MB) compared to database storage.
H2: Database storage will show lower latency for small files (<100KB) due to reduced network overhead and cache locality.
H3: Database storage will exhibit higher write amplification due to WAL and page-level updates.
H4: Backup operations will be faster for object storage due to incremental and parallel capabilities.
H5: Resource utilization (CPU, memory, disk IO) will differ significantly between approaches, with database storage showing higher variability.
3.4 Success Criteria
This research will be considered successful if it:
- Provides quantitative performance data comparing both approaches
- Identifies clear decision criteria based on file characteristics and use cases
- Delivers a reproducible experimental framework
- Offers actionable guidance for system architects
- Contributes to the body of knowledge on storage system design
3.5 Scope Definition
3.5.1 In Scope
- Performance benchmarking (throughput, latency)
- Resource utilization analysis (CPU, memory, disk IO)
- Hardware-level considerations (write amplification, cache effects)
- Backup and recovery operations
- ASP.NET Core implementation patterns
- Docker-based reproducible environment
3.5.2 Out of Scope
- Cost analysis (though resource utilization data can inform cost estimates)
- Multi-region deployment and replication
- Security and access control mechanisms (assumed equivalent)
- Production Azure Blob Storage (using Azurite for reproducibility)
- Other object storage systems (S3, GCS) beyond Azure Blob
- Long-term durability and archival storage
3.6 Expected Contributions
This research contributes:
- Empirical Evidence: Quantitative performance data from systematic benchmarking
- Hardware Awareness: Analysis connecting storage decisions to hardware behavior
- Reproducible Artifact: Complete experimental framework for validation and extension
- Practical Guidance: Decision criteria and recommendations for practitioners
- Research Foundation: Baseline for future studies on storage architectures
4. Hardware Storage Internals
Understanding the hardware-level behavior of storage systems is crucial for making informed architectural decisions. This section examines the internals of PostgreSQL storage and object storage systems, focusing on how they interact with underlying hardware.
4.1 PostgreSQL Storage Architecture
4.1.1 Page-Based Storage
PostgreSQL stores data in fixed-size pages (typically 8KB). Each page contains multiple rows, and the database engine manages these pages through a buffer pool in memory. When storing binary data directly in tables, several mechanisms come into play:
Page Structure:
- Header: Metadata about the page (checksum, free space, etc.)
- Row Pointers: Array of pointers to row data
- Row Data: Actual table data, including binary columns
- Free Space: Unused space within the page
For binary data stored in BYTEA columns, the data is stored inline within the page if it fits. However, this can lead to:
- Page Fragmentation: Large binary values can cause significant wasted space
- Size Limits: PostgreSQL caps a single field at 1GB (with a theoretical row limit of 1.6TB), but performance degrades far before these limits are reached
- Cache Inefficiency: Mixing small metadata and large binary data in the same pages reduces cache hit rates
4.1.2 TOAST (The Oversized-Attribute Storage Technique)
PostgreSQL automatically uses TOAST for values exceeding a threshold (typically 2KB). TOAST moves large values to separate storage:
TOAST Mechanism:
- Inline Storage: Small values stored directly in the main table
- Extended Storage: Large values moved to TOAST tables
- External Storage: Very large values stored out-of-line with compression
- Main Table Reference: Main table stores a pointer to TOAST data
TOAST Implications:
- Additional IO: Reading large values requires additional page reads from TOAST tables
- Write Amplification: Updates to large values may require rewriting both main and TOAST pages
- Compression: TOAST can compress data, reducing storage but adding CPU overhead
- Transaction Overhead: TOAST operations are still transactional, requiring WAL writes
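To observe TOAST behavior in practice, the standard size functions can be combined; the queries below are illustrative and assume the files_db example schema introduced later in this article.

```sql
-- How much of the table's footprint lives outside the main heap
-- (pg_total_relation_size includes TOAST tables and indexes;
-- pg_relation_size counts only the main table):
SELECT
  pg_size_pretty(pg_relation_size('files_db'))       AS main_table_size,
  pg_size_pretty(pg_total_relation_size('files_db')
               - pg_relation_size('files_db'))       AS toast_and_index_size;

-- The per-column TOAST strategy can also be tuned, e.g. to store
-- out-of-line without compression (useful for incompressible data):
ALTER TABLE files_db ALTER COLUMN file_data SET STORAGE EXTERNAL;
```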
4.1.3 Write-Ahead Logging (WAL)
PostgreSQL uses WAL to ensure durability and enable replication:
WAL Process:
- Write Request: Application writes data
- Buffer Update: Data written to shared buffer pool
- WAL Write: Change logged to WAL before commit
- Checkpoint: Periodic flushing of dirty pages to disk
- WAL Archival: Old WAL segments archived for point-in-time recovery
WAL Impact on Binary Data:
- Full Value Logging: Large binary values are fully logged to WAL (unless using UNLOGGED tables, which sacrifice durability)
- Write Amplification: Each update to a large binary value writes the entire value to WAL
- Replication Overhead: WAL-based replication streams all binary data changes
- Backup Size: WAL archives contain complete history of binary data changes
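WAL volume generated by a single upload can be measured directly with PostgreSQL's LSN functions; the literal LSN values below are placeholders for illustration.

```sql
-- Capture the WAL insert position before and after an upload:
SELECT pg_current_wal_lsn() AS before_lsn;
-- ... perform the file upload ...
SELECT pg_current_wal_lsn() AS after_lsn;

-- Then diff the two positions to get WAL bytes written
-- (shown here with example literals; 0x1000000 = 16 MB):
SELECT pg_wal_lsn_diff('0/2000000'::pg_lsn, '0/1000000'::pg_lsn) AS wal_bytes;
```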
4.1.4 Buffer Pool and Caching
PostgreSQL's shared buffer pool caches frequently accessed pages:
Cache Behavior:
- Page-Level Caching: Entire pages cached, not individual rows
- LRU Eviction: Least recently used pages evicted when cache is full
- Mixed Workload Impact: Binary data can evict metadata pages, reducing cache efficiency
- Memory Pressure: Large binary values consume significant buffer pool space
4.2 Object Storage Architecture
4.2.1 Blob Storage Design
Azure Blob Storage (and Azurite emulation) stores objects as independent entities:
Object Storage Characteristics:
- Flat Namespace: Objects identified by container and blob name
- No Transactional Overhead: Each object write is independent
- Append-Optimized: Designed for sequential writes
- Metadata Separation: Object metadata stored separately from data
4.2.2 Write Patterns
Object storage systems are optimized for different write patterns:
Write Operations:
- Put Blob: Complete object replacement (atomic)
- Append Block: Adding data to existing blob (for append blobs)
- Block Upload: Multipart uploads for large files
- No In-Place Updates: Objects are immutable; updates create new versions
Hardware Implications:
- Sequential Writes: Better aligned with SSD write patterns
- Reduced Write Amplification: No page-level fragmentation or WAL overhead
- Parallel Writes: Multiple objects can be written concurrently without coordination
4.2.3 Read Patterns
Object storage read operations:
Read Characteristics:
- Range Reads: Can read specific byte ranges without fetching entire object
- Parallel Reads: Multiple objects can be read concurrently
- CDN Integration: Objects can be served via CDN for global distribution
- No Cache Coherency: Simpler caching model (no transactional consistency requirements)
4.3 Hardware-Level Considerations
4.3.1 SSD Write Amplification
Modern SSDs use flash memory with specific characteristics:
SSD Behavior:
- Page Size: Typically 4KB-16KB pages
- Block Erasure: Must erase entire blocks (typically 256KB-2MB) before writing
- Write Amplification: Actual writes exceed logical writes due to garbage collection
- Wear Leveling: Distributes writes across cells to prevent premature wear
Impact on Storage Strategies:
- Database Storage: Random writes to pages increase write amplification
- Object Storage: Sequential writes better aligned with SSD characteristics
- Large Files: Object storage's append-optimized design reduces write amplification
4.3.2 IO Patterns
Different storage strategies produce different IO patterns:
Database Storage IO:
- Random Reads/Writes: Accessing specific pages in tables
- Mixed Workload: Metadata and binary data interleaved
- Small IO Operations: Many small reads/writes for page management
- Synchronous Writes: WAL requires synchronous writes for durability
Object Storage IO:
- Sequential Writes: Appending data to objects
- Large IO Operations: Reading/writing entire objects or large ranges
- Asynchronous Writes: Can batch and optimize writes
- Parallel IO: Multiple objects accessed concurrently
4.3.3 Cache Behavior
CPU and memory caches have different characteristics:
Database Storage Caching:
- Page Cache: Operating system caches database pages
- Buffer Pool: PostgreSQL's own caching layer
- Cache Pollution: Large binary values reduce effective cache size for metadata
- Locality: Related data (metadata + binary) may be co-located
Object Storage Caching:
- Object Cache: Can cache entire objects or ranges
- CDN Cache: Objects can be cached at edge locations
- Cache Efficiency: Metadata queries don't affect object cache
- Predictable Patterns: Easier to optimize caching strategies
4.4 Performance Implications
The hardware-level differences translate to performance characteristics:
For Small Files (<100KB):
- Database storage may be faster due to reduced network overhead
- Cache locality benefits from co-located metadata and data
- Transactional overhead is minimal for small values
For Large Files (>1MB):
- Object storage benefits from sequential write patterns
- Reduced write amplification on SSDs
- Better parallelization opportunities
- Lower WAL overhead
For Mixed Workloads:
- Database storage can cause cache pollution
- Object storage allows independent scaling of metadata and data access
4.5 Summary
Understanding hardware internals reveals why different storage strategies perform differently:
- PostgreSQL: Optimized for transactional workloads with small to medium rows; large binary data causes write amplification and cache inefficiency
- Object Storage: Optimized for large, immutable objects with sequential access patterns; better aligned with modern SSD characteristics
These fundamental differences form the basis for performance expectations and guide experimental design.
5. System Architecture
This section describes the overall system architecture, including the components, their interactions, and design decisions.
5.1 Architecture Overview
The system implements two storage strategies for comparison:
- Database Storage Strategy: Files stored directly in PostgreSQL
- Object Storage Strategy: Files stored in Azurite (Azure Blob Storage emulation), metadata in PostgreSQL
Both strategies are implemented within the same ASP.NET Core application, allowing direct comparison under identical conditions.
5.2 System Components
The system consists of three main components:
ASP.NET Core API:
- Upload Controller: Handles file upload requests
- Download Controller: Handles file download requests
- Metadata Controller: Manages file metadata operations
- Storage Service Abstraction: Interface for storage operations
- Database Storage Strategy: Implements database-based storage
- Object Storage Strategy: Implements object storage-based storage
PostgreSQL Database:
- Stores metadata for both strategies
- Stores file data directly for database storage strategy
- Provides transactional consistency
Azurite (Azure Blob Storage Emulation):
- Stores binary file data for object storage strategy
- Provides blob storage API
- Emulates Azure Blob Storage behavior
5.3 Storage Strategies
5.3.1 Database Storage Strategy
Design:
- Files stored in PostgreSQL BYTEA columns
- Metadata and binary data in the same table
- Single transaction for file upload
- Direct database queries for file retrieval
Schema:
CREATE TABLE files_db (
id UUID PRIMARY KEY,
filename VARCHAR(255),
content_type VARCHAR(100),
file_size BIGINT,
uploaded_at TIMESTAMP,
file_data BYTEA -- Binary data stored here
);
Advantages:
- Transactional consistency
- Single system to manage
- ACID guarantees
- Simpler backup (single database dump)
Disadvantages:
- Database size grows with files
- WAL overhead for large files
- Cache pollution
- Backup/restore complexity for large datasets
5.3.2 Object Storage Strategy
Design:
- Files stored in Azurite blob containers
- Metadata stored in PostgreSQL
- Two-step write (blob upload followed by metadata insert), with compensating cleanup on failure
- Blob storage accessed via Azure SDK
Schema:
CREATE TABLE files_blob (
id UUID PRIMARY KEY,
filename VARCHAR(255),
content_type VARCHAR(100),
file_size BIGINT,
uploaded_at TIMESTAMP,
blob_container VARCHAR(100),
blob_name VARCHAR(500) -- Reference to blob storage
);
Advantages:
- Scalable storage (independent of database)
- Optimized for large files
- Reduced database size
- Better parallelization
Disadvantages:
- Two systems to manage
- Eventual consistency concerns
- More complex error handling
- Separate backup procedures
5.4 API Design
5.4.1 Endpoints
Upload:
- POST /api/files/db/upload - Upload to database storage
- POST /api/files/blob/upload - Upload to object storage
Download:
- GET /api/files/db/{id} - Download from database storage
- GET /api/files/blob/{id} - Download from object storage
Metadata:
- GET /api/files/db/{id}/metadata - Get metadata (database storage)
- GET /api/files/blob/{id}/metadata - Get metadata (object storage)
- GET /api/files/db - List all files (database storage)
- GET /api/files/blob - List all files (object storage)
5.4.2 Service Abstraction
The system uses a strategy pattern to abstract storage operations:
public interface IFileStorageService
{
Task<FileMetadata> UploadAsync(Stream fileStream, string filename, string contentType);
Task<Stream> DownloadAsync(Guid fileId);
Task<FileMetadata> GetMetadataAsync(Guid fileId);
Task DeleteAsync(Guid fileId);
}
This allows both storage strategies to be tested with identical API interfaces.
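As a sketch of how a strategy plugs into this interface, here is an in-memory implementation useful for unit testing. The interface is repeated so the block is self-contained; FileMetadata is not defined in the article, so the record shape below is an assumption.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

// Assumed metadata shape (the article does not spell out FileMetadata).
public record FileMetadata(Guid Id, string Filename, string ContentType,
                           long FileSize, DateTime UploadedAt);

public interface IFileStorageService
{
    Task<FileMetadata> UploadAsync(Stream fileStream, string filename, string contentType);
    Task<Stream> DownloadAsync(Guid fileId);
    Task<FileMetadata> GetMetadataAsync(Guid fileId);
    Task DeleteAsync(Guid fileId);
}

// In-memory strategy: the real implementations would talk to PostgreSQL
// or Azurite, but this one lets API-level tests run without either.
public sealed class InMemoryFileStorageService : IFileStorageService
{
    private readonly ConcurrentDictionary<Guid, (FileMetadata Meta, byte[] Data)> _store = new();

    public async Task<FileMetadata> UploadAsync(Stream fileStream, string filename, string contentType)
    {
        using var ms = new MemoryStream();
        await fileStream.CopyToAsync(ms);
        var data = ms.ToArray();
        var meta = new FileMetadata(Guid.NewGuid(), filename, contentType,
                                    data.LongLength, DateTime.UtcNow);
        _store[meta.Id] = (meta, data);
        return meta;
    }

    public Task<Stream> DownloadAsync(Guid fileId) =>
        _store.TryGetValue(fileId, out var e)
            ? Task.FromResult<Stream>(new MemoryStream(e.Data, writable: false))
            : throw new FileNotFoundException(fileId.ToString());

    public Task<FileMetadata> GetMetadataAsync(Guid fileId) =>
        _store.TryGetValue(fileId, out var e)
            ? Task.FromResult(e.Meta)
            : throw new FileNotFoundException(fileId.ToString());

    public Task DeleteAsync(Guid fileId)
    {
        _store.TryRemove(fileId, out _);
        return Task.CompletedTask;
    }
}
```

Because both real strategies are resolved through the same interface, swapping them in benchmarks is a dependency-injection configuration change rather than a code change.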
5.5 Data Flow
5.5.1 Upload Flow (Database Storage)
1. Client → API: POST /api/files/db/upload
2. API → Database Storage Service: UploadAsync()
3. Service → PostgreSQL: INSERT INTO files_db (..., file_data)
4. PostgreSQL: Store data in BYTEA column (TOAST if large)
5. PostgreSQL: Write to WAL
6. PostgreSQL → Service: Return file ID
7. Service → API: Return FileMetadata
8. API → Client: 201 Created with metadata
5.5.2 Upload Flow (Object Storage)
1. Client → API: POST /api/files/blob/upload
2. API → Object Storage Service: UploadAsync()
3. Service → Azurite: Upload blob to container
4. Azurite → Service: Return blob URL/name
5. Service → PostgreSQL: INSERT INTO files_blob (..., blob_name)
6. PostgreSQL: Store metadata only
7. PostgreSQL → Service: Return file ID
8. Service → API: Return FileMetadata
9. API → Client: 201 Created with metadata
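The blob-first ordering in this flow matters: if the metadata insert fails, the already-written blob must be deleted so no orphan remains. A minimal sketch of that compensation logic, with the Azurite and PostgreSQL calls abstracted as delegates (all names here are illustrative):

```csharp
using System;
using System.Threading.Tasks;

// "Blob first, metadata second" upload with compensation: a failed
// metadata INSERT triggers a compensating blob delete.
public static class TwoPhaseUpload
{
    public static async Task<Guid> UploadAsync(
        byte[] content,
        Func<string, byte[], Task> putBlob,        // stand-in for the Azurite upload
        Func<string, Task> deleteBlob,             // compensating delete
        Func<Guid, string, Task> insertMetadata)   // stand-in for the PostgreSQL insert
    {
        var id = Guid.NewGuid();
        var blobName = $"files/{id}";
        await putBlob(blobName, content);          // phase 1: write the blob
        try
        {
            await insertMetadata(id, blobName);    // phase 2: record metadata
        }
        catch
        {
            await deleteBlob(blobName);            // roll back phase 1
            throw;
        }
        return id;
    }
}
```

The inverse failure (metadata written, blob missing) cannot occur with this ordering, which is why it is generally preferred over metadata-first writes.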
5.5.3 Download Flow (Database Storage)
1. Client → API: GET /api/files/db/{id}
2. API → Database Storage Service: DownloadAsync(id)
3. Service → PostgreSQL: SELECT file_data FROM files_db WHERE id = ?
4. PostgreSQL: Read from table (and TOAST if needed)
5. PostgreSQL → Service: Return byte array
6. Service → API: Return file stream
7. API → Client: 200 OK with file content
5.5.4 Download Flow (Object Storage)
1. Client → API: GET /api/files/blob/{id}
2. API → Object Storage Service: DownloadAsync(id)
3. Service → PostgreSQL: SELECT blob_name FROM files_blob WHERE id = ?
4. PostgreSQL → Service: Return blob reference
5. Service → Azurite: Download blob by name
6. Azurite → Service: Return blob stream
7. Service → API: Return file stream
8. API → Client: 200 OK with file content
5.6 Error Handling
5.6.1 Database Storage Errors
- Transaction Rollback: Failed uploads automatically roll back
- Constraint Violations: Handled at database level
- Storage Limits: Database size limits apply
- Connection Failures: Retry logic with exponential backoff
5.6.2 Object Storage Errors
- Partial Failures: A blob may be uploaded while the metadata insert fails (or vice versa), leaving an orphan that requires cleanup
- Blob Not Found: Handle missing blob references
- Container Errors: Handle container creation/access issues
- Network Failures: Retry logic for blob operations
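The retry logic referenced for both connection and blob failures can be captured in one generic helper; the attempt count and 100 ms base delay below are illustrative choices, not values from the article.

```csharp
using System;
using System.Threading.Tasks;

// Generic retry with exponential backoff: the delay doubles after each
// failed attempt (100 ms, 200 ms, 400 ms, ...), and the last failure
// propagates to the caller.
public static class Retry
{
    public static async Task<T> WithBackoffAsync<T>(
        Func<Task<T>> operation, int maxAttempts = 4, int baseDelayMs = 100)
    {
        for (var attempt = 1; ; attempt++)
        {
            try { return await operation(); }
            catch when (attempt < maxAttempts)
            {
                await Task.Delay(baseDelayMs * (1 << (attempt - 1)));
            }
        }
    }
}
```

In production code a jitter term is usually added to the delay so that many clients retrying at once do not synchronize their attempts.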
5.7 Deployment Architecture
The system is designed for Docker-based deployment:
+-----------------------------------------------------+
|               Docker Compose Network                |
|                                                     |
|  +-------------+  +-------------+  +-------------+  |
|  |   ASP.NET   |  | PostgreSQL  |  |   Azurite   |  |
|  |     API     |  |             |  |             |  |
|  | (Port 5000) |  | (Port 5432) |  |   (Ports    |  |
|  |             |  |             |  | 10000-10002)|  |
|  +-------------+  +-------------+  +-------------+  |
|                                                     |
+-----------------------------------------------------+
All services communicate over Docker's internal network, ensuring reproducible networking conditions.
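This topology maps to a small compose file along these lines (service names and the api build context are assumptions; the postgres and azurite images are the official ones):

```yaml
services:
  api:
    build: .
    ports: ["5000:5000"]
    depends_on: [postgres, azurite]
  postgres:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: example   # illustrative only; use secrets in practice
    ports: ["5432:5432"]
  azurite:
    image: mcr.microsoft.com/azure-storage/azurite
    ports: ["10000:10000", "10001:10001", "10002:10002"]  # blob/queue/table
```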
5.8 Design Decisions
5.8.1 Why Azurite Instead of Production Azure?
- Reproducibility: Local emulation ensures consistent test conditions
- Cost: No cloud costs for experimentation
- Network Independence: Eliminates network latency as a variable
- Control: Full control over storage backend for testing
5.8.2 Why Both Strategies in One Application?
- Fair Comparison: Identical runtime conditions
- Code Reuse: Shared infrastructure and testing code
- Simplicity: Single deployment for both strategies
5.8.3 Why PostgreSQL for Metadata in Both Cases?
- Consistency: Same metadata storage for fair comparison
- Realistic: Most applications use databases for metadata
- Isolation: Focuses comparison on binary storage, not metadata storage
5.9 Scalability Considerations
5.9.1 Database Storage Scaling
- Vertical Scaling: Increase database server resources
- Read Replicas: Scale read operations
- Partitioning: Partition tables by file size or date
- Archival: Move old files to separate storage
5.9.2 Object Storage Scaling
- Horizontal Scaling: Azurite/Azure Blob scales automatically
- CDN Integration: Serve files via CDN
- Parallel Access: Multiple concurrent reads/writes
- Tiered Storage: Use different storage tiers for cost optimization
5.10 Summary
The architecture provides a fair, reproducible framework for comparing storage strategies. By implementing both approaches in the same application with identical APIs, we ensure that performance differences reflect storage strategy choices rather than implementation variations.
6. Experimental Setup
This section describes the experimental methodology, including hardware configuration, software stack, test scenarios, and data collection procedures.
6.1 Hardware Configuration
6.1.1 Test Environment
Primary Test Machine:
- CPU: System metrics available in data/raw/docker-stats-*.json
- Memory: System metrics available in data/raw/docker-stats-*.json
- Storage: Docker volumes on host filesystem (SSD/HDD dependent on host)
- Network: Local Docker network (no external network latency)
Docker Configuration:
- PostgreSQL Container: 2GB memory limit, 4 CPU cores
- Azurite Container: 1GB memory limit, 2 CPU cores
- API Container: 1GB memory limit, 2 CPU cores
6.1.2 Storage Characteristics
Understanding the underlying storage is critical for interpreting results:
- SSD vs HDD: Different IO characteristics
- Write Amplification: Measured for both strategies
- Cache Behavior: Buffer pool and OS cache effects
- IO Scheduler: Linux IO scheduler settings
6.2 Software Stack
6.2.1 Components
- ASP.NET Core: 8.0
- PostgreSQL: 16.x
- Azurite: Latest stable version
- Docker: Latest stable version
- Docker Compose: Latest stable version
- .NET SDK: 8.0
6.2.2 Configuration
PostgreSQL Settings:
shared_buffers = 256MB
effective_cache_size = 1GB
maintenance_work_mem = 64MB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1 # For SSD
effective_io_concurrency = 200 # For SSD
Azurite Settings:
- Default configuration (emulating Azure Blob Storage)
- Local file system backend
- No replication or redundancy
6.3 Test Scenarios
6.3.1 File Size Distribution
Tests cover a range of file sizes to understand performance characteristics:
| File Size Category | Size Range | Test Files |
|---|---|---|
| Small | < 100 KB | 100 files |
| Medium | 100 KB - 1 MB | 50 files |
| Large | 1 MB - 10 MB | 20 files |
| Very Large | > 10 MB | 10 files |
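Generating the corpus deterministically keeps runs comparable: a fixed RNG seed yields byte-identical files on every machine, and random bytes are incompressible, which avoids TOAST compression skewing database results. A sketch (directory layout and naming are illustrative):

```csharp
using System;
using System.IO;

// Deterministic benchmark-file generation: same seed => same bytes,
// so repeated runs and independent reproductions use identical inputs.
public static class TestDataGenerator
{
    public static void Generate(string directory, int count, int sizeBytes, int seed = 42)
    {
        Directory.CreateDirectory(directory);
        var rng = new Random(seed);
        for (var i = 0; i < count; i++)
        {
            var buffer = new byte[sizeBytes];
            rng.NextBytes(buffer);  // incompressible payload
            File.WriteAllBytes(Path.Combine(directory, $"file_{i:D4}.bin"), buffer);
        }
    }
}
```

Calling it once per size category (e.g. 100 small files, 50 medium, and so on) reproduces the distribution in the table above.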
6.3.2 Test Operations
Upload Tests:
- Single file uploads (various sizes)
- Batch uploads (multiple files concurrently)
- Sequential uploads (one after another)
- Mixed workload (small and large files)
Download Tests:
- Single file downloads
- Range reads (partial file downloads)
- Concurrent downloads
- Sequential downloads
Metadata Operations:
- Metadata queries
- List operations
- Search/filter operations
Backup Tests:
- Full database backup (database strategy)
- Blob storage backup (object strategy)
- Backup size comparison
- Backup time comparison
6.3.3 Workload Patterns
Write-Heavy Workload:
- 80% uploads, 20% downloads
- Simulates content creation scenario
Read-Heavy Workload:
- 20% uploads, 80% downloads
- Simulates content delivery scenario
Balanced Workload:
- 50% uploads, 50% downloads
- Simulates general-purpose application
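The three mixes above can be driven by a single write-ratio parameter. A minimal sketch (names hypothetical, not part of the benchmark code), using a fixed seed so the operation order is reproducible across runs:

```csharp
using System;

// Build a reproducible upload/download sequence from a write ratio.
// A fixed seed keeps the operation order identical across runs.
string[] BuildSequence(double uploadRatio, int operationCount, int seed)
{
    var rng = new Random(seed);
    var ops = new string[operationCount];
    for (int i = 0; i < operationCount; i++)
        ops[i] = rng.NextDouble() < uploadRatio ? "Upload" : "Download";
    return ops;
}

// Write-heavy mix: 80% uploads, 20% downloads.
var writeHeavy = BuildSequence(0.80, 1000, seed: 42);
Console.WriteLine($"Uploads: {Array.FindAll(writeHeavy, o => o == "Upload").Length}");
```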
6.4 Benchmark Methodology
6.4.1 Warm-up Phase
Before each test run:
- System idle for 30 seconds
- Pre-load 10 files of each size category
- Allow caches to stabilize
- Clear performance counters
6.4.2 Test Execution
For Each Test:
- Preparation: Clear previous test data
- Execution: Run test operation(s)
- Measurement: Collect metrics during execution
- Cooldown: Wait 10 seconds between tests
- Repetition: Run each test 5 times for statistical significance
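The per-test procedure above can be expressed as a small harness; the callbacks and the cooldown length are parameters so one loop drives every scenario (all names here are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

// Run one benchmark: prepare, execute, measure, cool down, repeat.
List<double> RunRepeated(Action prepare, Action operation, int repetitions, TimeSpan cooldown)
{
    var durationsMs = new List<double>();
    for (int run = 0; run < repetitions; run++)
    {
        prepare();                              // clear previous test data
        var sw = Stopwatch.StartNew();
        operation();                            // the operation under test
        sw.Stop();
        durationsMs.Add(sw.Elapsed.TotalMilliseconds);
        Thread.Sleep(cooldown);                 // settle time between runs
    }
    return durationsMs;
}

// Five repetitions with a 10-second cooldown, as in the methodology:
// var samples = RunRepeated(ClearData, UploadBatch, 5, TimeSpan.FromSeconds(10));
```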
6.4.3 Metrics Collection
Performance Metrics:
- Throughput: Files per second, MB/s
- Latency: P50, P95, P99 percentiles
- Response Time: End-to-end operation time
- Error Rate: Failed operations percentage
Resource Metrics:
- CPU Usage: Average and peak CPU utilization
- Memory Usage: Peak memory consumption
- Disk IO: Read/write operations per second
- Network IO: Bytes transferred (for object storage)
- Database Size: Total database size growth
- WAL Size: Write-ahead log size
Storage Metrics:
- Write Amplification: Actual writes / logical writes
- Cache Hit Rate: Buffer pool and OS cache hit rates
- Fragmentation: Database and file system fragmentation
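Write amplification, as listed above, is a simple ratio; a sketch of the computation (the byte counts would come from Docker stats, iostat, and the pg_stat views):

```csharp
using System;

// Write amplification: physical bytes actually written (heap + WAL +
// TOAST for the database strategy; blob file + metadata row for the
// object strategy) divided by the logical payload size.
double WriteAmplification(long physicalBytesWritten, long logicalBytes)
{
    if (logicalBytes <= 0)
        throw new ArgumentOutOfRangeException(nameof(logicalBytes));
    return (double)physicalBytesWritten / logicalBytes;
}

// Example: a 1 MB logical upload causing ~2.3 MB of physical writes.
Console.WriteLine(WriteAmplification(2_411_725, 1_048_576).ToString("F2"));
```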
6.5 Data Collection Tools
6.5.1 Application Metrics
- Custom Instrumentation: ASP.NET Core metrics middleware
- Structured Logging: JSON logs with timestamps
- Performance Counters: .NET performance counters
6.5.2 System Metrics
- Docker Stats: Container resource usage
- PostgreSQL Statistics: pg_stat_* views
- Linux Tools: iostat, vmstat, sar
- Database Queries: Size and performance queries
6.5.3 Automated Collection
Scripts collect metrics at regular intervals:
- During Tests: Metrics every 1 second
- Between Tests: Summary statistics
- Post-Test: Final measurements and cleanup
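A minimal fixed-interval sampler behind these scripts might look like this; the probe callback stands in for any of the collectors above (names are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Poll a metric probe at a fixed interval for a bounded number of
// samples, timestamping each reading.
List<(DateTime Timestamp, double Value)> Sample(
    Func<double> probe, TimeSpan interval, int samples)
{
    var points = new List<(DateTime, double)>();
    for (int i = 0; i < samples; i++)
    {
        points.Add((DateTime.UtcNow, probe()));
        if (i < samples - 1)
            Thread.Sleep(interval);             // 1-second interval during tests
    }
    return points;
}
```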
6.6 Statistical Analysis
6.6.1 Descriptive Statistics
For each metric:
- Mean: Average value
- Median: 50th percentile
- Standard Deviation: Variability measure
- Min/Max: Range of values
- Percentiles: P50, P95, P99
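The percentile figures can be computed with a simple nearest-rank method, sketched here:

```csharp
using System;
using System.Linq;

// Nearest-rank percentile over a latency sample; p in (0, 100].
double Percentile(double[] values, double p)
{
    if (values.Length == 0) throw new ArgumentException("empty sample");
    var sorted = values.OrderBy(v => v).ToArray();
    int rank = (int)Math.Ceiling(p / 100.0 * sorted.Length);
    return sorted[Math.Max(rank, 1) - 1];
}

var latenciesMs = new double[] { 11, 12, 12, 13, 14, 15, 15, 16, 90, 200 };
Console.WriteLine(Percentile(latenciesMs, 50));  // 14 (P50)
Console.WriteLine(Percentile(latenciesMs, 95));  // 200 (P95: tail latency dominates)
```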
6.6.2 Comparative Analysis
Between Strategies:
- T-Tests: Statistical significance of differences
- Effect Size: Practical significance (Cohen's d)
- Confidence Intervals: 95% confidence intervals for means
Across File Sizes:
- Regression Analysis: Performance vs. file size
- Breakpoint Analysis: Identify size thresholds where strategies differ
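The effect size between strategies can be computed with a pooled standard deviation; a sketch of Cohen's d for two independent samples:

```csharp
using System;
using System.Linq;

// Cohen's d: mean difference between the two strategies expressed in
// pooled-standard-deviation units (|d| ~ 0.2 small, 0.5 medium, 0.8 large).
double CohensD(double[] a, double[] b)
{
    double meanA = a.Average(), meanB = b.Average();
    double varA = a.Sum(x => (x - meanA) * (x - meanA)) / (a.Length - 1);
    double varB = b.Sum(x => (x - meanB) * (x - meanB)) / (b.Length - 1);
    double pooledSd = Math.Sqrt(((a.Length - 1) * varA + (b.Length - 1) * varB)
                                / (a.Length + b.Length - 2));
    return (meanA - meanB) / pooledSd;
}
```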
6.6.3 Visualization
- Time Series: Performance over time
- Box Plots: Distribution comparison
- Scatter Plots: Relationship between variables
- Heat Maps: Multi-dimensional analysis
6.7 Reproducibility Measures
6.7.1 Environment Isolation
- Docker Containers: Isolated runtime environments
- Version Pinning: Specific versions of all dependencies
- Configuration Files: All settings in version control
- Seed Data: Consistent test data generation
6.7.2 Deterministic Testing
- Fixed Random Seeds: Reproducible test data
- Timing Controls: Consistent timing between runs
- Resource Limits: Fixed container resource allocations
- Network Isolation: No external network dependencies
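Fixed seeds make test payloads byte-identical between runs; a minimal sketch (note that seeded System.Random sequences are only guaranteed stable within a given .NET version, which is one reason the version pinning above matters):

```csharp
using System;

// Deterministic test file: the same (size, seed) pair always yields
// identical bytes on the same .NET version.
byte[] GenerateTestFile(int sizeBytes, int seed)
{
    var data = new byte[sizeBytes];
    new Random(seed).NextBytes(data);
    return data;
}
```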
6.7.3 Documentation
- Complete Setup Instructions: Step-by-step reproduction guide
- All Configurations: Every setting documented
- Raw Data: All collected metrics preserved
- Analysis Scripts: Reproducible data analysis
6.8 Ethical Considerations
6.8.1 Data Privacy
- No Real User Data: All test data is synthetic
- No Personal Information: No PII in test files
- Local Testing: All experiments run locally
6.8.2 Resource Usage
- Contained Experiments: Limited to test environment
- No External Services: No impact on production systems
- Resource Monitoring: Track and limit resource consumption
6.9 Limitations
6.9.1 Test Environment Limitations
- Single Machine: Not testing distributed scenarios
- Local Network: No real network latency
- Limited Scale: Testing up to [X] files, [Y] GB total
- Synthetic Workloads: May not match all real-world patterns
6.9.2 Measurement Limitations
- Sampling Frequency: 1-second intervals may miss brief spikes
- Container Overhead: Docker adds some measurement overhead
- Cache Effects: Results depend on cache state
- Timing Precision: Limited by system clock resolution
6.10 Summary
The experimental setup is designed for:
- Fair Comparison: Identical conditions for both strategies
- Statistical Rigor: Multiple runs and proper analysis
- Reproducibility: Complete environment in version control
- Comprehensive Coverage: Multiple file sizes and workloads
This methodology ensures that results are reliable, reproducible, and meaningful for decision-making.
7. Implementation
This section provides a detailed walkthrough of the implementation, including code structure, key components, and design patterns.
7.1 Project Structure
code/
├── api/
│   ├── FileStorageApi/
│   │   ├── Controllers/
│   │   ├── Services/
│   │   ├── Models/
│   │   ├── Data/
│   │   └── Program.cs
│   └── FileStorageApi.csproj
├── database/
│   ├── schema.sql
│   ├── migrations/
│   └── init.sql
├── storage/
│   ├── azurite/
│   └── docker-compose.azurite.yml
└── benchmarks/
    ├── BenchmarkRunner/
    │   ├── Scenarios/
    │   ├── Collectors/
    │   └── Program.cs
    └── BenchmarkRunner.csproj
7.2 API Implementation
7.2.1 Service Abstraction
The core abstraction allows switching between storage strategies:
public interface IFileStorageService
{
Task<FileMetadata> UploadAsync(
Stream fileStream,
string filename,
string contentType,
CancellationToken cancellationToken = default);
Task<Stream> DownloadAsync(
Guid fileId,
CancellationToken cancellationToken = default);
Task<FileMetadata> GetMetadataAsync(
Guid fileId,
CancellationToken cancellationToken = default);
Task DeleteAsync(
Guid fileId,
CancellationToken cancellationToken = default);
Task<IEnumerable<FileMetadata>> ListAsync(
int skip = 0,
int take = 100,
CancellationToken cancellationToken = default);
}
7.2.2 Database Storage Implementation
Service Implementation:
public class DatabaseStorageService : IFileStorageService
{
    private readonly ApplicationDbContext _context;
    private readonly ILogger<DatabaseStorageService> _logger;

    public DatabaseStorageService(
        ApplicationDbContext context,
        ILogger<DatabaseStorageService> logger)
    {
        _context = context;
        _logger = logger;
    }
public async Task<FileMetadata> UploadAsync(
Stream fileStream,
string filename,
string contentType,
CancellationToken cancellationToken)
{
using var memoryStream = new MemoryStream();
await fileStream.CopyToAsync(memoryStream, cancellationToken);
var fileData = memoryStream.ToArray();
var file = new FileEntity
{
Id = Guid.NewGuid(),
Filename = filename,
ContentType = contentType,
FileSize = fileData.Length,
FileData = fileData, // Stored in BYTEA column
UploadedAt = DateTime.UtcNow
};
_context.Files.Add(file);
await _context.SaveChangesAsync(cancellationToken);
return new FileMetadata
{
Id = file.Id,
Filename = file.Filename,
ContentType = file.ContentType,
FileSize = file.FileSize,
UploadedAt = file.UploadedAt
};
}
public async Task<Stream> DownloadAsync(
Guid fileId,
CancellationToken cancellationToken)
{
var file = await _context.Files
.FindAsync(new object[] { fileId }, cancellationToken);
if (file == null)
throw new FileNotFoundException($"File {fileId} not found");
return new MemoryStream(file.FileData);
}
// ... other methods
}
Entity Model:
public class FileEntity
{
public Guid Id { get; set; }
public string Filename { get; set; }
public string ContentType { get; set; }
public long FileSize { get; set; }
public DateTime UploadedAt { get; set; }
public byte[] FileData { get; set; } // BYTEA in PostgreSQL
}
DbContext Configuration:
public class ApplicationDbContext : DbContext
{
public DbSet<FileEntity> Files { get; set; }
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
modelBuilder.Entity<FileEntity>(entity =>
{
entity.ToTable("files_db");
entity.HasKey(e => e.Id);
entity.Property(e => e.Filename).HasMaxLength(255);
entity.Property(e => e.ContentType).HasMaxLength(100);
entity.Property(e => e.FileData)
.HasColumnType("bytea"); // PostgreSQL BYTEA type
});
}
}
7.2.3 Object Storage Implementation
Service Implementation:
public class ObjectStorageService : IFileStorageService
{
    private readonly BlobServiceClient _blobServiceClient;
    private readonly ApplicationDbContext _context;
    private readonly ILogger<ObjectStorageService> _logger;
    private const string ContainerName = "files";

    public ObjectStorageService(
        BlobServiceClient blobServiceClient,
        ApplicationDbContext context,
        ILogger<ObjectStorageService> logger)
    {
        _blobServiceClient = blobServiceClient;
        _context = context;
        _logger = logger;
    }
public async Task<FileMetadata> UploadAsync(
Stream fileStream,
string filename,
string contentType,
CancellationToken cancellationToken)
{
// Ensure container exists
var containerClient = _blobServiceClient.GetBlobContainerClient(ContainerName);
await containerClient.CreateIfNotExistsAsync(cancellationToken: cancellationToken);
// Generate unique blob name
var blobName = $"{Guid.NewGuid()}/{filename}";
var blobClient = containerClient.GetBlobClient(blobName);
// Upload to blob storage
await blobClient.UploadAsync(
fileStream,
new BlobUploadOptions
{
HttpHeaders = new BlobHttpHeaders
{
ContentType = contentType
}
},
cancellationToken);
// Get file size
var properties = await blobClient.GetPropertiesAsync(cancellationToken: cancellationToken);
var fileSize = properties.Value.ContentLength;
// Store metadata in database
var file = new FileBlobEntity
{
Id = Guid.NewGuid(),
Filename = filename,
ContentType = contentType,
FileSize = fileSize,
BlobContainer = ContainerName,
BlobName = blobName,
UploadedAt = DateTime.UtcNow
};
_context.FilesBlob.Add(file);
await _context.SaveChangesAsync(cancellationToken);
return new FileMetadata
{
Id = file.Id,
Filename = file.Filename,
ContentType = file.ContentType,
FileSize = file.FileSize,
UploadedAt = file.UploadedAt
};
}
public async Task<Stream> DownloadAsync(
Guid fileId,
CancellationToken cancellationToken)
{
// Get metadata from database
var file = await _context.FilesBlob
.FindAsync(new object[] { fileId }, cancellationToken);
if (file == null)
throw new FileNotFoundException($"File {fileId} not found");
// Download from blob storage
var containerClient = _blobServiceClient.GetBlobContainerClient(file.BlobContainer);
var blobClient = containerClient.GetBlobClient(file.BlobName);
var response = await blobClient.DownloadStreamingAsync(cancellationToken: cancellationToken);
return response.Value.Content;
}
// ... other methods
}
Entity Model:
public class FileBlobEntity
{
public Guid Id { get; set; }
public string Filename { get; set; }
public string ContentType { get; set; }
public long FileSize { get; set; }
public string BlobContainer { get; set; }
public string BlobName { get; set; }
public DateTime UploadedAt { get; set; }
}
7.2.4 Controllers
Database Storage Controller:
[ApiController]
[Route("api/files/db")]
public class DatabaseStorageController : ControllerBase
{
    private readonly DatabaseStorageService _storageService;
    private readonly ILogger<DatabaseStorageController> _logger;

    public DatabaseStorageController(
        DatabaseStorageService storageService,
        ILogger<DatabaseStorageController> logger)
    {
        _storageService = storageService;
        _logger = logger;
    }
[HttpPost("upload")]
public async Task<ActionResult<FileMetadata>> Upload(
IFormFile file,
CancellationToken cancellationToken)
{
if (file == null || file.Length == 0)
return BadRequest("No file provided");
using var stream = file.OpenReadStream();
var metadata = await _storageService.UploadAsync(
stream,
file.FileName,
file.ContentType,
cancellationToken);
return CreatedAtAction(
nameof(GetMetadata),
new { id = metadata.Id },
metadata);
}
[HttpGet("{id}")]
public async Task<IActionResult> Download(
Guid id,
CancellationToken cancellationToken)
{
try
{
var metadata = await _storageService.GetMetadataAsync(id, cancellationToken);
var stream = await _storageService.DownloadAsync(id, cancellationToken);
return File(stream, metadata.ContentType, metadata.Filename);
}
catch (FileNotFoundException)
{
return NotFound();
}
}
// ... other endpoints
}
Object Storage Controller:
The object storage controller follows the same structure, but injects ObjectStorageService and uses the route prefix /api/files/blob.
7.2.5 Dependency Injection
// Program.cs
var builder = WebApplication.CreateBuilder(args);
// Database
builder.Services.AddDbContext<ApplicationDbContext>(options =>
options.UseNpgsql(builder.Configuration.GetConnectionString("PostgreSQL")));
// Blob Storage
builder.Services.AddSingleton(serviceProvider =>
{
var connectionString = builder.Configuration.GetConnectionString("Azurite");
return new BlobServiceClient(connectionString);
});
// Storage Services
builder.Services.AddScoped<DatabaseStorageService>();
builder.Services.AddScoped<ObjectStorageService>();
// Controllers
builder.Services.AddControllers();
7.3 Database Schema
7.3.1 Database Storage Table
CREATE TABLE files_db (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
filename VARCHAR(255) NOT NULL,
content_type VARCHAR(100) NOT NULL,
file_size BIGINT NOT NULL,
uploaded_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
file_data BYTEA NOT NULL
);
CREATE INDEX idx_files_db_uploaded_at ON files_db(uploaded_at);
CREATE INDEX idx_files_db_filename ON files_db(filename);
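One consequence of the BYTEA column worth keeping in mind when interpreting the benchmarks: PostgreSQL stores large values out of line via TOAST, compressing and slicing them into roughly 2 KB chunks once a row exceeds the TOAST threshold. A rough sizing sketch (the chunk size assumes the default 8 KB page layout and no compression):

```csharp
using System;

// Approximate number of TOAST chunks for an out-of-line BYTEA value,
// assuming the default chunk size of 1996 bytes (8 KB pages) and no
// compression. Each chunk is a separate row in the table's TOAST table.
int ToastChunks(long fileSizeBytes, int chunkSize = 1996)
{
    return (int)((fileSizeBytes + chunkSize - 1) / chunkSize);
}

Console.WriteLine(ToastChunks(1_048_576)); // 526 chunks for a 1 MB file
```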
7.3.2 Object Storage Metadata Table
CREATE TABLE files_blob (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
filename VARCHAR(255) NOT NULL,
content_type VARCHAR(100) NOT NULL,
file_size BIGINT NOT NULL,
uploaded_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
blob_container VARCHAR(100) NOT NULL,
blob_name VARCHAR(500) NOT NULL,
UNIQUE(blob_container, blob_name)
);
CREATE INDEX idx_files_blob_uploaded_at ON files_blob(uploaded_at);
CREATE INDEX idx_files_blob_filename ON files_blob(filename);
CREATE INDEX idx_files_blob_container_name ON files_blob(blob_container, blob_name);
7.4 Benchmark Implementation
7.4.1 Benchmark Runner
public class BenchmarkRunner
{
private readonly HttpClient _httpClient;
private readonly MetricsCollector _metricsCollector;
public async Task<BenchmarkResult> RunUploadBenchmark(
string storageType,
int fileCount,
int fileSizeBytes)
{
var results = new List<OperationResult>();
for (int i = 0; i < fileCount; i++)
{
var fileData = GenerateTestFile(fileSizeBytes);
var startTime = DateTime.UtcNow;
try
{
var response = await UploadFile(storageType, fileData);
var duration = (DateTime.UtcNow - startTime).TotalMilliseconds;
results.Add(new OperationResult
{
Success = response.IsSuccessStatusCode,
DurationMs = duration,
FileSize = fileSizeBytes
});
}
catch (Exception ex)
{
results.Add(new OperationResult
{
Success = false,
Error = ex.Message
});
}
}
return new BenchmarkResult
{
Operation = "Upload",
StorageType = storageType,
Results = results,
Summary = CalculateSummary(results)
};
}
// ... other benchmark methods
}
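The CalculateSummary call above is not shown; a sketch of what it might compute for a sequential run (field names illustrative):

```csharp
using System;
using System.Linq;

// Derive throughput and latency figures from per-operation durations.
(double MeanMs, double FilesPerSecond, double MBPerSecond) Summarize(
    double[] durationsMs, int fileSizeBytes)
{
    double meanMs = durationsMs.Average();
    double filesPerSecond = 1000.0 / meanMs;    // sequential operations
    double mbPerSecond = filesPerSecond * fileSizeBytes / (1024.0 * 1024.0);
    return (meanMs, filesPerSecond, mbPerSecond);
}

var summary = Summarize(new double[] { 95.0, 100.0, 105.0 }, 1_048_576);
Console.WriteLine($"{summary.MeanMs} ms mean, {summary.MBPerSecond:F1} MB/s");
```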
7.4.2 Metrics Collection
public class MetricsCollector
{
public async Task<SystemMetrics> CollectMetricsAsync()
{
var metrics = new SystemMetrics
{
Timestamp = DateTime.UtcNow,
CpuUsage = await GetCpuUsageAsync(),
MemoryUsage = await GetMemoryUsageAsyn