This project encapsulates the BLIP neural network for image captioning inside a Docker container and exposes it via a simple API. Images sent to the endpoint are queued, run through the captioning model, and the generated captions are returned to the caller.

A worker pool processes caption jobs for multiple requests in parallel, and the number of workers can be configured to find the optimal concurrency for the host hardware. New caption jobs are load-balanced across the available workers via a Redis queue.

The containerized architecture also lends itself to easy horizontal scaling: if higher throughput is needed, just launch more container instances behind a load balancer to grow capacity.

Goals of this project included learning productionization practices around AI models, such as leveraging queues, understanding efficient concurrency patterns, and using containers and APIs for scalable access. Having an on-demand captioning service ready to deploy helps make the latest deep learning research rapidly usable in downstream applications.
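The worker-pool pattern described above can be sketched as follows. This is a minimal, self-contained illustration, not the project's actual code: it uses Python's in-process `queue.Queue` as a stand-in for the Redis queue, and `caption_image` is a hypothetical placeholder for the BLIP model call.

```python
import queue
import threading

NUM_WORKERS = 4  # configurable concurrency, as described above


def caption_image(image_id: str) -> str:
    # Hypothetical stand-in for the BLIP inference call; the real
    # service would run the image through the captioning model here.
    return f"caption for {image_id}"


def worker(jobs: queue.Queue, results: dict, lock: threading.Lock) -> None:
    # Each worker pulls jobs off the shared queue until it sees a sentinel.
    while True:
        image_id = jobs.get()
        if image_id is None:  # sentinel: shut this worker down
            jobs.task_done()
            break
        caption = caption_image(image_id)
        with lock:
            results[image_id] = caption
        jobs.task_done()


def run_pool(image_ids: list) -> dict:
    jobs: queue.Queue = queue.Queue()
    results: dict = {}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(jobs, results, lock))
        for _ in range(NUM_WORKERS)
    ]
    for t in threads:
        t.start()
    for image_id in image_ids:
        jobs.put(image_id)  # enqueue; workers pick jobs up as they free
    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_pool(["img1.jpg", "img2.jpg", "img3.jpg"]))
```

In the real service the queue lives in Redis rather than in-process, so multiple container instances can share one job stream; the pull-based loop above is what gives the load balancing, since idle workers take the next job as soon as they finish.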