Last week, we briefly covered how to build event-driven containers that accept runtime parameters. Today, we would like to talk about the options we have for deploying containers that run data pipelines or other data tasks.
Server vs Serverless
If you have a pipeline that needs to run only once a week, it doesn’t make financial sense to rent a server and keep it running constantly. By switching to serverless infrastructure, you pay only for the runtime, that is, the time the pipeline or task actually runs. Not only can you save 90%+ of the server costs (if it’s a simple pipeline), but you also reduce the risk that comes with securing an always-on server. You outsource the management responsibilities to a vendor like AWS, which is most likely better at handling these tasks.
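To make the cost argument concrete, here is a rough back-of-the-envelope sketch. The hourly rates and the two-hour runtime are made-up numbers for illustration only, not real AWS prices, and the function names are our own. The point is only that an always-on server is billed for all 168 hours of the week, while a serverless task is billed for the hours it runs, even at a higher hourly rate.

```python
HOURS_PER_WEEK = 24 * 7  # an always-on server is billed for every hour


def weekly_cost_always_on(hourly_rate: float) -> float:
    """Weekly cost of a server that runs around the clock."""
    return hourly_rate * HOURS_PER_WEEK


def weekly_cost_serverless(hourly_rate: float, runtime_hours_per_week: float) -> float:
    """Weekly cost when you are billed only while the task runs."""
    return hourly_rate * runtime_hours_per_week


def savings_fraction(always_on_hourly: float,
                     serverless_hourly: float,
                     runtime_hours_per_week: float) -> float:
    """Fraction of the always-on cost saved by paying per run."""
    server = weekly_cost_always_on(always_on_hourly)
    serverless = weekly_cost_serverless(serverless_hourly, runtime_hours_per_week)
    return 1 - serverless / server


if __name__ == "__main__":
    # Hypothetical rates: $0.10/hour for the server, $0.15/hour for serverless,
    # and a weekly pipeline that takes 2 hours to run.
    saved = savings_fraction(0.10, 0.15, 2)
    print(f"Saved: {saved:.1%}")
```

With these assumed numbers the weekly pipeline costs a small fraction of the always-on server. The gap narrows as the pipeline runs more often or for longer, so it is worth plugging in your own rates and runtimes.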
By packaging the deployment into a Docker container (or another type of container), you also make the pipeline more scalable: spinning up a container from a pre-built image is faster and easier than deploying manually, one server at a time, while catering to different server environments. Combining containers with serverless infrastructure significantly increases the scalability of your pipeline or task and significantly reduces the cost of your infrastructure. Because containers are spun down automatically once they finish running their defined tasks, you don’t have to shut them down yourself as you would with a server you manage.
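The automatic spin-down works because a task container typically stops when its main process exits. Below is a minimal sketch of what such an entrypoint could look like, assuming the runtime parameters arrive as environment variables. The variable names `RUN_DATE` and `DRY_RUN` and the `run_pipeline` placeholder are hypothetical, chosen only for this example.

```python
import datetime
import os
import sys


def run_pipeline(run_date: datetime.date, dry_run: bool) -> None:
    """Placeholder for the real pipeline work (hypothetical)."""
    mode = "dry run" if dry_run else "live run"
    print(f"Processing data for {run_date.isoformat()} ({mode})")


def main(env=None) -> int:
    """Read runtime parameters, run the task once, and return an exit code."""
    env = os.environ if env is None else env

    raw_date = env.get("RUN_DATE")  # hypothetical parameter name
    if not raw_date:
        print("RUN_DATE is required", file=sys.stderr)
        return 2
    try:
        run_date = datetime.date.fromisoformat(raw_date)
    except ValueError:
        print(f"RUN_DATE is not a valid ISO date: {raw_date}", file=sys.stderr)
        return 2

    dry_run = env.get("DRY_RUN", "false").lower() == "true"
    run_pipeline(run_date, dry_run)
    return 0  # the process ends here, so the container can stop
```

In the image's entrypoint script you would call `sys.exit(main())`, so that the exit code is visible to whatever launched the container. There is no loop and no server process to keep alive: once `main` returns, nothing is left running to pay for.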
Example Tech Stack: Docker + AWS ECS/Fargate
AWS ECS (Elastic Container Service) offers both server-based and serverless options for managing and hosting your containers. If you run ECS with Fargate, AWS Fargate manages the containers for you, so you don’t have to manage an EC2 server yourself. Your container images are hosted on AWS ECR (Elastic Container Registry). You then schedule when you want your tasks to run and define how you would like Fargate to manage your containers. In addition, this tech stack comes with monitoring and logging through AWS CloudWatch, which makes debugging and troubleshooting easier.
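As a rough illustration of launching a container on Fargate from code, the sketch below builds the request for the ECS `run_task` API call and passes runtime parameters in as container environment overrides. It assumes the task definition and cluster already exist and that the subnets can reach ECR (for example through a NAT gateway or VPC endpoints). The helper names `build_run_task_request` and `submit` are our own, and the cluster, task definition, subnet, and container names in the usage note are placeholders.

```python
def build_run_task_request(cluster, task_definition, subnets,
                           container_name, environment):
    """Build keyword arguments for the ECS run_task call on Fargate."""
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "launchType": "FARGATE",  # no EC2 instances to manage
        "count": 1,
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": list(subnets),
                # Assumption: private subnets with a route to ECR.
                "assignPublicIp": "DISABLED",
            }
        },
        "overrides": {
            "containerOverrides": [
                {
                    "name": container_name,
                    # Runtime parameters passed as environment variables.
                    "environment": [
                        {"name": key, "value": value}
                        for key, value in sorted(environment.items())
                    ],
                }
            ]
        },
    }


def submit(request):
    """Send the request with boto3 (third-party; needs AWS credentials)."""
    import boto3  # imported lazily so building the request needs no AWS setup

    return boto3.client("ecs").run_task(**request)
```

A call might look like `submit(build_run_task_request("my-cluster", "my-pipeline:1", ["subnet-0123"], "pipeline", {"RUN_DATE": "2024-01-01"}))`, with every value replaced by your own. For recurring runs you would normally let a scheduler trigger the task rather than call this by hand.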