What you’ll learn
In this guide, you will learn how to:
- Understand why containers lose data when they stop.
- Create Docker volumes for persistent storage.
- Mount volumes to containers at runtime.
- Read and write data to volumes.
- Access volume data across multiple containers.
- Apply these concepts to Runpod’s storage solutions.
Requirements
Before starting, you should have:
- Completed the Dockerfile creation guide.
- Docker Desktop installed and running.
- Basic familiarity with Docker commands.
Why persist data outside containers?
Containers are designed to be immutable and ephemeral. When a container is removed, or replaced by a fresh one after it stops, everything written inside it (files, data, and state) is gone. This design makes containers portable and reproducible, but it creates a challenge when you need to preserve data. Consider these scenarios where persistence matters:
- Machine learning training: You train a model over hours or days. If the container stops, you lose all training progress, checkpoints, and the final model unless you save them outside the container.
- Data processing pipelines: You process large datasets and generate results. Without persistent storage, you’d need to reprocess everything if the container restarts.
- Application state: Databases, logs, user uploads, and configuration changes need to survive container restarts.
- Development workflows: You want to edit code on your host machine and have changes immediately available inside the container without rebuilding the image.
Docker volumes provide the solution by storing data outside the container on the host system. When a container stops, the volume data remains intact and can be mounted to new containers.
Step 1: Create a named volume
Start by creating a Docker volume that will store your persistent data. Running docker volume create my-data creates a volume named my-data managed by Docker. The volume exists independently of any container and persists until you explicitly delete it.
You can verify the volume was created by running docker volume ls. You should see my-data in the list of volumes.
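As a concrete sketch of this step, the following commands create the volume and then list all volumes. They assume a running Docker daemon; my-data is the volume name used throughout this guide.

```shell
# Create a named volume managed by Docker
docker volume create my-data

# List all volumes; my-data should appear in the output
docker volume ls
```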
Understanding volume storage
Docker stores volumes in a Docker-managed location on your host system (typically /var/lib/docker/volumes/ on Linux). You don’t need to worry about the exact location, because Docker handles the storage details. The key point is that this storage exists outside any container’s filesystem.
Step 2: Create your project files
For this example, you’ll modify the Dockerfile from the previous guide to write data to a volume instead of just printing output. Create a new directory and navigate to it, then create two files in it: a Dockerfile and an entrypoint.sh script.
The Dockerfile:
- Uses busybox as the base image.
- Sets /data as the working directory (where the script will write files).
- Copies the entrypoint script and makes it executable.
- Configures the script to run when containers start.
The entrypoint.sh script:
- Generates a timestamp.
- Appends it to /data/timestamps.txt (using >> to append, not overwrite).
- Prints confirmation and shows all timestamps.
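The two files described above could look like the following. This is a reconstruction from the bullet points, not necessarily the guide's exact files: the directory name volume-demo, the /entrypoint.sh location inside the image, and the wording of the printed messages are all assumptions made for illustration.

```shell
# Hypothetical directory name; any name works
mkdir -p volume-demo && cd volume-demo

# Dockerfile: busybox base image, /data as working directory, script as entrypoint
cat > Dockerfile <<'EOF'
FROM busybox
WORKDIR /data
COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
EOF

# entrypoint.sh: append a timestamp to /data/timestamps.txt, then print the file
cat > entrypoint.sh <<'EOF'
#!/bin/sh
timestamp=$(date)
# >> appends to the file instead of overwriting it
echo "$timestamp" >> /data/timestamps.txt
echo "Wrote timestamp: $timestamp"
echo "All timestamps so far:"
cat /data/timestamps.txt
EOF
```

The script is copied to the image root rather than into /data so that mounting a volume at /data does not hide it.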
Step 3: Build the image
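Build a Docker image from your Dockerfile and tag it timestamp-logger, for example with docker build -t timestamp-logger . run from the project directory. This creates an image named timestamp-logger that you can use to demonstrate persistent storage.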
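A minimal sketch of the build command, assuming it is run from the directory that contains the Dockerfile and entrypoint.sh:

```shell
# -t tags the resulting image as timestamp-logger; "." is the build context
docker build -t timestamp-logger .
```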
Step 4: Run a container with a mounted volume
Now run a container and mount your volume to the /data directory with docker run -v my-data:/data timestamp-logger. In this command:
- docker run: Creates and starts a new container.
- -v my-data:/data: Mounts the my-data volume to /data inside the container.
- timestamp-logger: The image to use.
The -v flag creates a mount point. Files written to /data inside the container are actually written to the my-data volume on the host. This means the data persists even after the container exits.
You should see output showing the timestamp was written and displaying the contents of the file.
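As a sketch of this step, assuming the timestamp-logger image and the my-data volume from the earlier steps already exist:

```shell
# Mount the my-data volume at /data and run the image built in Step 3
docker run -v my-data:/data timestamp-logger
```

Because the timestamp is written to the volume rather than to the container's own filesystem, it remains available after this container exits.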
Step 5: Verify data persistence
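Run the container again several times, repeating the docker run command from Step 4, to see data persist across container instances. Each run is a brand-new container, yet its output should list the timestamps written by every earlier run, because they all append to the same file in the volume.
Step 6: Access volume data from another container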
You can access the persisted data from any container that mounts the volume, even using a completely different image. For example, the command docker run --rm -v my-data:/data busybox cat /data/timestamps.txt:
- Runs a new busybox container (different from our custom image).
- Mounts the same my-data volume to /data.
- Runs cat to display the file contents.
- Removes the container after it exits (--rm flag).
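Written out as a runnable command, and assuming earlier runs have already created /data/timestamps.txt in the my-data volume:

```shell
# --rm removes the container on exit; plain busybox only needs to read the file
docker run --rm -v my-data:/data busybox cat /data/timestamps.txt
```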
Step 7: Inspect the volume
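You can get detailed information about a volume with docker volume inspect my-data. The output includes the volume’s name, driver, mount point on the host, and creation time.
Understanding volume mount syntax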
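When using volumes, you specify mounts with the -v or --mount flag. The basic -v syntax is source:container-path, where the source is either a volume name (a named volume) or an absolute host path (a bind mount).
Named volumes (like my-data) are managed by Docker and recommended for most use cases. Bind mounts map specific host directories and are useful for development when you want live code reloading.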
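The forms below illustrate both flags. This is a sketch that uses busybox and the my-data volume from this guide; the /workspace target path is an arbitrary choice for illustration.

```shell
# Named volume with -v: <volume-name>:<container-path>
docker run --rm -v my-data:/data busybox ls /data

# The same mount written with --mount, using explicit key=value pairs
docker run --rm --mount type=volume,source=my-data,target=/data busybox ls /data

# Bind mount with -v: <absolute-host-path>:<container-path>
docker run --rm -v "$(pwd)":/workspace busybox ls /workspace
```

The --mount form is more verbose but makes the mount type and each path explicit, which can be easier to read in scripts.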
Volume mount options
You can specify additional mount options after the container path. For example, the :ro suffix (as in -v my-data:/data:ro) makes the mount read-only inside the container, preventing accidental data modification.
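A short sketch of the read-only behavior, again using busybox and the my-data volume; the file name new-file.txt is only an example:

```shell
# Reading from a read-only mount works as usual
docker run --rm -v my-data:/data:ro busybox ls /data

# Writing fails because the filesystem is mounted read-only
docker run --rm -v my-data:/data:ro busybox touch /data/new-file.txt
```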
Applying volumes to real-world scenarios
Machine learning training
For ML training workflows, mount a volume to store:
- Training checkpoints: Save model state at intervals so you can resume if interrupted.
- Final models: Persist trained models for deployment.
- Training logs: Keep TensorBoard logs or custom metrics.
- Datasets: Store large datasets that don’t change often.
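The checkpoint pattern can be sketched with two short-lived containers. Here busybox stands in for a real training image, and the volume name checkpoints and the file latest.txt are hypothetical names chosen for this example.

```shell
# A first "training" container writes a checkpoint to the volume, then exits
docker run --rm -v checkpoints:/checkpoints busybox \
  sh -c 'echo "epoch=3" > /checkpoints/latest.txt'

# A later container mounts the same volume and can resume from the checkpoint
docker run --rm -v checkpoints:/checkpoints busybox cat /checkpoints/latest.txt
# prints: epoch=3
```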
Data processing pipelines
For data processing, use volumes to:
- Store input data: Mount datasets that multiple containers process.
- Save results: Write processed data to a volume for downstream tasks.
- Cache intermediates: Store intermediate processing results to avoid recomputation.
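A toy pipeline along these lines might look as follows. The volume names input-data and results, the file names, and the line-counting "processing" step are all assumptions for illustration, with busybox standing in for real processing images.

```shell
# Stage 1: place some input data in the input-data volume
docker run --rm -v input-data:/input busybox \
  sh -c 'printf "a\nb\nc\n" > /input/records.txt'

# Stage 2: read the input read-only and write a result to the results volume
docker run --rm -v input-data:/input:ro -v results:/output busybox \
  sh -c 'wc -l < /input/records.txt > /output/line-count.txt'

# Stage 3: a downstream container reads the result
docker run --rm -v results:/output:ro busybox cat /output/line-count.txt
```

Mounting the input read-only in the processing stage guards the source data against accidental changes while still letting several containers share it.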
Development workflows
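During development, mount your source code as a volume for live reloading, using a bind mount that maps a host directory such as src into the container. Changes to files in your local src directory are immediately reflected inside the container without rebuilding the image.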
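A minimal sketch of such a bind mount. The /app/src target path and the sample file are assumptions, and busybox stands in for a real development image.

```shell
# Create a sample source file on the host
mkdir -p src && echo 'hello from the host' > src/message.txt

# Bind-mount ./src into the container and read the file from inside it
docker run --rm -v "$(pwd)/src":/app/src busybox cat /app/src/message.txt
```

Editing src/message.txt on the host and re-running the second command shows the new contents, with no image rebuild in between.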
Volumes and Runpod
Runpod provides volume-like functionality through network volumes, which work similarly to Docker volumes but with cloud-native features:
- For Serverless: Network volumes allow your workers to access shared data like models or datasets. Multiple workers can read from the same volume, avoiding the need to include large files in your container image. See Serverless storage for details.
- For Pods: You can attach network volumes to Pods to persist data across Pod restarts or share data between Pods. This is essential for training workflows where you need to preserve checkpoints and models. See Pod storage types for more information.
Network volumes provide persistent storage that survives beyond individual containers, similar to the Docker volumes you’ve used in this guide, but optimized for cloud deployment.
Cleaning up volumes
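Volumes persist until you explicitly remove them. To clean up, remove a specific volume with docker volume rm followed by the volume name.
Be careful with docker volume prune: it removes volumes not currently in use by containers, potentially deleting important data. In recent Docker versions it removes only unused anonymous volumes by default, and also unused named volumes when run with the --all flag.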
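A cleanup sketch for the my-data volume used in this guide. It first removes containers that still reference the volume, since earlier steps ran some containers without --rm and Docker refuses to delete a volume that a container still uses.

```shell
# Remove stopped containers that still reference the volume
for id in $(docker ps -aq --filter volume=my-data); do
  docker rm "$id"
done

# Remove the volume itself, then confirm it is gone from the list
docker volume rm my-data
docker volume ls
```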
Troubleshooting
Volume is empty after mounting: Mounting over a directory that exists in the image hides the image’s original contents at that path, so you see the contents of the mount instead. Docker copies the image’s directory contents only into a new, empty named volume on first use. It does not copy them into a bind mount or into a volume that already holds data.
Permission errors: If you get permission errors when writing to a volume, it might be due to user ID mismatches. The container process runs as a specific user, and the volume permissions must allow that user to write. You may need to change permissions or run the container as a different user.
Volume doesn’t persist after reboot: Docker volumes persist across Docker restarts and system reboots. If you’re losing data, verify you’re using named volumes (not anonymous volumes) and not removing them accidentally.
Can’t remove volume: If you can’t remove a volume, a container might be using it even if stopped. List all containers with docker ps -a, remove containers using the volume, then try removing the volume again.
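A few diagnostic commands that may help with the issues above. This is a sketch assuming the my-data volume and busybox; the write-test file name is arbitrary.

```shell
# List every container (running or stopped) that references the volume
docker ps -a --filter volume=my-data

# Show ownership and permissions of the mount point as the container sees them
docker run --rm -v my-data:/data busybox ls -ld /data

# Try a write as a specific user and group ID (here, the host user's)
docker run --rm --user "$(id -u):$(id -g)" -v my-data:/data busybox touch /data/write-test
```

If the last command fails with a permission error, the volume's ownership does not allow that user to write, which points to the user ID mismatch described above.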
Learning more
For deeper coverage of Docker storage concepts, see Docker’s official documentation.
Next steps
You now understand how to persist data with Docker volumes, a critical skill for production deployments. Continue your learning:
- Review the Docker commands reference for volume management commands.
- Explore Runpod network volumes for cloud-native persistent storage.
- Learn about Serverless storage options for your workers.
- Understand Pod storage types for long-running workloads.