A SysAdmin’s Tale: From a “Simple” Sync Error to a Full Server Health Check

This is the story of how I fixed a broken file sync. Except, it wasn’t just about the file sync. It was about what the sync error was hiding: a server at 93% disk capacity, applications silently crashing, and gigabytes of space being wasted by runaway logs. Join me as I share the exact commands and tools I used to diagnose the chaos, reclaim over 20GB of disk space by uninstalling unused apps and pruning logs, and finally bring every container back to a healthy, stable state.

It all started with a seemingly minor issue. I use Mountain Duck to mount a remote drive from my ARM Oracle Cloud server, making it easy to drag and drop files. One day, some files just wouldn’t sync. My first thought? “It must be a permissions issue.”

I was right, but I had no idea this small clue would unravel a series of deeper issues, leading me on a journey from a simple file sync problem to a full-blown server health check. This is the story of how I debugged my way back to a stable, clean, and well-monitored system.

Chapter 1: The First Clue – Permission Denied

The first step was to confirm my suspicion. The logs in Mountain Duck showed SFTP errors, and on the server, I found the problem immediately.

When I connected via SSH (ssh ubuntu@my_server_ip), a quick ls -ld /mnt/myvolume revealed the directory was owned by www-data, not my ubuntu user.
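For anyone following along, the check looked roughly like this (the ownership line and ID values below are illustrative, not a verbatim capture):

# Who owns the mount point?
ls -ld /mnt/myvolume
# drwxr-xr-x 8 www-data www-data 4096 Jan  1 00:00 /mnt/myvolume

# Which user and groups does my SSH/SFTP session act as?
id
# uid=1000(ubuntu) gid=1000(ubuntu) groups=1000(ubuntu),27(sudo)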

Lesson Learned: Tools like Mountain Duck act as the user you log in with. If that user doesn’t have write permissions to a directory, operations will fail.

My first instinct was to run sudo chown -R ubuntu:ubuntu /mnt/myvolume. This was a mistake.

Chapter 2: The Alarms Go Off – A Cascade of Failures

Seconds after running chown, my server monitoring tool, Netdata, lit up like a Christmas tree.

  • Critical Alert: out_of_disk_space_time
  • Warning: Docker container health failures for multiple services.

The chown -R command was recursively changing ownership on my Docker data directory, which was stored on this volume. This had two catastrophic effects:

  1. Docker Health: My running containers (like Supabase and others) suddenly lost permission to access their own files, causing them to crash.
  2. The I/O Storm: The chown command created a massive storm of disk write operations (metadata writes). Netdata saw this and predicted the disk would be full soon, triggering the “out of space” alert, even though the actual usage wasn’t changing much.
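If you ever find yourself in the same spot, Docker can tell you directly which containers are suffering (the --filter flags below are standard docker ps options; the output will show your own container names):

# Containers whose health check is currently failing
sudo docker ps --filter "health=unhealthy" --format '{{.Names}}: {{.Status}}'

# Containers stuck in a restart loop
sudo docker ps --filter "status=restarting"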

Lesson Learned: Never, ever run a recursive ownership change (chown -R) on a live Docker data directory. It will break your running services.

Chapter 3: The Recovery – Restoring Order

The plan was clear: stabilize first, then fix the permissions correctly.

  • Stop the Bleeding: I aborted the dangerous chown command (Ctrl + C).
  • Revert Ownership: I reverted ownership to its original state to get my services working again. This also took time, but it was necessary.
sudo chown -R www-data:www-data /mnt/myvolume
  • Restart Services: A simple reboot (sudo reboot) or restarting the Docker containers (sudo docker-compose restart) brought the crashed services back online.
  • The Correct Permission Fix: To give my ubuntu user write access without breaking the www-data services, I added my user to the group and gave the group write permissions.
# Add ubuntu user to the www-data group 
sudo usermod -aG www-data ubuntu 

# Give the group write permission 
sudo chmod -R g+w /mnt/myvolume 

This two-step process solved my original Mountain Duck sync issue safely, once I reconnected so the new group membership could take effect.
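Before retrying the sync, a quick sanity check from a fresh SSH session confirms the change (the .write-test file is just a throwaway name):

# The ubuntu user should now list www-data among its groups
id ubuntu

# The directory should now show group write permission
ls -ld /mnt/myvolume

# Creating and removing a test file should work without sudo
touch /mnt/myvolume/.write-test && rm /mnt/myvolume/.write-test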

Chapter 4: The Real Problem – The Disk Space Crisis

With the immediate fires put out, I was still left with a genuine “Disk space usage: 93%” warning from Netdata. It was time to become a digital detective and find out where my space had gone.

My Toolkit for Disk Investigation

  • The Overview (df -h): The first command to get the lay of the land. It shows the usage for all mounted volumes.
df -h

This confirmed /mnt/myvolume was indeed the full one.
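For context, the output looks something like this (the device names and sizes here are made up for illustration); the Use% column is exactly what Netdata was alarming on:

Filesystem      Size  Used Avail Use% Mounted on
/dev/root        45G   22G   23G  49% /
/dev/sda1       150G  139G   11G  93% /mnt/myvolume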

  • The Ultimate Analyzer (ncdu): This command-line tool is the MVP of disk analysis. It provides an interactive, size-sorted map of any directory.
# Install it first if you don't have it 
sudo apt update && sudo apt install ncdu 

# Run it on the volume 
sudo ncdu /mnt/myvolume 

With ncdu, I could navigate with arrow keys and instantly see the biggest directories. The culprit was clear: a 130 GB directory named /docker.

Drilling Down into the Docker Directory

I navigated into the /docker directory with ncdu and found the breakdown:

  • /overlay2: 80.8 GiB (Docker image layers plus each container’s writable layer)
  • /volumes: 14.5 GiB (Persistent data for my apps)
  • /containers: 234.9 MiB (Runtime files and logs)

The biggest offender was overlay2. This meant the space was being consumed by the image and container layers themselves, not just the persistent data in the volumes.
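A handy cross-check from Docker’s own side is docker system df, which splits usage into images, containers, local volumes, and build cache; the -v flag adds a per-image and per-volume breakdown:

# Summary of Docker's disk usage by category
sudo docker system df

# Verbose: per-image, per-container, and per-volume sizes
sudo docker system df -v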

Chapter 5: The Great Cleanup – Reclaiming Gigabytes

Armed with this knowledge, I executed a three-part cleanup strategy.

  1. Uninstall Unused Apps: I realized I was no longer using a full Supabase stack. By running docker-compose down -v in its directory (the -v flag also removes the stack’s named volumes) and then sudo docker system prune -a (which deletes every image, container, and network no longer in use), I instantly reclaimed nearly 20 GB of space.
  2. Hunt Down Runaway Logs: Some containers, especially chatty ones, can generate enormous log files. I used this command to find the biggest offenders:
sudo find /mnt/myvolume/docker/containers -name "*-json.log" -exec du -sh {} + | sort -rh | head -n 10 

I found several log files that were hundreds of megabytes or even gigabytes in size.

    • Immediate Fix: I emptied them without deleting them: sudo truncate -s 0 /path/to/the/giant/log.log
    • Permanent Fix: I edited the docker-compose.yml for those services to add log rotation (the new limits only apply once the containers are recreated with docker-compose up -d), preventing them from ever growing uncontrollably again.
services:
  my-chatty-app:
    # ...
    logging:
      driver: "json-file"
      options:
        max-size: "20m" # Max 20MB per file
        max-file: "3"   # Keep 3 files max
  3. Fixing the Last Broken Services: The earlier chaos had left a few services in a Restarting loop. A quick look at their logs with sudo docker logs <container_name> revealed the classic Permission denied error.
    • I used sudo docker inspect <container_name> to find the exact host directory they used for data.
    • I applied the correct ownership (sudo chown -R 101:101 … for Redpanda, sudo chown -R 999:999 … for MariaDB).
    • A final docker-compose down && docker-compose up -d to recreate the containers with a clean slate brought them back to life.
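For reference, a minimal sketch of that last fix, using MariaDB as the example; the container name, volume path, and compose directory are placeholders for your own setup, and 999 is the UID the official MariaDB image runs as:

# Find the host paths backing the container's mounts (container name is a placeholder)
sudo docker inspect -f '{{ range .Mounts }}{{ .Source }} -> {{ .Destination }}{{ "\n" }}{{ end }}' mariadb

# Hand the data directory back to the UID the image runs as (path is a placeholder)
sudo chown -R 999:999 /mnt/myvolume/docker/volumes/mariadb_data/_data

# Recreate the container so it comes up with the corrected permissions
cd /path/to/the/mariadb/compose/project
sudo docker-compose down && sudo docker-compose up -d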

Epilogue: Proactive Monitoring for a Peaceful Future

My server is now healthy, stable, and back to a comfortable 69% disk usage. The most important lesson? Don’t wait for things to break.

My new proactive workflow is simple:

  1. Daily Glance: Check the Netdata dashboard for disk space trends.
  2. Deep Dive: If space gets tight, use ncdu to immediately find out why.
  3. Manage Resources: Regularly review my running applications and their image sizes, for example with docker images --format '{{.Size}} {{.Repository}}:{{.Tag}}' | sort -rh, and decide if they are still providing value worth the disk space they occupy.

This journey was a powerful reminder that in the world of system administration, a small symptom can often point to a much larger, hidden problem. By following the clues and using the right tools, you can not only fix the issue at hand but also make your entire system more resilient and robust for the future.
