Pathlib is a module in Python’s standard library that provides an object-oriented interface for working with file system paths. It offers a more intuitive and readable way to handle file and directory operations compared to traditional methods using the
os and os.path modules.Get total size of a directory#
Take a folder containing lots of nested subfolders also containing too many files:
from pathlib import Path
from time import time
def get_directory_size(path: Path) -> float:
return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())
start_time = time()
size = get_directory_size(Path("my big messy fatty folder"))
print(f"size : {round(size / 1024**3, 2)} GB\ntook : {round(time() - start_time, 2)} seconds")
size : 55.09 GB
took : 33.11 seconds
Faster way to get total size of a directory#
Take the same folder, add some concurrency:
import concurrent.futures
from pathlib import Path
from time import time
def calculate_size(path: Path) -> float:
return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())
def get_directory_size(path: Path) -> float:
subpaths = [p for p in path.glob("*/*") if p.is_dir()]
with concurrent.futures.ProcessPoolExecutor() as executor:
sizes = executor.map(calculate_size, subpaths)
return sum(sizes)
start_time = time()
size = get_directory_size(Path("my big messy fatty folder"))
print(f"size : {round(size / 1024**3, 2)} GB\ntook : {round(time() - start_time, 2)} seconds")
size : 55.09 GB
took : 8.27 seconds
Performance Analysis#
The concurrent approach provides a 4x speed improvement for large directories with multiple subdirectories. Here’s why:
- Sequential: Processes files one by one
- Concurrent: Utilizes multiple CPU cores for parallel processing
- Best for: Large directories with many subdirectories
Best Practices#
✅ Do’s#
- Use
pathlib.Pathfor modern, readable path operations - Implement error handling for permission issues
- Consider memory usage for very large directories
- Use
ProcessPoolExecutorfor CPU-intensive tasks
❌ Don’ts#
- Don’t use string concatenation for paths
- Avoid blocking the main thread for large operations
- Don’t ignore file access permissions
Additional File Operations#
Check if Path Exists#
from pathlib import Path
file_path = Path("example.txt")
if file_path.exists():
print("File exists!")
Create Directories#
# Create directory with parents
Path("path/to/new/directory").mkdir(parents=True, exist_ok=True)
List Files with Filter#
# Get all Python files
python_files = list(Path(".").glob("**/*.py"))
print(f"Found {len(python_files)} Python files")
Read and Write Files#
# Write to a file
with open("example.txt", "w") as f:
f.write("Hello, World!")
# Read from a file
with open("example.txt", "r") as f:
content = f.read()
print(content)
