PredictiveOps Solutions is looking for a Machine Learning Ops Specialist to join our remote-first team. In this role, you will design, deploy, and maintain machine learning pipelines that drive predictive operations and intelligent automation for our clients. You will collaborate closely with DevOps engineers, AI product managers, and SRE teams to ensure ML models are integrated seamlessly into operational workflows, enabling real-time observability and automated incident response.
Responsibilities:
· Build and maintain ML pipelines for predictive monitoring and incident management.
· Collaborate with infrastructure and DevOps teams to integrate ML models into production systems.
· Monitor model performance and implement automated retraining workflows.
· Develop custom AI assistants for operations teams.
· Document workflows, best practices, and system architecture for internal use.
Required Skills:
· Python (strong experience with ML frameworks such as TensorFlow, PyTorch, or scikit-learn)
· Data engineering experience (ETL pipelines, SQL, data modeling)
· DevOps skills including Docker, Kubernetes, and CI/CD pipelines
· Cloud platforms: AWS, Google Cloud, or Azure
· Familiarity with observability tools and automated monitoring systems