
AI/ML Architect/Lead

  • Hybrid
    • Tuzla/Sarajevo, Federacija Bosne i Hercegovine, Bosnia and Herzegovina

Job description

Salt Square is a growing outsourcing company providing high-quality software development services to clients across a wide range of industries. Our team is composed of skilled and dedicated professionals delivering innovative solutions that meet and exceed client expectations.

Our team is seeking an AI/ML Architect/Lead to own the architecture, infrastructure, and delivery of an end-to-end AI/ML ecosystem. The ideal candidate is a technically deep, strategically minded engineer who has built and scaled production-grade AI/ML systems on AWS — someone who can design for long-term success while guiding implementation and delivery. This role sits at the intersection of platform engineering, data science, and applied AI, requiring hands-on expertise across the full ML lifecycle, from model development through production observability.

WHAT YOU’LL BE DOING:

Reporting directly to the VP, Data Science & Analytics, the AI/ML Architect/Lead will serve as the technical authority for the client's AI/ML platform. You will design and own the infrastructure that powers our machine learning and NLP capabilities, establish MLOps standards and practices, lead the deployment and monitoring of production AI/ML systems, and provide architectural guidance and technical mentorship to the AI/ML engineering team.

WHAT YOU’LL BE RESPONSIBLE FOR:

The following responsibilities are considered essential functions of this position and are not intended to be an exhaustive list of all duties.

AI/ML Infrastructure Architecture (AWS)

  • Architect, deploy, and manage scalable AWS infrastructure for AI/ML workloads including EC2, Lambda, ECS/EKS, S3, SageMaker, and related services

  • Design and maintain VPC networking, security group configurations, and the IAM roles and policies governing all AI/ML platform components, in partnership with the infrastructure team

  • Define and enforce infrastructure-as-code standards and CI/CD practices for AI/ML platform components

  • Monitor infrastructure health, optimize compute performance, and drive cost efficiency across model training and inference workloads

  • Ensure high availability, fault tolerance, and disaster recovery posture for all production AI/ML systems

  • Manage multi-account AWS environments with IAM roles, environment-specific security boundaries (dev, test, production), and secure access patterns

Model Deployment, Orchestration & MLOps

  • Architect and own the end-to-end ML lifecycle: experimentation, training, evaluation, deployment, monitoring, and retraining pipelines

  • Establish and enforce MLOps best practices including model versioning, experiment tracking, reproducibility standards, and deployment automation

  • Design and implement model serving infrastructure for both real-time inference and batch scoring, optimizing for latency, throughput, and cost

  • Build pipeline orchestration frameworks for ML workflows using tools such as Airflow, Step Functions, or equivalent

  • Implement model monitoring and observability frameworks to detect data drift, model degradation, and production anomalies

  • Own CI/CD pipelines for model and infrastructure promotion across development, testing, and production environments

Vector Databases & Embedding Infrastructure

  • Architect and manage Qdrant (or equivalent vector database) deployments for semantic search, similarity retrieval, and RAG (Retrieval-Augmented Generation) applications

  • Design embedding pipelines that transform patient-generated text into vector representations for downstream AI/ML applications

  • Optimize vector index configurations for query performance, recall, and storage efficiency at scale

  • Integrate vector retrieval layers with LLM-based applications and NLP pipelines

Data Engineering Integration

  • Partner with the Senior Data Engineers to ensure seamless integration between the Redshift data warehouse and AI/ML feature pipelines

  • Design and build feature stores and feature engineering pipelines that source structured and unstructured data for model training

  • Establish data contracts and quality standards between data engineering and AI/ML platform layers

  • Build ELT/ETL patterns tailored to AI/ML workloads including incremental feature computation, backfill strategies, and schema evolution handling

Security, Compliance & Governance

  • Own AI/ML platform security including IAM policies, encryption at rest and in transit, network access controls, and secure model artifact storage

  • Ensure compliance with HIPAA, SOC 2, and applicable healthcare data privacy regulations as they apply to AI/ML systems and model outputs

  • Design PII anonymization and de-identification pipelines to provision safe, production-representative training data to development and test environments

  • Implement model governance standards including audit logging, lineage tracking, and access controls over model artifacts and inference endpoints

Technical Leadership & Collaboration

  • Serve as the technical authority and escalation point for the AI/ML engineering team, providing architectural guidance and hands-on support

  • Collaborate with Data Analytics, Data Engineering, Product, and Software Engineering teams to align AI/ML platform capabilities with roadmap priorities

  • Contribute to architecture reviews, technology evaluations, and build-vs-buy decisions across the AI/ML tooling landscape

  • Build and maintain architecture documentation, runbooks, and operational playbooks for all production AI/ML systems

  • Mentor AI/ML engineers on platform best practices, code quality standards, and production engineering principles

Job requirements

Technical Skills:

  • Cloud Infrastructure: 5+ years of hands-on experience architecting and managing AWS data and AI/ML infrastructure (EC2, Lambda, ECS/EKS, SageMaker, S3, IAM, VPC, CloudWatch)

  • MLOps & ML Lifecycle: 5+ years of experience building and operating production ML systems, including training pipelines, model serving, CI/CD automation, and monitoring

  • Model Deployment: Deep experience with model serving patterns (real-time inference, batch scoring, A/B testing, canary deployments) and frameworks such as SageMaker Endpoints, TorchServe, or equivalent

  • Vector Databases: Hands-on experience with Qdrant or equivalent vector databases (Pinecone, Weaviate, pgvector), including index design, embedding pipelines, and RAG architecture

  • Python: Strong Python development skills for ML pipeline engineering, including frameworks such as PyTorch, HuggingFace Transformers, LangChain, and AWS SDK (boto3)

  • Data Engineering: Experience integrating AI/ML platforms with cloud data warehouses (Redshift preferred) and building feature pipelines using tools such as dbt, Airflow, or Spark

  • Infrastructure as Code: Experience with Terraform, CloudFormation, or equivalent IaC tools for reproducible infrastructure provisioning

  • Source Control & CI/CD: Proficiency with Git including branching strategies, pull request workflows, and integration with CI/CD platforms (Jenkins, GitHub Actions, or equivalent)

Analytical & Engineering Skills:

  • Strong systems thinking with the ability to design AI/ML platforms for scalability, reliability, and long-term maintainability

  • Demonstrated experience with ML experiment tracking, model versioning, and reproducibility frameworks (MLflow, Weights & Biases, or equivalent)

  • Experience with NLP and large language model (LLM) applications, including fine-tuning, prompt engineering, and RAG patterns

  • Ability to translate business and product requirements into architectural decisions and phased implementation plans

Healthcare & Compliance:

  • Experience with healthcare data, HIPAA regulations, and patient data privacy requirements

  • Experience working with a US-based product team

Communication & Collaboration:

  • Excellent verbal and written communication skills, with the ability to translate complex technical concepts for both technical and non-technical stakeholders

  • Proven ability to work cross-functionally and drive technical decisions collaboratively

  • Experience with project management and issue tracking software such as Jira

What we offer:

  • Competitive salary and benefits package.

  • 23 days of paid leave.

  • Opportunities for professional growth and development.

  • A collaborative work environment with talented and dedicated colleagues.

We offer a supportive work environment where your ideas are heard and your contributions make a real difference. If you’re passionate about building great products, we’d be excited to have you on board.
