Research Interns - MLLM Serving Optimization

PBY Ventures, a Vancouver-based venture studio, is seeking a contract MLLM Research Engineer to support the work of one of our Scholars in Residence. This is a flexible, remote contract role, either full-time or part-time, paying $35 per hour. Ideally you would be located in or around Vancouver, BC, but we are open to candidates anywhere in Canada.

Send all applications to apply@pbyventures.com.

Responsibilities

Design, implement, and optimize a high-performance serving platform for MLLMs.

Integrate SOTA open-source serving frameworks such as vLLM, SGLang, or LMDeploy.

Develop techniques for efficient resource utilization and low-latency inference for MLLMs in serverless environments.

Optimize memory usage, scalability, and throughput of the serving platform.

Conduct experiments to evaluate and benchmark MLLM serving performance.

Contribute novel ideas to improve serving efficiency and publish findings when applicable.

Qualifications

Bachelor’s degree or higher in Computer Science, Electrical and Computer Engineering (ECE), or a related field.

Experience with one or more SOTA LLM serving frameworks such as vLLM, SGLang, or LMDeploy.

Strong proficiency in PyTorch.

Familiarity with distributed systems, serverless architectures, and cloud computing platforms.

Experience with inference optimization for large-scale AI models.

Familiarity with multimodal architectures and serving requirements.

Previous experience deploying AI platforms on cloud services.

Job Types: Full-time, Part-time, Fixed term contract, Internship / Co-op
