Software Engineer, Machine Learning

Brooklyn, NY

About Pangram Labs

Pangram Labs is building a platform to authenticate human-generated content online. Through our work, we aim to become deep domain experts and the industry leader in detecting text generated by artificial intelligence. Our long-term vision is to help protect the internet against an influx of inauthentic, deceptive, and low-quality information produced by AI products. Our solution is built on proprietary machine learning models trained on 10M+ documents to determine whether text was generated by ChatGPT or other public LLM tools.

People are using generative AI tools to produce fake or inaccurate information with little effort, allowing bad actors to exploit existing moderation controls and produce spammy or fraudulent content at scale. While this problem initially poses a risk to education and academia (where many of our competitors have focused), we believe it poses far more serious business and security risks as AI-generated text erodes trust in online information.

Job Description

We are hiring a machine learning intern to scale up our deep learning research efforts and modeling capabilities. In this role, you will build data pipelines to construct large-scale datasets for training our machine learning models, train neural networks in PyTorch, experiment with novel neural network architectures and algorithms to maximize performance, and deploy and test your models in a real production environment. As we are an early-stage startup, you will wear many hats and contribute to other parts of the business according to your interests and abilities and the changing needs of the company. Other opportunities include product management, product design, working directly with customers who use our platform, A/B testing, web development, business intelligence and analytics work, and scoping new research projects and directions under the guidance of the founding team.

Responsibilities

Train core machine learning models to detect AI-generated content.
Build data pipelines to construct large-scale datasets.
Build out backend infrastructure to efficiently serve deep learning models.

Requirements

B.S. or M.S. in Computer Science or a related field
Coursework and preferably projects and/or research in Artificial Intelligence, Machine Learning, Deep Learning, Natural Language Processing, Audio Processing, Computer Vision, or other related fields
Proficiency in Python
Experience with deep learning libraries such as PyTorch, TensorFlow, or JAX
Experience with basic data science tools such as Jupyter Notebooks, Pandas, NumPy, SQL, and Matplotlib, and with working in the shell in Unix-like environments
Excitement about working at an early-stage startup!

Preferred Qualifications

Publication at a major machine learning conference such as NeurIPS, ICML, ICLR, EMNLP, or CVPR
Experience training/fine-tuning open-source large language models
Experience with ChatGPT prompt engineering
Experience with web development
Familiarity with distributed computing systems, model-parallel training, and working in an HPC environment