Data Engineer

Responsibilities

  • Design, build, optimize, launch and support new and existing data models and ETL processes in production
  • Interface with engineers, product managers and product analysts to understand data needs
  • Manage and verify data accuracy on the Hadoop cluster
  • Support the Hadoop cluster environment, including Hive, Spark, HBase, Presto, etc.

Minimum Qualifications

  • Bachelor’s degree or equivalent experience in Computer Science or related field
  • 3+ years of experience in custom ETL design, implementation and maintenance on Hadoop clusters
  • 3+ years of hands-on development and coding experience
  • Understanding of the Hadoop ecosystem, including HDFS, YARN, MapReduce, ZooKeeper, Kafka, HBase, Spark, and Hive
  • Strong SQL skills, especially in the area of data aggregation
  • Good understanding of distributed systems and basic mathematics such as statistics and probability
  • Comfortable with Git version control


Other Qualifications

  • Experience building real-world data pipelines
  • Automation skills with tools such as Airflow, Python, and Bash
  • Experience with Druid, GeoMesa, or GeoWave is a plus
  • Experience with A/B testing environments
  • Experience with analytics tools such as R or MATLAB
  • Strong Java or Scala skills