Job Requirements:
We are looking for candidates with the following:
· A minimum of 5 years’ experience in a technology role or similar
· 2+ years’ experience deploying data models in a production setting
· Production-level coding experience in Java, Python, R, or Scala
· Ability to architect highly scalable, distributed data pipelines using open-source tools and big data technologies such as Hadoop, HBase, Spark, Storm, and the ELK stack
· Experience designing scalable solutions, with proficiency in data structures and algorithms
· Experience in cloud-based environments with PaaS and IaaS
· Ability to work iteratively in a team with continuous collaboration
Job Responsibilities:
· End-to-end module ownership, including the architecture, design, execution, and management of large, complex distributed data systems
· Monitoring performance and optimizing existing projects
· Selecting and integrating any Big Data tools and frameworks required to provide requested capabilities
· Understanding business and data requirements and implementing scalable solutions
Skillset:
· Experience with Big Data technologies, assisting clients in building software solutions that are distributed, highly scalable, and deployed across multiple data centers
· Hands-on experience architecting Big Data applications using Hadoop-ecosystem technologies such as Spark, MapReduce, YARN, HDFS, Hive, Impala, Pig, Sqoop, Oozie, HBase, Elasticsearch, and Cassandra
· Experience working with Business Intelligence teams, Data Integration developers, Data Scientists, Analysts, and DBAs to deliver a well-architected, scalable Big Data and analytics ecosystem
· Experience working with NoSQL databases and search engines such as MongoDB, Cassandra, Elasticsearch
· Experience using Neo4j or an understanding of graph databases
· Strong experience with event stream processing technologies such as Spark Streaming, Storm, Akka, and Kafka
· Experience with at least one programming language (Java, Scala, Python)
· Extensive experience with at least one major Hadoop platform (Cloudera, Hortonworks, MapR)
· Proven track record of architecting distributed solutions that handle very high volumes of data (petabytes)
· Strong troubleshooting and performance-tuning skills
· Experience with SQL
· Deep understanding of cloud computing infrastructure and platforms.
· Good understanding of Big Data design patterns
· Ability to analyze business-requirement user stories and model them as domain-based services
· Experience working under an agile delivery methodology
You will need to have (Core Competencies):
· HDFS, MapReduce, YARN, Pig, Hive, Sqoop, HBase, Spark, Zookeeper, Oozie, Flume
· Spark, Spark Streaming
· Storm
· SQL
· NoSQL – MongoDB, Cassandra, Neo4j
· Tableau, QlikView
· Kafka, Elasticsearch
· Understanding of Machine Learning Frameworks is a plus