The Goal, Inc. is actively seeking a Big Data Engineer to join our team in Washington, DC.
- Architect, develop, implement, and test data processing pipelines and data mining/data science algorithms across a variety of hosting environments
- Assist customers with translating complex business analytics requirements into technical solutions and recommendations across diverse environments
- Experience defining and implementing data ingestion and transformation methodologies, including between classified and unclassified sources
- Communicate results and educate others through design and development of insightful visualizations, reports, and presentations
- Participate in the design, implementation, and support of Big Data, Analytics, and Cloud solutions across all stages of the development lifecycle
- Conduct regular peer code reviews to ensure code quality and compliance with industry best practices
- Design, implement, and optimize leading Big Data frameworks (Hadoop, Spark, SAP HANA) across hybrid hosting platforms (AWS, Azure, on-premises)
- Review security requirements, analyze processes, and define security strategy to implement compliance and controls in line with organizational standards and industry best practices
- Develop accreditation and security documentation, including System Security Plans (SSP) and Authorization to Operate (ATO) packages
- Provide thought leadership and innovation, recommending emerging technologies and optimizations/efficiencies across architecture, implementation, hosting, etc.
- Lead the planning, development, and execution of data onboarding and processing capabilities and services for diverse customers and data sets
- 10+ years of progressive experience in architecting, developing, and operating modular, efficient, and scalable big data and analytics solutions
- Fluency and demonstrated expertise in multiple programming languages, such as Python, Java, and C++, and the ability to pick up new languages and technologies quickly
- At least 5 years of experience with distributed computing frameworks, specifically Hadoop 2.0+ (YARN) and associated tools including Avro, Flume, Oozie, Sqoop, Zookeeper, etc.
- Hands-on experience with Apache Hive and Apache Spark, including its components (Streaming, SQL, MLlib)
- Hands-on experience with data warehousing and business intelligence software, including Cloudera and Pentaho
- Experience developing data visualizations using Tableau