JD - BIG DATA DEVELOPER WITH APACHE SPARK (ONSITE) - LOCAL TO TX ONLY
Email: [email protected]
Job Title: Senior Big Data Developer
Location: Onsite in Dallas, TX (local to TX candidates only)

Mandatory Skills:
- Apache Spark:
  - Spark Core: an open-source, distributed computing system that provides an in-memory data processing engine for big data workloads, making it faster than traditional batch processing systems such as MapReduce.
  - Spark SQL, Spark Streaming, MLlib, GraphX: libraries and components built on top of Spark for SQL queries, real-time data processing, machine learning, and graph processing (a minimal illustrative Spark/Kafka sketch follows the qualifications below).
- Apache Kafka: a distributed event streaming platform that enables the ingestion and processing of real-time data streams; Kafka is often used for building data pipelines and supporting event-driven architectures.
- Apache Flink: a stream processing framework for big data processing and analytics, designed for low-latency, high-throughput, exactly-once processing of data streams.
- NoSQL Databases (Cassandra, MongoDB, Couchbase): databases suited to unstructured or semi-structured data, with horizontal scalability.
- Data Warehousing (Amazon Redshift, Google BigQuery, Snowflake): cloud-based data warehouses for high-performance querying and analysis of large datasets.
- ETL (Extract, Transform, Load) Tools (Apache NiFi, Talend, Informatica): tools that facilitate the extraction, transformation, and loading of data from various sources into data storage or analytical systems.
- Containerization and Orchestration (Docker, Kubernetes): containerization to package and deploy applications, and orchestration to manage and scale containerized applications efficiently.
- Batch and Stream Processing (Apache Beam, Storm, Samza): Beam provides a unified model for batch and stream processing that can run on various distributed processing backends; Storm and Samza are stream processing frameworks for real-time data analytics.
- Data Lakes (Amazon S3, Azure Data Lake Storage, Google Cloud Storage): cloud-based storage for large volumes of raw, unstructured data retained for later analysis.
- Workflow Management (Apache Airflow, Luigi): workflow management systems for orchestrating complex data processing tasks and their dependencies.
- Machine Learning Integration (MLflow): an open-source platform for managing the end-to-end machine learning lifecycle, including experimentation, reproducibility, and deployment.

Responsibilities:
- Design and develop software products that provide measurement data to a wide set of users across all of Amazon's advertising suite and Freewheel solutions.
- Apply a variety of architectural approaches and design patterns, with demonstrated competence in designing maintainable and scalable software written in a high-level language.
- Enable advertisers to optimize ad spend and allocate budgets effectively by providing accurate, actionable, and timely conversion measurement for our streaming ad products.

Qualifications:
- 8+ years of full software development life cycle experience, including coding standards, code reviews, source control management, build processes, testing, and operations.
- 5+ years of non-internship professional software development experience.
- 2+ years of non-internship design or architecture experience (design patterns, reliability, and scaling) on new and existing systems.
- Experience programming in at least one programming language.
- Experience using big data technologies (Spark, EMR, Presto, etc.).
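For context on the Spark Streaming and Kafka skills listed above, the sketch below shows one minimal way a PySpark Structured Streaming job could read conversion events from Kafka and count them per campaign. It is illustrative only and not part of the posting's requirements: the broker address, topic name, field names, and console sink are hypothetical placeholders, and running it assumes the spark-sql-kafka connector package is available.

# Illustrative sketch only: PySpark Structured Streaming reading ad-conversion
# events from Kafka and aggregating them per campaign. Broker, topic, and
# schema are hypothetical assumptions, not details from the job posting.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json, window
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = (
    SparkSession.builder
    .appName("conversion-measurement-sketch")
    .getOrCreate()
)

# Hypothetical schema for a single conversion event.
event_schema = StructType([
    StructField("campaign_id", StringType()),
    StructField("user_id", StringType()),
    StructField("event_time", TimestampType()),
])

# Read the raw event stream from Kafka (assumed broker and topic names).
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "ad-conversions")
    .load()
)

# Parse the JSON payload and count conversions per campaign in 5-minute windows,
# tolerating events that arrive up to 10 minutes late.
conversions = (
    raw.select(from_json(col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
    .withWatermark("event_time", "10 minutes")
    .groupBy(window(col("event_time"), "5 minutes"), col("campaign_id"))
    .count()
)

# Write the running aggregates to the console for demonstration; a real
# pipeline would target a data lake or warehouse sink instead.
query = (
    conversions.writeStream
    .outputMode("update")
    .format("console")
    .start()
)
query.awaitTermination()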
Thanks & Regards Ayush Sharma IT Recruiter HMG America LLC Ph No.+1 732-790-5494 Email:[email protected] www.hmgamerica.com -- Keywords: sthree active directory information technology Texas |
Posted: Wed Jan 03 19:41:00 UTC 2024