Shashank - Data Engineer
srikanth.sidda@syntechitsolutions.com
Location: Dallas, Texas, USA
Relocation: Yes
Visa: H1B |
NAME: Shashank Reddy
Email: shashankr259@gmail.com | Phone: (813) 563-3789
Sr. Data Engineer
LinkedIn: linkedin.com/in/shashank-r-523213258

PROFESSIONAL SUMMARY:
Over 8 years of experience in the IT industry, focusing on Big Data and the Hadoop ecosystem in the retail and banking domains.
Proficient in designing and developing applications using Big Data technologies such as HDFS, MapReduce, Sqoop, Hive, PySpark, Spark SQL, HBase, Python, Snowflake, S3 storage, and Airflow.
Expertise in performance tuning of MapReduce jobs and optimization of complex Hive queries.
4+ years of experience with ETL tools, including Talend Data Integration, Informatica PowerCenter, and SSIS, for data warehousing, business intelligence, analytics, and data migration.
Utilized Microsoft Azure Cloud services, including data storage (Azure Data Lake, Azure Blob Storage, Azure Cosmos DB) and data processing engines (Azure Data Lake Analytics, Azure HDInsight).
Utilized Amazon Web Services (AWS) utilities such as EMR, S3, and CloudWatch to run and monitor Hadoop/Spark jobs on AWS.
Strong understanding of structured data sets, data pipelines, ETL tools, and data reduction, transformation, and aggregation techniques; knowledge of tools such as dbt and DataStage.
Extensively used Python libraries, including PySpark, Pandas, PyArrow, NumPy, Scikit-Learn, and Boto3.
Wrote PySpark jobs in AWS Glue, used crawlers to populate the AWS Glue Data Catalog, and ran ETL jobs with aggregations using PySpark code.
Proficient in setting up databases in AWS using RDS, configuring storage with S3 buckets, and implementing instance backups.
In-depth knowledge of data warehousing, data mining concepts, and ETL transformations, with a focus on intelligent data management solutions.
Expertise in DevOps, release engineering, configuration management, cloud infrastructure, and automation using AWS, Apache Maven, Jenkins, and GitHub.
Worked on Big Data integration and analytics based on Hadoop and Kafka, with a strong understanding of HDFS data modeling and architecture.
Strong team player with excellent communication, project management, documentation, and interpersonal skills.
Collaborated with cross-functional teams to design scalable data architectures aligned with organizational goals.
Able to work effectively in both AWS and Azure cloud environments.

TECHNICAL SKILLS:
Big Data Ecosystem: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Zookeeper, Kafka, Cassandra, Apache Spark, Spark Streaming, HBase, Impala
Languages: Shell scripting, SQL, PL/SQL, Python, R, PySpark, Pig Latin, HiveQL, Scala
Web Technologies: HTML, JavaScript, CSS
Operating Systems: Windows (XP/7/8/10), UNIX, Linux, Ubuntu, CentOS, Mac OS
Version Control: GitHub, GitLab, Bitbucket, SVN
IDEs & Tools: Eclipse, Visual Studio, VS Code, PyCharm, IntelliJ, SQL Developer, MySQL Workbench, CI/CD tooling, Tableau
Databases: Oracle, SQL Server, MySQL, Cassandra, Teradata, PostgreSQL, MS Access, Snowflake, NoSQL (HBase, MongoDB), Neo4j
Cloud Technologies: MS Azure, Amazon Web Services (AWS)
Data Engineering / Big Data / Cloud / Visualization Tools: Databricks, HDFS, Hive, Pig, Sqoop, MapReduce, Spring Boot, Flume, YARN, MLlib, Oozie, Zookeeper, Informatica, AWS (EC2, EMR, S3, Glue, Redshift), Azure Databricks, Azure Data Explorer, Azure HDInsight, Azure Key Vault, Salesforce, Linux, Bash shell, Unix, SnapLogic, Tableau, Power BI, SAS, dashboard design
PROFESSIONAL EXPERIENCE:

CIGNA Healthcare, Connecticut | May 2024 - Present
Data Engineer

Responsibilities:
Developed Spark scripts in Python, based on requirements, for processing large datasets on AWS EMR.
Created Spark applications using Python APIs to load and transform data into Databricks.
Handled large datasets using Spark's in-memory capabilities, partitioning strategies, broadcast variables, efficient joins, transformations, and other optimization techniques to streamline data ingestion.
Used Python and Scala scripts for Spark transformations to process data efficiently on AWS EMR.
Extracted, transformed, and loaded data from diverse source systems into Databricks using AWS Glue, Spark SQL, and Snowflake connectors.
Set up and configured AWS Glue jobs to manage ETL processes, including reading data from S3 and writing data into Snowflake.
Leveraged AWS S3 for staging and intermediate storage, enabling scalable and efficient data transfer and processing.
Developed complex ETL pipelines using PySpark and Databricks for data transformation and ingestion workflows.
Used Spark SQL for efficient data processing, query optimization, and converting SQL logic into Spark-based solutions.
Worked on query optimization and performance tuning in Databricks, leveraging features such as data sharing, time travel, and clustering for advanced analytics.
Created and managed Databricks tables and data ingestion pipelines, ensuring data integrity and scalability.
Interacted with client teams to gather deployment requirements and ensured solutions aligned with business objectives.
Collaborated with data scientists and analysts to deliver curated datasets and insights, enabling data-driven decision-making.
Automated and streamlined processes using SQL, Python, and scripting languages such as PowerShell.

Environment: Databricks, AWS Lambda, DynamoDB, Redshift, AWS S3, AWS EMR, AWS Glue, DBT, Python, PostgreSQL, Pandas, NumPy, Spark 2.4, Airflow.