
Brahma Katam
Data Engineer
Phone: 571-933-5956
Email: [email protected]
Location: Remote (USA)
Relocation: Yes
Visa: H1B

Professional Summary:
Overall 10+ years of IT experience across all phases of the Software Development Life Cycle.
Specialized in the Hadoop and Spark ecosystems, with 6+ years of extensive work in data ingestion, storage, querying, processing, and analysis using MapReduce, Pig, Hive, Sqoop, HBase, Oozie, Spark, Spark SQL, Spark Streaming, YARN, Python, and Power BI.
Proficient in working with various Hadoop distributions like Cloudera and Hortonworks, possessing in-depth knowledge of Hadoop architecture, including YARN and key components such as HDFS, Resource Manager, Node Manager, Name Node, Data Node, MR v1, and MR v2.
Demonstrated ability in executing all phases of Big Data projects, from scoping studies and requirements gathering to design, development, implementation, and quality assurance, delivering end-to-end IT solutions.
Skilled in writing Hive and Impala queries for generating reports in Tableau, and in defining standards and guidelines for Hadoop and Hive to ensure efficient, scalable data processing.
Strong proficiency in Spark using Scala and Python, leveraging the RDD, DataFrame, and Spark SQL APIs for accelerated data processing and analytics (see the sketch following this summary).
Implemented metadata-driven processes using Atlas, Spark, and Scala, facilitating new data warehouse implementations and ensuring data classification for effective data governance.
Strong problem-solving, analysis, installation, and configuration skills, applied to performance tuning, optimization, and troubleshooting of complex Big Data projects.
Excellent team player with strong communication skills, commitment, and a constant drive to learn new technologies; adept at meeting deadlines and excelling in challenging environments.
Good experience in SQL and Python.
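
A minimal sketch of the RDD, DataFrame, and Spark SQL interplay referenced above. The data, object name, and the orders view are hypothetical illustrations, not code from any of the projects below:

```scala
import org.apache.spark.sql.SparkSession

object SummarySketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-dataframe-sparksql-sketch")
      .getOrCreate()
    import spark.implicits._

    // Start from an RDD of raw (id, amount) pairs -- hypothetical data.
    val rdd = spark.sparkContext.parallelize(Seq((1, 120.0), (2, 75.5), (3, 310.25)))

    // Promote the RDD to a DataFrame so Spark SQL can reason about it.
    val df = rdd.toDF("order_id", "amount")

    // Register a temporary view and query it with Spark SQL.
    df.createOrReplaceTempView("orders")
    spark.sql("SELECT order_id, amount FROM orders WHERE amount > 100").show()

    spark.stop()
  }
}
```
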
Education:
Bachelor of Technology in Computer Science & Engineering from JNTU Kakinada.

Technical Skills:
Languages: SQL, Python
Hadoop Distributions: Cloudera, Hortonworks
Big Data Stack: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, HBase, YARN, Flume, Impala, ZooKeeper, Scala, Spark Core, Spark SQL, Spark Streaming, Kafka, Tableau, NiFi, Cloudera Manager, Atlas
Web Technologies: J2EE (Servlets, JSP, JDBC), JavaScript, Python
Cloud: GCP (Bigtable, BigQuery, Cloud Storage, Pub/Sub, Dataproc, Dataflow, Cloud Scheduler, Composer, Cloud Functions), Azure (Containers, Blob Storage), Databricks
IDEs: Eclipse, MyEclipse, IntelliJ, VS Code
Tracking, Build Tools, and Version Control: Jira, Ant, Maven, SVN, Git
Databases: Oracle 11g, SQL Server 2008 R2, EDB (PostgreSQL), Teradata, MySQL
NoSQL: HBase
Operating Systems: Windows, Linux




Professional Experience:

CGI Inc - Austin, TX    Apr 2022 - Present
Data Engineer / Senior Hadoop Developer

Worked with business users on requirements gathering, analysis, and high-level design.
Worked on data ingestion from different sources like Db2, Teradata and CSV Files.
Scheduled daily import jobs using the Control-M scheduler and cron job entries.
Created Hive External and Managed tables for each source table in Hadoop Data Lake.
Worked extensively with Hive to improve query performance.
Imported data from Hive tables and ran SQL queries over the imported data and existing RDDs using Spark SQL; developed subqueries in Hive.
Created Hive views to retrieve look-back data and created temporary tables to store intermediate results.
Worked on different file formats (Text, SequenceFile, RCFile, and Parquet) to improve query performance and reduce disk space usage.
Used SQL extensively to join tables and build the logic.
Wrote shell scripts to run jobs through the job manager and send email alerts with logs on job failures.
Created RDDs for Spark programming.
Designed, developed, and maintained Hadoop applications, including MapReduce jobs, Spark applications, and Hive queries, customized for the on-premises cluster infrastructure.
Built and maintained data pipelines to ingest data from various on-premises sources into the Hadoop cluster.
Continuously evaluated and updated code and configurations to ensure optimal performance in the on-premises environment.
Implemented data transformation logic tailored to the on-premises data landscape.
Created ad hoc reports using Tableau and demoed to Product Owners.
Worked extensively on ETL using Spark; used Spark SQL to load data into Hive.
Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, along the lines of the sketch after this list.
Experienced in performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism, and tuning memory.
Actively debugged P1/P2 application issues on the clusters.
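
A minimal sketch of the Hive-to-Spark ETL pattern described in the bullets above. The database, table, and column names are hypothetical, and it assumes a cluster where a Hive metastore is configured for Spark:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveToSparkEtl {
  def main(args: Array[String]): Unit = {
    // Hive support is needed to read external tables in the data lake.
    val spark = SparkSession.builder()
      .appName("hive-to-spark-etl-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Read a Hive external table (hypothetical database/table names).
    val claims = spark.table("datalake.claims_raw")

    // The equivalent of a Hive aggregate query, expressed as DataFrame transformations.
    val dailyTotals = claims
      .filter(col("status") === "APPROVED")
      .groupBy(col("load_date"))
      .agg(sum("claim_amount").as("total_amount"))

    // Write back as partitioned Parquet to cut disk usage and speed up downstream queries.
    dailyTotals.write
      .mode("overwrite")
      .partitionBy("load_date")
      .parquet("/datalake/curated/claims_daily_totals")

    spark.stop()
  }
}
```
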

Environment: Hadoop, Cloudera 5.x, Hive, Python, Spark 2.x, Spark SQL, PL/SQL, MySQL, Control-M, Oracle, Shell Script, Linux and Windows, Git, Jira, Jenkins, Tableau 10.3, Snowflake.

DXC Technology KP - Bengaluru, India    May 2016 - Mar 2022
Senior Hadoop Developer

Roles and Responsibilities:
Worked on self-service BI for customers over their transactional data, using Hadoop as a data lake.
Worked with business users on requirements gathering, analysis, and high-level design.
Designed and developed loosely coupled metadata driven processes (DWSP, DWSE, DWMC, DWME) using Spark and Scala.
Worked on a metadata-driven application storing all metadata in Atlas for classification of usage.
Utilized the Spark Scala API to implement batch processing of jobs.
Troubleshot Spark applications for improved error tolerance.
Performance-tuned Spark jobs by adjusting configuration properties and using broadcast variables.
Converted SQL queries into Spark transformations using Spark RDDs and Scala, and performed map-side joins on RDDs; the sketch after this list shows the broadcast-join pattern.
Scheduled daily jobs using Oozie as the data pipeline service.
Created Hive External table for each source table in Hadoop Data Lake.
Created Hive partitions for tables based on load date.
Altered Hive managed tables via temp tables to add, update, and delete columns.
Worked on Hive LLAP to improve the performance of Hive query execution.
Created metadata in Atlas for Standard attributes (audit fields) for each table in hive.
Worked extensively with Hive to improve query performance.
Worked on Atlas to create types for classification of business tables.
Worked on the GSON and json4s APIs to parse JSON data from Atlas types.
Built DataFrames to derive the schema of an event using Spark and Scala.
Worked on different file formats (Text, ORC, and Parquet) to improve query performance and reduce disk space usage.
Worked on SBT plugins to build the jars and deploy the application to the cluster.
Used Lucidchart to build class diagrams and data flows.
Involved in creating Oozie workflows to run multiple jobs that run independently based on time and data availability.
Worked on Mockito and FlatSpec in Scala for unit test cases.
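
A minimal sketch of the broadcast-join tuning mentioned above, which achieves a map-side join by replicating the small table to every executor instead of shuffling the large one. The table and column names are hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object BroadcastJoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("broadcast-join-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Large fact table and small dimension table (hypothetical names).
    val events = spark.table("dwh.events")
    val lookup = spark.table("dwh.event_type_lookup")

    // Broadcasting the small table lets the join run map-side,
    // avoiding a shuffle of the large table.
    val enriched = events.join(broadcast(lookup), Seq("event_type_id"))

    enriched.write
      .mode("overwrite")
      .format("orc") // ORC matches the file formats listed above
      .saveAsTable("dwh.events_enriched")

    spark.stop()
  }
}
```
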

Environment: Hortonworks HDP 3.1, Hive, Atlas, Ranger, Scala, ER Studio, LLAP, Spark, Ambari, Java, REST API, Oozie, Agile Kanban, IntelliJ, Git, Lucidchart, Mockito, Grafana, InfluxDB, NiFi

Cyber Information Systems Jan 2013 - May 2016
Software Engineer
Responsibilities:
Involved in Use Case meetings to understand and analyze the requirements; coded as per the prototype.
Developed various UI (User Interface) components using Struts (MVC), JSP, and HTML.
Developed Controllers, created JSPs, and configured them in the struts-config.xml and web.xml files.
Developed an MVC architecture using the Business Delegate, Service Locator, Session Facade, Data Access Object, and Singleton patterns.
Involved in writing all client-side validations using JavaScript and JSON.
Involved in the complete development, testing and maintenance process of the application.
Used Hibernate 2.0 as the ORM tool to communicate with the database.
Designed and created a web-based test client using Struts upon the client's request, used to test different parts of the application.
Used JSP, HTML, and CSS extensively to develop a user-friendly presentation layer.
Involved in different Testing phases like Unit Test, Integration Test and Regression Test.
Involved in the development process, with working knowledge of tracking tools like JIRA.
Developed back-end stored procedures and triggers using Oracle PL/SQL; involved in database object creation, performance tuning of stored procedures, and query plans.
Worked closely with the client and the offsite team; coordinated activities between them for effective implementation of the project.
Developed RESTful web services.
Involved in testing web services (SOAP, RESTful) using the Infor EAM Web Service toolkit.