Data Engineer - 100% remote at Remote, USA |
Email: [email protected] |
From: pankaj, Stellent IT [email protected]
Reply to: [email protected]

Data Engineer
100% remote
Phone and Skype
Long Term

Job Description:

Technical Skills:
Skill: Years/Level of Experience
MongoDB: P4 - Expert
PostgreSQL: P4 - Expert
Cloud Application Architecture: P4 - Expert

P1 Beginner (0-2 yrs experience)
P2 Intermediate (3-5 yrs experience)
P3 Advanced (7-10 yrs experience)
P4 Expert (10+ yrs experience)

Role Description:
The Data Engineer provides ETL support to the data science and software engineering team members. The role builds, modifies, and supports infrastructure for optimal extraction, transformation, and loading of data from a variety of structured and unstructured data sources and a multi-terabyte distributed file system. The candidate will formulate and rapidly prototype various approaches, effectively communicate the pros and cons of each, and provide data-driven approaches to tackle various business problems. The candidate will have the ability to contribute to a high-performing, motivated workgroup by applying interpersonal and collaboration skills to achieve project goals.

The candidate will architect the ML data pipeline, with data acquisition and preprocessing functionality that gathers data from a heterogeneous data pool: the distributed file system, unstructured text extracted from multi-million images of medical records with varied OCR quality, their metadata from relational databases, and custom annotations.

Responsibilities:
Provide current system architecture documentation, engineering/web development programming support for tasks defined by program/project requirements, and data science/data engineering related technical assessments.
Manage and maintain structured, semi-structured, and unstructured data, structuring and wrangling data as appropriate for statistical analysis.
Implement data warehouse concepts and relational databases, and big data management techniques and tools (e.g. Hadoop, MapReduce).
Communicate with technical and non-technical users and managers, and perform server administration, including hardware and software support for existing servers.
Provide software engineering support to operate, maintain, and enhance systems that are integrated with and/or relied upon by the data engineering lifecycle.
Integrate, analyze, and visualize data and information in near real-time (within 24 hours) from multiple disparate data sources.
Optimize data storage and access.
Proficiency with Python and Java, Oracle Enterprise Manager, SQL, AWS.

Qualifications:
Master's degree in related field + 5 years experience; or PhD + 1 year experience; or Bachelor's degree in related field + 7 years experience.
Minimum of 5 years experience conducting ETL tasks, performance engineering, run-time optimization, and large data volume transfers.
Minimum 3 years experience with Regular Expressions, SQL (PostgreSQL), and NoSQL (MongoDB).
Minimum 1 year experience with version control systems (Git).
Preference given to developers with experience working with healthcare data and Health IT.

Skills/Tools Utilized (at least 1-2 years experience in some of the following):
Apache Hadoop (Cloudera)
AWS Data Platforms (Redshift, S3, EMR/Hive)
SQL
Java
Kafka
Scala
Kotlin
Neo4j
NiFi
Flink
Sqoop
PostgreSQL
EMR
Apache Spark
Python
PHP
Oracle
Splunk
BDD testing framework: Cucumber

Knowledge of and experience using various NLP approaches, particularly:
Pattern recognition/feature extraction
Supervised, Unsupervised, and Semi-Supervised learning techniques
Understanding of various language models (N-Gram, Skipgram, NLM, etc.)
Chunking/Tokenization
Semantic parsing

Skills highly desired:
Healthcare IT experience
Statistical model building (particularly classification)

Education Level: Master's
Option to Hire: Yes No
Terms as indicated in the supplier's Federal Contractor Exchange MSA.

Work Location
On-site (Government / AFS Site): Remote
On-site %: 0%

Keywords: machine learning sthree information technology |
Thu Sep 07 20:20:00 UTC 2023 |