Fully Remote Site Reliability Engineer with Datadog at Remote, Remote, USA |
Email: [email protected] |
From: Sonali, KPG99 [email protected]
Reply to: [email protected]

Job Title: Fully Remote Site Reliability Engineer (Datadog)
Visa: USC only
Duration: 6+ months
Location: Remote
LinkedIn is a must.

Responsibilities

Duties include:
- Develop and maintain comprehensive monitoring solutions in Datadog for cloud-based and on-premises services and applications.
- Configure monitoring tools and systems to collect relevant metrics, logs, and traces.
- Create custom monitoring dashboards and reports in Datadog to provide real-time insights into system performance and health.
- Continuously monitor the infrastructure's performance and capacity, anticipating and addressing potential scalability issues, and create monitors with targeted notifications.
- Understand how to install the Agent on Linux and Windows and configure the YAML file to monitor the systems.
- Aggregate and visualize data in the Datadog application.
- Familiarity with the Datadog API and how to write custom monitors and alerts.
- Familiarity with networking, to work with the Network team to set up dashboards and alerts.
- Proactively suggest and implement improvements to enhance the system's reliability, resilience, and fault tolerance.
- Work on automating tasks to streamline operational processes and reduce manual intervention.
- Collaborate with cross-functional teams to investigate and resolve critical incidents, ensuring minimal impact on end users.
- Work with the Problem Management team to complete post-mortem analysis of incidents, identify root causes, and implement preventive measures.
- Understand the overall architecture of our systems to identify gaps in monitoring and troubleshoot issues.
- Configure and maintain custom dashboards and alerts in various monitoring tools.
- Create custom reports and deliver report presentations to various stakeholders.
- YAML, JSON, Python, and shell scripting.
- Develop metrics for both the business and technical teams to determine the health of systems.
- Provide on-call support as needed.
- Lead and coordinate performance engineering for medium to large initiatives.
- Collect and document expected system performance and operational characteristics.
- Collect and/or prepare test data for test execution.
- Develop and execute performance tests, including load, stress, endurance, fail-over, and interoperability tests.
- Conduct technical analysis of performance test results and production systems, and provide recommendations on performance tuning, systems, and infrastructure.
- Identify, report, and review defects when assessing system performance and stability.
- Define the strategy for enabling performance diagnostics and monitoring using an Application Performance Management (APM) tool, other monitoring tools, and diagnostic techniques.
- Collaborate with developers to promote performance engineering during all phases of the SDLC, so that performance issues are detected and corrected earlier in the lifecycle.
- Lead peer reviews to ensure the completeness of all test assets created.
- Resolve performance and stability issues in the performance test environment.
- Develop the performance engineering work plan structure and project schedule.
- Review architectural designs for performance risks and potential issues.
- Prepare capacity analysis when applicable.

Minimum Requirements:
- Requires a BA/BS degree in Information Technology, Computer Science, or a related field of study and a minimum of 7 years of performance engineering and performance testing experience; or any combination of education and experience that would provide an equivalent background.

Preferred Skills, Capabilities and Experiences:
- Experience managing performance engineering efforts for an application is strongly preferred.
- Proficiency with the following tools is preferred: Splunk, Datadog, and Dynatrace, among others.
- Knowledge of developing scripts for monitoring (PowerShell, Python, and shell scripting).
- 5 years of Splunk programming proficiency is highly preferred.
- 5-6 years of experience with .NET and Java applications and with application monitoring tools such as AppDynamics or Datadog is highly preferred.
- Proficiency in performance tuning is preferred.
- Good understanding of the UI, middleware, and back-end databases.

Thanks & Regards
Sonali Kumari
Technical Recruiter
KPG99, INC

Keywords: business analyst user interface |
Tue Feb 06 23:12:00 UTC 2024 |