HCLTech

Site Reliability Engineer

About the Employer

Job Description

HCLTech is a global technology company, home to 219,000 people across 54 countries, delivering industry-leading capabilities centered on digital, engineering and cloud, powered by a broad portfolio of technology services and products. We work with clients across all major verticals, providing industry solutions for Financial Services, Manufacturing, Life Sciences and Healthcare, Technology and Services, Telecom and Media, Retail and CPG, and Public Services. Consolidated revenues total $13 billion.

Position Overview

We are seeking an experienced and highly skilled DevOps Engineer/Site Reliability Engineer (SRE) to join our dynamic team. The ideal candidate will have a proven track record in implementing DevOps and SRE practices, along with extensive experience in cloud platforms, containerization, and observability tools. This role requires a deep understanding of modern infrastructure and software development methodologies, and a proactive approach to optimizing system performance and reliability.

Key Responsibilities

Drive the implementation and evolution of DevOps and SRE practices across multi-disciplinary teams, fostering a culture of continuous improvement and collaboration.
Leverage advanced knowledge and hands-on experience with cloud platforms and Infrastructure as a Service (IaaS) offerings, preferably Amazon Web Services (AWS) or Microsoft Azure.
Use strong and proven Java skills to develop and maintain applications and optimize their performance.
Apply expertise in Linux and networking fundamentals to ensure robust and secure infrastructure.
Deploy and manage containerization technologies, ideally Docker and Kubernetes, to streamline application delivery and scalability.
Oversee release, integration, and deployment processes, ensuring efficient and reliable promotion pathways.
Implement and maintain observability practices, including monitoring, logging, tracing, and alerting systems, using tools such as Dynatrace and Datadog, to provide transparency and actionable insights into system performance and health.
Diagnose and resolve performance and optimization issues efficiently.
Operate across the entire stack, including hardware, application, and network layers, to ensure comprehensive system reliability and performance.
Champion agile software development methodologies and environments to enhance team productivity and project outcomes.

Qualifications

Significant experience in DevOps and SRE implementation.
Advanced knowledge and hands-on experience with cloud platforms and IaaS offerings, preferably AWS or Microsoft Azure.
Strong and proven Java skills.