Site Reliability Engineer (SRE)

Posted On 14 November

- Company: NTT Ltd.
- No. of Openings: 10+
- Salary: Not disclosed
- Work Type: On-site
Job Description:

The NTT Cloud Site Reliability Engineering team is responsible for delivering a scalable, reliable, and secure computing environment to support the millions of transactions that happen every day. We are looking to expand our Site Reliability Engineering team as we embark on a new phase of growth for our product. We are a metrics-driven organization that strives to deliver world-class service both externally and internally. The team strongly believes in the DevOps methodology and works very closely with our peers on the development team.

Working at NTT

Responsibilities
- Implement and support NTT Cloud-hosted web applications, virtual machines, databases, storage systems, and service buses in cloud deployments, working with engineering organizations to support development and test functions.
- Identify, implement, and support application monitoring solutions for supported applications.
- Troubleshoot and solve complex problems.
- Support various Wintel-based services to ensure maximum uptime, performance, and security.
- Assist in the creation and refinement of operational documentation.
- Use your expertise to support your fellow team members.
- Analyze performance trends across a variety of systems for capacity planning.
- Work closely with engineering teams to roll out new products and services.
- Handle day-to-day system administration tasks such as account management, patching, application deployment, system installations, and other routine maintenance.
- Own and enforce security compliance processes and controls.
- Programmatically automate routine cloud deployment, administration, and monitoring tasks.
- Participate in the 24x7 on-call pager rotation and the incident management process.
Requirements

Must:
- 3-5 years of experience in production (web-facing) Linux, Solaris, or *BSD environments at medium to large scale.
- Deep experience with AWS/Azure, including deploying and/or migrating services to AWS/Azure. Experience with containerization using Docker and Kubernetes/EKS/AKS.
- Knowledge of well-known open-source tools for monitoring, trending, and configuration management. Familiarity with observability tools such as Prometheus, Cortex, Grafana, New Relic, Datadog, and Splunk. Experience with CI/CD tools such as Jenkins (Groovy DSL).
- Knowledge of key protocols, including TCP/IP, SSH, DNS, SMTP, SNMP, SSL, HTTP, and LDAP.
- Experience with configuration management tools such as Chef, Puppet, CFEngine, or Ansible. Basic understanding of Terraform and/or CloudFormation templates.
- Excellent verbal and written communication skills. Self-driven and eager to get things done.
Desired/Preferred:
- Experience with different caching architectures
- Experience with Microsoft products such as MS Exchange, Azure, and M365
- Knowledge of security compliance frameworks, such as SOC II, PCI, HIPAA, ISO 27001, and FedRAMP
- Programming skills, particularly in Python, Java, or Go
- A desire to provide a reliable, secure, and scalable environment that supports millions of users.
- Experience with MySQL, Java, Apache, and Tomcat
- Ability and determination to solve complex system/application problems.
- Experience creating and refining operational documentation.
- Experience managing uptime and performance using service level indicators and objectives (SLIs/SLOs).
Information
- HR Name: Human Resources
- HR Email: jane.doe@global.ntt
- HR Phone: +65 6659 0123
