MD Anderson Cancer Center Jobs

Job Information

MD Anderson Cancer Center Principal Data Engineer - Data Product Buildout in Houston, Texas

The Principal Data Engineer in the area of Data Engineering is a pivotal role in the Enterprise Data Engineering & Analytics Department, operationalizing critical data engineering and analytics initiatives for MD Anderson's digital business initiatives. The Principal Data Engineer manages, plans, builds and optimizes end-to-end solution delivery and data pipelines within the Context Engine, and partners with other Enterprise Data Engineering & Analytics teams to manage and build analytics deliverables for production use by our key data and analytics consumers.

The Principal Data Engineer also manages and coordinates compliance with data governance processes and data security requirements while creating, improving and operationalizing these integrated and reusable data pipelines. This enables faster data access, integrated data reuse and vastly improved time-to-solution for MD Anderson data and analytics initiatives.

The Principal Data Engineer role requires working creatively and collaboratively with IS and Institutional leaders across the enterprise. It involves evangelizing effective data management practices and promoting better understanding of data and analytics. The Principal Data Engineer - Data Engineering partners closely with teams across MD Anderson, including the Enterprise Development & Integration and Enterprise Data Science departments, in the build-out and delivery of data pipelines and analytics through the Context Engine Framework.

Data Engineering - End-to-End Solution Delivery:

  1. Lead/Communicate/Participate in end-to-end solution delivery that increases information capabilities and realizes data value across the institution. End-to-end solutions include build-out of data sources and tools across the Context Engine framework by integrating data governance processes through data ingestion, ingress, egress, curation, pipeline build, data transformation and modeling steps. This includes incorporating highly integrated data governance processes that consistently track data provenance, security, data quality and ontology, through to data visualization and insights.

  2. Lead/Communicate/Participate in the planning, architecture, analysis, design and build of end-to-end data pipelines & solutions in partnership with IS, Data Offices, Data Governance teams and other partners for efficient end-to-end management of MD Anderson data across the Context Engine.

  3. Lead/Communicate/Participate in existing end-to-end data pipelines & solutions consisting of a series of stages through which data flows (for example, from data sources or endpoints of acquisition to integration to consumption for specific use cases).

  4. Lead and incorporate repeatable solution designs & data models, and build data curation pipelines, including profiling, specification creation, cleansing, transforming, standardizing, mastering, harmonizing, validating and aggregating data, and monitoring data quality across our Context Engine.

  5. Lead/Communicate/Participate in incorporating data governance and metadata management processes into the data ingestion, curation and pipeline building efforts.

  6. Explore and promote innovative and modern tools, techniques and architectures to partially or completely automate the most common, repeatable and tedious data preparation and integration tasks in order to minimize manual and error-prone processes and improve productivity.

Standards, Testing & System Maintenance:

  1. Manage, coordinate and adhere to standard operating procedures set by the IS division as well as all MDA policies, and maintain build standards (data steward / governance oversight sign-off) in support of the MDA Institutional data strategy, including the Context Engine.

  2. Manage documentation preparation as needed for the implementation of enhancements or new technology.

  3. Manage & follow documented change control processes; may perform change control audits.

  4. Manage & perform quality control and testing, and review the build of other analysts to ensure that solutions are technically sound

  5. Oversee analytics system updates/new releases for assigned modules

  6. Manage and ensure adherence to regulatory requirements, quality standards and best practices for systems and processes, and collaborate with internal and external stakeholders.

  7. Lead and/or participate in after-hours application support and downtime procedures

Educate and train:

  1. Lead, promote & train counterparts, such as data scientists, data analysts, users or any data consumers, in data pipelining and preparation techniques, which make it easier for them to integrate and consume the data they need for their own use cases.

  2. Lead, plan & establish training plans for various systems in the Context Engine Tools suite and develop curricula in partnership with the MDA Training team and EDEA system experts.

  3. Provide institutional, department and one-on-one training on EDEA deliverables

  4. Coach and provide advice, guidance, encouragement, constructive feedback and transfer knowledge to less experienced team members across OneIS and the institution

  5. Manage liaison relationships with customers and OneIS to provide effective technical solutions and customer service


  1. To provide innovative, quality, and sustainable IT solutions and services. Our success is driven by our people through Integrity and Trust, Partnership, and Quality.

  2. Promote trust, respect, support, and honesty with customers and each other.

  3. Commit to being a good partner focused on building productive, collaborative, and trusting relationships with our customers and each other.

  4. Model a commitment to excellence and strive to continually improve. Achieve desired outcomes, usability, and value that exceed the expectations of others and our own.

Other duties as assigned

Education: Bachelor's degree.

Preferred Education: Master's Level Degree

Certification: Must obtain at least one Epic Data Model certification (Clinical, Access, or Revenue) issued by Epic within 180 days of date of entry into job.

Preferred Certification: Python, PySpark, Spark certifications

Experience Required: Seven years of relevant information technology experience. May substitute required education with years of related experience on a one-to-one basis. With preferred degree, five years of experience required.

Preferred Experience: IT Healthcare experience.

It is the policy of The University of Texas MD Anderson Cancer Center to provide equal employment opportunity without regard to race, color, religion, age, national origin, sex, gender, sexual orientation, gender identity/expression, disability, protected veteran status, genetic information, or any other basis protected by institutional policy or by federal, state or local laws unless such distinction is required by law.

Additional Information

  • Requisition ID: 155396

  • Employment Status: Full-Time

  • Employee Status: Regular

  • Work Week: Days

  • Minimum Salary: US Dollar (USD) 113,000

  • Midpoint Salary: US Dollar (USD) 141,500

  • Maximum Salary: US Dollar (USD) 170,000

  • FLSA: exempt and not eligible for overtime pay

  • Fund Type: Hard

  • Work Location: Remote (within Texas only)

  • Pivotal Position: No

  • Referral Bonus Available?: Yes

  • Relocation Assistance Available?: Yes

  • Science Jobs: No