Job Description Summary:
Under the direction of the Senior Director of Informatics, the Data Scientist (DS) will play a transformational role in delivering data analytics and Data as a Service (DaaS). The Data Scientist possesses deep oncology expertise combined with statistical data knowledge to develop the data platform to further cancer research. The Data Scientist works cross-functionally with the trusted data and DaaS teams to provide guidance on the type, cleanliness and level of data granularity needed for artificial intelligence and machine learning inputs. A Data Scientist is responsible for mining cancer-related data to create concepts that can be presented in the data platform.
As a Data Scientist you will be engaged in data modeling, data mining and statistical analysis to deliver meaningful information to business users, researchers, clinicians and patients. A Data Scientist assists the Informatics team in inspecting all phases of the data maturity process. The role involves development and consultation to deliver high-dimensional, data-intensive concepts. To thrive on the Informatics team, the Data Scientist must be an active contributor in a cross-functional, entrepreneurial, agile scrum environment to deliver agile analytics.
Primary Key Performance Areas
KPA 1 – Data Science
- Data mining, statistical analysis and visualization using state-of-the-art methods for data enrichment
- Create automated anomaly detection systems and constant performance tracking
- Support the development and expansion of oncology data assets with expertise to onboard, process, cleanse, verify and quality assure data maturity transformations from various data sources.
- Lead efforts to update, standardize and centralize disease factors such as cancer subtype, stage, therapy, and diagnostic factors
- Select features, build, and optimize classifiers, using machine learning techniques or big data tools
- Develops and manages predictive and prescriptive analytic models in support of the organization's priorities
- Works in a team to drive innovation for improved quality of care, clinical outcomes, reduced costs, temporal efficiencies and process improvements.
- Stays abreast of state-of-the-art literature in the fields of operations research, statistical modeling, statistical process control and mathematical optimization
KPA 2 – Data Governance
- Conduct data profiling, data aggregation, mapping, reference data and data governance evangelization activities to educate non-data-minded colleagues about the domain, using superb communication skills to professionally articulate the high stakes associated with getting the data right.
- Domain understanding of data modeling, data warehousing, business intelligence, ELT development and ability to read, write, maintain and tune SQL queries.
- Critical thinking and logic skills with attention to detail and the ability to identify root causes of data anomalies and data quality issues.
- Engages with data stewards throughout the business to collaboratively determine the data standards that will be used in our enterprise data warehouse and for data visualization purposes to optimize the business.
- Promotes a governance-first mindset to influence data integrity educational opportunities, process improvement and business requirements to drive tool selection and consolidation activities.
- Assist with the development, maintenance and adherence to policies, SOPs and data management plans related to data acquisition, use, security and compliance
KPA 3 – DevOps
- Collaboration with project teams to drive project plan, metrics, specifications, design, development, integration, verification, validation and launch of solutions consistent with organizational development processes.
- Endorse database change automation in the CI/CD pipeline to enable reliable, rapid and effortless releases into production to provide agile analytics.
- Enjoys participation in ideation sessions with the team while also able to work independently in support of organizational initiatives to meet project deadlines while maintaining a high level of accuracy and attention to detail.
- Strong quantitative/qualitative business case analysis skills with the ability to manage multiple competing initiatives in a cross-functional environment.
Position Qualifications/Requirements Education:
- Ph.D. in a technical, scientific or quantitative field preferred
- Master’s degree in business analytics, computer science, engineering, accounting, finance, management information systems or another related field. In lieu of a degree, 10+ years of experience or an equivalent combination of education and experience is required.
- 5+ years’ experience in a healthcare setting and familiarity with pathology and oncology data and medical records
- Technical competency in data modeling and SQL to create/maintain DB objects and query/load required data using data governance and visualization tools.
- Proven understanding of how to examine data from multiple disparate sources to create relationships, patterns, associations and other factors that drive data sets and visualizations.
- Applied statistics skills, such as distributions, statistical testing, correlations, clustering, regression and principal components.
- Robust scripting and programming skills to perform ETL and data cleansing operations
- Data-oriented, hands-on experience using relational models, object models, hierarchical data models and network data models
- Knowledge of medical ontologies, lexicons and tools required
- Background in computer science, operations research, statistics or a physical engineering discipline.
- Machine Learning/Predictive Analytics and Mathematical Optimization
- Development and deployment of healthcare-relevant predictive and prescriptive models with ability to perform standard statistical analysis
- Understanding of clinical data and data quality needs in an ambulatory oncology setting.
- Work closely with Clinical Data Architect, Senior Data Integration Analyst and stakeholders to build products that will create the Informatics service line.
- Ability to develop algorithmic approaches to solve clinical/business use cases, triage issues and identify data anomalies.
- Algorithm development in Python and R.
- Robust SQL query language skills.
- Experience with big data technologies
- Experience with NoSQL databases, such as MongoDB, Cassandra, HBase
- Proficiency in MS Office Word, Excel, PowerPoint, and Outlook required.
- SSMS, SSRS, Power BI and (No)SQL required.
- Familiarity with NLP technologies.
- Data governance knowledge of industry best practices and leading software providers to assist in data steward implementation, maintenance and support.