Regenstrief Healthcare Analytics Core


Overview of Services

The Regenstrief Healthcare Analytics Core (RHAC) is a HIPAA-aligned data housing, management, and analytics core with state-of-the-art computing infrastructure. Our services include data de-identification, workflow assistance, patient cohort phenotyping, machine learning, randomized controlled trial design, and causal inference from observational and experimental data.

We currently support storage and computation with advanced high-performance computing capabilities. The system has 300 TB of distributed storage capacity with a Hadoop engine and a programming language interface. All of our current software is open source.

We currently have the following data sets available for research purposes:

  • Cerner Health Facts - This data set contains the EHRs of 69M patients, spanning 20 years and 750 facilities in the US. The size of the data set is ~6,000 GB. It has significant potential for Purdue researchers working at the intersection of life science and data science, and as a longitudinal data set it gives Purdue a unique edge over other engineering-focused universities. Only a handful of medical schools (e.g., University of Texas Health and Southern California CTSI) have access to this data, giving Purdue a competitive advantage in generating unique insights.
  • MIMIC - The Medical Information Mart for Intensive Care (MIMIC) is an openly available data set developed from Beth Israel Deaconess Medical Center data through the MIT Laboratory for Computational Physiology (LCP), comprising de-identified health data associated with >40,000 critical care patients. It includes demographics, vital signs, laboratory tests, and medications. MIMIC is one of the most granular publicly available data sets, with data collected at minute scale over 10 years. RCHE has a strong collaboration with the MIT LCP, which manages the data. Currently, approximately 50 researchers are using the MIMIC database in our cluster.
  • Purdue Claims Data - A fully de-identified data set of medical claims for all Purdue staff and their families, including prescription claims, eligibility, biometrics, and Johns Hopkins risk assessment information. The data span 2014 to the present. This data set is available to researchers pending IRB approval. See HIPAA for more details on RCHE's data access policy and procedures.
  • Indiana Family and Social Services Medicaid Data - This data set comprises de-identified data for all Medicaid enrollees in Indiana, with a focus on healthcare improvements for long-term care (LTC) and substance use disorder (SUD) patients. It includes demographics, provider information, diagnoses, procedures, and medications from the Indiana Medicaid Enterprise Data Warehouse (EDW). It covers more than 5 years of data and will be updated every 6 months.


Mohammad Adibuzzaman|Lab Manager

Maralee Hayworth|Operations Manager

Additional Core Staff:

Kit Klutzke|Programmer Analyst

Poching Delaurentis|Research Scientist

Ping Huang|Research Scientist

Location and hours of operation

Hours: Monday - Friday, 8:00 am - 5:00 pm

Location: Gerald D. and Edna E. Mann Hall, Suite 225, 203 S. Martin Jischke Drive, West Lafayette, IN 47907
