Tiago Marques Godinho¹ & Rui Lebre¹,² & João Rafael Almeida¹,² & Carlos Costa¹
Published online: 31 January 2019 © Society for Imaging Informatics in Medicine 2019
Abstract In the last decades, the number of medical imaging studies and the volume of associated metadata have been increasing rapidly. Although these studies are mostly used to support medical diagnosis and treatment, many recent initiatives advocate their use in clinical research scenarios and to improve the business practices of medical institutions. However, the continuous production of medical imaging studies, coupled with the tremendous amount of associated data, makes the real-time analysis of medical imaging repositories difficult with conventional tools and methodologies. These archives contain not only the image data itself but also a wide range of valuable metadata describing all the stakeholders involved in the examination. Exploring this data will increase the efficiency and quality of medical practice. In major centers, this represents a big data scenario in which Business Intelligence (BI) and Data Analytics (DA) are rare and, where they exist, are implemented through data warehousing approaches. This article proposes an Extract, Transform, Load (ETL) framework for medical imaging repositories that is able to feed, in real time, the BI application developed in this work. The solution was designed to provide the necessary environment for conducting research on top of live institutional repositories without requiring the creation of a data warehouse. It features an extensible dashboard with customizable charts and reports, and an intuitive web-based interface that enables the use of novel data mining techniques, namely a variety of data cleansing tools, filters, and clustering functions. Therefore, users are not required to master the programming skills, such as Python and R, that data analysts and scientists commonly need.
Keywords PACS · Business Intelligence · DICOM · Data Analytics · Cloud · Big data
Nowadays, medical imaging repositories contain a wide range of valuable metadata that thoroughly describes all the stakeholders involved in medical imaging practice. Although medical imaging studies are mostly used to support medical diagnosis and treatment, many recent initiatives advocate their utility in clinical research scenarios and in improving the business practices of medical institutions.
The current paradigm of medical imaging repositories fits well with the definition of big data. The continuous production of huge volumes of data, its heterogeneous nature, and the increasing number of examinations performed make the analysis of medical imaging repositories very difficult for conventional tools. Moreover, the new trend toward distributed Picture Archiving and Communication System (PACS) architectures, which make it possible to federate multiple institutions in the same PACS archive in the cloud, promotes the creation of larger and more useful datasets. Therefore, Data Analytics (DA) and Business Intelligence (BI) techniques applied to this scenario have the potential to increase the efficiency and quality of medical practice.
* Rui Lebre firstname.lastname@example.org
Tiago Marques Godinho email@example.com
João Rafael Almeida firstname.lastname@example.org
Carlos Costa email@example.com
1 University of Aveiro, DETI/IEETA, Campus Universitário de Santiago, Aveiro, Portugal
2 Department of Information and Communications Technologies, University of A Coruña, A Coruña, Spain
Journal of Digital Imaging (2019) 32:870–879 https://doi.org/10.1007/s10278-019-00184-5
This article proposes an Extract, Transform, Load (ETL) framework for medical imaging repositories that feeds, in real time, a BI platform oriented to medical imaging practice and research. The solution can index distinct data sources and aims to provide the necessary environment for conducting research on top of live institutional repositories. It leverages all the metadata stored in those repositories without requiring a data warehouse or predefined data models, and without imposing rigid data flows. The developed system takes advantage of Dicoogle’s data mining features for extracting data from production PACS and provides a series of exploratory techniques and visualization tools for a deep understanding of the working dataset and the extraction of valuable information. Its design also makes analytics tools usable without the programming skills (e.g., in Python and R) that other platforms commonly require. It provides an intuitive web-based interface that enables the use of novel data mining techniques, namely a variety of data cleansing tools, filters, and clustering functions. Moreover, it features an extensible dashboard with customizable charts and reports.
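The extract, transform, and load steps described above can be illustrated with a minimal Python sketch. It assumes hypothetical in-memory metadata records whose keys borrow DICOM attribute keywords (StudyInstanceUID, Modality, StudyDate, PatientAge). It is neither Dicoogle's API nor the authors' implementation. The transform step performs simple cleansing: it normalizes modality codes, parses DICOM dates (YYYYMMDD) and age strings (e.g., "054Y"), and drops duplicate and malformed rows. A filter then selects modalities, and the load step aggregates studies per month and modality, the kind of series a dashboard chart could plot.

```python
from collections import Counter
from datetime import datetime

# Hypothetical metadata records (assumption): keys mimic DICOM attribute
# keywords, as a PACS index might expose them. Values are illustrative only.
RAW_RECORDS = [
    {"StudyInstanceUID": "1.2.3.1", "Modality": "CT", "StudyDate": "20180115", "PatientAge": "054Y"},
    {"StudyInstanceUID": "1.2.3.2", "Modality": " mr ", "StudyDate": "20180120", "PatientAge": "037Y"},
    {"StudyInstanceUID": "1.2.3.3", "Modality": "CT", "StudyDate": "20180203", "PatientAge": ""},
    {"StudyInstanceUID": "1.2.3.4", "Modality": "US", "StudyDate": "not-a-date", "PatientAge": "061Y"},
    {"StudyInstanceUID": "1.2.3.1", "Modality": "CT", "StudyDate": "20180115", "PatientAge": "054Y"},  # duplicate
]


def extract(source):
    """Extract: yield raw records one by one (here from an in-memory list)."""
    yield from source


def transform(records):
    """Transform/cleanse: normalize fields, drop duplicates and malformed rows."""
    seen = set()
    for rec in records:
        uid = rec.get("StudyInstanceUID")
        if uid in seen:
            continue  # skip studies already emitted
        try:
            date = datetime.strptime(rec.get("StudyDate", ""), "%Y%m%d").date()
        except ValueError:
            continue  # skip records whose StudyDate is not YYYYMMDD
        seen.add(uid)
        age = rec.get("PatientAge", "")
        # DICOM age strings look like "054Y"; only year-based ages are kept here.
        years = int(age[:3]) if age.endswith("Y") and age[:3].isdigit() else None
        yield {
            "uid": uid,
            "modality": rec.get("Modality", "").strip().upper(),
            "month": date.strftime("%Y-%m"),
            "age_years": years,
        }


def filter_by_modality(rows, modalities):
    """Filter: keep only rows whose modality is in the requested set."""
    wanted = {m.upper() for m in modalities}
    return (row for row in rows if row["modality"] in wanted)


def load(rows):
    """Load: count studies per (month, modality), ready to feed a chart."""
    return Counter((row["month"], row["modality"]) for row in rows)


if __name__ == "__main__":
    print(load(transform(extract(RAW_RECORDS))))
    print(load(filter_by_modality(transform(extract(RAW_RECORDS)), ["ct"])))
```

Because each stage is a generator, records could in principle flow through the pipeline as they are indexed rather than in a nightly batch, which loosely mirrors the idea of feeding a dashboard without an intermediate warehouse. A real deployment would read from the PACS index instead of a hard-coded list.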