Championing Mental Health with Big Data – Part 1
According to a study by the Office for National Statistics, 1 in 20 people often or always feel lonely, and these feelings frequently start in childhood. They can lead to a range of health problems, including mental ill health.
At Chanua we work on a range of health and enterprise projects focusing on early-stage and preventative services and support. As part of our outcomes-driven digital roadmap, we want to explore the data that we collect from our programmes and products and connect it with public datasets from organisations such as the NHS, Local Authorities and open data initiatives. We then want to feed the analytics back to our partners and customers in a visual format that is user-friendly and in line with their KPIs.
To achieve this, we needed the following technical capabilities:
- The ability to collect vast amounts of data from different sources and in different formats. In particular, the ability to convert the proprietary data formats that we collect from these sources into open data formats such as Apache Parquet, ORC, Avro and JSON.
- The ability to query data by applying different lenses to the same data source, because our stakeholders are diverse and may need to extract different insights from it. This includes the ability to apply various types of machine learning algorithms to the data we collect (e.g. K-Means clustering).
- The ability to visualise the data in a friendly manner.
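To make the K-Means clustering mentioned above a little more concrete, here is a minimal sketch of the algorithm in plain Python. It is not our production code: the function names, the 2-D points and the choice to pass the starting centroids in explicitly are all assumptions made for illustration (real implementations usually pick the starting centroids randomly or with a scheme such as k-means++).

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans(points, initial_centroids, max_iterations=50):
    """Group 2-D points into len(initial_centroids) clusters.

    Hypothetical helper for illustration only. Returns (centroids, clusters).
    """
    centroids = list(initial_centroids)
    k = len(centroids)
    clusters = [[] for _ in range(k)]
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: squared_distance(point, centroids[i]))
            clusters[nearest].append(point)

        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                mean_x = sum(x for x, _ in cluster) / len(cluster)
                mean_y = sum(y for _, y in cluster) / len(cluster)
                new_centroids.append((mean_x, mean_y))
            else:
                # Keep the old centroid if no point was assigned to it.
                new_centroids.append(centroids[i])

        # Stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Made-up points forming two obvious groups.
sample_points = [(1, 1), (1, 2), (2, 1), (10, 10), (10, 11), (11, 10)]
sample_centroids, sample_clusters = kmeans(sample_points, [(1, 1), (10, 10)])
```

With these made-up points, the first three end up in one cluster and the last three in the other. On real data the interesting work is in choosing the features and the number of clusters, which this sketch does not address.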
Before designing and implementing a system that enabled us to fulfil the above, our digital team, led by our newly appointed CTO Bamborde, ran a series of digital roadmap sessions with the goal of establishing a common digital vision amongst Chanua’s key stakeholders for the future of our programmes and products. We then created a roadmap for phased design and implementation using the Agile (Scrum) methodology.
After completing the digital roadmap sessions with Chanua’s key stakeholders, it was clear that a Serverless Data Platform (read more about Serverless Computing here) was the right computing paradigm for Chanua’s long-term, outcomes-driven data intelligence strategy. To implement this new paradigm, we chose Amazon Web Services (AWS) as the core infrastructure to run our Serverless Data Platform. Without going into deep technical detail about the actual implementation, here are some of the AWS services that form part of our serverless stack.
S3 – AWS’s secure and scalable object store service, which acts as our first landing zone on AWS for the data that we collect from diverse sources and formats (essentially our data lake). Because we follow the “schema on read” model, data can be stored as it arrives and given structure only when it is queried, so S3 gives us great flexibility in terms of costs and scalability.
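As a rough illustration of what “schema on read” means in practice, the sketch below stores raw records untouched and only applies a schema at the moment of reading. Everything in it (the field names, the sample records and the read_with_schema helper) is invented for the example and is not taken from our platform.

```python
import json

# Raw records as they might land in a data lake: stored as-is, one JSON
# document per line, with no schema enforced at write time.
# (Invented sample data, for illustration only.)
RAW_LINES = [
    '{"participant": "p1", "age": "34", "score": "7", "region": "North"}',
    '{"participant": "p2", "age": "41", "score": "3"}',
]


def read_with_schema(lines, schema):
    """Yield one dict per line, keeping only the fields named in `schema`.

    `schema` maps a field name to a converter such as int or str.
    Fields missing from a record come back as None.
    """
    for line in lines:
        record = json.loads(line)
        yield {
            field: (convert(record[field]) if field in record else None)
            for field, convert in schema.items()
        }


# Two different "lenses" over the same raw data.
wellbeing_view = list(read_with_schema(RAW_LINES, {"participant": str, "score": int}))
demographic_view = list(read_with_schema(RAW_LINES, {"participant": str, "age": int, "region": str}))
```

The same stored lines serve both views, and a record that lacks a field (the second record has no region) does not have to be rejected when it is written.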
Glue – AWS’s serverless ETL (Extract, Transform, Load) service. It generates ETL job code that is customisable, reusable and portable, using familiar technologies such as Scala, Python and Apache Spark.
Athena – AWS’s serverless query engine, based on the popular open source distributed SQL query engine Presto. The cool thing about Athena is that it queries the data stored in S3 directly.
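For readers curious what querying S3 data through Athena can look like from code, here is a minimal sketch. It assumes an Athena client object with the start_query_execution, get_query_execution and get_query_results operations that boto3 provides; the run_athena_query helper itself, and any database, table or bucket names you pass to it, are hypothetical and not part of our actual implementation.

```python
import time


def run_athena_query(athena, sql, database, output_location, poll_seconds=1.0):
    """Run a SQL query via an Athena client and return rows as lists of strings.

    `athena` is expected to behave like boto3.client("athena").
    Hypothetical helper: a sketch, not a hardened implementation.
    """
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        # Athena writes query results to this S3 location.
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Queries run asynchronously, so poll until a final state is reached.
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in state {state}")

    # Only the first page of results is read; a fuller version would paginate.
    results = athena.get_query_results(QueryExecutionId=query_id)
    return [
        [cell.get("VarCharValue") for cell in row["Data"]]
        for row in results["ResultSet"]["Rows"]
    ]
```

In use, you would create the client with boto3.client("athena") and call the helper with your own SQL, database name and an S3 output location that you control. For SELECT queries the first returned row typically holds the column names.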
In future follow-up posts we will go into more technical detail about our journey by releasing concrete data analysis done using the serverless paradigm. In the meantime, if you are a developer or data scientist (or an aspiring data scientist with some relevant background) looking for new challenges, please feel free to reach out at email@example.com!