Accurately visualizing Dutch biodiversity
ICT & Artificial Intelligence
Client company: Waarneming.nl
Carmen Engelen
Tim La Haije
Wesley Wijnen
Project description
Design an accurate data dashboard that lets people with some domain knowledge of nature observation (such as students, educators and researchers) explore the data of waarneming.nl and gain more insight into Dutch biodiversity.
Context
Waarneming.nl is a nature platform that enables volunteers to record Dutch biodiversity by taking pictures with an app. They asked us to give users more insight into Dutch biodiversity through an interactive data visualisation. However, because the data from waarneming.nl comes from volunteers who observe and record nature, it contains a bias that prevents the platform from accurately representing Dutch biodiversity.
The goal of this project was to give the users of waarneming.nl more insight into Dutch biodiversity by developing a visualisation, defining and quantifying Dutch biodiversity, and mitigating the bias in the data.
Results
Defined and quantified biodiversity
One of the first steps was finding out how to define and quantify biodiversity accurately. Extensive literature research and an expert interview showed that biologists have not settled on a commonly agreed way to quantify biodiversity. In the context of this project, biodiversity has been defined as the variety of flora and fauna species in the Kingdom of the Netherlands.
To make this definition usable, a formula has been created. The biodiversity index (B.I.) is calculated by dividing the number of taxonomically distant species in an area by the total number of individuals in the area. This gives a score between 0 and 1 for each species, which is then summed over all species to get the total score.
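The calculation described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: the function name biodiversity_index and the example species are hypothetical, and it assumes that every distinct species name counts once, because the text does not specify how "taxonomically distant" is determined. Under that assumption each species contributes 1 / (total individuals), and summing over all species gives species / individuals.

```python
from collections import Counter


def biodiversity_index(observed_species):
    """Return (distinct species) / (total individuals) for one area.

    Assumption: each distinct species name counts once; the notion of
    "taxonomically distant" from the text is not modelled here.
    """
    counts = Counter(observed_species)
    total_individuals = sum(counts.values())
    if total_individuals == 0:
        return 0.0  # no observations in this area, so no score
    # Every species contributes 1 / total_individuals, so the sum over all
    # species equals the number of species divided by the number of individuals.
    return len(counts) / total_individuals


# Hypothetical area with 4 individuals spread over 3 species.
area = ["merel", "merel", "koolmees", "egel"]
print(biodiversity_index(area))  # 0.75
```

With this reading, an area where every individual belongs to a different species scores 1, and an area dominated by a single species scores close to 0.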
Mitigated the bias
Because the data is created by volunteers, it contains a lot of bias. Not only are some species groups overrepresented because of the interests of the users, but not all areas have an equal number of observers either. This makes it hard to say something useful about biodiversity in areas with few observers. In addition, observations do not contain a quantity, but a hidden quantity can be found in the data: multiple users in the same area making the same observation.
To find out how best to deal with the bias in the dataset, a literature study was conducted, as well as an expert interview with the Jheronimus Academy of Data Science (JADS), who gave insight into how to deal with it. Two of their suggestions were used: weighing the growth of the observations against the growth of the user base to get a better view of the trend, and focussing only on the consistent users in the dataset (available as filters in the final visualisation). An addition from the team's own research was removing ‘duplicates’ from the dataset: for each month and each city, multiple observations of the same species were reduced to a single one.
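Two of these measures can be illustrated with a short sketch. It is only an approximation under stated assumptions: the record layout (user_id, date, city, species), the function names and the sample data are all hypothetical, and dividing the monthly observation count by the number of users active in that month is just one possible way to weigh observation growth against user growth.

```python
from collections import defaultdict


def deduplicate(observations):
    """Keep only the first observation per (month, city, species)."""
    seen = set()
    kept = []
    for user_id, date, city, species in observations:
        key = (date[:7], city, species)  # date[:7] is the "YYYY-MM" part
        if key not in seen:
            seen.add(key)
            kept.append((user_id, date, city, species))
    return kept


def observations_per_active_user(observations):
    """Per month: observation count divided by distinct users active that month."""
    counts = defaultdict(int)
    users = defaultdict(set)
    for user_id, date, _city, _species in observations:
        month = date[:7]
        counts[month] += 1
        users[month].add(user_id)
    return {month: counts[month] / len(users[month]) for month in sorted(counts)}


# Hypothetical sample: two users report the same species in Eindhoven in May.
sample = [
    ("u1", "2021-05-02", "Eindhoven", "merel"),
    ("u2", "2021-05-09", "Eindhoven", "merel"),
    ("u1", "2021-05-11", "Eindhoven", "egel"),
    ("u3", "2021-06-01", "Tilburg", "merel"),
]
print(len(deduplicate(sample)))              # 3
print(observations_per_active_user(sample))  # {'2021-05': 1.5, '2021-06': 1.0}
```

Normalising by active users keeps a growing user base from being mistaken for a growing species population, while the deduplication step stops popular locations from inflating the counts.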
Developed a data dashboard
The design of the visualisation was based on research. The target audience for this visualisation is people who have some domain knowledge (students, educators, researchers). To understand the design choices behind good and bad dashboards, we did library research, built proofs of concept (POCs), designed possible dashboards and tested them with the audience. Informed by the earlier research, we wrote user stories. Multiple dashboards were designed according to the user stories and tested with the target audience in a semantic test.
Methodology
Data visualisation:
Library - Best good and bad practices:
Analysed the design choices behind good and bad data dashboards in order to make a better design ourselves.
Lab - Wizard of Oz:
Showed possible dashboard designs to the target audience.
Stepping stones - Requirement list:
Wrote a requirement list to ensure the data dashboard design meets all the demands.
Workshop - Proof of Concept:
Developed multiple data visualisations with different visualisation tools.
Field - Survey:
Surveyed our target audience (students, teachers, researchers) in a semantic test.
Stepping stones - Test report:
Described test results of the semantic test to ensure later project groups would understand the research behind our decisions.
Mitigating the bias:
Library - Literature research:
Gained insight into the different types of bias and how to deal with them.
Field - Exploratory Data Analysis (EDA):
Gained initial insight into the dataset and investigated the bias in the data.
Library - Expert interview:
Interviewed JADS to gain new insight into how to deal with the bias in the data.
Lab - Data analytics:
Gained insight into the effects of the measures taken on the bias.
Workshop - Brainstorm:
Brainstormed with a researcher on how to deal with the bias.
Defining and quantifying biodiversity:
Library - Literature research:
Looked into the different ways biodiversity is defined, as there is no single agreed-upon definition.
Library - Expert interview:
Interviewed a soil consultant to better understand the domain and arrive at the right definition of biodiversity.
Stepping stones - Prototype:
Defined and quantified Dutch biodiversity and tested this with experts and the client.
General:
Showroom - Ethical Check:
Investigated which decisions could lead to ethical dilemmas and explored the possible views of people from diverse backgrounds, using the TICT tool and a discussion with an ethics teacher.
Showroom - Product review:
Held meetings with the client and tutors to show the final product and receive feedback.
About the project group
All members of the project group study Applied Data Science at Fontys. The group consisted of three students: Wesley and Tim, who had a business background, and Carmen, who had a media design background.