The end goal of the collaboration between Cardiff and Dyfed Powys is to produce a data analytics hub.
Given a suspect, the hub will identify whether they are the author or operator of one or more alias accounts or devices already known to be involved in criminal activity.
It will do this by ingesting data from the suspect's "day to day" accounts and devices, such as emails, text messages and social media activity, and comparing it with similar data from the alias accounts and devices.
It will look at various aspects such as:
Each capability of the analytics hub (such as a different data source or a different method of analysis) can be developed as a separate module.
This enabled us to focus on a single aspect and, by the end of the 8 weeks, hand over a module ready to reduce the workload of the cyber crime unit.
We chose to focus on identifying whether a suspect operates one or more 'target' email accounts by comparing the writing style of the target accounts with the writing style of an account already known to belong to the suspect.
Our approach to writing prints builds on previous work (Iqbal et al., 2010), which outlined a set of features the researchers believed could characterise a person's writing style. We extended that list, bringing the total to over 400 features.
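As a rough illustration of what a writing print contains, the Python sketch below extracts a handful of lexical and syntactic features from a single email body. It covers only a tiny subset of the 400+ features; the `writing_print` function, the feature names and the `FUNCTION_WORDS` and `PUNCTUATION` lists are hypothetical choices made for this example, not the project's actual feature set.

```python
import re
from collections import Counter

# Hypothetical, illustrative lists; a real writing print uses far more entries.
FUNCTION_WORDS = ("the", "a", "and", "of", "to", "in", "i", "you", "it", "that")
PUNCTUATION = ".,;:!?'\"-"


def writing_print(text: str) -> dict[str, float]:
    """Extract a small, illustrative subset of stylometric features from one email."""
    n_chars = len(text) or 1  # avoid division by zero on empty text
    words = re.findall(r"[A-Za-z']+", text)
    n_words = len(words) or 1
    counts = Counter(w.lower() for w in words)

    features = {
        # Lexical, character-based features (normalised by text length)
        "upper_ratio": sum(c.isupper() for c in text) / n_chars,
        "digit_ratio": sum(c.isdigit() for c in text) / n_chars,
        "space_ratio": sum(c.isspace() for c in text) / n_chars,
        # Lexical, word-based features (normalised by word count)
        "avg_word_len": sum(len(w) for w in words) / n_words,
        "short_word_ratio": sum(len(w) <= 3 for w in words) / n_words,
        "type_token_ratio": len(counts) / n_words,
    }
    # Syntactic features: frequency of each punctuation mark per character
    for p in PUNCTUATION:
        features[f"punct_{p}"] = text.count(p) / n_chars
    # Syntactic features: frequency of each function word per word
    for fw in FUNCTION_WORDS:
        features[f"fw_{fw}"] = counts[fw] / n_words
    return features
```

For example, `writing_print("Hi Bob, see you at 10.")` gives an average word length of 2.6 and a frequency of 0.2 for the word "you". Normalising every count by text length keeps short and long emails comparable.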
Examples of the features analysed:
While we took the concept of the writing print from previous work and extended its feature set, we also overhauled the way writing prints are analysed and compared, which gave more consistent results across larger clusters of emails.
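The comparison method itself is not detailed here, so the following is only one plausible sketch, assuming each email has already been reduced to a dictionary of feature values. It standardises every feature (z-scores) across all emails so that features on different scales contribute comparably, averages each account's emails into a single centroid, and scores each target account by cosine similarity to the suspect's known account. The helper names and the choice of cosine similarity are assumptions for illustration, not a description of the overhauled method.

```python
import math
from statistics import mean, pstdev


def standardise(rows: list[list[float]]) -> list[list[float]]:
    """Z-score each feature column so features on different scales are comparable."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    # A constant column has zero spread; use 1.0 to avoid dividing by zero.
    sds = [pstdev(c) or 1.0 for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]


def centroid(rows: list[list[float]]) -> list[float]:
    """Average a cluster of emails' feature vectors into one writing print."""
    return [mean(c) for c in zip(*rows)]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def compare_accounts(
    known: list[dict[str, float]],
    targets: dict[str, list[dict[str, float]]],
) -> dict[str, float]:
    """Score each target account against the suspect's known account."""
    all_prints = known + [p for prints in targets.values() for p in prints]
    keys = sorted({k for p in all_prints for k in p})
    # Missing features are treated as 0.0 so every row has the same columns.
    z = standardise([[p.get(k, 0.0) for k in keys] for p in all_prints])
    known_centroid = centroid(z[: len(known)])
    scores = {}
    start = len(known)
    for name, prints in targets.items():
        scores[name] = cosine(known_centroid, centroid(z[start : start + len(prints)]))
        start += len(prints)
    return scores
```

A score near 1 means a target account's average writing print points in the same direction as the known account's. Any decision threshold would need to be calibrated on emails of known authorship rather than fixed in advance.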
During our research we developed techniques to analyse and refine the weight given to individual features, examining which features were producing false positives and which were key indicators of an authorship match.
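The specific weighting techniques are not described here. One common way to gauge how useful a feature is for authorship attribution, shown below as an assumption rather than as the project's documented technique, is a Fisher-style ratio: the variance of the per-author means divided by the average variance within each author. A feature that varies as much within one author's emails as between different authors gets a low ratio and is a natural candidate for down-weighting, since it tends to produce false positives. `fisher_scores` is a hypothetical helper that takes emails of known authorship as feature dictionaries.

```python
from statistics import mean, pvariance


def fisher_scores(labelled: dict[str, list[dict[str, float]]]) -> dict[str, float]:
    """Score each feature by between-author variance / mean within-author variance.

    `labelled` maps an author to the feature dictionaries of emails they are
    known to have written. Higher scores suggest more discriminative features.
    """
    authors = [prints for prints in labelled.values() if prints]  # skip empty authors
    keys = sorted({k for prints in authors for p in prints for k in p})
    scores = {}
    for k in keys:
        # Values of feature k for each author's emails (missing features count as 0).
        per_author = [[p.get(k, 0.0) for p in prints] for prints in authors]
        between = pvariance([mean(vals) for vals in per_author])
        within = mean([pvariance(vals) for vals in per_author])
        if within > 0:
            scores[k] = between / within
        else:
            # No spread within any author: maximally useful if authors differ at all.
            scores[k] = float("inf") if between > 0 else 0.0
    return scores
```

Ranking the features with `sorted(scores, key=scores.get, reverse=True)` puts the strongest authorship indicators first; features near the bottom are the ones to inspect for false positives.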
There are many more modules to build as part of the larger analytics hub, but there are also extensions of this specific work to explore: