I am a Data Scientist and Engineer with a proven track record of applying cutting-edge technologies to solve data-driven problems, including: extracting information from biomedical publications to aid drug discovery; collecting and analysing tweets in real time to help journalists track breaking stories; and, most recently, identifying misinformation in web content. With many years' experience of applying machine learning (ML) and natural language processing (NLP) to solve real-world problems in both industry and academia, I am always looking for ways to apply and extend my expertise by exploring new areas.
Factmata Ltd January 2018 - present
Lead ML/NLP Engineer
My role is to research and develop methods to assess the trustworthiness and quality of web content, especially news articles.
Leading a small team of NLP engineers, which includes setting goals and mentoring, as well as helping to recruit engineers, interns and other staff.
Leading the evaluation and improvement of several machine learning algorithms for detecting problematic content. This has led to clearer insights into when system components performed well, allowing resources to be focussed more effectively.
Helping to establish and maintain links with various academic partners and presenting work at conferences and meet-ups.
Technologies used include: Python, Scikit-learn, GitHub, Jira.
Signal Media Ltd September 2014 - December 2017
My role was to discover, evaluate and apply the latest research in NLP and ML to analyse and classify news articles at large scale in real time. Working in agile teams, our aim was to develop and evaluate prototypes based on current research and turn them into product features.
Led the development of novel AI components and proofs-of-concept, including an entity recognition system, a topic classification system and a horizon-scanning / sentiment analysis tool.
Managed a number of key 3-6 month projects with multinational clients, working closely with external stakeholders to understand the brief, communicating their requirements to engineering colleagues and demonstrating progress throughout each project via reports and presentations.
Strengthened Signal's links to the academic and research communities, including presenting work at WSDM and Search Solutions and giving regular guest lectures at City University.
Hosted, mentored and supervised 7 MSc Data Science and Machine Learning students during their projects, providing technical guidance and advice for data science work that formed the basis of new product features at Signal. Two students went on to pursue PhDs and two joined DeepMind.
Led interactive sessions on the basics of AI and machine learning for all staff, with a view to bridging the gap between technical and non-technical teams.
Analysed and indexed regulation documents collected from several jurisdictions. This work included entity and topic recognition using machine learning, as well as several specialist modules to aid users in their search tasks. Each piece of work included either a statistical, data-driven evaluation or a user-centred evaluation stage.
Technologies used include: Python, Clojure, ElasticSearch, AWS, GitHub, NLTK, spaCy and Scikit-learn.
Robert Gordon University & City University London
April 2012-September 2014
Senior Research Fellow
My role was to work with journalists and developers to build tools to find and organise real-time news from Twitter.
Worked closely with journalists from City University and elsewhere to understand their needs. I developed methods and algorithms to help journalists find breaking news stories from Twitter through novel trend-detection and ‘news-hound’ discovery methods.
Co-organized the SNOW workshop data challenge, where I led the evaluation of 10 international teams’ submissions to a news-detection task.
Acted as a bridge between software engineering colleagues and journalist colleagues, translating each group's needs into terms the other could understand more easily.
Lead author of quarterly & annual reports to our key stakeholder (the European Commission), which included coordinating with partners from several organisations across Europe.
Technologies used include: Java, R, MongoDB, Twitter APIs.
Department of Computing, University of Surrey 2009-2012
My role was to develop innovative tools to analyse pictures of plant specimens from Kew Gardens to aid species identification and understand the effects of climate change.
Developed image processing software to extract botanical characteristics from images of herbarium leaf specimens stored at Kew Gardens.
Developed a machine learning system that could assign species labels to these images.
Created a proof-of-concept system which required me to rapidly develop skills in both botany and image processing.
Defined and collected a unique set of data to help develop and evaluate the system.
Technologies used include: Matlab, Java.
University of Hertfordshire 2008-2011
Part-time visiting lecturer
My responsibilities included the online supervision of undergraduate honours degree students, as well as marking coursework and exams.
Institute of Ophthalmology, UCL 2006-2009
My role was to improve understanding of visual perception through computer modelling and data analysis.
Investigated human and insect vision in collaboration with visual and computational neuroscientists.
Used statistical and machine learning tools, such as neural networks, to produce “virtual animals” that learned to interpret simple scenes within a synthetic ecology.
Demonstrated likely evolutionary origins of optical illusions.
Queen Mary, University of London 2004-2006
Part-time distance learning tutor
Responsible for the online supervision of undergraduate students, including marking coursework and exams.
University College, London, Department of Computer Science 2001-2006
Senior Research Fellow
Worked with pharmaceutical researchers to develop tools for extracting information from research papers. I developed software (BioRAT) designed to locate research papers on the internet and extract useful information from them to build a database.
Helped develop a machine learning algorithm to discover novel patterns of information in unstructured text using NLP.
Worked with a major pharmaceutical company to assist their drug-development programs.
Worked with medical and pharmaceutical researchers and with information architects to understand their needs.
Fraser Williams plc 1995-1997
A London software house where I was involved in designing and programming large-scale database systems, working on long-term projects for clients from both the public and private sectors.
Visited clients on-site to discuss and clarify their needs, and to provide training.
Supervised junior programmers and provided on-the-job training.
PhD Computer Science at University College London 1998-2002
My thesis title was “Intelligent Analysis of Small Data Sets for Food Design”, and concerned the development and evaluation of machine learning methods, motivated by product design work within the food industry. The aim was to model consumer preferences of food products by learning relationships from very small data sets. Areas researched included feature selection, cluster analysis, outlier detection, regression, and Bayesian belief networks.
Unilever plc sponsored this work and provided data and advice throughout. I spent 6 months at one of their research centres, which allowed me to disseminate current academic thinking within Unilever and learn more about their approaches to data analysis.
MSc Computational Intelligence (with Distinction) at Plymouth University 1997-1998
This included study of adaptive intelligent systems such as genetic algorithms and neural networks, and their application to engineering, business and financial systems. My project work investigated the use of “genetic programming” for modelling consumer laundry datasets provided by Unilever plc.
BSc (Hons.) Cognitive Science, Class 2 (ii) from Exeter University 1991-1994
This included study of artificial intelligence, neural networks, perception, cognition and linguistics, along with more general computer science and psychology modules.
Skills and Experiences
I have professional experience of several major programming languages and databases, including Python, Clojure, Matlab, Java, ElasticSearch and MongoDB, along with exposure to R, C++, VB, Prolog, SQL and PRO-IV. I have also used major libraries such as Scikit-learn, NLTK, spaCy, GATE and tools including GitHub, Jira and AWS. For much of this work, I have been a member of agile and cross-functional teams including developers, designers and end-users.
Keeping fit is an important part of my life and I enjoy running, regularly competing in 10k races. For two years, I served as the treasurer for a local tenants and residents association, helping to track expenses and plan spending on several community projects. Recently, I’ve become skilled in woodwork, making children’s toys, decorations and several small items of furniture.
A full set of my peer-reviewed publications is available online, and copies of all papers are available on request. Recent papers include:
D. Corney, D. Albakour, M. Martinez and S. Moussa (2016) “What do a Million News Articles Look Like?” in First International Workshop on Recent Trends in News Information Retrieval (NewsIR’16; co-located with ECIR 2016), Padua, Italy.
S. Schifferes, N. Newman, N. Thurman, D. Corney, A. Göker and C. Martin (2014) “Identifying and verifying news through social media,” Digital Journalism 2(3), pp. 406-418.
E. Byrne and D. Corney (2014) “Sweet FA: sentiment, swearing and soccer,” in ICMR2014 1st Workshop on Social Multimedia and Storytelling, Glasgow, UK, Apr. 2014.
L. M. Aiello, G. Petkos, C. Martin, D. Corney, S. Papadopoulos, R. Skraba, A. Göker, Y. Kompatsiaris and A. Jaimes (2013) “Sensing trending topics in Twitter,” IEEE Transactions on Multimedia. DOI: dx.doi.org/10.1109/TMM.2013.2265080