David Corney, PhD
I am a Data Scientist and Engineer with a proven track record of applying cutting-edge technologies to data-driven problems, including: extracting information from biomedical publications to aid drug discovery; collecting and analysing tweets in real time to help journalists track breaking stories; and analysing large volumes of news articles for a media monitoring tool. With many years' experience of applying machine learning and natural language processing to real-world problems in both industry and academia, I am always interested in opportunities that will allow me to apply and extend my expertise by exploring new areas.
Full Fact February 2019 - present
Lead NLP Engineer
I lead the application of machine learning and NLP techniques to help fact checkers work more effectively. This includes developing a “claim type” classifier that models whether a sentence contains a prediction, a personal belief or a quantitative claim. Quantitative claims are often easier to check, so this helps fact checkers select claims to check more quickly. I’ve also worked on a fully automated “robochecking” tool, though this remains a working prototype.
Dunnhumby April 2018 - February 2019
Senior Data Scientist
I helped develop a fine-grained sales prediction model, predicting daily sales of individual items for individual stores. My work included developing high-performance Python code capable of making millions of forecasts quickly.
Factmata January 2018 - April 2018
Lead Machine Learning/NLP Engineer
I developed a prototype “fake news” detector and ran several pilot studies with adtech companies. I was also involved in supervising more junior staff, recruitment, external collaborations and co-organising the Hyperpartisan News Detection Challenge, part of SemEval-2019.
Signal Media Ltd September 2014 - December 2017
My role was to discover, evaluate and apply the latest research in natural language processing (NLP) and machine learning to analyse and classify news articles at large scale in real time. Working in agile teams, we aimed to develop prototypes based on current research, with production-quality code, and turn them into products.
Led the development of novel AI components and proofs of concept, including an entity recognition system, a topic classification system and a horizon-scanning / sentiment analysis tool.
Managed a number of key 3-6 month projects with multinational clients, working closely with external stakeholders to understand the brief, communicating their requirements to engineering colleagues and demonstrating progress throughout the projects through reports and presentations.
Maintained and strengthened Signal’s links to the academic and research communities. Outreach included presenting work at WSDM and Search Solutions and giving regular guest lectures at City University.
Hosted, mentored and supervised 7 MSc Data Science and Machine Learning students during their projects, providing technical guidance and advice for their data science work, which formed the basis of new product features at Signal. Two students went on to study for PhDs and two joined DeepMind.
Led interactive sessions on the basics of AI and machine learning for all staff, with a view to bridging the gap between technical and non-technical teams.
Analysed and indexed regulation documents collected from several jurisdictions. This work included entity and topic recognition using machine learning, as well as several specialist modules to aid users in their search tasks. Each piece of work included either a statistical, data-driven evaluation or a user-centred evaluation stage.
Brought the benefits of rapid scalability to the components I developed for Signal’s platform by using Amazon’s cloud computing services (such as EC2 and S3), whilst controlling costs.
Technologies used include: Python, Clojure, ElasticSearch, AWS, GitHub, NLTK, spaCy and scikit-learn.
Robert Gordon University & City University London April 2012-September 2014
Senior Research Fellow
My role was to work with journalists and developers to build tools to find and organise real-time news from Twitter.
Worked closely with journalists from City University and elsewhere to understand their needs. I developed methods and algorithms to help journalists find breaking news stories from Twitter through novel trend-detection and ‘news-hound’ discovery methods.
Co-organised the SNOW workshop data challenge, where I led the evaluation of 10 international teams’ submissions to a news-detection task.
Acted as a bridge between software engineering colleagues and journalist colleagues, translating each group’s needs into terms the other could understand more easily.
Lead author of quarterly & annual reports to our key stakeholder (the European Commission), which included coordinating with partners from several organisations across Europe.
Technologies used include: Java, R, MongoDB, Twitter APIs.
Department of Computing, University of Surrey 2009-2012
My role was to develop innovative tools to analyse pictures of plant specimens from Kew Gardens to aid species identification and understand the effects of climate change.
Developed image processing software to extract botanical characteristics from images of herbarium leaf specimens stored at Kew Gardens.
Developed a machine learning system that could assign species labels to these images.
Created a proof-of-concept system which required me to rapidly develop skills in both botany and image processing.
Defined and collected a unique set of data to help develop and evaluate the system.
Technologies used include: Matlab, Java.
University of Hertfordshire 2008-2011
Part-time visiting lecturer
My responsibilities included online supervision of undergraduate honours degree students, including marking coursework and exams.
Institute of Ophthalmology, UCL 2006-2009
My role was to improve understanding of visual perception through computer modelling and data analysis.
Investigated human and insect vision in collaboration with visual and computational neuroscientists.
Used statistical and machine learning tools, such as neural networks, to produce “virtual animals” that learned to interpret simple scenes within a synthetic ecology.
Demonstrated likely evolutionary origins of optical illusions.
Queen Mary, University of London 2004-2006
Part-time distance learning tutor
Responsible for the online supervision of undergraduate honours degree students, including marking coursework and exams.
University College London, Department of Computer Science 2001-2006
Senior Research Fellow
Worked with pharmaceutical researchers and developed tools to automatically extract information from research papers. I developed software (BioRAT) designed to locate research papers on the internet and to extract useful information from them to build a database.
Helped develop a machine learning algorithm to discover novel patterns of information in unstructured text using NLP.
Worked with a major pharmaceutical company to assist their drug-development programs.
Worked with medical and pharmaceutical researchers and with information architects to understand their needs.
Regularly presented work to senior managers, including budget holders.
Technologies used include: Java, GATE.
UCL April-September 1999
Part-time research consultant
During my PhD, I was employed as a research consultant on a project bringing together retailers and academics to investigate targeted advertising for home shoppers.
My work included the evaluation of several data mining tools and an initial set of data mining studies.
Co-authored several reports and presentations to the partners.
Fraser Williams plc 1995-1997
London software house where I was involved in designing and programming large-scale database systems. These involved long-term projects for clients drawn from both the public and private sectors.
Visited clients on-site to discuss and clarify their needs, and to provide training.
Supervised junior programmers and provided on-the-job training.
Technologies used include: PRO-IV, SQL, VB.
PhD Computer Science at University College London 1998-2002
My thesis title was “Intelligent Analysis of Small Data Sets for Food Design”, and concerned the development and evaluation of machine learning methods, motivated by product design work within the food industry. The aim was to model consumer preferences of food products by learning relationships from very small data sets. Areas researched include feature selection, cluster analysis, outlier detection, regression, and Bayesian belief networks.
Unilever plc sponsored this work and provided data and advice throughout. I spent 6 months at one of their research centres, which allowed me to disseminate current academic thinking within Unilever and learn more about their approaches to data analysis.
MSc Computational Intelligence (with Distinction) at Plymouth University 1997-1998
This included study of adaptive intelligent systems such as genetic algorithms and neural networks, and their application to engineering, business and financial systems. My project work investigated the use of “genetic programming” for modelling consumer laundry datasets provided by Unilever plc.
BSc (Hons.) Cognitive Science, Class 2 (ii) from Exeter University 1991-1994
This included study of artificial intelligence, neural networks, perception, cognition and linguistics, along with more general computer science and psychology modules.
Skills and Experiences
I have professional experience of several major programming languages and databases, including Python, Clojure, Matlab, Java, ElasticSearch and MongoDB, along with exposure to R, C++, VB, Prolog, SQL and PRO-IV. I have also used major libraries such as scikit-learn, NLTK, spaCy, GATE and tools including GitHub and AWS. For much of this work, I have been a member of agile and cross-functional teams including developers, designers and end-users.
Keeping fit is an important part of my life and I enjoy running, regularly competing in 10k races. For two years, I served as the treasurer for a local tenants’ and residents’ association, helping to track expenses and plan spending on several community projects. Recently, I’ve become skilled in woodwork, making children’s toys, decorations and several small items of furniture.
A full set of my peer-reviewed publications is available online at dcorney.com/publications, and copies of all papers are available on request. Recent papers include:
D. Corney, D. Albakour, M. Martinez and S. Moussa (2016) “What do a Million News Articles Look Like?” in First International Workshop on Recent Trends in News Information Retrieval (NewsIR’16; co-located with ECIR 2016), Padua, Italy.
S. Schifferes, N. Newman, N. Thurman, D. Corney, A. Göker and C. Martin (2014) “Identifying and verifying news through social media,” Digital Journalism 2(3), pp. 406-418.
E. Byrne and D. Corney (2014) “Sweet FA: sentiment, swearing and soccer,” in ICMR2014 1st Workshop on Social Multimedia and Storytelling, Glasgow, UK, Apr. 2014.
L. M. Aiello, G. Petkos, C. Martin, D. P. A. Corney, S. Papadopoulos, R. Skraba, A. Göker, Y. Kompatsiaris and A. Jaimes (2013) “Sensing trending topics in Twitter,” IEEE Transactions on Multimedia. DOI: dx.doi.org/10.1109/TMM.2013.2265080