David Corney

CV

David Corney, PhD

Profile

I am a Data Scientist and Engineer with a proven track record in applying cutting-edge technologies to solve data-driven problems, including: extracting information from biomedical publications to aid drug discovery; collecting and analysing tweets in real-time to help journalists track breaking stories; and analysing large volumes of news articles for a media monitoring tool. With many years’ experience of applying machine learning and natural language processing to real-world problems in both industry and academia, I am always interested in new opportunities in organisations that will allow me to apply and extend my expertise by exploring new areas.

Professional Experience

Full Fact    February 2019 - present

Senior Data Scientist / Lead NLP Engineer

I lead the application of machine learning and NLP techniques to help fact checkers work more effectively. This includes a “checkworthy claim” tool that finds claims that may be harmful or misleading, helping fact checkers save time on media monitoring. We also developed a claim matching tool that finds repeats of semantically equivalent claims in the media. We recently extended these tools to work in multiple languages, including French and Arabic. I lead a small team of data scientists and regularly collaborate with external partners.

Dunnhumby    April 2018 - February 2019

Senior Data Scientist

I helped develop a fine-grained sales prediction model, predicting daily sales of individual items for individual stores. My work included developing high-performance Python code capable of making millions of forecasts quickly.

Factmata    January 2018 - April 2018

Lead Machine Learning/NLP Engineer

I developed a prototype “fake news” detector and ran several pilot studies with adtech companies. I was also involved in supervising more junior staff, recruitment, external collaborations and co-organizing the Hyperpartisan News Detection Challenge, part of SemEval-2019.

Signal Media Ltd    September 2014 - December 2017

Data Scientist

My role was to discover, evaluate and apply the latest research in natural language processing (NLP) and machine learning to analyse and classify news articles at a large scale in real-time. Working in agile teams, our aim was to develop prototypes based on current research with production-quality code and turn them into products.

  • Led the development of AI components and proofs-of-concept, including a novel entity recognition system, a topic classification system and a horizon-scanning / sentiment analysis tool.

  • Managed a number of key 3-6 month projects with multinational clients, working closely with external stakeholders to understand the brief, communicating their requirements to engineering colleagues and demonstrating progress throughout the projects through reports and presentations.

  • Maintained and strengthened Signal’s links to the academic and research communities. Outreach included presenting work at WSDM and Search Solutions and giving regular guest lectures at City University.

  • Hosted, mentored and supervised 7 MSc Data Science and Machine Learning students during projects by providing technical guidance and advice for their data science work, which formed the basis of new product features at Signal. Two students went on to study for PhDs and two joined DeepMind.

  • Led interactive sessions on the basics of AI/machine learning for all staff, with a view to bridging the gap between technical and non-technical teams.

  • Analysed and indexed regulation documents collected from several jurisdictions. This work included entity and topic recognition using machine learning as well as several specialist modules to aid users in their search tasks. Each piece of work included either a statistical data-driven evaluation or a user-centred evaluation stage.

  • Brought the benefits of rapid scalability to the components I developed for Signal’s platform by using Amazon’s cloud computing system (such as the EC2 and S3 features), whilst controlling costs.

  • Technologies used include: Python, Clojure, ElasticSearch, AWS, GitHub, NLTK, spaCy and scikit-learn.

Robert Gordon University & City University London     April 2012-September 2014

Senior Research Fellow

My role was to work with journalists and developers to build tools to find and organise real-time news from Twitter.

  • Worked closely with journalists from City University and elsewhere to understand their needs. I developed methods and algorithms to help journalists find breaking news stories from Twitter through novel trend-detection and ‘news-hound’ discovery methods.

  • Co-organized the SNOW workshop data challenge, where I led the evaluation of 10 international teams’ submissions to a news-detection task.

  • Acted as a bridge between software engineering colleagues and journalist colleagues, translating each other’s needs into terms the others could understand more easily.

  • Lead author of quarterly & annual reports to our key stakeholder (the European Commission), which included coordinating with partners from several organisations across Europe.

  • Technologies used include: Java, R, MongoDB, Twitter APIs.

Department of Computing, University of Surrey     2009-2012

Research Fellow

My role was to develop innovative tools to analyse pictures of plant specimens from Kew Gardens to aid species identification and understand the effects of climate change.

  • Developed image processing software to extract botanical characteristics from images of herbarium leaf specimens stored at Kew Gardens.

  • Developed a machine learning system that could assign species labels to these images.

  • Created a proof-of-concept system which required me to rapidly develop skills in both botany and image processing.

  • Defined and collected a unique set of data to help develop and evaluate the system.

  • Technologies used include: Matlab, Java.

University of Hertfordshire     2008-2011

Part-time visiting lecturer

My responsibilities included online supervision of undergraduate honours degree students, including marking coursework and exams.

Institute of Ophthalmology, UCL     2006-2009

Research fellow

My role was to improve understanding of visual perception through computer modelling and data analysis.

  • Investigated human and insect vision in collaboration with visual and computational neuroscientists.

  • Used statistical and machine learning tools, such as neural networks, to produce “virtual animals” that learned to interpret simple scenes within a synthetic ecology.

  • Demonstrated likely evolutionary origins of optical illusions.

Queen Mary, University of London    2004-2006

Part-time distance learning tutor

Responsible for the online supervision of undergraduate honours degree students, including marking coursework and exams.

University College London, Department of Computer Science    2001-2006

Senior Research Fellow

Worked with pharmaceutical researchers and developed tools to automatically extract information from research papers. I developed software (BioRAT) designed to locate research papers on the internet and to extract useful information from them to build a database.

  • Helped develop a machine learning algorithm to discover novel patterns of information in unstructured text using NLP.

  • Worked with a major pharmaceutical company to assist their drug-development programs.

  • Worked with medical and pharmaceutical researchers and with information architects to understand their needs.

  • Regularly presented work to senior managers, including budget holders.

  • Technologies used include: Java, GATE.

UCL     April-September 1999

Part-time research consultant

During my PhD, I was employed as a research consultant on a project bringing together retailers and academics to investigate targeted advertising for home shoppers.

  • My work included the evaluation of several data mining tools and an initial set of data mining studies.

  • Co-authored several reports and presentations to the partners.

Fraser Williams plc    1995-1997

Analyst Programmer

London software house where I was involved in designing and programming large-scale database systems. These involved long-term projects for clients drawn from both the public and private sectors.

  • Visited clients on-site to discuss and clarify their needs, and to provide training.

  • Supervised junior programmers and provided on-the-job training.

  • Technologies used include: PRO-IV, SQL, VB.

Education

PhD Computer Science at University College London    1998-2002

My thesis title was “Intelligent Analysis of Small Data Sets for Food Design”, and concerned the development and evaluation of machine learning methods, motivated by product design work within the food industry. The aim was to model consumer preferences of food products by learning relationships from very small data sets. Areas researched include feature selection, cluster analysis, outlier detection, regression, and Bayesian belief networks.

Unilever plc sponsored this work and provided data and advice throughout. I spent 6 months at one of their research centres, which allowed me to disseminate current academic thinking within Unilever and learn more about their approaches to data analysis.

MSc Computational Intelligence (with Distinction) at Plymouth University    1997-1998

This included study of adaptive intelligent systems such as genetic algorithms and neural networks, and their application to engineering, business and financial systems. My project work investigated the use of “genetic programming” for modelling consumer laundry datasets provided by Unilever plc.

BSc (Hons.) Cognitive Science, Class 2 (ii) from Exeter University     1991-1994

This included study of artificial intelligence, neural networks, perception, cognition and linguistics, along with more general computer science and psychology modules. I trained my first neural network c.1993!

Skills and Experiences

Computing Skills

I have professional experience of several major programming languages and databases, including Python, Clojure, Matlab, Java, ElasticSearch and MongoDB, along with exposure to R, C++, VB, Prolog, SQL and PRO-IV. I have also used major libraries such as scikit-learn, NLTK, spaCy, GATE and tools including GitHub and AWS. For much of this work, I have been a member of agile and cross-functional teams including developers, designers and end-users.

Hobbies

Keeping fit is an important part of my life and I enjoy running, regularly competing in 10k races. For two years, I served as the treasurer for a local tenants’ and residents’ association, helping to track expenses and plan spending on several community projects. Recently, I’ve become skilled in woodwork, making children’s toys, decorations and several small items of furniture.

Selected Publications

A full set of my peer-reviewed publications is available online at dcorney.com/publications, and copies of all papers are available on request. Some favourite papers include:

Augenstein, I., Baldwin, T., Cha, M., Chakraborty, T., Luca Ciampaglia, G., Corney, D., DiResta, R., Ferrara, E., Hale, S., Halevy, A., Hovy, E., Ji, H., Menczer, F., Miguez, R., Nakov, P., Scheufele, D., Sharma, S. and Zagni, G. (2024) “Factuality challenges in the era of large language models and opportunities for fact-checking,” Nature Machine Intelligence.

D. Corney, D. Albakour, M. Martinez and S. Moussa (2016) “What do a Million News Articles Look Like?” in First International Workshop on Recent Trends in News Information Retrieval (NewsIR’16; co-located with ECIR 2016), Padua, Italy.

S. Schifferes, N. Newman, N. Thurman, D. Corney, A. Göker, and C. Martin, (2014) “Identifying and verifying news through social media,” Digital Journalism 2(3), pp. 406-418.

E. Byrne and D. Corney (2014) “Sweet FA: sentiment, swearing and soccer,” in ICMR2014 1st Workshop on Social Multimedia and Storytelling, Glasgow, UK, Apr. 2014.

Aiello, L.M., Petkos, G., Martin, C., Corney, D.P.A., Papadopoulos, S., Skraba, R., Goker, A., Kompatsiaris, Y. and Jaimes, A. (2013) “Sensing trending topics in Twitter”, IEEE Transactions on Multimedia. DOI: dx.doi.org/10.1109/TMM.2013.2265080