Senior Research Software Engineer, University of East Anglia, 04-2017 to current
Various short-term contracts, e.g. assisting with writing research proposals; currently upgrading the Network Rail ETL system to use Prefect, ActiveMQ, and Apache Spark.
Was the senior researcher on an RSSB feasibility study investigating how machine learning, expert systems, and graph databases could be used to improve the management of train delays. Primary duties included requirements analysis and the design and prototype development of the AIEDSS system, which predicts train performance and reactionary train delays, then devises action plans for train controllers. This included designing and developing an ETL pipeline for getting data from Network Rail.
Also TA / lecturer for various modules: Artificial Intelligence and Machine Learning, Research Techniques, and Django-based web development.
- the AIEDSS system architecture
- the framework for running machine learning experiments
- an ETL pipeline for processing Network Rail datasets
- an Apache Hive / Drill / Tez / HDFS Hadoop cluster
- a Neo4j database that stores delay propagation graphs
- a novel data visualisation tool for analysing the delay graphs
- the research into models and ensembles for best predicting train running times
Independent Consultant, 08-2013 to current
Contractor / consultant for a number of websites; work involved front-end and back-end development as well as security assessment and systems administration.
Jnction: lead for a Node.js expert system subsystem/microservice for their CEIS software that will assist train companies (TOCs) in managing disruptions to rail service. Also designed/developed a microservice for TOC rail blockage and station disruption plans.
World Energy & Meteorology Council: led R&D project to develop a prototype of a news article analysis and classification system that uses NLP for their TEAL climate change awareness and environmental data visualisation tool.
Advised & assisted with setting up AWS infrastructure that fulfills WEMC's High Performance Computing needs and hosts their Timescale/PostgreSQL time series database. Also designed/developed the FastAPI-based API to their Timescale database for TEAL.
- Developed the heavily customized Django-based administrative backend that would enable opticians to easily manage their wares, sales, etc. in the startup's reseller system
M.Sc. Student (Knowledge Discovery and Data Mining), UEA, 10-2016 to 09-2017
MSc with Distinction - dissertation won the department's 'Best Dissertation' prize.
Projects included research into using machine learning to evaluate the credibility of websites and the information they provide, as well as an information visualization system for analyzing the network graph of reviews, reviewers, and products for sites like Amazon.
Specific research interests include machine learning ensembles, natural language processing (NLP), authorship analysis, and automated assessments of website credibility and fake news.
Senior Research Engineer, Proxama, 03-2014 to 10-2015
Responsible for researching, analyzing, and developing new technologies for use by Proxama, such as recommendation systems, iBeacons, and HCE for contactless mobile payments.
- Developed a roadmap for Proxama's future efforts to streamline and enhance the data science aspects of its beacon network and the TapPoint marketing campaign / rewards system
- Responsible for design of a machine learning based recommendation system for analyzing consumers' financial transactions in order to increase effectiveness of marketing efforts through TapPoint
- Development and maintenance of the Django-based servers of a prototype EMV and HCE compatible system (i.e. contactless mobile payment) akin to Apple Pay
- Team lead for a small team that developed the Certificate Authority and Web Services components of a prototype J2EE (JBoss and Glassfish) system based on the Django based prototype
- Development of Python scripts for statistical analysis of beacon related data and technical reports along with a Django app and Python library for integrating segment.com with Proxama's systems
- Researched iBeacon / Eddystone technology as part of effort towards evaluating beacon hardware from manufacturers - e.g. developed firmware for a "secure" iBeacon to prevent network sniffing
- Responsible for the creation of iOS and Android beacon related test and research applications
- Developed library for Android applications, based on beacon research, to emulate and improve on iOS's beacon proximity status
Head of IT Department, Microcinema, 01-2002 to 04-2014
Responsible for all IT matters such as system administration, website development, and IT strategy - the position was labor-on-demand as Microcinema's main focus was on reselling obscure and art house type films (DVDs) in bulk to wholesalers and large educational institutions.
- Used Django as the basis for the new websites and backend application for managing the product database - notable work included designing a multi-site / multi-database Django installation so both of Microcinema's websites could run off one master product database
- Designed a Video on Demand system based on Amazon Web Services
- Duties also included all system administration of the servers, including Apache, Exim, Courier, Dovecot, MySQL, etc. -- migrated server and website across several hosting companies
- Designed / developed a web based accounting system for tracking sales and royalties
- Integration of Microcinema's systems with other companies such as e-commerce platforms (Netsuite.com, Mals-e.com, Cybersource, SagePay, Paypal) and USPS
Founding Engineer and Head of IT, Linescape, 03-2008 to 04-2013
Founding engineer / system architect for Linescape/Tarisoga, an aggregator of ocean shipping schedules much like Sabre is for the airline industry, or Expedia - also responsible for managing the part time contractors.
- System architect and lead developer for all the backend IT systems, which include the following -
- Database layer, made up of MySQL and the Neo4j graph database which holds the schedule data
- Web Services API (REST) on top of Neo4j that utilizes OAuth
- Selenium based web scraping system for processing 300+ carrier websites and their online schedules
- Data processing pipelines which clean and transform EDI, XML, HTML, text schedules to Tarisoga's XML
- Amazon RDS based system for creating large search result sets for customers
- Datafeed system for delivering weekly customized search results to customers
- Designed and developed the portal for advertisers to manage their Linescape website accounts
- Designed and developed an online marketplace for shippers to request and respond to freight rate requests
- Duties also included all system administration of the Debian-based servers, including Nginx, Lighttpd, Exim, Courier, etc.
Lead Developer, Ventures, Etc, 01-2004 to 10-2007
Full time contractor hired to design and develop the J2EE backend for Xpressbet.com (an online racetrack gambling site) that handles the wagering process and integrates with other companies' systems.
- Replaced the previous application, which had been designed for C++ and jammed into Weblogic, with a properly designed J2EE architecture
- SLA increased from 82% to 99.9%, maximum concurrent users increased from 2200 to beyond 4500
- Revenue increased 22% due to increase in website's stability (a million dollars a weekend sometimes)
- Was rehired to port, redesign, and enhance the back end when Xpressbet moved to using JBoss
- Installed and administered the Weblogic servers; duties included packaging and deployment of the application as well as development
- Developed Web Services for integrating Weblogic server with .NET system of partner company
- Wrote Ant based and UNIX scripts for automating daily tasks as well as for testing
- Wrote test plans and directed the Quality Assurance phase of the project
- Assisted with maintenance and enhancements of the PHP based front end of Xpressbet.com
Senior QA Engineer, Cohera, 12-1999 to 04-2002
Senior developer in the QA department responsible for designing and developing automated test systems for junior employees to use in their duties.
- Designed / developed a scriptable Java based multithreaded automated test harness (based on JUnit) for Cohera’s web scraping software that used multiple web servers and DBMSes
- Designed and developed a Java based automated test harness for testing the various components of Cohera’s J2EE n-tier catalog management and content integration system
- Developed a library of SilkTest and SQA Robot test routines for the Weblogic based Catalog Management System to alleviate the QA department’s workload of creating test scripts
- Maintained and extended the existing Java based multithreaded automated test system that tested the Cohera Hub, a front end for integrating disparate DBMSs into a large distributed database system
- Developed Perl scripts for easily installing and updating standardized test databases
- Led integration of Cohera's UNIX based QA system into Peoplesoft's Windows development environment after Peoplesoft bought Cohera
Senior Developer, Science Applications International Corporation, 03-1994 to 12-1999
Developer for a large contracting company that mostly focused on the US defence industry - was system architect and lead developer for several projects.
- Technical lead for the Interim Tactical Orderwire System, a multithreaded client/server C++ text and voice orderwire system used in remote satellite terminals for communications management purposes. Some notable achievements on this project included the following -
- Saved a contract that was at risk of being rejected by working closely with a frustrated customer
- Developed an automated testing system for exercising the GUI, network communications layer, and the customized memory management subsystem
- Designed and implemented a reliable multicast satellite communications protocol (as TCP/IP was not usable), along with a protocol for properly replicating the server’s database at the client sites
- Wrote unit and integration test plans; wrote the software design documents and user manual
- Received several out-of-cycle raises, bonuses, and a promotion for outstanding performance and productivity. In one performance review, my supervisor stated "I consider Doug one of the top two software developers in my group and one of the top five in the Operation."
- Participated in several projects that assessed the security of customers' computer systems; wrote automated scripts for those purposes and acquired a working knowledge of firewalls, voice mail hacking, Internet and website security, and various security packages
- Software architect for a multithreaded SAIC war dialer that has advanced features such as scanning using multiple modems, a remote control capability, and an automated “intelligent” break-in capability
- Created SNMP software that enabled a customer to manage the performance and security of dial-in access devices on their company's intranet
- Participated in the design and development of a Nexpert Object-based expert system for use in a device that is capable of exercising intelligent control of jamming resources to minimize ‘communications fratricide’
- Technical lead for the Demand Assigned Multiple Access satellite communications C++ system which allowed users to dynamically create and modify satellite communication links between network terminals. Some notable achievements include the following -
- Designed the C++ library of satellite modem drivers, database module, the network protocol, and the database schemas
- Responsible for quality control / integration of co-workers’ modules which were written in C
Lead Developer, Pragmatics, Inc., 06-1992 to 03-1994
System architect and lead developer for one component / box of two satellite networking projects for the US Department of Defense.
- Technical lead in the design, development, and testing for the Object Oriented Store and Forward Message Processing System (SFP). The SFP is a hub and router for messages from other components of the Secure Survivable Communications Network (SSCN), a distributed satellite communication network
- Designed a networking algorithm to prevent duplicate messages and to prevent messages from flooding the subnetworks connected to the SFP
- Rewrote low level hardware interrupt routines of the SFP’s OS for greater efficiency
- Designed and wrote a C++ library for controlling satellite modems and an OO database for other team members working on a specialized demand assigned bandwidth communications network
- Developed an OO database as well as the database schema for DABS, a specialized demand assigned bandwidth packet communications network that allows users to dynamically configure connections
- Generated system and unit test plans, software design documents, functional requirements, and interface specifications for these projects based on the DoD 2167A standard
- Responsible for the white box testing of the GUI and business logic modules of DABS as well as the unit and integration testing of the SFP
|Platforms|UNIX, OS X, AWS, iOS, Android|
|Libraries|Leaflet, Cytoscape, Highcharts, D3, jQuery, Apache POI, xlswriter, scikit-learn, numpy, pandas, nltk, skplot, TPLOT, Empathy, Stanford NLP suite|
|Software|Tigergraph, Neo4j, Hive, Drill, Spark, RStudio, Jupyter / IPython, LaTeX, MySQL, Git, Node.js, ActiveMQ, NetBeans, WEKA|
|Frameworks|Prefect, FastAPI, Express.js, Plotly, Dash, Dask, Hadoop, HDFS, Django, MLFlow, Wordpress|
|Technologies|Hadoop, J2EE, EJB, Web Services, REST, iBeacon, SOAP, AJAX|
|Sysadmin|Debian/Ubuntu, AWS, Nginx, Apache, Lighttpd, SpamAssassin, ASSP, Exim, Postfix, Fabric, Dovecot, Courier, CPanel / WHM|
Railcron is a Prefect-based application for fetching and processing the daily data Network Rail offers (e.g. train schedules, live DARWIN data feed) as part of its Open Data initiative.
DelayExplorer is a Python-based Dash/React system for visualizing how the reactionary delays in the UK rail network cascade (based on Network Rail's Historic Delay Attribution data).
graph database, data science, Neo4j, Dash
Revuze is a data visualization system for investigating review data networks (the review, reviewer, and the reviewed item). Initially developed for use in my dissertation, but will be open sourced when finished.
machine learning, data science, Highcharts, Neo4j, D3
Sorted is a bookmark categorizer, based on machine learning research into analyzing and categorizing web pages. Currently researching the machine learning aspects and designing the basic system.
machine learning, data science, classification, NumPy, scikit-learn, statistical learning
Research into developing custom heterogeneous machine learning ensembles to improve the detection of fake reviews (spam reviews, opinion spam) on sites such as Amazon and TripAdvisor. Cognitive linguistics and natural language processing (NLP) are major aspects.
DelayExplorer Proof Of Concept
A technical report about DelayExplorer, its functionality (versions 1 and 2), and the custom graph layouts developed for better visualizations of how train reactionary delays cascade.
The Data Science of Good Writing
A series investigating the stylometry of different authors to determine what makes writing 'good'.
Automated Assessment of Website Credibility
A literature review of research into credibility and how it applies to evaluating websites.
Ensuring Veracity in Heterogeneous Data Mining
A review of the problems and solutions related to veracity in multiple heterogeneous data sets.
Improving Management of Forest Cover
A research study into the use of decision trees and clustering for the management of forests.
M.Sc. Knowledge Discovery and Data Mining, University of East Anglia, 2016 to 2017
Dissertation is on improving fake review detection through linguistics, NLP, and machine learning ensembles.
M.Sc. Informatics and Computer Science, University of Edinburgh, 1998 to 1999
The Informatics course focused on the practical aspects of software engineering, not Comp. Sci. theory. Thesis was “MINT: A Toolbox for the Design and Simulation of Multistage Interconnection Networks”
B.Sc. Computer Engineering, Virginia Polytechnic Institute and State University, 1988 to 1992
The courses covered both hardware design (E.E.) and software (C.S.). Minors in C.S. and psychology.
Systems Engineering Certificate Program, George Washington University, 1998 to 1998
The certificate program covered the fundamentals of systems engineering.
Diploma, Writing for Film and Television, Vancouver Film School, 2002 to 2003
VFS is a trade school, so the courses were in the craft of filmmaking and included making several student films. In semester 2, I switched to the writing department.