Douglas Fraser

Systems research and development consultant focusing on AI, machine learning, NLP, and graph databases. Extensive experience in software and system development in various business sectors and industries such as IoT, ocean shipping, rail transport, e-commerce and the web.

employment

Senior Research Software Engineer, University of East Anglia

04-2017 — current

Various short-term contracts, e.g. assisting with writing research proposals; currently upgrading the Network Rail ETL system to use Prefect, ActiveMQ, and Apache Spark.

Was the senior researcher on an RSSB feasibility study investigating how machine learning, expert systems, and graph databases could be used to improve the management of train delays. Primary duties included requirements analysis and the design and prototype development of the AIEDSS system, which predicts train performance and reactionary train delays and then devises action plans for train controllers. This included designing and developing an ETL pipeline for ingesting data from Network Rail.

Also TA / lecturer for various modules: Artificial Intelligence and Machine Learning, Research Techniques, and Django-based web development.

  • the AIEDSS system architecture
  • the framework for running machine learning experiments
  • an ETL pipeline for processing Network Rail datasets
  • an Apache Hive / Drill / Tez / HDFS Hadoop cluster
  • a Neo4j database that stores delay propagation graphs
  • a novel data visualisation tool for analysing the delay graphs
  • the research into models and ensembles for best predicting train running times
Python Hadoop HDFS Hive Drill Neo4j machine learning data science

Independent Consultant

08-2013 — current

Contractor / consultant for a number of websites; work involved front-end and back-end development as well as security assessment and systems administration.

Jnction: lead for a Node.js expert system subsystem/microservice for their CEIS software, which will assist train operating companies (TOCs) in managing disruptions to rail service. Also designed/developed a microservice for TOC rail blockage and station disruption plans.

World Energy & Meteorology Council: led an R&D project to develop a prototype news article analysis and classification system that uses NLP, for their TEAL climate change awareness and environmental data visualisation tool.

Advised on and assisted with setting up the AWS infrastructure that fulfills WEMC's high performance computing needs, as well as their Timescale/PostgreSQL time series database. Also designed/developed the FastAPI-based API to their Timescale database for TEAL.

  • Developed the heavily customized Django-based administrative backend that enables opticians to easily manage their wares, sales, etc. in a startup's reseller system
Python Django Mezzanine

M.Sc. Student (Knowledge Discovery and Data Mining), UEA

10-2016 — 09-2017

MSc with Distinction; the dissertation won the department's 'Best Dissertation' prize.


Projects included research into using machine learning to evaluate the credibility of websites and the information they provide, as well as an information visualization system for analyzing the network graph of reviews, reviewers, and products on sites like Amazon.


Specific research interests include machine learning ensembles, natural language processing (NLP), authorship analysis, and automated assessments of website credibility and fake news.

Senior Research Engineer, Proxama

03-2014 — 10-2015

Responsible for researching, analyzing, and developing new technologies for use by Proxama, such as recommendation systems, iBeacons, and HCE for contactless mobile payments.

  • Developed a roadmap for Proxama's future efforts to streamline and enhance the data science aspects of its beacon network and the TapPoint marketing campaign / rewards system
  • Responsible for design of a machine learning based recommendation system for analyzing consumers' financial transactions in order to increase effectiveness of marketing efforts through TapPoint
  • Development and maintenance of Django-based servers for a prototype EMV- and HCE-compatible system (i.e. contactless mobile payment) akin to Apple Pay
  • Team lead for a small team that developed the Certificate Authority and Web Services components of a prototype J2EE (JBoss and Glassfish) system based on the Django-based prototype
  • Development of Python scripts for statistical analysis of beacon-related data and for technical reports, along with a Django app and Python library for integrating segment.com with Proxama's systems
  • Researched iBeacon / Eddystone technology as part of effort towards evaluating beacon hardware from manufacturers - e.g. developed firmware for a "secure" iBeacon to prevent network sniffing
  • Responsible for the creation of iOS and Android beacon related test and research applications
  • Developed a library for Android applications, based on beacon research, that emulates and improves on iOS's beacon proximity status
Django iBeacon Eddystone AWS HCE mobile payments iOS Android Python J2EE JBoss Glassfish data science

Head of IT Department, Microcinema

01-2002 — 04-2014

Responsible for all IT matters such as system administration, website development, and IT strategy. The position was labor-on-demand, as Microcinema's main business was reselling obscure and art house films (DVDs) in bulk to wholesalers and large educational institutions.

The position also included managing the IT for Independent Exposure (www.independentexposure.com), a short film festival and curated archive that Microcinema ran for 13 years.

  • Used Django as the basis for the new websites and backend application for managing the product database - notable work included designing a multi-site / multi-database Django installation so both of Microcinema's websites could run off one master product database
  • Designed a Video on Demand system based on Amazon Web Services
  • Duties also included all system administration of the servers, including Apache, Exim, Courier, Dovecot, MySQL, etc.; migrated the server and website across several hosting companies
  • Designed / developed a web based accounting system for tracking sales and royalties
  • Integration of Microcinema's systems with other companies such as e-commerce platforms (Netsuite.com, Mals-e.com, Cybersource, SagePay, Paypal) and USPS
Django Python DVD film film festivals Netsuite Apache MySQL PHP Paypal Cybersource Sagepay

Founding Engineer and Head of IT, Linescape

03-2008 — 04-2013

Founding engineer / system architect for Linescape/Tarisoga, an aggregator of ocean shipping schedules much like Sabre or Expedia are for the airline industry; also responsible for managing the part-time contractors.

  • System architect and lead developer for all the backend IT systems, which include the following -
    • Database layer, made up of MySQL and the Neo4j graph database which holds the schedule data
    • Web Services API (REST) on top of Neo4j that utilizes OAuth
    • Selenium based web scraping system for processing 300+ carrier websites and their online schedules
    • Data processing pipelines which clean and transform EDI, XML, HTML, text schedules to Tarisoga's XML
    • Amazon RDS based system for creating large search result sets for customers
    • Datafeed system for delivering weekly customized search results to customers
  • Supervised the development of Tarisoga's public Symfony-based website (linescape.com), an Expedia-like JavaScript single-page application for searching the schedule database
  • Designed and developed the portal for advertisers to manage their Linescape website accounts
  • Designed and developed an online marketplace for shippers to request and respond to freight rate requests
  • Duties also included all system administration of the Debian-based servers, including Nginx, Lighttpd, Exim, Courier, etc.
ocean shipping Debian Neo4j Javascript Apache Nginx Exim Courier PHP Python Symfony MySQL Percona Expedia UNIX shell scripts AWS XSLT XML Selenium

Lead Developer, Ventures, Etc

01-2004 — 10-2007

Full-time contractor hired to design and develop the J2EE backend for Xpressbet.com (an online racetrack gambling site) that handles the wagering process and integrates with other companies' systems.

  • Replaced the previous application, which had been designed as C++ code and jammed into Weblogic, with a properly designed J2EE architecture
  • SLA increased from 82% to 99.9%, maximum concurrent users increased from 2200 to beyond 4500
  • Revenue increased 22% due to the website's improved stability (sometimes a million dollars a weekend)
  • Was rehired to port, redesign, and enhance the back end when Xpressbet moved to using JBoss
  • Installed and administered the Weblogic servers; duties included packaging and deployment of the application as well as development
  • Developed Web Services for integrating Weblogic server with .NET system of partner company
  • Wrote Ant based and UNIX scripts for automating daily tasks as well as for testing
  • Wrote test plans and directed the Quality Assurance phase of the project
  • Assisted with maintenance and enhancements of the PHP based front end of Xpressbet.com
Java J2EE Weblogic JBoss EJB Web Services

Senior QA Engineer, Cohera

12-1999 — 04-2002

Senior developer in the QA department responsible for designing and developing automated test systems for junior employees to use in their duties.

  • Designed / developed a scriptable Java based multithreaded automated test harness (based on JUnit) for Cohera’s web scraping software that used multiple web servers and DBMSes
  • Designed and developed a Java based automated test harness for testing the various components of Cohera’s J2EE n-tier catalog management and content integration system
  • Developed a library of SilkTest and SQA Robot test routines for the Weblogic based Catalog Management System to alleviate the QA department’s workload of creating test scripts
  • Maintained and extended the existing Java based multithreaded automated test system that tested the Cohera Hub, a front end for integrating disparate DBMSs into a large distributed database system
  • Developed Perl scripts for easily installing and updating standardized test databases
  • Led integration of Cohera's UNIX based QA system into Peoplesoft's Windows development environment after Peoplesoft bought Cohera
Java J2EE automated testing QA JUnit multithreaded programming Weblogic MySQL Oracle Postgresql web scraping

Senior Developer, Science Applications International Corporation

03-1994 — 12-1999

Developer for a large contracting company that mostly focused on the US defence industry; was system architect and lead developer for several projects.

  • Technical lead for the Interim Tactical Orderwire System, a multithreaded client/server C++ text and voice orderwire system used in remote satellite terminals for communications management purposes. Some notable achievements on this project included the following -
    • Saved a contract that was at risk of being rejected by working closely with a frustrated customer
    • Developed an automated testing system for exercising the GUI, network communications layer, and the customized memory management subsystem
    • Designed and implemented a reliable multicast satellite communications protocol (TCP/IP was not usable), along with a protocol for properly replicating the server's database at the client sites
    • Wrote unit and integration test plans; wrote the software design documents and user manual
  • Received several out-of-cycle raises, bonuses, and a promotion for outstanding performance and productivity. In one performance review, my supervisor stated "I consider Doug one of the top two software developers in my group and one of the top five in the Operation."
  • Participated in several projects that assessed the security of customers' computer systems; wrote automated scripts for those purposes and acquired a working knowledge of firewalls, voice mail hacking, Internet and website security, and various security packages
  • Software architect for a multithreaded SAIC war dialer with advanced features such as scanning using multiple modems, a remote control capability, and an automated "intelligent" break-in capability
  • Created SNMP software that enabled a customer to manage the performance and security of dial-in access devices on their company's intranet
  • Participated in the design and development of a Nexpert Object-based expert system for use in a device that is capable of exercising intelligent control of jamming resources to minimize ‘communications fratricide’
  • Technical lead for the Demand Assigned Multiple Access satellite communications C++ system which allowed users to dynamically create and modify satellite communication links between network terminals. Some notable achievements include the following -
    • Designed the C++ library of satellite modem drivers, the database module, the network protocol, and the database schemas
    • Responsible for quality control / integration of co-workers' modules, which were written in C
Pascal C C++ satellite networks TCP/IP IT security assessments OOP wardialing

Lead Developer, Pragmatics, Inc.

06-1992 — 03-1994

System architect and lead developer for one component / box of two satellite networking projects for the US Department of Defense.

  • Technical lead in the design, development, and testing for the Object Oriented Store and Forward Message Processing System (SFP). The SFP is a hub and router for messages from other components of the Secure Survivable Communications Network (SSCN), a distributed satellite communication network
  • Designed a networking algorithm to prevent duplicate messages and to prevent messages from flooding the subnetworks connected to the SFP
  • Rewrote low level hardware interrupt routines of the SFP’s OS for greater efficiency
  • Designed and wrote a C++ library for controlling satellite modems and an OO database for other team members working on a specialized demand assigned bandwidth communications network
  • Developed an OO database as well as the database schema for DABS, a specialized demand assigned bandwidth packet communications network that allows users to dynamically configure connections
  • Generated system and unit test plans, software design documents, functional requirements, and interface specifications for these projects based on the DoD 2167A standard
  • Responsible for the white box testing of the GUI and business logic modules of DABS as well as the unit and integration testing of the SFP
OOP satellite networks Pascal C++
Additional employment details available on request

skills

Skill Keywords
Platforms UNIX, OS X, AWS, iOS, Android
Languages Python, Julia, Java, Awk, Javascript, shell scripts, SQL, PHP, XSLT, XHTML, CSS
Libraries Leaflet, Cytoscape, Highcharts, D3, jQuery, Apache POI, xlswriter, scikit-learn, numpy, pandas, nltk, skplot, TPLOT, Empathy, Stanford NLP suite
Software Tigergraph, Neo4j, Hive, Drill, Spark, RStudio, Jupyter / IPython, LaTeX, MySQL, Git, Node.js, ActiveMQ, NetBeans, WEKA
Frameworks Prefect, FastAPI, Express.js, Plotly, Dash, Dask, Hadoop, HDFS, Django, MLFlow, Wordpress
Technologies Hadoop, J2EE, EJB, Web Services, REST, iBeacon, SOAP, AJAX
Sysadmin Debian/Ubuntu, AWS, Nginx, Apache, Lighttpd, SpamAssassin, ASSP, Exim, Postfix, Fabric, Dovecot, Courier, CPanel / WHM

projects

Railcron

Railcron is a Prefect based application for fetching and processing the daily data Network Rail offers (e.g. train schedules, live DARWIN data feed) as part of its Open Data initiative.

DelayExplorer

DelayExplorer is a Python based Dash/React system for visualizing how the reactionary delays in the UK rail network cascade (based on Network Rail's Historic Delay Attribution data).

graph database, data science, Neo4j, Dash

Revuze

Revuze is a data visualization system for investigating review data networks (the review, the reviewer, and the reviewed item). Initially developed for use in my dissertation; it will be open-sourced when finished.

machine learning, data science, Highcharts, Neo4j, D3

Sorted

Sorted is a bookmark categorizer based on machine learning research into analyzing and categorizing web pages. Currently researching the machine learning aspects and designing the basic system.

machine learning, data science, classification, NumPy, scikit-learn, statistical learning

Webcv

Webcv is a Django-based CMS for putting a resume/CV on the web using the FRESCA resume standard. Currently in design and development.

Django, Python, node.js, hackmyresume, FRESCA, JSResume

papers

Developing Feature And Decision Level Ensembles For Classifying Fake Reviews

Research into developing custom heterogeneous machine learning ensembles to improve the detection of fake reviews (spam reviews, opinion spam) on sites such as Amazon and TripAdvisor. Cognitive linguistics and natural language processing (NLP) are major aspects.

DelayExplorer Proof Of Concept

A technical report about DelayExplorer, its functionality (versions 1 and 2), and the custom graph layouts developed for better visualizations of how train reactionary delays cascade.

The Data Science of Good Writing

A series investigating the stylometry of different authors to determine what makes writing 'good'.

Automated Assessment of Website Credibility

A literature review of research into credibility and how it applies to evaluating websites.

Ensuring Veracity in Heterogeneous Data Mining

A review of the problems and solutions related to veracity in multiple heterogeneous data sets.

Improving Management of Forest Cover

A research study into the use of decision trees and clustering for the management of forests.

education

M.Sc. Knowledge Discovery and Data Mining, University of East Anglia

2016 — 2017

Dissertation is on improving fake review detection through linguistics, NLP, and machine learning ensembles.

M.Sc. Informatics and Computer Science, University of Edinburgh

1998 — 1999

The Informatics course focused on the practical aspects of software engineering rather than computer science theory.

Thesis was “MINT: A Toolbox for the Design and Simulation of Multistage Interconnection Networks”

B.Sc. Computer Engineering, Virginia Polytechnic Institute and State University

1988 — 1992

The courses covered both hardware design (E.E.) and software (C.S.). Minors in C.S. and psychology.

Systems Engineering Certificate Program, George Washington University

1998 — 1998

The certificate program covered the fundamentals of systems engineering.

Summer Professional Program: Embodied Intelligence, MIT

1997 — 1997

The summer course covered the basics of artificial intelligence and robotics.

Diploma, Writing for Film and Television, Vancouver Film School

2002 — 2003

VFS is a trade school, so the courses covered the craft of filmmaking and included making several student films. In the second semester, I switched to the writing department.

Professional Certificate, Stanford Online

2016-01 — Present

A series of courses in Machine Learning and Data Science topics

statistical learning, machine learning