Dave Beckett — Resume

Location
San Francisco, California, USA
Email
dave@dajobe.org
Telephone
Google Voice: 650-450-8421 (will call screen)
Sites
Home page: www.dajobe.org
Blog: www.dajobe.org/blog/
Software: github.com/dajobe
Digital resumes
www.dajobe.org/cv/
Stack Overflow
www.linkedin.com/in/dajobe

Interests and Experience

Reliability: scaling, complexity, change
Web: technologies, software design and architecture
Data and metadata: Big Data (Hadoop full stack), NoSQL, Semantic Web and RDF, relational (SQL), semi-structured, real time/low latency, distributed
Open: standards, Open Source / Free Software development, open data

Not interested in: "web3", crypto, blockchain, NFT

Key Skills

Software development
Analysis, design and architecture for large-scale software systems
Strong skills in technical leadership, training, mentoring and communicating
Coding considering long-term portability, packaging, maintenance and support
Technical writing, documentation and presentations
Languages: C, Python, Perl, automake, autoconf, shell, flex and bison (expert 10+ years); Ansible, Hive SQL, MySQL (experienced); Chef, Java, Ruby, PHP (known)
Expert on Resource Description Framework (RDF) and Semantic Web Technology
Expert on XML, XML Namespaces, XML Infoset and web architecture
Extensive experience with Web concepts, architecture and technologies
Experience with geo and local search technologies and business.
Experience with social networking technologies and products.
Systems and reliability engineering
Capacity planning of servers, services to the multi-datacenter level
Forecasting of future capacity. Expert at spreadsheets (Google Sheets / Excel) and pivot tables.
Systems, services and distributed monitoring and observability
Deployment and configuration management.
Performance optimizing, tuning, specifying hardware requirements and working with hardware engineers to develop, test and productionize new hardware.
Problem and incident analysis, remediation and identification of longer term steps
SRE practice and processes: SLA, SLI, SLO, error budgets and executing that with development partners. Training on how SRE works.
Free Software / Open Source
Licensing, collaboration, community, policy issues.
Founder of Redland RDF, Flickcurl Flickr API projects
Standards development activity: W3C, RDF and Dublin Core
Co-author of 1 W3C Recommendation on Turtle with Sir Tim Berners-Lee, Eric Prud'hommeaux and Gavin Carothers (Feb 2014)
Editor of 3 W3C RDF Recommendations, 1 Dublin Core Recommendation
Member of W3C RDF Data Access Working Group (2004-2005)
Member of W3C RDF Core Working Group (2001-2004)
W3C representative for the University of Bristol (2002-2005) and University of Kent (2000)
Portable Network Graphics (PNG) (1995-) and the first browser implementation of it
Technologies
Hadoop stack: HDFS, Map Reduce, YARN, Hive, HBase
Operation of Linux (RedHat CentOS, Debian / Ubuntu, Gentoo), OSX
Familiar with Docker, for example as used in docker nghttp2
Configuration management: Puppet, Ansible, some Chef
Linux systems administration and network administration.
Software, Community and Professional roles
Program committee member for O'Reilly Strata conference on big data (2011-2015)
W3C Semantic Web Interest Group (2000-2015)
Debian Project Developer (2005-2022)
Co-founded planetrdf.com (2004-2020)
Co-ran W3C Semantic Web Interest Group IRC logs and community scratchpad (2000-2020)

Professional Experience

May 2016 — Feb 2023: Twitter Inc, San Francisco, California, USA
Senior Staff Site Reliability Engineer (Jun 2021 — Feb 2023)
Keeping the Data Platform at Twitter operating both on-premise and in GCP. Includes some of the largest Hadoop clusters in the world. Leading the team into improving automation, capacity planning, fixing operational problems, adding operational features and performing upgrades. Working with management on strategic and technical challenges, organization and planning. Dedicated to making our SRE team successful wherever they work.
Achievements: Optimizing and planning capacity saving many $Ms several times. Setting optimal hardware requirements and working to execute them. Safety and reliability strategy company wide. Coded automation system for removing toil of Hadoop bare metal fleet maintenance including reboots, upgrades, problem discovery, remediation and more.
Staff Site Reliability Engineer (Mar 2017 — Jun 2021)
Keeping the Hadoop clusters at Twitter (some of the largest in the world) running both on-premise and in cloud along with the rest of the Data Platform. Leading the team into improving automation, capacity planning, fixing operational problems, adding operational features and performing upgrades. Working with management on strategic and technical challenges and planning.
Achievements: Leading data platform cloud migration for both deployment and network design, in collaboration with cloud vendor (GCP).
Senior Site Reliability Engineer (May 2016 — Mar 2017)
Keeping the Hadoop clusters at Twitter (some of the largest in the world) running on-premise (bare metal). Leading the team into improving automation, capacity planning, fixing operational problems, adding operational features and performing upgrades.
Achievements: technical evaluation of multiple cloud vendors for data platform. Planning migration approaches with leadership. Providing technical input into cloud decision process.
July 2013 — May 2016: Rackspace Hosting Inc, San Francisco, California, USA
Senior Software Engineer
Building Hadoop-based big data enterprise platforms, coding in Python and doing devops with Chef and Ansible. Application coding in Map-Reduce Hadoop with HBase and Hive in Java and some Scala. Performing Hadoop day-to-day operations (HDFS, Map-Reduce, HBase, Hive, ...) including operation, deployment and debugging of job issues. Single-handedly administering and supporting multiple HDP clusters via command line and more recently with Apache Ambari. Developed Hive-based analytics over large data feeds including managing data schema mappings and data management with Airflow and some Cascading and Scalding. Tracking big data industry technology trends and developing longer term tech strategies. Learning Spark.
Achievements: Optimization of large-scale reporting Scalding jobs with custom Hive windowing functions.
June 2012 — July 2013: Turner Broadcasting Inc, San Francisco, California, USA
Senior Software Engineer
Social news and content management software in Python with Linux and Chef configuration management work.
September 2010 — May 2012: Digg Inc, San Francisco, California, USA
Lead Software Engineer (September 2010-May 2012)
Coding with Python, PHP and a little JavaScript. Working with Cassandra, Redis, Memcached, Hive, Hadoop Map-Reduce and Tornado. Developed with Gerrit code review and Git with continuous integration via Hudson. Engineering infrastructure design and architecture. Documented existing systems design and synthesized architecture. Lead on tracking and analytics stack supporting business metrics and analysis needs. Mobile device and mobile web lead fixing Digg main and mobile sites on touch and small screen devices. Lead on public web API supporting iOS app, dealing with client and server OAuth and developing new APIs. Doing whatever it takes to get the job done.
October 2005 — August 2010: Yahoo! Inc, Sunnyvale, California, USA
Principal Software Architect (Jan 2010 — Aug 2010)
Social media technology domain architect for Yahoo! Media property group: News, Sports, Finance, Entertainment globally. Providing technical leadership over multiple projects in the social media area, looking at integration with Facebook, Twitter and other networks, social engagement technology such as blogging and commenting, polls, ratings, reviews. Designing integrations and developing social technology strategy working with product, business and technology leadership. Mentoring and training other technical contributors.
Senior Software Architect (Feb 2009 — Jan 2010)
Technical leadership over multiple projects and Technical Leads using Web, Storage and Serving technologies at large scale. Designing major projects from scratch with global reach, scaling as needed, with best of breed storage and search technology. Architect of Yahoo! Local serving local event and business listings integrated with maps and geo/local search.
Software Architect (Jul 2007 — Feb 2009)
Technical leadership over multiple projects and Technical Leads using Web, Database, XML, Semantic/Natural Language and Semantic Web and other novel technologies. Designing software architectures, large scale deployments and developing the long term technical plans and visions. Participating in company-wide leading-edge technological developments and plans.
Principal/Senior Software Engineer (Oct 2005-Jul 2007)
Technical lead on projects using Web and Semantic Web technologies. Designing web APIs and implementing them in PHP and C. Moved RDF via the Redland libraries into a key technology for managing Yahoo! content and metadata.
2000 — October 2005: University of Bristol, UK
Senior Technical Researcher, technical leader, IEMSR Project (Aug 2004-Oct 2005)
Management and administration: responsibilities including project technical direction, project team management, co-leading ILRT Web Futures Group including bidding for funding.
Worked on the W3C RDF Data Access Working Group developing the SPARQL RDF query language (2004-).
Java development with Eclipse, SWT and JFace.
Senior Technical Researcher, SWAD Europe (Dec 2002-Oct 2004)
Ran development, outreach and workshops for SWAD Europe
Designed and developed the portable Redland RDF API, Raptor RDF parser and Rasqal RDF query libraries
Worked on the W3C RDF Core Working Group (WG) editing two W3C Recommendations
Participated in many RDF developer communities and activities
Built Web Search Environments (WSE) novel web crawling/metadata system
1998 — 2000: University of Kent at Canterbury, UK
Research Fellow
UK Mirror Service (UKMS): designed, implemented and operated.
Created the UKMS metadata, search, web mirroring and logging systems.
Extensive Linux and Solaris administration.
Created the premier online RDF Resource Guide (1998-present)
Operated and maintained the database-driven department web site.
1990 — 1998: University of Kent at Canterbury, UK
Computing Officer
Parallel computing with INMOS Transputers, Meiko, occam language
Supported the Parallel Computing/HPC service center for south east UK
Created and operated the Internet Parallel Computing Archive (IPCA) (1993-1998).
Participated in the Dublin Core Metadata Initiative (1995-)

Education

1987-1990, University of Bristol
BSc (Hons) Degree in Computer Science

Selected publications

Turtle - Terse RDF Triple Language, W3C Recommendation. Edited by Eric Prud'hommeaux and Gavin Carothers. Co-authored with Sir Tim Berners-Lee, Eric Prud'hommeaux and Gavin Carothers, 25 February 2014
SPARQL Query Results XML Format, Sandro Hawke (second edition editor), Dave Beckett and Jeen Broekstra (editors), W3C Recommendation, 21 March 2013.
Semantics Through the Tag paper (slides) presented at XTech 2006, Amsterdam 19 May 2006.
RDF/XML Syntax Specification (Revised), Dave Beckett (editor), W3C Recommendation, 10 February 2004
RDF Test Cases, Jan Grant and Dave Beckett (editors), W3C Recommendation, 10 February 2004
SWAD Europe deliverable report on Workshop on Semantic Web Storage and Retrieval, held 13-14 November 2003 at Vrije Universiteit, Amsterdam. 12 January 2004

Selected presentations and events

How @TwitterHadoop Chose Google Cloud presentation and video of presentation at Google Cloud Next 2018 with Derek Lyon. Videoed with Derek for promotion that appears at http://cloud.google.com/twitter
Moving the Twitter Hadoop Elephant Partly on Clouds: accepted presentation on the SRE work for the above at Cloud Next 2020, which was cancelled.
Screencast video: Command Line Semantic Web with Redland presented at the Semantic Web Austin Meetup during SXSW, 15 March 2010.
Open Source Semantic Web, Semantic Technology Conference 2009 open source Code Camp, 14 June 2009.
Invited keynote panel speaker, Semantic Technology Conference, San Jose, May 2007
Redland, Raptor and Rasqal - Open Source RDF in C, Perl, Python, PHP, Ruby, Tcl, Java and C#, invited talk at XMLOpen, Cambridge, 21-23 September 2004
Invited participant to speak on the semantic web at the Rueschlikon conference on information policy in the New Economy, organised by the John F. Kennedy School of Government, Harvard University, sponsored by The Rueschlikon Centre for Global Dialogue, Switzerland, 19-21 June 2003
Semantic Web Technologies for UK HE and FE Institutions (session details), Invited lecture given at Institutional Web Management Workshop 2003, University of Kent, Canterbury, 12 June 2003
Semantic Web Today, invited lecture in Electronic Commerce and New Media series, Department of Information Systems, Vienna University of Economics and Business Administration, Austria, 21 May 2001.