<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[Break | Better]]></title><description><![CDATA[Tutorials, Side-Projects, and Data-Driven Storytelling.]]></description><link>http://blog.patricktriest.com/</link><image><url>http://blog.patricktriest.com/favicon.png</url><title>Break | Better</title><link>http://blog.patricktriest.com/</link></image><generator>Ghost 1.20</generator><lastBuildDate>Thu, 11 Jul 2024 11:52:00 GMT</lastBuildDate><atom:link href="http://blog.patricktriest.com/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[Building a Full-Text Search App Using Docker and Elasticsearch]]></title><description><![CDATA[Adding fast, flexible full-text search to apps can be a challenge.  In this tutorial, we'll walk through setting up a full-text search application using Docker, Elasticsearch, and 100 classic novels.]]></description><link>http://blog.patricktriest.com/text-search-docker-elasticsearch/</link><guid isPermaLink="false">5a63908e4305a908e18d4d8c</guid><category><![CDATA[Guides]]></category><category><![CDATA[Javascript]]></category><category><![CDATA[Node.js]]></category><category><![CDATA[Web Development]]></category><category><![CDATA[Devops]]></category><dc:creator><![CDATA[Patrick Triest]]></dc:creator><pubDate>Thu, 25 Jan 2018 18:00:00 GMT</pubDate><media:content url="https://blog-images.patricktriest.com/uploads/library.jpg" medium="image"/><content:encoded><![CDATA[<div class="kg-card-markdown"><img src="https://blog-images.patricktriest.com/uploads/library.jpg" alt="Building a Full-Text Search App Using Docker and Elasticsearch"><p><em>How does Wikipedia sort through 5+ million articles to find the most relevant one for your research?</em></p>
<p><em>How does Facebook find the friend who you're looking for (and whose name you've misspelled), across a userbase of 2+ billion people?</em></p>
<p><em>How does Google search the entire internet for webpages relevant to your vague, typo-filled search query?</em></p>
<p>In this tutorial, we'll walk through setting up our own full-text search application (of an admittedly lesser complexity than the systems in the questions above).  Our example app will provide a UI and API to search the complete texts of 100 literary classics such as <em>Peter Pan</em>, <em>Frankenstein</em>, and <em>Treasure Island</em>.</p>
<p>You can preview a completed version of the tutorial app here - <a href="https://search.patricktriest.com">https://search.patricktriest.com</a></p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/elastic-library/sample_4_0.png" alt="Building a Full-Text Search App Using Docker and Elasticsearch"></p>
<p>The source code for the application is 100% open-source and can be found at the GitHub repository here - <a href="https://github.com/triestpa/guttenberg-search">https://github.com/triestpa/guttenberg-search</a></p>
<p>Adding fast, flexible full-text search to apps can be a challenge.  Most mainstream databases, such as <a href="https://www.postgresql.org/">PostgreSQL</a> and <a href="https://www.mongodb.com/">MongoDB</a>, offer very basic text searching capabilities due to limitations on their existing query and index structures.  In order to implement high quality full-text search, a separate datastore is often the best option.  <a href="https://www.elastic.co/">Elasticsearch</a> is a leading open-source datastore that is optimized to perform incredibly flexible and fast full-text search.</p>
<p>We'll be using <a href="https://www.docker.com/">Docker</a> to set up our project environment and dependencies.  Docker is a containerization engine used by the likes of <a href="https://www.uber.com/">Uber</a>, <a href="https://www.spotify.com/us/">Spotify</a>, <a href="https://www.adp.com/">ADP</a>, and <a href="https://www.paypal.com/us/home">Paypal</a>.  A major advantage of building a containerized app is that the project setup is virtually the same on Windows, macOS, and Linux - which makes writing this tutorial quite a bit simpler for me.  Don't worry if you've never used Docker - we'll go through the full project configuration further down.</p>
<p>We'll also be using <a href="https://nodejs.org/en/">Node.js</a> (with the <a href="http://koajs.com/">Koa</a> framework) and <a href="https://vuejs.org/">Vue.js</a> to build our search API and frontend web app, respectively.</p>
<h2 id="1whatiselasticsearch">1 - What is Elasticsearch?</h2>
<p>Full-text search is a heavily requested feature in modern applications.  Search can also be one of the most difficult features to implement competently - many popular websites have subpar search functionality that returns results slowly and has trouble finding non-exact matches.  Often, this is due to limitations in the underlying database: most standard relational databases are limited to basic <code>CONTAINS</code> or <code>LIKE</code>  SQL queries, which provide only the most basic string matching functionality.</p>
<p>We'd like our search app to be:</p>
<ol>
<li><strong>Fast</strong> - Search results should be returned almost instantly, in order to provide a responsive user experience.</li>
<li><strong>Flexible</strong> - We'll want to be able to modify how the search is performed, in order to optimize for different datasets and use cases.</li>
<li><strong>Forgiving</strong> -  If a search contains a typo, we'd still like to return relevant results for what the user might have been trying to search for.</li>
<li><strong>Full-Text</strong> - We don't want to limit our search to specific matching keywords or tags - we want to search <em>everything</em> in our datastore (including large text fields) for a match.</li>
</ol>
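<p>To see why the basic string matching described above falls short of these goals, here's a quick sketch (a hypothetical illustration, not part of the tutorial code) of what a <code>LIKE</code>-style substring search amounts to in Javascript.</p>
<pre><code class="language-javascript">// Hypothetical example - roughly what a basic SQL LIKE query provides
function naiveSearch (documents, query) {
  return documents.filter(function (doc) {
    return doc.toLowerCase().includes(query.toLowerCase())
  })
}

const docs = ['Treasure Island', 'Frankenstein', 'Peter Pan']

console.log(naiveSearch(docs, 'treasure')) // [ 'Treasure Island' ]
console.log(naiveSearch(docs, 'treasuer')) // [] - one typo and we find nothing
</code></pre>
<br>
<p>An exact substring matches, but a single typo returns nothing - no fuzziness, no relevance ranking, and the scan still touches every document.</p>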
<p><img src="https://storage.googleapis.com/cdn.patricktriest.com/blog/images/posts/elastic-library/Elasticsearch-Logo.png" alt="Building a Full-Text Search App Using Docker and Elasticsearch"></p>
<p>In order to build a super-powered search feature, it’s often best to use a datastore that is optimized for the task of full-text search.  This is where <a href="https://www.elastic.co/">Elasticsearch</a> comes into play; Elasticsearch is an open-source datastore written in Java and built on the <a href="https://lucene.apache.org/core/">Apache Lucene</a> library.</p>
<p>Here are some examples of real-world Elasticsearch use cases from the official <a href="https://www.elastic.co/guide/en/elasticsearch/guide/2.x/getting-started.html">Elastic website</a>.</p>
<ul>
<li>Wikipedia uses Elasticsearch to provide full-text search with highlighted search snippets, and search-as-you-type and did-you-mean suggestions.</li>
<li>The Guardian uses Elasticsearch to combine visitor logs with social-network data to provide real-time feedback to its editors about the public’s response to new articles.</li>
<li>Stack Overflow combines full-text search with geolocation queries and uses more-like-this to find related questions and answers.</li>
<li>GitHub uses Elasticsearch to query 130 billion lines of code.</li>
</ul>
<h3 id="whatmakeselasticsearchdifferentfromanormaldatabase">What makes Elasticsearch different from a &quot;normal&quot; database?</h3>
<p>At its core, Elasticsearch is able to provide fast and flexible full-text search through the use of <em>inverted indices</em>.</p>
<p>An &quot;index&quot; is a data structure that allows for ultra-fast data query and retrieval operations in databases.  Databases generally index entries by storing an association of fields with the matching table rows.  By storing the index in a searchable data structure (often a <a href="https://en.wikipedia.org/wiki/B-tree">B-Tree</a>), databases can achieve sub-linear query time on indexed lookups (such as “Find the row with ID = 5”).</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/elastic-library/db_index.png" alt="Building a Full-Text Search App Using Docker and Elasticsearch"></p>
<p>We can think of a database index like an old-school library card catalog - it tells you precisely where the entry that you're searching for is located, as long as you already know the title and author of the book.  Database tables generally have multiple indices in order to speed up queries on specific fields (e.g. an index on the <code>name</code> column would greatly speed up queries for rows with a specific name).</p>
<p>Inverted indexes work in a substantially different manner.  The content of each row (or document) is split up, and each individual entry (in this case each word) points back to any documents that it was found within.</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/elastic-library/invertedIndex.jpg" alt="Building a Full-Text Search App Using Docker and Elasticsearch"></p>
<p>This inverted-index data structure allows us to very quickly find, say, all of the documents where “football” was mentioned.  Through the use of a heavily optimized in-memory inverted index, Elasticsearch enables us to perform some very powerful and customizable full-text searches on our stored data.</p>
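<p>The core idea can be sketched in a few lines of Javascript (an illustrative toy, nowhere near Elasticsearch's actual implementation): each term maps to the IDs of the documents that contain it, so a single lookup replaces a full scan.</p>
<pre><code class="language-javascript">// Toy inverted index - maps each word to the IDs of documents containing it
function buildInvertedIndex (documents) {
  const index = {}
  documents.forEach(function (text, docId) {
    for (const word of text.toLowerCase().split(/\W+/)) {
      if (word) {
        if (!index[word]) index[word] = []
        if (!index[word].includes(docId)) index[word].push(docId)
      }
    }
  })
  return index
}

const index = buildInvertedIndex([
  'football is popular',          // document 0
  'I watched the football game',  // document 1
  'chess is a board game'         // document 2
])

console.log(index['football']) // [ 0, 1 ]
console.log(index['game'])     // [ 1, 2 ]
</code></pre>
<br>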
<h2 id="2projectsetup">2 - Project Setup</h2>
<h3 id="20docker">2.0 - Docker</h3>
<p>We'll be using <a href="https://www.docker.com/">Docker</a> to manage the environments and dependencies for this project.  Docker is a containerization engine that allows applications to be run in isolated environments, unaffected by the host operating system and local development environment.  Many web-scale companies run a majority of their server infrastructure in containers now, due to the increased flexibility and composability of containerized application components.</p>
<p><img src="https://storage.googleapis.com/cdn.patricktriest.com/blog/images/posts/elastic-library/docker.png" alt="Building a Full-Text Search App Using Docker and Elasticsearch"></p>
<p>The advantage of using Docker for me, as the friendly author of this tutorial, is that the local environment setup is minimal and consistent across Windows, macOS, and Linux systems.  Instead of going through divergent installation instructions for Node.js, Elasticsearch, and Nginx, we can instead just define these dependencies in Docker configuration files, and then run our app anywhere using this configuration.  Furthermore, since each application component will run in its own isolated container, there is much less potential for existing junk on our local machines to interfere, so &quot;But it works on my machine!&quot; types of scenarios will be much rarer when debugging issues.</p>
<h3 id="21installdockerdockercompose">2.1 - Install Docker &amp; Docker-Compose</h3>
<p>The only dependencies for this project are <a href="https://www.docker.com/">Docker</a> and <a href="https://docs.docker.com/compose/">docker-compose</a>, the latter of which is an officially supported tool for defining multiple container configurations to <em>compose</em> into a single application stack.</p>
<p>Install Docker - <a href="https://docs.docker.com/engine/installation/">https://docs.docker.com/engine/installation/</a><br>
Install Docker Compose - <a href="https://docs.docker.com/compose/install/">https://docs.docker.com/compose/install/</a></p>
<h3 id="22setupprojectdirectories">2.2 - Setup Project Directories</h3>
<p>Create a base directory (say <code>guttenberg_search</code>) for the project.  To organize our project, we'll work within two main subdirectories.</p>
<ul>
<li><code>/public</code> - Store files for the frontend Vue.js webapp.</li>
<li><code>/server</code> - Server-side Node.js source code</li>
</ul>
<h3 id="23adddockercomposeconfig">2.3 - Add Docker-Compose Config</h3>
<p>Next, we'll create a <code>docker-compose.yml</code> file to define each container in our application stack.</p>
<ol>
<li><code>gs-api</code> - The Node.js container for the backend application logic.</li>
<li><code>gs-frontend</code> - An Nginx container for serving the frontend webapp files.</li>
<li><code>gs-search</code> - An Elasticsearch container for storing and searching data.</li>
</ol>
<pre><code class="language-yaml">version: '3'

services:
  api: # Node.js App
    container_name: gs-api
    build: .
    ports:
      - &quot;3000:3000&quot; # Expose API port
      - &quot;9229:9229&quot; # Expose Node process debug port (disable in production)
    environment: # Set ENV vars
     - NODE_ENV=local
     - ES_HOST=elasticsearch
     - PORT=3000
    volumes: # Attach local book data directory
      - ./books:/usr/src/app/books

  frontend: # Nginx Server For Frontend App
    container_name: gs-frontend
    image: nginx
    volumes: # Serve local &quot;public&quot; dir
      - ./public:/usr/share/nginx/html
    ports:
      - &quot;8080:80&quot; # Forward site to localhost:8080

  elasticsearch: # Elasticsearch Instance
    container_name: gs-search
    image: docker.elastic.co/elasticsearch/elasticsearch:6.1.1
    volumes: # Persist ES data in separate &quot;esdata&quot; volume
      - esdata:/usr/share/elasticsearch/data
    environment:
      - bootstrap.memory_lock=true
      - &quot;ES_JAVA_OPTS=-Xms512m -Xmx512m&quot;
      - discovery.type=single-node
    ports: # Expose Elasticsearch ports
      - &quot;9300:9300&quot;
      - &quot;9200:9200&quot;

volumes: # Define separate volume for Elasticsearch data
  esdata:
</code></pre>
<br>
<p>This file defines our entire application stack - no need to install Elasticsearch, Node, or Nginx on your local system.  Each container is forwarding ports to the host system (<code>localhost</code>), in order for us to access and debug the Node API, Elasticsearch instance, and frontend web app from our host machine.</p>
<h3 id="24adddockerfile">2.4 - Add Dockerfile</h3>
<p>We are using official prebuilt images for Nginx and Elasticsearch, but we'll need to build our own image for the Node.js app.</p>
<p>Define a simple <code>Dockerfile</code> configuration in the application root directory.</p>
<pre><code class="language-docker"># Use Node v8.9.0 LTS
FROM node:carbon

# Setup app working directory
WORKDIR /usr/src/app

# Copy package.json and package-lock.json
COPY package*.json ./

# Install app dependencies
RUN npm install

# Copy app source code
COPY . .

# Start app
CMD [ &quot;npm&quot;, &quot;start&quot; ]
</code></pre>
<br>
<p>This Docker configuration extends the official Node.js image, copies our application source code, and installs the NPM dependencies within the container.</p>
<p>We'll also add a <code>.dockerignore</code> file to avoid copying unneeded files into the container.</p>
<pre><code class="language-text">node_modules/
npm-debug.log
books/
public/
</code></pre>
<br>
<blockquote>
<p>Note that we're not copying the <code>node_modules</code> directory into our container - this is because we'll be running <code>npm install</code> from within the container build process.  Attempting to copy the <code>node_modules</code> from the host system into a container can cause errors since some packages need to be specifically built for certain operating systems.  For instance, installing the <code>bcrypt</code> package on macOS and attempting to copy that module directly to an Ubuntu container will not work because <code>bcrypt</code> relies on a binary that needs to be built specifically for each operating system.</p>
</blockquote>
<h3 id="25addbasefiles">2.5 - Add Base Files</h3>
<p>In order to test out the configuration, we'll need to add some placeholder files to the app directories.</p>
<p>Add this base HTML file at <code>public/index.html</code></p>
<pre><code class="language-html">&lt;html&gt;&lt;body&gt;Hello World From The Frontend Container&lt;/body&gt;&lt;/html&gt;
</code></pre>
<br>
<p>Next, add the placeholder Node.js app file at <code>server/app.js</code>.</p>
<pre><code class="language-javascript">const Koa = require('koa')
const app = new Koa()

app.use(async (ctx, next) =&gt; {
  ctx.body = 'Hello World From the Backend Container'
})

const port = process.env.PORT || 3000

app.listen(port, err =&gt; {
  if (err) console.error(err)
  console.log(`App Listening on Port ${port}`)
})
</code></pre>
<br>
<p>Finally, add our <code>package.json</code> Node app configuration.</p>
<pre><code class="language-json">{
  &quot;name&quot;: &quot;guttenberg-search&quot;,
  &quot;version&quot;: &quot;0.0.1&quot;,
  &quot;description&quot;: &quot;Source code for Elasticsearch tutorial using 100 classic open source books.&quot;,
  &quot;scripts&quot;: {
    &quot;start&quot;: &quot;node --inspect=0.0.0.0:9229 server/app.js&quot;
  },
  &quot;repository&quot;: {
    &quot;type&quot;: &quot;git&quot;,
    &quot;url&quot;: &quot;git+https://github.com/triestpa/guttenberg-search.git&quot;
  },
  &quot;author&quot;: &quot;patrick.triest@gmail.com&quot;,
  &quot;license&quot;: &quot;MIT&quot;,
  &quot;bugs&quot;: {
    &quot;url&quot;: &quot;https://github.com/triestpa/guttenberg-search/issues&quot;
  },
  &quot;homepage&quot;: &quot;https://github.com/triestpa/guttenberg-search#readme&quot;,
  &quot;dependencies&quot;: {
    &quot;elasticsearch&quot;: &quot;13.3.1&quot;,
    &quot;joi&quot;: &quot;13.0.1&quot;,
    &quot;koa&quot;: &quot;2.4.1&quot;,
    &quot;koa-joi-validate&quot;: &quot;0.5.1&quot;,
    &quot;koa-router&quot;: &quot;7.2.1&quot;
  }
}
</code></pre>
<br>
<p>This file defines the application start command and the Node.js package dependencies.</p>
<blockquote>
<p>Note - You don't have to run <code>npm install</code> - the dependencies will be installed inside the container when it is built.</p>
</blockquote>
<h3 id="26tryitout">2.6 - Try it Out</h3>
<p>Everything is in place now to test out each component of the app.  From the base directory, run <code>docker-compose build</code>, which will build our Node.js application container.</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/elastic-library/sample_0_3.png" alt="Building a Full-Text Search App Using Docker and Elasticsearch"></p>
<p>Next, run <code>docker-compose up</code> to launch our entire application stack.</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/elastic-library/sample_0_2.png" alt="Building a Full-Text Search App Using Docker and Elasticsearch"></p>
<blockquote>
<p>This step might take a few minutes since Docker has to download the base images for each container.  In subsequent runs, starting the app should be nearly instantaneous, since the required images will have already been downloaded.</p>
</blockquote>
<p>Try visiting <code>localhost:8080</code> in your browser - you should see a simple &quot;Hello World&quot; webpage.</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/elastic-library/sample_0_0.png" alt="Building a Full-Text Search App Using Docker and Elasticsearch"></p>
<p>Visit <code>localhost:3000</code> to verify that our Node server returns its own &quot;Hello World&quot; message.</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/elastic-library/sample_0_1.png" alt="Building a Full-Text Search App Using Docker and Elasticsearch"></p>
<p>Finally, visit <code>localhost:9200</code> to check that Elasticsearch is running.  It should return information similar to this.</p>
<pre><code class="language-json">{
  &quot;name&quot; : &quot;SLTcfpI&quot;,
  &quot;cluster_name&quot; : &quot;docker-cluster&quot;,
  &quot;cluster_uuid&quot; : &quot;iId8e0ZeS_mgh9ALlWQ7-w&quot;,
  &quot;version&quot; : {
    &quot;number&quot; : &quot;6.1.1&quot;,
    &quot;build_hash&quot; : &quot;bd92e7f&quot;,
    &quot;build_date&quot; : &quot;2017-12-17T20:23:25.338Z&quot;,
    &quot;build_snapshot&quot; : false,
    &quot;lucene_version&quot; : &quot;7.1.0&quot;,
    &quot;minimum_wire_compatibility_version&quot; : &quot;5.6.0&quot;,
    &quot;minimum_index_compatibility_version&quot; : &quot;5.0.0&quot;
  },
  &quot;tagline&quot; : &quot;You Know, for Search&quot;
}
</code></pre>
<br>
<p>If all three URLs display data successfully, congrats!  The entire containerized stack is running, so now we can move on to the fun part.</p>
<h2 id="3connecttoelasticsearch">3 - Connect To Elasticsearch</h2>
<p>The first thing that we'll need to do in our app is connect to our local Elasticsearch instance.</p>
<h3 id="30addesconnectionmodule">3.0 - Add ES Connection Module</h3>
<p>Add the following Elasticsearch initialization code to a new file <code>server/connection.js</code>.</p>
<pre><code class="language-javascript">const elasticsearch = require('elasticsearch')

// Core ES variables for this project
const index = 'library'
const type = 'novel'
const port = 9200
const host = process.env.ES_HOST || 'localhost'
const client = new elasticsearch.Client({ host: { host, port } })

/** Check the ES connection status */
async function checkConnection () {
  let isConnected = false
  while (!isConnected) {
    console.log('Connecting to ES')
    try {
      const health = await client.cluster.health({})
      console.log(health)
      isConnected = true
    } catch (err) {
      console.log('Connection Failed, Retrying...', err)
    }
  }
}

checkConnection()
</code></pre>
<br>
<p>Let's rebuild our Node app now that we've made changes, using <code>docker-compose build</code>.  Next, run <code>docker-compose up -d</code> to start the application stack as a background daemon process.</p>
<p>With the app started, run <code>docker exec gs-api &quot;node&quot; &quot;server/connection.js&quot;</code> on the command line in order to run our script within the container.  You should see some system output similar to the following.</p>
<pre><code class="language-javascript">{ cluster_name: 'docker-cluster',
  status: 'yellow',
  timed_out: false,
  number_of_nodes: 1,
  number_of_data_nodes: 1,
  active_primary_shards: 1,
  active_shards: 1,
  relocating_shards: 0,
  initializing_shards: 0,
  unassigned_shards: 1,
  delayed_unassigned_shards: 0,
  number_of_pending_tasks: 0,
  number_of_in_flight_fetch: 0,
  task_max_waiting_in_queue_millis: 0,
  active_shards_percent_as_number: 50 }
</code></pre>
<br>
<p>Go ahead and remove the <code>checkConnection()</code> call at the bottom before moving on, since in our final app we'll be making that call from outside the connection module.</p>
<h3 id="31addhelperfunctiontoresetindex">3.1 - Add Helper Function To Reset Index</h3>
<p>In <code>server/connection.js</code> add the following function below <code>checkConnection</code>, in order to provide an easy way to reset our Elasticsearch index.</p>
<pre><code class="language-javascript">/** Clear the index, recreate it, and add mappings */
async function resetIndex () {
  if (await client.indices.exists({ index })) {
    await client.indices.delete({ index })
  }

  await client.indices.create({ index })
  await putBookMapping()
}
</code></pre>
<br>
<h3 id="32addbookschema">3.2 - Add Book Schema</h3>
<p>Next, we'll want to add a &quot;mapping&quot; for the book data schema.  Add the following function below <code>resetIndex</code> in <code>server/connection.js</code>.</p>
<pre><code class="language-javascript">/** Add book section schema mapping to ES */
async function putBookMapping () {
  const schema = {
    title: { type: 'keyword' },
    author: { type: 'keyword' },
    location: { type: 'integer' },
    text: { type: 'text' }
  }

  return client.indices.putMapping({ index, type, body: { properties: schema } })
}
</code></pre>
<br>
<p>Here we are defining a mapping for our book documents within the <code>library</code> index.  An Elasticsearch <code>index</code> is roughly analogous to a SQL <code>table</code> or a MongoDB <code>collection</code>.  Adding a mapping allows us to specify each field and datatype for the stored documents.  Elasticsearch is schema-less, so we don't technically need to add a mapping, but doing so will give us more control over how the data is handled.</p>
<p>For instance, we're assigning the <code>keyword</code> type to the &quot;title&quot; and &quot;author&quot; fields, and the <code>text</code> type to the &quot;text&quot; field.  Doing so will cause the search engine to treat these string fields differently - during a search, the engine will search <em>within</em> the <code>text</code> field for potential matches, whereas <code>keyword</code> fields will be matched based on their full content.  This might seem like a minor distinction, but it can have a huge impact on the behavior and speed of different searches.</p>
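<p>As a simplified illustration (this is a toy sketch, not Elasticsearch's actual analysis pipeline), we can mimic the difference between the two field types in Javascript.</p>
<pre><code class="language-javascript">// Toy sketch of the difference between 'text' and 'keyword' indexing
function analyzeText (value) {
  // 'text' fields are analyzed - lowercased and split into terms,
  // so a search can match any individual word within the field
  return value.toLowerCase().split(/\W+/).filter(Boolean)
}

function analyzeKeyword (value) {
  // 'keyword' fields are indexed as a single exact term
  return [value]
}

console.log(analyzeText('The Adventures of Sherlock Holmes'))
// [ 'the', 'adventures', 'of', 'sherlock', 'holmes' ]

console.log(analyzeKeyword('The Adventures of Sherlock Holmes'))
// [ 'The Adventures of Sherlock Holmes' ]
</code></pre>
<br>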
<p>Export the exposed properties and functions at the bottom of the file, so that they can be accessed by other modules in our app.</p>
<pre><code class="language-javascript">module.exports = {
  client, index, type, checkConnection, resetIndex
}
</code></pre>
<br>
<h2 id="4loadtherawdata">4 - Load The Raw Data</h2>
<p>We'll be using data from <a href="http://www.gutenberg.org/">Project Gutenberg</a> - an online effort dedicated to providing free, digital copies of books within the public domain.  For this project, we'll be populating our library with 100 classic books, including texts such as <em>The Adventures of Sherlock Holmes</em>, <em>Treasure Island</em>, <em>The Count of Monte Cristo</em>, <em>Around the World in 80 Days</em>, <em>Romeo and Juliet</em>, and <em>The Odyssey</em>.</p>
<p><img src="https://storage.googleapis.com/cdn.patricktriest.com/blog/images/posts/elastic-library/books.jpg" alt="Building a Full-Text Search App Using Docker and Elasticsearch"></p>
<h3 id="41downloadbookfiles">4.1 - Download Book Files</h3>
<p>I've zipped the 100 books into a file that you can download here -<br>
<a href="https://cdn.patricktriest.com/data/books.zip">https://cdn.patricktriest.com/data/books.zip</a></p>
<p>Extract this file into a <code>books/</code> directory in your project.</p>
<p>If you want, you can do this by using the following commands (requires <a href="https://www.gnu.org/software/wget/">wget</a> and <a href="https://theunarchiver.com/command-line">&quot;The Unarchiver&quot; CLI</a>).</p>
<pre><code class="language-bash">wget https://cdn.patricktriest.com/data/books.zip
unar books.zip
</code></pre>
<br>
<h3 id="42previewabook">4.2 - Preview A Book</h3>
<p>Try opening one of the book files, say <code>219-0.txt</code>.  You'll notice that it starts with an open access license, followed by some lines identifying the book title, author, release dates, language and character encoding.</p>
<pre><code class="language-txt">Title: Heart of Darkness

Author: Joseph Conrad

Release Date: February 1995 [EBook #219]
Last Updated: September 7, 2016

Language: English

Character set encoding: UTF-8
</code></pre>
<br>
<p>After these lines comes <code>*** START OF THIS PROJECT GUTENBERG EBOOK HEART OF DARKNESS ***</code>, after which the book content actually starts.</p>
<p>If you scroll to the end of the book you'll see the matching message <code>*** END OF THIS PROJECT GUTENBERG EBOOK HEART OF DARKNESS ***</code>, which is followed by a much more detailed version of the book's license.</p>
<p>In the next steps, we'll programmatically parse the book metadata from this header and extract the book content from between the <code>*** START OF</code> and <code>*** END OF</code> markers.</p>
<h3 id="43readdatadir">4.3 - Read Data Dir</h3>
<p>Let's write a script to read the content of each book and to add that data to Elasticsearch.  We'll define a new Javascript file <code>server/load_data.js</code> in order to perform these operations.</p>
<p>First, we'll obtain a list of every file within the <code>books/</code> data directory.</p>
<p>Add the following content to <code>server/load_data.js</code>.</p>
<pre><code class="language-javascript">const fs = require('fs')
const path = require('path')
const esConnection = require('./connection')

/** Clear ES index, parse and index all files from the books directory */
async function readAndInsertBooks () {
  try {
    // Clear previous ES index
    await esConnection.resetIndex()

    // Read books directory
    let files = fs.readdirSync('./books').filter(file =&gt; file.slice(-4) === '.txt')
    console.log(`Found ${files.length} Files`)

    // Read each book file, and index each paragraph in elasticsearch
    for (let file of files) {
      console.log(`Reading File - ${file}`)
      const filePath = path.join('./books', file)
      const { title, author, paragraphs } = parseBookFile(filePath)
      await insertBookData(title, author, paragraphs)
    }
  } catch (err) {
    console.error(err)
  }
}

readAndInsertBooks()
</code></pre>
<br>
<p>We'll use a shortcut command to rebuild our Node.js app and update the running container.</p>
<p>Run <code>docker-compose up -d --build</code> to update the application.  This is a shortcut for running <code>docker-compose build</code> and <code>docker-compose up -d</code>.</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/elastic-library/sample_1_0.png" alt="Building a Full-Text Search App Using Docker and Elasticsearch"></p>
<p>Run <code>docker exec gs-api &quot;node&quot; &quot;server/load_data.js&quot;</code> in order to run our <code>load_data</code> script within the container.  You should see the Elasticsearch status output, followed by <code>Found 100 Files</code>.</p>
<p>After this, the script will exit due to an error because we're calling a helper function (<code>parseBookFile</code>) that we have not yet defined.</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/elastic-library/sample_1_1.png" alt="Building a Full-Text Search App Using Docker and Elasticsearch"></p>
<h3 id="44readdatafile">4.4 - Read Data File</h3>
<p>Next, we'll read the metadata and content for each book.</p>
<p>Define a new function in <code>server/load_data.js</code>.</p>
<pre><code class="language-javascript">/** Read an individual book text file, and extract the title, author, and paragraphs */
function parseBookFile (filePath) {
  // Read text file
  const book = fs.readFileSync(filePath, 'utf8')

  // Find book title and author
  const title = book.match(/^Title:\s(.+)$/m)[1]
  const authorMatch = book.match(/^Author:\s(.+)$/m)
  const author = (!authorMatch || authorMatch[1].trim() === '') ? 'Unknown Author' : authorMatch[1]

  console.log(`Reading Book - ${title} By ${author}`)

  // Find Gutenberg metadata header and footer
  const startOfBookMatch = book.match(/^\*{3}\s*START OF (THIS|THE) PROJECT GUTENBERG EBOOK.+\*{3}$/m)
  const startOfBookIndex = startOfBookMatch.index + startOfBookMatch[0].length
  const endOfBookIndex = book.match(/^\*{3}\s*END OF (THIS|THE) PROJECT GUTENBERG EBOOK.+\*{3}$/m).index

  // Clean book text and split into array of paragraphs
  const paragraphs = book
    .slice(startOfBookIndex, endOfBookIndex) // Remove Gutenberg header and footer
    .split(/\n\s+\n/g) // Split each paragraph into its own array entry
    .map(line =&gt; line.replace(/\r\n/g, ' ').trim()) // Remove paragraph line breaks and whitespace
    .map(line =&gt; line.replace(/_/g, '')) // Gutenberg uses &quot;_&quot; to signify italics.  We'll remove it, since it makes the raw text look messy.
    .filter((line) =&gt; (line &amp;&amp; line !== '')) // Remove empty lines

  console.log(`Parsed ${paragraphs.length} Paragraphs\n`)
  return { title, author, paragraphs }
}
</code></pre>
<br>
<p>This function performs a few important tasks.</p>
<ol>
<li>Read book text from the file system.</li>
<li>Use regular expressions (check out <a href="https://blog.patricktriest.com/you-should-learn-regex/">this post</a> for a primer on using regex) to parse the book title and author.</li>
<li>Identify the start and end of the book content, by matching on the all-caps &quot;Project Gutenberg&quot; header and footer.</li>
<li>Extract the book text content.</li>
<li>Split each paragraph into its own array entry.</li>
<li>Clean up the text and remove blank lines.</li>
</ol>
<p>As a return value, we'll form an object containing the book's title, author, and an array of paragraphs within the book.</p>
<p>Run <code>docker-compose up -d --build</code> and <code>docker exec gs-api &quot;node&quot; &quot;server/load_data.js&quot;</code> again, and you should see the same output as before, this time with three extra lines at the end of the output.</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/elastic-library/sample_2_0.png" alt="Building a Full-Text Search App Using Docker and Elasticsearch"><br>
<br></p>
<p>Success!  Our script successfully parsed the title and author from the text file.  The script will again end with an error since we still have to define one more helper function.</p>
<h3 id="45indexdatafileines">4.5 - Index Datafile in ES</h3>
<p>As a final step, we'll bulk-upload each array of paragraphs into the Elasticsearch index.</p>
<p>Add a new <code>insertBookData</code> function to <code>load_data.js</code>.</p>
<pre><code class="language-javascript">/** Bulk index the book data in Elasticsearch */
async function insertBookData (title, author, paragraphs) {
  let bulkOps = [] // Array to store bulk operations

  // Add an index operation for each section in the book
  for (let i = 0; i &lt; paragraphs.length; i++) {
    // Describe action
    bulkOps.push({ index: { _index: esConnection.index, _type: esConnection.type } })

    // Add document
    bulkOps.push({
      author,
      title,
      location: i,
      text: paragraphs[i]
    })

    if (i &gt; 0 &amp;&amp; i % 500 === 0) { // Do bulk insert in 500 paragraph batches
      await esConnection.client.bulk({ body: bulkOps })
      bulkOps = []
      console.log(`Indexed Paragraphs ${i - 499} - ${i}`)
    }
  }

  // Insert remainder of bulk ops array
  await esConnection.client.bulk({ body: bulkOps })
  console.log(`Indexed Paragraphs ${paragraphs.length - (bulkOps.length / 2)} - ${paragraphs.length}\n\n\n`)
}
</code></pre>
<br>
<p>This function will index each paragraph of the book, with author, title, and paragraph location metadata attached.  We are inserting the paragraphs using a bulk operation, which is much faster than indexing each paragraph individually.</p>
<blockquote>
<p>We're bulk indexing the paragraphs in batches, instead of inserting all of them at once.  This was a last-minute optimization which I added so that the app could run on the low-ish memory (1.7 GB) host machine that serves <code>search.patricktriest.com</code>.  If you have a reasonable amount of RAM (4+ GB), you probably don't need to worry about batching each bulk upload.</p>
</blockquote>
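<p>The alternating action/document layout that the bulk API expects can be checked on its own.  This sketch uses the same <code>library</code>/<code>novel</code> index and type names as the rest of the tutorial, with a made-up three-paragraph book.</p>

```javascript
// Sketch of the request body shape that the bulk API expects:
// an action line, then the document itself, for every paragraph.
function buildBulkOps (title, author, paragraphs, index = 'library', type = 'novel') {
  const bulkOps = []
  paragraphs.forEach((text, location) => {
    bulkOps.push({ index: { _index: index, _type: type } }) // action line
    bulkOps.push({ author, title, location, text })         // document line
  })
  return bulkOps
}

const ops = buildBulkOps('Sample Book', 'Jane Doe', ['First.', 'Second.', 'Third.'])
console.log(ops.length) // 6 - two entries per paragraph
```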
<p>Run <code>docker-compose up -d --build</code> and <code>docker exec gs-api &quot;node&quot; &quot;server/load_data.js&quot;</code> one more time - you should now see a full output of 100 books being parsed and inserted in Elasticsearch.  This might take a minute or so.</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/elastic-library/sample_3_0.png" alt="Building a Full-Text Search App Using Docker and Elasticsearch"></p>
<h2 id="5search">5 - Search</h2>
<p>Now that Elasticsearch has been populated with one hundred books (amounting to roughly 230,000 paragraphs), let's try out some search queries.</p>
<h3 id="50simplehttpquery">5.0 - Simple HTTP Query</h3>
<p>First, let's just query Elasticsearch directly using its HTTP API.</p>
<p>Visit this URL in your browser - <code>http://localhost:9200/library/_search?q=text:Java&amp;pretty</code></p>
<p>Here, we are performing a bare-bones full-text search to find the word &quot;Java&quot; within our library of books.</p>
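<p>If you want to assemble this kind of query URL programmatically, the search term should be URL-encoded.  A quick sketch - the helper name here is ours, not part of the tutorial app:</p>

```javascript
// Hypothetical helper for building the ES query-string search URL.
// Encoding matters once the term contains spaces or special characters.
function esSearchUrl (host, term) {
  return `${host}/library/_search?q=${encodeURIComponent('text:' + term)}&pretty`
}

console.log(esSearchUrl('http://localhost:9200', 'Java'))
// http://localhost:9200/library/_search?q=text%3AJava&pretty
```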
<p>You should see a JSON response similar to the following.</p>
<pre><code class="language-json">{
  &quot;took&quot; : 11,
  &quot;timed_out&quot; : false,
  &quot;_shards&quot; : {
    &quot;total&quot; : 5,
    &quot;successful&quot; : 5,
    &quot;skipped&quot; : 0,
    &quot;failed&quot; : 0
  },
  &quot;hits&quot; : {
    &quot;total&quot; : 13,
    &quot;max_score&quot; : 14.259304,
    &quot;hits&quot; : [
      {
        &quot;_index&quot; : &quot;library&quot;,
        &quot;_type&quot; : &quot;novel&quot;,
        &quot;_id&quot; : &quot;p_GwFWEBaZvLlaAUdQgV&quot;,
        &quot;_score&quot; : 14.259304,
        &quot;_source&quot; : {
          &quot;author&quot; : &quot;Charles Darwin&quot;,
          &quot;title&quot; : &quot;On the Origin of Species&quot;,
          &quot;location&quot; : 1080,
          &quot;text&quot; : &quot;Java, plants of, 375.&quot;
        }
      },
      {
        &quot;_index&quot; : &quot;library&quot;,
        &quot;_type&quot; : &quot;novel&quot;,
        &quot;_id&quot; : &quot;wfKwFWEBaZvLlaAUkjfk&quot;,
        &quot;_score&quot; : 10.186235,
        &quot;_source&quot; : {
          &quot;author&quot; : &quot;Edgar Allan Poe&quot;,
          &quot;title&quot; : &quot;The Works of Edgar Allan Poe&quot;,
          &quot;location&quot; : 827,
          &quot;text&quot; : &quot;After many years spent in foreign travel, I sailed in the year 18-- , from the port of Batavia, in the rich and populous island of Java, on a voyage to the Archipelago of the Sunda islands. I went as passenger--having no other inducement than a kind of nervous restlessness which haunted me as a fiend.&quot;
        }
      },
      ...
    ]
  }
}
</code></pre>
<br>
<p>The Elasticsearch HTTP interface is useful for testing that our data is inserted successfully, but exposing this API directly to the web app would be a huge security risk.  The API exposes administrative functionality (such as directly adding and deleting documents), and should never be exposed publicly.  Instead, we'll write a simple Node.js API to receive requests from the client, and make the appropriate query (within our private local network) to Elasticsearch.</p>
<h3 id="51queryscript">5.1 - Query Script</h3>
<p>Let's now try querying Elasticsearch from our Node.js application.</p>
<p>Create a new file, <code>server/search.js</code>.</p>
<pre><code class="language-javascript">const { client, index, type } = require('./connection')

module.exports = {
  /** Query ES index for the provided term */
  queryTerm (term, offset = 0) {
    const body = {
      from: offset,
      query: { match: {
        text: {
          query: term,
          operator: 'and',
          fuzziness: 'auto'
        } } },
      highlight: { fields: { text: {} } }
    }

    return client.search({ index, type, body })
  }
}
</code></pre>
<br>
<p>Our search module defines a simple <code>queryTerm</code> function, which will perform a <code>match</code> query using the input term.</p>
<p>Here's a breakdown of the query fields -</p>
<ul>
<li><code>from</code> - Allows us to paginate the results.  Each query returns 10 results by default, so specifying <code>from: 10</code> would allow us to retrieve results 10-20.</li>
<li><code>query</code> - Where we specify the actual term that we are searching for.</li>
<li><code>operator</code> - We can modify the search behavior; in this case, we're using the &quot;and&quot; operator to prioritize results that contain all of the tokens (words) in the query.</li>
<li><code>fuzziness</code> - Adjusts tolerance for spelling mistakes.  <code>auto</code> scales the maximum allowed edit distance with the length of each term (up to two edits for longer terms).  A higher fuzziness will allow for more corrections in result hits.  For instance, <code>fuzziness: 1</code> would allow <code>Patricc</code> to return <code>Patrick</code> as a match.</li>
<li><code>highlight</code> - Returns an extra field with each result, containing HTML to display the exact text subset and terms that were matched with the query.</li>
</ul>
<p>Feel free to play around with these parameters, and to customize the search query further by exploring the <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/full-text-queries.html">Elastic Full-Text Query DSL</a>.</p>
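<p>One easy way to experiment is to parameterize the query body.  Here's a sketch mirroring the body built in <code>queryTerm</code> - the helper name is ours:</p>

```javascript
// Sketch: the same match-query body, parameterized so you can experiment
// with the pagination offset and fuzziness values.
function buildSearchBody (term, offset = 0, fuzziness = 'auto') {
  return {
    from: offset,
    query: { match: { text: { query: term, operator: 'and', fuzziness } } },
    highlight: { fields: { text: {} } }
  }
}

const page2 = buildSearchBody('java', 10)
console.log(page2.from) // 10 - returns results 11-20
```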
<h2 id="6api">6 - API</h2>
<p>Let's write a quick HTTP API in order to access our search functionality from a frontend app.</p>
<h3 id="60apiserver">6.0 - API Server</h3>
<p>Replace our existing <code>server/app.js</code> file with the following contents.</p>
<pre><code class="language-javascript">const Koa = require('koa')
const Router = require('koa-router')
const joi = require('joi')
const validate = require('koa-joi-validate')
const search = require('./search')

const app = new Koa()
const router = new Router()

// Log each request to the console
app.use(async (ctx, next) =&gt; {
  const start = Date.now()
  await next()
  const ms = Date.now() - start
  console.log(`${ctx.method} ${ctx.url} - ${ms}`)
})

// Log percolated errors to the console
app.on('error', err =&gt; {
  console.error('Server Error', err)
})

// Set permissive CORS header
app.use(async (ctx, next) =&gt; {
  ctx.set('Access-Control-Allow-Origin', '*')
  return next()
})

// ADD ENDPOINTS HERE

const port = process.env.PORT || 3000

app
  .use(router.routes())
  .use(router.allowedMethods())
  .listen(port, err =&gt; {
    if (err) throw err
    console.log(`App Listening on Port ${port}`)
  })
</code></pre>
<br>
<p>This code will import our server dependencies and set up simple logging and error handling for a <a href="http://koajs.com/">Koa.js</a> Node API server.</p>
<h3 id="61linkendpointwithqueries">6.1 - Link endpoint with queries</h3>
<p>Next, we'll add an endpoint to our server in order to expose our Elasticsearch query function.</p>
<p>Insert the following code below the <code>// ADD ENDPOINTS HERE</code> comment in <code>server/app.js</code>.</p>
<pre><code class="language-javascript">/**
 * GET /search
 * Search for a term in the library
 */
router.get('/search', async (ctx, next) =&gt; {
    const { term, offset } = ctx.request.query
    ctx.body = await search.queryTerm(term, offset)
  }
)
</code></pre>
<br>
<p>Restart the app using <code>docker-compose up -d --build</code>.  In your browser, try calling the search endpoint.  For example, this request would search the entire library for passages mentioning &quot;Java&quot; - <code>http://localhost:3000/search?term=java</code></p>
<p>The result will look quite similar to the response from earlier when we called the Elasticsearch HTTP interface directly.</p>
<pre><code class="language-json">{
    &quot;took&quot;: 242,
    &quot;timed_out&quot;: false,
    &quot;_shards&quot;: {
        &quot;total&quot;: 5,
        &quot;successful&quot;: 5,
        &quot;skipped&quot;: 0,
        &quot;failed&quot;: 0
    },
    &quot;hits&quot;: {
        &quot;total&quot;: 93,
        &quot;max_score&quot;: 13.356944,
        &quot;hits&quot;: [{
            &quot;_index&quot;: &quot;library&quot;,
            &quot;_type&quot;: &quot;novel&quot;,
            &quot;_id&quot;: &quot;eHYHJmEBpQg9B4622421&quot;,
            &quot;_score&quot;: 13.356944,
            &quot;_source&quot;: {
                &quot;author&quot;: &quot;Charles Darwin&quot;,
                &quot;title&quot;: &quot;On the Origin of Species&quot;,
                &quot;location&quot;: 1080,
                &quot;text&quot;: &quot;Java, plants of, 375.&quot;
            },
            &quot;highlight&quot;: {
                &quot;text&quot;: [&quot;&lt;em&gt;Java&lt;/em&gt;, plants of, 375.&quot;]
            }
        }, {
            &quot;_index&quot;: &quot;library&quot;,
            &quot;_type&quot;: &quot;novel&quot;,
            &quot;_id&quot;: &quot;2HUHJmEBpQg9B462xdNg&quot;,
            &quot;_score&quot;: 9.030668,
            &quot;_source&quot;: {
                &quot;author&quot;: &quot;Unknown Author&quot;,
                &quot;title&quot;: &quot;The King James Bible&quot;,
                &quot;location&quot;: 186,
                &quot;text&quot;: &quot;10:4 And the sons of Javan; Elishah, and Tarshish, Kittim, and Dodanim.&quot;
            },
            &quot;highlight&quot;: {
                &quot;text&quot;: [&quot;10:4 And the sons of &lt;em&gt;Javan&lt;/em&gt;; Elishah, and Tarshish, Kittim, and Dodanim.&quot;]
            }
        }
        ...
      ]
   }
}
</code></pre>
<br>
<h3 id="62inputvalidation">6.2 - Input validation</h3>
<p>This endpoint is still brittle - we are not doing any checks on the request parameters, so invalid or missing values would result in a server error.</p>
<p>We'll add some middleware to the endpoint in order to validate input parameters using <a href="https://github.com/hapijs/joi">Joi</a> and the <a href="https://github.com/triestpa/koa-joi-validate">Koa-Joi-Validate</a> library.</p>
<pre><code class="language-javascript">/**
 * GET /search
 * Search for a term in the library
 * Query Params -
 * term: string under 60 characters
 * offset: positive integer
 */
router.get('/search',
  validate({
    query: {
      term: joi.string().max(60).required(),
      offset: joi.number().integer().min(0).default(0)
    }
  }),
  async (ctx, next) =&gt; {
    const { term, offset } = ctx.request.query
    ctx.body = await search.queryTerm(term, offset)
  }
)
</code></pre>
<p>Now, if you restart the server and make a request with a missing term (<code>http://localhost:3000/search</code>), you will get back an HTTP 400 error with a relevant message, such as <code>Invalid URL Query - child &quot;term&quot; fails because [&quot;term&quot; is required]</code>.</p>
<p>To view live logs from the Node app, you can run <code>docker-compose logs -f api</code>.</p>
<h2 id="7frontendapplication">7 - Front-End Application</h2>
<p>Now that our <code>/search</code> endpoint is in place, let's wire up a simple web app to test out the API.</p>
<h3 id="70vuejsapp">7.0 - Vue.js App</h3>
<p>We'll be using Vue.js to coordinate our frontend.</p>
<p>Add a new file, <code>/public/app.js</code>, to hold our Vue.js application code.</p>
<pre><code class="language-javascript">const vm = new Vue ({
  el: '#vue-instance',
  data () {
    return {
      baseUrl: 'http://localhost:3000', // API url
      searchTerm: 'Hello World', // Default search term
      searchDebounce: null, // Timeout for search bar debounce
      searchResults: [], // Displayed search results
      numHits: null, // Total search results found
      searchOffset: 0, // Search result pagination offset

      selectedParagraph: null, // Selected paragraph object
      bookOffset: 0, // Offset for book paragraphs being displayed
      paragraphs: [] // Paragraphs being displayed in book preview window
    }
  },
  async created () {
    this.searchResults = await this.search() // Search for default term
  },
  methods: {
    /** Debounce search input by 100 ms */
    onSearchInput () {
      clearTimeout(this.searchDebounce)
      this.searchDebounce = setTimeout(async () =&gt; {
        this.searchOffset = 0
        this.searchResults = await this.search()
      }, 100)
    },
    /** Call API to search for inputted term */
    async search () {
      const response = await axios.get(`${this.baseUrl}/search`, { params: { term: this.searchTerm, offset: this.searchOffset } })
      this.numHits = response.data.hits.total
      return response.data.hits.hits
    },
    /** Get next page of search results */
    async nextResultsPage () {
      if (this.numHits &gt; 10) {
        this.searchOffset += 10
        if (this.searchOffset + 10 &gt; this.numHits) { this.searchOffset = this.numHits - 10}
        this.searchResults = await this.search()
        document.documentElement.scrollTop = 0
      }
    },
    /** Get previous page of search results */
    async prevResultsPage () {
      this.searchOffset -= 10
      if (this.searchOffset &lt; 0) { this.searchOffset = 0 }
      this.searchResults = await this.search()
      document.documentElement.scrollTop = 0
    }
  }
})
</code></pre>
<br>
<p>The app is pretty simple - we're just defining some shared data properties, and adding methods to retrieve and paginate through search results.  The search input is debounced by 100ms, to prevent the API from being called with every keystroke.</p>
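<p>The debounce pattern used in <code>onSearchInput</code> is worth seeing on its own - repeated calls within the wait window collapse into a single trailing call:</p>

```javascript
// Generic trailing-edge debounce - the same pattern as onSearchInput above.
function debounce (fn, wait) {
  let timer = null
  return (...args) => {
    clearTimeout(timer)                        // cancel the previously scheduled call
    timer = setTimeout(() => fn(...args), wait) // reschedule with the latest arguments
  }
}

const calls = []
const record = debounce(term => calls.push(term), 100)
record('j')
record('ja')
record('java') // only this final call runs, ~100 ms after the last "keystroke"
```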
<p>Explaining how Vue.js works is outside the scope of this tutorial, but this probably won't look too crazy if you've used Angular or React.  If you're completely unfamiliar with Vue, and if you want something quick to get started with, I would recommend the official quick-start guide - <a href="https://vuejs.org/v2/guide/">https://vuejs.org/v2/guide/</a></p>
<h3 id="71html">7.1 - HTML</h3>
<p>Replace our placeholder <code>/public/index.html</code> file with the following contents, in order to load our Vue.js app and to layout a basic search interface.</p>
<pre><code class="language-html">&lt;!DOCTYPE html&gt;
&lt;html lang=&quot;en&quot;&gt;
&lt;head&gt;
  &lt;meta charset=&quot;utf-8&quot;&gt;
  &lt;title&gt;Elastic Library&lt;/title&gt;
  &lt;meta name=&quot;description&quot; content=&quot;Literary Classic Search Engine.&quot;&gt;
  &lt;meta name=&quot;viewport&quot; content=&quot;width=device-width, initial-scale=1, maximum-scale=1, user-scalable=no&quot;&gt;
  &lt;link href=&quot;https://cdnjs.cloudflare.com/ajax/libs/normalize/7.0.0/normalize.min.css&quot; rel=&quot;stylesheet&quot; type=&quot;text/css&quot; /&gt;
  &lt;link href=&quot;https://cdn.muicss.com/mui-0.9.20/css/mui.min.css&quot; rel=&quot;stylesheet&quot; type=&quot;text/css&quot; /&gt;
  &lt;link href=&quot;https://fonts.googleapis.com/css?family=EB+Garamond:400,700|Open+Sans&quot; rel=&quot;stylesheet&quot;&gt;
  &lt;link href=&quot;styles.css&quot; rel=&quot;stylesheet&quot; /&gt;
&lt;/head&gt;
&lt;body&gt;
&lt;div class=&quot;app-container&quot; id=&quot;vue-instance&quot;&gt;
    &lt;!-- Search Bar Header --&gt;
    &lt;div class=&quot;mui-panel&quot;&gt;
      &lt;div class=&quot;mui-textfield&quot;&gt;
        &lt;input v-model=&quot;searchTerm&quot; type=&quot;text&quot; v-on:keyup=&quot;onSearchInput()&quot;&gt;
        &lt;label&gt;Search&lt;/label&gt;
      &lt;/div&gt;
    &lt;/div&gt;

    &lt;!-- Search Metadata Card --&gt;
    &lt;div class=&quot;mui-panel&quot;&gt;
      &lt;div class=&quot;mui--text-headline&quot;&gt;{{ numHits }} Hits&lt;/div&gt;
      &lt;div class=&quot;mui--text-subhead&quot;&gt;Displaying Results {{ searchOffset }} - {{ searchOffset + 9 }}&lt;/div&gt;
    &lt;/div&gt;

    &lt;!-- Top Pagination Card --&gt;
    &lt;div class=&quot;mui-panel pagination-panel&quot;&gt;
        &lt;button class=&quot;mui-btn mui-btn--flat&quot; v-on:click=&quot;prevResultsPage()&quot;&gt;Prev Page&lt;/button&gt;
        &lt;button class=&quot;mui-btn mui-btn--flat&quot; v-on:click=&quot;nextResultsPage()&quot;&gt;Next Page&lt;/button&gt;
    &lt;/div&gt;

    &lt;!-- Search Results Card List --&gt;
    &lt;div class=&quot;search-results&quot; ref=&quot;searchResults&quot;&gt;
      &lt;div class=&quot;mui-panel&quot; v-for=&quot;hit in searchResults&quot; v-on:click=&quot;showBookModal(hit)&quot;&gt;
        &lt;div class=&quot;mui--text-title&quot; v-html=&quot;hit.highlight.text[0]&quot;&gt;&lt;/div&gt;
        &lt;div class=&quot;mui-divider&quot;&gt;&lt;/div&gt;
        &lt;div class=&quot;mui--text-subhead&quot;&gt;{{ hit._source.title }} - {{ hit._source.author }}&lt;/div&gt;
        &lt;div class=&quot;mui--text-body2&quot;&gt;Location {{ hit._source.location }}&lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;

    &lt;!-- Bottom Pagination Card --&gt;
    &lt;div class=&quot;mui-panel pagination-panel&quot;&gt;
        &lt;button class=&quot;mui-btn mui-btn--flat&quot; v-on:click=&quot;prevResultsPage()&quot;&gt;Prev Page&lt;/button&gt;
        &lt;button class=&quot;mui-btn mui-btn--flat&quot; v-on:click=&quot;nextResultsPage()&quot;&gt;Next Page&lt;/button&gt;
    &lt;/div&gt;

    &lt;!-- INSERT BOOK MODAL HERE --&gt;
&lt;/div&gt;
&lt;script src=&quot;https://cdn.muicss.com/mui-0.9.28/js/mui.min.js&quot;&gt;&lt;/script&gt;
&lt;script src=&quot;https://cdnjs.cloudflare.com/ajax/libs/vue/2.5.3/vue.min.js&quot;&gt;&lt;/script&gt;
&lt;script src=&quot;https://cdnjs.cloudflare.com/ajax/libs/axios/0.17.0/axios.min.js&quot;&gt;&lt;/script&gt;
&lt;script src=&quot;app.js&quot;&gt;&lt;/script&gt;
&lt;/body&gt;
&lt;/html&gt;
</code></pre>
<br>
<h3 id="72css">7.2 - CSS</h3>
<p>Add a new file, <code>/public/styles.css</code>, with some custom UI styling.</p>
<pre><code class="language-css">body { font-family: 'EB Garamond', serif; }

.mui-textfield &gt; input, .mui-btn, .mui--text-subhead, .mui-panel &gt; .mui--text-headline {
  font-family: 'Open Sans', sans-serif;
}

.all-caps { text-transform: uppercase; }
.app-container { padding: 16px; }
.search-results em { font-weight: bold; }
.book-modal &gt; button { width: 100%; }
.search-results .mui-divider { margin: 14px 0; }

.search-results {
  display: flex;
  flex-direction: row;
  flex-wrap: wrap;
  justify-content: space-around;
}

.search-results &gt; div {
  flex-basis: 45%;
  box-sizing: border-box;
  cursor: pointer;
}

@media (max-width: 600px) {
  .search-results &gt; div { flex-basis: 100%; }
}

.paragraphs-container {
  max-width: 800px;
  margin: 0 auto;
  margin-bottom: 48px;
}

.paragraphs-container .mui--text-body1, .paragraphs-container .mui--text-body2 {
  font-size: 1.8rem;
  line-height: 35px;
}

.book-modal {
  width: 100%;
  height: 100%;
  padding: 40px 10%;
  box-sizing: border-box;
  margin: 0 auto;
  background-color: white;
  overflow-y: scroll;
  position: fixed;
  top: 0;
  left: 0;
}

.pagination-panel {
  display: flex;
  justify-content: space-between;
}

.title-row {
  display: flex;
  justify-content: space-between;
  align-items: flex-end;
}

@media (max-width: 600px) {
  .title-row{ 
    flex-direction: column; 
    text-align: center;
    align-items: center
  }
}

.locations-label {
  text-align: center;
  margin: 8px;
}

.modal-footer {
  position: fixed;
  bottom: 0;
  left: 0;
  width: 100%;
  display: flex;
  justify-content: space-around;
  background: white;
}
</code></pre>
<br>
<h3 id="73tryitout">7.3 - Try it out</h3>
<p>Open <code>localhost:8080</code> in your web browser, and you should see a simple search interface with paginated results.  Try typing in the top search bar to find matches for different terms.</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/elastic-library/sample_4_0.png" alt="Building a Full-Text Search App Using Docker and Elasticsearch"></p>
<blockquote>
<p>You <em>do not</em> have to re-run the <code>docker-compose up</code> command for the changes to take effect.  The local <code>public</code> directory is mounted to our Nginx fileserver container, so frontend changes on the local system will be automatically reflected in the containerized app.</p>
</blockquote>
<p>If you try clicking on any result, nothing happens - we still have one more feature to add to the app.</p>
<h2 id="8pagepreviews">8 - Page Previews</h2>
<p>It would be nice to be able to click on each search result and view it in the context of the book that it's from.</p>
<h3 id="80addelasticsearchquery">8.0 - Add Elasticsearch Query</h3>
<p>First, we'll need to define a simple query to get a range of paragraphs from a given book.</p>
<p>Add the following function to the <code>module.exports</code> block in <code>server/search.js</code>.</p>
<pre><code class="language-javascript">/** Get the specified range of paragraphs from a book */
getParagraphs (bookTitle, startLocation, endLocation) {
  const filter = [
    { term: { title: bookTitle } },
    { range: { location: { gte: startLocation, lte: endLocation } } }
  ]

  const body = {
    size: endLocation - startLocation,
    sort: { location: 'asc' },
    query: { bool: { filter } }
  }

  return client.search({ index, type, body })
}
</code></pre>
<br>
<p>This new function will return an ordered array of paragraphs between the start and end locations of a given book.</p>
<h3 id="81addapiendpoint">8.1 - Add API Endpoint</h3>
<p>Now, let's link this function to an API endpoint.</p>
<p>Add the following to <code>server/app.js</code>, below the original <code>/search</code> endpoint.</p>
<pre><code class="language-javascript">/**
 * GET /paragraphs
 * Get a range of paragraphs from the specified book
 * Query Params -
 * bookTitle: string under 256 characters
 * start: positive integer
 * end: positive integer greater than start
 */
router.get('/paragraphs',
  validate({
    query: {
      bookTitle: joi.string().max(256).required(),
      start: joi.number().integer().min(0).default(0),
      end: joi.number().integer().greater(joi.ref('start')).default(10)
    }
  }),
  async (ctx, next) =&gt; {
    const { bookTitle, start, end } = ctx.request.query
    ctx.body = await search.getParagraphs(bookTitle, start, end)
  }
)
</code></pre>
<br>
<h3 id="82adduifunctionality">8.2 - Add UI functionality</h3>
<p>Now that our new endpoint is in place, let's add some frontend functionality to query and display full pages from the book.</p>
<p>Add the following functions to the <code>methods</code> block of <code>/public/app.js</code>.</p>
<pre><code class="language-javascript">    /** Call the API to get current page of paragraphs */
    async getParagraphs (bookTitle, offset) {
      try {
        this.bookOffset = offset
        const start = this.bookOffset
        const end = this.bookOffset + 10
        const response = await axios.get(`${this.baseUrl}/paragraphs`, { params: { bookTitle, start, end } })
        return response.data.hits.hits
      } catch (err) {
        console.error(err)
      }
    },
    /** Get next page (next 10 paragraphs) of selected book */
    async nextBookPage () {
      this.$refs.bookModal.scrollTop = 0
      this.paragraphs = await this.getParagraphs(this.selectedParagraph._source.title, this.bookOffset + 10)
    },
    /** Get previous page (previous 10 paragraphs) of selected book */
    async prevBookPage () {
      this.$refs.bookModal.scrollTop = 0
      this.paragraphs = await this.getParagraphs(this.selectedParagraph._source.title, this.bookOffset - 10)
    },
    /** Display paragraphs from selected book in modal window */
    async showBookModal (searchHit) {
      try {
        document.body.style.overflow = 'hidden'
        this.selectedParagraph = searchHit
        this.paragraphs = await this.getParagraphs(searchHit._source.title, searchHit._source.location - 5)
      } catch (err) {
        console.error(err)
      }
    },
    /** Close the book detail modal */
    closeBookModal () {
      document.body.style.overflow = 'auto'
      this.selectedParagraph = null
    }
</code></pre>
<br>
<p>These five functions provide the logic for downloading and paginating through pages (ten paragraphs each) in a book.</p>
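<p>The paging math behind these functions is simple enough to check in isolation.  The helper below is ours, and it clamps the window start to zero - a guard worth keeping in mind, since the <code>/paragraphs</code> endpoint rejects negative <code>start</code> values for hits near the very beginning of a book.</p>

```javascript
// Hypothetical helper mirroring the modal's paging: a "page" is ten
// paragraphs, with the window start clamped to zero.
function bookPage (offset) {
  const start = Math.max(0, offset)
  return { start, end: start + 10 }
}

console.log(bookPage(822)) // { start: 822, end: 832 }
console.log(bookPage(-3))  // { start: 0, end: 10 } - hit near the start of a book
```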
<p>Now we just need to add a UI to display the book pages.  Add this markup below the <code>&lt;!-- INSERT BOOK MODAL HERE --&gt;</code> comment in <code>/public/index.html</code>.</p>
<pre><code class="language-html">    &lt;!-- Book Paragraphs Modal Window --&gt;
    &lt;div v-if=&quot;selectedParagraph&quot; ref=&quot;bookModal&quot; class=&quot;book-modal&quot;&gt;
      &lt;div class=&quot;paragraphs-container&quot;&gt;
        &lt;!-- Book Section Metadata --&gt;
        &lt;div class=&quot;title-row&quot;&gt;
          &lt;div class=&quot;mui--text-display2 all-caps&quot;&gt;{{ selectedParagraph._source.title }}&lt;/div&gt;
          &lt;div class=&quot;mui--text-display1&quot;&gt;{{ selectedParagraph._source.author }}&lt;/div&gt;
        &lt;/div&gt;
        &lt;br&gt;
        &lt;div class=&quot;mui-divider&quot;&gt;&lt;/div&gt;
        &lt;div class=&quot;mui--text-subhead locations-label&quot;&gt;Locations {{ bookOffset - 5 }} to {{ bookOffset + 5 }}&lt;/div&gt;
        &lt;div class=&quot;mui-divider&quot;&gt;&lt;/div&gt;
        &lt;br&gt;

        &lt;!-- Book Paragraphs --&gt;
        &lt;div v-for=&quot;paragraph in paragraphs&quot;&gt;
          &lt;div v-if=&quot;paragraph._source.location === selectedParagraph._source.location&quot; class=&quot;mui--text-body2&quot;&gt;
            &lt;strong&gt;{{ paragraph._source.text }}&lt;/strong&gt;
          &lt;/div&gt;
          &lt;div v-else class=&quot;mui--text-body1&quot;&gt;
            {{ paragraph._source.text }}
          &lt;/div&gt;
          &lt;br&gt;
        &lt;/div&gt;
      &lt;/div&gt;

      &lt;!-- Book Pagination Footer --&gt;
      &lt;div class=&quot;modal-footer&quot;&gt;
        &lt;button class=&quot;mui-btn mui-btn--flat&quot; v-on:click=&quot;prevBookPage()&quot;&gt;Prev Page&lt;/button&gt;
        &lt;button class=&quot;mui-btn mui-btn--flat&quot; v-on:click=&quot;closeBookModal()&quot;&gt;Close&lt;/button&gt;
        &lt;button class=&quot;mui-btn mui-btn--flat&quot; v-on:click=&quot;nextBookPage()&quot;&gt;Next Page&lt;/button&gt;
      &lt;/div&gt;
    &lt;/div&gt;
</code></pre>
<br>
<p>Restart the app server (<code>docker-compose up -d --build</code>) again and open up <code>localhost:8080</code>.  When you click on a search result, you are now able to view the surrounding paragraphs.  You can now even read the rest of the book to completion if you're entertained by what you find.</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/elastic-library/sample_5_0.png" alt="Building a Full-Text Search App Using Docker and Elasticsearch"></p>
<p>Congrats, you've completed the tutorial application!</p>
<p>Feel free to compare your local result against the completed sample hosted here - <a href="https://search.patricktriest.com/">https://search.patricktriest.com/</a></p>
<h2 id="9disadvantagesofelasticsearch">9 - Disadvantages of Elasticsearch</h2>
<h3 id="90resourcehog">9.0 - Resource Hog</h3>
<p>Elasticsearch is computationally demanding.  The <a href="https://www.elastic.co/guide/en/elasticsearch/guide/current/hardware.html">official recommendation</a> is to run ES on a machine with 64 GB of RAM, and they strongly discourage running it on anything with under 8 GB of RAM.  Elasticsearch is an <em>in-memory</em> datastore, which allows it to return results extremely quickly, but also results in a very significant system memory footprint.  In production, <a href="https://www.elastic.co/guide/en/elasticsearch/guide/2.x/distributed-cluster.html">it is strongly recommended to run multiple Elasticsearch nodes in a cluster</a>  to allow for high server availability, automatic sharding, and data redundancy in case of a node failure.</p>
<p>I've got our tutorial application running on a $15/month GCP compute instance (at <a href="https://search.patricktriest.com">search.patricktriest.com</a>) with 1.7 GB of RAM, and it is <em>just barely</em> able to run the Elasticsearch node; sometimes the entire machine freezes up during the initial data-loading step.  Elasticsearch is, in my experience, much more of a resource hog than more traditional databases such as PostgreSQL and MongoDB, and can be significantly more expensive to host as a result.</p>
<h3 id="91syncingwithdatabases">9.1 - Syncing with Databases</h3>
<p>In most applications, storing all of the data in Elasticsearch is not an ideal option.  It is possible to use ES as the primary transactional database for an app, but this is generally not recommended due to the lack of ACID compliance in Elasticsearch, which can lead to lost write operations when ingesting data at scale.  In many cases, ES serves a more specialized role, such as powering the text searching features of the app.  This specialized use requires that some of the data from the primary database is replicated to the Elasticsearch instance.</p>
<p>For instance, let's imagine that we're storing our users in a PostgreSQL table, but using Elasticsearch to power our user-search functionality.  If a user, &quot;Albert&quot;, decides to change his name to &quot;Al&quot;,  we'll need this change to be reflected in both our primary PostgreSQL database and in our auxiliary Elasticsearch cluster.</p>
<p>This can be a tricky integration to get right, and the best answer will depend on your existing stack.  There are a multitude of open-source options available, from <a href="https://github.com/mongodb-labs/mongo-connector">a process to watch a MongoDB operation log</a> and automatically sync detected changes to ES, to a <a href="https://github.com/zombodb/zombodb">PostgreSQL plugin</a> to create a custom PSQL-based index that communicates automatically with Elasticsearch.</p>
<p>If none of the available pre-built options work, you could always just add some hooks into your server code to update the Elasticsearch index manually based on database changes.  I would consider this final option to be a last resort, since keeping ES in sync using custom business logic can be complex, and is likely to introduce numerous bugs to the application.</p>
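<p>As a rough sketch of that last-resort approach, here is what a manual dual-write hook might look like.  The <code>db</code> and <code>esIndex</code> objects below are simple in-memory stand-ins for a real PostgreSQL client and Elasticsearch index, so this is an illustration of the pattern rather than production code:</p>

```javascript
// In-memory stand-ins for the real datastores
const db = new Map()      // primary database (stand-in for PostgreSQL)
const esIndex = new Map() // search index (stand-in for Elasticsearch)

/** Update a user's name in the primary store, then mirror it to the search index */
async function renameUser (userId, newName) {
  // 1. Write to the primary database first - it remains the source of truth
  const user = Object.assign({}, db.get(userId), { name: newName })
  db.set(userId, user)

  // 2. Mirror the change to the search index.  In a real app this second
  // write can fail independently, which is exactly why custom sync logic
  // tends to introduce bugs - production code needs retries or a sync queue.
  esIndex.set(userId, { name: newName })

  return user
}

// Usage: "Albert" renames himself to "Al" in both datastores
db.set(1, { name: 'Albert' })
renameUser(1, 'Al').then(user => console.log(user.name)) // 'Al'
```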
<p>The need to sync Elasticsearch with a primary database is more of an architectural complexity than it is a specific weakness of ES, but it's certainly worth keeping in mind when considering the tradeoffs of adding a dedicated search engine to your app.</p>
<h2 id="conclusion">Conclusion</h2>
<p>Full-text search is one of the most important features in many modern applications - and is one of the most difficult to implement well.  Elasticsearch is a fantastic option for adding fast and customizable text search to your application, but there are alternatives.  <a href="http://lucene.apache.org/solr/">Apache Solr</a> is a similar open source search platform that is built on Apache Lucene - the same library at the core of Elasticsearch.  <a href="https://www.algolia.com/">Algolia</a> is a search-as-a-service web platform which is growing quickly in popularity and is likely to be easier to get started with for beginners (but as a tradeoff is less customizable and can get quite expensive).</p>
<p>&quot;Search-bar&quot; style features are far from the only use-case for Elasticsearch.  ES is also a very common tool for log storage and analysis, commonly used in an ELK (Elasticsearch, Logstash, Kibana) stack configuration.  The flexible full-text search allowed by Elasticsearch can also be very useful for a wide variety of data science tasks - such as correcting/standardizing the spellings of entities within a dataset or searching a large text dataset for similar phrases.</p>
<p>Here are some ideas for your own projects.</p>
<ul>
<li>Add more of your favorite books to our tutorial app and create your own private library search engine.</li>
<li>Create an academic plagiarism detection engine by indexing papers from <a href="https://scholar.google.com/">Google Scholar</a>.</li>
<li>Build a spell checking application by indexing every word in the dictionary to Elasticsearch.</li>
<li>Build your own Google-competitor internet search engine by loading the <a href="https://aws.amazon.com/public-datasets/common-crawl/">Common Crawl Corpus</a> into Elasticsearch (caution - with over 5 billion pages, this can be a very expensive dataset to play with).</li>
<li>Use Elasticsearch for journalism: search for specific names and terms in recent large-scale document leaks such as the <a href="https://en.wikipedia.org/wiki/Panama_Papers">Panama Papers</a> and <a href="https://en.wikipedia.org/wiki/Paradise_Papers">Paradise Papers</a>.</li>
</ul>
<p>The source code for this tutorial application is 100% open-source and can be found at the GitHub repository here - <a href="https://github.com/triestpa/guttenberg-search">https://github.com/triestpa/guttenberg-search</a></p>
<p>I hope you enjoyed the tutorial!  Please feel free to post any thoughts, questions, or criticisms in the comments below.</p>
</div>]]></content:encoded></item><item><title><![CDATA[An Introduction To Utilizing Public-Key Cryptography In Javascript]]></title><description><![CDATA[Build an end-to-end RSA-2048  encrypted messaging app using Socket.io and Vue.js.]]></description><link>http://blog.patricktriest.com/building-an-encrypted-messenger-with-javascript/</link><guid isPermaLink="false">598eaf93b7d6af1a6a795fcd</guid><category><![CDATA[Javascript]]></category><category><![CDATA[Node.js]]></category><category><![CDATA[Web Development]]></category><category><![CDATA[Guides]]></category><dc:creator><![CDATA[Patrick Triest]]></dc:creator><pubDate>Sun, 10 Dec 2017 12:00:00 GMT</pubDate><media:content url="https://blog-images.patricktriest.com/uploads/Matrix.jpg" medium="image"/><content:encoded><![CDATA[<div class="kg-card-markdown"><h2 id="opencryptochatatutorial">Open Cryptochat - A Tutorial</h2>
<img src="https://blog-images.patricktriest.com/uploads/Matrix.jpg" alt="An Introduction To Utilizing Public-Key Cryptography In Javascript"><p>Cryptography is important.  Without encryption, the internet as we know it would not be possible - data sent online would be as vulnerable to interception as a message shouted across a crowded room.  Cryptography is also a major topic in current events, increasingly playing a central role in <a href="https://en.wikipedia.org/wiki/FBI%E2%80%93Apple_encryption_dispute">law enforcement investigations</a> and <a href="https://www.politico.com/tipsheets/morning-cybersecurity/2017/11/10/texas-shooting-could-revive-encryption-legislation-223290">government legislation</a>.</p>
<p>Encryption is an invaluable tool for journalists, activists, nation-states, businesses, and everyday people who need to protect their data from the ever-present threat of hackers, spies, and advertising agencies.</p>
<p>An understanding of how to utilize strong encryption is essential for modern software development.  We will not be delving much into the underlying math and theory of cryptography for this tutorial; instead, the focus will be on how to harness these techniques for your own applications.</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/e2e-chat/screenshot_5.png" alt="An Introduction To Utilizing Public-Key Cryptography In Javascript"></p>
<p>In this tutorial, we will walk through the basic concepts and implementation of an end-to-end 2048-bit <a href="https://en.wikipedia.org/wiki/RSA_(cryptosystem)">RSA encrypted</a> messenger. We'll be utilizing <a href="https://vuejs.org/">Vue.js</a> for coordinating the frontend functionality along with a <a href="https://nodejs.org/en/">Node.js</a> backend using <a href="https://socket.io/">Socket.io</a> for sending messages between users.</p>
<ul>
<li>Live Preview - <a href="https://chat.patricktriest.com">https://chat.patricktriest.com</a></li>
<li>Github Repository - <a href="https://github.com/triestpa/Open-Cryptochat">https://github.com/triestpa/Open-Cryptochat</a></li>
</ul>
<p>The concepts that we are covering in this tutorial are implemented in Javascript and are mostly intended to be platform-agnostic.  We will be building a traditional browser-based web app, but you can adapt this code to work within a pre-built desktop (using <a href="https://electronjs.org/">Electron</a>) or mobile ( <a href="https://facebook.github.io/react-native/">React Native</a>, <a href="https://ionicframework.com/">Ionic</a>, <a href="https://cordova.apache.org/">Cordova</a>) application binary if you are concerned about browser-based application security.<sup class="footnote-ref"><a href="#fn1" id="fnref1">[1]</a></sup>  Likewise, implementing similar functionality in another programming language should be relatively straightforward since most languages have reputable open-source encryption libraries available; the base syntax will change but the core concepts remain universal.</p>
<blockquote>
<p>Disclaimer - This is meant to be a primer in end-to-end encryption implementation, not a definitive guide to building the Fort Knox of browser chat applications. I've worked to provide useful information on adding cryptography to your Javascript applications, but I cannot 100% guarantee the security of the resulting app.  There's a lot that can go wrong at all stages of the process, especially at the stages not covered by this tutorial such as setting up web hosting and securing the server(s).  If you are a security expert, and you find vulnerabilities in the tutorial code, please feel free to reach out to me by email (<a href="mailto:patrick.triest@gmail.com">patrick.triest@gmail.com</a>) or in the comments section below.</p>
</blockquote>
<h2 id="1projectsetup">1 - Project Setup</h2>
<h3 id="10installdependencies">1.0 - Install Dependencies</h3>
<p>You'll need to have <a href="https://nodejs.org/en/">Node.js</a> (version 6 or higher) installed in order to run the backend for this app.</p>
<p>Create an empty directory for the project and add a <code>package.json</code> file with the following contents.</p>
<pre><code class="language-json">{
  &quot;name&quot;: &quot;open-cryptochat&quot;,
  &quot;version&quot;: &quot;1.0.0&quot;,
  &quot;node&quot;: &quot;8.1.4&quot;,
  &quot;license&quot;: &quot;MIT&quot;,
  &quot;author&quot;: &quot;patrick.triest@gmail.com&quot;,
  &quot;description&quot;: &quot;End-to-end RSA-2048 encrypted chat application.&quot;,
  &quot;main&quot;: &quot;app.js&quot;,
  &quot;engines&quot;: {
    &quot;node&quot;: &quot;&gt;=7.6&quot;
  },
  &quot;scripts&quot;: {
    &quot;start&quot;: &quot;node app.js&quot;
  },
  &quot;dependencies&quot;: {
    &quot;express&quot;: &quot;4.15.3&quot;,
    &quot;socket.io&quot;: &quot;2.0.3&quot;
  }
}
</code></pre>
<br>
<p>Run <code>npm install</code> on the command line to install the two Node.js dependencies.</p>
<h3 id="11createnodejsapp">1.1 - Create Node.js App</h3>
<p>Create a file called <code>app.js</code>, and add the following contents.</p>
<pre><code class="language-javascript">const express = require('express')

// Setup Express server
const app = express()
const http = require('http').Server(app)

// Attach Socket.io to server
const io = require('socket.io')(http)

// Serve web app directory
app.use(express.static('public'))

// INSERT SOCKET.IO CODE HERE

// Start server
const port = process.env.PORT || 3000
http.listen(port, () =&gt; {
  console.log(`Chat server listening on port ${port}.`)
})
</code></pre>
<br>
<p>This is the core server logic.  Right now, all it will do is start a server and make all of the files in the local <code>/public</code> directory accessible to web clients.</p>
<blockquote>
<p>In production, I would strongly recommend serving your frontend code separately from the Node.js app, using battle-hardened server software such as <a href="https://httpd.apache.org/">Apache</a> or <a href="https://www.nginx.com/">Nginx</a>, or hosting the website on a file storage service such as <a href="https://aws.amazon.com/s3/">AWS S3</a>.  For this tutorial, however, using the Express static file server is the simplest way to get the app running.</p>
</blockquote>
<h3 id="12addfrontend">1.2 - Add Frontend</h3>
<p>Create a new directory called <code>public</code>.  This is where we'll put all of the frontend web app code.</p>
<h5 id="120addhtmltemplate">1.2.0 - Add HTML Template</h5>
<p>Create a new file, <code>/public/index.html</code>, and add these contents.</p>
<pre><code class="language-html">&lt;!DOCTYPE html&gt;
&lt;html lang=&quot;en&quot;&gt;
  &lt;head&gt;
    &lt;meta charset=&quot;utf-8&quot;&gt;
    &lt;title&gt;Open Cryptochat&lt;/title&gt;
    &lt;meta name=&quot;description&quot; content=&quot;A minimalist, open-source, end-to-end RSA-2048 encrypted chat application.&quot;&gt;
    &lt;meta name=&quot;viewport&quot; content=&quot;width=device-width, initial-scale=1, maximum-scale=1, user-scalable=no&quot;&gt;
    &lt;link href=&quot;https://fonts.googleapis.com/css?family=Montserrat:300,400&quot; rel=&quot;stylesheet&quot;&gt;
    &lt;link href=&quot;https://fonts.googleapis.com/css?family=Roboto+Mono&quot; rel=&quot;stylesheet&quot;&gt;
    &lt;link href=&quot;/styles.css&quot; rel=&quot;stylesheet&quot;&gt;
  &lt;/head&gt;
  &lt;body&gt;
    &lt;div id=&quot;vue-instance&quot;&gt;
      &lt;!-- Add Chat Container Here --&gt;
      &lt;div class=&quot;info-container full-width&quot;&gt;
        &lt;!-- Add Room UI Here --&gt;
        &lt;div class=&quot;notification-list&quot; ref=&quot;notificationContainer&quot;&gt;
          &lt;h1&gt;NOTIFICATION LOG&lt;/h1&gt;
          &lt;div class=&quot;notification full-width&quot; v-for=&quot;notification in notifications&quot;&gt;
            &lt;div class=&quot;notification-timestamp&quot;&gt;{{ notification.timestamp }}&lt;/div&gt;
            &lt;div class=&quot;notification-message&quot;&gt;{{ notification.message }}&lt;/div&gt;
          &lt;/div&gt;
        &lt;/div&gt;
        &lt;div class=&quot;flex-fill&quot;&gt;&lt;/div&gt;
        &lt;!-- Add Encryption Key UI Here --&gt;
      &lt;/div&gt;
      &lt;!-- Add Bottom Bar Here --&gt;
    &lt;/div&gt;
    &lt;script src=&quot;https://cdnjs.cloudflare.com/ajax/libs/vue/2.4.1/vue.min.js&quot;&gt;&lt;/script&gt;
    &lt;script src=&quot;https://cdnjs.cloudflare.com/ajax/libs/socket.io/2.0.3/socket.io.slim.js&quot;&gt;&lt;/script&gt;
    &lt;script src=&quot;https://cdnjs.cloudflare.com/ajax/libs/immutable/3.8.1/immutable.min.js&quot;&gt;&lt;/script&gt;
    &lt;script src=&quot;/page.js&quot;&gt;&lt;/script&gt;
  &lt;/body&gt;
&lt;/html&gt;
</code></pre>
<br>
<p>This template sets up the baseline HTML structure and downloads the client-side JS dependencies.  It will also display a simple list of notifications once we add the client-side JS code.</p>
<h5 id="121createvuejsapp">1.2.1 - Create Vue.js App</h5>
<p>Add the following contents to a new file, <code>/public/page.js</code>.</p>
<pre><code class="language-javascript">/** The core Vue instance controlling the UI */
const vm = new Vue ({
  el: '#vue-instance',
  data () {
    return {
      cryptWorker: null,
      socket: null,
      originPublicKey: null,
      destinationPublicKey: null,
      messages: [],
      notifications: [],
      currentRoom: null,
      pendingRoom: Math.floor(Math.random() * 1000),
      draft: ''
    }
  },
  created () {
    this.addNotification('Hello World')
  },
  methods: {
    /** Append a notification message in the UI */
    addNotification (message) {
      const timestamp = new Date().toLocaleTimeString()
      this.notifications.push({ message, timestamp })
    },
  }
})
</code></pre>
<br>
<p>This script will initialize the Vue.js application and will add a &quot;Hello World&quot; notification to the UI.</p>
<h5 id="122addstyling">1.2.2 - Add Styling</h5>
<p>Create a new file, <code>/public/styles.css</code> and paste in the following stylesheet.</p>
<style>
.language-css {
height: 500px;
}
</style>
<pre><code class="language-css">/* Global */
:root {
  --black: #111111;
  --light-grey: #d6d6d6;
  --highlight: yellow;
}

body {
  background: var(--black);
  color: var(--light-grey);
  font-family: 'Roboto Mono', monospace;
  height: 100vh;
  display: flex;
  padding: 0;
  margin: 0;
}

div { box-sizing: border-box; }
input, textarea, select { font-family: inherit; font-size: small; }
textarea:focus, input:focus { outline: none; }

.full-width { width: 100%; }
.green { color: green; }
.red { color: red; }
.yellow { color: yellow; }
.center-x { margin: 0 auto; }
.center-text { width: 100%; text-align: center; }

h1, h2, h3 { font-family: 'Montserrat', sans-serif; }
h1 { font-size: medium; }
h2 { font-size: small; font-weight: 300; }
h3 { font-size: x-small; font-weight: 300; }
p { font-size: x-small; }

.clearfix:after {
   visibility: hidden;
   display: block;
   height: 0;
   clear: both;
}

#vue-instance {
  display: flex;
  flex-direction: row;
  flex: 1 0 100%;
  overflow-x: hidden;
}

/** Chat Window **/
.chat-container {
  flex: 0 0 60%;
  word-wrap: break-word;
  overflow-x: hidden;
  overflow-y: scroll;
  padding: 6px;
  margin-bottom: 50px;
}

.message &gt; p { font-size: small; }
.title-header &gt; p {
  font-family: 'Montserrat', sans-serif;
  font-weight: 300;
}

/* Info Panel */
.info-container {
  flex: 0 0 40%;
  border-left: solid 1px var(--light-grey);
  padding: 12px;
  overflow-x: hidden;
  overflow-y: scroll;
  margin-bottom: 50px;
  position: relative;
  justify-content: space-around;
  display: flex;
  flex-direction: column;
}

.divider {
  padding-top: 1px;
  max-height: 0px;
  min-width: 200%;
  background: var(--light-grey);
  margin: 12px -12px;
  flex: 1 0;
}

.notification-list {
  display: flex;
  flex-direction: column;
  overflow: scroll;
  padding-bottom: 24px;
  flex: 1 0 40%;
}

.notification {
  font-family: 'Montserrat', sans-serif;
  font-weight: 300;
  font-size: small;
  padding: 4px 0;
  display: inline-flex;
}

.notification-timestamp {
  flex: 0 0 20%;
  padding-right: 12px;
}

.notification-message { flex: 0 0 80%; }
.notification:last-child {
  margin-bottom: 24px;
}

.keys {
  display: block;
  font-size: xx-small;
  overflow-x: hidden;
  overflow-y: scroll;
}

.keys &gt; .divider {
  width: 75%;
  min-width: 0;
  margin: 16px auto;
}

.key { overflow: scroll; }

.room-select {
  display: flex;
  min-height: 24px;
  font-family: 'Montserrat', sans-serif;
  font-weight: 300;
}

#room-input {
    flex: 0 0 60%;
    background: none;
    border: none;
    border-bottom: 1px solid var(--light-grey);
    border-top: 1px solid var(--light-grey);
    border-left: 1px solid var(--light-grey);
    color: var(--light-grey);
    padding: 4px;
}

.yellow-button {
  flex: 0 0 30%;
  background: none;
  border: 1px solid var(--highlight);
  color: var(--highlight);
  cursor: pointer;
}

.yellow-button:hover {
  background: var(--highlight);
  color: var(--black);
}

.yellow &gt; a { color: var(--highlight); }

.loader {
    border: 4px solid black;
    border-top: 4px solid var(--highlight);
    border-radius: 50%;
    width: 48px;
    height: 48px;
    animation: spin 2s linear infinite;
}

@keyframes spin {
    0% { transform: rotate(0deg); }
    100% { transform: rotate(360deg); }
}

/* Message Input Bar */
.message-input {
  background: none;
  border: none;
  color: var(--light-grey);
  width: 90%;
}

.bottom-bar {
  border-top: solid 1px var(--light-grey);
  background: var(--black);
  position: fixed;
  bottom: 0;
  left: 0;
  padding: 12px;
  height: 48px;
}

.message-list {
  margin-bottom: 40px;
}
</code></pre>
<br>
<p>We won't really be going into the CSS, but I can assure you that it's all fairly straightforward.</p>
<p>For the sake of simplicity, we won't bother to add a build system to our frontend.  In my opinion, a build system just isn't necessary for an app this simple (the total gzipped payload of the completed app is under 100kb).  You are very welcome (and encouraged, since it will allow the app to be backwards-compatible with outdated browsers) to add a build system such as <a href="https://webpack.js.org/">Webpack</a>, <a href="https://gulpjs.com/">Gulp</a>, or <a href="https://rollupjs.org/">Rollup</a> if you decide to fork this code into your own project.</p>
<h3 id="13tryitout">1.3 - Try it out</h3>
<p>Try running <code>npm start</code> on the command-line.  You should see the command-line output <code>Chat server listening on port 3000.</code>.  Open <code>http://localhost:3000</code> in your browser, and you should see a very dark, empty web app displaying &quot;Hello World&quot; on the right side of the page.</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/e2e-chat/screenshot_1.png" alt="An Introduction To Utilizing Public-Key Cryptography In Javascript"></p>
<h2 id="2basicmessaging">2 - Basic Messaging</h2>
<p>Now that the baseline project scaffolding is in place, we'll start by adding basic (unencrypted) real-time messaging.</p>
<h3 id="20setupserversidesocketlisteners">2.0 - Setup Server-Side Socket Listeners</h3>
<p>In <code>/app.js</code>, add the following code directly below the <code>// INSERT SOCKET.IO CODE HERE</code> marker.</p>
<pre><code class="language-javascript">/** Manage behavior of each client socket connection */
io.on('connection', (socket) =&gt; {
  console.log(`User Connected - Socket ID ${socket.id}`)

  // Store the room that the socket is connected to
  let currentRoom = 'DEFAULT'

  /** Process a room join request. */
  socket.on('JOIN', (roomName) =&gt; {
    socket.join(currentRoom)

    // Notify user of room join success
    io.to(socket.id).emit('ROOM_JOINED', currentRoom)

    // Notify room that user has joined
    socket.broadcast.to(currentRoom).emit('NEW_CONNECTION', null)
  })

  /** Broadcast a received message to the room */
  socket.on('MESSAGE', (msg) =&gt; {
    console.log(`New Message - ${msg.text}`)
    socket.broadcast.to(currentRoom).emit('MESSAGE', msg)
  })
})
</code></pre>
<br>
<p>This code-block will create a connection listener that will manage any clients who connect to the server from the front-end application.  Currently, it just adds them to a <code>DEFAULT</code> chat room, and retransmits any message that it receives to the rest of the users in the room.</p>
<h3 id="21setupclientsidesocketlisteners">2.1 - Setup Client-Side Socket Listeners</h3>
<p>Within the frontend, we'll add some code to connect to the server.  Replace the <code>created</code> function in <code>/public/page.js</code> with the following.</p>
<pre><code class="language-javascript">created () {
  // Initialize socket.io
  this.socket = io()
  this.setupSocketListeners()
},
</code></pre>
<br>
<p>Next, we'll need to add a few custom functions to manage the client-side socket connection and to send/receive messages.  Add the following to <code>/public/page.js</code> inside the <code>methods</code> block of the Vue app object.</p>
<pre><code class="language-javascript">/** Setup Socket.io event listeners */
setupSocketListeners () {
  // Automatically join default room on connect
  this.socket.on('connect', () =&gt; {
    this.addNotification('Connected To Server.')
    this.joinRoom()
  })

  // Notify user that they have lost the socket connection
  this.socket.on('disconnect', () =&gt; this.addNotification('Lost Connection'))

  // Display message when received
  this.socket.on('MESSAGE', (message) =&gt; {
    this.addMessage(message)
  })
},

/** Send the current draft message */
sendMessage () {
  // Don't send message if there is nothing to send
  if (!this.draft || this.draft === '') { return }

  const message = this.draft

  // Reset the UI input draft text
  this.draft = ''

  // Instantly add message to local UI
  this.addMessage(message)

  // Emit the message
  this.socket.emit('MESSAGE', message)
},

/** Join the chatroom */
joinRoom () {
  this.socket.emit('JOIN')
},

/** Add message to UI */
addMessage (message) {
  this.messages.push(message)
},
</code></pre>
<br>
<h3 id="22displaymessagesinui">2.2 - Display Messages in UI</h3>
<p>Finally, we'll need to provide a UI to send and display messages.</p>
<p>In order to display all messages in the current chat, add the following to <code>/public/index.html</code> after the <code>&lt;!-- Add Chat Container Here --&gt;</code> comment.</p>
<pre><code class="language-html">&lt;div class=&quot;chat-container full-width&quot; ref=&quot;chatContainer&quot;&gt;
  &lt;div class=&quot;message-list&quot;&gt;
    &lt;div class=&quot;message full-width&quot; v-for=&quot;message in messages&quot;&gt;
      &lt;p&gt;
      &gt; {{ message }}
      &lt;/p&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;
</code></pre>
<br>
<p>To add a text input bar for the user to write messages in, add the following to <code>/public/index.html</code>, after the <code>&lt;!-- Add Bottom Bar Here --&gt;</code> comment.</p>
<pre><code class="language-html">&lt;div class=&quot;bottom-bar full-width&quot;&gt;
  &gt; &lt;input class=&quot;message-input&quot; type=&quot;text&quot; placeholder=&quot;Message&quot; v-model=&quot;draft&quot; @keyup.enter=&quot;sendMessage()&quot;&gt;
&lt;/div&gt;
</code></pre>
<br>
<p>Now, restart the server and open <code>http://localhost:3000</code> in two separate tabs/windows.  Try sending messages back and forth between the tabs.  In the command-line, you should be able to see a server log of messages being sent.</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/e2e-chat/screenshot_2.png" alt="An Introduction To Utilizing Public-Key Cryptography In Javascript"><br>
<img src="https://cdn.patricktriest.com/blog/images/posts/e2e-chat/screenshot_3.png" alt="An Introduction To Utilizing Public-Key Cryptography In Javascript"></p>
<h2 id="encryption101">Encryption 101</h2>
<p>Cool, now we have a real-time messaging application.  Before adding end-to-end encryption, it's important to have a basic understanding of how asymmetric encryption works.</p>
<h4 id="symetricencryptiononewayfunctions">Symmetric Encryption &amp; One-Way Functions</h4>
<p>Let's say we're trading secret numbers.  We're sending the numbers through a third party, but we don't want the third party to know which number we are exchanging.</p>
<p>In order to accomplish this, we'll exchange a shared secret first - let's use <code>7</code>.</p>
<p>To encrypt the message, we'll first multiply our shared secret (<code>7</code>) by a random number <code>n</code>, and add a value <code>x</code> to the result.  In this equation, <code>x</code> represents the number that we want to send and <code>y</code> represents the encrypted result.</p>
<p><code>(7 * n) + x = y</code></p>
<p>We can then use modular arithmetic in order to transform an encrypted input into the decrypted output.</p>
<p><code>y mod 7 = x</code></p>
<p>Here, <code>y</code> is the exposed (encrypted) message and <code>x</code> is the original unencrypted message.</p>
<p>If one of us wants to exchange the number <code>2</code>, we could compute <code>(7*4) + 2</code> and send <code>30</code> as a message.  We both know the secret key (<code>7</code>), so we'll both be able to calculate <code>30 mod 7</code> and determine that <code>2</code> was the original number.</p>
<p>The original number (<code>2</code>) is effectively hidden from anyone listening in the middle, since the only message passed between us was <code>30</code>.  Even if a third party were able to retrieve both the encrypted result (<code>30</code>) and the unencrypted value (<code>2</code>), they would still not know the value of the secret key.  In this example, <code>30 mod 14</code> and <code>30 mod 28</code> are also equal to <code>2</code>, so an interceptor could not know for certain whether the secret key is <code>7</code>, <code>14</code>, or <code>28</code>, and therefore could not dependably decipher the next message.</p>
<p>Modulo is thus considered a &quot;one-way&quot; function since it cannot be trivially reversed.</p>
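<p>The number-trading scheme above can be expressed in a few lines of Javascript.  This is purely illustrative (it is <em>not</em> real cryptography), and note that it only works when the hidden number is smaller than the shared secret:</p>

```javascript
const sharedSecret = 7 // exchanged ahead of time

/** "Encrypt" x by hiding it inside a random multiple of the shared secret */
function encrypt (x) {
  const n = Math.floor(Math.random() * 100) + 1 // random multiplier
  return (sharedSecret * n) + x
}

/** "Decrypt" y - the modulo operation recovers the original number */
function decrypt (y) {
  return y % sharedSecret
}

console.log(decrypt(30))         // (7 * 4) + 2 = 30, and 30 mod 7 = 2
console.log(decrypt(encrypt(2))) // always recovers 2
```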
<p>Modern encryption algorithms are, to vastly simplify and generalize, very complex applications of this general principle.  Through the use of large prime numbers, modular exponentiation, long private keys, and multiple rounds of cipher transformations, these algorithms generally take a very inconvenient amount of time (1+ million years) to crack.</p>
<blockquote>
<p>Quantum computers could, theoretically, crack these ciphers more quickly.  You can read more about this <a href="https://www.infoworld.com/article/3040991/security/mits-new-5-atom-quantum-computer-could-make-todays-encryption-obsolete.html">here</a>.  This technology is still in its infancy, so we probably don't need to worry about encrypted data being compromised in this manner just yet.</p>
</blockquote>
<p>The above example assumes that both parties were able to exchange a secret (in this case <code>7</code>) ahead of time.  This is called <em>symmetric encryption</em> since the same secret key is used for both encrypting and decrypting the message.  On the internet, however, this is often not a viable option - we need a way to send encrypted messages without requiring offline coordination to decide on a shared secret.  This is where asymmetric encryption comes into play.</p>
<h4 id="publickeycryptography">Public Key Cryptography</h4>
<p>In contrast to symmetric encryption, public key cryptography (asymmetric encryption) uses pairs of keys (one public, one private) instead of a single shared secret - <em>public keys</em> are for encrypting data, and <em>private keys</em> are for decrypting data.</p>
<p>A <em>public key</em> is like an open box with an unbreakable lock.  If someone wants to send you a message, they can place that message in your public box, and close the lid to lock it.  The message can now be sent, to be delivered by an untrusted party without needing to worry about the contents being exposed.  Once I receive the box, I'll unlock it with my <em>private key</em> - the only existing key which can open that box.</p>
<p>Exchanging <em>public keys</em> is like exchanging those boxes - each private key is kept safe with the original owner, so the contents of the box are safe in transit.</p>
<p>This is, of course, a bare-bones simplification of how public key cryptography works.  If you're curious to learn more (especially regarding the history and mathematical basis for these techniques) I would strongly recommend starting with these two videos.</p>
<iframe width="560" height="315" src="https://www.youtube.com/embed/YEBfamv-_do" frameborder="0" gesture="media" allow="encrypted-media" allowfullscreen></iframe>
<iframe width="560" height="315" src="https://www.youtube.com/embed/wXB-V_Keiu8" frameborder="0" gesture="media" allow="encrypted-media" allowfullscreen></iframe>
<h2 id="3cryptowebworker">3 - Crypto Web Worker</h2>
<p>Cryptographic operations tend to be computationally intensive.  Since Javascript is single-threaded, doing these operations on the main UI thread will cause the browser to freeze for a few seconds.</p>
<blockquote>
<p>Wrapping the operations in a promise will not help, since promises are for managing asynchronous operations on a single thread, and do not provide any performance benefit for computationally intensive tasks.</p>
</blockquote>
<p>In order to keep the application performant, we will use a <a href="https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Using_web_workers">Web Worker</a> to perform cryptographic computations on a separate browser thread.  We'll be using <a href="https://github.com/travist/jsencrypt">JSEncrypt</a>, a reputable Javascript RSA implementation originating from Stanford.  Using JSEncrypt, we'll create a few helper functions for encryption, decryption, and key pair generation.</p>
<h3 id="30createwebworkertowrapthejsencryptmethods">3.0 - Create Web Worker To Wrap the JSencrypt Methods</h3>
<p>Add a new file called <code>crypto-worker.js</code> in the <code>public</code> directory.  This file will store our web worker code in order to perform encryption operations on a separate browser thread.</p>
<pre><code class="language-javascript">self.window = self // This is required for the jsencrypt library to work within the web worker

// Import the jsencrypt library
self.importScripts('https://cdnjs.cloudflare.com/ajax/libs/jsencrypt/2.3.1/jsencrypt.min.js');

let crypt = null
let privateKey = null

/** Webworker onmessage listener */
onmessage = function(e) {
  const [ messageType, messageId, text, key ] = e.data
  let result
  switch (messageType) {
    case 'generate-keys':
      result = generateKeypair()
      break
    case 'encrypt':
      result = encrypt(text, key)
      break
    case 'decrypt':
      result = decrypt(text)
      break
  }

  // Return result to the UI thread
  postMessage([ messageId, result ])
}

/** Generate and store keypair */
function generateKeypair () {
  crypt = new JSEncrypt({default_key_size: 2048}) // RSA-2048, matching the rest of this tutorial
  privateKey = crypt.getPrivateKey()

  // Only return the public key, keep the private key hidden
  return crypt.getPublicKey()
}

/** Encrypt the provided string with the destination public key */
function encrypt (content, publicKey) {
  crypt.setKey(publicKey)
  return crypt.encrypt(content)
}

/** Decrypt the provided string with the local private key */
function decrypt (content) {
  crypt.setKey(privateKey)
  return crypt.decrypt(content)
}
</code></pre>
<br>
<p>This web worker will receive messages from the UI thread in the <code>onmessage</code> listener, perform the requested operation, and post the result back to the UI thread.  The private encryption key is never directly exposed to the UI thread, which helps to mitigate the potential for key theft from a <a href="https://www.owasp.org/index.php/Cross-site_Scripting_(XSS)">cross-site scripting (XSS) attack</a>.</p>
<h3 id="31configurevueapptocommunicatewithwebworker">3.1 - Configure Vue App To Communicate with Web Worker</h3>
<p>Next, we'll configure the UI controller to communicate with the web worker.  Sequential call/response communications using event listeners can be painful to synchronize.  To simplify this, we'll create a utility function that wraps the entire communication lifecycle in a promise.  Add the following code to the <code>methods</code> block in <code>/public/page.js</code>.</p>
<pre><code class="language-javascript">/** Post a message to the web worker and return a promise that will resolve with the response.  */
getWebWorkerResponse (messageType, messagePayload) {
  return new Promise((resolve, reject) =&gt; {
    // Generate a random message id to identify the corresponding event callback
    const messageId = Math.floor(Math.random() * 100000)

    // Post the message to the webworker
    this.cryptWorker.postMessage([messageType, messageId].concat(messagePayload))

    // Create a handler for the webworker message event
    const handler = function (e) {
      // Only handle messages with the matching message id
      if (e.data[0] === messageId) {
        // Remove the event listener once the listener has been called.
        e.currentTarget.removeEventListener(e.type, handler)

        // Resolve the promise with the message payload.
        resolve(e.data[1])
      }
    }

    // Assign the handler to the webworker 'message' event.
    this.cryptWorker.addEventListener('message', handler)
  })
}
</code></pre>
<br>
<p>This code will allow us to trigger an operation on the web worker thread and receive the result in a promise.  This can be a very useful helper function in any project that outsources call/response processing to web workers.</p>
<h2 id="4keyexchange">4 - Key Exchange</h2>
<p>In our app, the first step will be generating a public-private key pair for each user.  Then, once the users are in the same chat, we will exchange <em>public keys</em> so that each user can encrypt messages which only the other user can decrypt.  Hence, we will always encrypt messages using the recipient's <em>public key</em>, and we will always decrypt messages using the recipient's <em>private key</em>.</p>
<h3 id="40addserversidesocketlistenertotransmitpublickeys">4.0 - Add Server-Side Socket Listener To Transmit Public Keys</h3>
<p>On the server-side, we'll need a new socket listener that will receive a public key from a client and re-broadcast it to the rest of the room.  We'll also add a listener to let clients know when someone has disconnected from the current room.</p>
<p>Add the following listeners to <code>/app.js</code> within the <code>io.on('connection', (socket) =&gt; { ... }</code> callback.</p>
<pre><code class="language-javascript">/** Broadcast a new publickey to the room */
socket.on('PUBLIC_KEY', (key) =&gt; {
  socket.broadcast.to(currentRoom).emit('PUBLIC_KEY', key)
})

/** Broadcast a disconnection notification to the room */
socket.on('disconnect', () =&gt; {
  socket.broadcast.to(currentRoom).emit('USER_DISCONNECTED', null)
})
</code></pre>
<br>
<h3 id="41generatekeypair">4.1 - Generate Key Pair</h3>
<p>Next, we'll replace the <code>created</code> function in <code>/public/page.js</code> to initialize the web worker and generate a new key pair.</p>
<pre><code class="language-javascript">async created () {
  this.addNotification('Welcome! Generating a new keypair now.')

  // Initialize crypto webworker thread
  this.cryptWorker = new Worker('crypto-worker.js')

  // Generate keypair and join default room
  this.originPublicKey = await this.getWebWorkerResponse('generate-keys')
  this.addNotification('Keypair Generated')

  // Initialize socketio
  this.socket = io()
  this.setupSocketListeners()
},
</code></pre>
<br>
<p>We are using the <a href="https://blog.patricktriest.com/what-is-async-await-why-should-you-care/">async/await syntax</a> to receive the web worker promise result with a single line of code.</p>
<h3 id="42addpublickeyhelperfunctions">4.2 - Add Public Key Helper Functions</h3>
<p>We'll also add a few new functions to <code>/public/page.js</code>: one to send the public key, and one to trim the key down to a short human-readable identifier.</p>
<pre><code class="language-javascript">/** Emit the public key to all users in the chatroom */
sendPublicKey () {
  if (this.originPublicKey) {
    this.socket.emit('PUBLIC_KEY', this.originPublicKey)
  }
},

/** Get key snippet for display purposes */
getKeySnippet (key) {
  return key.slice(400, 416)
},
</code></pre>
<br>
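<p>The hard-coded <code>slice(400, 416)</code> works because every PEM-encoded RSA public key starts with the same boilerplate (<code>-----BEGIN PUBLIC KEY-----</code> plus a shared ASN.1 prefix), so a 16-character excerpt taken deep into the key body is where two keys actually differ.  A quick illustration (the <code>fakeKey</code> string is a made-up stand-in, not a real key):</p>

```javascript
// Stand-in key: 400 characters of shared "boilerplate" followed by the
// portion that is unique to this key.
const fakeKey = 'A'.repeat(400) + 'UNIQUEPART1234567890'

// Same snippet logic as in the app.
function getKeySnippet (key) {
  return key.slice(400, 416)
}

console.log(getKeySnippet(fakeKey)) // → 'UNIQUEPART123456'
```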
<h3 id="43sendandreceivepublickey">4.3 - Send and Receive Public Key</h3>
<p>Next, we'll add some listeners to the client-side socket code, in order to send the local public key whenever a new user joins the room, and to save the public key sent by the other user.</p>
<p>Add the following to <code>/public/page.js</code> within the <code>setupSocketListeners</code> function.</p>
<pre><code class="language-javascript">// When a user joins the current room, send them your public key
this.socket.on('NEW_CONNECTION', () =&gt; {
  this.addNotification('Another user joined the room.')
  this.sendPublicKey()
})

// Broadcast public key when a new room is joined
this.socket.on('ROOM_JOINED', (newRoom) =&gt; {
  this.currentRoom = newRoom
  this.addNotification(`Joined Room - ${this.currentRoom}`)
  this.sendPublicKey()
})

// Save public key when received
this.socket.on('PUBLIC_KEY', (key) =&gt; {
  this.addNotification(`Public Key Received - ${this.getKeySnippet(key)}`)
  this.destinationPublicKey = key
})

// Clear destination public key if other user leaves room
this.socket.on('USER_DISCONNECTED', () =&gt; {
  this.addNotification(`User Disconnected - ${this.getKeySnippet(this.destinationPublicKey)}`)
  this.destinationPublicKey = null
})
</code></pre>
<br>
<h3 id="44showpublickeysinui">4.4 - Show Public Keys In UI</h3>
<p>Finally, we'll add some HTML to display the two public keys.</p>
<p>Add the following to <code>/public/index.html</code>, directly below the <code>&lt;!-- Add Encryption Key UI Here --&gt;</code> comment.</p>
<pre><code class="language-html">&lt;div class=&quot;divider&quot;&gt;&lt;/div&gt;
&lt;div class=&quot;keys full-width&quot;&gt;
  &lt;h1&gt;KEYS&lt;/h1&gt;
  &lt;h2&gt;THEIR PUBLIC KEY&lt;/h2&gt;
  &lt;div class=&quot;key red&quot; v-if=&quot;destinationPublicKey&quot;&gt;
    &lt;h3&gt;TRUNCATED IDENTIFIER - {{ getKeySnippet(destinationPublicKey) }}&lt;/h3&gt;
    &lt;p&gt;{{ destinationPublicKey }}&lt;/p&gt;
  &lt;/div&gt;
  &lt;h2 v-else&gt;Waiting for second user to join room...&lt;/h2&gt;
  &lt;div class=&quot;divider&quot;&gt;&lt;/div&gt;
  &lt;h2&gt;YOUR PUBLIC KEY&lt;/h2&gt;
  &lt;div class=&quot;key green&quot; v-if=&quot;originPublicKey&quot;&gt;
    &lt;h3&gt;TRUNCATED IDENTIFIER - {{ getKeySnippet(originPublicKey) }}&lt;/h3&gt;
    &lt;p&gt;{{ originPublicKey }}&lt;/p&gt;
  &lt;/div&gt;
  &lt;div class=&quot;keypair-loader full-width&quot; v-else&gt;
    &lt;div class=&quot;center-x loader&quot;&gt;&lt;/div&gt;
    &lt;h2 class=&quot;center-text&quot;&gt;Generating Keypair...&lt;/h2&gt;
  &lt;/div&gt;
&lt;/div&gt;
</code></pre>
<br>
<p>Try restarting the app and reloading <code>http://localhost:3000</code>.  You should be able to simulate a successful key exchange by opening two browser tabs.</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/e2e-chat/screenshot_4.png" alt="An Introduction To Utilizing Public-Key Cryptography In Javascript"></p>
<blockquote>
<p>Opening the web app in more than two tabs will break the key exchange.  We'll fix this further down.</p>
</blockquote>
<h2 id="5messageencryption">5 - Message Encryption</h2>
<p>Now that the key exchange is complete, encrypting and decrypting messages within the web app is rather straightforward.</p>
<h3 id="50encryptmessagebeforesending">5.0 - Encrypt Message Before Sending</h3>
<p>Replace the <code>sendMessage</code> function in <code>/public/page.js</code> with the following.</p>
<pre><code class="language-javascript">/** Encrypt and emit the current draft message */
async sendMessage () {
  // Don't send message if there is nothing to send
  if (!this.draft || this.draft === '') { return }

  // Use immutable.js to avoid unintended side-effects.
  let message = Immutable.Map({
    text: this.draft,
    recipient: this.destinationPublicKey,
    sender: this.originPublicKey
  })

  // Reset the UI input draft text
  this.draft = ''

  // Instantly add (unencrypted) message to local UI
  this.addMessage(message.toObject())

  if (this.destinationPublicKey) {
    // Encrypt message with the public key of the other user
    const encryptedText = await this.getWebWorkerResponse(
      'encrypt', [ message.get('text'), this.destinationPublicKey ])
    const encryptedMsg = message.set('text', encryptedText)

    // Emit the encrypted message
    this.socket.emit('MESSAGE', encryptedMsg.toObject())
  }
},
</code></pre>
<br>
<h3 id="51receiveanddecryptmessage">5.1 - Receive and Decrypt Message</h3>
<p>Modify the client-side <code>message</code> listener in <code>/public/page.js</code> to decrypt the message once it is received.</p>
<pre><code class="language-javascript">// Decrypt and display message when received
this.socket.on('MESSAGE', async (message) =&gt; {
  // Only decrypt messages that were encrypted with the user's public key
  if (message.recipient === this.originPublicKey) {
    // Decrypt the message text in the webworker thread
    message.text = await this.getWebWorkerResponse('decrypt', message.text)

    // Instantly add (unencrypted) message to local UI
    this.addMessage(message)
  }
})
</code></pre>
<br>
<h3 id="52displaymessagelist">5.2 - Display Message List</h3>
<p>Modify the message list UI in <code>/public/index.html</code> (inside the <code>chat-container</code>) to display the decrypted message and the abbreviated public key of the sender.</p>
<pre><code class="language-html">&lt;div class=&quot;message full-width&quot; v-for=&quot;message in messages&quot;&gt;
  &lt;p&gt;
    &lt;span v-bind:class=&quot;(message.sender == originPublicKey) ? 'green' : 'red'&quot;&gt;{{ getKeySnippet(message.sender) }}&lt;/span&gt;
    &gt; {{ message.text }}
  &lt;/p&gt;
&lt;/div&gt;
</code></pre>
<br>
<h3 id="53tryitout">5.3 - Try It Out</h3>
<p>Try restarting the server and reloading the page at <code>http://localhost:3000</code>.  The UI should look mostly unchanged from how it was before, besides displaying the public key snippet of whoever sent each message.</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/e2e-chat/screenshot_5.png" alt="An Introduction To Utilizing Public-Key Cryptography In Javascript"><br>
<img src="https://cdn.patricktriest.com/blog/images/posts/e2e-chat/screenshot_6.png" alt="An Introduction To Utilizing Public-Key Cryptography In Javascript"></p>
<p>In the command-line output, the messages are no longer readable - they now display as garbled encrypted text.</p>
<h2 id="6chatrooms">6 - Chatrooms</h2>
<p>You may have noticed a massive flaw in the current app - if we open a third tab running the web app, the encryption system breaks.  Asymmetric encryption is designed for one-to-one scenarios; there's no way to encrypt a message <em>once</em> and have it be decryptable by <em>two</em> separate users.</p>
<p>This leaves us with two options -</p>
<ol>
<li>Encrypt and send a separate copy of the message to each user, if there is more than one.</li>
<li>Restrict each chat room to only allow two users at a time.</li>
</ol>
<p>Since this tutorial is already quite long, we'll go with the second, simpler option.</p>
<h3 id="60serversideroomjoinlogic">6.0 - Server-side Room Join Logic</h3>
<p>In order to enforce this new two-user limit, we'll modify the server-side socket <code>JOIN</code> listener in <code>/app.js</code>, at the top of the socket connection listener block.</p>
<pre><code class="language-javascript">// Store the room that the socket is connected to
// If you need to scale the app horizontally, you'll need to store this variable in a persistent store such as Redis.
// For more info, see here: https://github.com/socketio/socket.io-redis
let currentRoom = null

/** Process a room join request. */
socket.on('JOIN', (roomName) =&gt; {
  // Get chatroom info
  let room = io.sockets.adapter.rooms[roomName]

  // Reject join request if room already has more than 1 connection
  if (room &amp;&amp; room.length &gt; 1) {
    // Notify user that their join request was rejected
    io.to(socket.id).emit('ROOM_FULL', null)

    // Notify room that someone tried to join
    socket.broadcast.to(roomName).emit('INTRUSION_ATTEMPT', null)
  } else {
    // Leave current room
    socket.leave(currentRoom)

    // Notify room that user has left
    socket.broadcast.to(currentRoom).emit('USER_DISCONNECTED', null)

    // Join new room
    currentRoom = roomName
    socket.join(currentRoom)

    // Notify user of room join success
    io.to(socket.id).emit('ROOM_JOINED', currentRoom)

    // Notify room that user has joined
    socket.broadcast.to(currentRoom).emit('NEW_CONNECTION', null)
  }
})
</code></pre>
<br>
<p>This modified socket logic will prevent a user from joining any room that already has two users.</p>
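<p>Stripped of the socket plumbing, the join decision is a simple capacity check, which makes the two-user limit easy to reason about (the <code>canJoin</code> helper below is illustrative, not part of the app):</p>

```javascript
// A socket.io room object exposes a `length` property with the current
// connection count; the join request is rejected once it exceeds 1.
function canJoin (room) {
  return !room || room.length <= 1
}

console.log(canJoin(undefined))     // → true  (room doesn't exist yet)
console.log(canJoin({ length: 1 })) // → true  (one user waiting)
console.log(canJoin({ length: 2 })) // → false (room is full)
```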
<h3 id="61joinroomfromtheclientside">6.1 - Join Room From The Client Side</h3>
<p>Next, we'll modify our client-side <code>joinRoom</code> function in <code>/public/page.js</code>, in order to reset the state of the chat when switching rooms.</p>
<pre><code class="language-javascript">/** Join the specified chatroom */
joinRoom () {
  if (this.pendingRoom !== this.currentRoom &amp;&amp; this.originPublicKey) {
    this.addNotification(`Connecting to Room - ${this.pendingRoom}`)

    // Reset room state variables
    this.messages = []
    this.destinationPublicKey = null

    // Emit room join request.
    this.socket.emit('JOIN', this.pendingRoom)
  }
},
</code></pre>
<br>
<h3 id="62addnotifications">6.2 - Add Notifications</h3>
<p>Let's create two more client-side socket listeners (within the <code>setupSocketListeners</code> function in <code>/public/page.js</code>), to notify us whenever a join request is rejected.</p>
<pre><code class="language-javascript">// Notify user that the room they are attempting to join is full
this.socket.on('ROOM_FULL', () =&gt; {
  this.addNotification(`Cannot join ${this.pendingRoom}, room is full`)

  // Join a random room as a fallback
  this.pendingRoom = Math.floor(Math.random() * 1000)
  this.joinRoom()
})

// Notify room that someone attempted to join
this.socket.on('INTRUSION_ATTEMPT', () =&gt; {
  this.addNotification('A third user attempted to join the room.')
})
</code></pre>
<br>
<h3 id="63addroomjoinui">6.3 - Add Room Join UI</h3>
<p>Finally, we'll add some HTML to provide an interface for the user to join a room of their choosing.</p>
<p>Add the following to <code>/public/index.html</code> below the <code>&lt;!-- Add Room UI Here --&gt;</code> comment.</p>
<pre><code class="language-html">&lt;h1&gt;CHATROOM&lt;/h1&gt;
&lt;div class=&quot;room-select&quot;&gt;
  &lt;input type=&quot;text&quot; class=&quot;full-width&quot; placeholder=&quot;Room Name&quot; id=&quot;room-input&quot; v-model=&quot;pendingRoom&quot; @keyup.enter=&quot;joinRoom()&quot;&gt;
  &lt;input class=&quot;yellow-button full-width&quot; type=&quot;submit&quot; v-on:click=&quot;joinRoom()&quot; value=&quot;JOIN&quot;&gt;
&lt;/div&gt;
&lt;div class=&quot;divider&quot;&gt;&lt;/div&gt;
</code></pre>
<br>
<h3 id="64addautoscroll">6.4 - Add Autoscroll</h3>
<p>An annoying bug remaining in the app is that the notification and chat lists do not yet auto-scroll to display new messages.</p>
<p>In <code>/public/page.js</code>, add the following function to the <code>methods</code> block.</p>
<pre><code class="language-javascript">/** Autoscroll DOM element to bottom */
autoscroll (element) {
  if (element) { element.scrollTop = element.scrollHeight }
},
</code></pre>
<br>
<p>To auto-scroll the notification and message lists, we'll call <code>autoscroll</code> at the end of their respective <code>add</code> methods.</p>
<pre><code class="language-javascript">/** Add message to UI and scroll the view to display the new message. */
addMessage (message) {
  this.messages.push(message)
  this.autoscroll(this.$refs.chatContainer)
},

/** Append a notification message in the UI */
addNotification (message) {
  const timestamp = new Date().toLocaleTimeString()
  this.notifications.push({ message, timestamp })
  this.autoscroll(this.$refs.notificationContainer)
},
</code></pre>
<br>
<h3 id="65tryitout">6.5 - Try it out</h3>
<p>That was the last step!  Try restarting the node app and reloading the page at <code>localhost:3000</code>.  You should now be able to freely switch between rooms, and any attempt to join the same room from a third browser tab will be rejected.</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/e2e-chat/screenshot_7.png" alt="An Introduction To Utilizing Public-Key Cryptography In Javascript"></p>
<h2 id="7whatnext">7 - What next?</h2>
<p>Congrats! You have just built a completely functional end-to-end encrypted messaging app.</p>
<p>Github Repository - <a href="https://github.com/triestpa/Open-Cryptochat">https://github.com/triestpa/Open-Cryptochat</a><br>
Live Preview - <a href="https://chat.patricktriest.com">https://chat.patricktriest.com</a></p>
<p>Using this baseline source code you could deploy a private messaging app on your own servers.  In order to coordinate which room to meet in, one slick option could be using a time-based pseudo-random number generator (such as <a href="https://play.google.com/store/apps/details?id=com.google.android.apps.authenticator2&amp;hl=en">Google Authenticator</a>), with a shared seed between you and a second party (I've got a Javascript &quot;Google Authenticator&quot; clone tutorial in the works - stay tuned).</p>
<h3 id="furtherimprovements">Further Improvements</h3>
<p>There are lots of ways to build up the app from here:</p>
<ul>
<li>Group chats, by storing multiple public keys, and encrypting the message for each user individually.</li>
<li>Multimedia messages, by encrypting a byte-array containing the media file.</li>
<li>Import and export key pairs as local files.</li>
<li>Sign messages with the private key for sender identity verification.  This is a trade-off because it increases the difficulty of fabricating messages, but also undermines the goal of &quot;deniable authentication&quot; as outlined in the <a href="https://en.wikipedia.org/wiki/Off-the-Record_Messaging">OTR messaging standard</a>.</li>
<li>Experiment with different encryption systems such as:
<ul>
<li><a href="https://en.wikipedia.org/wiki/Advanced_Encryption_Standard"><strong>AES</strong></a> - Symmetric encryption, with a shared secret between the users.  This is the only publicly available cipher approved by the NSA for protecting top-secret information.</li>
<li><a href="https://en.wikipedia.org/wiki/ElGamal_encryption"><strong>ElGamal</strong></a> - Similar to RSA, but with smaller cyphertexts, faster decryption, and slower encryption.  This is the core algorithm that is used in <a href="https://en.wikipedia.org/wiki/Pretty_Good_Privacy">PGP</a>.</li>
<li>Implement a <a href="https://en.wikipedia.org/wiki/Diffie%E2%80%93Hellman_key_exchange"><strong>Diffie-Hellman</strong></a> key exchange.  This is a technique for using asymmetric encryption (such as ElGamal) to exchange a shared secret, such as a symmetric encryption key (for AES).  Building this on top of our existing project and exchanging a new shared secret before each message is a good way to improve the security of the app (see <a href="https://en.wikipedia.org/wiki/Forward_secrecy">Perfect Forward Secrecy</a>).</li>
</ul>
</li>
<li>Build an app for virtually any use-case where intermediate servers should never have unencrypted access to the transmitted data, such as password-managers and P2P (peer-to-peer) networks.</li>
<li>Refactor the app for <a href="https://facebook.github.io/react-native/">React Native</a>, <a href="https://ionicframework.com/">Ionic</a>, <a href="https://cordova.apache.org/">Cordova</a>, or <a href="https://electronjs.org/">Electron</a> in order to provide a secure pre-built application bundle for mobile and/or desktop environments.</li>
</ul>
<p>Feel free to comment below with questions, responses, and/or feedback on the tutorial.</p>
<hr class="footnotes-sep">
<section class="footnotes">
<ol class="footnotes-list">
<li id="fn1" class="footnote-item"><p><strong>Security Implications Of Browser Based Encryption</strong><br><br>Please remember to be careful. The use of these protocols in a browser-based Javascript app is a great way to experiment and understand how they work in practice, but this app is not a suitable replacement for established, peer-reviewed encryption protocol implementations such as <a href="https://en.wikipedia.org/wiki/OpenSSL">OpenSSL</a> and <a href="https://en.wikipedia.org/wiki/GNU_Privacy_Guard">GnuPG</a>.<br><br> Client-side browser Javascript encryption is a controversial topic among security experts due to the vulnerabilities present in web application delivery versus pre-packaged software distributions that run outside the browser.  Many of these issues can be mitigated by utilizing HTTPS to prevent man-in-the-middle resource injection attacks, and by avoiding persistent storage of unencrypted sensitive data within the browser, but it is important to stay aware of potential vulnerabilities in the web platform. <a href="#fnref1" class="footnote-backref">↩︎</a></p>
</li>
</ol>
</section>
</div>]]></content:encoded></item><item><title><![CDATA[Exploring United States Policing Data Using Python]]></title><description><![CDATA[Use Python to analyze and visualize an open-source dataset of 60 million police stops from across the US.]]></description><link>http://blog.patricktriest.com/police-data-python/</link><guid isPermaLink="false">598eaf94b7d6af1a6a795fd8</guid><category><![CDATA[Python]]></category><category><![CDATA[Data Science]]></category><category><![CDATA[Guides]]></category><dc:creator><![CDATA[Patrick Triest]]></dc:creator><pubDate>Fri, 27 Oct 2017 06:00:00 GMT</pubDate><media:content url="https://cdn.patricktriest.com/blog/images/posts/policing-data/police_header.jpg" medium="image"/><content:encoded><![CDATA[<div class="kg-card-markdown"><h2 id="datasciencepoliticsandpolice">Data Science, Politics, and Police</h2>
<img src="https://cdn.patricktriest.com/blog/images/posts/policing-data/police_header.jpg" alt="Exploring United States Policing Data Using Python"><p>The intersection of science, politics, personal opinion, and social policy can be rather complex.  This junction of ideas and disciplines is often rife with controversies, strongly held viewpoints, and agendas that are often <a href="https://en.wikipedia.org/wiki/Global_warming_controversy">more based on belief than on empirical evidence</a>.  Data science is particularly important in this area since it provides a methodology for examining the world in a pragmatic fact-first manner, and is capable of providing insight into some of the most important issues that we face today.</p>
<p>The recent high-profile police shootings of unarmed black men, such as <a href="https://en.wikipedia.org/wiki/Shooting_of_Michael_Brown">Michael Brown</a> (2014), <a href="https://en.wikipedia.org/wiki/Shooting_of_Tamir_Rice">Tamir Rice</a> (2014), <a href="https://en.wikipedia.org/wiki/Shooting_of_Alton_Sterling">Alton Sterling</a> (2016), and <a href="https://en.wikipedia.org/wiki/Shooting_of_Philando_Castile">Philando Castile</a> (2016), have triggered a divisive national dialog on the issue of racial bias in policing.</p>
<p>These shootings have spurred the growth of large social movements seeking to raise awareness of what is viewed as the systemic targeting of people-of-color by police forces across the country.  On the other side of the political spectrum, many hold a view that the unbalanced targeting of non-white citizens is a myth created by the media based on a handful of extreme cases, and that these highly-publicized stories are not representative of the national norm.</p>
<p>In June 2017, a team of researchers at Stanford University collected and released an open-source data set of 60 million state police patrol stops from 20 states across the US.  In this tutorial, we will walk through how to analyze and visualize this data using Python.</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/policing-data/county_scatter_VT.png" alt="Exploring United States Policing Data Using Python"></p>
<p>The source code and figures for this analysis can be found in the companion Github repository - <a href="https://github.com/triestpa/Police-Analysis-Python">https://github.com/triestpa/Police-Analysis-Python</a></p>
<p>To preview the completed IPython notebook, visit the page <a href="https://github.com/triestpa/Police-Analysis-Python/blob/master/traffic_stop_analysis.ipynb">here</a>.</p>
<blockquote>
<p>This tutorial and analysis would not be possible without the work performed by <a href="https://openpolicing.stanford.edu/">The Stanford Open Policing Project</a>.  Much of the analysis performed in this tutorial is based on work that has already been performed by this team.  <a href="https://openpolicing.stanford.edu/tutorials/">A short tutorial</a> for working with the data using the R programming language is provided on the official project website.</p>
</blockquote>
<h2 id="thedata">The Data</h2>
<p>In the United States there are more than 50,000 traffic stops on a typical day.  The potential number of data points for each stop is huge, from the demographics (age, race, gender) of the driver, to the location, time of day, stop reason, stop outcome, car model, and much more.  Unfortunately, not every state makes this data available, and those that do often have different standards for which information is reported.  Different counties and districts within each state can also be inconsistent in how each traffic stop is recorded.  The <a href="https://openpolicing.stanford.edu/">research team at Stanford</a> has managed to gather traffic-stop data from twenty states, and has worked to regularize the reporting standards for 11 fields.</p>
<ul>
<li>Stop Date</li>
<li>Stop Time</li>
<li>Stop Location</li>
<li>Driver Race</li>
<li>Driver Gender</li>
<li>Driver Age</li>
<li>Stop Reason</li>
<li>Search Conducted</li>
<li>Search Type</li>
<li>Contraband Found</li>
<li>Stop Outcome</li>
</ul>
<p>Most states do not have data available for every field, but there is enough overlap between the data sets to provide a solid foundation for some very interesting analysis.</p>
<h2 id="0gettingstarted">0 - Getting Started</h2>
<p>We'll start with analyzing the data set for Vermont.  We're looking at Vermont first for a few reasons.</p>
<ol>
<li>The Vermont dataset is small enough to be very manageable and quick to operate on, with only 283,285 traffic stops (compared to the Texas data set, for instance, which contains almost 24 million records).</li>
<li>There is not much missing data, as all eleven fields mentioned above are covered.</li>
<li>Vermont is 94% white, but is also in a part of the country known for being very liberal (disclaimer - I grew up in the Boston area, and I've spent quite a bit of time in Vermont).  Many in this area consider this state to be very progressive and might like to believe that their state institutions are not as prone to systemic racism as the institutions in other parts of the country.  It will be interesting to determine if the data validates this view.</li>
</ol>
<h4 id="00downloaddatset">0.0 - Download Dataset</h4>
<p>First, download the Vermont traffic stop data - <a href="https://stacks.stanford.edu/file/druid:py883nd2578/VT-clean.csv.gz">https://stacks.stanford.edu/file/druid:py883nd2578/VT-clean.csv.gz</a></p>
<h4 id="01setupproject">0.1 - Setup Project</h4>
<p>Create a new directory for the project, say <code>police-data-analysis</code>, and move the downloaded file into a <code>/data</code> directory within the project.</p>
<h4 id="02optionalcreatenewvirtualenvoranacondaenvironment">0.2 - Optional: Create new virtualenv (or Anaconda) environment</h4>
<p>If you want to keep your Python dependencies neat and separated between projects, now would be the time to create and activate a new environment for this analysis, using either <a href="https://virtualenv.pypa.io/en/stable/">virtualenv</a> or <a href="https://conda.io/docs/user-guide/install/index.html">Anaconda</a>.</p>
<p>Here are some tutorials to help you get set up.<br>
virtualenv - <a href="https://virtualenv.pypa.io/en/stable/">https://virtualenv.pypa.io/en/stable/</a><br>
Anaconda - <a href="https://conda.io/docs/user-guide/install/index.html">https://conda.io/docs/user-guide/install/index.html</a></p>
<h4 id="03installdependencies">0.3 - Install dependencies</h4>
<p>We'll need to install a few Python packages to perform our analysis.</p>
<p>On the command line, run the following command to install the required libraries.</p>
<pre><code class="language-bash">pip install numpy pandas matplotlib ipython jupyter
</code></pre>
<blockquote>
<p>If you're using Anaconda, you can replace the <code>pip</code> command here with <code>conda</code>.  Also, depending on your installation, you might need to use <code>pip3</code> instead of <code>pip</code> in order to install the Python 3 versions of the packages.</p>
</blockquote>
<h4 id="04startjupyternotebook">0.4 - Start Jupyter Notebook</h4>
<p>Start a new local Jupyter notebook server from the command line.</p>
<pre><code class="language-bash">jupyter notebook
</code></pre>
<p>Open your browser to the specified URL (probably <code>localhost:8888</code>, unless you have a special configuration) and create a new notebook.</p>
<blockquote>
<p>I used Python 3.6 for writing this tutorial.  If you want to use another Python version, that's fine, most of the code that we'll cover should work on any Python 2.x or 3.x distribution.</p>
</blockquote>
<h4 id="05loaddependencies">0.5 - Load Dependencies</h4>
<p>In the first cell of the notebook, import our dependencies.</p>
<pre><code class="language-python">import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

figsize = (16,8)
</code></pre>
<p>We're also setting a shared variable <code>figsize</code> that we'll reuse later on in our data visualization logic.</p>
<h4 id="06loaddataset">0.6 - Load Dataset</h4>
<p>In the next cell, load Vermont police stop data set into a <a href="https://pandas.pydata.org/pandas-docs/stable/dsintro.html#dataframe">Pandas dataframe</a>.</p>
<pre><code class="language-python">df_vt = pd.read_csv('./data/VT-clean.csv.gz', compression='gzip', low_memory=False)
</code></pre>
<blockquote>
<p>This command assumes that you are storing the data set in the <code>data</code> directory of the project.  If you are not, you can adjust the data file path accordingly.</p>
</blockquote>
<h2 id="1vermontdataexploration">1 - Vermont Data Exploration</h2>
<p>Now begins the fun part.</p>
<h4 id="10previewtheavailabledata">1.0 - Preview the Available Data</h4>
<p>We can get a quick preview of the first ten rows of the data set with the <code>head()</code> method.</p>
<pre><code class="language-python">df_vt.head()
</code></pre>
<table border="0" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>id</th>
      <th>state</th>
      <th>stop_date</th>
      <th>stop_time</th>
      <th>location_raw</th>
      <th>county_name</th>
      <th>county_fips</th>
      <th>fine_grained_location</th>
      <th>police_department</th>
      <th>driver_gender</th>
      <th>driver_age_raw</th>
      <th>driver_age</th>
      <th>driver_race_raw</th>
      <th>driver_race</th>
      <th>violation_raw</th>
      <th>violation</th>
      <th>search_conducted</th>
      <th>search_type_raw</th>
      <th>search_type</th>
      <th>contraband_found</th>
      <th>stop_outcome</th>
      <th>is_arrested</th>
      <th>officer_id</th>
      <th>is_white</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>VT-2010-00001</td>
      <td>VT</td>
      <td>2010-07-01</td>
      <td>00:10</td>
      <td>East Montpelier</td>
      <td>Washington County</td>
      <td>50023.0</td>
      <td>COUNTY RD</td>
      <td>MIDDLESEX VSP</td>
      <td>M</td>
      <td>22.0</td>
      <td>22.0</td>
      <td>White</td>
      <td>White</td>
      <td>Moving Violation</td>
      <td>Moving violation</td>
      <td>False</td>
      <td>No Search Conducted</td>
      <td>N/A</td>
      <td>False</td>
      <td>Citation</td>
      <td>False</td>
      <td>-1.562157e+09</td>
      <td>True</td>
    </tr>
    <tr>
      <th>3</th>
      <td>VT-2010-00004</td>
      <td>VT</td>
      <td>2010-07-01</td>
      <td>00:11</td>
      <td>Whiting</td>
      <td>Addison County</td>
      <td>50001.0</td>
      <td>N MAIN ST</td>
      <td>NEW HAVEN VSP</td>
      <td>F</td>
      <td>18.0</td>
      <td>18.0</td>
      <td>White</td>
      <td>White</td>
      <td>Moving Violation</td>
      <td>Moving violation</td>
      <td>False</td>
      <td>No Search Conducted</td>
      <td>N/A</td>
      <td>False</td>
      <td>Arrest for Violation</td>
      <td>True</td>
      <td>-3.126844e+08</td>
      <td>True</td>
    </tr>
    <tr>
      <th>4</th>
      <td>VT-2010-00005</td>
      <td>VT</td>
      <td>2010-07-01</td>
      <td>00:35</td>
      <td>Hardwick</td>
      <td>Caledonia County</td>
      <td>50005.0</td>
      <td>i91 nb mm 62</td>
      <td>ROYALTON VSP</td>
      <td>M</td>
      <td>18.0</td>
      <td>18.0</td>
      <td>White</td>
      <td>White</td>
      <td>Moving Violation</td>
      <td>Moving violation</td>
      <td>False</td>
      <td>No Search Conducted</td>
      <td>N/A</td>
      <td>False</td>
      <td>Written Warning</td>
      <td>False</td>
      <td>9.225661e+08</td>
      <td>True</td>
    </tr>
    <tr>
      <th>5</th>
      <td>VT-2010-00006</td>
      <td>VT</td>
      <td>2010-07-01</td>
      <td>00:44</td>
      <td>Hardwick</td>
      <td>Caledonia County</td>
      <td>50005.0</td>
      <td>64000 I 91 N; MM64 I 91 N</td>
      <td>ROYALTON VSP</td>
      <td>F</td>
      <td>20.0</td>
      <td>20.0</td>
      <td>White</td>
      <td>White</td>
      <td>Vehicle Equipment</td>
      <td>Equipment</td>
      <td>False</td>
      <td>No Search Conducted</td>
      <td>N/A</td>
      <td>False</td>
      <td>Written Warning</td>
      <td>False</td>
      <td>-6.032327e+08</td>
      <td>True</td>
    </tr>
    <tr>
      <th>8</th>
      <td>VT-2010-00009</td>
      <td>VT</td>
      <td>2010-07-01</td>
      <td>01:10</td>
      <td>Rochester</td>
      <td>Windsor County</td>
      <td>50027.0</td>
      <td>36000 I 91 S; MM36 I 91 S</td>
      <td>ROCKINGHAM VSP</td>
      <td>M</td>
      <td>24.0</td>
      <td>24.0</td>
      <td>Black</td>
      <td>Black</td>
      <td>Moving Violation</td>
      <td>Moving violation</td>
      <td>False</td>
      <td>No Search Conducted</td>
      <td>N/A</td>
      <td>False</td>
      <td>Written Warning</td>
      <td>False</td>
      <td>2.939526e+08</td>
      <td>False</td>
    </tr>
  </tbody>
</table>
<p>We can also list the available fields by reading the <code>columns</code> property.</p>
<pre><code class="language-python">df_vt.columns
</code></pre>
<pre><code class="language-text">Index(['id', 'state', 'stop_date', 'stop_time', 'location_raw', 'county_name',
       'county_fips', 'fine_grained_location', 'police_department',
       'driver_gender', 'driver_age_raw', 'driver_age', 'driver_race_raw',
       'driver_race', 'violation_raw', 'violation', 'search_conducted',
       'search_type_raw', 'search_type', 'contraband_found', 'stop_outcome',
       'is_arrested', 'officer_id'],
      dtype='object')
</code></pre>
<br>
<h4 id="11dropmissingvalues">1.1 - Drop Missing Values</h4>
<p>Let's do a quick count of each column to determine how consistently populated the data is.</p>
<pre><code class="language-python">df_vt.count()
</code></pre>
<pre><code class="language-text">id                       283285
state                    283285
stop_date                283285
stop_time                283285
location_raw             282591
county_name              282580
county_fips              282580
fine_grained_location    282938
police_department        283285
driver_gender            281573
driver_age_raw           282114
driver_age               281999
driver_race_raw          279301
driver_race              278468
violation_raw            281107
violation                281107
search_conducted         283285
search_type_raw          281045
search_type                3419
contraband_found         283251
stop_outcome             280960
is_arrested              283285
officer_id               283273
dtype: int64
</code></pre>
<p>We can see that most columns have a similar number of values, with the exception of <code>search_type</code>, which is missing for the majority of rows, likely because most stops do not result in a search.</p>
<p>For our analysis, it will be best to have the exact same number of values for each field.  We'll go ahead now and make sure that every single cell has a value.</p>
<pre><code class="language-python"># Fill missing search type values with placeholder
df_vt['search_type'].fillna('N/A', inplace=True)

# Drop rows with missing values
df_vt.dropna(inplace=True)

df_vt.count()
</code></pre>
<br>
<p>When we count the values again, we'll see that each column has the exact same number of entries.</p>
<pre><code class="language-text">id                       273181
state                    273181
stop_date                273181
stop_time                273181
location_raw             273181
county_name              273181
county_fips              273181
fine_grained_location    273181
police_department        273181
driver_gender            273181
driver_age_raw           273181
driver_age               273181
driver_race_raw          273181
driver_race              273181
violation_raw            273181
violation                273181
search_conducted         273181
search_type_raw          273181
search_type              273181
contraband_found         273181
stop_outcome             273181
is_arrested              273181
officer_id               273181
dtype: int64
</code></pre>
<br>
<h4 id="12stopsbycounty">1.2 - Stops By County</h4>
<p>Let's get a list of all counties in the data set, along with how many traffic stops happened in each.</p>
<pre><code class="language-python">df_vt['county_name'].value_counts()
</code></pre>
<pre><code class="language-text">Windham County       37715
Windsor County       36464
Chittenden County    24815
Orange County        24679
Washington County    24633
Rutland County       22885
Addison County       22813
Bennington County    22250
Franklin County      19715
Caledonia County     16505
Orleans County       10344
Lamoille County       8604
Essex County          1239
Grand Isle County      520
Name: county_name, dtype: int64
</code></pre>
<p>If you're familiar with Vermont's geography, you'll notice that the police stops seem to be more concentrated in counties in the southern half of the state.  The southern half of the state is also where much of the cross-state traffic flows in transit to and from New Hampshire, Massachusetts, and New York.  Since the traffic stop data is from the state troopers, this interstate highway traffic could potentially explain why we see more traffic stops in these counties.</p>
<p>Here's a quick map generated with <a href="https://public.tableau.com/profile/patrick.triest#!/vizhome/VtPoliceStops/Sheet1">Tableau</a> to visualize this regional distribution.</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/policing-data/vermont_map.png" alt="Exploring United States Policing Data Using Python"></p>
<h4 id="13violations">1.3 - Violations</h4>
<p>We can also check out the distribution of traffic stop reasons.</p>
<pre><code class="language-python">df_vt['violation'].value_counts()
</code></pre>
<pre><code class="language-text">Moving violation      212100
Equipment              50600
Other                   9768
DUI                      711
Other (non-mapped)         2
Name: violation, dtype: int64
</code></pre>
<p>Unsurprisingly, the top reason for a traffic stop is <code>Moving Violation</code> (speeding, reckless driving, etc.), followed by <code>Equipment</code> (faulty lights, illegal modifications, etc.).</p>
<p>By using the <code>violation_raw</code> fields as reference, we can see that the <code>Other</code> category includes &quot;Investigatory Stop&quot; (the police have reason to suspect that the driver of the vehicle has committed a crime) and  &quot;Externally Generated Stop&quot; (possibly as a result of a 911 call, or a referral from municipal police departments).</p>
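One way to make that mapping visible is a boolean-mask filter followed by <code>value_counts()</code>. Here's a sketch on toy data with the same column names (the values and counts are illustrative, not the real figures; in the notebook you would run the same filter against <code>df_vt</code>):

```python
import pandas as pd

# Toy frame mirroring the dataset's column names, with made-up rows,
# to demonstrate filtering one mapped category down to its raw labels.
df = pd.DataFrame({
    'violation':     ['Other', 'Moving violation', 'Other'],
    'violation_raw': ['Investigatory Stop', 'Moving Violation',
                      'Externally Generated Stop'],
})

# Select only the rows mapped to 'Other', then count the raw labels
raw_labels = df[df['violation'] == 'Other']['violation_raw'].value_counts()
```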
<p><code>DUI</code> (&quot;driving under the influence&quot;, i.e. drunk driving) is surprisingly the least prevalent, with only 711 total recorded stops for this reason over the five-year period (2010-2015) that the dataset covers.  This seems low, since <a href="http://www.statisticbrain.com/number-of-dui-arrests-per-state/">Vermont had 2,647 DUI arrests in 2015</a>, so I suspect that a large proportion of these arrests were performed by municipal police departments, and/or began with a <code>Moving Violation</code> stop, instead of a more specific <code>DUI</code> stop.</p>
<h4 id="14outcomes">1.4 - Outcomes</h4>
<p>We can also examine the traffic stop outcomes.</p>
<pre><code class="language-python">df_vt['stop_outcome'].value_counts()
</code></pre>
<pre><code class="language-text">Written Warning         166488
Citation                103401
Arrest for Violation      3206
Warrant Arrest              76
Verbal Warning              10
Name: stop_outcome, dtype: int64
</code></pre>
<p>A majority of stops result in a written warning - which goes on the record but carries no direct penalty.  A bit over 1/3 of the stops result in a citation (commonly known as a ticket), which comes with a direct fine and can carry other negative side-effects such as raising a driver's auto insurance premiums.</p>
<p>The decision to give a warning or a citation is often at the discretion of the police officer, so this could be a good source for studying bias.</p>
<h4 id="15stopsbygender">1.5 - Stops By Gender</h4>
<p>Let's break down the traffic stops by gender.</p>
<pre><code class="language-python">df_vt['driver_gender'].value_counts()
</code></pre>
<pre><code class="language-text">M    179678
F    101895
Name: driver_gender, dtype: int64
</code></pre>
<p>We can see that approximately 36% of the stops are of women drivers, and 64% are of men.</p>
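Rather than computing those percentages by hand, <code>value_counts(normalize=True)</code> returns proportions directly. A minimal sketch on toy data (in the notebook, the equivalent call would be <code>df_vt['driver_gender'].value_counts(normalize=True)</code>):

```python
import pandas as pd

# normalize=True divides each count by the total, yielding proportions
# that sum to 1.0 instead of raw counts.
genders = pd.Series(['M', 'M', 'F', 'M', 'F'])
proportions = genders.value_counts(normalize=True)
```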
<h4 id="16stopsbyrace">1.6 - Stops By Race</h4>
<p>Let's also examine the distribution by race.</p>
<pre><code class="language-python">df_vt['driver_race'].value_counts()
</code></pre>
<pre><code class="language-text">White       266216
Black         5741
Asian         3607
Hispanic      2625
Other          279
Name: driver_race, dtype: int64
</code></pre>
<p>Most traffic stops are of white drivers, which is to be expected since <a href="https://www.census.gov/quickfacts/VT">Vermont is around 94% white</a> (making it the 2nd-least diverse state in the nation, <a href="https://www.census.gov/quickfacts/ME">behind Maine</a>).  Since white drivers make up approximately 94% of the traffic stops, there's no obvious bias here for pulling over non-white drivers vs white drivers.  Using the same methodology, however, we can also see that while black drivers make up roughly 2% of all traffic stops, <a href="https://www.census.gov/quickfacts/VT">only 1.3% of Vermont's population is black</a>.</p>
<p>Let's keep on analyzing the data to see what else we can learn.</p>
<h4 id="17policestopfrequencybyraceandage">1.7 - Police Stop Frequency by Race and Age</h4>
<p>It would be interesting to visualize how the frequency of police stops breaks down by both race and age.</p>
<pre><code class="language-python">fig, ax = plt.subplots()
ax.set_xlim(15, 70)
for race in df_vt['driver_race'].unique():
    s = df_vt[df_vt['driver_race'] == race]['driver_age']
    s.plot.kde(ax=ax, label=race)
ax.legend()
</code></pre>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/policing-data/race_age_dist.png" alt="Exploring United States Policing Data Using Python"></p>
<p>We can see that young drivers in their late teens and early twenties are the most likely to be pulled over.  Between ages 25 and 35, the stop rate of each demographic drops off quickly. As far as the racial comparison goes, the most interesting disparity is that for white drivers between the ages of 35 and 50 the pull-over rate stays mostly flat, whereas for other races it continues to drop steadily.</p>
<h2 id="2violationandoutcomeanalysis">2 - Violation and Outcome Analysis</h2>
<p>Now that we've got a feel for the dataset, we can start getting into some more advanced analysis.</p>
<p>One interesting topic that we touched on earlier is the fact that the decision to penalize a driver with a ticket or a citation is often at the discretion of the police officer.  With this in mind, let's see if there are any discernable patterns in driver demographics and stop outcome.</p>
<h4 id="20analysishelperfunction">2.0 - Analysis Helper Function</h4>
<p>In order to assist in this analysis, we'll define a helper function to aggregate a few important statistics from our dataset.</p>
<ul>
<li><code>citations_per_warning</code> - The ratio of citations to warnings.  A higher number signifies a greater likelihood of being ticketed instead of getting off with a warning.</li>
<li><code>arrest_rate</code> - The percentage of stops that end in an arrest.</li>
</ul>
<pre><code class="language-python">def compute_outcome_stats(df):
    &quot;&quot;&quot;Compute statistics regarding the relative quantities of arrests, warnings, and citations&quot;&quot;&quot;
    n_total = len(df)
    n_warnings = len(df[df['stop_outcome'] == 'Written Warning'])
    n_citations = len(df[df['stop_outcome'] == 'Citation'])
    n_arrests = len(df[df['stop_outcome'] == 'Arrest for Violation'])
    citations_per_warning = n_citations / n_warnings
    arrest_rate = n_arrests / n_total

    return(pd.Series(data = {
        'n_total': n_total,
        'n_warnings': n_warnings,
        'n_citations': n_citations,
        'n_arrests': n_arrests,
        'citations_per_warning': citations_per_warning,
        'arrest_rate': arrest_rate
    }))
</code></pre>
<p>Let's test out this helper function by applying it to the entire dataframe.</p>
<pre><code class="language-python">compute_outcome_stats(df_vt)
</code></pre>
<pre><code class="language-text">arrest_rate                   0.011721
citations_per_warning         0.620751
n_arrests                  3199.000000
n_citations              103270.000000
n_total                  272918.000000
n_warnings               166363.000000
dtype: float64
</code></pre>
<p>In the above result, we can see that about <code>1.17%</code> of traffic stops result in an arrest, and there are on-average <code>0.62</code> citations (tickets) issued per warning.  This data passes the sanity check, but it's too coarse to provide many interesting insights.  Let's dig deeper.</p>
<h4 id="21breakdownbygender">2.1 - Breakdown By Gender</h4>
<p>Using our helper function, along with the Pandas dataframe <a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.groupby.html">groupby</a> method, we can easily compare these stats for male and female drivers.</p>
<pre><code class="language-python">df_vt.groupby('driver_gender').apply(compute_outcome_stats)
</code></pre>
<table border="0" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>arrest_rate</th>
      <th>citations_per_warning</th>
      <th>n_arrests</th>
      <th>n_citations</th>
      <th>n_total</th>
      <th>n_warnings</th>
    </tr>
    <tr>
      <th>driver_gender</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>F</th>
      <td>0.007038</td>
      <td>0.548033</td>
      <td>697.0</td>
      <td>34805.0</td>
      <td>99036.0</td>
      <td>63509.0</td>
    </tr>
    <tr>
      <th>M</th>
      <td>0.014389</td>
      <td>0.665652</td>
      <td>2502.0</td>
      <td>68465.0</td>
      <td>173882.0</td>
      <td>102854.0</td>
    </tr>
  </tbody>
</table>
<p>This is a simple example of the common <a href="https://pandas.pydata.org/pandas-docs/stable/groupby.html">split-apply-combine</a> technique.  We'll be building on this pattern for the remainder of the tutorial, so make sure that you understand how this comparison table is generated before continuing.</p>
<p>We can see here that men are, on average, twice as likely to be arrested during a traffic stop, and are also slightly more likely to be given a citation than women.  It is, of course, not clear from the data whether this is indicative of any bias by the police officers, or if it reflects that men are being pulled over for more serious offenses than women on average.</p>
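The split-apply-combine pattern above can be sketched in miniature: split the rows by a key, apply a function that returns a <code>Series</code> per group, and Pandas combines the results into one comparison table. A toy example (made-up rows, not the real data):

```python
import pandas as pd

# Split rows by driver_gender, apply an aggregation per group,
# combine the per-group Series into a single DataFrame.
toy = pd.DataFrame({
    'driver_gender': ['M', 'F', 'M', 'F', 'M'],
    'stop_outcome':  ['Citation', 'Written Warning', 'Citation',
                      'Citation', 'Written Warning'],
})

def outcome_counts(group):
    return pd.Series({
        'n_total': len(group),
        'n_citations': (group['stop_outcome'] == 'Citation').sum(),
    })

stats = toy.groupby('driver_gender').apply(outcome_counts)
```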
<h4 id="22breakdownbyrace">2.2 - Breakdown By Race</h4>
<p>Let's now compute the same comparison, grouping by race.</p>
<pre><code class="language-python">df_vt.groupby('driver_race').apply(compute_outcome_stats)
</code></pre>
<table border="0" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>arrest_rate</th>
      <th>citations_per_warning</th>
      <th>n_arrests</th>
      <th>n_citations</th>
      <th>n_total</th>
      <th>n_warnings</th>
    </tr>
    <tr>
      <th>driver_race</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>Asian</th>
      <td>0.006384</td>
      <td>1.002339</td>
      <td>22.0</td>
      <td>1714.0</td>
      <td>3446.0</td>
      <td>1710.0</td>
    </tr>
    <tr>
      <th>Black</th>
      <td>0.019925</td>
      <td>0.802379</td>
      <td>111.0</td>
      <td>2428.0</td>
      <td>5571.0</td>
      <td>3026.0</td>
    </tr>
    <tr>
      <th>Hispanic</th>
      <td>0.016393</td>
      <td>0.865827</td>
      <td>42.0</td>
      <td>1168.0</td>
      <td>2562.0</td>
      <td>1349.0</td>
    </tr>
    <tr>
      <th>White</th>
      <td>0.011571</td>
      <td>0.611188</td>
      <td>3024.0</td>
      <td>97960.0</td>
      <td>261339.0</td>
      <td>160278.0</td>
    </tr>
  </tbody>
</table>
<p>Ok, this is interesting.  We can see that Asian drivers are arrested at the lowest rate, but receive tickets at the highest rate (roughly 1 ticket per warning).  Black and Hispanic drivers are both arrested at a higher rate and ticketed at a higher rate than white drivers.</p>
<p>Let's visualize these results.</p>
<pre><code class="language-python">race_agg = df_vt.groupby(['driver_race']).apply(compute_outcome_stats)
fig, axes = plt.subplots(nrows=2, ncols=1, figsize=figsize)
race_agg['citations_per_warning'].plot.barh(ax=axes[0], figsize=figsize, title=&quot;Citation Rate By Race&quot;)
race_agg['arrest_rate'].plot.barh(ax=axes[1], figsize=figsize, title='Arrest Rate By Race')
</code></pre>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/policing-data/citations_and_arrests_by_race.png" alt="Exploring United States Policing Data Using Python"></p>
<h4 id="23groupbyoutcomeandviolation">2.3 - Group By Outcome and Violation</h4>
<p>We'll deepen our analysis by grouping each statistic by the violation that triggered the traffic stop.</p>
<pre><code class="language-python">df_vt.groupby(['driver_race','violation']).apply(compute_outcome_stats)
</code></pre>
<table border="0" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th></th>
      <th>arrest_rate</th>
      <th>citations_per_warning</th>
      <th>n_arrests</th>
      <th>n_citations</th>
      <th>n_total</th>
      <th>n_warnings</th>
    </tr>
    <tr>
      <th>driver_race</th>
      <th>violation</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th rowspan="4" valign="top">Asian</th>
      <th>DUI</th>
      <td>0.200000</td>
      <td>0.333333</td>
      <td>2.0</td>
      <td>2.0</td>
      <td>10.0</td>
      <td>6.0</td>
    </tr>
    <tr>
      <th>Equipment</th>
      <td>0.006270</td>
      <td>0.132143</td>
      <td>2.0</td>
      <td>37.0</td>
      <td>319.0</td>
      <td>280.0</td>
    </tr>
    <tr>
      <th>Moving violation</th>
      <td>0.005563</td>
      <td>1.183190</td>
      <td>17.0</td>
      <td>1647.0</td>
      <td>3056.0</td>
      <td>1392.0</td>
    </tr>
    <tr>
      <th>Other</th>
      <td>0.016393</td>
      <td>0.875000</td>
      <td>1.0</td>
      <td>28.0</td>
      <td>61.0</td>
      <td>32.0</td>
    </tr>
    <tr>
      <th rowspan="4" valign="top">Black</th>
      <th>DUI</th>
      <td>0.200000</td>
      <td>0.142857</td>
      <td>2.0</td>
      <td>1.0</td>
      <td>10.0</td>
      <td>7.0</td>
    </tr>
    <tr>
      <th>Equipment</th>
      <td>0.029181</td>
      <td>0.220651</td>
      <td>26.0</td>
      <td>156.0</td>
      <td>891.0</td>
      <td>707.0</td>
    </tr>
    <tr>
      <th>Moving violation</th>
      <td>0.016052</td>
      <td>0.942385</td>
      <td>71.0</td>
      <td>2110.0</td>
      <td>4423.0</td>
      <td>2239.0</td>
    </tr>
    <tr>
      <th>Other</th>
      <td>0.048583</td>
      <td>2.205479</td>
      <td>12.0</td>
      <td>161.0</td>
      <td>247.0</td>
      <td>73.0</td>
    </tr>
    <tr>
      <th rowspan="4" valign="top">Hispanic</th>
      <th>DUI</th>
      <td>0.200000</td>
      <td>3.000000</td>
      <td>2.0</td>
      <td>6.0</td>
      <td>10.0</td>
      <td>2.0</td>
    </tr>
    <tr>
      <th>Equipment</th>
      <td>0.023560</td>
      <td>0.187898</td>
      <td>9.0</td>
      <td>59.0</td>
      <td>382.0</td>
      <td>314.0</td>
    </tr>
    <tr>
      <th>Moving violation</th>
      <td>0.012422</td>
      <td>1.058824</td>
      <td>26.0</td>
      <td>1062.0</td>
      <td>2093.0</td>
      <td>1003.0</td>
    </tr>
    <tr>
      <th>Other</th>
      <td>0.064935</td>
      <td>1.366667</td>
      <td>5.0</td>
      <td>41.0</td>
      <td>77.0</td>
      <td>30.0</td>
    </tr>
    <tr>
      <th rowspan="5" valign="top">White</th>
      <th>DUI</th>
      <td>0.192364</td>
      <td>0.455026</td>
      <td>131.0</td>
      <td>172.0</td>
      <td>681.0</td>
      <td>378.0</td>
    </tr>
    <tr>
      <th>Equipment</th>
      <td>0.012233</td>
      <td>0.190486</td>
      <td>599.0</td>
      <td>7736.0</td>
      <td>48965.0</td>
      <td>40612.0</td>
    </tr>
    <tr>
      <th>Moving violation</th>
      <td>0.008635</td>
      <td>0.732720</td>
      <td>1747.0</td>
      <td>84797.0</td>
      <td>202321.0</td>
      <td>115729.0</td>
    </tr>
    <tr>
      <th>Other</th>
      <td>0.058378</td>
      <td>1.476672</td>
      <td>547.0</td>
      <td>5254.0</td>
      <td>9370.0</td>
      <td>3558.0</td>
    </tr>
    <tr>
      <th>Other (non-mapped)</th>
      <td>0.000000</td>
      <td>1.000000</td>
      <td>0.0</td>
      <td>1.0</td>
      <td>2.0</td>
      <td>1.0</td>
    </tr>
  </tbody>
</table>
<p>Ok, well this table looks interesting, but it's rather large and visually overwhelming.  Let's trim down that dataset in order to retrieve a more focused subset of information.</p>
<pre><code class="language-python"># Create new column to represent whether the driver is white
df_vt['is_white'] = df_vt['driver_race'] == 'White'

# Remove violation with too few data points
df_vt_filtered = df_vt[~df_vt['violation'].isin(['Other (non-mapped)', 'DUI'])]
</code></pre>
<p>We're generating a new column to represent whether or not the driver is white.  We are also generating a filtered version of the dataframe that strips out the two violation types with the fewest datapoints.</p>
<blockquote>
<p>We're not assigning the filtered dataframe to <code>df_vt</code> since we'll want to keep using the complete, unfiltered dataset in the next sections.</p>
</blockquote>
<p>Let's redo our race + violation aggregation now, using our filtered dataset.</p>
<pre><code class="language-python">df_vt_filtered.groupby(['is_white','violation']).apply(compute_outcome_stats)
</code></pre>
<table border="0" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th></th>
      <th>arrest_rate</th>
      <th>citations_per_warning</th>
      <th>n_arrests</th>
      <th>n_citations</th>
      <th>n_total</th>
      <th>n_warnings</th>
    </tr>
    <tr>
      <th>is_white</th>
      <th>violation</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th rowspan="3" valign="top">False</th>
      <th>Equipment</th>
      <td>0.023241</td>
      <td>0.193697</td>
      <td>37.0</td>
      <td>252.0</td>
      <td>1592.0</td>
      <td>1301.0</td>
    </tr>
    <tr>
      <th>Moving violation</th>
      <td>0.011910</td>
      <td>1.039922</td>
      <td>114.0</td>
      <td>4819.0</td>
      <td>9572.0</td>
      <td>4634.0</td>
    </tr>
    <tr>
      <th>Other</th>
      <td>0.046753</td>
      <td>1.703704</td>
      <td>18.0</td>
      <td>230.0</td>
      <td>385.0</td>
      <td>135.0</td>
    </tr>
    <tr>
      <th rowspan="3" valign="top">True</th>
      <th>Equipment</th>
      <td>0.012233</td>
      <td>0.190486</td>
      <td>599.0</td>
      <td>7736.0</td>
      <td>48965.0</td>
      <td>40612.0</td>
    </tr>
    <tr>
      <th>Moving violation</th>
      <td>0.008635</td>
      <td>0.732720</td>
      <td>1747.0</td>
      <td>84797.0</td>
      <td>202321.0</td>
      <td>115729.0</td>
    </tr>
    <tr>
      <th>Other</th>
      <td>0.058378</td>
      <td>1.476672</td>
      <td>547.0</td>
      <td>5254.0</td>
      <td>9370.0</td>
      <td>3558.0</td>
    </tr>
  </tbody>
</table>
<p>Ok great, this is much easier to read.</p>
<p>In the above table, we can see that non-white drivers are more likely to be arrested during a stop that was initiated due to an equipment or moving violation, but white drivers are more likely to be arrested for a traffic stop resulting from &quot;Other&quot; reasons.  Non-white drivers are more likely than white drivers to be given tickets for each violation.</p>
<h4 id="24visualizestopoutcomeandviolationresults">2.4 - Visualize Stop Outcome and Violation Results</h4>
<p>Let's generate a bar chart now in order to visualize this data broken down by race.</p>
<pre><code class="language-python">race_stats = df_vt_filtered.groupby(['violation', 'driver_race']).apply(compute_outcome_stats).unstack()
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=figsize)
race_stats.plot.bar(y='arrest_rate', ax=axes[0], title='Arrest Rate By Race and Violation')
race_stats.plot.bar(y='citations_per_warning', ax=axes[1], title='Citations Per Warning By Race and Violation')
</code></pre>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/policing-data/citations_and_arrests_by_race_and_violation.png" alt="Exploring United States Policing Data Using Python"></p>
<p>We can see in these charts that Hispanic and Black drivers are generally arrested at a higher rate than white drivers (with the exception of the rather ambiguous &quot;Other&quot; category), and that Black drivers are more likely, across the board, to be issued a citation than white drivers.  Asian drivers are arrested at very low rates, and their citation rates are highly variable.</p>
<p>These results are compelling, and are suggestive of potential racial bias, but they are too inconsistent across violation types to provide any definitive answers.  Let's dig deeper to see what else we can find.</p>
<h2 id="3searchoutcomeanalysis">3 - Search Outcome Analysis</h2>
<p>Two of the more interesting fields available to us are <code>search_conducted</code> and <code>contraband_found</code>.</p>
<p>In the analysis by the &quot;Stanford Open Policing Project&quot;, they use these two fields to perform what is known as an &quot;outcome test&quot;.</p>
<p>On the <a href="https://openpolicing.stanford.edu/findings/">project website</a>, the &quot;outcome test&quot; is summarized clearly.</p>
<blockquote>
<p>In the 1950s, the Nobel prize-winning economist Gary Becker proposed an elegant method to test for bias in search decisions: the outcome test.</p>
<p>Becker proposed looking at search outcomes. If officers don’t discriminate, he argued, they should find contraband — like illegal drugs or weapons — on searched minorities at the same rate as on searched whites. If searches of minorities turn up contraband at lower rates than searches of whites, the outcome test suggests officers are applying a double standard, searching minorities on the basis of less evidence.</p>
</blockquote>
<p><a href="https://openpolicing.stanford.edu/findings/">Findings, Stanford Open Policing Project</a></p>
<p>The authors of the project also make the point that only using the &quot;hit rate&quot;, or the rate of searches where contraband is found, can be misleading.  For this reason, we'll also need to use the &quot;search rate&quot; in our analysis - the rate at which a traffic stop results in a search.</p>
<p>We'll now use the available data to perform our own outcome test, in order to determine whether minorities in Vermont are routinely searched on the basis of less evidence than white drivers.</p>
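The core comparison of the outcome test is just a per-group hit rate: restrict to stops where a search happened, then take the mean of the contraband flag within each group. A minimal sketch on invented rows (values are illustrative only; the real computation against <code>df_vt</code> follows below):

```python
import pandas as pd

# Toy outcome test: roughly equal hit rates across groups are consistent
# with a uniform evidence threshold for searching; a markedly lower hit
# rate for one group would suggest a double standard.
stops = pd.DataFrame({
    'driver_race':      ['White'] * 4 + ['Black'] * 4,
    'search_conducted': [True, True, False, False, True, True, True, False],
    'contraband_found': [True, False, False, False, True, False, False, False],
})

# Only stops that actually resulted in a search count toward the hit rate
searched = stops[stops['search_conducted']]
hit_rates = searched.groupby('driver_race')['contraband_found'].mean()
```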
<h4 id="30computesearchrateandhitrate">3.0 Compute Search Rate and Hit Rate</h4>
<p>We'll define a new function to compute the search rate and hit rate for the traffic stops in our dataframe.</p>
<ul>
<li><strong>Search Rate</strong> - The rate at which a traffic stop results in a search.  A search rate of <code>0.20</code> would signify that out of 100 traffic stops, 20 resulted in a search.</li>
<li><strong>Hit Rate</strong> - The rate at which contraband is found in a search. A hit rate of <code>0.80</code> would signify that out of 100 searches, 80 searches resulted in contraband (drugs, unregistered weapons, etc.) being found.</li>
</ul>
<pre><code class="language-python">def compute_search_stats(df):
    &quot;&quot;&quot;Compute the search rate and hit rate&quot;&quot;&quot;
    search_conducted = df['search_conducted']
    contraband_found = df['contraband_found']
    n_stops     = len(search_conducted)
    n_searches  = sum(search_conducted)
    n_hits      = sum(contraband_found)

    # Filter out counties with too few stops
    if (n_stops) &lt; 50:
        search_rate = None
    else:
        search_rate = n_searches / n_stops

    # Filter out counties with too few searches
    if (n_searches) &lt; 5:
        hit_rate = None
    else:
        hit_rate = n_hits / n_searches

    return(pd.Series(data = {
        'n_stops': n_stops,
        'n_searches': n_searches,
        'n_hits': n_hits,
        'search_rate': search_rate,
        'hit_rate': hit_rate
    }))
</code></pre>
<br>
<h4 id="31computesearchstatsforentiredataset">3.1 - Compute Search Stats For Entire Dataset</h4>
<p>We can test our new function to determine the search rate and hit rate for the entire state.</p>
<pre><code class="language-python">compute_search_stats(df_vt)
</code></pre>
<pre><code class="language-text">hit_rate            0.796865
n_hits           2593.000000
n_searches       3254.000000
n_stops        272918.000000
search_rate         0.011923
dtype: float64
</code></pre>
<p>Here we can see that each traffic stop had a 1.2% chance of resulting in a search, and each search had an 80% chance of yielding contraband.</p>
<h4 id="32comparesearchstatsbydrivergender">3.2 - Compare Search Stats By Driver Gender</h4>
<p>Using the Pandas <code>groupby</code> method, we can compute how the search stats differ by gender.</p>
<pre><code class="language-python">df_vt.groupby('driver_gender').apply(compute_search_stats)
</code></pre>
<table border="0" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>hit_rate</th>
      <th>n_hits</th>
      <th>n_searches</th>
      <th>n_stops</th>
      <th>search_rate</th>
    </tr>
    <tr>
      <th>driver_gender</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>F</th>
      <td>0.789392</td>
      <td>506.0</td>
      <td>641.0</td>
      <td>99036.0</td>
      <td>0.006472</td>
    </tr>
    <tr>
      <th>M</th>
      <td>0.798699</td>
      <td>2087.0</td>
      <td>2613.0</td>
      <td>173882.0</td>
      <td>0.015027</td>
    </tr>
  </tbody>
</table>
<p>We can see here that men are more than twice as likely to be searched as women, and that roughly 80% of searches for both genders resulted in contraband being found.  The data shows that men are searched and caught with contraband more often than women, but it is unclear whether there is any gender discrimination in deciding whom to search, since the hit rates are nearly equal.</p>
<h4 id="33comparesearchstatsbyage">3.3 - Compare Search Stats By Age</h4>
<p>We can split the dataset into age buckets and perform the same analysis.</p>
<pre><code class="language-python">age_groups = pd.cut(df_vt[&quot;driver_age&quot;], np.arange(15, 70, 5))
df_vt.groupby(age_groups).apply(compute_search_stats)
</code></pre>
<table border="0" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>hit_rate</th>
      <th>n_hits</th>
      <th>n_searches</th>
      <th>n_stops</th>
      <th>search_rate</th>
    </tr>
    <tr>
      <th>driver_age</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>(15, 20]</th>
      <td>0.847988</td>
      <td>569.0</td>
      <td>671.0</td>
      <td>27418.0</td>
      <td>0.024473</td>
    </tr>
    <tr>
      <th>(20, 25]</th>
      <td>0.838000</td>
      <td>838.0</td>
      <td>1000.0</td>
      <td>43275.0</td>
      <td>0.023108</td>
    </tr>
    <tr>
      <th>(25, 30]</th>
      <td>0.788462</td>
      <td>492.0</td>
      <td>624.0</td>
      <td>34759.0</td>
      <td>0.017952</td>
    </tr>
    <tr>
      <th>(30, 35]</th>
      <td>0.766756</td>
      <td>286.0</td>
      <td>373.0</td>
      <td>27746.0</td>
      <td>0.013443</td>
    </tr>
    <tr>
      <th>(35, 40]</th>
      <td>0.742991</td>
      <td>159.0</td>
      <td>214.0</td>
      <td>23203.0</td>
      <td>0.009223</td>
    </tr>
    <tr>
      <th>(40, 45]</th>
      <td>0.692913</td>
      <td>88.0</td>
      <td>127.0</td>
      <td>24055.0</td>
      <td>0.005280</td>
    </tr>
    <tr>
      <th>(45, 50]</th>
      <td>0.575472</td>
      <td>61.0</td>
      <td>106.0</td>
      <td>24103.0</td>
      <td>0.004398</td>
    </tr>
    <tr>
      <th>(50, 55]</th>
      <td>0.706667</td>
      <td>53.0</td>
      <td>75.0</td>
      <td>22517.0</td>
      <td>0.003331</td>
    </tr>
    <tr>
      <th>(55, 60]</th>
      <td>0.833333</td>
      <td>30.0</td>
      <td>36.0</td>
      <td>17502.0</td>
      <td>0.002057</td>
    </tr>
    <tr>
      <th>(60, 65]</th>
      <td>0.500000</td>
      <td>6.0</td>
      <td>12.0</td>
      <td>12514.0</td>
      <td>0.000959</td>
    </tr>
  </tbody>
</table>
<p>We can see here that the search rate steadily declines as drivers get older.  The hit rate also trends downward with age, although it fluctuates in the oldest buckets, where the small number of searches makes the rates less reliable.</p>
<h4 id="34comparesearchstatsbyrace">3.4 - Compare Search Stats By Race</h4>
<p>Now for the most interesting part - comparing search data by race.</p>
<pre><code class="language-python">df_vt.groupby('driver_race').apply(compute_search_stats)
</code></pre>
<table border="0" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>hit_rate</th>
      <th>n_hits</th>
      <th>n_searches</th>
      <th>n_stops</th>
      <th>search_rate</th>
    </tr>
    <tr>
      <th>driver_race</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>Asian</th>
      <td>0.785714</td>
      <td>22.0</td>
      <td>28.0</td>
      <td>3446.0</td>
      <td>0.008125</td>
    </tr>
    <tr>
      <th>Black</th>
      <td>0.686620</td>
      <td>195.0</td>
      <td>284.0</td>
      <td>5571.0</td>
      <td>0.050978</td>
    </tr>
    <tr>
      <th>Hispanic</th>
      <td>0.644231</td>
      <td>67.0</td>
      <td>104.0</td>
      <td>2562.0</td>
      <td>0.040593</td>
    </tr>
    <tr>
      <th>White</th>
      <td>0.813601</td>
      <td>2309.0</td>
      <td>2838.0</td>
      <td>261339.0</td>
      <td>0.010859</td>
    </tr>
  </tbody>
</table>
<p>Black and Hispanic drivers are searched at much higher rates than White drivers (5% and 4% of traffic stops respectively, versus 1% for white drivers), but the searches of these drivers only yield contraband 60-70% of the time, compared to 80% of the time for White drivers.</p>
<p>Let's rephrase these results.</p>
<p><em>Black drivers are <strong>nearly five times as likely</strong> to be searched as White drivers during a traffic stop, but are <strong>13 percentage points less likely</strong> to be caught with contraband in the event of a search.</em></p>
<p><em>Hispanic drivers are <strong>nearly four times as likely</strong> to be searched as White drivers during a traffic stop, but are <strong>17 percentage points less likely</strong> to be caught with contraband in the event of a search.</em></p>
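<p>These multiples follow directly from the statewide table above; a quick arithmetic check:</p>

```python
# Search and hit rates copied from the Vermont table above
search_rate = {'White': 0.010859, 'Black': 0.050978, 'Hispanic': 0.040593}
hit_rate = {'White': 0.813601, 'Black': 0.686620, 'Hispanic': 0.644231}

for race in ('Black', 'Hispanic'):
    ratio = search_rate[race] / search_rate['White']
    gap = (hit_rate['White'] - hit_rate[race]) * 100
    print('{}: searched {:.1f}x as often, hit rate {:.0f} points lower'
          .format(race, ratio, gap))
```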
<h4 id="35comparesearchstatsbyraceandlocation">3.5 - Compare Search Stats By Race and Location</h4>
<p>Let's add in location as another factor.  It's possible that some counties (such as those with larger towns or with interstate highways where opioid trafficking is prevalent) have a much higher search rate / lower hit rates for both white and non-white drivers, but also have greater racial diversity, leading to distortion in the overall stats.  By controlling for location, we can determine if this is the case.</p>
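<p>This kind of distortion is an instance of Simpson's paradox.  A small made-up example (hypothetical counts, not drawn from the dataset) shows how identical treatment within every county can still produce an aggregate disparity:</p>

```python
# (county, race) -> (n_stops, n_searches); within each county the
# search rate is identical for both groups (5% in A, 1% in B)
stops = {
    ('A', 'White'): (1000, 50),
    ('A', 'Black'): (9000, 450),
    ('B', 'White'): (9000, 90),
    ('B', 'Black'): (1000, 10),
}

def statewide_search_rate(race):
    """Aggregate search rate across counties for one group"""
    n_stops = sum(n for (_, r), (n, _) in stops.items() if r == race)
    n_searches = sum(s for (_, r), (_, s) in stops.items() if r == race)
    return n_searches / n_stops

print(statewide_search_rate('White'))  # 0.014
print(statewide_search_rate('Black'))  # 0.046
```

<p>Because Black drivers in this toy example are stopped mostly in the high-search county, the statewide rates differ by more than 3x even though no county treats the two groups differently.  Grouping the stats by county rules out this kind of distortion.</p>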
<p>We'll define three new helper functions to generate the visualizations.</p>
<pre><code class="language-python">def generate_comparison_scatter(df, ax, state, race, field, color):
    &quot;&quot;&quot;Generate scatter plot comparing field for white drivers with minority drivers&quot;&quot;&quot;
    race_location_agg = df.groupby(['county_fips','driver_race']).apply(compute_search_stats).reset_index().dropna()
    race_location_agg = race_location_agg.pivot(index='county_fips', columns='driver_race', values=field)
    ax = race_location_agg.plot.scatter(ax=ax, x='White', y=race, s=150, label=race, color=color)
    return ax

def format_scatter_chart(ax, state, field):
    &quot;&quot;&quot;Format and label the scatter chart&quot;&quot;&quot;
    ax.set_xlabel('{} - White'.format(field))
    ax.set_ylabel('{} - Non-White'.format(field))
    ax.set_title(&quot;{} By County - {}&quot;.format(field, state))
    lim = max(ax.get_xlim()[1], ax.get_ylim()[1])
    ax.set_xlim(0, lim)
    ax.set_ylim(0, lim)
    diag_line, = ax.plot(ax.get_xlim(), ax.get_ylim(), ls=&quot;--&quot;, c=&quot;.3&quot;)
    ax.legend()
    return ax

def generate_comparison_scatters(df, state):
    &quot;&quot;&quot;Generate scatter plots comparing search and hit rates of white drivers with those of minority drivers&quot;&quot;&quot;
    fig, axes = plt.subplots(nrows=1, ncols=2, figsize=figsize)
    generate_comparison_scatter(df, axes[0], state, 'Black', 'search_rate', 'red')
    generate_comparison_scatter(df, axes[0], state, 'Hispanic', 'search_rate', 'orange')
    generate_comparison_scatter(df, axes[0], state, 'Asian', 'search_rate', 'green')
    format_scatter_chart(axes[0], state, 'Search Rate')

    generate_comparison_scatter(df, axes[1], state, 'Black', 'hit_rate', 'red')
    generate_comparison_scatter(df, axes[1], state, 'Hispanic', 'hit_rate', 'orange')
    generate_comparison_scatter(df, axes[1], state, 'Asian', 'hit_rate', 'green')
    format_scatter_chart(axes[1], state, 'Hit Rate')

    return fig
</code></pre>
<p>We can now generate the scatter plots using the <code>generate_comparison_scatters</code> function.</p>
<pre><code class="language-python">generate_comparison_scatters(df_vt, 'VT')
</code></pre>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/policing-data/search_scatters_VT.png" alt="Exploring United States Policing Data Using Python"></p>
<p>The plots above are comparing <code>search_rate</code> (left) and <code>hit_rate</code> (right) for minority drivers compared with white drivers in each county.  If all of the dots (each of which represents the stats for a single county and race) followed the diagonal center line, the implication would be that white drivers and non-white drivers are searched at the exact same rate with the exact same standard of evidence.</p>
<p>Unfortunately, this is not the case.  In the above charts, we can see that, for every county, the search rate is higher for Black and Hispanic drivers even though the hit rate is lower.</p>
<p>Let's define one more visualization helper function, to show all of these results on a single scatter plot.</p>
<pre><code class="language-python">def generate_county_search_stats_scatter(df, state):
    &quot;&quot;&quot;Generate a scatter plot of search rate vs. hit rate by race and county&quot;&quot;&quot;
    race_location_agg = df.groupby(['county_fips','driver_race']).apply(compute_search_stats)

    colors = ['blue','orange','red', 'green']
    fig, ax = plt.subplots(figsize=figsize)
    for c, frame in race_location_agg.groupby(level='driver_race'):
        ax.scatter(x=frame['hit_rate'], y=frame['search_rate'], s=150, label=c, color=colors.pop())
    ax.legend(loc='upper center', bbox_to_anchor=(0.5, 1.2), ncol=4, fancybox=True)
    ax.set_xlabel('Hit Rate')
    ax.set_ylabel('Search Rate')
    ax.set_title(&quot;Search Stats By County and Race - {}&quot;.format(state))
    return fig
</code></pre>
<pre><code class="language-python">generate_county_search_stats_scatter(df_vt, &quot;VT&quot;)
</code></pre>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/policing-data/county_scatter_VT.png" alt="Exploring United States Policing Data Using Python"></p>
<p>As the old saying goes - <em>a picture is worth a thousand words</em>.  The above chart is one of those pictures - and the name of the picture is &quot;Systemic Racism&quot;.</p>
<p>The hit rates and search rates for white drivers in most counties are consistently clustered around 80% and 1% respectively.  We can see, however, that nearly every county searches Black and Hispanic drivers at a higher rate, and that these searches uniformly have a lower hit rate than those on White drivers.</p>
<p>This state-wide pattern of a higher search rate combined with a lower hit rate suggests that a lower standard of evidence is used when deciding to search Black and Hispanic drivers compared to when searching White drivers.</p>
<blockquote>
<p>You might notice that only one county is represented by Asian drivers - this is due to the lack of data for searches of Asian drivers in other counties.</p>
</blockquote>
<h2 id="4analyzingotherstates">4 - Analyzing Other States</h2>
<p>Vermont is a great state to test out our analysis on, but the dataset size is relatively small.  Let's now perform the same analysis on other states to determine if this pattern persists across state lines.</p>
<h4 id="40massachusetts">4.0 - Massachusetts</h4>
<p>First we'll generate the analysis for my home state, Massachusetts.  This time we'll have more data to work with - roughly 3.4 million traffic stops.</p>
<p>Download the dataset to your project's <code>/data</code> directory - <a href="https://stacks.stanford.edu/file/druid:py883nd2578/MA-clean.csv.gz">https://stacks.stanford.edu/file/druid:py883nd2578/MA-clean.csv.gz</a></p>
<p>We've developed a solid reusable formula for reading and visualizing each state's dataset, so let's wrap the entire recipe in a new helper function.</p>
<pre><code class="language-python">fields = ['county_fips', 'driver_race', 'search_conducted', 'contraband_found']
types = {
    'contraband_found': bool,
    'county_fips': float,
    'driver_race': object,
    'search_conducted': bool
}

def analyze_state_data(state):
    df = pd.read_csv('./data/{}-clean.csv.gz'.format(state), compression='gzip', low_memory=True, dtype=types, usecols=fields)
    df.dropna(inplace=True)
    df = df[df['driver_race'] != 'Other']
    generate_comparison_scatters(df, state)
    generate_county_search_stats_scatter(df, state)
    return df.groupby('driver_race').apply(compute_search_stats)
</code></pre>
<p>We're making a few optimizations here in order to make the analysis a bit more streamlined and computationally efficient.  By only reading the four columns that we're interested in, and by specifying the datatypes ahead of time, we'll be able to read larger datasets into memory more quickly.</p>
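<p>We can verify the effect on a toy frame.  The CSV rows below are made up for illustration; <code>memory_usage(deep=True)</code> gives a rough per-column accounting of what each read costs.</p>

```python
import io
import pandas as pd

# Made-up rows standing in for a state CSV (illustration only)
raw = (
    "stop_date,county_fips,driver_race,search_conducted,contraband_found\n"
    "2015-01-01,50001,White,False,False\n"
    "2015-01-02,50003,Black,True,True\n"
)

fields = ['county_fips', 'driver_race', 'search_conducted', 'contraband_found']
types = {'contraband_found': bool, 'county_fips': float,
         'driver_race': object, 'search_conducted': bool}

full = pd.read_csv(io.StringIO(raw))
slim = pd.read_csv(io.StringIO(raw), usecols=fields, dtype=types)

# The slim frame skips the unused column and avoids dtype inference
print(slim.memory_usage(deep=True).sum() < full.memory_usage(deep=True).sum())  # True
```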
<pre><code class="language-python">analyze_state_data('MA')
</code></pre>
<p>The first output is a statewide table of search rate and hit rate by race.</p>
<table border="0" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>hit_rate</th>
      <th>n_hits</th>
      <th>n_searches</th>
      <th>n_stops</th>
      <th>search_rate</th>
    </tr>
    <tr>
      <th>driver_race</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>Asian</th>
      <td>0.331169</td>
      <td>357.0</td>
      <td>1078.0</td>
      <td>101942.0</td>
      <td>0.010575</td>
    </tr>
    <tr>
      <th>Black</th>
      <td>0.487150</td>
      <td>4170.0</td>
      <td>8560.0</td>
      <td>350498.0</td>
      <td>0.024422</td>
    </tr>
    <tr>
      <th>Hispanic</th>
      <td>0.449502</td>
      <td>5007.0</td>
      <td>11139.0</td>
      <td>337782.0</td>
      <td>0.032977</td>
    </tr>
    <tr>
      <th>White</th>
      <td>0.523037</td>
      <td>18220.0</td>
      <td>34835.0</td>
      <td>2527393.0</td>
      <td>0.013783</td>
    </tr>
  </tbody>
</table>
<p>We can see here again that Black and Hispanic drivers are searched at significantly higher rates than white drivers. The differences in hit rates are not as extreme as in Vermont, but they are still noticeably lower for Black and Hispanic drivers than for White drivers.  Asian drivers, interestingly, are the least likely to be searched and also the least likely to have contraband if they are searched.</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/policing-data/search_scatters_MA.png" alt="Exploring United States Policing Data Using Python"><br>
<img src="https://cdn.patricktriest.com/blog/images/posts/policing-data/county_scatter_MA.png" alt="Exploring United States Policing Data Using Python"></p>
<p>If we compare the stats for MA to VT, we'll also notice that police in MA seem to use a much lower standard of evidence when searching a vehicle, with their searches averaging around a 50% hit rate, compared to 80% in VT.</p>
<p>The trend here is much less obvious than in Vermont, but it is still clear that traffic stops of Black and Hispanic drivers are more likely to result in a search, despite the fact the searches of White drivers are more likely to result in contraband being found.</p>
<h4 id="41wisconsinconnecticut">4.1 - Wisconsin &amp; Connecticut</h4>
<p>Wisconsin and Connecticut have been named as some of the <a href="https://www.wpr.org/wisconsin-considered-one-worst-states-racial-disparities">worst states in America for racial disparities</a>.  Let's see how their police stats stack up.</p>
<p>Again, you'll need to download the Wisconsin and Connecticut dataset to your project's <code>/data</code> directory.</p>
<ul>
<li>Wisconsin: <a href="https://stacks.stanford.edu/file/druid:py883nd2578/WI-clean.csv.gz">https://stacks.stanford.edu/file/druid:py883nd2578/WI-clean.csv.gz</a></li>
<li>Connecticut: <a href="https://stacks.stanford.edu/file/druid:py883nd2578/CT-clean.csv.gz">https://stacks.stanford.edu/file/druid:py883nd2578/CT-clean.csv.gz</a></li>
</ul>
<p>We can call our <code>analyze_state_data</code> function for Wisconsin once the dataset has been downloaded.</p>
<pre><code class="language-python">analyze_state_data('WI')
</code></pre>
<table border="0" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>hit_rate</th>
      <th>n_hits</th>
      <th>n_searches</th>
      <th>n_stops</th>
      <th>search_rate</th>
    </tr>
    <tr>
      <th>driver_race</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>Asian</th>
      <td>0.470817</td>
      <td>121.0</td>
      <td>257.0</td>
      <td>24577.0</td>
      <td>0.010457</td>
    </tr>
    <tr>
      <th>Black</th>
      <td>0.477574</td>
      <td>1299.0</td>
      <td>2720.0</td>
      <td>56050.0</td>
      <td>0.048528</td>
    </tr>
    <tr>
      <th>Hispanic</th>
      <td>0.415741</td>
      <td>449.0</td>
      <td>1080.0</td>
      <td>35210.0</td>
      <td>0.030673</td>
    </tr>
    <tr>
      <th>White</th>
      <td>0.526300</td>
      <td>5103.0</td>
      <td>9696.0</td>
      <td>778227.0</td>
      <td>0.012459</td>
    </tr>
  </tbody>
</table>
<p>The trends here are starting to look familiar.  White drivers in Wisconsin are much less likely to be searched than non-white drivers (aside from Asians, who tend to be searched at around the same rates as whites).  Searches of non-white drivers are, again, less likely to yield contraband than searches on white drivers.</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/policing-data/search_scatters_WI.png" alt="Exploring United States Policing Data Using Python"><br>
<img src="https://cdn.patricktriest.com/blog/images/posts/policing-data/county_scatter_WI.png" alt="Exploring United States Policing Data Using Python"></p>
<p>We can see here, yet again, that the standard of evidence for searching Black and Hispanic drivers is lower in virtually every county than for White drivers.  In one outlying county, almost 25% (!) of traffic stops for Black drivers resulted in a search, even though only half of those searches yielded contraband.</p>
<p>Let's do the same analysis for Connecticut.</p>
<pre><code class="language-python">analyze_state_data('CT')
</code></pre>
<table border="0" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>hit_rate</th>
      <th>n_hits</th>
      <th>n_searches</th>
      <th>n_stops</th>
      <th>search_rate</th>
    </tr>
    <tr>
      <th>driver_race</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>Asian</th>
      <td>0.384615</td>
      <td>10.0</td>
      <td>26.0</td>
      <td>5949.0</td>
      <td>0.004370</td>
    </tr>
    <tr>
      <th>Black</th>
      <td>0.284072</td>
      <td>346.0</td>
      <td>1218.0</td>
      <td>37460.0</td>
      <td>0.032515</td>
    </tr>
    <tr>
      <th>Hispanic</th>
      <td>0.291925</td>
      <td>282.0</td>
      <td>966.0</td>
      <td>31154.0</td>
      <td>0.031007</td>
    </tr>
    <tr>
      <th>White</th>
      <td>0.379344</td>
      <td>1179.0</td>
      <td>3108.0</td>
      <td>242314.0</td>
      <td>0.012826</td>
    </tr>
  </tbody>
</table>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/policing-data/search_scatters_CT.png" alt="Exploring United States Policing Data Using Python"><br>
<img src="https://cdn.patricktriest.com/blog/images/posts/policing-data/county_scatter_CT.png" alt="Exploring United States Policing Data Using Python"></p>
<p>Again, the pattern persists.</p>
<h4 id="42arizona">4.2 - Arizona</h4>
<p>Once the dataset has been downloaded, we can generate the results for each remaining state (where data is available) fairly quickly.</p>
<pre><code class="language-python">analyze_state_data('AZ')
</code></pre>
<table border="0" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>hit_rate</th>
      <th>n_hits</th>
      <th>n_searches</th>
      <th>n_stops</th>
      <th>search_rate</th>
    </tr>
    <tr>
      <th>driver_race</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>Asian</th>
      <td>0.196664</td>
      <td>224.0</td>
      <td>1139.0</td>
      <td>48177.0</td>
      <td>0.023642</td>
    </tr>
    <tr>
      <th>Black</th>
      <td>0.255548</td>
      <td>2188.0</td>
      <td>8562.0</td>
      <td>116795.0</td>
      <td>0.073308</td>
    </tr>
    <tr>
      <th>Hispanic</th>
      <td>0.160930</td>
      <td>5943.0</td>
      <td>36929.0</td>
      <td>501619.0</td>
      <td>0.073620</td>
    </tr>
    <tr>
      <th>White</th>
      <td>0.242564</td>
      <td>9288.0</td>
      <td>38291.0</td>
      <td>1212652.0</td>
      <td>0.031576</td>
    </tr>
  </tbody>
</table>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/policing-data/search_scatters_AZ.png" alt="Exploring United States Policing Data Using Python"><br>
<img src="https://cdn.patricktriest.com/blog/images/posts/policing-data/county_scatter_AZ.png" alt="Exploring United States Policing Data Using Python"></p>
<h4 id="43colorado">4.3 - Colorado</h4>
<pre><code class="language-python">analyze_state_data('CO')
</code></pre>
<table border="0" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>hit_rate</th>
      <th>n_hits</th>
      <th>n_searches</th>
      <th>n_stops</th>
      <th>search_rate</th>
    </tr>
    <tr>
      <th>driver_race</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>Asian</th>
      <td>0.537634</td>
      <td>50.0</td>
      <td>93.0</td>
      <td>32471.0</td>
      <td>0.002864</td>
    </tr>
    <tr>
      <th>Black</th>
      <td>0.481283</td>
      <td>270.0</td>
      <td>561.0</td>
      <td>71965.0</td>
      <td>0.007795</td>
    </tr>
    <tr>
      <th>Hispanic</th>
      <td>0.450454</td>
      <td>1041.0</td>
      <td>2311.0</td>
      <td>308499.0</td>
      <td>0.007491</td>
    </tr>
    <tr>
      <th>White</th>
      <td>0.651388</td>
      <td>3638.0</td>
      <td>5585.0</td>
      <td>1767804.0</td>
      <td>0.003159</td>
    </tr>
  </tbody>
</table>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/policing-data/search_scatters_CO.png" alt="Exploring United States Policing Data Using Python"><br>
<img src="https://cdn.patricktriest.com/blog/images/posts/policing-data/county_scatter_CO.png" alt="Exploring United States Policing Data Using Python"></p>
<h4 id="44washington">4.4 - Washington</h4>
<pre><code class="language-python">analyze_state_data('WA')
</code></pre>
<table border="0" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>hit_rate</th>
      <th>n_hits</th>
      <th>n_searches</th>
      <th>n_stops</th>
      <th>search_rate</th>
    </tr>
    <tr>
      <th>driver_race</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>Asian</th>
      <td>0.087143</td>
      <td>608.0</td>
      <td>6977.0</td>
      <td>352063.0</td>
      <td>0.019817</td>
    </tr>
    <tr>
      <th>Black</th>
      <td>0.130799</td>
      <td>1717.0</td>
      <td>13127.0</td>
      <td>254577.0</td>
      <td>0.051564</td>
    </tr>
    <tr>
      <th>Hispanic</th>
      <td>0.103366</td>
      <td>2128.0</td>
      <td>20587.0</td>
      <td>502254.0</td>
      <td>0.040989</td>
    </tr>
    <tr>
      <th>White</th>
      <td>0.156008</td>
      <td>15768.0</td>
      <td>101072.0</td>
      <td>4279273.0</td>
      <td>0.023619</td>
    </tr>
  </tbody>
</table>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/policing-data/search_scatters_WA.png" alt="Exploring United States Policing Data Using Python"><br>
<img src="https://cdn.patricktriest.com/blog/images/posts/policing-data/county_scatter_WA.png" alt="Exploring United States Policing Data Using Python"></p>
<h4 id="45northcarolina">4.5 - North Carolina</h4>
<pre><code class="language-python">analyze_state_data('NC')
</code></pre>
<table border="0" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>hit_rate</th>
      <th>n_hits</th>
      <th>n_searches</th>
      <th>n_stops</th>
      <th>search_rate</th>
    </tr>
    <tr>
      <th>driver_race</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>Asian</th>
      <td>0.104377</td>
      <td>31.0</td>
      <td>297.0</td>
      <td>46287.0</td>
      <td>0.006416</td>
    </tr>
    <tr>
      <th>Black</th>
      <td>0.182489</td>
      <td>1955.0</td>
      <td>10713.0</td>
      <td>1222533.0</td>
      <td>0.008763</td>
    </tr>
    <tr>
      <th>Hispanic</th>
      <td>0.119330</td>
      <td>776.0</td>
      <td>6503.0</td>
      <td>368878.0</td>
      <td>0.017629</td>
    </tr>
    <tr>
      <th>White</th>
      <td>0.153850</td>
      <td>3387.0</td>
      <td>22015.0</td>
      <td>3146302.0</td>
      <td>0.006997</td>
    </tr>
  </tbody>
</table>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/policing-data/search_scatters_NC.png" alt="Exploring United States Policing Data Using Python"><br>
<img src="https://cdn.patricktriest.com/blog/images/posts/policing-data/county_scatter_NC.png" alt="Exploring United States Policing Data Using Python"></p>
<h4 id="46texas">4.6 - Texas</h4>
<p>You might want to let this one run while you go fix yourself a cup of coffee or tea.  At almost 24 million traffic stops, the Texas dataset takes a rather long time to process.</p>
<pre><code class="language-python">analyze_state_data('TX')
</code></pre>
<table border="0" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>hit_rate</th>
      <th>n_hits</th>
      <th>n_searches</th>
      <th>n_stops</th>
      <th>search_rate</th>
    </tr>
    <tr>
      <th>driver_race</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>Asian</th>
      <td>0.289271</td>
      <td>976.0</td>
      <td>3374.0</td>
      <td>349105.0</td>
      <td>0.009665</td>
    </tr>
    <tr>
      <th>Black</th>
      <td>0.345983</td>
      <td>27588.0</td>
      <td>79738.0</td>
      <td>2300427.0</td>
      <td>0.034662</td>
    </tr>
    <tr>
      <th>Hispanic</th>
      <td>0.219449</td>
      <td>37080.0</td>
      <td>168969.0</td>
      <td>6525365.0</td>
      <td>0.025894</td>
    </tr>
    <tr>
      <th>White</th>
      <td>0.335098</td>
      <td>83157.0</td>
      <td>248157.0</td>
      <td>13576726.0</td>
      <td>0.018278</td>
    </tr>
  </tbody>
</table>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/policing-data/search_scatters_TX.png" alt="Exploring United States Policing Data Using Python"><br>
<img src="https://cdn.patricktriest.com/blog/images/posts/policing-data/county_scatter_TX.png" alt="Exploring United States Policing Data Using Python"></p>
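<p>As the Texas run demonstrates, the largest files push the limits of a one-shot <code>read_csv</code>.  If memory becomes a problem, one option (not used in the tutorial code above) is to accumulate the raw counts chunk-by-chunk with pandas' <code>chunksize</code> parameter, then compute the rates at the end.  Here's a sketch on a tiny made-up CSV:</p>

```python
import io
import pandas as pd

# Tiny stand-in CSV: real column names, made-up rows (illustration only)
raw = (
    "driver_race,search_conducted,contraband_found\n"
    + "White,True,True\n" * 2
    + "White,True,False\n"
    + "White,False,False\n"
    + "Black,True,True\n"
)

totals = None
# chunksize yields DataFrames of at most 2 rows at a time; a real run
# would use a much larger chunk (e.g. 1_000_000) against the gzipped file
for chunk in pd.read_csv(io.StringIO(raw), chunksize=2):
    g = chunk.groupby('driver_race')
    counts = pd.DataFrame({
        'n_stops': g.size(),
        'n_searches': g['search_conducted'].sum(),
        'n_hits': g['contraband_found'].sum(),
    })
    # Sum the partial counts, aligning on race across chunks
    totals = counts if totals is None else totals.add(counts, fill_value=0)

totals['search_rate'] = totals['n_searches'] / totals['n_stops']
totals['hit_rate'] = totals['n_hits'] / totals['n_searches']
print(totals)
```

<p>Because only raw counts are carried between chunks, the rates come out identical to a single-pass computation, while peak memory stays bounded by the chunk size.</p>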
<h4 id="47evenmoredatavisualizations">4.7 - Even more data visualizations</h4>
<p>I highly recommend that you visit the <a href="https://openpolicing.stanford.edu/findings/">Stanford Open Policing Project results page</a> for more visualizations of this data.  Here you can browse the search outcome results for all available states, and explore additional analysis that the researchers have performed such as stop rate by race (using county population demographics data) as well as the effects of recreational marijuana legalization on search rates.</p>
<h2 id="5whatnext">5 - What next?</h2>
<p>Do these results imply that all police officers are overtly racist?  <strong>No.</strong></p>
<p>Do they show that Black and Hispanic drivers are searched much more frequently than white drivers, often with a lower standard of evidence?  <strong>Yes.</strong></p>
<p>What we are observing here appears to be a pattern of systemic racism.  The racial disparities revealed in this analysis are a reflection of an entrenched mistrust of certain minorities in the United States.  The data and accompanying analysis are indicative of social trends that are certainly not limited to police officers.  Racial discrimination is present at all levels of society from <a href="https://www.theguardian.com/us-news/2015/jun/22/zara-reports-culture-of-favoritism-based-on-race">retail stores</a> to the <a href="https://www.wired.com/story/tech-leadership-race-problem/">tech industry</a> to <a href="https://www.scientificamerican.com/article/sex-and-race-discrimination-in-academia-starts-even-before-grad-school/">academia</a>.</p>
<p>We are able to empirically identify these trends only because state police departments (and the Open Policing team at Stanford) have made this data available to the public; no similar datasets exist for most other professions and industries.  Releasing datasets about these issues is commendable (but sadly still somewhat uncommon, especially in the private sector) and will help to further identify where these disparities exist, and to influence policies in order to provide a fair, effective way to counteract these biases.</p>
<p>To see the full official analysis for all 20 available states, check out the official findings paper here - <a href="https://5harad.com/papers/traffic-stops.pdf">https://5harad.com/papers/traffic-stops.pdf</a>.</p>
<p>I hope that this tutorial has provided the tools you might need to take this analysis further.  There's a <em>lot</em> more that you can do with the data than what we've covered here.</p>
<ul>
<li>Analyze police stops for your home state and county (if the data is available).  If the data is not available, submit a formal request to your local representatives and institutions that the data be made public.</li>
<li>Combine your analysis with US census data on the demographic, social, and economic stats about each county.</li>
<li>Create a web app to display the county trends on an interactive map.</li>
<li>Build a mobile app to warn drivers when they're entering an area that appears to be more distrusting of drivers of a certain race.</li>
<li>Open-source your own analysis, spread your findings, seek out peer review, maybe even write an explanatory blog post.</li>
</ul>
<p>The source code and figures for this analysis can be found in the companion Github repository - <a href="https://github.com/triestpa/Police-Analysis-Python">https://github.com/triestpa/Police-Analysis-Python</a></p>
<p>To view the completed IPython notebook, visit the page <a href="https://github.com/triestpa/Police-Analysis-Python/blob/master/traffic_stop_analysis.ipynb">here</a>.</p>
<p>The code for this project is 100% open source (<a href="https://github.com/triestpa/Police-Analysis-Python/blob/master/LICENSE">MIT license</a>), so feel free to use it however you see fit in your own projects.</p>
<p>As always, please feel free to comment below with any questions, comments, or criticisms.</p>
</div>]]></content:encoded></item><item><title><![CDATA[You Should Learn Regex]]></title><description><![CDATA[Regular Expressions (Regex): One of the most powerful, widely applicable, and sometimes intimidating techniques in software engineering.]]></description><link>http://blog.patricktriest.com/you-should-learn-regex/</link><guid isPermaLink="false">59be0eb44283e45fbfa65488</guid><category><![CDATA[Guides]]></category><category><![CDATA[Miscellaneous]]></category><dc:creator><![CDATA[Patrick Triest]]></dc:creator><pubDate>Sun, 08 Oct 2017 12:00:00 GMT</pubDate><media:content url="https://blog-images.patricktriest.com/uploads/regex-cover.jpg" medium="image"/><content:encoded><![CDATA[<div class="kg-card-markdown"><img src="https://blog-images.patricktriest.com/uploads/regex-cover.jpg" alt="You Should Learn Regex"><p>Regular Expressions (Regex): One of the most powerful, widely applicable, and sometimes intimidating techniques in software engineering.  From validating email addresses to performing complex code refactors, regular expressions have a wide range of uses and are an essential entry in any software engineer's toolbox.</p>
<h4 id="whatisaregularexpression">What is a regular expression?</h4>
<p>A regular expression (or regex, or regexp) is a way to describe complex search patterns using sequences of characters.</p>
<p>The complexity of the specialized regex syntax, however, can make these expressions somewhat inaccessible.  For instance, here is a basic regex that describes any time in the 24-hour HH:MM format.</p>
<pre><code class="language-text">\b([01]?[0-9]|2[0-3]):([0-5]\d)\b
</code></pre>
<p>If this looks complex to you now, don't worry - by the time we finish the tutorial, understanding this expression will be trivial.</p>
<h4 id="learnoncewriteanywhere">Learn once, write anywhere</h4>
<p>Regular expressions can be used in virtually any programming language.  A knowledge of regex is very useful for validating user input, interacting with the Unix shell, searching/refactoring code in your favorite text editor, performing database text searches, and lots more.</p>
<p>In this tutorial, I'll attempt to provide an approachable introduction to regex syntax and usage in a variety of scenarios, languages, and environments.</p>
<p><a href="https://regex101.com">This web application</a> is my favorite tool for building, testing, and debugging regular expressions.  I highly recommend that you use it to test out the expressions that we'll cover in this tutorial.</p>
<p>The source code for the examples in this tutorial can be found at the Github repository here - <a href="https://github.com/triestpa/You-Should-Learn-Regex">https://github.com/triestpa/You-Should-Learn-Regex</a></p>
<h2 id="0matchanynumberline">0 - Match Any Number Line</h2>
<p>We'll start with a very simple example - Match any line that only contains numbers.</p>
<pre><code class="language-text">^[0-9]+$
</code></pre>
<br>
<p>Let's walk through this piece-by-piece.</p>
<ul>
<li><code>^</code> - Signifies the start of a line.</li>
<li><code>[0-9]</code> - Matches any digit between 0 and 9</li>
<li><code>+</code> - Matches one or more instances of the preceding expression.</li>
<li><code>$</code> - Signifies the end of the line.</li>
</ul>
<p>We could re-write this regex in pseudo-English as <code>[start of line][one or more digits][end of line]</code>.</p>
<p>Pretty simple right?</p>
<blockquote>
<p>We could replace <code>[0-9]</code> with <code>\d</code>, which will do the same thing (match any digit).</p>
</blockquote>
<p>The great thing about this expression (and regular expressions in general) is that it can be used, without much modification, <strong>in any programming language</strong>.</p>
<p>To demonstrate we'll now quickly go through how to perform this simple regex search on a text file using 16 of the most popular programming languages.</p>
<p>We can use the following input file (<code>test.txt</code>) as an example.</p>
<pre><code class="language-text">1234
abcde
12db2
5362

1
</code></pre>
<br>
<p>Each script will read the <code>test.txt</code> file, search it using our regular expression, and print the result (<code>'1234', '5362', '1'</code>) to the console.</p>
<h3 id="languageexamples">Language Examples</h3>
<h4 id="00javascriptnodejstypescript">0.0 - Javascript / Node.js / Typescript</h4>
<pre><code class="language-javascript">const fs = require('fs')
const testFile = fs.readFileSync('test.txt', 'utf8')
const regex = /^([0-9]+)$/gm
let results = testFile.match(regex)
console.log(results)
</code></pre>
<br>
<h4 id="01python">0.1 - Python</h4>
<pre><code class="language-python">import re

with open('test.txt', 'r') as f:
  test_string = f.read()
  regex = re.compile(r'^([0-9]+)$', re.MULTILINE)
  result = regex.findall(test_string)
  print(result)
</code></pre>
<br>
<h4 id="02r">0.2 - R</h4>
<pre><code class="language-r">fileLines &lt;- readLines(&quot;test.txt&quot;)
results &lt;- grep(&quot;^[0-9]+$&quot;, fileLines, value = TRUE)
print(results)
</code></pre>
<br>
<h4 id="03ruby">0.3 - Ruby</h4>
<pre><code class="language-ruby">File.open(&quot;test.txt&quot;, &quot;rb&quot;) do |f|
    test_str = f.read
    re = /^[0-9]+$/
    test_str.scan(re) do |match|
        puts match.to_s
    end
end
</code></pre>
<br>
<h4 id="04haskell">0.4 - Haskell</h4>
<pre><code class="language-haskell">import Text.Regex.PCRE

main = do
  fileContents &lt;- readFile &quot;test.txt&quot;
  let stringResult = fileContents =~ &quot;^[0-9]+$&quot; :: AllTextMatches [] String
  print (getAllTextMatches stringResult)
</code></pre>
<br>
<h4 id="05perl">0.5 - Perl</h4>
<pre><code class="language-perl">open my $fh, '&lt;', 'test.txt' or die &quot;Unable to open file $!&quot;;
read $fh, my $file_content, -s $fh;
close $fh;
my $regex = qr/^([0-9]+)$/mp;
my @matches = $file_content =~ /$regex/g;
print join(',', @matches);
</code></pre>
<br>
<h4 id="06php">0.6 - PHP</h4>
<pre><code class="language-php">&lt;?php
$myfile = fopen(&quot;test.txt&quot;, &quot;r&quot;) or die(&quot;Unable to open file.&quot;);
$test_str = fread($myfile,filesize(&quot;test.txt&quot;));
fclose($myfile);
$re = '/^[0-9]+$/m';
preg_match_all($re, $test_str, $matches, PREG_SET_ORDER, 0);
var_dump($matches);
?&gt;
</code></pre>
<br>
<h4 id="07go">0.7 - Go</h4>
<pre><code class="language-go">package main

import (
    &quot;fmt&quot;
    &quot;io/ioutil&quot;
    &quot;regexp&quot;
)

func main() {
    testFile, err := ioutil.ReadFile(&quot;test.txt&quot;)
    if err != nil { fmt.Print(err) }
    testString := string(testFile)
    var re = regexp.MustCompile(`(?m)^([0-9]+)$`)
    var results = re.FindAllString(testString, -1)
    fmt.Println(results)
}
</code></pre>
<br>
<h4 id="08java">0.8 - Java</h4>
<pre><code class="language-java">import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;

class FileRegexExample {
  public static void main(String[] args) {
    try {
      String content = new String(Files.readAllBytes(Paths.get(&quot;test.txt&quot;)));
      Pattern pattern = Pattern.compile(&quot;^[0-9]+$&quot;, Pattern.MULTILINE);
      Matcher matcher = pattern.matcher(content);
      ArrayList&lt;String&gt; matchList = new ArrayList&lt;String&gt;();

      while (matcher.find()) {
        matchList.add(matcher.group());
      }

      System.out.println(matchList);
    } catch (IOException e) {
      e.printStackTrace();
    }
  }
}
</code></pre>
<br>
<h4 id="09kotlin">0.9 - Kotlin</h4>
<pre><code class="language-kotlin">import java.io.File
import kotlin.text.Regex
import kotlin.text.RegexOption

val file = File(&quot;test.txt&quot;)
val content:String = file.readText()
val regex = Regex(&quot;^[0-9]+$&quot;, RegexOption.MULTILINE)
val results = regex.findAll(content).map{ result -&gt; result.value }.toList()
println(results)
</code></pre>
<br>
<h4 id="010scala">0.10 - Scala</h4>
<pre><code class="language-scala">import scala.io.Source
import scala.util.matching.Regex

object FileRegexExample {
  def main(args: Array[String]) {
    val fileContents = Source.fromFile(&quot;test.txt&quot;).getLines.mkString(&quot;\n&quot;)
    val pattern = &quot;(?m)^[0-9]+$&quot;.r
    val results = (pattern findAllIn fileContents).mkString(&quot;,&quot;)
    println(results)
  }
}
</code></pre>
<br>
<h4 id="011swift">0.11 - Swift</h4>
<pre><code class="language-swift">import Cocoa
do {
    let fileText = try String(contentsOfFile: &quot;test.txt&quot;, encoding: String.Encoding.utf8)
    let regex = try! NSRegularExpression(pattern: &quot;^[0-9]+$&quot;, options: [ .anchorsMatchLines ])
    let results = regex.matches(in: fileText, options: [], range: NSRange(location: 0, length: fileText.utf16.count))
    let matches = results.map { String(fileText[Range($0.range, in: fileText)!]) }
    print(matches)
} catch {
    print(error)
}
</code></pre>
<br>
<h4 id="012rust">0.12 - Rust</h4>
<pre><code class="language-rust">extern crate regex;
use std::fs::File;
use std::io::prelude::*;
use regex::Regex;

fn main() {
  let mut f = File::open(&quot;test.txt&quot;).expect(&quot;file not found&quot;);
  let mut test_str = String::new();
  f.read_to_string(&amp;mut test_str).expect(&quot;something went wrong reading the file&quot;);

  let regex = match Regex::new(r&quot;(?m)^([0-9]+)$&quot;) {
    Ok(r) =&gt; r,
    Err(e) =&gt; {
      println!(&quot;Could not compile regex: {}&quot;, e);
      return;
    }
  };

  let result = regex.find_iter(&amp;test_str);
  for mat in result {
    println!(&quot;{}&quot;, &amp;test_str[mat.start()..mat.end()]);
  }
}
</code></pre>
<br>
<h4 id="013c">0.13 - C#</h4>
<pre><code class="language-c#">using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;
using System.Linq;

namespace RegexExample
{
    class FileRegexExample
    {
        static void Main()
        {
            string text = File.ReadAllText(@&quot;./test.txt&quot;, Encoding.UTF8);
            Regex regex = new Regex(&quot;^[0-9]+$&quot;, RegexOptions.Multiline);
            MatchCollection mc = regex.Matches(text);
            var matches = mc.OfType&lt;Match&gt;().Select(m =&gt; m.Value).ToArray();
            Console.WriteLine(string.Join(&quot; &quot;, matches));
        }
    }
}
</code></pre>
<br>
<h4 id="014c">0.14 - C++</h4>
<pre><code class="language-c++">#include &lt;string&gt;
#include &lt;fstream&gt;
#include &lt;iostream&gt;
#include &lt;sstream&gt;
#include &lt;regex&gt;
using namespace std;

int main () {
  ifstream t(&quot;test.txt&quot;);
  stringstream buffer;
  buffer &lt;&lt; t.rdbuf();
  string testString = buffer.str();

  regex numberLineRegex(&quot;(^|\n)([0-9]+)($|\n)&quot;);
  sregex_iterator it(testString.begin(), testString.end(), numberLineRegex);
  sregex_iterator it_end;

  while(it != it_end) {
    cout &lt;&lt; it -&gt; str();
    ++it;
  }
}
</code></pre>
<br>
<h4 id="015bash">0.15 - Bash</h4>
<pre><code class="language-bash">#!/bin/bash
grep -E '^[0-9]+$' test.txt
</code></pre>
<br>
<p>Writing out the same operation in sixteen languages is a fun exercise, but we'll be mostly sticking with Javascript and Python (along with a bit of Bash at the end) for the rest of the tutorial since these languages (in my opinion) tend to yield the clearest and most readable implementations.</p>
<h2 id="1yearmatching">1 - Year Matching</h2>
<p>Let's go through another simple example - matching any valid year in the 20th or 21st centuries.</p>
<pre><code class="language-text">\b(19|20)\d{2}\b
</code></pre>
<p>We're starting and ending this regex with <code>\b</code> instead of <code>^</code> and <code>$</code>.  <code>\b</code> represents a <em>word boundary</em> - a position between a word character and a non-word character.  This will allow us to match years within blocks of text (instead of only on their own lines), which is very useful for searching through, say, paragraph text.</p>
<ul>
<li><code>\b</code> - Word boundary</li>
<li><code>(19|20)</code> - Matches either '19' or '20' using the OR (<code>|</code>) operator.</li>
<li><code>\d{2}</code> - Two digits, same as <code>[0-9]{2}</code></li>
<li><code>\b</code> - Word boundary</li>
</ul>
<blockquote>
<p>Note that <code>\b</code> differs from <code>\s</code>, the code for a whitespace character.  <code>\b</code> searches for a place where a word character is not followed or preceded by another word-character, so <strong>it is searching for the absence of a word character</strong>, whereas <code>\s</code> is searching explicitly for a space character.  <code>\b</code> is especially appropriate for cases where we want to match a specific sequence/word, but not the whitespace before or after it.</p>
</blockquote>
<h4 id="10realworldexamplecountyearoccurrences">1.0 - Real-World Example - Count Year Occurrences</h4>
<p>We can use this expression in a Python script to find how many times each year in the 20th or 21st century is mentioned in a historical Wikipedia article.</p>
<pre><code class="language-python">import re
import urllib.request
import operator

# Download wiki page
url = &quot;https://en.wikipedia.org/wiki/Diplomatic_history_of_World_War_II&quot;
html = urllib.request.urlopen(url).read()

# Find all mentioned years in the 20th or 21st century
regex = r&quot;\b(?:19|20)\d{2}\b&quot;
matches = re.findall(regex, str(html))

# Form a dict of the number of occurrences of each year
year_counts = dict((year, matches.count(year)) for year in set(matches))

# Print the dict sorted in descending order
for year in sorted(year_counts, key=year_counts.get, reverse=True):
  print(year, year_counts[year])
</code></pre>
<br>
<p>The above script will print each year, along the number of times it is mentioned.</p>
<pre><code class="language-text">1941 137
1943 80
1940 76
1945 73
1939 71
...
</code></pre>
<br>
<h2 id="2timematching">2 - Time Matching</h2>
<p>Now we'll define a regex expression to match any time in the 24-hour format (<code>HH:MM</code>, such as 16:59).</p>
<pre><code class="language-text">\b([01]?[0-9]|2[0-3]):([0-5]\d)\b
</code></pre>
<ul>
<li><code>\b</code> - Word boundary</li>
<li><code>[01]</code> - 0 or 1</li>
<li><code>?</code> - Signifies that the preceding pattern is optional.</li>
<li><code>[0-9]</code> - any number between 0 and 9</li>
<li><code>|</code> - <code>OR</code> operator</li>
<li><code>2[0-3]</code> - 2, followed by any number between 0 and 3 (i.e. 20-23)</li>
<li><code>:</code> - Matches the <code>:</code> character</li>
<li><code>[0-5]</code> - Any number between 0 and 5</li>
<li><code>\d</code> - Any number between 0 and 9 (same as <code>[0-9]</code>)</li>
<li><code>\b</code> - Word boundary</li>
</ul>
<h4 id="20capturegroups">2.0 - Capture Groups</h4>
<p>You might have noticed something new in the above pattern - we're wrapping the hour and minute capture segments in parentheses <code>( ... )</code>.  This allows us to define each part of the pattern as a <strong>capture group</strong>.</p>
<p>Capture groups allow us to individually extract, transform, and rearrange pieces of each matched pattern.</p>
<h4 id="21realworldexampletimeparsing">2.1 - Real-World Example - Time Parsing</h4>
<p>For example, in the above 24-hour pattern, we've defined two capture groups - one for the hour and one for the minute.</p>
<p>We can extract these capture groups easily.</p>
<p>Here's how we could use Javascript to parse a 24-hour formatted time into hours and minutes.</p>
<pre><code class="language-javascript">const regex = /\b([01]?[0-9]|2[0-3]):([0-5]\d)/
const str = `The current time is 16:24`
const result = regex.exec(str)
console.log(`The current hour is ${result[1]}`)
console.log(`The current minute is ${result[2]}`)
</code></pre>
<blockquote>
<p>The zeroth capture group is always the entire matched expression.</p>
</blockquote>
<p>The above script will produce the following output.</p>
<pre><code class="language-text">The current hour is 16
The current minute is 24
</code></pre>
<br>
<p>As an extra exercise, you could try modifying this script to convert 24-hour times to 12-hour (am/pm) times.</p>
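<p>Here's one way that exercise could look in Python - a sketch only, and the am/pm formatting choices are my own assumption rather than part of the original pattern.</p>

```python
import re

def to_12_hour(text):
    """Convert every 24-hour time found in the text to 12-hour (am/pm) format."""
    def convert(match):
        hour, minute = int(match.group(1)), match.group(2)
        suffix = "am" if hour < 12 else "pm"
        hour = hour % 12 or 12  # 0 -> 12am, 13 -> 1pm
        return f"{hour}:{minute}{suffix}"
    # Each match's capture groups are passed to convert() via the match object
    return re.sub(r"\b([01]?[0-9]|2[0-3]):([0-5]\d)\b", convert, text)

print(to_12_hour("The meeting is at 16:24, not 09:30."))
# The meeting is at 4:24pm, not 9:30am.
```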
<h2 id="3datematching">3 - Date Matching</h2>
<p>Now let's match a <code>DAY/MONTH/YEAR</code> style date pattern.</p>
<pre><code class="language-text">\b(0?[1-9]|[12]\d|3[01])([\/\-])(0?[1-9]|1[012])\2(\d{4})
</code></pre>
<p>This one is a bit longer, but it should look pretty similar to what we've covered already.</p>
<ul>
<li><code>(0?[1-9]|[12]\d|3[01])</code> - Match any number between 1 and 31 (with an optional preceding zero)</li>
<li><code>([\/\-])</code> - Match the separator <code>/</code> or <code>-</code></li>
<li><code>(0?[1-9]|1[012])</code> - Match any number between 1 and 12</li>
<li><code>\2</code> - Matches the second capture group (the separator)</li>
<li><code>\d{4}</code> - Match any 4 digit number (0000 - 9999)</li>
</ul>
<p>The only new concept here is that we're using <code>\2</code> to match the second capture group, which is the divider (<code>/</code> or <code>-</code>).  This enables us to avoid repeating our pattern matching specification, and will also require that the dividers are consistent (if the first divider is <code>/</code>, then the second must be as well).</p>
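<p>A quick Python check (a sketch, using the pattern above) illustrates the consistent-divider requirement.</p>

```python
import re

date_regex = r"\b(0?[1-9]|[12]\d|3[01])([\/\-])(0?[1-9]|1[012])\2(\d{4})"

# Consistent dividers match...
print(bool(re.search(date_regex, "18/09/2017")))  # True
print(bool(re.search(date_regex, "18-09-2017")))  # True

# ...but mixed dividers do not, since \2 must repeat whichever divider matched first
print(bool(re.search(date_regex, "18/09-2017")))  # False
```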
<h4 id="30capturegroupsubstitution">3.0 - Capture Group Substitution</h4>
<p>Using capture groups, we can dynamically reorganize and transform our string input.</p>
<p>The standard way to refer to capture groups is to use the <code>$</code> or <code>\</code> symbol, along with the index of the capture group.</p>
<h4 id="31realworldexampledateformattransformation">3.1 - Real-World Example - Date Format Transformation</h4>
<p>Let's imagine that we were tasked with converting a collection of documents from using the international date format style (<code>DAY/MONTH/YEAR</code>) to the American style (<code>MONTH/DAY/YEAR</code>).</p>
<p>We could use the above regular expression with a replacement pattern - <code>$3$2$1$2$4</code> or <code>\3\2\1\2\4</code>.</p>
<p>Let's review our capture groups.</p>
<ul>
<li><code>\1</code> - First capture group: the day digits.</li>
<li><code>\2</code> - Second capture group: the divider.</li>
<li><code>\3</code> - Third capture group: the month digits.</li>
<li><code>\4</code> - Fourth capture group: the year digits.</li>
</ul>
<p>Hence, our replacement pattern (<code>\3\2\1\2\4</code>) will simply swap the month and day content in the expression.</p>
<p>Here's how we could do this transformation in Javascript -</p>
<pre><code class="language-javascript">const regex = /\b(0?[1-9]|[12]\d|3[01])([\/\-])(0?[1-9]|1[012])\2(\d{4})/
const str = `Today's date is 18/09/2017`
const subst = `$3$2$1$2$4`
const result = str.replace(regex, subst)
console.log(result)
</code></pre>
<br>
<p>The above script will print <code>Today's date is 09/18/2017</code> to the console.</p>
<p>The above script is quite similar in Python -</p>
<pre><code class="language-python">import re
regex = r'\b(0?[1-9]|[12]\d|3[01])([\/\-])(0?[1-9]|1[012])\2(\d{4})'
test_str = &quot;Today's date is 18/09/2017&quot;
subst = r'\3\2\1\2\4'
result = re.sub(regex, subst, test_str)
print(result)
</code></pre>
<br>
<h2 id="4emailvalidation">4 - Email Validation</h2>
<p>Regular expressions can also be useful for input validation.</p>
<pre><code class="language-text">^[^@\s]+@[^@\s]+\.\w{2,6}$
</code></pre>
<p>Above is an (overly simple) regular expression to match an email address.</p>
<ul>
<li><code>^</code> - Start of input</li>
<li><code>[^@\s]</code> - Match any character except for <code>@</code> and whitespace <code>\s</code></li>
<li><code>+</code> - 1+ times</li>
<li><code>@</code> - Match the '@' symbol</li>
<li><code>[^@\s]+</code> - Match any character (except for <code>@</code> and whitespace), 1+ times</li>
<li><code>\.</code> - Match the '.' character.</li>
<li><code>\w{2,6}</code> - Match any word character (letter, digit, or underscore), 2-6 times</li>
<li><code>$</code> - End of input</li>
</ul>
<h4 id="40realworldexamplevalidateemail">4.0 - Real-World Example - Validate Email</h4>
<p>Let's say we wanted to create a simple Javascript function to check if an input is a valid email.</p>
<pre><code class="language-javascript">function isValidEmail (input) {
  const regex = /^[^@\s]+@[^@\s]+\.\w{2,6}$/g;
  const result = regex.exec(input)

  // If result is null, no match was found
  return !!result
}

const tests = [
  `test.test@gmail.com`, // Valid
  '', // Invalid
  `test.test`, // Invalid
  '@invalid@test.com', // Invalid
  'invalid@@test.com', // Invalid
  `gmail.com`, // Invalid
  `this is a test@test.com`, // Invalid
  `test.test@gmail.comtest.test@gmail.com` // Invalid
]

console.log(tests.map(isValidEmail))
</code></pre>
<br>
<p>The output of this script should be <code>[ true, false, false, false, false, false, false, false ]</code>.</p>
<blockquote>
<p>Note - In a real-world application, validating an email address using a regular expression is not enough for many situations, such as when a user signs up in a web app.  Once you have confirmed that the input text is an email address, it is best to always follow through with the standard practice of sending a confirmation/activation email.</p>
</blockquote>
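<p>The equivalent check is just as concise in Python.  Here's a sketch using <code>re.fullmatch</code>, which anchors the pattern to the entire input, so the explicit <code>^</code> and <code>$</code> can be dropped.</p>

```python
import re

# Same simplified email pattern as above; fullmatch supplies the anchoring
EMAIL_REGEX = re.compile(r"[^@\s]+@[^@\s]+\.\w{2,6}")

def is_valid_email(text):
    # fullmatch only succeeds if the entire input matches the pattern
    return EMAIL_REGEX.fullmatch(text) is not None

tests = ["test.test@gmail.com", "", "test.test", "invalid@@test.com"]
print([is_valid_email(t) for t in tests])  # [True, False, False, False]
```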
<h4 id="41fullemailregex">4.1 - Full Email Regex</h4>
<p>This is a very simple example which ignores lots of very important email-validity edge cases, such as invalid start/end characters and consecutive periods.  I really don't recommend using the above expression in your applications; it would be best to instead use a reputable email-validation library or to track down a more complete email validation regex.</p>
<p>For instance, here's a more advanced expression from (the aptly named) <a href="http://emailregex.com/">emailregex.com</a> which matches 99% of <a href="https://www.ietf.org/rfc/rfc5322.txt">RFC 5322</a> compliant email addresses.</p>
<pre><code>(?:[a-z0-9!#$%&amp;'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&amp;'*+/=?^_`{|}~-]+)*|&quot;(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*&quot;)@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
</code></pre>
<p>Yeah, we're not going to walk through that one.</p>
<h2 id="5codecommentpatternmatching">5 - Code Comment Pattern Matching</h2>
<p>One of the most useful ad-hoc uses of regular expressions can be code refactoring.  Most code editors support regex-based find/replace operations.  A well-formed regex substitution can turn a tedious 30-minute busywork job into a beautiful single-expression piece of refactor wizardry.</p>
<p>Instead of writing scripts to perform these operations, try doing them natively in your text editor of choice.  Nearly every text editor supports regex based find-and-replace.</p>
<p>Here are a few guides for popular editors.</p>
<p>Regex Substitution in Sublime - <a href="http://docs.sublimetext.info/en/latest/search_and_replace/search_and_replace_overview.html#using-regular-expressions-in-sublime-text">http://docs.sublimetext.info/en/latest/search_and_replace/search_and_replace_overview.html#using-regular-expressions-in-sublime-text</a></p>
<p>Regex Substitution in Vim - <a href="http://vimregex.com/#backreferences">http://vimregex.com/#backreferences</a></p>
<p>Regex Substitution in VSCode - <a href="https://code.visualstudio.com/docs/editor/codebasics#_advanced-search-options">https://code.visualstudio.com/docs/editor/codebasics#_advanced-search-options</a></p>
<p>Regex Substitution in Emacs - <a href="https://www.gnu.org/software/emacs/manual/html_node/emacs/Regexp-Replace.html">https://www.gnu.org/software/emacs/manual/html_node/emacs/Regexp-Replace.html</a></p>
<h4 id="50extractingsinglelinecsscomments">5.0 - Extracting Single Line CSS Comments</h4>
<p>What if we wanted to find all of the single-line comments within a CSS file?</p>
<p>CSS comments come in the form <code>/* Comment Here */</code></p>
<p>To capture any <em>single-line</em> CSS comment, we can use the following expression.</p>
<pre><code class="language-text">(\/\*+)(.*)(\*+\/)
</code></pre>
<ul>
<li><code>\/</code> - Match <code>/</code> symbol (we have to escape the <code>/</code> character)</li>
<li><code>\*+</code> - Match one or more <code>*</code> symbols (again, we have to escape the <code>*</code> character with <code>\</code>).</li>
<li><code>(.*)</code> - Match any character (besides a newline <code>\n</code>), any number of times</li>
<li><code>\*+</code> - Match one or more <code>*</code> characters</li>
<li><code>\/</code> - Match closing <code>/</code> symbol.</li>
</ul>
<p>Note that we have defined three capture groups in the above expression: the opening characters (<code>(\/\*+)</code>), the comment contents (<code>(.*)</code>), and the closing characters (<code>(\*+\/)</code>).</p>
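<p>To see the three capture groups in action outside of a text editor, here's a short Python sketch (the sample CSS is made up) that extracts just the comment contents.</p>

```python
import re

# A made-up stylesheet with two single-line comments
css = """
/* Single Line Comment */
body { background-color: pink; }
/** Another Comment */
"""

comment_regex = r"(\/\*+)(.*)(\*+\/)"

# Group 1: opening characters, group 2: comment text, group 3: closing characters
for match in re.finditer(comment_regex, css):
    print(match.group(2).strip())
# Single Line Comment
# Another Comment
```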
<h4 id="51realworldexampleconvertsinglelinecommentstomultilinecomments">5.1 - Real-World Example - Convert Single-Line Comments to Multi-Line Comments</h4>
<p>We could use this expression to turn each single-line comment into a multi-line comment by performing the following substitution.</p>
<pre><code class="language-text">$1\n$2\n$3
</code></pre>
<p>Here, we are simply adding a newline <code>\n</code> between each capture group.</p>
<p>Try performing this substitution on a file with the following contents.</p>
<pre><code class="language-css">/* Single Line Comment */
body {
  background-color: pink;
}

/*
 Multiline Comment
*/
h1 {
  font-size: 2rem;
}

/* Another Single Line Comment */
h2 {
  font-size: 1rem;
}
</code></pre>
<br>
<p>The substitution will yield the same file, but with each single-line comment converted to a multi-line comment.</p>
<pre><code class="language-css">/*
 Single Line Comment
*/
body {
  background-color: pink;
}

/*
 Multiline Comment
*/
h1 {
  font-size: 2rem;
}

/*
 Another Single Line Comment
*/
h2 {
  font-size: 1rem;
}
</code></pre>
<br>
<h4 id="52realworldexamplestandardizecsscommentopenings">5.2 - Real-World Example - Standardize CSS Comment Openings</h4>
<p>Let's say we have a big messy CSS file that was written by a few different people.  In this file, some of the comments start with <code>/*</code>, some with <code>/**</code>, and some with <code>/*****</code>.</p>
<p>Let's write a regex substitution to standardize all of the single-line CSS comments to start with <code>/*</code>.</p>
<p>In order to do this, we'll extend our expression to only match comments with <em>two or more</em> starting asterisks.</p>
<pre><code class="language-text">(\/\*{2,})(.*)(\*+\/)
</code></pre>
<p>This expression is very similar to the original.  The main difference is that at the beginning we've replaced <code>\*+</code> with <code>\*{2,}</code>.  The <code>\*{2,}</code> syntax signifies &quot;two or more&quot; instances of <code>*</code>.</p>
<p>To standardize the opening of each comment we can pass the following substitution.</p>
<pre><code class="language-bash">/*$2$3
</code></pre>
<br>
<p>Let's run this substitution on the following test CSS file.</p>
<pre><code class="language-css">/** Double Asterisk Comment */
body {
  background-color: pink;
}

/* Single Asterisk Comment */
h1 {
  font-size: 2rem;
}

/***** Many Asterisk Comment */
h2 {
  font-size: 1rem;
}
</code></pre>
<br>
<p>The result will be the same file with standardized comment openings.</p>
<pre><code class="language-css">/* Double Asterisk Comment */
body {
  background-color: pink;
}

/* Single Asterisk Comment */
h1 {
  font-size: 2rem;
}

/* Many Asterisk Comment */
h2 {
  font-size: 1rem;
}
</code></pre>
<br>
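<p>The same standardization can also be scripted instead of run in an editor.  Here's a Python sketch (with made-up sample CSS) - note that Python's <code>re.sub</code> uses <code>\2</code>-style backreferences in the replacement string rather than <code>$2</code>.</p>

```python
import re

css = "/** Double */ h1 {}\n/***** Many */ h2 {}\n/* Single */ h3 {}"

# Only comments opening with two or more asterisks are rewritten
standardized = re.sub(r"(\/\*{2,})(.*)(\*+\/)", r"/*\2\3", css)
print(standardized)
# /* Double */ h1 {}
# /* Many */ h2 {}
# /* Single */ h3 {}
```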
<h2 id="6urlmatching">6 - URL Matching</h2>
<p>Another highly useful regex recipe is matching URLs in text.</p>
<p>Here's an example URL matching expression from <a href="https://stackoverflow.com/questions/3809401/what-is-a-good-regular-expression-to-match-a-url">Stack Overflow</a>.</p>
<pre><code class="language-text">(https?:\/\/)(www\.)?(?&lt;domain&gt;[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6})(?&lt;path&gt;\/[-a-zA-Z0-9@:%_\/+.~#?&amp;=]*)?
</code></pre>
<ul>
<li><code>(https?:\/\/)</code> - Match http(s)</li>
<li><code>(www\.)?</code> - Optional &quot;www&quot; prefix</li>
<li><code>(?&lt;domain&gt;[-a-zA-Z0-9@:%._\+~#=]{2,256}</code> - Match a valid domain name</li>
<li><code>\.[a-z]{2,6})</code> - Match a domain extension (i.e. &quot;.com&quot; or &quot;.org&quot;)</li>
<li><code>(?&lt;path&gt;\/[-a-zA-Z0-9@:%_\/+.~#?&amp;=]*)?</code> - Match URL path (<code>/posts</code>), query string (<code>?limit=1</code>), and/or file extension (<code>.html</code>), all optional.</li>
</ul>
<h4 id="60namedcapturegroups">6.0 - Named capture groups</h4>
<p>You'll notice here that some of the capture groups now begin with a <code>?&lt;name&gt;</code> identifier.  This is the syntax for a <em>named capture group</em>, which makes the data extraction cleaner.</p>
<h4 id="61realworldexampleparsedomainnamesfromurlsonawebpage">6.1 - Real-World Example - Parse Domain Names From URLs on A Web Page</h4>
<p>Here's how we could use named capture groups to extract the domain name of each URL in a web page using Python.</p>
<pre><code class="language-python">import re
import urllib.request

html = str(urllib.request.urlopen(&quot;https://moz.com/top500&quot;).read())
regex = r&quot;(https?:\/\/)(www\.)?(?P&lt;domain&gt;[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6})(?P&lt;path&gt;\/[-a-zA-Z0-9@:%_\/+.~#?&amp;=]*)?&quot;
matches = re.finditer(regex, html)

for match in matches:
  print(match.group('domain'))
</code></pre>
<br>
<p>The script will print out each domain name it finds in the raw web page HTML content.</p>
<pre><code class="language-text">...
facebook.com
twitter.com
google.com
youtube.com
linkedin.com
wordpress.org
instagram.com
pinterest.com
wikipedia.org
wordpress.com
...
</code></pre>
<br>
<h2 id="7commandlineusage">7 - Command Line Usage</h2>
<p>Regular expressions are also supported by many Unix command line utilities!  We'll walk through how to use them with <code>grep</code> to find specific files, and with <code>sed</code> to replace text file content in-place.</p>
<h4 id="70realworldexampleimagefilematchingwithgrep">7.0 - Real-World Example - Image File Matching With <code>grep</code></h4>
<p>We'll define another basic regular expression, this time to match image files.</p>
<pre><code class="language-text">^.+\.(?i)(png|jpg|jpeg|gif|webp)$
</code></pre>
<ul>
<li><code>^</code> - Start of line.</li>
<li><code>.+</code> - Match any character (letters, digits, symbols), except for <code>\n</code> (newline), 1+ times.</li>
<li><code>\.</code> - Match the '.' character.</li>
<li><code>(?i)</code> - Signifies that the next sequence is case-insensitive.</li>
<li><code>(png|jpg|jpeg|gif|webp)</code> - Match common image file extensions</li>
<li><code>$</code> - End of line</li>
</ul>
<p>Here's how you could list all of the image files in your <code>Downloads</code> directory.</p>
<pre><code class="language-bash">ls ~/Downloads | grep -iE '^.+\.(png|jpg|jpeg|gif|webp)$'
</code></pre>
<ul>
<li><code>ls ~/Downloads</code> - List the files in your downloads directory</li>
<li><code>|</code> - Pipe the output to the next command</li>
<li><code>grep -iE</code> - Filter the input with a case-insensitive (<code>-i</code>) extended (<code>-E</code>) regular expression.  Note that we've dropped the inline <code>(?i)</code> modifier here, since POSIX extended regex syntax does not support it.</li>
</ul>
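<p>If you'd rather do this from a script, here's a Python sketch of the same filter (the directory path is just an example).</p>

```python
import os
import re

# Compiled with re.IGNORECASE instead of an inline (?i) flag
image_regex = re.compile(r"^.+\.(png|jpg|jpeg|gif|webp)$", re.IGNORECASE)

def list_images(path="."):
    """Return the image file names in the given directory."""
    return [name for name in os.listdir(path) if image_regex.match(name)]

print(list_images())
```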
<h4 id="71realworldexampleemailsubstitutionwithsed">7.1 - Real-World Example - Email Substitution With <code>sed</code></h4>
<p>Another good use of regular expressions in bash commands could be redacting emails within a text file.</p>
<p>This can be done quite easily using the <code>sed</code> command, along with a modified version of our email regex from earlier.</p>
<pre><code class="language-bash">sed -E -i 's/^(.*?\s|)[^@]+@[^\s]+/\1\{redacted\}/g' test.txt
</code></pre>
<ul>
<li><code>sed</code> - The Unix &quot;stream editor&quot; utility, which allows for powerful text file transformations.</li>
<li><code>-E</code> - Use extended regex pattern matching</li>
<li><code>-i</code> - Replace the file stream in-place</li>
<li><code>'s/^(.*?\s|)</code> - Wrap the beginning of the line in a capture group</li>
<li><code>[^@]+@[^\s]+</code> - Simplified version of our email regex.</li>
<li><code>/\1\{redacted\}/g'</code> - Replace each email address with <code>{redacted}</code>.</li>
<li><code>test.txt</code> - Perform the operation on the <code>test.txt</code> file.</li>
</ul>
<p>We can run the above substitution command on a sample <code>test.txt</code> file.</p>
<pre><code class="language-bash">My email is patrick.triest@gmail.com
</code></pre>
<br>
<p>Once the command has been run, the email will be redacted from the <code>test.txt</code> file.</p>
<pre><code class="language-bash">My email is {redacted}
</code></pre>
<blockquote>
<p>Warning - This command will automatically remove all email addresses from any <code>test.txt</code> that you pass it, so be careful where/when you run it, since <strong>this operation cannot be reversed</strong>.  To preview the results within the terminal, instead of replacing the text in-place, simply omit the <code>-i</code> flag.</p>
</blockquote>
<blockquote>
<p>Note - While the above command should work on most Linux distributions, macOS uses the BSD implementation of <code>sed</code>, which is more limited in its supported regex syntax.  To use <code>sed</code> on macOS with decent regex support, I would recommend installing the GNU implementation of <code>sed</code> with <code>brew install gnu-sed</code>, and then using <code>gsed</code> from the command line instead of <code>sed</code>.</p>
</blockquote>
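<p>For reference, here's a rough Javascript equivalent of the same redaction (an illustrative sketch using a simplified email pattern, not part of the tutorial's source code).</p>
<pre><code class="language-javascript">// Replace anything that looks like an email address with "{redacted}"
function redactEmails (text) {
  return text.replace(/[^@\s]+@[^\s]+/g, '{redacted}')
}

console.log(redactEmails('My email is patrick.triest@gmail.com'))
// My email is {redacted}
</code></pre>
<br>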
<h2 id="8whennottouseregex">8 - When Not To Use Regex</h2>
<p>Ok, so clearly regex is a powerful, flexible tool.  Are there times when you should avoid writing your own regular expressions? <em>Yes!</em></p>
<h4 id="80languageparsing">8.0 - Language Parsing</h4>
<p>Parsing languages, from English to Java to JSON, can be a real pain using regular expressions.</p>
<p>Writing your own expression for this purpose is likely to be an exercise in frustration, one that will result in eventual (or immediate) disaster when an edge case or minor syntax/grammar inconsistency in the data source causes the expression to fail.</p>
<p>Battle-hardened parsers are available for virtually all machine-readable languages, and <a href="http://www.nltk.org/">NLP tools</a> are available for human languages - I strongly recommend that you use one of them instead of attempting to write your own.</p>
<h4 id="81securitycriticalinputfilteringandblacklists">8.1 - Security-Critical Input Filtering and Blacklists</h4>
<p>It may seem tempting to use regular expressions to filter user input (such as from a web form), to prevent hackers from sending malicious commands (such as SQL injections) to your application.</p>
<p>Using a custom regular expression here is unwise, since it is very difficult to cover every potential attack vector or malicious command.  For instance, hackers can use <a href="http://www.cgisecurity.com/lib/URLEmbeddedAttacks.html">alternative character encodings to get around naively programmed input blacklist filters</a>.</p>
<p>This is another instance where I would strongly recommend using well-tested libraries and/or services, along with <a href="https://www.owasp.org/index.php/Input_Validation_Cheat_Sheet">the use of whitelists instead of blacklists</a>, in order to protect your application from malicious inputs.</p>
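<p>To make the whitelist idea concrete, here's a minimal sketch (a hypothetical validator, not a complete security solution) that accepts only the input format you expect, rather than trying to blacklist every malicious pattern.</p>
<pre><code class="language-javascript">// Whitelist validation - accept only 3-16 alphanumeric/underscore characters
function isValidUsername (name) {
  return /^[a-z0-9_]{3,16}$/i.test(name)
}

console.log(isValidUsername('patrick_t')) // true
console.log(isValidUsername("x'; DROP TABLE users;--")) // false
</code></pre>
<br>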
<h4 id="82performanceintensiveapplications">8.2 - Performance-Intensive Applications</h4>
<p>Regex matching speeds can range from not-very-fast to extremely slow, depending on <a href="https://www.loggly.com/blog/regexes-the-bad-better-best/">how well the expression is written</a>.  This is fine for most use cases, especially if the text being matched is very short (such as an email address form).  For high-performance server applications, however, regex can be a performance bottleneck, especially if the expression is poorly written or the text being searched is long.</p>
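<p>The classic performance pitfall is &quot;catastrophic backtracking&quot;, which is usually caused by nested quantifiers.  Here's a simplified illustration (not from the tutorial's code) of a risky pattern and an equivalent safe one.</p>
<pre><code class="language-javascript">// Nested quantifiers such as (a+)+ can force the engine to try
// exponentially many match partitions on near-miss inputs
// (e.g. a long run of 'a' characters followed by a 'b')
const risky = /^(a+)+$/

// A single quantifier matches the same strings with no nested backtracking
const safe = /^a+$/

console.log(safe.test('aaaa')) // true
console.log(safe.test('aaab')) // false
</code></pre>
<br>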
<h4 id="83forproblemsthatdontrequireregex">8.3 - For Problems That Don't Require Regex</h4>
<p>Regex is an incredibly useful tool, but that doesn't mean you should use it everywhere.</p>
<p>If there is an alternative solution to a problem, which is simpler and/or does not require the use of regular expressions, <strong>please do not use regex just to feel clever</strong>.  Regex is great, but it is also one of the least readable programming tools, and one that is very prone to edge cases and bugs.</p>
<p>Overusing regex is a great way to make your co-workers (and anyone else who needs to work with your code) very angry with you.</p>
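<p>As a quick (hypothetical) example, checking a file extension with a plain string method is both simpler and more readable than the equivalent regex.</p>
<pre><code class="language-javascript">// Both functions check for a ".csv" extension, but the String
// method version is easier to read at a glance
function isCsvRegex (name) {
  return /\.csv$/i.test(name)
}

function isCsvPlain (name) {
  return name.toLowerCase().endsWith('.csv')
}

console.log(isCsvPlain('data.csv')) // true
console.log(isCsvRegex('data.txt')) // false
</code></pre>
<br>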
<h2 id="conclusion">Conclusion</h2>
<p>I hope that this has been a useful introduction to the many uses of regular expressions.</p>
<p>There are still lots of regex use cases that we have not covered.  For instance, <a href="https://www.postgresql.org/docs/9.5/static/functions-matching.html">regex can be used in PostgreSQL queries</a> to dynamically search for text patterns within a database.</p>
<p>We have also left lots of powerful regex syntax features uncovered, such as <a href="https://www.regular-expressions.info/lookaround.html">lookahead, lookbehind</a>, <a href="https://www.regular-expressions.info/atomic.html">atomic groups</a>, <a href="https://www.regular-expressions.info/recurse.html">recursion</a>, and <a href="https://www.regular-expressions.info/subroutine.html">subroutines</a>.</p>
<p>To improve your regex skills and to learn more about these features, I would recommend the following resources.</p>
<ul>
<li>Learn Regex The Easy Way - <a href="https://github.com/zeeshanu/learn-regex">https://github.com/zeeshanu/learn-regex</a></li>
<li>Regex101 - <a href="https://regex101.com/">https://regex101.com/</a></li>
<li>HackerRank Regex Course - <a href="https://www.hackerrank.com/domains/regex/re-introduction">https://www.hackerrank.com/domains/regex/re-introduction</a></li>
</ul>
<p>The source code for the examples in this tutorial can be found at the Github repository here - <a href="https://github.com/triestpa/You-Should-Learn-Regex">https://github.com/triestpa/You-Should-Learn-Regex</a></p>
<p>Feel free to comment below with any suggestions, ideas, or criticisms regarding this tutorial.</p>
</div>]]></content:encoded></item><item><title><![CDATA[Build An Interactive Game of Thrones Map (Part II) - Leaflet.js & Webpack]]></title><description><![CDATA[Learn to build an interactive map web application showing data from Game of Thrones using Leaflet.js, Webpack, and frameworkless Javascript components.]]></description><link>http://blog.patricktriest.com/game-of-thrones-leaflet-webpack/</link><guid isPermaLink="false">59b362364283e45fbfa6545e</guid><category><![CDATA[Javascript]]></category><category><![CDATA[Web Development]]></category><category><![CDATA[Guides]]></category><dc:creator><![CDATA[Patrick Triest]]></dc:creator><pubDate>Mon, 11 Sep 2017 12:00:00 GMT</pubDate><media:content url="https://blog-images.patricktriest.com/uploads/got_map.jpg" medium="image"/><content:encoded><![CDATA[<div class="kg-card-markdown"><h3 id="theexcitingworldofdigitalcartography">The Exciting World of Digital Cartography</h3>
<img src="https://blog-images.patricktriest.com/uploads/got_map.jpg" alt="Build An Interactive Game of Thrones Map (Part II) - Leaflet.js & Webpack"><p>Welcome to part II of the tutorial series &quot;Build An Interactive Game of Thrones Map&quot;.  In this installment, we'll be building a web application to display data from our &quot;Game of Thrones&quot; API on an interactive map.</p>
<p>Our webapp is built on top of the backend application that we completed in part I of the tutorial - <a href="https://blog.patricktriest.com/game-of-thrones-map-node-postgres-redis/">Build An Interactive Game of Thrones Map (Part I) - Node.js, PostGIS, and Redis</a></p>
<p>Using the techniques that we'll cover for this example webapp, you will have a foundation to build any sort of interactive web-based map, from <a href="http://chriswhong.github.io/nyctaxi/">&quot;A Day in the Life of an NYC Taxi Cab&quot;</a> to a <a href="https://www.openstreetmap.org/">completely open-source version of Google Maps</a>.</p>
<p>We will also be going over the basics of wiring up a simple <a href="https://webpack.github.io/">Webpack</a> build system, along with covering some guidelines for creating frameworkless Javascript components.</p>
<p>For a preview of the final result, check out the webapp here - <a href="https://atlasofthrones.com">https://atlasofthrones.com</a></p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/got_map/got_map.jpg" alt="Build An Interactive Game of Thrones Map (Part II) - Leaflet.js & Webpack"></p>
<p>We will be using <a href="http://leafletjs.com/">Leaflet.js</a> to render the map, <a href="http://fusejs.io/">Fuse.js</a> to power the location search, and <a href="http://sass-lang.com/guide">Sass</a> for our styling, all wrapped in a custom <a href="https://webpack.github.io/">Webpack</a> build system.  The application will be built using vanilla Javascript (no frameworks), but we will still organize the codebase into separate UI components (with separate HTML, CSS, and JS files) for maximal clarity and separation-of-concerns.</p>
<h3 id="part0projectsetup">Part 0 - Project Setup</h3>
<h5 id="00installdependencies">0.0 - Install Dependencies</h5>
<p>I'll be writing this tutorial with the assumption that anyone reading it has already completed the first part - <a href="https://blog.patricktriest.com/game-of-thrones-map-node-postgres-redis/">Build An Interactive Game of Thrones Map (Part I) - Node.js, PostGIS, and Redis</a>.</p>
<blockquote>
<p>If you stubbornly refuse to learn about the Node.js backend powering this application, I'll provide an API URL that you can use instead of running the backend on your own machine.  But seriously, try part one, it's pretty fun.</p>
</blockquote>
<p>To setup the project, you can either resume from the same project directory where you completed part one, or you can clone the frontend starter repo to start fresh with the complete backend application.</p>
<h6 id="optionauseyourcodebasefromparti">Option A - Use Your Codebase from Part I</h6>
<p>If you are resuming from your existing backend project, you'll just need to install a few new NPM dependencies.</p>
<pre><code class="language-bash">npm i -D webpack html-loader node-sass sass-loader css-loader style-loader url-loader babel-loader babili-webpack-plugin http-server
npm i axios fuse.js leaflet
</code></pre>
<br>
<p>And that's it, with the dependencies installed you should be good to go.</p>
<h6 id="optionbusefrontendstartergithubrepository">Option B - Use Frontend-Starter Github Repository</h6>
<p>First, <code>git clone</code> the <code>frontend-starter</code> branch of the repository on Github.</p>
<pre><code class="language-bash">git clone -b frontend-starter https://github.com/triestpa/Atlas-Of-Thrones
</code></pre>
<br>
<p>Once the repo is downloaded, enter the directory (<code>cd Atlas-Of-Thrones</code>) and run <code>npm install</code>.</p>
<p>You'll still need to set up PostgreSQL and Redis, along with adding a local <code>.env</code> file.  See parts 1 and 2 of the <a href="https://blog.patricktriest.com/game-of-thrones-map-node-postgres-redis/">backend tutorial</a> for details.</p>
<h5 id="01butwaitwherestheframework">0.1 - But wait - where's the framework?</h5>
<p>Right... I decided not to use any specific Javascript framework for this tutorial.  In the past, I've used <a href="https://facebook.github.io/react/">React</a>, <a href="https://angular.io/">Angular</a> (1.x &amp; 2.x+), and <a href="https://vuejs.org/">Vue</a> (my personal favorite) for a variety of projects.  I think that they're all really solid choices.</p>
<p>When writing a tutorial, I would prefer not to alienate anyone who is inexperienced with (or has a deep dogmatic hatred of) the chosen framework, so <strong>I've chosen to build the app using the native Javascript DOM APIs</strong>.</p>
<p>Why?</p>
<ul>
<li>The app is relatively simple and does not require advanced page routing or data binding.</li>
<li><a href="http://leafletjs.com/">Leaflet.js</a> handles the complex map rendering and styling</li>
<li>I want to keep the tutorial accessible to anyone who knows Javascript, without requiring knowledge of any specific framework.</li>
<li>Omitting a framework allows us to minimize the base application payload size to <strong>60kb total</strong> (JS+CSS), most of which (38kb) is Leaflet.js.</li>
<li>Building a frameworkless frontend is a valuable &quot;back-to-basics&quot; Javascript exercise.</li>
</ul>
<p>Am I against Javascript frameworks in general? Of course not!  I use JS frameworks for almost all of my (personal and professional) projects.</p>
<p><strong>But what about project structure? And reusable components?</strong><br>
That's a good point.  Frameworkless frontend applications too often devolve into a monolithic 1000+ line single JS file (along with huge HTML and CSS files), full of spaghetti code and nearly impossible to decipher for those who didn't originally write it.  I'm not a fan of this approach.</p>
<p>What if I told you that it's possible to write structured, reusable Javascript components without a framework? Blasphemy? Too difficult? Not at all.  We'll go deeper into this further down.</p>
<h5 id="02setupwebpackconfig">0.2 - Setup Webpack Config</h5>
<p>Before we actually start coding the webapp, let's get the build system in place.  We'll be using Webpack to bundle our JS/CSS/HTML files, to generate source maps for dev builds, and to minimize resources for production builds.</p>
<p>Create a <code>webpack.config.js</code> file in the project root.</p>
<pre><code class="language-javascript">const path = require('path')
const BabiliPlugin = require('babili-webpack-plugin')

// Babel loader for Transpiling ES8 Javascript for browser usage
const babelLoader = {
  test: /\.js$/,
  loader: 'babel-loader',
  include: [path.resolve(__dirname, 'app')],
  query: { presets: ['es2017'] }
}

// SCSS loader for transpiling SCSS files to CSS
const scssLoader = {
  test: /\.scss$/,
  loader: 'style-loader!css-loader!sass-loader'
}

// URL loader to resolve data-urls at build time
const urlLoader = {
  test: /\.(png|woff|woff2|eot|ttf|svg)$/,
  loader: 'url-loader?limit=100000'
}

// HTML loader to allow us to import HTML templates into our JS files
const htmlLoader = {
  test: /\.html$/,
  loader: 'html-loader'
}

const webpackConfig = {
  entry: './app/main.js', // Start at app/main.js
  output: {
    path: path.resolve(__dirname, 'public'),
    filename: 'bundle.js' // Output to public/bundle.js
  },
  module: { loaders: [ babelLoader, scssLoader, urlLoader, htmlLoader ] }
}

if (process.env.NODE_ENV === 'production') {
  // Minify for production build
  webpackConfig.plugins = [ new BabiliPlugin({}) ]
} else {
  // Generate sourcemaps for dev build
  webpackConfig.devtool = 'eval-source-map'
}

module.exports = webpackConfig
</code></pre>
<br>
<p>I won't explain the Webpack config here in-depth since we've got a long tutorial ahead of us.  I hope that the inline-comments will adequately explain what each piece of the configuration does; for a more thorough introduction to Webpack, I would recommend the following resources -</p>
<ul>
<li><a href="https://webpack.js.org/">Official Webpack Introduction</a></li>
<li><a href="https://www.smashingmagazine.com/2017/02/a-detailed-introduction-to-webpack/">Webpack – A Detailed Introduction, Smashing Magazine</a></li>
</ul>
<h5 id="03addnpmscripts">0.3 - Add NPM Scripts</h5>
<p>In the <code>package.json</code> file, add the following scripts -</p>
<pre><code class="language-json">&quot;scripts&quot;: {
  ...
  &quot;serve&quot;: &quot;webpack --watch &amp; http-server ./public&quot;,
  &quot;dev&quot;: &quot;NODE_ENV=local npm start &amp; npm run serve&quot;,
  &quot;build&quot;: &quot;NODE_ENV=production webpack&quot;
}
</code></pre>
<br>
<p>Since we're including the frontend code in the same repository as the backend Node.js application, we'll leave the <code>npm start</code> command reserved for starting the server.</p>
<p>The new <code>npm run serve</code> script will watch our frontend source files, build our application, and serve files from the <code>public</code> directory at <code>localhost:8080</code>.</p>
<p>The <code>npm run build</code> command will build a production-ready (minified) application bundle.</p>
<p>The <code>npm run dev</code> command will start the Node.js API server and serve the webapp, allowing for an integrated (backend + frontend) development environment start command.</p>
<blockquote>
<p>You could also use the NPM module <code>webpack-dev-server</code> to watch/build/serve the frontend application dev bundle with a single command.  Personally, I prefer the flexibility of keeping these tasks decoupled by using <code>webpack --watch</code> with the <code>http-server</code> NPM module.</p>
</blockquote>
<h5 id="03addpublicindexhtml">0.3 - Add <code>public/index.html</code></h5>
<p>Create a new directory called <code>public</code> in the project root.</p>
<p>This is the directory where the public webapp code will be generated.  The only file that we need here is an &quot;index.html&quot; page in order to import our dependencies and to provide a placeholder element for the application to load into.</p>
<p>Add the following to <code>public/index.html</code>.</p>
<pre><code class="language-html">&lt;html&gt;
&lt;head&gt;
  &lt;meta charset=&quot;utf-8&quot;&gt;
  &lt;meta http-equiv=&quot;x-ua-compatible&quot; content=&quot;ie=edge&quot;&gt;
  &lt;meta name=&quot;viewport&quot; content=&quot;width=device-width, initial-scale=1, minimum-scale=1.0, maximum-scale=1.0, user-scalable=no, shrink-to-fit=no&quot;&gt;
  &lt;title&gt;Atlas Of Thrones: A Game of Thrones Interactive Map&lt;/title&gt;
  &lt;meta name=&quot;description&quot; content=&quot;Explore the world of Game of Thrones! An interactive Google Maps style webapp.&quot; /&gt;
  &lt;style&gt;
    html {
      background: #222;
    }

    #loading-container {
      font-family: sans-serif;
      position: absolute;
      color: white;
      letter-spacing: 0.8rem;
      text-align: center;
      top: 40%;
      width: 100%;
      text-transform: uppercase;
    }
  &lt;/style&gt;
&lt;/head&gt;

&lt;body&gt;
  &lt;div id=&quot;loading-container&quot;&gt;
    &lt;h1&gt;Atlas of Thrones&lt;/h1&gt;
  &lt;/div&gt;
  &lt;div id=&quot;app&quot;&gt;&lt;/div&gt;
  &lt;script src=&quot;bundle.js&quot;&gt;&lt;/script&gt;
&lt;/body&gt;
&lt;/html&gt;
</code></pre>
<br>
<p>Here, we are simply importing our bundle (<code>bundle.js</code>) and adding a placeholder <code>div</code> to load our app into.</p>
<p>We are rendering an &quot;Atlas of Thrones&quot; title screen that will be displayed to the user instantly and be replaced once the app bundle is finished loading.  This is a good practice for single-page Javascript apps (which can often have large payloads) in order to replace the default blank browser loading screen with some app-specific content.</p>
<h5 id="04addappmainjs">0.4 - Add <code>app/main.js</code></h5>
<p>Now create a new directory <code>app/</code>.  This is where our frontend application source code will live before it is bundled.</p>
<p>Create a file, <code>app/main.js</code>, with the following contents.</p>
<pre><code class="language-javascript">/** Main UI Controller Class */
class ViewController {
  /** Initialize Application */
  constructor () {
    console.log('hello world')
  }
}

window.ctrl = new ViewController()
</code></pre>
<br>
<p>We are just creating a (currently useless) <code>ViewController</code> class, and instantiating it.  The ViewController class will be our top-level full-page controller, which we will use to instantiate and compose the various application components.</p>
<h5 id="05tryitout">0.5 - Try it out!</h5>
<p>Ok, now we're ready to run our very basic application template.  Run <code>npm run serve</code> on the command line.  We should see some output text indicating a successful Webpack application build, as well as some <code>http-server</code> output notifying us that the <code>public</code> directory is now being served at <code>localhost:8080</code>.</p>
<pre><code class="language-bash">$ npm run serve

&gt; atlas-of-thrones@0.8.0 serve 
&gt; webpack --watch &amp; http-server ./public

Starting up http-server, serving ./public
Available on:
  http://127.0.0.1:8080
  http://10.3.21.159:8080
Hit CTRL-C to stop the server

Webpack is watching the files…

Hash: 5bf9c88ced32655e0ca3
Version: webpack 3.5.5
Time: 77ms
    Asset     Size  Chunks             Chunk Names
bundle.js  3.41 kB       0  [emitted]  main
   [0] ./app/main.js 178 bytes {0} [built]
</code></pre>
<br>
<p>Visit <code>localhost:8080</code> in your browser; you should see the following screen.</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/got_map/step_0_5.png" alt="Build An Interactive Game of Thrones Map (Part II) - Leaflet.js & Webpack"></p>
<p>Open up the browser Javascript console (In Chrome - Command+Option+J on macOS, Control+Shift+J on Windows / Linux).  You should see a <code>hello world</code> message in the console output.</p>
<h5 id="06addbaselineapphtmlandscss">0.6 - Add baseline app HTML and SCSS</h5>
<p>Now that our Javascript app is being loaded correctly, we'll add our baseline HTML template and SCSS styling.</p>
<p>In the <code>app/</code> directory, add a new file - <code>main.html</code>, with the following contents.</p>
<pre><code class="language-html">&lt;div id=&quot;app-container&quot;&gt;
  &lt;div id=&quot;map-placeholder&quot;&gt;&lt;/div&gt;
  &lt;div id=&quot;layer-panel-placeholder&quot;&gt;&lt;/div&gt;
  &lt;div id=&quot;search-panel-placeholder&quot;&gt;&lt;/div&gt;
  &lt;div id=&quot;info-panel-placeholder&quot;&gt;&lt;/div&gt;
&lt;/div&gt;
</code></pre>
<br>
<p>This file simply contains placeholders for the components that we will soon add.</p>
<p>Next add <code>_variables.scss</code> to the <code>app/</code> directory.</p>
<pre><code class="language-scss">$offWhite: #faebd7;
$grey: #666;
$offDark: #111;
$midDark: #222;
$lightDark: #333;
$highlight: #BABA45;
$highlightSecondary: red;
$footerHeight: 80px;
$searchBarHeight: 60px;
$panelMargin: 24px;
$toggleLayerPanelButtonWidth: 40px;
$leftPanelsWidth: 500px;
$breakpointMobile: 600px;
$fontNormal: 'Lato', sans-serif;
$fontFancy: 'MedievalSharp', cursive;
</code></pre>
<br>
<p>Here we are defining some of our global style variables, which will make it easier to keep the styling consistent in various component-specific SCSS files.</p>
<p>To define our global styles, create <code>main.scss</code> in the <code>app/</code> directory.</p>
<pre><code class="language-scss">@import url(https://fonts.googleapis.com/css?family=Lato|MedievalSharp);
@import &quot;~leaflet/dist/leaflet.css&quot;;
@import './_variables.scss';

/** Page Layout **/
body {
  margin: 0;
  font-family: $fontNormal;
  background: $lightDark;
  overflow: hidden;
  height: 100%;
  width: 100%;
}

a {
  color: $highlight;
}

#loading-container {
  display: none;
}

#app-container {
  display: block;
}
</code></pre>
<br>
<p>At the top of the file, we are importing three items - our application fonts from <a href="https://fonts.google.com/">Google Fonts</a>, the Leaflet CSS styles, and our SCSS variables.  Next, we are setting some very basic top-level styling rules, and we are hiding the loading container.</p>
<blockquote>
<p>I won't be going into the specifics of the CSS/SCSS styling during this tutorial, since the tutorial is already quite long, and explaining CSS rules in-depth tends to be tedious.  The provided styling is designed to be minimal, extensible, and completely responsive for desktop/tablet/mobile usage, so feel free to modify it with whatever design ideas you might have.</p>
</blockquote>
<p>Finally, edit our <code>app/main.js</code> file to have the following contents.</p>
<pre><code class="language-javascript">import './main.scss'
import template from './main.html'

/** Main UI Controller Class */
class ViewController {
  /** Initialize Application */
  constructor () {
    document.getElementById('app').outerHTML = template
  }
}

window.ctrl = new ViewController()
</code></pre>
<br>
<p>We've changed the application behavior now to import our global SCSS styles and to load the base application HTML template into the <code>app</code> id placeholder (as defined in <code>public/index.html</code>).</p>
<p>Check the terminal output (re-run <code>npm run serve</code> if you stopped it), you should see successful build output from Webpack.  Open <code>localhost:8080</code> in your browser, and you should now see an empty dark screen.  This is good since it means that the application SCSS styles have been loaded correctly, and have hidden the loading placeholder container.</p>
<blockquote>
<p>For a more thorough test, you can use the browser's <a href="https://developers.google.com/web/tools/chrome-devtools/network-performance/network-conditions">network throttling settings</a> to load the page slowly (I would recommend testing everything you build using the <code>Slow 3G</code> setting).  With network throttling enabled (and the browser cache disabled), you should see the &quot;Atlas Of Thrones&quot; loading screen appear for a few seconds, and then disappear once the application bundle is loaded.</p>
</blockquote>
<h3 id="step1addnativejavascriptcomponentstructure">Step 1 - Add Native Javascript Component Structure</h3>
<p>We will now set up a simple way to create frameworkless Javascript components.</p>
<p>Add a new directory - <code>app/components</code>.</p>
<h5 id="10addbasecomponentclass">1.0 - Add base <code>Component</code> class</h5>
<p>Create a new file <code>app/components/component.js</code> with the following contents.</p>
<pre><code class="language-javascript">/**
 * Base component class to provide view ref binding, template insertion, and event listener setup
 */
export class Component {
  /** Component Constructor
   * @param { String } placeholderId - Element ID to inflate the component into
   * @param { Object } props - Component properties
   * @param { Object } props.events - Component event listeners
   * @param { Object } props.data - Component data properties
   * @param { String } template - HTML template to inflate into placeholder id
   */
  constructor (placeholderId, props = {}, template) {
    this.componentElem = document.getElementById(placeholderId)

    if (template) {
      // Load template into placeholder element
      this.componentElem.innerHTML = template

      // Find all refs in component
      this.refs = {}
      const refElems = this.componentElem.querySelectorAll('[ref]')
      refElems.forEach((elem) =&gt; { this.refs[elem.getAttribute('ref')] = elem })
    }

    if (props.events) { this.createEvents(props.events) }
  }

  /** Read &quot;event&quot; component parameters, and attach event listeners for each */
  createEvents (events) {
    Object.keys(events).forEach((eventName) =&gt; {
      this.componentElem.addEventListener(eventName, events[eventName], false)
    })
  }

  /** Trigger a component event with the provided &quot;detail&quot; payload */
  triggerEvent (eventName, detail) {
    const event = new window.CustomEvent(eventName, { detail })
    this.componentElem.dispatchEvent(event)
  }
}
</code></pre>
<br>
<p>This is the base component class that all of our custom components will extend.</p>
<p>The component class handles three important tasks -</p>
<ol>
<li>Load the component HTML template into the placeholder ID.</li>
<li>Assign each DOM element with a <code>ref</code> tag to <code>this.refs</code>.</li>
<li>Bind window event listener callbacks for the provided event types.</li>
</ol>
<p>It's ok if this seems a bit confusing right now; we'll soon see how each tiny block of code provides essential functionality for our custom components.</p>
<p>If you are unfamiliar with object-oriented programming and/or class inheritance, don't worry - it's really quite simple.  Here are some Javascript-centric introductions.</p>
<ul>
<li><a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Classes">Official &quot;Class&quot; MDN Documentation</a></li>
<li><a href="https://scotch.io/tutorials/better-javascript-with-es6-pt-ii-a-deep-dive-into-classes">Better JavaScript with ES6, Pt. II: A Deep Dive into Classes</a></li>
</ul>
<blockquote>
<p>Javascript &quot;Classes&quot; are syntactic sugar over Javascript's existing prototype-based object declaration and inheritance model.  This differs from &quot;true&quot; object-oriented languages, such as Java, Python, and C++, which were designed from the ground-up with class-based inheritance in mind.  Despite this caveat (which is necessary due to the existing limitations in legacy browser Javascript interpreter engines), the new Javascript class syntax is really quite useful and is much cleaner and more standardized (i.e. more similar to virtually every other OOP language) than the legacy prototype-based inheritance syntax.</p>
</blockquote>
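<p>To illustrate the &quot;syntactic sugar&quot; point, here's a tiny standalone example (unrelated to our app code) showing the same object written with ES6 class syntax and with the legacy prototype-based syntax.</p>
<pre><code class="language-javascript">// ES6 class syntax
class GreeterClass {
  constructor (name) { this.name = name }
  greet () { return 'Hello, ' + this.name }
}

// Equivalent legacy prototype-based syntax
function GreeterProto (name) { this.name = name }
GreeterProto.prototype.greet = function () { return 'Hello, ' + this.name }

console.log(new GreeterClass('Jon').greet())  // Hello, Jon
console.log(new GreeterProto('Jon').greet())  // Hello, Jon
</code></pre>
<br>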
<h5 id="11addbaselineinfopanelcomponent">1.1 - Add Baseline <code>info-panel</code> Component</h5>
<p>To demonstrate how our base component class works, let's create an <code>info-panel</code> component.</p>
<p>First create a new directory - <code>app/components/info-panel</code>.</p>
<p>Next, add an HTML template in <code>app/components/info-panel/info-panel.html</code></p>
<pre><code class="language-html">&lt;div ref=&quot;container&quot; class=&quot;info-container&quot;&gt;
  &lt;div ref=&quot;title&quot; class=&quot;info-title&quot;&gt;
    &lt;h1&gt;Nothing Selected&lt;/h1&gt;
  &lt;/div&gt;
  &lt;div class=&quot;info-body&quot;&gt;
    &lt;div class=&quot;info-content-container&quot;&gt;
      &lt;div ref=&quot;content&quot; class=&quot;info-content&quot;&gt;&lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;
</code></pre>
<br>
<blockquote>
<p>Notice that some of the elements have the <code>ref</code> attribute defined.  This is a system (modeled on similar features in React and Vue) to make these native HTML elements (ex. <code>ref=&quot;title&quot;</code>) easily accessible from the component (using <code>this.refs.title</code>).  See the <code>Component</code> class constructor for the simple implementation details of how this system works.</p>
</blockquote>
<p>We'll also add our component styling in <code>app/components/info-panel/info-panel.scss</code></p>
<pre><code class="language-scss">@import '../../_variables.scss';

.info-title {
  font-family: $fontFancy;
  height: $footerHeight;
  display: flex;
  justify-content: center;
  align-items: center;
  cursor: pointer;
  user-select: none;
  color: $offWhite;
  position: fixed;
  top: 0;
  left: 0;
  width: 100%;

  h1 {
    letter-spacing: 0.3rem;
    max-width: 100%;
    padding: 20px;
    text-overflow: ellipsis;
    text-align: center;
  }
}

.info-content {
  padding: 0 8% 24px 8%;
  margin: 0 auto;
  background: $lightDark;
  overflow-y: scroll;
  font-size: 1rem;
  line-height: 1.25em;
  font-weight: 300;
}

.info-container {
  position: absolute;
  overflow-y: hidden;
  bottom: 0;
  left: 24px;
  z-index: 1000;
  background: $midDark;
  width: $leftPanelsWidth;
  height: 60vh;
  color: $offWhite;
  transition: all 0.4s ease-in-out;
  transform: translateY(calc(100% - #{$footerHeight}));
}

.info-container.info-active {
  transform: translateY(0);
}

.info-body {
  margin-top: $footerHeight;
  overflow-y: scroll;
  overflow-x: hidden;
  position: relative;
  height: 80%;
}

.info-footer {
  font-size: 0.8rem;
  font-family: $fontNormal;
  padding: 8%;
  text-align: center;
  text-transform: uppercase;
}

.blog-link {
  letter-spacing: 0.1rem;
  font-weight: bold;
}

@media (max-width: $breakpointMobile) {
  .info-container  {
    left: 0;
    width: 100%;
    height: 80vh;
  }
}
</code></pre>
<br>
<p>Finally, link it all together in <code>app/components/info-panel/info-panel.js</code></p>
<pre><code class="language-javascript">import './info-panel.scss'
import template from './info-panel.html'
import { Component } from '../component'

/**
 * Info Panel Component
 * Download and display metadata for selected items.
 * @extends Component
 */
export class InfoPanel extends Component {
  /** InfoPanel Component Constructor
   * @param { Object } props.data.apiService ApiService instance to use for data fetching
   */
  constructor (placeholderId, props) {
    super(placeholderId, props, template)

    // Toggle info panel on title click
    this.refs.title.addEventListener('click', () =&gt; this.refs.container.classList.toggle('info-active'))
  }
}
</code></pre>
<br>
<p>Here, we are creating an <code>InfoPanel</code> class which <em>extends</em> our <code>Component</code> class.</p>
<p>By calling <code>super(placeholderId, props, template)</code>, the <code>constructor</code> of the base <code>Component</code> class will be triggered.  As a result, the template will be added to our main view, and we will have access to our assigned HTML elements using <code>this.refs</code>.</p>
<p>We're using three different types of <code>import</code> statements at the top of this file.</p>
<p><code>import './info-panel.scss'</code> is required for Webpack to bundle the component styles with the rest of the application.  We're not assigning it to a variable since we don't need to refer to this SCSS file anywhere in our component.  This is referred to as <em>importing a module for side-effects only</em>.</p>
<p><code>import template from './info-panel.html'</code> is taking the entire contents of <code>info-panel.html</code> and assigning those contents as a string to the <code>template</code> variable.  This is possible through the use of the <code>html-loader</code> Webpack module.</p>
<p><code>import { Component } from '../component'</code> is importing the <code>Component</code> base class.  We're using the curly braces <code>{...}</code> here since the <code>Component</code> is a <em>named member export</em> (as opposed to a <em>default export</em>) of <code>component.js</code>.</p>
<h5 id="12instantiatetheinfopanelcomponentinmainjs">1.2 - Instantiate The <code>info-panel</code> Component in <code>main.js</code></h5>
<p>Now we'll modify <code>main.js</code> to instantiate our new <code>info-panel</code> component.</p>
<pre><code class="language-javascript">import './main.scss'
import template from './main.html'

import { InfoPanel } from './components/info-panel/info-panel'

/** Main UI Controller Class */
class ViewController {
  /** Initialize Application */
  constructor () {
    document.getElementById('app').outerHTML = template
    this.initializeComponents()
  }

  /** Initialize Components with data and event listeners */
  initializeComponents () {
    // Initialize Info Panel
    this.infoComponent = new InfoPanel('info-panel-placeholder')
  }
}

window.ctrl = new ViewController()
</code></pre>
<br>
<p>At the top of the <code>main.js</code> file, we are now importing our <code>InfoPanel</code> component class.</p>
<p>We have declared a new method within <code>ViewController</code> to initialize our application components, and we are now calling that method within our constructor.  Within the <code>initializeComponents</code> method we are creating a new <code>InfoPanel</code> instance with &quot;info-panel-placeholder&quot; as the <code>placeholderId</code> parameter.</p>
<h5 id="13tryitout">1.3 - Try It Out!</h5>
<p>When we reload our app at <code>localhost:8080</code>, we'll see that we now have a small info tab at the bottom of the screen, which can be expanded by clicking on its title.</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/got_map/step_1_3.png" alt="Build An Interactive Game of Thrones Map (Part II) - Leaflet.js & Webpack"></p>
<p>Great! Our simple, native Javascript component system is now in place.</p>
<h3 id="step2addmapcomponent">Step 2 - Add Map Component</h3>
<p>Now, finally, <strong>let's actually add the map to our application</strong>.</p>
<p>Create a new directory for this component - <code>app/components/map</code></p>
<h5 id="20addmapcomponentjavascript">2.0 - Add Map Component Javascript</h5>
<p>Add a new file to store our map component logic - <code>app/components/map/map.js</code></p>
<pre><code class="language-javascript">import './map.scss'
import L from 'leaflet'
import { Component } from '../component'

const template = '&lt;div ref=&quot;mapContainer&quot; class=&quot;map-container&quot;&gt;&lt;/div&gt;'

/**
 * Leaflet Map Component
 * Render GoT map items, and provide user interactivity.
 * @extends Component
 */
export class Map extends Component {
  /** Map Component Constructor
   * @param { String } placeholderId Element ID to inflate the map into
   * @param { Object } props.events.click Map item click listener
   */
  constructor (mapPlaceholderId, props) {
    super(mapPlaceholderId, props, template)

    // Initialize Leaflet map
    this.map = L.map(this.refs.mapContainer, {
      center: [ 5, 20 ],
      zoom: 4,
      maxZoom: 8,
      minZoom: 4,
      maxBounds: [ [ 50, -30 ], [ -45, 100 ] ]
    })

    this.map.zoomControl.setPosition('bottomright') // Position zoom control
    this.layers = {} // Map layer dict (key/value = title/layer)
    this.selectedRegion = null // Store currently selected region

    // Render Carto GoT tile baselayer
    L.tileLayer(
      'https://cartocdn-gusc.global.ssl.fastly.net/ramirocartodb/api/v1/map/named/tpl_756aec63_3adb_48b6_9d14_331c6cbc47cf/all/{z}/{x}/{y}.png',
      { crs: L.CRS.EPSG4326 }).addTo(this.map)
  }
}
</code></pre>
<br>
<p>Here, we're initializing our <code>leaflet</code> map with our desired settings.</p>
<p>As our base tile layer, we are using an awesome &quot;Game of Thrones&quot; base map provided by <a href="https://carto.com/blog/game-of-thrones-basemap/">Carto</a>.</p>
<blockquote>
<p>Note that since <code>leaflet</code> will handle the view rendering for us, our HTML template is so simple that we're just declaring it as a string instead of putting it in a whole separate file.</p>
</blockquote>
<h5 id="21addmapcomponentscss">2.1 - Add Map Component SCSS</h5>
<p>Next, add the following styles to <code>app/components/map/map.scss</code>.</p>
<pre><code class="language-scss">@import '../../_variables.scss';

.map-container {
  background: $lightDark;
  height: 100%;
  width: 100%;
  position: relative;
  top: 0;
  left: 0;

  /** Leaflet Style Overrides **/
  .leaflet-popup {
    bottom: 0;
  }

  .leaflet-popup-content {
    user-select: none;
    cursor: pointer;
  }

  .leaflet-popup-content-wrapper, .leaflet-popup-tip {
    background: $lightDark;
    color: $offWhite;
    text-align: center;
  }

  .leaflet-control-zoom {
    border: none;
  }

  .leaflet-control-zoom-in, .leaflet-control-zoom-out {
    background: $lightDark;
    color: $offWhite;
    border: none;
  }

  .leaflet-control-attribution {
    display: none;
  }

  @media (max-width: $breakpointMobile) {
    .leaflet-bottom {
      bottom: calc(#{$footerHeight} + 10px)
    }
  }
}
</code></pre>
<br>
<p>Since <code>leaflet</code> is rendering the core view for this component, these rules are just overriding a few default styles to give our map a distinctive look.</p>
<h5 id="22instantiatemapcomponent">2.2 - Instantiate Map Component</h5>
<p>In <code>app/main.js</code> add the following code to instantiate our Map component.</p>
<pre><code class="language-javascript">import { Map } from './components/map/map'
...
class ViewController {
...
  initializeComponents () {
    ...
    // Initialize Map
    this.mapComponent = new Map('map-placeholder')
  }
</code></pre>
<blockquote>
<p>Don't forget to import the Map component at the top of <code>main.js</code>!</p>
</blockquote>
<p>If you re-load the page at <code>localhost:8080</code>, you should now see our base <code>leaflet</code> map in the background.</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/got_map/step_2_2.jpg" alt="Build An Interactive Game of Thrones Map (Part II) - Leaflet.js & Webpack"></p>
<p>Awesome!</p>
<h3 id="step3displaygameofthronesdatafromourapi">Step 3 - Display &quot;Game Of Thrones&quot; Data From Our API</h3>
<p>Now that our map component is in place, let's display data from the &quot;Game of Thrones&quot; geospatial API that we set up in the <a href="https://blog.patricktriest.com/game-of-thrones-map-node-postgres-redis/">first part of this tutorial series</a>.</p>
<h5 id="30createapiserviceclass">3.0 - Create API Service Class</h5>
<p>To keep the application well-structured, we'll create a new class to wrap our API calls.  It is a good practice to maintain separation-of-concerns in front-end applications by separating <strong>components</strong> (the UI controllers) from <strong>services</strong> (the data fetching and handling logic).</p>
<p>Create a new directory - <code>app/services</code>.</p>
<p>Now, add a new file for our API service - <code>app/services/api.js</code>.</p>
<pre><code class="language-javascript">import { CancelToken, get } from 'axios'

/** API Wrapper Service Class */
export class ApiService {
  constructor (url = 'http://localhost:5000/') {
    this.url = url
    this.cancelToken = CancelToken.source()
  }

  async httpGet (endpoint = '') {
    this.cancelToken.cancel('Cancelled Ongoing Request')
    this.cancelToken = CancelToken.source()
    const response = await get(`${this.url}${endpoint}`, { cancelToken: this.cancelToken.token })
    return response.data
  }

  getLocations (type) {
    return this.httpGet(`locations/${type}`)
  }

  getLocationSummary (id) {
    return this.httpGet(`locations/${id}/summary`)
  }

  getKingdoms () {
    return this.httpGet('kingdoms')
  }

  getKingdomSize (id) {
    return this.httpGet(`kingdoms/${id}/size`)
  }

  getCastleCount (id) {
    return this.httpGet(`kingdoms/${id}/castles`)
  }

  getKingdomSummary (id) {
    return this.httpGet(`kingdoms/${id}/summary`)
  }

  async getAllKingdomDetails (id) {
    return {
      kingdomSize: await this.getKingdomSize(id),
      castleCount: await this.getCastleCount(id),
      kingdomSummary: await this.getKingdomSummary(id)
    }
  }
}
</code></pre>
<br>
<p>This class is fairly simple.  We're using <a href="https://github.com/mzabriskie/axios">Axios</a> to make our API requests, and we are providing a method to wrap each API endpoint string.</p>
<p>We are using a <code>CancelToken</code> to ensure that we only have one outgoing request at a time.  <strong>This helps to avoid network race-conditions</strong> when a user is rapidly clicking through different locations. This rapid clicking will create lots of HTTP GET requests, and can often result in the wrong data being displayed once they stop clicking.</p>
<p><strong>Without the <code>CancelToken</code> logic, the displayed data would be that for whichever HTTP request finished last, instead of whichever location the user clicked on last.</strong>  By canceling each previous request when a new request is made, we can ensure that the application is only downloading data for the <em>currently selected location</em>.</p>
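<p>The &quot;latest request wins&quot; idea behind the cancel token can be sketched without any networking at all (hypothetical names, illustrative only) — tag each request with an ID, and discard any response whose ID is stale:</p>

```javascript
// Sketch of the race-condition guard (illustrative; the real app lets
// Axios cancel the stale request rather than ignoring its response).
let latestRequestId = 0
let displayedData = null

function startRequest () {
  return ++latestRequestId // each new click invalidates older requests
}

function handleResponse (requestId, data) {
  if (requestId !== latestRequestId) return false // stale response, discard
  displayedData = data
  return true
}

const first = startRequest()  // user clicks 'Winterfell'
const second = startRequest() // user quickly clicks 'Casterly Rock'

handleResponse(second, 'Casterly Rock summary') // fast response: applied
handleResponse(first, 'Winterfell summary')     // slow response: discarded

console.log(displayedData) // 'Casterly Rock summary'
```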
<h5 id="31initializeapiserviceclass">3.1 - Initialize API Service Class</h5>
<p>In <code>app/main.js</code>, initialize the API class in the constructor.</p>
<pre><code class="language-javascript">import { ApiService } from './services/api'

...

/** Main UI Controller Class */
class ViewController {
  /** Initialize Application */
  constructor () {
    document.getElementById('app').outerHTML = template

    // Initialize API service
    if (window.location.hostname === 'localhost') {
      this.api = new ApiService('http://localhost:5000/')
    } else {
      this.api = new ApiService('https://api.atlasofthrones.com/')
    }

    this.initializeComponents()
  }

  ...
  
}
</code></pre>
<br> 
<p>Here, we'll use <code>localhost:5000</code> as our API URL if the site is being served on a <code>localhost</code> URL, and we'll use the hosted Atlas of Thrones API URL if not.</p>
<p>Don't forget to import the API service near the top of the file!</p>
<blockquote>
<p>If you decided to skip part one of the tutorial, you can just instantiate the API with <code>https://api.atlasofthrones.com/</code> as the default URL.  Be warned, however, that this API could go offline (or have the CORs configuration adjusted to reject cross-domain requests), if it is abused or if I decide that it's costing too much to host.  If you want to publicly deploy your own application using this code, please host your own backend using the instructions from <a href="https://blog.patricktriest.com/game-of-thrones-map-node-postgres-redis/">part one of this tutorial</a>.  For assistance in finding affordable hosting, you could also check out an <a href="https://blog.patricktriest.com/host-webapps-free/">article I wrote on how to host your own applications for very cheap</a>.</p>
</blockquote>
<h5 id="32downloadgotlocationdata">3.2 - Download GoT Location Data</h5>
<p>Modify <code>app/main.js</code> with a new method to download the map data.</p>
<pre><code class="language-javascript">/** Main UI Controller Class */
class ViewController {
  /** Initialize Application */
  constructor () {
    document.getElementById('app').outerHTML = template

    // Initialize API service
    if (window.location.hostname === 'localhost') {
      this.api = new ApiService('http://localhost:5000/')
    } else {
      this.api = new ApiService('https://api.atlasofthrones.com/')
    }

    this.locationPointTypes = [ 'castle', 'city', 'town', 'ruin', 'region', 'landmark' ]
    this.initializeComponents()
    this.loadMapData()
  }

...

  /** Load map data from the API */
  async loadMapData () {
    // Download kingdom boundaries
    const kingdomsGeojson = await this.api.getKingdoms()

    // Add data to map
    this.mapComponent.addKingdomGeojson(kingdomsGeojson)
    
    // Show kingdom boundaries
    this.mapComponent.toggleLayer('kingdom')

    // Download location point geodata
    for (let locationType of this.locationPointTypes) {
      // Download GeoJSON + metadata
      const geojson = await this.api.getLocations(locationType)

      // Add data to map
      this.mapComponent.addLocationGeojson(locationType, geojson, this.getIconUrl(locationType))
      
      // Display location layer
      this.mapComponent.toggleLayer(locationType)
    }
  }

  /** Format icon URL for layer type  */
  getIconUrl (layerName) {
    return `https://cdn.patricktriest.com/atlas-of-thrones/icons/${layerName}.svg`
  }
}
</code></pre>
<br>
<p>In the above code, we're calling the API service to download GeoJSON data, and then we're passing this data to the map component.</p>
<p>We're also declaring a small helper method at the end to format a resource URL corresponding to the icon for each layer type (I'm providing the hosted icons currently, but feel free to use your own by adjusting the URL).</p>
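<p>Note that the <code>for...of</code> loop above downloads each location layer sequentially.  A parallel variant with <code>Promise.all</code> is also possible, sketched below with a hypothetical <code>fakeApi</code> stand-in for the API service:</p>

```javascript
// Parallel variant of the download loop (fakeApi is a hypothetical
// stand-in for ApiService; each call resolves with a GeoJSON payload).
const locationPointTypes = [ 'castle', 'city', 'town', 'ruin', 'region', 'landmark' ]

const fakeApi = {
  getLocations: async (type) => ({ type, features: [] })
}

async function loadLocationLayers () {
  // Fire all requests at once, then wait for every one to resolve.
  // Promise.all preserves the input order in its result array.
  const layers = await Promise.all(
    locationPointTypes.map(type => fakeApi.getLocations(type))
  )
  return layers.map(layer => layer.type)
}

loadLocationLayers().then(types => console.log(types.join(', ')))
// castle, city, town, ruin, region, landmark
```

<p>One caveat: our real <code>ApiService.httpGet</code> cancels any ongoing request each time it's called, so a naive <code>Promise.all</code> through it would cancel its own sibling requests — a parallel version would need that cancellation guard relaxed for the initial load.</p>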
<p>Notice a problem?  We haven't defined any methods yet in the map component for adding GeoJSON! Let's do that now.</p>
<h5 id="33addmapcomponentmethodsfordisplayinggeojsondata">3.3 - Add Map Component Methods For Displaying GeoJSON Data</h5>
<p>First, we'll add some methods to <code>app/components/map/map.js</code> in order to add map layers for the location coordinate GeoJSON (the castles, towns, villages, etc.).</p>
<pre><code class="language-javascript">export class Map extends Component {

  ...

  /** Add location geojson to the leaflet instance */
  addLocationGeojson (layerTitle, geojson, iconUrl) {
    // Initialize new geojson layer
    this.layers[layerTitle] = L.geoJSON(geojson, {
      // Show marker on location
      pointToLayer: (feature, latlng) =&gt; {
        return L.marker(latlng, {
          icon: L.icon({ iconUrl, iconSize: [ 24, 56 ] }),
          title: feature.properties.name })
      },
      onEachFeature: this.onEachLocation.bind(this)
    })
  }

  /** Assign Popup and click listener for each location point */
  onEachLocation (feature, layer) {
    // Bind popup to marker
    layer.bindPopup(feature.properties.name, { closeButton: false })
    layer.on({ click: (e) =&gt; {
      this.setHighlightedRegion(null) // Deselect highlighted region
      const { name, id, type } = feature.properties
      this.triggerEvent('locationSelected', { name, id, type })
    }})
  }
}
</code></pre>
<br>
<p>We are using the Leaflet helper function <code>L.geoJSON</code> to create new map layers using the downloaded GeoJSON data.  We are then binding markers to each GeoJSON feature with <code>L.icon</code>, and attaching a popup to display the feature name when the icon is selected.</p>
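<p>For reference, each feature handed to <code>pointToLayer</code> and <code>onEachLocation</code> is a standard GeoJSON <code>Feature</code> object, along these lines (sample values are hypothetical):</p>

```javascript
// Hypothetical example of a single location feature returned by the API.
// Note that GeoJSON coordinates are [longitude, latitude] - the reverse
// of Leaflet's [lat, lng] convention (L.geoJSON handles the conversion).
const exampleFeature = {
  type: 'Feature',
  geometry: {
    type: 'Point',
    coordinates: [ 20.2, 5.5 ] // [ lng, lat ]
  },
  properties: { id: '17', name: 'Winterfell', type: 'castle' }
}

// The click handler destructures the metadata it needs:
const { name, id, type } = exampleFeature.properties
console.log(name, id, type) // Winterfell 17 castle
```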
<p>Next, we'll add a few methods to set up the kingdom polygon GeoJSON layer.</p>
<pre><code class="language-javascript">export class Map extends Component {

  ...

  /** Add boundary (kingdom) geojson to the leaflet instance */
  addKingdomGeojson (geojson) {
    // Initialize new geojson layer
    this.layers.kingdom = L.geoJSON(geojson, {
      // Set layer style
      style: {
        'color': '#222',
        'weight': 1,
        'opacity': 0.65
      },
      onEachFeature: this.onEachKingdom.bind(this)
    })
  }

  /** Assign click listener for each kingdom GeoJSON item  */
  onEachKingdom (feature, layer) {
    layer.on({ click: (e) =&gt; {
      const { name, id } = feature.properties
      this.map.closePopup() // Deselect selected location marker
      this.setHighlightedRegion(layer) // Highlight kingdom polygon
      this.triggerEvent('locationSelected', { name, id, type: 'kingdom' })
    }})
  }

  /** Highlight the selected region */
  setHighlightedRegion (layer) {
    // If a layer is currently selected, deselect it
    if (this.selected) { this.layers.kingdom.resetStyle(this.selected) }

    // Select the provided region layer
    this.selected = layer
    if (this.selected) {
      this.selected.bringToFront()
      this.selected.setStyle({ color: 'blue' })
    }
  }
}
</code></pre>
<br>
<p>We are loading the GeoJSON data the same way as before, using <code>L.geoJSON</code>.  This time we are adding properties and behavior more appropriate for region boundary polygons than for individual coordinate points.</p>
<pre><code class="language-javascript">export class Map extends Component {

  ...
  
  /** Toggle map layer visibility */
  toggleLayer (layerName) {
    const layer = this.layers[layerName]
    if (this.map.hasLayer(layer)) {
      this.map.removeLayer(layer)
    } else {
      this.map.addLayer(layer)
    }
  }
}
</code></pre>
<br>
<p>Finally, we'll declare a method for toggling the visibility of individual layers.  We're enabling all of the layers by default right now, but this toggle method will come in handy later on.</p>
<h5 id="34tryitout">3.4 - Try it out!</h5>
<p>Now that we're actually using the backend, we'll need to start the API server (unless you're using the hosted <code>api.atlasofthrones.com</code> URL).  You can do this in a separate terminal window using <code>npm start</code>, or you can run the frontend file server and the Node.js API server in the same window using <code>npm run dev</code>.</p>
<p>Try reloading <code>localhost:8080</code>, and you should see the &quot;Game Of Thrones&quot; data added to the map.</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/got_map/step_3_4.jpg" alt="Build An Interactive Game of Thrones Map (Part II) - Leaflet.js & Webpack"></p>
<p>Nice!</p>
<h3 id="step4showlocationinformationonclick">Step 4 - Show Location Information On Click</h3>
<p>Now let's link the map component with the info panel component in order to display information about each location when it is selected.</p>
<h5 id="40addlistenerformaplocationclicks">4.0 - Add Listener For Map Location Clicks</h5>
<p>You might have noticed in the above code (if you were actually reading it instead of just copy/pasting), that we are calling <code>this.triggerEvent('locationSelected', ...)</code> whenever a map feature is selected.</p>
<p>This is the final useful piece of the <code>Component</code> class that we created earlier.  Using <code>this.triggerEvent</code> we can trigger native Javascript window DOM events to be received by the parent component.</p>
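<p>Stripped of the DOM specifics, the event flow looks like this (a simplified stand-in — the real <code>Component.triggerEvent</code> dispatches a native <code>CustomEvent</code>, with the payload available under <code>event.detail</code>):</p>

```javascript
// Simplified sketch of the props.events pattern: the parent registers
// callbacks keyed by event name; the child looks them up and invokes
// them with a { detail } payload, mimicking a DOM CustomEvent.
class MiniComponent {
  constructor (props = {}) {
    this.events = props.events || {}
  }

  triggerEvent (eventName, detail) {
    const handler = this.events[eventName]
    if (handler) handler({ detail })
  }
}

let received = null
const mapComponent = new MiniComponent({
  events: { locationSelected: event => { received = event.detail } }
})

// Equivalent of clicking a map marker:
mapComponent.triggerEvent('locationSelected', { name: 'Winterfell', id: '17', type: 'castle' })
console.log(received.name) // Winterfell
```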
<p>In <code>app/main.js</code>, make the following changes to the <code>initializeComponents</code> method.</p>
<pre><code class="language-javascript">/** Initialize Components with data and event listeners */
initializeComponents () {
  // Initialize Info Panel
  this.infoComponent = new InfoPanel('info-panel-placeholder', {
    data: { apiService: this.api }
  })

  // Initialize Map
  this.mapComponent = new Map('map-placeholder', {
    events: { locationSelected: event =&gt; {
      // Show data in infoComponent on &quot;locationSelected&quot; event
      const { name, id, type } = event.detail
      this.infoComponent.showInfo(name, id, type)
    }}
  })
}
</code></pre>
<br>
<p>We are making two important changes here.</p>
<ol>
<li>We're passing the API service as a data property to the info panel component (we'll get to this in a second).</li>
<li>We're adding a listener for the <code>locationSelected</code> event, which will then trigger the info panel component's <code>showInfo</code> method with the location data.</li>
</ol>
<h5 id="41showlocationinfomationininfopanelcomponent">4.1 - Show Location Information In <code>info-panel</code> Component</h5>
<p>Make the following changes to <code>app/components/info-panel/info-panel.js</code>.</p>
<pre><code class="language-javascript">export class InfoPanel extends Component {
  constructor (placeholderId, props) {
    super(placeholderId, props, template)
    this.api = props.data.apiService

    // Toggle info panel on title click
    this.refs.title.addEventListener('click', () =&gt; this.refs.container.classList.toggle('info-active'))
  }

  /** Show info when a map item is selected */
  async showInfo (name, id, type) {
    // Display location title
    this.refs.title.innerHTML = `&lt;h1&gt;${name}&lt;/h1&gt;`

    // Download and display information, based on location type
    this.refs.content.innerHTML = (type === 'kingdom')
      ? await this.getKingdomDetailHtml(id)
      : await this.getLocationDetailHtml(id, type)
  }

  /** Create kingdom detail HTML string */
  async getKingdomDetailHtml (id) {
    // Get kingdom metadata
    let { kingdomSize, castleCount, kingdomSummary } = await this.api.getAllKingdomDetails(id)

    // Convert size to an easily readable string
    kingdomSize = kingdomSize.toLocaleString(undefined, { maximumFractionDigits: 0 })

    // Format summary HTML
    const summaryHTML = this.getInfoSummaryHtml(kingdomSummary)

    // Return filled HTML template
    return `
      &lt;h3&gt;KINGDOM&lt;/h3&gt;
      &lt;div&gt;Size Estimate - ${kingdomSize} km&lt;sup&gt;2&lt;/sup&gt;&lt;/div&gt;
      &lt;div&gt;Number of Castles - ${castleCount}&lt;/div&gt;
      ${summaryHTML}
      `
  }

  /** Create location detail HTML string */
  async getLocationDetailHtml (id, type) {
    // Get location metadata
    const locationInfo = await this.api.getLocationSummary(id)

    // Format summary template
    const summaryHTML = this.getInfoSummaryHtml(locationInfo)

    // Return filled HTML template
    return `
      &lt;h3&gt;${type.toUpperCase()}&lt;/h3&gt;
      ${summaryHTML}`
  }

  /** Format location summary HTML template */
  getInfoSummaryHtml (info) {
    return `
      &lt;h3&gt;Summary&lt;/h3&gt;
      &lt;div&gt;${info.summary}&lt;/div&gt;
      &lt;div&gt;&lt;a href=&quot;${info.url}&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;Read More...&lt;/a&gt;&lt;/div&gt;`
  }
}
</code></pre>
<br>
<p>In the constructor, we are now receiving the API service from the <code>props.data</code> parameter, and assigning it to an instance variable.</p>
<p>We're also defining four new methods, which work together to receive and display information on the selected location.  Together, they -</p>
<ol>
<li>Call the API to retrieve metadata about the selected location.</li>
<li>Generate HTML to display this information using Javascript ES6 template literals.</li>
<li>Insert that HTML into the <code>info-panel</code> component HTML.</li>
</ol>
<p>This is less graceful than having the data-binding capabilities of a full Javascript framework, but is still a reasonably simple way to insert HTML content into the DOM using the native Javascript browser APIs.</p>
<h5 id="42tryitout">4.2 - Try It Out!</h5>
<p>Now that everything is wired up, reload the test page at <code>localhost:8080</code> and try clicking on a location or kingdom.  We'll see the title of that entity appear in the info-panel header, and clicking on the header will reveal the full location description.</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/got_map/step_4_2.jpg" alt="Build An Interactive Game of Thrones Map (Part II) - Leaflet.js & Webpack"></p>
<p>Sweeeeet.</p>
<h3 id="part5addlayerpanelcomponent">Part 5 - Add Layer Panel Component</h3>
<p>The map is pretty fun to explore now, but it's a bit crowded with all of the icons showing at once.  Let's create a new component that will allow us to toggle individual map layers.</p>
<h5 id="50addlayerpaneltemplateandstyling">5.0 - Add Layer Panel Template and Styling</h5>
<p>Create a new directory - <code>app/components/layer-panel</code>.</p>
<p>Add the following template to <code>app/components/layer-panel/layer-panel.html</code></p>
<pre><code class="language-html">&lt;div ref=&quot;panel&quot; class=&quot;layer-panel&quot;&gt;
  &lt;div ref=&quot;toggle&quot; class=&quot;layer-toggle&quot;&gt;Layers&lt;/div&gt;
  &lt;div class=&quot;layer-panel-content&quot;&gt;
    &lt;h3&gt;Layers&lt;/h3&gt;
    &lt;div ref=&quot;buttons&quot; class=&quot;layer-buttons&quot;&gt;&lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;
</code></pre>
<br>
<p>Next, add some component styling to <code>app/components/layer-panel/layer-panel.scss</code></p>
<pre><code class="language-scss">@import '../../_variables.scss';

.layer-toggle {
  display: none;
}

.layer-panel {
  position: absolute;
  top: $panelMargin;
  right: $panelMargin;
  padding: 12px;
  background: $midDark;
  z-index: 1000;
  color: $offWhite;

  h3 {
    text-align: center;
    text-transform: uppercase;
    margin: 0 auto;
  }
}

.layer-buttons {
  text-transform: uppercase;

  div {
    color: $grey;
    border-top: 1px solid $offWhite;
    padding: 6px;
    cursor: pointer;
    user-select: none;
    font-family: $fontNormal;
  }

  div.toggle-active {
    color: $offWhite;
  }

  :last-child {
    border-bottom: 1px solid $offWhite;
  }
}

@media (max-width: $breakpointMobile) {
  .layer-panel {
    display: inline-flex;
    align-items: center;
    top: 15%;
    right: 0;
    transform: translateX(calc(100% - #{$toggleLayerPanelButtonWidth}));
    transition: all 0.3s ease-in-out;
  }

  .layer-panel.layer-panel-active {
    transform: translateX(0);
  }

  .layer-toggle {
    cursor: pointer;
    display: block;
    width: $toggleLayerPanelButtonWidth;
    transform: translateY(120%) rotate(-90deg);
    padding: 10px;
    margin-left: -20px;
    letter-spacing: 1rem;
    text-transform: uppercase;
  }
}
</code></pre>
<br>
<h5 id="51addlayerpanelbehavior">5.1 - Add Layer Panel Behavior</h5>
<p>Add a new file <code>app/components/layer-panel/layer-panel.js</code>.</p>
<pre><code class="language-javascript">import './layer-panel.scss'
import template from './layer-panel.html'
import { Component } from '../component'

/**
 * Layer Panel Component
 * Render and control layer-toggle side-panel
 */
export class LayerPanel extends Component {
  constructor (placeholderId, props) {
    super(placeholderId, props, template)

    // Toggle layer panel on click (mobile only)
    this.refs.toggle.addEventListener('click', () =&gt; this.toggleLayerPanel())

    // Add a toggle button for each layer
    props.data.layerNames.forEach((name) =&gt; this.addLayerButton(name))
  }

  /** Create and append new layer button DIV */
  addLayerButton (layerName) {
    let layerItem = document.createElement('div')
    layerItem.textContent = `${layerName}s`
    layerItem.setAttribute('ref', `${layerName}-toggle`)
    layerItem.addEventListener('click', (e) =&gt; this.toggleMapLayer(layerName))
    this.refs.buttons.appendChild(layerItem)
  }

  /** Toggle the info panel (only applies to mobile) */
  toggleLayerPanel () {
    this.refs.panel.classList.toggle('layer-panel-active')
  }

  /** Toggle map layer visibility */
  toggleMapLayer (layerName) {
    // Toggle active UI status
    this.componentElem.querySelector(`[ref=${layerName}-toggle]`).classList.toggle('toggle-active')

    // Trigger layer toggle callback
    this.triggerEvent('layerToggle', layerName)
  }
}
</code></pre>
<br>
<p>Here we've created a new LayerPanel component class that takes an array of layer names and renders them as a list of buttons.  The component will also emit a <code>layerToggle</code> event whenever one of these buttons is pressed.</p>
<p>Easy enough so far.</p>
<h5 id="52instantiatelayerpanel">5.2 - Instantiate <code>layer-panel</code></h5>
<p>In <code>app/main.js</code>, add the following code at the bottom of the <code>initializeComponents</code> method.</p>
<pre><code class="language-javascript">import { LayerPanel } from './components/layer-panel/layer-panel'

class ViewController {

  ...
  
  initializeComponents () {
    
    ...
    
    // Initialize Layer Toggle Panel
    this.layerPanel = new LayerPanel('layer-panel-placeholder', {
      data: { layerNames: ['kingdom', ...this.locationPointTypes] },
      events: { layerToggle:
        // Toggle layer in map controller on &quot;layerToggle&quot; event
        event =&gt; { this.mapComponent.toggleLayer(event.detail) }
      }
    })
  }

  ...
}
</code></pre>
<br>
<p>We are inflating the layer panel into the <code>layer-panel-placeholder</code> element, and we are passing in the layer names as data.  When the component triggers the <code>layerToggle</code> event, the callback will then toggle the layer within the map component.</p>
<p>Don't forget to import the <code>LayerPanel</code> component at the top of <code>main.js</code>!</p>
<blockquote>
<p>Note - Frontend code tends to get overly complicated when components are triggering events within sibling components.  It's fine here since our app is relatively small, but be aware that passing messages/data between components through their parent component can get very messy very fast.  In a larger application, it's a good idea to use a centralized state container (with strict unidirectional data flow) to reduce this complexity, such as <a href="http://redux.js.org/">Redux</a> in React, <a href="https://github.com/vuejs/vuex">Vuex</a> in Vue, and <a href="https://github.com/ngrx/store">ngrx/store</a> in Angular.</p>
</blockquote>
<h5 id="53syncinitiallayerloadingwithlayerpanelcomponent">5.3 - Sync Initial Layer Loading With Layer Panel Component</h5>
<p>One final change is needed in the <code>loadMapData</code> method of <code>app/main.js</code>.  In this method, replace each call to <code>this.mapComponent.toggleLayer</code> with <code>this.layerPanel.toggleMapLayer</code>, so that the initial layer loading stays in sync with our new layer panel UI component.</p>
<h5 id="53tryitout">5.4 - Try it out!</h5>
<p>Aaaaaand that's it. Since our map component already has the required <code>toggleLayer</code> method, we shouldn't need to add anything more to make our layer panel work.</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/got_map/step_5_3.jpg" alt="Build An Interactive Game of Thrones Map (Part II) - Leaflet.js & Webpack"></p>
<h3 id="step6addlocationsearch">Step 6 - Add Location Search</h3>
<p>Ok, <strong>we're almost done now</strong>, just one final component to add - the search bar.</p>
<h5 id="60clientsidesearchvsserversidesearch">6.0 - Client-Side Search vs. Server-Side Search</h5>
<p>Originally, I had planned to do the search on the server-side using the string-matching functionality of our PostgreSQL database.  I was even considering writing a part III tutorial on setting up a search microservice using <a href="https://www.elastic.co/products/elasticsearch">Elasticsearch</a> (let me know if you would still like to see a tutorial on this!).</p>
<p>Then I realized two things -</p>
<ol>
<li>We're already downloading <em>all</em> of the location titles up-front in order to render the GeoJSON.</li>
<li>There are less than 300 total entities that we need to search through.</li>
</ol>
<p>Given these two facts, it became apparent that this is one of those rare cases where performing the search on the client-side is actually the most appropriate option.</p>
<p>To perform this search, we'll add the lightweight-yet-powerful <a href="http://fusejs.io/">Fuse.js</a> library to our app.  As with our API operations, we'll wrap Fuse.js inside a service class to provide a layer of abstraction over our search functionality.</p>
<h5 id="61addsearchservice">6.1 - Add Search Service</h5>
<p>Before adding our search bar, we'll need to create a new service class to actually perform the search on our location data.</p>
<p>Add a new file <code>app/services/search.js</code></p>
<pre><code class="language-javascript">import Fuse from 'fuse.js'

/** Location Search Service Class */
export class SearchService {
  constructor () {
    this.options = {
      keys: ['name'],
      shouldSort: true,
      threshold: 0.3,
      location: 0,
      distance: 100,
      maxPatternLength: 32,
      minMatchCharLength: 1
    }

    this.searchbase = []
    this.fuse = new Fuse([], this.options)
  }

  /** Add JSON items to Fuse instance searchbase
   * @param { Object[] } geojson Array of GeoJSON items to add
   * @param { String } geojson[].properties.name Name of the GeoJSON item
   * @param { String } geojson[].properties.id ID of the GeoJSON item
   * @param { String } layerName Name of the geojson map layer for the given items
  */
  addGeoJsonItems (geojson, layerName) {
    // Add items to searchbase
    this.searchbase = this.searchbase.concat(geojson.map((item) =&gt; {
      return { layerName, name: item.properties.name, id: item.properties.id }
    }))

    // Re-initialize fuse search instance
    this.fuse = new Fuse(this.searchbase, this.options)
  }

  /** Search for the provided term */
  search (term) {
    return this.fuse.search(term)
  }
}
</code></pre>
<br>
<p>Using this new class, we can directly pass our GeoJSON arrays to the search service in order to index the title, id, and layer name of each item.  We can then simply call the <code>search</code> method of the class instance to perform a fuzzy-search on all of these items.</p>
<blockquote>
<p>&quot;Fuzzy-search&quot; refers to a search that can return inexact matches for the query string in order to accommodate typos.  This is very useful for location search, particularly for &quot;Game of Thrones&quot;, since many users will only be familiar with how a location <em>sounds</em> (when spoken in the TV show) rather than its precise spelling.</p>
</blockquote>
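<p>To build an intuition for why inexact matches still rank well, it helps to think in terms of edit distance - the number of single-character changes needed to turn one string into another.  Fuse.js actually scores matches with a Bitap-style algorithm rather than plain Levenshtein distance, so the following standalone sketch (not part of the app) is only a mental model of the idea:</p>

```javascript
// Levenshtein distance: the minimum number of single-character edits
// (insertions, deletions, substitutions) between two strings.
function levenshtein (a, b) {
  const rows = a.length + 1
  const cols = b.length + 1

  // Initialize the dynamic-programming table with base cases
  const d = Array.from({ length: rows }, (_, i) => {
    const row = new Array(cols).fill(0)
    row[0] = i
    return row
  })
  for (let j = 0; j < cols; j++) d[0][j] = j

  for (let i = 1; i < rows; i++) {
    for (let j = 1; j < cols; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1
      d[i][j] = Math.min(
        d[i - 1][j] + 1, // deletion
        d[i][j - 1] + 1, // insertion
        d[i - 1][j - 1] + cost // substitution
      )
    }
  }
  return d[rows - 1][cols - 1]
}

// A misspelled query is still "close" to the intended location name.
console.log(levenshtein('winterfel', 'winterfell')) // 1
console.log(levenshtein('casterly rock', 'winterfell')) // much larger
```

A low distance relative to the string length is roughly what a low Fuse.js <code>threshold</code> (like our 0.3) accepts as a match.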
<h5 id="62addgeojsondatatosearchservice">6.2 - Add GeoJSON data to search service</h5>
<p>We'll modify our <code>loadMapData()</code> function in <code>main.js</code> to add the downloaded data to our search service in addition to our map.</p>
<p>Make the following changes to <code>app/main.js</code>.</p>
<pre><code class="language-javascript">import { SearchService } from './services/search'

class ViewController {

  constructor () {
    ...

    this.searchService = new SearchService()
    
    ...
  }
  
  ...

  /** Load map data from the API */
  async loadMapData () {
    // Download kingdom boundaries
    const kingdomsGeojson = await this.api.getKingdoms()

    // Add boundary data to search service
    this.searchService.addGeoJsonItems(kingdomsGeojson, 'kingdom')

    // Add data to map
    this.mapComponent.addKingdomGeojson(kingdomsGeojson)

    // Show kingdom boundaries
    this.layerPanel.toggleMapLayer('kingdom')

    // Download location point geodata
    for (let locationType of this.locationPointTypes) {
      // Download location type GeoJSON
      const geojson = await this.api.getLocations(locationType)

      // Add location data to search service
      this.searchService.addGeoJsonItems(geojson, locationType)

      // Add data to map
      this.mapComponent.addLocationGeojson(locationType, geojson, this.getIconUrl(locationType))
    }
  }
}
</code></pre>
<br>
<p>We are now instantiating a <code>SearchService</code> instance in the constructor, and calling <code>this.searchService.addGeoJsonItems</code> after each GeoJSON request.</p>
<p>Great, that wasn't too difficult.  Now at the bottom of the function, you could test it out with, say, <code>console.log(this.searchService.search('winter'))</code> to view the search results in the browser console.</p>
<h5 id="63addsearchbarcomponent">6.3 - Add Search Bar Component</h5>
<p>Now let's actually build the search bar.</p>
<p>First, create a new directory - <code>app/components/search-bar</code>.</p>
<p>Add the HTML template to <code>app/components/search-bar/search-bar.html</code></p>
<pre><code class="language-html">&lt;div class=&quot;search-container&quot;&gt;
  &lt;div class=&quot;search-bar&quot;&gt;
    &lt;input ref=&quot;input&quot; type=&quot;text&quot; name=&quot;search&quot; placeholder=&quot;Search...&quot; class=&quot;search-input&quot;&gt;
  &lt;/div&gt;
  &lt;div ref=&quot;results&quot; class=&quot;search-results&quot;&gt;&lt;/div&gt;
&lt;/div&gt;
</code></pre>
<br>
<p>Add component styling to <code>app/components/search-bar/search-bar.scss</code></p>
<pre><code class="language-scss">@import '../../_variables.scss';

.search-container {
  position: absolute;
  top: $panelMargin;
  left: $panelMargin;
  background: transparent;
  z-index: 1000;
  color: $offWhite;
  box-sizing: border-box;
  font-family: $fontNormal;

  input[type=text] {
    width: $searchBarHeight;
    -webkit-transition: width 0.4s ease-in-out;
    transition: width 0.4s ease-in-out;
    cursor: pointer;
}

  /* When the input field gets focus, change its width to 100% */
  input[type=text]:focus {
      width: $leftPanelsWidth;
      outline: none;
      cursor: text;
  }
}

.search-results {
  margin-top: 4px;
  border-radius: 4px;
  background: $midDark;

  div {
    padding: 16px;
    cursor: pointer;
  }

  div:hover {
    background: $lightDark;
  }
}

.search-bar, .search-input {
  height: $searchBarHeight;
}

.search-input {
  background-color: $midDark;
  color: $offWhite;
  border: 3px $lightDark solid;
  border-radius: 4px;
  font-size: 1rem;
  padding: 4px;
  background-image: url('https://storage.googleapis.com/material-icons/external-assets/v4/icons/svg/ic_search_white_18px.svg');
  background-position: 20px 17px;
  background-size: 24px 24px;
  background-repeat: no-repeat;
  padding-left: $searchBarHeight;
}

@media (max-width: $breakpointMobile) {
  .search-container {
    width: 100%;
    top: 0;
    left: 0;

    .search-input {
      border-radius: 0;
    }

    .search-results {
      margin-top: 0;
      border-radius: 0;
    }
  }
}
</code></pre>
<br>
<p>Finally, we'll tie it all together in the component JS file -  <code>app/components/search-bar/search-bar.js</code></p>
<pre><code class="language-javascript">import './search-bar.scss'
import template from './search-bar.html'
import { Component } from '../component'

/**
 * Search Bar Component
 * Render and manage search-bar and search results.
 * @extends Component
 */
export class SearchBar extends Component {
  /** SearchBar Component Constructor
   * @param { Object } props.events.resultSelected Result selected event listener
   * @param { Object } props.data.searchService SearchService instance to use
   */
  constructor (placeholderId, props) {
    super(placeholderId, props, template)
    this.searchService = props.data.searchService
    this.searchDebounce = null

    // Trigger search function for new input in searchbar
    this.refs.input.addEventListener('keyup', (e) =&gt; this.onSearch(e.target.value))
  }

  /** Receive search bar input, and debounce by 500 ms */
  onSearch (value) {
    clearTimeout(this.searchDebounce)
    this.searchDebounce = setTimeout(() =&gt; this.search(value), 500)
  }

  /** Search for the input term, and display results in UI */
  search (term) {
    // Clear search results
    this.refs.results.innerHTML = ''

    // Get the top ten search results
    this.searchResults = this.searchService.search(term).slice(0, 10)

    // Display search results on UI
    this.searchResults.forEach((result) =&gt; this.displaySearchResult(result))
  }

  /** Add search result row to UI */
  displaySearchResult (searchResult) {
    let layerItem = document.createElement('div')
    layerItem.textContent = searchResult.name
    layerItem.addEventListener('click', () =&gt; this.searchResultSelected(searchResult))
    this.refs.results.appendChild(layerItem)
  }

  /** Display the selected search result  */
  searchResultSelected (searchResult) {
    // Clear search input and results
    this.refs.input.value = ''
    this.refs.results.innerHTML = ''

    // Send selected result to listeners
    this.triggerEvent('resultSelected', searchResult)
  }
}
</code></pre>
<br>
<p>As you can see, we're expecting the component to receive a <code>SearchService</code> instance as a data property, and to emit a <code>resultSelected</code> event.</p>
<p>The rest of the class is pretty straightforward.  The component will listen for changes to the search input element (debounced by 500 ms) and will then search for the input term using the <code>SearchService</code> instance.</p>
<blockquote>
<p>&quot;Debounce&quot; refers to the practice of waiting for a break in the input before executing an operation.  Here, the component is configured to wait for a break of at least 500ms between keystrokes before performing the search. 500ms was chosen since the average computer user types at 8,000 keystrokes-per-hour, or one keystroke every 450 milliseconds.  Using debounce is an important performance optimization to avoid computing new search results every time the user taps a key.</p>
</blockquote>
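<p>Stripped of the component plumbing, the debounce pattern itself fits in a few lines.  This is a generic sketch of the technique, not code from the app:</p>

```javascript
// Returns a wrapped version of `fn` that only runs after `delay` ms
// have passed without another call. Each new call resets the timer.
function debounce (fn, delay) {
  let timer = null
  return function (...args) {
    clearTimeout(timer)
    timer = setTimeout(() => fn(...args), delay)
  }
}

// Only the final call in a rapid burst actually triggers the search.
const search = debounce(term => console.log(`searching for "${term}"`), 500)
search('w')
search('wi')
search('winterfell') // after 500 ms of quiet: searching for "winterfell"
```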
<p>The component will then render the search results as a list in the <code>searchResults</code> container div and will emit the <code>resultSelected</code> event when a result is clicked.</p>
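<p>The <code>resultSelected</code> event travels through the shared <code>Component</code> base class.  Since the listener reads the payload from <code>event.detail</code> (in the style of a DOM <code>CustomEvent</code>), the publish/subscribe mechanism can be pictured as a minimal event emitter.  This standalone sketch is only a mental model, not the actual <code>Component</code> implementation:</p>

```javascript
// Minimal event-emitter sketch. Listeners receive an event object with
// a `detail` payload, mirroring the DOM CustomEvent shape used above.
class MiniEmitter {
  constructor (events = {}) {
    this.listeners = events // e.g. { resultSelected: fn }
  }

  triggerEvent (name, detail) {
    const listener = this.listeners[name]
    if (listener) listener({ detail }) // no listener registered: no-op
  }
}

// Usage mirroring the search bar: the parent passes a listener in,
// and the component triggers it with the selected result.
const emitter = new MiniEmitter({
  resultSelected: event => console.log(`selected: ${event.detail.name}`)
})
emitter.triggerEvent('resultSelected', { name: 'Winterfell', layerName: 'castle' })
```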
<h5 id="64instantiatethesearchbarcomponent">6.4 - Instantiate the Search Bar Component</h5>
<p>Now that the search bar component is built, we can simply instantiate it with the required properties in <code>app/main.js</code>.</p>
<pre><code class="language-javascript">import { SearchBar } from './components/search-bar/search-bar'

class ViewController {

  ...

  initializeComponents () {
    
    ...
    
    // Initialize Search Panel
    this.searchBar = new SearchBar('search-panel-placeholder', {
      data: { searchService: this.searchService },
      events: { resultSelected: event =&gt; {
        // Show result on map when selected from search results
        let searchResult = event.detail
        if (!this.mapComponent.isLayerShowing(searchResult.layerName)) {
          // Show result layer if currently hidden
          this.layerPanel.toggleMapLayer(searchResult.layerName)
        }
        this.mapComponent.selectLocation(searchResult.id, searchResult.layerName)
      }}
    })
  }
  
  ...
  
}
</code></pre>
<br>
<p>In the component properties, we're defining a listener for the <code>resultSelected</code> event. This listener will add the map layer of the selected result if it is not currently visible, and will select the location within the map component.</p>
<h5 id="65addmethodstomapcomponent">6.5 - Add Methods To Map Component</h5>
<p>In the above listener, we're using two new methods in the map component - <code>isLayerShowing</code> and <code>selectLocation</code>.  Let's add these methods to <code>app/components/map</code>.</p>
<pre><code class="language-javascript">export class Map extends Component {
  
  ...

  /** Check if layer is added to map  */
  isLayerShowing (layerName) {
    return this.map.hasLayer(this.layers[layerName])
  }

  /** Trigger &quot;click&quot; on layer with provided name */
  selectLocation (id, layerName) {
    // Find selected layer
    const geojsonLayer = this.layers[layerName]
    const sublayers = geojsonLayer.getLayers()
    const selectedSublayer = sublayers.find(layer =&gt; {
      return layer.feature.geometry.properties.id === id
    })

    // Zoom map to selected layer
    if (selectedSublayer.feature.geometry.type === 'Point') {
      this.map.flyTo(selectedSublayer.getLatLng(), 5)
    } else {
      this.map.flyToBounds(selectedSublayer.getBounds(), 5)
    }

    // Fire click event
    selectedSublayer.fireEvent('click')
  }
}
</code></pre>
<br>
<p>The <code>isLayerShowing</code> method simply returns a boolean representing whether the layer is currently added to the <code>leaflet</code> map.</p>
<p>The <code>selectLocation</code> method is slightly more complicated.  It will first find the selected geographic feature by searching for a matching ID in the corresponding layer.  It will then call the leaflet method <code>flyTo</code> (for locations) or <code>flyToBounds</code> (for kingdoms), in order to center the map on the selected location.  Finally, it will emit the <code>click</code> event from the map component, in order to display the selected region's information in the info panel.</p>
<h5 id="66tryitout">6.6 - Try it out!</h5>
<p>The webapp is now complete!  It should look like this.</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/got_map/step_6_6.jpg" alt="Build An Interactive Game of Thrones Map (Part II) - Leaflet.js & Webpack"></p>
<h3 id="nextsteps">Next Steps</h3>
<p>Congrats, you've just built a frameworkless &quot;Game of Thrones&quot; web map!</p>
<p>Whew, this tutorial was a bit longer than expected.</p>
<p>You can view the completed webapp here - <a href="https://atlasofthrones.com/">https://atlasofthrones.com/</a></p>
<p>There are lots of ways that you could build out the app from this point.</p>
<ul>
<li>Polish the design and make the map beautiful.</li>
<li>Build an online, multiplayer strategy game such as <a href="https://en.wikipedia.org/wiki/Diplomacy_(game)">Diplomacy</a> or <a href="https://en.wikipedia.org/wiki/Risk_(game)">Risk</a> using this codebase as a foundation.</li>
<li>Modify the application to show geo-data from your favorite fictional universe.  Keep in mind that <a href="http://www.bostongis.com/blog/index.php?/archives/266-geography-type-is-not-limited-to-earth.html">the PostGIS geography type is not limited to earth</a>.</li>
<li>If you are a &quot;Game of Thrones&quot; expert (and/or if you are George R. R. Martin), use a program such as <a href="http://www.qgis.org/en/site/">QGIS</a> to augment the included open-source location data with your own knowledge.</li>
<li><strong>Build a useful real-world application</strong> using civic open-data, such as <a href="https://www.crisiscleanup.org/public_map">this map</a> visualizing active work-orders from recent US natural disasters such as Hurricanes Harvey and Irma.</li>
</ul>
<p>You can find the complete open-source codebase here - <a href="https://github.com/triestpa/Atlas-Of-Thrones">https://github.com/triestpa/Atlas-Of-Thrones</a></p>
<p>Thanks for reading, feel free to comment below with any feedback, ideas, and suggestions!</p>
</div>]]></content:encoded></item><item><title><![CDATA[Build An Interactive Game of Thrones Map (Part I) - Node.js,  PostGIS, and Redis]]></title><description><![CDATA[A 20-minute guide to building a Node.js API to serve geospatial "Game of Thrones" data from PostgreSQL (with the PostGIS extension) and Redis.]]></description><link>http://blog.patricktriest.com/game-of-thrones-map-node-postgres-redis/</link><guid isPermaLink="false">59a7b37887161e1db0107f33</guid><category><![CDATA[Node.js]]></category><category><![CDATA[Javascript]]></category><category><![CDATA[Guides]]></category><dc:creator><![CDATA[Patrick Triest]]></dc:creator><pubDate>Sun, 03 Sep 2017 12:00:00 GMT</pubDate><media:content url="https://blog-images.patricktriest.com/uploads/painted_table.jpg" medium="image"/><content:encoded><![CDATA[<div class="kg-card-markdown"><h3 id="agameofmaps">A Game of Maps</h3>
<img src="https://blog-images.patricktriest.com/uploads/painted_table.jpg" alt="Build An Interactive Game of Thrones Map (Part I) - Node.js,  PostGIS, and Redis"><p><em>Have you ever wondered how &quot;Google Maps&quot; might be working in the background?</em></p>
<p><em>Have you watched &quot;Game of Thrones&quot; and been confused about where all of the castles and cities are located in relation to each other?</em></p>
<p><em>Do you not care about &quot;Game of Thrones&quot;, but still want a guide to setting up a Node.js server with PostgreSQL and Redis?</em></p>
<p>In this 20 minute tutorial, we'll walk through building a Node.js API to serve geospatial &quot;Game of Thrones&quot; data from PostgreSQL (with the PostGIS extension) and Redis.</p>
<p><a href="https://blog.patricktriest.com/game-of-thrones-leaflet-webpack/">Part II</a> of this series provides a tutorial on building a &quot;Google Maps&quot; style web application to visualize the data from this API.</p>
<p>Check out <a href="https://atlasofthrones.com/">https://atlasofthrones.com/</a> for a preview of the final product.</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/got_map/got_map.jpg" alt="Build An Interactive Game of Thrones Map (Part I) - Node.js,  PostGIS, and Redis"></p>
<br>
<h3 id="step0setuplocaldependencies">Step 0 - Setup Local Dependencies</h3>
<p>Before starting, we'll need to install the project dependencies.</p>
<h5 id="00postgresqlandpostgis">0.0 - PostgreSQL and PostGIS</h5>
<p>The primary datastore for this app is <a href="https://www.postgresql.org/">PostgreSQL</a>.  Postgres is a powerful and modern SQL database, and is a very solid choice for any app that requires storing and querying relational data.  We'll also be using the <a href="http://postgis.net/">PostGIS</a> spatial database extender for Postgres, which will allow us to run advanced queries and operations on geographic datatypes.</p>
<p>This page contains the official download and installation instructions for PostgreSQL - <a href="https://www.postgresql.org/download/">https://www.postgresql.org/download/</a></p>
<p>Another good resource for getting started with Postgres can be found here - <a href="http://postgresguide.com/setup/install.html">http://postgresguide.com/setup/install.html</a></p>
<p>If you are using a version of PostgreSQL that does not come bundled with PostGIS, you can find installation guides for PostGIS here -<br>
<a href="http://postgis.net/install/">http://postgis.net/install/</a></p>
<h5 id="01redis">0.1 - Redis</h5>
<p>We'll be using <a href="https://redis.io/">Redis</a> in order to cache API responses.  Redis is an in-memory key-value datastore that will enable our API to serve data with single-digit millisecond response times.</p>
<p>Installation instructions for Redis can be found here - <a href="https://redis.io/topics/quickstart">https://redis.io/topics/quickstart</a></p>
<h5 id="02nodejs">0.2 - Node.js</h5>
<p>Finally, we'll need <a href="https://nodejs.org/">Node.js</a> v7.6 or above to run our core application server and endpoint handlers, and to interface with the two datastores.</p>
<p>Installation instructions for Node.js can be found here -<br>
<a href="https://nodejs.org/en/download/">https://nodejs.org/en/download/</a></p>
<h3 id="step1gettingstartedwithpostgres">Step 1 - Getting Started With Postgres</h3>
<h5 id="10downloaddatabasedump">1.0 - Download Database Dump</h5>
<p>To keep things simple, we'll be using a pre-built database dump for this project.</p>
<blockquote>
<p>The database dump contains polygons and coordinate points for locations in the &quot;Game of Thrones&quot; world, along with their text description data.  The geo-data is based on multiple open source contributions, which I've cleaned and combined with text data scraped from <a href="http://awoiaf.westeros.org/index.php/Main_Page">A Wiki of Ice and Fire</a>, <a href="http://gameofthrones.wikia.com/wiki/Game_of_Thrones_Wiki">Game of Thrones Wiki</a>, and <a href="http://www.westeroscraft.com/home/">WesterosCraft</a>.  More detailed attribution can be found <a href="https://github.com/triestpa/Atlas-Of-Thrones/blob/master/attribution.md">here</a>.</p>
</blockquote>
<p>In order to load the database locally, first download the database dump.</p>
<pre><code class="language-bash">wget https://cdn.patricktriest.com/atlas-of-thrones/atlas_of_thrones.sql
</code></pre>
<br>
<h5 id="11createpostgresuser">1.1 - Create Postgres User</h5>
<p>We'll need to create a user in the Postgres database.</p>
<blockquote>
<p>If you already have a Postgres instance with users/roles set up, feel free to skip this step.</p>
</blockquote>
<p>Run <code>psql -U postgres</code> on the command line to enter the Postgres shell as the default <code>postgres</code> user.  You might need to run this command as root (with <code>sudo</code>) or as the Postgres user in the operating system (with <code>sudo -u postgres psql</code>) depending on how Postgres is installed on your machine.</p>
<pre><code class="language-bash">psql -U postgres
</code></pre>
<br>
<p>Next, create a new user in Postgres.</p>
<pre><code class="language-sql">CREATE USER patrick WITH PASSWORD 'the_best_passsword';
</code></pre>
<br>
<p>In case it wasn't obvious, you should replace <code>patrick</code> and <code>the_best_passsword</code> in the above command with your desired username and password respectively.</p>
<h5 id="12createatlas_of_thronesdatabase">1.2 - Create &quot;atlas_of_thrones&quot; Database</h5>
<p>Next, create a new database for your project.</p>
<pre><code class="language-sql">CREATE DATABASE atlas_of_thrones;
</code></pre>
<br>
<p>Grant query privileges in the new database to your newly created user.</p>
<pre><code class="language-sql">GRANT ALL PRIVILEGES ON DATABASE atlas_of_thrones to patrick;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO patrick;
</code></pre>
<br>
<p>Then connect to this new database, and activate the PostGIS extension.</p>
<pre><code class="language-sql">\c atlas_of_thrones
CREATE EXTENSION postgis;
</code></pre>
<br>
<p>Run <code>\q</code> to exit the Postgres shell.</p>
<h5 id="13importdatabasedump">1.3 - Import Database Dump</h5>
<p>Load the downloaded SQL dump into your newly created database.</p>
<pre><code class="language-bash">psql -d atlas_of_thrones &lt; atlas_of_thrones.sql
</code></pre>
<br>
<h5 id="14listdatabsetables">1.4 - List Database Tables</h5>
<p>If you've had no errors so far, congrats!</p>
<p>Let's enter the <code>atlas_of_thrones</code> database from the command line.</p>
<pre><code class="language-bash">psql -d atlas_of_thrones -U patrick
</code></pre>
<br>
<p>Again, substitute &quot;patrick&quot; here with your username.</p>
<p>Once we're in the Postgres shell, we can get a list of available tables with the <code>\dt</code> command.</p>
<pre><code class="language-sql">\dt
</code></pre>
<pre><code class="language-sql">             List of relations
 Schema |      Name       | Type  |  Owner  
--------+-----------------+-------+---------
 public | kingdoms        | table | patrick
 public | locations       | table | patrick
 public | spatial_ref_sys | table | patrick
(3 rows)
</code></pre>
<br>
<h5 id="15inspecttableschema">1.5 - Inspect Table Schema</h5>
<p>We can inspect the schema of an individual table by running</p>
<pre><code class="language-sql">\d kingdoms
</code></pre>
<pre><code class="language-sql">                                      Table &quot;public.kingdoms&quot;
  Column   |             Type             |                        Modifiers                        
-----------+------------------------------+---------------------------------------------------------
 gid       | integer                      | not null default nextval('political_gid_seq'::regclass)
 name      | character varying(80)        | 
 claimedby | character varying(80)        | 
 geog      | geography(MultiPolygon,4326) | 
 summary   | text                         | 
 url       | text                         | 
Indexes:
    &quot;political_pkey&quot; PRIMARY KEY, btree (gid)
    &quot;political_geog_idx&quot; gist (geog)
</code></pre>
<br>
<h5 id="16queryallkingdoms">1.6 - Query All Kingdoms</h5>
<p>Now, let's get a list of all of the kingdoms, with their corresponding names, claimants, and ids.</p>
<pre><code class="language-sql">SELECT name, claimedby, gid FROM kingdoms;
</code></pre>
<pre><code class="language-sql">       name       |   claimedby   | gid 
------------------+---------------+-----
 The North        | Stark         |   5
 The Vale         | Arryn         |   8
 The Westerlands  | Lannister     |   9
 Riverlands       | Tully         |   1
 Gift             | Night's Watch |   3
 The Iron Islands | Greyjoy       |   2
 Dorne            | Martell       |   6
 Stormlands       | Baratheon     |   7
 Crownsland       | Targaryen     |  10
 The Reach        | Tyrell        |  11
(10 rows)
</code></pre>
<br>
<p>Nice!  If you're familiar with Game of Thrones, these names probably look familiar.</p>
<h5 id="17queryalllocationtypes">1.7 - Query All Location Types</h5>
<p>Let's try out one more query, this time on the <code>location</code> table.</p>
<pre><code class="language-sql">SELECT DISTINCT type FROM locations;
</code></pre>
<pre><code class="language-sql">   type   
----------
 Landmark
 Ruin
 Castle
 City
 Region
 Town
(6 rows)
</code></pre>
<br>
<p>This query returns a list of available <code>location</code> entity types.</p>
<p>Go ahead and exit the Postgres shell with <code>\q</code>.</p>
<h3 id="step2setupnodejsproject">Step 2 - Setup NodeJS project</h3>
<h5 id="20clonestarterrepository">2.0 - Clone Starter Repository</h5>
<p>Run the following commands to clone the starter project and install its dependencies.</p>
<pre><code class="language-bash">git clone -b backend-starter https://github.com/triestpa/Atlas-Of-Thrones
cd Atlas-Of-Thrones
npm install
</code></pre>
<br>
<p>The starter branch includes a base directory template, with dependencies declared in package.json. It is configured with <a href="https://github.com/eslint/eslint">ESLint</a> and <a href="https://github.com/standard/standard">JavaScript Standard Style</a>.</p>
<blockquote>
<p>If the lack of semicolons in this style guide makes you uncomfortable, that's fine, you're welcome to switch the project to another style in the <code>.eslintrc.js</code> config.</p>
</blockquote>
<h5 id="21addenvfile">2.1 - Add .env file</h5>
<p>Before starting, we'll need to add a <code>.env</code> file to the project root in order to provide environment variables (such as database credentials and CORs configuration) for the Node.js app to use.</p>
<p>Here's a sample <code>.env</code> file with sensible defaults for local development.</p>
<pre><code class="language-bash">PORT=5000
DATABASE_URL=postgres://patrick:@localhost:5432/atlas_of_thrones?ssl=false
REDIS_HOST=localhost
REDIS_PORT=6379
CORS_ORIGIN=http://localhost:8080
</code></pre>
<br>
<p>You'll need to change the &quot;patrick&quot; in the DATABASE_URL entry to match your Postgres user credentials. Unless your name is Patrick, that is, in which case it might already be fine.</p>
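<p>If you're unsure whether your edited connection string is still well-formed, Node's built-in WHATWG <code>URL</code> parser can break it apart.  This is just a quick sanity check, not part of the app:</p>

```javascript
// Parse the connection string into its components to verify each piece.
const dbUrl = new URL('postgres://patrick:@localhost:5432/atlas_of_thrones?ssl=false')

console.log(dbUrl.username) // patrick
console.log(dbUrl.hostname) // localhost
console.log(dbUrl.port) // 5432
console.log(dbUrl.pathname) // /atlas_of_thrones
console.log(dbUrl.searchParams.get('ssl')) // false
```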
<p>A very simple <code>index.js</code> file with the following contents is in the project root directory.</p>
<pre><code class="language-javascript">require('dotenv').config()
require('./server')
</code></pre>
<br>
<p>This will load the variables defined in <code>.env</code> into the process environment, and will start the app defined in the <code>server</code> directory.  Now that everything is set up, we're (finally) ready to actually begin building our app!</p>
<blockquote>
<p>Setting authentication credentials and other environment-specific configuration using ENV variables is a good, language-agnostic way to handle this information.  For a tutorial like this it might be considered overkill, but I've encountered quite a few production Node.js servers that omit these basic best practices (using hardcoded credentials checked into Git, for instance). I imagine these bad practices may have been learned from tutorials which skip these important steps, so I try to focus my tutorial code on providing examples of best practices.</p>
</blockquote>
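<p>One lightweight way to enforce this practice is to fail fast at startup when a required variable is missing, instead of failing later with a cryptic connection error.  The helper below is an optional pattern, not part of the tutorial codebase:</p>

```javascript
// Optional pattern: verify that required environment variables are
// present before the app starts, and fail with a clear message if not.
function requireEnv (names) {
  const missing = names.filter(name => !process.env[name])
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`)
  }
  return names.map(name => process.env[name])
}

// Example usage (names taken from the sample .env above):
// const [databaseUrl, redisHost] = requireEnv(['DATABASE_URL', 'REDIS_HOST'])
```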
<h3 id="step3initializebasickoaserver">Step 3 - Initialize basic Koa server</h3>
<p>We'll be using <a href="https://github.com/koajs/koa">Koa.js</a> as an API framework.  Koa is a sequel-of-sorts to the wildly popular <a href="https://github.com/expressjs/express">Express.js</a>.  It was built by the same team as Express, with a focus on minimalism, clean control flow, and modern conventions.</p>
<h5 id="30importdependencies">3.0 - Import Dependencies</h5>
<p>Open <code>server/index.js</code> to begin setting up our server.</p>
<p>First, import the required dependencies at the top of the file.</p>
<pre><code class="language-javascript">const Koa = require('koa')
const cors = require('kcors')
const log = require('./logger')
const api = require('./api')
</code></pre>
<br>
<h5 id="31initializeapp">3.1 - Initialize App</h5>
<p>Next, we'll initialize our Koa app, and retrieve the API listening port and CORs settings from the local environment variables.</p>
<p>Add the following (below the imports) in <code>server/index.js</code>.</p>
<pre><code class="language-javascript">// Setup Koa app
const app = new Koa()
const port = process.env.PORT || 5000

// Apply CORS config
const origin = process.env.CORS_ORIGIN || '*'
app.use(cors({ origin }))
</code></pre>
<br>
<h5 id="32definedefaultmiddleware">3.2 - Define Default Middleware</h5>
<p>Now we'll define two middleware functions with <code>app.use</code>.  These functions will be applied to every request.  The first function will log the response times, and the second will catch any errors that are thrown in the endpoint handlers.</p>
<p>Add the following code to <code>server/index.js</code>.</p>
<pre><code class="language-javascript">// Log all requests
app.use(async (ctx, next) =&gt; {
  const start = Date.now()
  await next() // This will pause this function until the endpoint handler has resolved
  const responseTime = Date.now() - start
  log.info(`${ctx.method} ${ctx.status} ${ctx.url} - ${responseTime} ms`)
})

// Error Handler - All uncaught exceptions will percolate up to here
app.use(async (ctx, next) =&gt; {
  try {
    await next()
  } catch (err) {
    ctx.status = err.status || 500
    ctx.body = err.message
    log.error(`Request Error ${ctx.url} - ${err.message}`)
  }
})
</code></pre>
<br>
<p>Koa makes heavy use of async/await for handling the control flow of API request handlers.  If you are unclear on how this works, I would recommend reading these resources -</p>
<ul>
<li><a href="https://medium.com/ninjadevs/node-7-6-koa-2-asynchronous-flow-control-made-right-b0d41c6ba570">Node 7.6 + Koa 2: Asynchronous Flow Control Made Right</a></li>
<li><a href="https://github.com/koajs/koa">Koa Github Readme</a></li>
<li><a href="https://blog.patricktriest.com/what-is-async-await-why-should-you-care/">Async/Await Will Make Your Code Simpler</a></li>
</ul>
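<p>The middleware pattern above boils down to a short control-flow idea: <code>await next()</code> pauses the current middleware until every later middleware (and the endpoint handler) has resolved, then execution resumes where it left off.  The sketch below illustrates that flow; it is not Koa's actual implementation:</p>

```javascript
// Each middleware receives a context and a `next` function; awaiting
// `next()` hands control down the chain, then resumes afterward.
async function run (middleware, ctx) {
  let index = 0
  async function next () {
    const fn = middleware[index++]
    if (fn) await fn(ctx, next) // no more middleware: no-op
  }
  await next()
}

const ctx = { trace: [] }
run([
  async (c, next) => { c.trace.push('log:before'); await next(); c.trace.push('log:after') },
  async (c) => { c.trace.push('handler') }
], ctx).then(() => console.log(ctx.trace)) // [ 'log:before', 'handler', 'log:after' ]
```

Note how the first middleware "wraps" the handler - exactly how the request logger above measures response time around <code>await next()</code>.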
<h5 id="33addloggermodule">3.3 - Add Logger Module</h5>
<p>You might notice that we're using <code>log.info</code> and <code>log.error</code> instead of <code>console.log</code> in the above code.  In Node.js projects, it's really best to avoid using <code>console.log</code> on production servers, since it makes it difficult to monitor and retain application logs.  As an alternative, we'll define our own custom logging configuration using <a href="https://github.com/winstonjs/winston">winston</a>.</p>
<p>Add the following code to <code>server/logger.js</code>.</p>
<pre><code class="language-javascript">const winston = require('winston')
const path = require('path')

// Configure custom app-wide logger
module.exports = new winston.Logger({
  transports: [
    new (winston.transports.Console)(),
    new (winston.transports.File)({
      name: 'info-file',
      filename: path.resolve(__dirname, '../info.log'),
      level: 'info'
    }),
    new (winston.transports.File)({
      name: 'error-file',
      filename: path.resolve(__dirname, '../error.log'),
      level: 'error'
    })
  ]
})
</code></pre>
<br>
<p>Here we're just defining a small logger module using the <code>winston</code> package. The configuration will forward our application logs to two locations - the command line and the log files.  Having this centralized configuration will allow us to easily modify logging behavior (say, to forward logs to an ELK server) when transitioning from development to production.</p>
<h5 id="34definehelloworldendpoint">3.4 - Define &quot;Hello World&quot; Endpoint</h5>
<p>Now open up the <code>server/api.js</code> file and add the following imports.</p>
<pre><code class="language-javascript">const Router = require('koa-router')
const database = require('./database')
const cache = require('./cache')
const joi = require('joi')
const validate = require('koa-joi-validate')
</code></pre>
<br>
<p>In this step, all we really care about is the <code>koa-router</code> module.</p>
<p>Below the imports, initialize a new API router.</p>
<pre><code class="language-javascript">const router = new Router()
</code></pre>
<br>
<p>Now add a simple &quot;Hello World&quot; endpoint.</p>
<pre><code class="language-javascript">// Hello World Test Endpoint
router.get('/hello', async ctx =&gt; {
  ctx.body = 'Hello World'
})
</code></pre>
<br>
<p>Finally, export the router at the bottom of the file.</p>
<pre><code class="language-javascript">module.exports = router
</code></pre>
<br>
<h5 id="35startserver">3.5 - Start Server</h5>
<p>Now we can mount the endpoint route(s) and start the server.</p>
<p>Add the following at the end of <code>server/index.js</code>.</p>
<pre><code class="language-javascript">// Mount routes
app.use(api.routes(), api.allowedMethods())

// Start the app
app.listen(port, () =&gt; { log.info(`Server listening at port ${port}`) })
</code></pre>
<br>
<h5 id="36testtheserver">3.6 - Test The Server</h5>
<p>Try starting the server with <code>npm start</code>.  You should see the output <code>Server listening at port 5000</code>.</p>
<p>Now try opening <code>http://localhost:5000/hello</code> in your browser.  You should see a &quot;Hello World&quot; message in the browser, and a request log on the command line.  Great, we now have a totally useless API server.  Time to add some database queries.</p>
<h3 id="step4addbasicpostgresintegraton">Step 4 - Add Basic Postgres Integration</h3>
<h5 id="40connecttopostgres">4.0 - Connect to Postgres</h5>
<p>Now that our API server is running, we'll want to connect to our Postgres database in order to actually serve data.  In the <code>server/database.js</code> file, we'll add the following code to connect to our database based on the defined environment variables.</p>
<pre><code class="language-javascript">const postgres = require('pg')
const log = require('./logger')
const connectionString = process.env.DATABASE_URL

// Initialize postgres client
const client = new postgres.Client({ connectionString })

// Connect to the DB
client.connect().then(() =&gt; {
  log.info(`Connected To ${client.database} at ${client.host}:${client.port}`)
}).catch(log.error)
</code></pre>
<br>
<p>Try starting the server again with <code>npm start</code>. You should now see an additional line of output.</p>
<pre><code class="language-bash">info: Server listening at port 5000
info: Connected To atlas_of_thrones at localhost:5432
</code></pre>
<br>
<h5 id="41addbasicnowquery">4.1 - Add Basic &quot;NOW&quot; Query</h5>
<p>Now let's add a basic query test to make sure that our database and API server are communicating correctly.</p>
<p>In <code>server/database.js</code>, add the following code at the bottom -</p>
<pre><code class="language-javascript">module.exports = {
  /** Query the current time */
  queryTime: async () =&gt; {
    const result = await client.query('SELECT NOW() as now')
    return result.rows[0]
  }
}
</code></pre>
<br>
<p>This will perform one of the simplest possible queries (besides <code>SELECT 1;</code>) on our Postgres database: retrieving the current time.</p>
<h5 id="42connecttimequerytoanapiroute">4.2 - Connect Time Query To An API Route</h5>
<p>In <code>server/api.js</code> add the following route below our &quot;Hello World&quot; route.</p>
<pre><code class="language-javascript">// Get time from DB
router.get('/time', async ctx =&gt; {
  const result = await database.queryTime()
  ctx.body = result
})
</code></pre>
<br>
<p>Now, we've defined a new endpoint, <code>/time</code>, which will call our time Postgres query and return the result.</p>
<p>Run <code>npm start</code> and visit <code>http://localhost:5000/time</code> in the browser.  You should see a JSON object containing the current UTC time.  Ok cool, we're now serving information from Postgres over our API.  The server is still a bit boring and useless though, so let's move on to the next step.</p>
<h3 id="step5addgeojsonendpoints">Step 5 - Add Geojson Endpoints</h3>
<p>Our end goal is to render our &quot;Game of Thrones&quot; dataset on a map.  To do so, we'll need to serve our data in a web-map friendly format: <a href="http://geojson.org/">GeoJSON</a>.  GeoJSON is a JSON-based specification (<a href="https://tools.ietf.org/html/rfc7946">RFC 7946</a>) for formatting geographic coordinates and polygons in a way that browser-based map rendering tools can understand natively.</p>
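<p>For reference, a single GeoJSON geometry is just a JSON object with a <code>type</code> and <code>coordinates</code>, plus an optional <code>properties</code> object for metadata - much like the objects our API will return.  Here's a small, hypothetical example of a point location (note that GeoJSON uses <code>[longitude, latitude]</code> ordering).</p>

```javascript
// A small, hypothetical GeoJSON Feature for a single point location
const castleFeature = {
  type: 'Feature',
  geometry: {
    type: 'Point',
    coordinates: [14.76, 18.54] // [longitude, latitude] per RFC 7946
  },
  properties: {
    name: 'Winterfell', // arbitrary metadata lives under `properties`
    type: 'Castle'
  }
}
```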
<blockquote>
<p>Note - If you want to minimize payload size, you could convert the GeoJSON results to <a href="https://github.com/topojson/topojson">TopoJSON</a>, a newer format that is able to represent shapes more efficiently by eliminating redundancy.  Our GeoJSON results are not prohibitively large (around 50kb for all of the Kingdom shapes, and less than 5kb for each set of location types), so we won't bother with that in this tutorial.</p>
</blockquote>
<h5 id="50addgeojsonqueries">5.0 - Add GeoJSON Queries</h5>
<p>In the <code>server/database.js</code> file, add the following functions under the <code>queryTime</code> function, inside the <code>module.exports</code> block.</p>
<pre><code class="language-javascript">/** Query the locations as geojson, for a given type */
getLocations: async (type) =&gt; {
  const locationQuery = `
    SELECT ST_AsGeoJSON(geog), name, type, gid
    FROM locations
    WHERE UPPER(type) = UPPER($1);`
  const result = await client.query(locationQuery, [ type ])
  return result.rows
},

/** Query the kingdom boundaries */
getKingdomBoundaries: async () =&gt; {
  const boundaryQuery = `
    SELECT ST_AsGeoJSON(geog), name, gid
    FROM kingdoms;`
  const result = await client.query(boundaryQuery)
  return result.rows
}
</code></pre>
<br>
<p>Here, we are using the <code>ST_AsGeoJSON</code> function from PostGIS in order to convert the polygons and coordinate points to browser-friendly GeoJSON.  We are also retrieving the name and id for each entry.</p>
<blockquote>
<p>Note that in the location query, we are not directly appending the provided type to the query string.  Instead, we're using <code>$1</code> as a placeholder in the query string and passing the type as a parameter to the <code>client.query</code> call.  This is important, since it allows Postgres to treat the &quot;type&quot; input strictly as data rather than as SQL, preventing SQL injection attacks.</p>
</blockquote>
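<p>To see why this matters, compare a deliberately unsafe (hypothetical) string-concatenation query with the parameterized version.  With concatenation, a malicious <code>type</code> value becomes part of the SQL text itself; with a <code>$1</code> placeholder, it is only ever treated as data.</p>

```javascript
const userInput = "'; DROP TABLE locations; --"

// UNSAFE (hypothetical) - user input is spliced directly into the SQL text,
// so the DROP TABLE payload would execute as a second statement
const unsafeQuery = `SELECT * FROM locations WHERE type = '${userInput}';`

// SAFE - the query text is constant; the input travels separately as a parameter,
// e.g. client.query(safeQuery, params)
const safeQuery = 'SELECT * FROM locations WHERE UPPER(type) = UPPER($1);'
const params = [ userInput ]
```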
<h5 id="51addgeojsonendpoint">5.1 - Add GeoJSON Endpoint</h5>
<p>In the <code>server/api.js</code> file, declare the following endpoints.</p>
<pre><code class="language-javascript">router.get('/locations/:type', async ctx =&gt; {
  const type = ctx.params.type
  const results = await database.getLocations(type)
  if (results.length === 0) { ctx.throw(404) }

  // Add row metadata as geojson properties
  const locations = results.map((row) =&gt; {
    let geojson = JSON.parse(row.st_asgeojson)
    geojson.properties = { name: row.name, type: row.type, id: row.gid }
    return geojson
  })

  ctx.body = locations
})

// Respond with boundary geojson for all kingdoms
router.get('/kingdoms', async ctx =&gt; {
  const results = await database.getKingdomBoundaries()
  if (results.length === 0) { ctx.throw(404) }

  // Add row metadata as geojson properties
  const boundaries = results.map((row) =&gt; {
    let geojson = JSON.parse(row.st_asgeojson)
    geojson.properties = { name: row.name, id: row.gid }
    return geojson
  })

  ctx.body = boundaries
})
</code></pre>
<br>
<p>Here, we are executing the corresponding Postgres queries and awaiting each response.  We then map over each result row to add the entity metadata as GeoJSON properties.</p>
<h5 id="52testthegeojsonendpoints">5.2 - Test the GeoJSON Endpoints</h5>
<p>I've deployed a very simple HTML page <a href="https://cdn.patricktriest.com/atlas-of-thrones/geojsonpreview.html">here</a> to test out the GeoJSON responses using <a href="https://github.com/Leaflet/Leaflet">Leaflet</a>.</p>
<p>In order to provide a background for the GeoJSON data, the test page loads a sweet &quot;Game of Thrones&quot; basemap produced by <a href="https://carto.com/blog/game-of-thrones-basemap/">Carto</a>.  This simple HTML page is also included in the starter project, in the <code>geojsonpreview</code> directory.</p>
<p>Start the server (<code>npm start</code>) and open <code>http://localhost:5000/kingdoms</code> in your browser to download the kingdom boundary GeoJSON.  Paste the response into the textbox in the &quot;geojsonpreview&quot; web app, and you should see an outline of each kingdom.  Clicking on each kingdom will reveal the geojson properties for that polygon.</p>
<p>Now try adding the GeoJSON from the location type endpoint - <code>http://localhost:5000/locations/castle</code></p>
<p>Pretty cool, huh?</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/got_map/geojson_preview.jpg" alt="Build An Interactive Game of Thrones Map (Part I) - Node.js,  PostGIS, and Redis"></p>
<blockquote>
<p>If you're interested in learning more about rendering these GeoJSON results, be sure to check back next week for part II of this tutorial, where we'll be building out the webapp using our API - <a href="https://atlasofthrones.com/">https://atlasofthrones.com/</a></p>
</blockquote>
<h3 id="step6advancedpostgisqueries">Step 6 - Advanced PostGIS Queries</h3>
<p>Now that we have a basic GeoJSON service running, let's play with some of the more interesting capabilities of PostgreSQL and PostGIS.</p>
<h4 id="60calculatekingdomsizes">6.0 - Calculate Kingdom Sizes</h4>
<p>PostGIS has a function called <code>ST_AREA</code> that can be used to calculate the total area covered by a polygon.  Let's add a new query to calculate the total area for each kingdom of Westeros.</p>
<p>Add the following function to the <code>module.exports</code> block in <code>server/database.js</code>.</p>
<pre><code class="language-javascript">/** Calculate the area of a given region, by id */
getRegionSize: async (id) =&gt; {
  const sizeQuery = `
      SELECT ST_AREA(geog) as size
      FROM kingdoms
      WHERE gid = $1
      LIMIT(1);`
  const result = await client.query(sizeQuery, [ id ])
  return result.rows[0]
},
</code></pre>
<br>
<p>Next, add an endpoint in <code>server/api.js</code> to execute this query.</p>
<pre><code class="language-javascript">// Respond with calculated area of kingdom, by id
router.get('/kingdoms/:id/size', async ctx =&gt; {
  const id = ctx.params.id
  const result = await database.getRegionSize(id)
  if (!result) { ctx.throw(404) }

  // Convert response (in square meters) to square kilometers
  const sqKm = result.size * (10 ** -6)
  ctx.body = sqKm
})
</code></pre>
<br>
<blockquote>
<p>We know that the resulting units are in square meters because the geography data was originally loaded into Postgres using an EPSG:4326 coordinate system.</p>
</blockquote>
<p>While the computation is mathematically sound, we are performing this operation on a fictional landscape, so the resulting value is an estimate at best.  These computations put the entire continent of Westeros at about 9.5 million square kilometers - roughly in line with Europe, which covers 10.18 million square kilometers.</p>
<p>Now you can call, say, <code>http://localhost:5000/kingdoms/1/size</code> to get the size of a kingdom (in this case &quot;The Riverlands&quot;) in square kilometers.  You can refer to the table from step 1.3 to link each kingdom with their respective id.</p>
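<p>The unit conversion in the endpoint above is simple enough to sanity-check by hand - one square kilometer is 1,000,000 square meters, so multiplying the raw <code>ST_AREA</code> result by <code>10 ** -6</code> yields square kilometers.</p>

```javascript
// Square meters -> square kilometers: 1 km² = 1,000 m × 1,000 m = 1,000,000 m²
const toSqKm = (sqMeters) => sqMeters * (10 ** -6)

// e.g. the ~9.5 trillion m² total for Westeros becomes ~9.5 million km²
const westerosSqKm = toSqKm(9.5e12)
```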
<h4 id="61countcastlesineachkingdom">6.1 - Count Castles In Each Kingdom</h4>
<p>Using PostgreSQL and PostGIS, we can even perform geospatial joins on our dataset!</p>
<blockquote>
<p>In SQL terminology, a JOIN is when you combine columns from more than one table in a single result.</p>
</blockquote>
<p>For instance, let's create a query to count the number of castles in each kingdom.  Add the following query function to our <code>server/database.js</code> module.</p>
<pre><code class="language-javascript">/** Count the number of castles in a region, by id */
countCastles: async (regionId) =&gt; {
  const countQuery = `
    SELECT count(*)
    FROM kingdoms, locations
    WHERE ST_intersects(kingdoms.geog, locations.geog)
    AND kingdoms.gid = $1
    AND locations.type = 'Castle';`
  const result = await client.query(countQuery, [ regionId ])
  return result.rows[0]
},
</code></pre>
<br>
<p>Easy!  Here we're using <code>ST_intersects</code>, a PostGIS function that finds intersections between geometries.  The result will be the number of location coordinates of type <code>Castle</code> that fall within the specified kingdom's boundary polygon.</p>
<p>Now we can add an API endpoint to <code>/server/api.js</code> in order to return the results of this query.</p>
<pre><code class="language-javascript">// Respond with number of castles in kingdom, by id
router.get('/kingdoms/:id/castles', async ctx =&gt; {
  const regionId = ctx.params.id
  const result = await database.countCastles(regionId)
  ctx.body = result ? result.count : ctx.throw(404)
})
</code></pre>
<br>
<p>If you try out <code>http://localhost:5000/kingdoms/1/castles</code> you should see the number of castles in the specified kingdom.  In this case, it appears that &quot;The Riverlands&quot; contains eleven castles.</p>
<h3 id="step7inputvalidation">Step 7 - Input Validation</h3>
<p>We've been having so much fun playing with PostGIS queries that we've forgotten an essential part of building an API - Input Validation!</p>
<p>For instance, if we pass an invalid ID to our endpoint, such as <code>http://localhost:5000/kingdoms/gondor/castles</code>, the query will reach the database before it's rejected, resulting in a thrown error and an HTTP 500 response.  Not good!</p>
<p>A naive approach to this issue would have us manually checking each query parameter at the beginning of each endpoint handler, but that's tedious and difficult to keep consistent across multiple endpoints, let alone across a larger team.</p>
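<p>For contrast, the naive approach would look something like this hypothetical hand-rolled check, which would need to be copy-pasted into every handler that accepts an <code>:id</code> parameter.</p>

```javascript
// Hypothetical hand-rolled validation - the kind of duplicated boilerplate
// that validation middleware lets us avoid
function isValidId (rawId) {
  const id = Number(rawId)
  return Number.isInteger(id) && id >= 0 && id <= 1000
}

// Inside each handler we'd then need:
// if (!isValidId(ctx.params.id)) { ctx.throw(400) }
```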
<p><a href="https://github.com/hapijs/joi">Joi</a> is a fantastic library for validating Javascript objects.  It is often paired with the <a href="https://github.com/hapijs/hapi">Hapi.js</a> framework, since it was built by the Hapi.js team.  Joi is framework agnostic, however, so we can use it in our Koa app without issue.</p>
<p>We'll use the <a href="https://www.npmjs.com/package/koa-joi-validate">koa-joi-validate</a> NPM package to generate input validation middleware.</p>
<blockquote>
<p>Disclaimer - I'm the author of <code>koa-joi-validate</code>.  It's a very short module that was built for use in some of my own projects.  If you don't trust me, feel free to just copy the code into your own project - it's only about 50 lines total, and <code>Joi</code> is the only dependency (<a href="https://github.com/triestpa/koa-joi-validate/blob/master/index.js">https://github.com/triestpa/koa-joi-validate/blob/master/index.js</a>).</p>
</blockquote>
<p>In <code>server/api.js</code>, above our API endpoint handlers, we'll define two input validation functions - one for validating IDs, and one for validating location types.</p>
<pre><code class="language-javascript">// Check that id param is valid number
const idValidator = validate({
  params: { id: joi.number().min(0).max(1000).required() }
})

// Check that query param is valid location type
const typeValidator = validate({
  params: { type: joi.string().valid(['castle', 'city', 'town', 'ruin', 'landmark', 'region']).required() }
})
</code></pre>
<br>
<p>Now, with our validators defined, we can apply them as middleware on each route that accepts URL parameter input.</p>
<pre><code class="language-javascript">router.get('/locations/:type', typeValidator, async ctx =&gt; {
  ...
})

router.get('/kingdoms/:id/castles', idValidator, async ctx =&gt; {
  ...
})

router.get('/kingdoms/:id/size', idValidator, async ctx =&gt; {
  ...
})
</code></pre>
<br>
<p>Ok great, problem solved.  Now if we try to pull any sneaky <code>http://localhost:5000/locations/;DROP%20TABLE%20LOCATIONS;</code> shenanigans the request will be automatically rejected with an HTTP 400 &quot;Bad Request&quot; response before it even hits our endpoint handler.</p>
<h3 id="step8retrievingsummarydata">Step 8 - Retrieving Summary Data</h3>
<p>Let's add one more set of endpoints now, to retrieve the summary data and wiki URLs for each kingdom/location.</p>
<h5 id="80addsummarypostgresqueries">8.0 - Add Summary Postgres Queries</h5>
<p>Add the following query function to the <code>module.exports</code> block in <code>server/database.js</code>.</p>
<pre><code class="language-javascript">/** Get the summary for a location or region, by id */
getSummary: async (table, id) =&gt; {
  if (table !== 'kingdoms' &amp;&amp; table !== 'locations') {
    throw new Error(`Invalid Table - ${table}`)
  }

  const summaryQuery = `
      SELECT summary, url
      FROM ${table}
      WHERE gid = $1
      LIMIT(1);`
  const result = await client.query(summaryQuery, [ id ])
  return result.rows[0]
}
</code></pre>
<br>
<p>Here we're taking the table name as a function parameter, which will allow us to reuse the function for both tables.  This is a bit dangerous, so we'll make sure it's an expected table name before appending it to the query string.</p>
<h5 id="81addsummaryapiroutes">8.1 - Add Summary API Routes</h5>
<p>In <code>server/api.js</code>, we'll add endpoints to retrieve this summary data.</p>
<pre><code class="language-javascript">// Respond with summary of kingdom, by id
router.get('/kingdoms/:id/summary', idValidator, async ctx =&gt; {
  const id = ctx.params.id
  const result = await database.getSummary('kingdoms', id)
  ctx.body = result || ctx.throw(404)
})

// Respond with summary of location, by id
router.get('/locations/:id/summary', idValidator, async ctx =&gt; {
  const id = ctx.params.id
  const result = await database.getSummary('locations', id)
  ctx.body = result || ctx.throw(404)
})
</code></pre>
<br>
<p>Ok cool, that was pretty straightforward.</p>
<p>We can test out the new endpoints with, say, <code>localhost:5000/locations/1/summary</code>, which should return a JSON object containing a summary string, and the URL of the wiki article that it was scraped from.</p>
<h3 id="step9integrateredis">Step 9 - Integrate Redis</h3>
<p>Now that all of the endpoints and queries are in place, we'll add a request cache using Redis to make our API super fast and efficient.</p>
<h5 id="90doweactuallyneedredis">9.0 - Do We Actually Need Redis?</h5>
<p>No, not really.</p>
<p>So here's what happened - The project was originally hitting the MediaWiki APIs directly for each location summary, which was taking around 2000-3000 milliseconds per request.  In order to speed up the summary endpoints, and to avoid overloading the wiki API, I added a Redis cache to the project in order to save the summary data responses after each MediaWiki API call.</p>
<p>Since then, however, I've scraped all of the summary data from the wikis and added it directly to the database.  Now that the summaries are stored directly in Postgres, the Redis cache is much less necessary.</p>
<p>Redis is probably overkill here, since we won't really be taking advantage of its ultra-fast write speeds, atomic operations, and other useful features (like being able to set expiry dates on key entries).  Additionally, Postgres keeps frequently accessed data in memory via its shared buffers, so adding Redis won't even make repeated queries <em>that</em> much faster.</p>
<p>Despite this, we'll throw it into our project anyway since it's easy, fun, and will hopefully provide a good introduction to using Redis in a Node.js project.</p>
<h5 id="91addcachemodule">9.1 - Add Cache Module</h5>
<p>First, we'll add a new module to connect with Redis, and to define two helper middleware functions.</p>
<p>Add the following code to <code>server/cache.js</code>.</p>
<pre><code class="language-javascript">const Redis = require('ioredis')
const redis = new Redis(process.env.REDIS_PORT, process.env.REDIS_HOST)

module.exports = {
  /** Koa middleware function to check cache before continuing to any endpoint handlers */
  async checkResponseCache (ctx, next) {
    const cachedResponse = await redis.get(ctx.path)
    if (cachedResponse) { // If cache hit
      ctx.body = JSON.parse(cachedResponse) // return the cached response
    } else {
      await next() // only continue if result not in cache
    }
  },
  /** Koa middleware function to insert response into cache */
  async addResponseToCache (ctx, next) {
    await next() // Wait until other handlers have finished
    if (ctx.body &amp;&amp; ctx.status === 200) { // If request was successful
      // Cache the response
      await redis.set(ctx.path, JSON.stringify(ctx.body))
    }
  }
}
</code></pre>
<br>
<p>The first middleware function (<code>checkResponseCache</code>) here will check the cache for the request path (<code>/kingdoms/5/size</code>, for example) before continuing to the endpoint handler.  If there is a cache hit, the cached response will be returned immediately, and the endpoint handler will not be called.</p>
<p>The second middleware function (<code>addResponseToCache</code>) will wait until the endpoint handler has completed, and will cache the response using the request path as a key.  This function will only ever be executed if the response is not yet in the cache.</p>
<h5 id="92applycachemiddleware">9.2 - Apply Cache Middleware</h5>
<p>At the beginning of <code>server/api.js</code>, right after <code>const router = new Router()</code>, apply the two cache middleware functions.</p>
<pre><code class="language-javascript">// Check cache before continuing to any endpoint handlers
router.use(cache.checkResponseCache)

// Insert response into cache once handlers have finished
router.use(cache.addResponseToCache)
</code></pre>
<br>
<p>That's it! Redis is now fully integrated into our app, and our response times should plunge down into the optimal 0-5 millisecond range for repeated requests.</p>
<blockquote>
<p>There's a famous adage among software engineers - &quot;There are only two hard things in Computer Science: cache invalidation and naming things.&quot; (credited to Phil Karlton).  In a more advanced application, we would have to worry about cache invalidation - or selectively removing entries from the cache in order to serve updated data.  Luckily for us, our API is read-only, so we never actually have to worry about updating the cache. Score!  If you use this technique in an app that is not read-only, keep in mind that Redis allows you to set the expiration timeout of entries using the &quot;SETEX&quot; command.</p>
</blockquote>
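<p>To make the expiry idea from the note above concrete, here is a tiny, hypothetical in-memory sketch of SETEX-style behavior - keys simply stop resolving once their time-to-live has elapsed.  With our <code>ioredis</code> client, the real equivalent would be <code>redis.set(key, value, 'EX', seconds)</code>.</p>

```javascript
// Hypothetical in-memory sketch of SETEX-style key expiry.
// In production you'd just let Redis handle this natively.
class ExpiringCache {
  constructor () { this.entries = new Map() }

  // Store a value along with the timestamp at which it becomes stale
  setex (key, ttlSeconds, value, now = Date.now()) {
    this.entries.set(key, { value, expiresAt: now + (ttlSeconds * 1000) })
  }

  // Return the value only if it hasn't expired yet
  get (key, now = Date.now()) {
    const entry = this.entries.get(key)
    if (!entry || now >= entry.expiresAt) return null
    return entry.value
  }
}
```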
<h5 id="93rediscliprimer">9.3 - Redis-CLI Primer</h5>
<p>We can use the redis-cli to monitor the cache status and operations.</p>
<pre><code class="language-bash">redis-cli monitor
</code></pre>
<br>
<p>This command will provide a live-feed of Redis operations.  If we start making requests with a clean cache, we'll initially see lots of &quot;set&quot; commands, with resources being inserted in the cache.  On subsequent requests, most of the output will be &quot;get&quot; commands, since the responses will have already been cached.</p>
<p>We can get a list of cache entries with the <code>--scan</code> flag.</p>
<pre><code class="language-bash">redis-cli --scan | head -5
</code></pre>
<pre><code class="language-bash">/kingdoms/2/summary
/locations/294/summary
/locations/town
/kingdoms
/locations/region
</code></pre>
<br>
<p>To directly interact with our local Redis instance, we can launch the Redis shell by running <code>redis-cli</code>.</p>
<pre><code class="language-bash">redis-cli
</code></pre>
<br>
<p>We can use the <code>dbsize</code> command to check how many entries are currently cached.</p>
<pre><code class="language-bash">127.0.0.1:6379&gt; dbsize
</code></pre>
<pre><code class="language-bash">(integer) 15
</code></pre>
<br>
<p>We can preview a specific cache entry with the <code>GET</code> command.</p>
<pre><code class="language-bash">127.0.0.1:6379&gt; GET /kingdoms/2/summary
</code></pre>
<pre><code class="language-bash">&quot;{\&quot;summary\&quot;:\&quot;The Iron Islands is one of the constituent regions of the Seven Kingdoms. Until Aegons Conquest it was ruled by the Kings of the Iron ...}&quot;
</code></pre>
<br>
<p>Finally, if we want to completely clear the cache we can run the <code>FLUSHALL</code> command.</p>
<pre><code class="language-bash">127.0.0.1:6379&gt; FLUSHALL
</code></pre>
<br>
<p>Redis is a very powerful and flexible datastore, and can be used for much, much more than basic HTTP request caching.  I hope that this section has been a useful introduction to integrating Redis in a Node.js project. I would recommend that you read more about Redis if you want to learn the full extent of its capabilities - <a href="https://redis.io/topics/introduction">https://redis.io/topics/introduction</a>.</p>
<h3 id="nextupthemapui">Next up - The Map UI</h3>
<p>Congrats, you've just built a highly-performant geospatial data server!</p>
<p>There are lots of additions that can be made from here, the most obvious of which is building a frontend web application to display data from our API.</p>
<p><a href="https://blog.patricktriest.com/game-of-thrones-leaflet-webpack/">Part II</a> of this tutorial provides a step-by-step guide to building a fast, mobile-responsive &quot;Google Maps&quot; style UI for this data using <a href="https://github.com/Leaflet/Leaflet">Leaflet.js</a>.</p>
<p>For a preview of this end-result, check out the webapp here - <a href="https://atlasofthrones.com/">https://atlasofthrones.com/</a></p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/got_map/got_map.jpg" alt="Build An Interactive Game of Thrones Map (Part I) - Node.js,  PostGIS, and Redis"></p>
<p>Visit the open-source Github repository to explore the complete backend and frontend codebase - <a href="https://github.com/triestpa/Atlas-Of-Thrones">https://github.com/triestpa/Atlas-Of-Thrones</a></p>
<p>I hope this tutorial was informative and fun!  Feel free to comment below with any suggestions, criticisms, or ideas about where to take the app from here.</p>
</div>]]></content:encoded></item><item><title><![CDATA[10 Tips To Host Your Web Apps For Free]]></title><description><![CDATA[A guide to navigating the competitive marketplace of web hosting companies and cloud service providers.]]></description><link>http://blog.patricktriest.com/host-webapps-free/</link><guid isPermaLink="false">598eaf93b7d6af1a6a795fcf</guid><category><![CDATA[Web Development]]></category><category><![CDATA[Devops]]></category><category><![CDATA[Guides]]></category><dc:creator><![CDATA[Patrick Triest]]></dc:creator><pubDate>Sun, 27 Aug 2017 12:00:00 GMT</pubDate><media:content url="https://cdn.patricktriest.com/blog/images/posts/cheap-web-hosting/webhosting-header.jpg" medium="image"/><content:encoded><![CDATA[<div class="kg-card-markdown"><h4 id="aguidetonavigatingofthecompetitivemarketplaceofcloudserviceproviders">A guide to navigating the competitive marketplace of cloud service providers.</h4>
<img src="https://cdn.patricktriest.com/blog/images/posts/cheap-web-hosting/webhosting-header.jpg" alt="10 Tips To Host Your Web Apps For Free"><p>2017 is a great year to deploy a web app.</p>
<p>The landscape of web service providers is incredibly competitive right now, and almost all of them offer generous free plans as an attempt to acquire long-term customers.</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/cheap-web-hosting/free-hosting-meme.jpg" alt="10 Tips To Host Your Web Apps For Free"></p>
<p>This article is a collection of tips, from my own experience, on hosting high-performance web apps for free.  If you are experienced in deploying web apps, then you are probably already familiar with many of the services and techniques that we will cover, but I hope that you will still learn something new.  If you are a newcomer to web application deployment, I hope that this article will help to guide you to the best services and to avoid some of the potential pitfalls.</p>
<p><em>Note - I am not being paid or sponsored by any of these services.  This is just advice based on my experience at various organizations, and on how I host my own web applications.</em></p>
<blockquote>
<h4 id="staticfrontendwebsites">Static Front-End Websites</h4>
<p>The first 5 tips are for static websites.  These are self-contained websites, consisting of HTML, CSS, and Javascript files, that do not rely on custom server-side APIs or databases to function.</p>
</blockquote>
<h4 id="1avoidwebsitehostingcompanies">1. Avoid &quot;Website Hosting&quot; companies</h4>
<p>Thousands of website hosting companies compete to provide web services to non-technical customers and small businesses. These companies often place a priority on advertising/marketing over actually providing a great service; some examples include Bluehost, GoDaddy, HostGator, and iPage.</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/cheap-web-hosting/webhosting.png" alt="10 Tips To Host Your Web Apps For Free"></p>
<p>Almost all of these companies offer sub-par shared-hosting deals with deceptive pricing models.  The pricing plans are usually not a good value, and you can achieve better results for free (or for very, very cheap) by using the tools described later in this post.</p>
<p>These services are only good options for people who want the least-technical experience possible, and who are willing to pay 10-1000x as much per month in exchange for a marginally simpler setup experience.</p>
<p>Many of these companies have highly-polished homepages offering aggressively-discounted &quot;80% off for the first 12 Months&quot; types of deals. They will then make it difficult to remove payment methods and/or cancel the plan, and will automatically charge you $200-$400 for an automatic upgrade to 12-24 months of the &quot;premium plan&quot; a year later.  This is how these companies make their money - don't fall for it.</p>
<h4 id="2donthostonyourownhardwareunlessyoureallyknowwhatyouredoing">2. Don't host on your own hardware (unless you really know what you're doing)</h4>
<p>Another option is to host the website on your personal computer. This is really a <em>Very Bad Idea</em>.  Your computer will be slow, your website will be unreliable, and your personal computer (and entire home network) will probably get hacked.  Not good.</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/cheap-web-hosting/gatsby-hacked.jpg" alt="10 Tips To Host Your Web Apps For Free"></p>
<p>You could also buy your own server hardware dedicated to hosting the website.  In order to do this, however, you'll need a solid understanding of network hardware and software, a blazing-fast internet connection, and a reliable power supply.  Even then, you still might be opening up your home network to security risks, the upfront costs could be significant, and the site will still likely never be as fast as it would be if hosted in an enterprise data center.</p>
<h4 id="3usegithubpagesforstaticwebsitehosting">3. Use GitHub pages for static website hosting</h4>
<p>Front-end project code on GitHub can be hosted using <a href="https://pages.github.com/">GitHub Pages</a>.  The biggest advantage here is that the hosting is 100% free, which is pretty sweet.  They also provide a GitHub pages subdomain (<code>yoursite.github.io</code>) hosted over HTTPS.</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/cheap-web-hosting/gh-pages.png" alt="10 Tips To Host Your Web Apps For Free"></p>
<p>The main disadvantage of this offering is in flexibility, or the lack thereof.</p>
<p>For an ultra-basic website, with an <code>index.html</code> file at the root of the project, a handful of JS/CSS/image resources, and no build system, GitHub Pages works very well.</p>
<p>Larger projects, however, often have more complex directory layouts, such as a <code>src</code> directory containing the source code modules, a <code>node_modules</code> directory containing external dependencies, and a separate <code>public</code> directory containing the built website files.  These projects can be difficult to configure to work correctly with GitHub Pages, since it serves from the root of the repository.</p>
<p>It is possible to have a GitHub Pages site serve only from, say, the project's <code>public</code> or <code>dist</code> subdirectory, but this requires setting up a git subtree for that directory prefix, which can be a bit complex.  For more advanced projects, I've found that using a cloud storage service is generally simpler and provides greater flexibility.</p>
<h4 id="4usecloudstorageservicesforstaticwebsitehosting">4. Use cloud storage services for static website hosting</h4>
<p><a href="https://aws.amazon.com/s3/pricing/">AWS S3</a>, <a href="https://azure.microsoft.com/en-us/services/storage/">Microsoft Azure Storage</a>, and <a href="https://cloud.google.com/storage/">Google Cloud Storage</a> are ultra-cheap, ultra-fast, ultra-reliable file storage services.  These products are commonly used by corporations to archive massive collections of data and media, but you can also host a website on them for very, very cheap.</p>
<p><strong>These are the best options for hosting a static website, in my opinion.</strong></p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/cheap-web-hosting/cloudhosting.png" alt="10 Tips To Host Your Web Apps For Free"></p>
<p>These services allow you to upload files to &quot;storage buckets&quot; (think enterprise-friendly Dropbox). You can then make the bucket contents publicly available (for read access) to the rest of the internet, allowing you to serve them as a website.</p>
<p>Here are tutorials for how to do this with each service -</p>
<ul>
<li><a href="http://docs.aws.amazon.com/AmazonS3/latest/dev/WebsiteHosting.html">Hosting a Static Website on Amazon S3</a></li>
<li><a href="https://cloud.google.com/storage/docs/hosting-static-website">Hosting a Static Website on Google Cloud Storage</a></li>
<li><a href="https://buildazure.com/2016/11/30/static-website-hosting-in-azure-storage/">Hosting a Static Website on Microsoft Azure</a></li>
</ul>
<p>The great thing about this setup (unlike the pricing models of &quot;web hosting&quot; companies such as Bluehost and GoDaddy) is that <strong>you only pay for the storage and bandwidth that you use</strong>.</p>
<p>The resulting website will be <em>very</em> fast, scalable, and reliable, since it will be served from the same infrastructure that companies such as Netflix, Spotify, and Pinterest use for their own resources.</p>
<p>Here is a pricing breakdown <sup class="footnote-ref"><a href="#fn1" id="fnref1">[1]</a></sup><sup class="footnote-ref"><a href="#fn2" id="fnref2">[2]</a></sup><sup class="footnote-ref"><a href="#fn3" id="fnref3">[3]</a></sup> -</p>
<table>
  <tr>
    <th></th>    
    <th>AWS S3</th>
    <th>Google Cloud Storage</th>
    <th>Azure Storage</th>
  </tr>
  <tr>
    <td>File Storage per GB per month</td>
    <td>$0.023</td>
    <td>$0.026</td>
    <td>$0.024</td>
  </tr>
  <tr>
    <td>Data Transfer per GB</td>
    <td>$0.09</td>
    <td>$0.11</td>
    <td>$0.087</td>
  </tr>
</table>
<p><em>Note that pricing can vary by region.  Also, some of these services charge additional fees, such as for each HTTP GET request; see the official pricing pages in the footnotes for more details.</em></p>
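<p>To put these per-GB rates in perspective, here's a quick back-of-the-envelope estimate in Python.  The 5 GB stored / 20 GB transferred figures are just an assumed example site, and the rates are the AWS S3 numbers from the table above.</p>
<pre><code class="language-python"># Hypothetical static site: 5 GB of files, 20 GB of monthly transfer.
# Rates are the per-GB AWS S3 prices from the table above.
storage_gb = 5
transfer_gb = 20
storage_rate = 0.023   # dollars per GB-month of storage
transfer_rate = 0.09   # dollars per GB of data transfer

monthly_cost = storage_gb * storage_rate + transfer_gb * transfer_rate
# monthly_cost comes out to roughly $1.92 per month
</code></pre>
<p>Even a fairly media-heavy static site stays in the low single-digit dollars per month at these rates.</p>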
<p>For most websites, these costs will come out to almost nothing, regardless of which service you choose.  The data storage costs will be totally negligible for any website, and the data transfer costs can be all-but-eliminated by serving the site from behind a CDN (see <a href="#tip-10">tip #10</a>).  Furthermore, you can leverage the free credits available for these services in order to host your static websites without paying a single dime (skip to <a href="#tip-5">tip #5</a> for more details).</p>
<p>If you need to host a site with lots of hi-res photo/video content, I would recommend storing your photos separately on a service such as <a href="https://imgur.com/">Imgur</a>, and embedding your videos from <a href="https://www.youtube.com/">YouTube</a> or <a href="https://vimeo.com/">Vimeo</a>.  This tactic will allow you to host lots of media without footing the bill for the associated data transfer costs.</p>
<blockquote>
<h4 id="dynamicwebapps">Dynamic Web Apps</h4>
<p>Now for the trickier part - cheaply hosting a web app that relies on a backend and/or database to function.  This includes most blogs (unless you use a <a href="https://davidwalsh.name/introduction-static-site-generators">static site generator</a>), as well as any website that requires users to log in and to submit/edit content.  Generally, it would cost at least $5 per month to rent a cloud compute instance for this purpose, but there are a few good ways to circumvent these fees.</p>
</blockquote>
<p><span id="tip-5"></span></p>
<h4 id="5leveragecloudhostingproviderfreeplans">5. Leverage cloud hosting provider free plans</h4>
<p>The most popular options for server-based apps are cloud hosting services, such as <a href="https://cloud.google.com/">Google Cloud Platform</a> (GCP), <a href="https://aws.amazon.com/">Amazon Web Services</a> (AWS), and <a href="https://azure.microsoft.com">Microsoft Azure</a>.  These services are in fierce competition with each other, and have so much capital and infrastructure available that they are willing to give away money and compute power in order to get users hooked on their respective platforms.</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/cheap-web-hosting/freemoney.gif" alt="10 Tips To Host Your Web Apps For Free"></p>
<p>Google Cloud Platform automatically provides $300 worth of credit to anyone who joins and allows you to run a small (f1-micro) server at no cost, indefinitely, along with providing a variety of other free-tier usage limits.  See here for more info - <a href="https://cloud.google.com/free/">https://cloud.google.com/free/</a></p>
<p>AWS offers very similar free-tier limits to GCP, allowing you to run 1 small compute instance (t2-micro) for free each month.  See here - <a href="https://aws.amazon.com/free/">https://aws.amazon.com/free/</a></p>
<p>Microsoft Azure offers $200 in free credit when you join, but this free credit expires after one month.  They also provide a free tier on their &quot;App Service&quot; offering, although this free tier is more limited than the equivalent offerings from AWS and GCP. See here - <a href="https://azure.microsoft.com/en-us/free">https://azure.microsoft.com/en-us/free</a></p>
<p>Personally, <strong>I would recommend GCP</strong> since their free plan is the most robust, and their web admin interface is the most polished and pleasant to work with.</p>
<blockquote>
<p>Note - If you are a student, Github offers a really fantastic pack of free stuff, <a href="https://education.github.com/pack">https://education.github.com/pack</a>, including $110 in AWS credits, $50 in <a href="https://www.digitalocean.com/">DigitalOcean</a> credits, and much more.</p>
</blockquote>
<h4 id="6useherokuforfreebackendapphosting">6. Use Heroku for free backend app hosting</h4>
<p><a href="https://www.heroku.com/">Heroku</a> also offers a free tier.  The difference with Heroku is that you can indefinitely <strong>run up to 100 backend apps at the same time for free</strong>.  Not only will they provide a server for your application code to run on, but there are also lots of free plugins for adding databases and other external services to your application cluster.  It's also worth noting that Heroku offers a wonderful, developer-focused user experience compared to its competitors.</p>
<img style="max-height: 300px;" src="https://cdn.patricktriest.com/blog/images/posts/cheap-web-hosting/heroku.png" alt="10 Tips To Host Your Web Apps For Free">
<p>There is, of course, a catch - <strong>you are limited to 1000 free app-hours per month</strong>.  This means that you'll only be able to run 1 app full-time for the entire month (roughly 730 hours).  Additionally, Heroku's free servers will &quot;sleep&quot; after 30 minutes of inactivity; the next time someone makes a request to the server, it will take around 15-20 seconds to respond while it &quot;wakes up&quot;. The good news is that sleeping servers don't count towards the monthly limit, so you could theoretically host 100 low-traffic apps on Heroku completely for free, and just let them wake up for the occasional visitor.</p>
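<p>The arithmetic behind that limit is simple enough to sanity-check:</p>
<pre><code class="language-python"># Average hours in a month vs. Heroku's 1000 free monthly app-hours
avg_hours_per_month = 24 * 365 / 12.0   # 730.0
free_app_hours = 1000

# Only one always-on app fits inside the free quota...
full_time_apps = int(free_app_hours // avg_hours_per_month)   # 1
# ...with about 270 app-hours left over for other, mostly-sleeping apps.
spare_hours = free_app_hours - avg_hours_per_month            # 270.0
</code></pre>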
<p>The <a href="https://www.heroku.com/pricing">Heroku free plan</a> is a great option for casual side projects, test environments, and low-traffic, non-critical applications.</p>
<blockquote>
<p><a href="https://zeit.co/now">Now</a>, from Zeit, is a similar service to Heroku, with a more minimalist focus.  It offers near-unlimited free hosting for Node.js and Docker based applications, along with a simple, developer-focused CLI tool.  You might want to check this out if you like Heroku, but don't need all of the Github integrations, CI tools, and plugin support.</p>
</blockquote>
<br>
<h4 id="7usefirebaseforappswithstraightforwardbackendrequirements">7. Use Firebase for apps with straightforward backend requirements</h4>
<p><a href="https://firebase.google.com/">Firebase</a> is Google's backend-as-a-service, and is the dominant entrant in this field at the moment.  Firebase provides a suite of backend services, such as database-storage, user authentication, client-side SDKs, and in-depth analytics and monitoring.  Firebase offers an unlimited-duration <a href="https://firebase.google.com/pricing/">free plan</a>, with usage limits on some of the features. Additionally, you can host your frontend website on Firebase for free, with up to 1GB of file storage and 10GB of data transfer per month.</p>
<img style="max-height: 300px;" src="https://cdn.patricktriest.com/blog/images/posts/cheap-web-hosting/firebase.png" alt="10 Tips To Host Your Web Apps For Free">
<p>For applications that just allow users to log in and store/share data (such as a social networking app), Firebase can be a great choice.  For applications with more advanced backend requirements, such as complex database schemas or high-security user/organization authorization handling, writing a custom backend might be a simpler, more scalable solution than Firebase in the long-run.</p>
<p>Firebase offers &quot;Cloud Functions&quot; to write specific app logic and run custom jobs, but these functions are more limited in capability than running your own backend server (they can only be written using Node.js, for instance).  You can also use a &quot;Cloud Function&quot; style architecture without specifically using Firebase, as we'll see in the next section.</p>
<br>
<h4 id="8useaserverlessarchitecture">8. Use a serverless architecture</h4>
<p>Serverless architecture is an emerging paradigm for backend infrastructure design in which, instead of managing a full server to run your API code, you run individual functions on-demand using services such as <a href="https://aws.amazon.com/lambda/">AWS Lambda</a>, <a href="https://cloud.google.com/functions/docs/">Google Cloud Functions</a>, or <a href="https://docs.microsoft.com/en-us/azure/azure-functions/functions-overview">Azure Functions</a>.</p>
<blockquote>
<p>Note - &quot;Serverless&quot; is a buzzy, somewhat misleading term; <strong>your application code still runs on servers</strong>, you just don't have to manage them.  Also, note that while the core application logic can be &quot;serverless&quot;, you'll probably still need to have a persistent server somewhere in order to host your application database.</p>
</blockquote>
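<p>To make the idea concrete, here is a minimal sketch of what one of these on-demand functions can look like, using the AWS Lambda Python handler convention.  The <code>queryStringParameters</code> event shape (the API Gateway proxy format) and the greeting logic are illustrative assumptions, not code from any particular deployment.</p>
<pre><code class="language-python">import json

def handler(event, context):
    '''Minimal AWS Lambda-style handler: receives an event dict and
    returns an HTTP-style response, with no server to manage.'''
    params = event.get('queryStringParameters') or {}
    name = params.get('name', 'world')
    return {
        'statusCode': 200,
        'body': json.dumps({'message': 'Hello, {}!'.format(name)})
    }
</code></pre>
<p>Each invocation of a function like this is billed individually, which is what makes the pay-as-you-go pricing described below possible.</p>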
<p><img src="https://cdn.patricktriest.com/blog/images/posts/cheap-web-hosting/serverless.png" alt="10 Tips To Host Your Web Apps For Free"></p>
<p>The advantage of these services is that instead of paying a fixed monthly fee for renting a compute instance in a datacenter (typically between $5 and $50 per month), you can &quot;pay-as-you-go&quot; based on the number of function calls that your application receives.</p>
<p>These services are priced by the number of function call requests per month -</p>
<table>
  <tr>
    <th></th>    
    <th>AWS Lambda</th>
    <th>Google Cloud Functions</th>
    <th>Azure Functions</th>
  </tr>
  <tr>
    <td>Free Requests Per Month</td>
    <td>1 million</td>
    <td>2 million</td>
    <td>1 million</td>
  </tr>
  <tr>
    <td>Price Per Million Requests</td>
    <td>$0.20</td>
    <td>$0.40</td>
    <td>$0.20</td>
  </tr>
</table>
<p>Each service also charges for the precise amount of CPU time used (rounded up to the nearest 100ms), but this pricing is a bit more complicated, so I'll just refer you to their respective pricing pages.</p>
<ul>
<li><a href="https://aws.amazon.com/lambda/pricing/">AWS Lambda Pricing</a></li>
<li><a href="https://cloud.google.com/functions/">Google Cloud Functions Pricing</a></li>
<li><a href="https://azure.microsoft.com/en-us/pricing/details/functions/">Microsoft Azure Functions Pricing</a></li>
</ul>
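<p>As a rough example of how the request-based pricing works out (the traffic figure is assumed for illustration, and the per-100ms compute charges mentioned above are left out):</p>
<pre><code class="language-python"># Hypothetical API receiving 5 million requests per month on AWS Lambda.
requests = 5000000
free_requests = 1000000      # free tier, from the table above
price_per_million = 0.20     # dollars per million requests

billable = max(requests - free_requests, 0)
cost = (billable / 1000000.0) * price_per_million
print('${:.2f} per month'.format(cost))  # prints: $0.80 per month
</code></pre>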
<p>The quickest way to get started is to use the open-source <a href="https://github.com/serverless/serverless">Serverless Framework</a>, which provides an easy way to deploy Node.js, Python, Java, and Scala functions on any of the three services.</p>
<p>Serverless architecture has a lot of buzz right now, but <strong>I cannot personally vouch for how well it works in a production environment</strong>, so <em>caveat emptor</em>.</p>
<br>
<h4 id="9usedockertohostmultiplelowtrafficappsonasinglemachine">9. Use Docker to host multiple low-traffic apps on a single machine</h4>
<p>Sometimes you might have multiple backend applications to run, but each without a very demanding CPU or memory footprint.  In this situation, it can be an advantageous cost-cutting move to run all of the applications on the same machine instead of running each on a separate instance.  This can be difficult, however, if the projects have differing dependencies (say, one requires Node v6.9 and another requires Node v8.4), or need to be run on different operating system distributions.</p>
<img style="max-height: 300px;" src="https://cdn.patricktriest.com/blog/images/posts/cheap-web-hosting/docker.png" alt="10 Tips To Host Your Web Apps For Free">
<p><a href="https://www.docker.com/">Docker</a> is a containerization engine that provides an elegant solution to these issues.  To make an application work with Docker, you can write a Dockerfile to include with the source code, specifying the base operating system and providing instructions to set up the project and dependencies.  The resulting Docker container can be run on any operating system, making it very easy to consistently manage development/production environments and to avoid conflicting dependencies.</p>
<p><a href="https://docs.docker.com/compose/">Docker Compose</a> is a tool that allows you to write a configuration file to run multiple Docker containers at once.  This makes it easy to run multiple lightweight applications, services, and database containers, all on the same system, without needing to worry about conflicts.</p>
<p>Ports inside each container can be forwarded to ports on the host machine, so a simple reverse-proxy configuration (<a href="https://www.nginx.com/resources/wiki/">Nginx</a> is a dependable, well-tested option) is all that is required to mount each application port behind a specific subdomain or URL route in order to make them all accessible via HTTPS on the host machine.</p>
<p>I have personally used this setup for a few of my personal projects in the past (and the present); it can save a lot of time and money if you are willing to take the time to get familiar with the tools (which can, admittedly, have a steep learning curve at first).</p>
<p><span id="tip-10"></span></p>
<h4 id="10usecloudflarefordnsmanagementandssl">10. Use Cloudflare for DNS management and SSL</h4>
<p>Once you have your website/server hosted, you'll need a way to point your domain name to your content and to serve your domain over HTTPS.</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/cheap-web-hosting/cloudflare.png" alt="10 Tips To Host Your Web Apps For Free"></p>
<p><a href="https://www.cloudflare.com/">Cloudflare</a> is a domain management service backed by the likes of Google and Microsoft.  At its core, Cloudflare allows you to point your domain name (and subdomains) to your website server(s).  Beyond this basic functionality, however, it offers lots of free features that are hugely beneficial for anyone hosting a web app or API.</p>
<h5 id="benefit1security">Benefit 1 - Security</h5>
<p>Cloudflare will automatically protect your website from malicious traffic.  Their massive infrastructure provides protection from DDoS (Distributed Denial of Service) attacks, and their firewall will protect your site from a continuously updated list of threats that are detected throughout their network.</p>
<h5 id="benefit2speed">Benefit 2 - Speed</h5>
<p>Cloudflare will distribute your content quickly by sending it through a global CDN (content delivery network).  The benefit of a CDN is that when someone visits the site, the data will be sent to them from a data center in their geographic region instead of from halfway around the world, allowing the page to load quicker.</p>
<h5 id="benefit3datatransfercostsavings">Benefit 3 - Data Transfer Cost Savings</h5>
<p>An added benefit of using a CDN is that by sending the cached content from Cloudflare's servers, you can reduce the bandwidth (and therefore the costs) from wherever your website is hosted.  Cloudflare offers unlimited free bandwidth through their CDN.</p>
<h5 id="benefit4freessl">Benefit 4 - Free SSL</h5>
<p>Best of all, Cloudflare provides a free SSL certificate and automatically serves your website over HTTPS.  This is very important for security (seriously, don't deploy a website without HTTPS), and would usually require server-side technical setup and annual fees; I've never seen another company (besides <a href="https://letsencrypt.org/">Let's Encrypt</a>) offer it for free.</p>
<p>Cloudflare offers a somewhat absurd <a href="https://www.cloudflare.com/plans/">free plan</a>, with which you can apply all of these benefits to any number of domain names.</p>
<blockquote>
<p>A note on domain names - I don't know of any way to score a free domain name, so you might have to pay a registration fee and an annual fee.  It's usually around $10/year, but you can get the first year for $1 or even for free if you shop around for deals.  As an alternative, services like Heroku and GitHub can host your site behind their own custom subdomains for free, but you'll lose some brand flexibility with this option.  I recommend buying a single domain (such as <a href="http://patricktriest.com">patricktriest.com</a>) and deploying your apps for free on subdomains (such as <a href="http://blog.patricktriest.com">blog.patricktriest.com</a>) using Cloudflare.</p>
</blockquote>
<br>
<h3 id="wantmorefreestuff">Want more free stuff?</h3>
<p>In my workflow, I also use <a href="https://github.com/">GitHub</a> to store source code and <a href="https://circleci.com/">CircleCI</a> to automate my application build/test/deployment processes. Both are completely free, of course, until you need more advanced, enterprise-friendly capabilities.</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/cheap-web-hosting/free-stuff.jpg" alt="10 Tips To Host Your Web Apps For Free"></p>
<p>If you need some beautiful free images check out <a href="https://www.pexels.com/">Pexels</a> and <a href="https://unsplash.com/">Unsplash</a>, and for great icons, I would recommend <a href="http://fontawesome.io/">Font Awesome</a> and <a href="https://thenounproject.com/">The Noun Project</a>.</p>
<p>2017 is a great year to deploy a web app.  If your app is a huge success, you can expect the costs to go up proportionally with the amount of traffic it receives, but with a well-optimized codebase and a scalable deployment setup, these costs can still be bounded within a very manageable range.</p>
<p>I hope that this post has been useful!  Feel free to comment below with any other techniques and tactics for obtaining cheap web application hosting.</p>
<hr class="footnotes-sep">
<section class="footnotes">
<ol class="footnotes-list">
<li id="fn1" class="footnote-item"><p><a href="https://cloud.google.com/storage/pricing#transfer-service-pricing">https://cloud.google.com/storage/pricing#transfer-service-pricing</a> <a href="#fnref1" class="footnote-backref">↩︎</a></p>
</li>
<li id="fn2" class="footnote-item"><p><a href="https://aws.amazon.com/s3/pricing/">https://aws.amazon.com/s3/pricing/</a> <a href="#fnref2" class="footnote-backref">↩︎</a></p>
</li>
<li id="fn3" class="footnote-item"><p><a href="https://azure.microsoft.com/en-us/pricing/details/storage/blobs-general/">https://azure.microsoft.com/en-us/pricing/details/storage/blobs-general/</a> <a href="#fnref3" class="footnote-backref">↩︎</a></p>
</li>
</ol>
</section>
</div>]]></content:encoded></item><item><title><![CDATA[Analyzing Cryptocurrency Markets Using Python]]></title><description><![CDATA[A data-driven approach to cryptocurrency (Bitcoin, Ethereum, Litecoin, Ripple etc.) market analysis and visualization using Python.]]></description><link>http://blog.patricktriest.com/analyzing-cryptocurrencies-python/</link><guid isPermaLink="false">598eaf94b7d6af1a6a795fd6</guid><category><![CDATA[Python]]></category><category><![CDATA[Data Science]]></category><category><![CDATA[Guides]]></category><dc:creator><![CDATA[Patrick Triest]]></dc:creator><pubDate>Sun, 20 Aug 2017 12:00:00 GMT</pubDate><media:content url="https://cdn.patricktriest.com/blog/images/posts/crypto-markets/crypto_header.jpg" medium="image"/><content:encoded><![CDATA[<div class="kg-card-markdown"><h2 id="adatadrivenapproachtocryptocurrencyspeculation">A Data-Driven Approach To Cryptocurrency Speculation</h2>
<img src="https://cdn.patricktriest.com/blog/images/posts/crypto-markets/crypto_header.jpg" alt="Analyzing Cryptocurrency Markets Using Python"><p><em>How do Bitcoin markets behave? What are the causes of the sudden spikes and dips in cryptocurrency values?  Are the markets for different altcoins inseparably linked or largely independent?  <strong>How can we predict what will happen next?</strong></em></p>
<p>Articles on cryptocurrencies, such as Bitcoin and Ethereum, are rife with speculation these days, with hundreds of self-proclaimed experts advocating for the trends that they expect to emerge.  What is lacking from many of these analyses is a strong foundation of data and statistics to back up the claims.</p>
<p>The goal of this article is to provide an easy introduction to cryptocurrency analysis using Python.  We will walk through a simple Python script to retrieve, analyze, and visualize data on different cryptocurrencies.  In the process, we will uncover an interesting trend in how these volatile markets behave, and how they are evolving.</p>
<img id="altcoin_prices_combined_0" src="https://cdn.patricktriest.com/blog/images/posts/crypto-markets/plot-images/altcoin_prices_combined.png" alt="Analyzing Cryptocurrency Markets Using Python">
<p>This is not a post explaining what cryptocurrencies are (if you want one, I would recommend <a href="https://medium.com/tradecraft-traction/blockchain-for-the-rest-of-us-c3fc5e42254f" target="_blank" rel="noopener">this great overview</a>), nor is it an opinion piece on which specific currencies will rise and which will fall.  Instead, all that we are concerned about in this tutorial is procuring the raw data and uncovering the stories hidden in the numbers.</p>
<h3 id="step1setupyourdatalaboratory">Step 1 - Setup Your Data Laboratory</h3>
<p>The tutorial is intended to be accessible for enthusiasts, engineers, and data scientists at all skill levels.  The only skills that you will need are a basic understanding of Python and enough knowledge of the command line to set up a project.</p>
<p>A completed version of the notebook with all of the results is available <a href="https://cdn.patricktriest.com/blog/images/posts/crypto-markets/Cryptocurrency-Pricing-Analysis.html">here</a>.</p>
<h5 id="step11installanaconda">Step 1.1 - Install Anaconda</h5>
<p>The easiest way to install the dependencies for this project from scratch is to use Anaconda, a prepackaged Python data science ecosystem and dependency manager.</p>
<p>To set up Anaconda, I would recommend following the official installation instructions - <a href="https://www.continuum.io/downloads">https://www.continuum.io/downloads</a>.</p>
<p><em>If you're an advanced user, and you don't want to use Anaconda, that's totally fine; I'll assume you don't need help installing the required dependencies.  Feel free to skip to section 2.</em></p>
<h5 id="step12setupananacondaprojectenvironment">Step 1.2 - Setup an Anaconda Project Environment</h5>
<p>Once Anaconda is installed, we'll want to create a new environment to keep our dependencies organized.</p>
<p>Run <code>conda create --name cryptocurrency-analysis python=3</code> to create a new Anaconda environment for our project.</p>
<p>Next, run <code>source activate cryptocurrency-analysis</code> (on Linux/macOS) or <code>activate cryptocurrency-analysis</code> (on Windows) to activate this environment.</p>
<p>Finally, run <code>conda install numpy pandas nb_conda jupyter plotly quandl</code> to install the required dependencies in the environment.  This could take a few minutes to complete.</p>
<p><em>Why use environments?  If you plan on developing multiple Python projects on your computer, it is helpful to keep the dependencies (software libraries and packages) separate in order to avoid conflicts.  Anaconda will create a special environment directory for the dependencies for each project to keep everything organized and separated.</em></p>
<h5 id="step13startaninterativejupyternotebook">Step 1.3 - Start An Interactive Jupyter Notebook</h5>
<p>Once the environment and dependencies are all set up, run <code>jupyter notebook</code> to start the IPython kernel, and open your browser to <code>http://localhost:8888/</code>.  Create a new Python notebook, making sure to use the <code>Python [conda env:cryptocurrency-analysis]</code> kernel.</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/crypto-markets/jupyter-setup.png" alt="Analyzing Cryptocurrency Markets Using Python"></p>
<h5 id="step14importthedependenciesatthetopofthenotebook">Step 1.4 - Import the Dependencies At The Top of The Notebook</h5>
<p>Once you've got a blank Jupyter notebook open, the first thing we'll do is import the required dependencies.</p>
<pre><code class="language-python">import os
import numpy as np
import pandas as pd
import pickle
import quandl
from datetime import datetime
</code></pre>
<br>
<p>We'll also import Plotly and enable the offline mode.</p>
<pre><code class="language-python">import plotly.offline as py
import plotly.graph_objs as go
import plotly.figure_factory as ff
py.init_notebook_mode(connected=True)
</code></pre>
<br>
<h3 id="step2retrievebitcoinpricingdata">Step 2 - Retrieve Bitcoin Pricing Data</h3>
<p>Now that everything is set up, we're ready to start retrieving data for analysis.  First, we need to get Bitcoin pricing data using <a href="https://blog.quandl.com/api-for-bitcoin-data">Quandl's free Bitcoin API</a>.</p>
<h5 id="step21definequandlhelperfunction">Step 2.1 - Define Quandl Helper Function</h5>
<p>To assist with this data retrieval we'll define a function to download and cache datasets from Quandl.</p>
<pre><code class="language-python">def get_quandl_data(quandl_id):
    '''Download and cache Quandl dataseries'''
    cache_path = '{}.pkl'.format(quandl_id).replace('/','-')
    try:
        f = open(cache_path, 'rb')
        df = pickle.load(f)   
        print('Loaded {} from cache'.format(quandl_id))
    except (OSError, IOError) as e:
        print('Downloading {} from Quandl'.format(quandl_id))
        df = quandl.get(quandl_id, returns=&quot;pandas&quot;)
        df.to_pickle(cache_path)
        print('Cached {} at {}'.format(quandl_id, cache_path))
    return df
</code></pre>
<p>We're using <code>pickle</code> to serialize and save the downloaded data as a file, which will prevent our script from re-downloading the same data each time we run the script.  The function will return the data as a <a href="http://pandas.pydata.org/">Pandas</a> dataframe.  If you're not familiar with dataframes, you can think of them as super-powered spreadsheets.</p>
<h5 id="step22pullkrakenexchangepricingdata">Step 2.2 - Pull Kraken Exchange Pricing Data</h5>
<p>Let's first pull the historical Bitcoin exchange rate for the <a href="https://www.kraken.com/">Kraken</a> Bitcoin exchange.</p>
<pre><code class="language-python"># Pull Kraken BTC price exchange data
btc_usd_price_kraken = get_quandl_data('BCHARTS/KRAKENUSD')
</code></pre>
<br>
<p>We can inspect the first 5 rows of the dataframe using the <code>head()</code> method.</p>
<pre><code class="language-python">btc_usd_price_kraken.head()
</code></pre>
<div class="dataframe">
<table border="1">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Open</th>
      <th>High</th>
      <th>Low</th>
      <th>Close</th>
      <th>Volume (BTC)</th>
      <th>Volume (Currency)</th>
      <th>Weighted Price</th>
    </tr>
    <tr>
      <th>Date</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>2014-01-07</th>
      <td>874.67040</td>
      <td>892.06753</td>
      <td>810.00000</td>
      <td>810.00000</td>
      <td>15.622378</td>
      <td>13151.472844</td>
      <td>841.835522</td>
    </tr>
    <tr>
      <th>2014-01-08</th>
      <td>810.00000</td>
      <td>899.84281</td>
      <td>788.00000</td>
      <td>824.98287</td>
      <td>19.182756</td>
      <td>16097.329584</td>
      <td>839.156269</td>
    </tr>
    <tr>
      <th>2014-01-09</th>
      <td>825.56345</td>
      <td>870.00000</td>
      <td>807.42084</td>
      <td>841.86934</td>
      <td>8.158335</td>
      <td>6784.249982</td>
      <td>831.572913</td>
    </tr>
    <tr>
      <th>2014-01-10</th>
      <td>839.99000</td>
      <td>857.34056</td>
      <td>817.00000</td>
      <td>857.33056</td>
      <td>8.024510</td>
      <td>6780.220188</td>
      <td>844.938794</td>
    </tr>
    <tr>
      <th>2014-01-11</th>
      <td>858.20000</td>
      <td>918.05471</td>
      <td>857.16554</td>
      <td>899.84105</td>
      <td>18.748285</td>
      <td>16698.566929</td>
      <td>890.671709</td>
    </tr>
  </tbody>
</table>
</div>
<p>Next, we'll generate a simple chart as a quick visual verification that the data looks correct.</p>
<pre><code class="language-python"># Chart the BTC pricing data
btc_trace = go.Scatter(x=btc_usd_price_kraken.index, y=btc_usd_price_kraken['Weighted Price'])
py.iplot([btc_trace])
</code></pre>
<img id="kraken_price_plot" src="https://cdn.patricktriest.com/blog/images/posts/crypto-markets/plot-images/kraken_price_plot.png" alt="Analyzing Cryptocurrency Markets Using Python">
<p>Here, we're using <a href="https://plot.ly/">Plotly</a> for generating our visualizations.  This is a less traditional choice than some of the more established Python data visualization libraries such as <a href="https://matplotlib.org/">Matplotlib</a>, but I think Plotly is a great choice since it produces fully-interactive charts using <a href="https://d3js.org/">D3.js</a>.  These charts have attractive visual defaults, are easy to explore, and are very simple to embed in web pages.</p>
<blockquote>
<p>As a quick sanity check, you should compare the generated chart with publicly available graphs of Bitcoin prices (such as those on <a href="https://www.coinbase.com/dashboard">Coinbase</a>), to verify that the downloaded data is legitimate.</p>
</blockquote>
<h5 id="step23pullpricingdatafrommorebtcexchanges">Step 2.3 - Pull Pricing Data From More BTC Exchanges</h5>
<p>You might have noticed a hitch in this dataset - there are a few notable down-spikes, particularly in late 2014 and early 2016.  These spikes are specific to the Kraken dataset, and we obviously don't want them to be reflected in our overall pricing analysis.</p>
<p>The nature of Bitcoin exchanges is that pricing is determined by supply and demand, hence no single exchange contains a true &quot;master price&quot; of Bitcoin.  To solve this issue, along with that of the down-spikes (which are likely the result of technical outages and data set glitches), we will pull data from three more major Bitcoin exchanges to calculate an aggregate Bitcoin price index.</p>
<p>First, we will download the data from each exchange into a dictionary of dataframes.</p>
<pre><code class="language-python"># Pull pricing data for 3 more BTC exchanges
exchanges = ['COINBASE','BITSTAMP','ITBIT']

exchange_data = {}

exchange_data['KRAKEN'] = btc_usd_price_kraken

for exchange in exchanges:
    exchange_code = 'BCHARTS/{}USD'.format(exchange)
    btc_exchange_df = get_quandl_data(exchange_code)
    exchange_data[exchange] = btc_exchange_df
</code></pre>
<br>
<h5 id="step24mergeallofthepricingdataintoasingledataframe">Step 2.4 - Merge All Of The Pricing Data Into A Single Dataframe</h5>
<p>Next, we will define a simple function to merge a common column of each dataframe into a new combined dataframe.</p>
<pre><code class="language-python">def merge_dfs_on_column(dataframes, labels, col):
    '''Merge a single column of each dataframe into a new combined dataframe'''
    series_dict = {}
    for index in range(len(dataframes)):
        series_dict[labels[index]] = dataframes[index][col]
        
    return pd.DataFrame(series_dict)
</code></pre>
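<p>As a side note, the same column-wise merge can also be done natively with <code>pd.concat</code>.  Here is a rough equivalent sketch using two toy dataframes (the exchange names and values below are made up for illustration):</p>
<pre><code class="language-python">import pandas as pd

# Two toy dataframes sharing a date index
dates = pd.date_range('2017-01-01', periods=3, freq='D')
df_a = pd.DataFrame({'Weighted Price': [100.0, 101.0, 102.0]}, index=dates)
df_b = pd.DataFrame({'Weighted Price': [99.5, 101.2, 101.8]}, index=dates)

# Concatenate the chosen column of each dataframe side-by-side,
# labeling the resulting columns with the provided keys
merged = pd.concat([df_a['Weighted Price'], df_b['Weighted Price']],
                   axis=1, keys=['EXCHANGE_A', 'EXCHANGE_B'])
print(merged.columns.tolist())  # ['EXCHANGE_A', 'EXCHANGE_B']
</code></pre>
<p>The explicit helper function is arguably clearer for readers who are new to Pandas, which is why we'll stick with it here.</p>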
<br>
<p>Now we will merge all of the dataframes together on their &quot;Weighted Price&quot; column.</p>
<pre><code class="language-python"># Merge the BTC price data series into a single dataframe
btc_usd_datasets = merge_dfs_on_column(list(exchange_data.values()), list(exchange_data.keys()), 'Weighted Price')
</code></pre>
<br>
<p>Finally, we can preview the last five rows of the result using the <code>tail()</code> method, to make sure it looks ok.</p>
<pre><code class="language-python">btc_usd_datasets.tail()
</code></pre>
<div class="dataframe">
  <table border="1" class="dataframe">
    <tr style="text-align: right;">
      <th></th>
      <th>BITSTAMP</th>
      <th>COINBASE</th>
      <th>ITBIT</th>
      <th>KRAKEN</th>
    </tr>
    <tr>
      <th>Date</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
    
    <tbody>
      <tr>
        <th>2017-08-14</th>
        <td>4210.154943</td>
        <td>4213.332106</td>
        <td>4207.366696</td>
        <td>4213.257519</td>
      </tr>
      <tr>
        <th>2017-08-15</th>
        <td>4101.447155</td>
        <td>4131.606897</td>
        <td>4127.036871</td>
        <td>4149.146996</td>
      </tr>
      <tr>
        <th>2017-08-16</th>
        <td>4193.426713</td>
        <td>4193.469553</td>
        <td>4190.104520</td>
        <td>4187.399662</td>
      </tr>
      <tr>
        <th>2017-08-17</th>
        <td>4338.694675</td>
        <td>4334.115210</td>
        <td>4334.449440</td>
        <td>4346.508031</td>
      </tr>
      <tr>
        <th>2017-08-18</th>
        <td>4182.166174</td>
        <td>4169.555948</td>
        <td>4175.440768</td>
        <td>4198.277722</td>
      </tr>
    </tbody>
  </table>
</div>
<p>The prices look to be as expected: they are in similar ranges, but with slight variations based on the supply and demand of each individual Bitcoin exchange.</p>
<h5 id="step25visualizethepricingdatasets">Step 2.5 - Visualize The Pricing Datasets</h5>
<p>The next logical step is to visualize how these pricing datasets compare.  For this, we'll define a helper function to provide a single-line command to generate a graph from the dataframe.</p>
<pre><code class="language-python">def df_scatter(df, title, seperate_y_axis=False, y_axis_label='', scale='linear', initial_hide=False):
    '''Generate a scatter plot of the entire dataframe'''
    label_arr = list(df)
    series_arr = list(map(lambda col: df[col], label_arr))
    
    layout = go.Layout(
        title=title,
        legend=dict(orientation=&quot;h&quot;),
        xaxis=dict(type='date'),
        yaxis=dict(
            title=y_axis_label,
            showticklabels= not seperate_y_axis,
            type=scale
        )
    )
    
    y_axis_config = dict(
        overlaying='y',
        showticklabels=False,
        type=scale )
    
    visibility = 'visible'
    if initial_hide:
        visibility = 'legendonly'
        
    # Form Trace For Each Series
    trace_arr = []
    for index, series in enumerate(series_arr):
        trace = go.Scatter(
            x=series.index, 
            y=series, 
            name=label_arr[index],
            visible=visibility
        )
        
        # Add separate y-axis for the series
        if seperate_y_axis:
            trace['yaxis'] = 'y{}'.format(index + 1)
            layout['yaxis{}'.format(index + 1)] = y_axis_config    
        trace_arr.append(trace)

    fig = go.Figure(data=trace_arr, layout=layout)
    py.iplot(fig)
</code></pre>
<p>In the interest of brevity, I won't go too far into how this helper function works.  Check out the documentation for <a href="http://pandas.pydata.org/">Pandas</a> and <a href="https://plot.ly/">Plotly</a> if you would like to learn more.</p>
<p>We can now easily generate a graph for the Bitcoin pricing data.</p>
<pre><code class="language-python"># Plot all of the BTC exchange prices
df_scatter(btc_usd_datasets, 'Bitcoin Price (USD) By Exchange')
</code></pre>
<img id="combined-exchanges-pricing" src="https://cdn.patricktriest.com/blog/images/posts/crypto-markets/plot-images/combined-exchanges-pricing.png" alt="Analyzing Cryptocurrency Markets Using Python">
<h5 id="step26cleanandaggregatethepricingdata">Step 2.6 - Clean and Aggregate the Pricing Data</h5>
<p>We can see that, although the four series follow roughly the same path, there are various irregularities in each that we'll want to get rid of.</p>
<p>Let's remove all of the zero values from the dataframe, since we know that the price of Bitcoin has never been equal to zero in the timeframe that we are examining.</p>
<pre><code class="language-python"># Remove &quot;0&quot; values
btc_usd_datasets.replace(0, np.nan, inplace=True)
</code></pre>
<br>
<p>When we re-chart the dataframe, we'll see a much cleaner looking chart without the down-spikes.</p>
<pre><code class="language-python"># Plot the revised dataframe
df_scatter(btc_usd_datasets, 'Bitcoin Price (USD) By Exchange')
</code></pre>
<img id="combined-exchanges-pricing-clean" src="https://cdn.patricktriest.com/blog/images/posts/crypto-markets/plot-images/combined-exchanges-pricing-clean.png" alt="Analyzing Cryptocurrency Markets Using Python">
<p>We can now calculate a new column, containing the average daily Bitcoin price across all of the exchanges.</p>
<pre><code class="language-python"># Calculate the average BTC price as a new column
btc_usd_datasets['avg_btc_price_usd'] = btc_usd_datasets.mean(axis=1)
</code></pre>
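<p>This works cleanly because the zero values we replaced with NaN are skipped when averaging: Pandas' <code>mean()</code> ignores missing cells by default (<code>skipna=True</code>), rather than treating them as zeros that would drag the average down.  A quick illustration with made-up numbers:</p>
<pre><code class="language-python">import numpy as np
import pandas as pd

# One day of toy prices from three exchanges, one of which reported 0
row = pd.DataFrame({'A': [400.0], 'B': [0.0], 'C': [410.0]})

# Averaging the raw values counts the bad 0, giving 270.0
naive_avg = row.mean(axis=1).iloc[0]

# After replacing 0 with NaN, mean() skips the missing cell, giving 405.0
cleaned_avg = row.replace(0, np.nan).mean(axis=1).iloc[0]

print(naive_avg, cleaned_avg)  # 270.0 405.0
</code></pre>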
<br>
<p>This new column is our Bitcoin pricing index!  Let's chart that column to make sure it looks ok.</p>
<pre><code class="language-python"># Plot the average BTC price
btc_trace = go.Scatter(x=btc_usd_datasets.index, y=btc_usd_datasets['avg_btc_price_usd'])
py.iplot([btc_trace])
</code></pre>
<img id="aggregate-bitcoin-price" src="https://cdn.patricktriest.com/blog/images/posts/crypto-markets/plot-images/aggregate-bitcoin-price.png" alt="Analyzing Cryptocurrency Markets Using Python">
<p>Yup, looks good.  We'll use this aggregate pricing series later on, in order to convert the exchange rates of other cryptocurrencies to USD.</p>
<h3 id="step3retrievealtcoinpricingdata">Step 3 - Retrieve Altcoin Pricing Data</h3>
<p>Now that we have a solid time series dataset for the price of Bitcoin, let's pull in some data for non-Bitcoin cryptocurrencies, commonly referred to as altcoins.</p>
<h5 id="step31definepoloniexapihelperfunctions">Step 3.1 - Define Poloniex API Helper Functions</h5>
<p>For retrieving data on cryptocurrencies we'll be using the <a href="https://poloniex.com/support/api/">Poloniex API</a>.  To assist in the altcoin data retrieval, we'll define two helper functions to download and cache JSON data from this API.</p>
<p>First, we'll define <code>get_json_data</code>, which will download and cache JSON data from a provided URL.</p>
<pre><code class="language-python">def get_json_data(json_url, cache_path):
    '''Download and cache JSON data, return as a dataframe.'''
    try:
        # Load the data from the local cache if it exists
        with open(cache_path, 'rb') as f:
            df = pickle.load(f)
        print('Loaded {} from cache'.format(json_url))
    except (OSError, IOError):
        # Cache miss - download the data and save it for next time
        print('Downloading {}'.format(json_url))
        df = pd.read_json(json_url)
        df.to_pickle(cache_path)
        print('Cached {} at {}'.format(json_url, cache_path))
    return df
</code></pre>
<br>
<p>Next, we'll define a function that will generate Poloniex API HTTP requests, and will subsequently call our new <code>get_json_data</code> function to save the resulting data.</p>
<pre><code class="language-python">base_polo_url = 'https://poloniex.com/public?command=returnChartData&amp;currencyPair={}&amp;start={}&amp;end={}&amp;period={}'
start_date = datetime.strptime('2015-01-01', '%Y-%m-%d') # get data from the start of 2015
end_date = datetime.now() # up until today
period = 86400 # pull daily data (86,400 seconds per day)

def get_crypto_data(poloniex_pair):
    '''Retrieve cryptocurrency data from poloniex'''
    json_url = base_polo_url.format(poloniex_pair, start_date.timestamp(), end_date.timestamp(), period)
    data_df = get_json_data(json_url, poloniex_pair)
    data_df = data_df.set_index('date')
    return data_df
</code></pre>
<p>This function will take a cryptocurrency pair string (such as 'BTC_ETH') and return a dataframe containing the historical exchange rate of the two currencies.</p>
<h5 id="step32downloadtradingdatafrompoloniex">Step 3.2 - Download Trading Data From Poloniex</h5>
<p>Most altcoins cannot be bought directly with USD; to acquire these coins individuals often buy Bitcoins and then trade the Bitcoins for altcoins on cryptocurrency exchanges.  For this reason, we'll be downloading the exchange rate to BTC for each coin, and then we'll use our existing BTC pricing data to convert this value to USD.</p>
<p>We'll download exchange data for nine of the top cryptocurrencies -<br>
<a href="https://www.ethereum.org/">Ethereum</a>, <a href="https://litecoin.org/">Litecoin</a>, <a href="https://ripple.com/">Ripple</a>, <a href="https://ethereumclassic.github.io/">Ethereum Classic</a>, <a href="https://www.stellar.org/">Stellar</a>, <a href="https://www.dash.org/">Dash</a>, <a href="http://sia.tech/">Siacoin</a>, <a href="https://getmonero.org/">Monero</a>, and <a href="https://www.nem.io/">NEM</a>.</p>
<pre><code class="language-python">altcoins = ['ETH','LTC','XRP','ETC','STR','DASH','SC','XMR','XEM']

altcoin_data = {}
for altcoin in altcoins:
    coinpair = 'BTC_{}'.format(altcoin)
    crypto_price_df = get_crypto_data(coinpair)
    altcoin_data[altcoin] = crypto_price_df
</code></pre>
<br>
<p>Now we have a dictionary with 9 dataframes, each containing the historical daily average exchange prices between the altcoin and Bitcoin.</p>
<p>We can preview the last few rows of the Ethereum price table to make sure it looks ok.</p>
<pre><code class="language-python">altcoin_data['ETH'].tail()
</code></pre>
<div class="dataframe">
  <table border="1">
    <thead>
      <tr style="text-align: right;">
        <th></th>
        <th>close</th>
        <th>high</th>
        <th>low</th>
        <th>open</th>
        <th>quoteVolume</th>
        <th>volume</th>
        <th>weightedAverage</th>
      </tr>
      <tr>
        <th>date</th>
        <th></th>
        <th></th>
        <th></th>
        <th></th>
        <th></th>
        <th></th>
        <th></th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <th>2017-08-18 12:00:00</th>
        <td>0.070510</td>
        <td>0.071000</td>
        <td>0.070170</td>
        <td>0.070887</td>
        <td>17364.271529</td>
        <td>1224.762684</td>
        <td>0.070533</td>
      </tr>
      <tr>
        <th>2017-08-18 16:00:00</th>
        <td>0.071595</td>
        <td>0.072096</td>
        <td>0.070004</td>
        <td>0.070510</td>
        <td>26644.018123</td>
        <td>1893.136154</td>
        <td>0.071053</td>
      </tr>
      <tr>
        <th>2017-08-18 20:00:00</th>
        <td>0.071321</td>
        <td>0.072906</td>
        <td>0.070482</td>
        <td>0.071600</td>
        <td>39655.127825</td>
        <td>2841.549065</td>
        <td>0.071657</td>
      </tr>
      <tr>
        <th>2017-08-19 00:00:00</th>
        <td>0.071447</td>
        <td>0.071855</td>
        <td>0.070868</td>
        <td>0.071321</td>
        <td>16116.922869</td>
        <td>1150.361419</td>
        <td>0.071376</td>
      </tr>
      <tr>
        <th>2017-08-19 04:00:00</th>
        <td>0.072323</td>
        <td>0.072550</td>
        <td>0.071292</td>
        <td>0.071447</td>
        <td>14425.571894</td>
        <td>1039.596030</td>
        <td>0.072066</td>
      </tr>
    </tbody>
  </table>
</div>
<h5 id="step33convertpricestousd">Step 3.3 - Convert Prices to USD</h5>
<p>Now we can combine this BTC-altcoin exchange rate data with our Bitcoin pricing index to directly calculate the historical USD values for each altcoin.</p>
<pre><code class="language-python"># Calculate USD Price as a new column in each altcoin dataframe
for altcoin in altcoin_data.keys():
    altcoin_data[altcoin]['price_usd'] =  altcoin_data[altcoin]['weightedAverage'] * btc_usd_datasets['avg_btc_price_usd']
</code></pre>
<p>Here, we've created a new column in each altcoin dataframe with the USD prices for that coin.</p>
<p>Next, we can re-use our <code>merge_dfs_on_column</code> function from earlier to create a combined dataframe of the USD price for each cryptocurrency.</p>
<pre><code class="language-python"># Merge USD price of each altcoin into single dataframe 
combined_df = merge_dfs_on_column(list(altcoin_data.values()), list(altcoin_data.keys()), 'price_usd')
</code></pre>
<br>
<p>Easy.  Now let's also add the Bitcoin prices as a final column to the combined dataframe.</p>
<pre><code class="language-python"># Add BTC price to the dataframe
combined_df['BTC'] = btc_usd_datasets['avg_btc_price_usd']
</code></pre>
<br>
<p>Now we should have a single dataframe containing daily USD prices for the ten cryptocurrencies that we're examining.</p>
<p>Let's reuse our <code>df_scatter</code> function from earlier to chart all of the cryptocurrency prices against each other.</p>
<pre><code class="language-python"># Chart all of the altcoin prices
df_scatter(combined_df, 'Cryptocurrency Prices (USD)', seperate_y_axis=False, y_axis_label='Coin Value (USD)', scale='log')
</code></pre>
<img id="altcoin_prices_combined" src="https://cdn.patricktriest.com/blog/images/posts/crypto-markets/plot-images/altcoin_prices_combined.png" alt="Analyzing Cryptocurrency Markets Using Python">
<p>Nice! This graph provides a pretty solid &quot;big picture&quot; view of how the exchange rates for each currency have varied over the past few years.</p>
<blockquote>
<p>Note that we're using a logarithmic y-axis scale in order to compare all of the currencies on the same plot.  You are welcome to try out different parameter values here (such as <code>scale='linear'</code>) to get different perspectives on the data.</p>
</blockquote>
<h5 id="step34performcorrelationanalysis">Step 3.4 - Perform Correlation Analysis</h5>
<p>You might notice that the cryptocurrency exchange rates, despite their wildly different values and volatility, look slightly correlated. Especially since the spike in April 2017, even many of the smaller fluctuations appear to occur in sync across the entire market.</p>
<p>A visually-derived hunch is not much better than a guess until we have the stats to back it up.</p>
<p>We can test our correlation hypothesis using the Pandas <code>corr()</code> method, which computes a Pearson correlation coefficient for each column in the dataframe against each other column.</p>
<blockquote>
<p>Revision Note 8/22/2017 - This section has been revised in order to use the daily return percentages instead of the absolute price values in calculating the correlation coefficients.</p>
</blockquote>
<p>Computing correlations directly on a non-stationary time series (such as raw pricing data) can give biased correlation values. We will work around this by first applying the <code>pct_change()</code> method, which will convert each cell in the dataframe from an absolute price value to a daily return percentage.</p>
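<p>To make the effect of <code>pct_change()</code> concrete, here's a toy series (made-up prices): each cell becomes the fractional change from the previous row, and the first row becomes NaN since it has no predecessor.</p>
<pre><code class="language-python">import pandas as pd

prices = pd.Series([100.0, 110.0, 99.0],
                   index=pd.date_range('2017-01-01', periods=3))

# NaN, then roughly +0.10 (+10%) and -0.10 (-10%)
returns = prices.pct_change()
print(returns)
</code></pre>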
<p>First we'll calculate correlations for 2016.</p>
<pre><code class="language-python"># Calculate the pearson correlation coefficients for cryptocurrencies in 2016
combined_df_2016 = combined_df[combined_df.index.year == 2016]
combined_df_2016.pct_change().corr(method='pearson')
</code></pre>
<div class="dataframe">
  <table border="1">
    <thead>
      <tr style="text-align: right;">
        <th></th>
        <th>DASH</th>
        <th>ETC</th>
        <th>ETH</th>
        <th>LTC</th>
        <th>SC</th>
        <th>STR</th>
        <th>XEM</th>
        <th>XMR</th>
        <th>XRP</th>
        <th>BTC</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <th>DASH</th>
        <td>1.000000</td>
        <td>0.003992</td>
        <td>0.122695</td>
        <td>-0.012194</td>
        <td>0.026602</td>
        <td>0.058083</td>
        <td>0.014571</td>
        <td>0.121537</td>
        <td>0.088657</td>
        <td>-0.014040</td>
      </tr>
      <tr>
        <th>ETC</th>
        <td>0.003992</td>
        <td>1.000000</td>
        <td>-0.181991</td>
        <td>-0.131079</td>
        <td>-0.008066</td>
        <td>-0.102654</td>
        <td>-0.080938</td>
        <td>-0.105898</td>
        <td>-0.054095</td>
        <td>-0.170538</td>
      </tr>
      <tr>
        <th>ETH</th>
        <td>0.122695</td>
        <td>-0.181991</td>
        <td>1.000000</td>
        <td>-0.064652</td>
        <td>0.169642</td>
        <td>0.035093</td>
        <td>0.043205</td>
        <td>0.087216</td>
        <td>0.085630</td>
        <td>-0.006502</td>
      </tr>
      <tr>
        <th>LTC</th>
        <td>-0.012194</td>
        <td>-0.131079</td>
        <td>-0.064652</td>
        <td>1.000000</td>
        <td>0.012253</td>
        <td>0.113523</td>
        <td>0.160667</td>
        <td>0.129475</td>
        <td>0.053712</td>
        <td>0.750174</td>
      </tr>
      <tr>
        <th>SC</th>
        <td>0.026602</td>
        <td>-0.008066</td>
        <td>0.169642</td>
        <td>0.012253</td>
        <td>1.000000</td>
        <td>0.143252</td>
        <td>0.106153</td>
        <td>0.047910</td>
        <td>0.021098</td>
        <td>0.035116</td>
      </tr>
      <tr>
        <th>STR</th>
        <td>0.058083</td>
        <td>-0.102654</td>
        <td>0.035093</td>
        <td>0.113523</td>
        <td>0.143252</td>
        <td>1.000000</td>
        <td>0.225132</td>
        <td>0.027998</td>
        <td>0.320116</td>
        <td>0.079075</td>
      </tr>
      <tr>
        <th>XEM</th>
        <td>0.014571</td>
        <td>-0.080938</td>
        <td>0.043205</td>
        <td>0.160667</td>
        <td>0.106153</td>
        <td>0.225132</td>
        <td>1.000000</td>
        <td>0.016438</td>
        <td>0.101326</td>
        <td>0.227674</td>
      </tr>
      <tr>
        <th>XMR</th>
        <td>0.121537</td>
        <td>-0.105898</td>
        <td>0.087216</td>
        <td>0.129475</td>
        <td>0.047910</td>
        <td>0.027998</td>
        <td>0.016438</td>
        <td>1.000000</td>
        <td>0.027649</td>
        <td>0.127520</td>
      </tr>
      <tr>
        <th>XRP</th>
        <td>0.088657</td>
        <td>-0.054095</td>
        <td>0.085630</td>
        <td>0.053712</td>
        <td>0.021098</td>
        <td>0.320116</td>
        <td>0.101326</td>
        <td>0.027649</td>
        <td>1.000000</td>
        <td>0.044161</td>
      </tr>
      <tr>
        <th>BTC</th>
        <td>-0.014040</td>
        <td>-0.170538</td>
        <td>-0.006502</td>
        <td>0.750174</td>
        <td>0.035116</td>
        <td>0.079075</td>
        <td>0.227674</td>
        <td>0.127520</td>
        <td>0.044161</td>
        <td>1.000000</td>
      </tr>
    </tbody>
  </table>
</div>
<p>These correlation coefficients are all over the place.  Coefficients close to 1 or -1 mean that the series are strongly correlated or inversely correlated, respectively, while coefficients close to zero mean that the values are not correlated and fluctuate independently of each other.</p>
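<p>As a minimal illustration of how to read these values (toy data, not from our dataset): two series that rise in lockstep score 1, while a series compared against its mirror image scores -1.</p>
<pre><code class="language-python">import pandas as pd

df = pd.DataFrame({
    'up':     [1.0, 2.0, 3.0, 4.0],  # steadily rising
    'up_too': [2.0, 4.0, 6.0, 8.0],  # rising in lockstep with 'up'
    'down':   [4.0, 3.0, 2.0, 1.0],  # mirror image of 'up'
})

corr = df.corr(method='pearson')
print(corr.loc['up', 'up_too'])  # close to 1.0 (strong correlation)
print(corr.loc['up', 'down'])    # close to -1.0 (strong inverse correlation)
</code></pre>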
<p>To help visualize these results, we'll create one more helper visualization function.</p>
<pre><code class="language-python">def correlation_heatmap(df, title, absolute_bounds=True):
    '''Plot a correlation heatmap for the entire dataframe'''
    heatmap = go.Heatmap(
        z=df.corr(method='pearson').values,
        x=df.columns,
        y=df.columns,
        colorbar=dict(title='Pearson Coefficient'),
    )
    
    layout = go.Layout(title=title)
    
    if absolute_bounds:
        heatmap['zmax'] = 1.0
        heatmap['zmin'] = -1.0
        
    fig = go.Figure(data=[heatmap], layout=layout)
    py.iplot(fig)
</code></pre>
<pre><code class="language-python">correlation_heatmap(combined_df_2016.pct_change(), &quot;Cryptocurrency Correlations in 2016&quot;)
</code></pre>
<img id="cryptocurrency-correlations-2016" src="https://cdn.patricktriest.com/blog/images/posts/crypto-markets/plot-images/cryptocurrency-correlations-2016-v2.png" alt="Analyzing Cryptocurrency Markets Using Python">
<p>Here, the dark red values represent strong correlations (note that each currency is, obviously, strongly correlated with itself), and the dark blue values represent strong inverse correlations.  All of the light blue/orange/gray/tan colors in-between represent varying degrees of weak/non-existent correlations.</p>
<p>What does this chart tell us? Essentially, it shows that there was little statistically significant linkage between how the prices of different cryptocurrencies fluctuated during 2016.</p>
<p>Now, to test our hypothesis that the cryptocurrencies have become more correlated in recent months, let's repeat the same test using only the data from 2017.</p>
<pre><code class="language-python">combined_df_2017 = combined_df[combined_df.index.year == 2017]
combined_df_2017.pct_change().corr(method='pearson')
</code></pre>
<div class="dataframe">
  <table border="1">
    <thead>
      <tr style="text-align: right;">
        <th></th>
        <th>DASH</th>
        <th>ETC</th>
        <th>ETH</th>
        <th>LTC</th>
        <th>SC</th>
        <th>STR</th>
        <th>XEM</th>
        <th>XMR</th>
        <th>XRP</th>
        <th>BTC</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <th>DASH</th>
        <td>1.000000</td>
        <td>0.384109</td>
        <td>0.480453</td>
        <td>0.259616</td>
        <td>0.191801</td>
        <td>0.159330</td>
        <td>0.299948</td>
        <td>0.503832</td>
        <td>0.066408</td>
        <td>0.357970</td>
      </tr>
      <tr>
        <th>ETC</th>
        <td>0.384109</td>
        <td>1.000000</td>
        <td>0.602151</td>
        <td>0.420945</td>
        <td>0.255343</td>
        <td>0.146065</td>
        <td>0.303492</td>
        <td>0.465322</td>
        <td>0.053955</td>
        <td>0.469618</td>
      </tr>
      <tr>
        <th>ETH</th>
        <td>0.480453</td>
        <td>0.602151</td>
        <td>1.000000</td>
        <td>0.286121</td>
        <td>0.323716</td>
        <td>0.228648</td>
        <td>0.343530</td>
        <td>0.604572</td>
        <td>0.120227</td>
        <td>0.421786</td>
      </tr>
      <tr>
        <th>LTC</th>
        <td>0.259616</td>
        <td>0.420945</td>
        <td>0.286121</td>
        <td>1.000000</td>
        <td>0.296244</td>
        <td>0.333143</td>
        <td>0.250566</td>
        <td>0.439261</td>
        <td>0.321340</td>
        <td>0.352713</td>
      </tr>
      <tr>
        <th>SC</th>
        <td>0.191801</td>
        <td>0.255343</td>
        <td>0.323716</td>
        <td>0.296244</td>
        <td>1.000000</td>
        <td>0.417106</td>
        <td>0.287986</td>
        <td>0.374707</td>
        <td>0.248389</td>
        <td>0.377045</td>
      </tr>
      <tr>
        <th>STR</th>
        <td>0.159330</td>
        <td>0.146065</td>
        <td>0.228648</td>
        <td>0.333143</td>
        <td>0.417106</td>
        <td>1.000000</td>
        <td>0.396520</td>
        <td>0.341805</td>
        <td>0.621547</td>
        <td>0.178706</td>
      </tr>
      <tr>
        <th>XEM</th>
        <td>0.299948</td>
        <td>0.303492</td>
        <td>0.343530</td>
        <td>0.250566</td>
        <td>0.287986</td>
        <td>0.396520</td>
        <td>1.000000</td>
        <td>0.397130</td>
        <td>0.270390</td>
        <td>0.366707</td>
      </tr>
      <tr>
        <th>XMR</th>
        <td>0.503832</td>
        <td>0.465322</td>
        <td>0.604572</td>
        <td>0.439261</td>
        <td>0.374707</td>
        <td>0.341805</td>
        <td>0.397130</td>
        <td>1.000000</td>
        <td>0.213608</td>
        <td>0.510163</td>
      </tr>
      <tr>
        <th>XRP</th>
        <td>0.066408</td>
        <td>0.053955</td>
        <td>0.120227</td>
        <td>0.321340</td>
        <td>0.248389</td>
        <td>0.621547</td>
        <td>0.270390</td>
        <td>0.213608</td>
        <td>1.000000</td>
        <td>0.170070</td>
      </tr>
      <tr>
        <th>BTC</th>
        <td>0.357970</td>
        <td>0.469618</td>
        <td>0.421786</td>
        <td>0.352713</td>
        <td>0.377045</td>
        <td>0.178706</td>
        <td>0.366707</td>
        <td>0.510163</td>
        <td>0.170070</td>
        <td>1.000000</td>
      </tr>
    </tbody>
  </table>
</div>
<p>These are somewhat more significant correlation coefficients.  Strong enough to use as the sole basis for an investment? Certainly not.</p>
<p>It is notable, however, that almost all of the cryptocurrencies have become more correlated with each other across the board.</p>
<pre><code class="language-python">correlation_heatmap(combined_df_2017.pct_change(), &quot;Cryptocurrency Correlations in 2017&quot;)
</code></pre>
<img id="cryptocurrency-correlations-2017" src="https://cdn.patricktriest.com/blog/images/posts/crypto-markets/plot-images/cryptocurrency-correlations-2017-v2.png" alt="Analyzing Cryptocurrency Markets Using Python">
<p>Huh. That's rather interesting.</p>
<h3 id="whyisthishappening">Why is this happening?</h3>
<p>Good question.  I'm really not sure.</p>
<p>The most immediate explanation that comes to mind is that <strong>hedge funds have recently begun publicly trading in crypto-currency markets</strong><sup class="footnote-ref"><a href="#fn1" id="fnref1">[1]</a></sup><sup class="footnote-ref"><a href="#fn2" id="fnref2">[2]</a></sup>.  These funds have vastly more capital to play with than the average trader, so if a fund is hedging their bets across multiple cryptocurrencies, and using similar trading strategies for each based on independent variables (say, the stock market), it could make sense that this trend of increasing correlations would emerge.</p>
<h5 id="indepthxrpandstr">In-Depth - XRP and STR</h5>
<p>For instance, one noticeable trait of the above chart is that XRP (the token for <a href="https://ripple.com/">Ripple</a>) is the least correlated cryptocurrency.  The notable exception here is STR (the token for <a href="https://www.stellar.org/">Stellar</a>, officially known as &quot;Lumens&quot;), which has a stronger (0.62) correlation with XRP.</p>
<p>What is interesting here is that Stellar and Ripple are both fairly similar fintech platforms aimed at reducing the friction of international money transfers between banks.</p>
<p>It is conceivable that some big-money players and hedge funds might be using similar trading strategies for their investments in Stellar and Ripple, due to the similarity of the blockchain services that use each token. This could explain why XRP is so much more heavily correlated with STR than with the other cryptocurrencies.</p>
<blockquote>
<p>Quick Plug - I'm a contributor to <a href="https://chippercash.com/">Chipper</a>, a (very) early-stage startup using Stellar with the aim of disrupting micro-remittances in Africa.</p>
</blockquote>
<h3 id="yourturn">Your Turn</h3>
<p>This explanation is, however, largely speculative.  <strong>Maybe you can do better</strong>.  With the foundation we've made here, there are hundreds of different paths to take to continue searching for stories within the data.</p>
<p>Here are some ideas:</p>
<ul>
<li>Add data from more cryptocurrencies to the analysis.</li>
<li>Adjust the time frame and granularity of the correlation analysis, for a more fine or coarse grained view of the trends.</li>
<li>Search for trends in trading volume and/or blockchain mining data sets.  The buy/sell volume ratios are likely more relevant than the raw price data if you want to predict future price fluctuations.</li>
<li>Add pricing data on stocks, commodities, and fiat currencies to determine which of them correlate with cryptocurrencies (but please remember the old adage that &quot;Correlation does not imply causation&quot;).</li>
<li>Quantify the amount of &quot;buzz&quot; surrounding specific cryptocurrencies using <a href="https://eventregistry.org/">Event Registry</a>, <a href="https://www.gdeltproject.org/">GDELT</a>, and <a href="https://trends.google.com/trends/">Google Trends</a>.</li>
<li>Train a predictive machine learning model on the data to predict tomorrow's prices.  If you're more ambitious, you could even try doing this with a recurrent neural network (RNN).</li>
<li>Use your analysis to create an automated &quot;Trading Bot&quot; on a trading site such as <a href="http://poloniex.com/">Poloniex</a> or <a href="https://www.coinbase.com/dashboard">Coinbase</a>, using their respective trading APIs.  Be careful: a poorly optimized trading bot is an easy way to lose your money quickly.</li>
<li><strong>Share your findings!</strong>  The best part of Bitcoin, and of cryptocurrencies in general, is that their decentralized nature makes them more free and democratic than virtually any other asset.  Open source your analysis, participate in the community, maybe write a blog post about it.</li>
</ul>
<p>An HTML version of the Python notebook is available <a href="https://cdn.patricktriest.com/blog/images/posts/crypto-markets/Cryptocurrency-Pricing-Analysis.html">here</a>.</p>
<p>Hopefully, now you have the skills to do your own analysis and to think critically about any speculative cryptocurrency articles you might read in the future, especially those written without any data to back up the provided predictions.</p>
<p>Thanks for reading, and please comment below if you have any ideas, suggestions, or criticisms regarding this tutorial.  If you find problems with the code, you can also feel free to open an issue in the Github repository <a href="https://github.com/triestpa/Cryptocurrency-Analysis-Python">here</a>.</p>
<p>I've got a second (and potentially third) part in the works, which will likely follow through on some of the ideas listed above, so stay tuned for more in the coming weeks.</p>
<hr class="footnotes-sep">
<section class="footnotes">
<ol class="footnotes-list">
<li id="fn1" class="footnote-item"><p><a href="http://fortune.com/2017/07/26/bitcoin-cryptocurrency-hedge-fund-sequoia-andreessen-horowitz-metastable/">http://fortune.com/2017/07/26/bitcoin-cryptocurrency-hedge-fund-sequoia-andreessen-horowitz-metastable/</a> <a href="#fnref1" class="footnote-backref">↩︎</a></p>
</li>
<li id="fn2" class="footnote-item"><p><a href="https://www.forbes.com/sites/laurashin/2017/07/12/crypto-boom-15-new-hedge-funds-want-in-on-84000-returns/#7946ab0d416a">https://www.forbes.com/sites/laurashin/2017/07/12/crypto-boom-15-new-hedge-funds-want-in-on-84000-returns/#7946ab0d416a</a> <a href="#fnref2" class="footnote-backref">↩︎</a></p>
</li>
</ol>
</section>
</div>]]></content:encoded></item><item><title><![CDATA[Async/Await Will Make Your Code Simpler]]></title><description><![CDATA[Or How I Learned to Stop Writing Callback Functions and Love Javascript ES8.]]></description><link>http://blog.patricktriest.com/what-is-async-await-why-should-you-care/</link><guid isPermaLink="false">598eaf94b7d6af1a6a795fd2</guid><category><![CDATA[Javascript]]></category><category><![CDATA[Node.js]]></category><category><![CDATA[Web Development]]></category><category><![CDATA[Guides]]></category><dc:creator><![CDATA[Patrick Triest]]></dc:creator><pubDate>Fri, 11 Aug 2017 16:00:00 GMT</pubDate><media:content url="https://cdn.patricktriest.com/blog/images/posts/async-await/async_await_header.png" medium="image"/><content:encoded><![CDATA[<div class="kg-card-markdown"><h2 id="orhowilearnedtostopwritingcallbackfunctionsandlovejavascriptes8">Or How I Learned to Stop Writing Callback Functions and Love Javascript ES8.</h2>
<img src="https://cdn.patricktriest.com/blog/images/posts/async-await/async_await_header.png" alt="Async/Await Will Make Your Code Simpler"><p>Sometimes modern Javascript projects get out of hand.  A major culprit in this can be the messy handling of asynchronous tasks, leading to long, complex, and deeply nested blocks of code.  Javascript now provides a new syntax for handling these operations, and it can turn even the most convoluted asynchronous operations into concise and highly readable code.</p>
<h2 id="background">Background</h2>
<h4 id="ajaxasynchronousjavascriptandxml">AJAX (Asynchronous JavaScript And XML)</h4>
<p>First, a brief bit of history.  In the late 1990s, the technique that would come to be known as Ajax was the first major breakthrough in asynchronous Javascript.  It allowed websites to pull and display new data after the HTML had been loaded, a revolutionary idea at a time when most websites would download the entire page again to display a content update.  The technique (popularized in name by the bundled helper function in jQuery) dominated web development throughout the 2000s, and Ajax remains the primary technique that websites use to retrieve data today, although JSON has now largely replaced XML as the data format.</p>
<h4 id="nodejs">NodeJS</h4>
<p>When NodeJS was first released in 2009, a major focus of the server-side environment was allowing programs to gracefully handle concurrency.  Most server-side languages at the time handled I/O operations by <em>blocking</em> further code execution until the operation had finished.  NodeJS instead utilized an event-loop architecture, such that developers could assign &quot;callback&quot; functions to be triggered once <em>non-blocking</em> asynchronous operations had completed, in a similar manner to how the Ajax syntax worked.</p>
<h4 id="promises">Promises</h4>
<p>A few years later, a new standard called &quot;Promises&quot; emerged in both NodeJS and browser environments, offering a powerful and standardized way to compose asynchronous operations.  Promises still used a callback based format, but offered a consistent syntax for chaining and composing asynchronous operations. Promises, which had been pioneered by popular open-source libraries, were finally added as a native feature to Javascript in 2015.</p>
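<p>For reference, composing asynchronous operations with promise syntax looks something like this (a minimal sketch; the <code>fetchUser</code> function here is hypothetical, standing in for any promise-returning API call):</p>

```javascript
// Hypothetical promise-returning function, standing in for a network call
function fetchUser () {
  return new Promise((resolve, reject) => {
    setTimeout(() => resolve({ id: 1, name: 'test' }), 200)
  })
}

// Promises are consumed by attaching callbacks with .then() and .catch()
fetchUser()
  .then((user) => console.log(user.name))
  .catch((err) => console.error(err))
```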
<p>Promises were a major improvement, but they still can often be the cause of somewhat verbose and difficult-to-read blocks of code.</p>
<p><em>Now there is a solution.</em></p>
<p>Async/await is a new syntax (borrowed from .NET and C#) that allows us to compose Promises as though they were just normal synchronous functions without callbacks.  It's a fantastic addition to the Javascript language, standardized in 2017 as part of ES8 (ES2017), and can be used to simplify pretty much any existing JS application.</p>
<h2 id="examples">Examples</h2>
<p>We'll be going through a few code examples.</p>
<blockquote>
<p>No libraries are required to run these examples. <strong>Async/await is fully supported in the latest versions of Chrome, Firefox, Safari, and Edge, so you can try out the examples in your browser console</strong>. Additionally, async/await syntax works in Nodejs version 7.6 and higher, and is supported by the Babel and Typescript transpilers, so it can really be used in any Javascript project today.</p>
</blockquote>
<h4 id="setup">Setup</h4>
<p>If you want to follow along on your machine, we'll be using this dummy API class.  The class simulates network calls by returning promises which will resolve with simple data 200ms after being called.</p>
<pre><code class="language-javascript">class Api {
  constructor () {
    this.user = { id: 1, name: 'test' }
    this.friends = [ this.user, this.user, this.user ]
    this.photo = 'not a real photo'
  }

  getUser () {
    return new Promise((resolve, reject) =&gt; {
      setTimeout(() =&gt; resolve(this.user), 200)
    })
  }

  getFriends (userId) {
    return new Promise((resolve, reject) =&gt; {
      setTimeout(() =&gt; resolve(this.friends.slice()), 200)
    })
  }

  getPhoto (userId) {
    return new Promise((resolve, reject) =&gt; {
      setTimeout(() =&gt; resolve(this.photo), 200)
    })
  }

  throwError () {
    return new Promise((resolve, reject) =&gt; {
      setTimeout(() =&gt; reject(new Error('Intentional Error')), 200)
    })
  }
}
</code></pre>
<p>Each example will be performing the same three operations in sequence: retrieve a user, retrieve their friends, retrieve their picture.  At the end, we will log all three results to the console.</p>
<h4 id="attempt1nestedpromisecallbackfunctions">Attempt 1 - Nested Promise Callback Functions</h4>
<p>Here is an implementation using nested promise callback functions.</p>
<pre><code class="language-javascript">function callbackHell () {
  const api = new Api()
  let user, friends
  api.getUser().then(function (returnedUser) {
    user = returnedUser
    api.getFriends(user.id).then(function (returnedFriends) {
      friends = returnedFriends
      api.getPhoto(user.id).then(function (photo) {
        console.log('callbackHell', { user, friends, photo })
      })
    })
  })
}
</code></pre>
<p>This probably looks familiar to anyone who has worked on a Javascript project.  The code block, which has a reasonably simple purpose, is long,  deeply nested, and ends in this...</p>
<pre><code class="language-javascript">      })
    })
  })
}
</code></pre>
<p>In a real codebase, each callback function might be quite long, which can result in huge and deeply indented functions.  Dealing with this type of code, working with callbacks within callbacks within callbacks, is what is commonly referred to as &quot;callback hell&quot;.</p>
<p>Even worse, there's no error checking, so any of the callbacks could fail silently as an unhandled promise rejection.</p>
<h4 id="attempt2promisechain">Attempt 2 - Promise Chain</h4>
<p>Let's see if we can do any better.</p>
<pre><code class="language-javascript">function promiseChain () {
  const api = new Api()
  let user, friends
  api.getUser()
    .then((returnedUser) =&gt; {
      user = returnedUser
      return api.getFriends(user.id)
    })
    .then((returnedFriends) =&gt; {
      friends = returnedFriends
      return api.getPhoto(user.id)
    })
    .then((photo) =&gt; {
      console.log('promiseChain', { user, friends, photo })
    })
}
</code></pre>
<p>One nice feature of promises is that they can be chained by returning another promise inside each callback.  This way we can keep all of the callbacks on the same indentation level. We're also using arrow functions to abbreviate the callback function declarations.</p>
<p>This variant is certainly easier to read than the previous, and has a better sense of sequentiality, but is still very verbose and a bit complex looking.</p>
<h4 id="attempt3asyncawait">Attempt 3 - Async/Await</h4>
<p>What if it were possible to write it without any callback functions? Impossible? <strong>How about writing it in 7 lines?</strong></p>
<pre><code class="language-javascript">async function asyncAwaitIsYourNewBestFriend () {
  const api = new Api()
  const user = await api.getUser()
  const friends = await api.getFriends(user.id)
  const photo = await api.getPhoto(user.id)
  console.log('asyncAwaitIsYourNewBestFriend', { user, friends, photo })
}
</code></pre>
<p>Much better. Placing &quot;await&quot; in front of a promise pauses the flow of the function until the promise has resolved, and assigns the result to the variable on the left of the equals sign.  This way we can program an asynchronous operation flow as though it were a normal synchronous series of commands.</p>
<p>I hope you're as excited as I am at this point.</p>
<blockquote>
<p>Note that &quot;async&quot; is declared at the beginning of the function declaration.  This is required and actually turns the entire function into a promise.  We'll dig into that later on.</p>
</blockquote>
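<p>A quick way to verify this: even an async function that returns a plain value hands the caller a Promise (a minimal sketch):</p>

```javascript
// The "async" keyword wraps the function's return value in a Promise automatically
async function give42 () {
  return 42
}

const result = give42()
console.log(result instanceof Promise) // true
result.then((value) => console.log(value)) // logs 42
```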
<h2 id="loops">Loops</h2>
<p>Async/await makes lots of previously complex operations really easy.  For example, what if we wanted to sequentially retrieve the friends lists for each of the user's friends?</p>
<h3 id="attempt1recursivepromiseloop">Attempt 1 - Recursive Promise Loop</h3>
<p>Here's how fetching each friend list sequentially might look with normal promises.</p>
<pre><code class="language-javascript">function promiseLoops () {  
  const api = new Api()
  api.getUser()
    .then((user) =&gt; {
      return api.getFriends(user.id)
    })
    .then((returnedFriends) =&gt; {
      const getFriendsOfFriends = (friends) =&gt; {
        if (friends.length &gt; 0) {
          let friend = friends.pop()
          return api.getFriends(friend.id)
            .then((moreFriends) =&gt; {
              console.log('promiseLoops', moreFriends)
              return getFriendsOfFriends(friends)
            })
        }
      }
      return getFriendsOfFriends(returnedFriends)
    })
}
</code></pre>
<p>We're creating an inner function that recursively chains promises to fetch the friends-of-friends until the list is empty.  Ugh.  It's completely functional, which is nice, but this is still an exceptionally complicated solution for a fairly straightforward task.</p>
<blockquote>
<p>Note -  Attempting to simplify the <code>promiseLoops()</code> function using <code>Promise.all()</code> will result in a function that behaves in significantly different manner.  The intention of this example is to run the operations <strong>sequentially</strong> (one at a time), whereas <code>Promise.all()</code> is used for running asynchronous operations <strong>concurrently</strong> (all at once). <code>Promise.all()</code> is still very powerful when combined with async/await, however, as we'll see in the next section.</p>
</blockquote>
<h3 id="attempt2asyncawaitforloop">Attempt 2 - Async/Await For-Loop</h3>
<p>This could be so much easier.</p>
<pre><code class="language-javascript">async function asyncAwaitLoops () {
  const api = new Api()
  const user = await api.getUser()
  const friends = await api.getFriends(user.id)

  for (let friend of friends) {
    let moreFriends = await api.getFriends(friend.id)
    console.log('asyncAwaitLoops', moreFriends)
  }
}
</code></pre>
<p>No need to write any recursive promise closures. Just a for-loop. Async/await is your friend.</p>
<h2 id="paralleloperations">Parallel Operations</h2>
<p>It's a bit slow to get each additional friend list one-by-one, so why not retrieve them in parallel? Can we do that with async/await?</p>
<p>Yeah, of course we can. It solves all of our problems.</p>
<pre><code class="language-javascript">async function asyncAwaitLoopsParallel () {
  const api = new Api()
  const user = await api.getUser()
  const friends = await api.getFriends(user.id)
  const friendPromises = friends.map(friend =&gt; api.getFriends(friend.id))
  const moreFriends = await Promise.all(friendPromises)
  console.log('asyncAwaitLoopsParallel', moreFriends)
}
</code></pre>
<p>To run operations in parallel, form an array of promises to be run, and pass it as the parameter to <code>Promise.all()</code>.  This returns a single promise for us to await, which will resolve once all of the operations have completed.</p>
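<p>The same pattern works for any array of promises; <code>Promise.all()</code> resolves to an array of results in the same order as the input array (a minimal sketch using plain values in place of API calls):</p>

```javascript
async function demoPromiseAll () {
  // Promise.resolve() wraps plain values, standing in for real async operations
  const promises = [Promise.resolve(1), Promise.resolve(2), Promise.resolve(3)]
  // Resolves once every promise has resolved, preserving the input order
  const results = await Promise.all(promises)
  console.log('demoPromiseAll', results)
  return results
}

demoPromiseAll()
```

<p>One caveat worth knowing: if any promise in the array rejects, the promise returned by <code>Promise.all()</code> rejects immediately with that error.</p>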
<h2 id="errorhandling">Error Handling</h2>
<p>There is, however, one major issue in asynchronous programming that we haven't addressed yet: error handling.  The bane of many codebases, asynchronous error handling often involves writing individual error handling callbacks for each operation.  Percolating errors to the top of the call stack can be complicated, and normally requires explicitly checking if an error was thrown at the beginning of every callback.  This approach is tedious, verbose and error-prone.  Furthermore, any exception thrown in a promise will fail silently if not properly caught, leading to &quot;invisible errors&quot; in codebases with incomplete error checking.</p>
<p>Let's go back through the examples and add error handling to each.  To test the error handling, we'll be calling an additional function, &quot;api.throwError()&quot;, before retrieving the user photo.</p>
<h4 id="attempt1promiseerrorcallbacks">Attempt 1 - Promise Error Callbacks</h4>
<p>Let's look at a worst-case scenario.</p>
<pre><code class="language-javascript">function callbackErrorHell () {
  const api = new Api()
  let user, friends
  api.getUser().then(function (returnedUser) {
    user = returnedUser
    api.getFriends(user.id).then(function (returnedFriends) {
      friends = returnedFriends
      api.throwError().then(function () {
        console.log('Error was not thrown')
        api.getPhoto(user.id).then(function (photo) {
          console.log('callbackErrorHell', { user, friends, photo })
        }, function (err) {
          console.error(err)
        })
      }, function (err) {
        console.error(err)
      })
    }, function (err) {
      console.error(err)
    })
  }, function (err) {
    console.error(err)
  })
}
</code></pre>
<p>This is just awful.  Besides being really long and ugly, the control flow is very unintuitive to follow since it flows from the outside in, instead of from top to bottom like normal, readable code. Awful. Let's move on.</p>
<h4 id="attempt2promisechaincatchmethod">Attempt 2 - Promise Chain &quot;Catch&quot; Method</h4>
<p>We can improve things a bit by using a combined Promise &quot;catch&quot; method.</p>
<pre><code class="language-javascript">function callbackErrorPromiseChain () {
  const api = new Api()
  let user, friends
  api.getUser()
    .then((returnedUser) =&gt; {
      user = returnedUser
      return api.getFriends(user.id)
    })
    .then((returnedFriends) =&gt; {
      friends = returnedFriends
      return api.throwError()
    })
    .then(() =&gt; {
      console.log('Error was not thrown')
      return api.getPhoto(user.id)
    })
    .then((photo) =&gt; {
      console.log('callbackErrorPromiseChain', { user, friends, photo })
    })
    .catch((err) =&gt; {
      console.error(err)
    })
}
</code></pre>
<p>This is certainly better; by leveraging a single catch function at the end of the promise chain, we can provide a single error handler for all of the operations. However, it's still a bit complex, and we are still forced to handle the asynchronous errors using a special callback instead of handling them the same way we would normal Javascript errors.</p>
<h4 id="attempt3normaltrycatchblock">Attempt 3 - Normal Try/Catch Block</h4>
<p>We can do better.</p>
<pre><code class="language-javascript">async function asyncAwaitTryCatch () {
  try {
    const api = new Api()
    const user = await api.getUser()
    const friends = await api.getFriends(user.id)

    await api.throwError()
    console.log('Error was not thrown')

    const photo = await api.getPhoto(user.id)
    console.log('async/await', { user, friends, photo })
  } catch (err) {
    console.error(err)
  }
}
</code></pre>
<p>Here, we've wrapped the entire operation within a normal try/catch block.  This way, we can throw and catch errors from synchronous code and asynchronous code in the exact same way. Much simpler.</p>
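<p>Because <code>await</code> surfaces promise rejections as ordinary thrown exceptions, you can also catch the failure of a single operation and substitute a fallback value (a minimal sketch; <code>mightFail</code> is hypothetical):</p>

```javascript
// Hypothetical operation that always rejects, for illustration
function mightFail () {
  return Promise.reject(new Error('Intentional Error'))
}

async function withFallback () {
  let value
  try {
    value = await mightFail()
  } catch (err) {
    value = 'default' // recover from this one operation only
  }
  return value
}

withFallback().then((value) => console.log(value)) // logs "default"
```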
<h2 id="composition">Composition</h2>
<p>I mentioned earlier that any function tagged with &quot;async&quot; actually returns a promise.  This allows us to really easily compose asynchronous control flows.</p>
<p>For instance, we can reconfigure the earlier example to return the user data instead of logging it.  Then we can retrieve the data by calling the async function as a promise.</p>
<pre><code class="language-javascript">async function getUserInfo () {
  const api = new Api()
  const user = await api.getUser()
  const friends = await api.getFriends(user.id)
  const photo = await api.getPhoto(user.id)
  return { user, friends, photo }
}

function promiseUserInfo () {
  getUserInfo().then(({ user, friends, photo }) =&gt; {
    console.log('promiseUserInfo', { user, friends, photo })
  })
}
</code></pre>
<br>
<p>Even better, we can use async/await syntax in the receiver function too, leading to a completely obvious, even trivial, block of asynchronous programming.</p>
<pre><code class="language-javascript">async function awaitUserInfo () {
  const { user, friends, photo } = await getUserInfo()
  console.log('awaitUserInfo', { user, friends, photo })
}
</code></pre>
<br>
<p>What if now we need to retrieve all of the data for the first 10 users?</p>
<pre><code class="language-javascript">async function getLotsOfUserData () {
  const users = []
  while (users.length &lt; 10) {
    users.push(await getUserInfo())
  }
  console.log('getLotsOfUserData', users)
}
</code></pre>
<br>
<p>How about in parallel? And with airtight error handling?</p>
<pre><code class="language-javascript">async function getLotsOfUserDataFaster () {
  try {
    // Array.from invokes getUserInfo() once per element; Array(10).fill(getUserInfo())
    // would call it only once and reuse that single promise ten times
    const userPromises = Array.from({ length: 10 }, () =&gt; getUserInfo())
    const users = await Promise.all(userPromises)
    console.log('getLotsOfUserDataFaster', users)
  } catch (err) {
    console.error(err)
  }
}
</code></pre>
<br>
<h2 id="conclusion">Conclusion</h2>
<p>With the rise of single-page javascript web apps and the widening adoption of NodeJS, handling concurrency gracefully is more important than ever for Javascript developers.  Async/await alleviates many of the bug-inducing control-flow issues that have plagued Javascript codebases for decades and is pretty much guaranteed to make any async code block significantly shorter, simpler, and more self-evident.  With near-universal support in mainstream browsers and NodeJS, this is the perfect time to integrate these techniques into your own coding practices and projects.</p>
<br>
<br>
<h4 style="text-align: center;">Join The Discussion on Reddit</h4>
<blockquote class="reddit-card"><a href="https://www.reddit.com/r/javascript/comments/6tdeys/asyncawait_will_make_your_code_simpler/?ref=share&ref_source=embed">Async/Await Will Make Your Code Simpler</a> from <a href="http://www.reddit.com/r/javascript">javascript</a></blockquote>
<blockquote class="reddit-card" data-card-created="1502680595"><a href="https://www.reddit.com/r/webdev/comments/6tfzvt/asyncawait_will_make_your_code_simpler/?ref=share&ref_source=embed">Async/Await Will Make Your Code Simpler</a> from <a href="http://www.reddit.com/r/webdev">webdev</a></blockquote>
<script async src="//embed.redditmedia.com/widgets/platform.js" charset="UTF-8"></script>
</div>]]></content:encoded></item><item><title><![CDATA[Would You Survive the Titanic? A Guide to Machine Learning in Python]]></title><description><![CDATA[A hands-on introduction, in Python, to using machine learning techniques for your own projects and data sets.]]></description><link>http://blog.patricktriest.com/titanic-machine-learning-in-python/</link><guid isPermaLink="false">598eaf93b7d6af1a6a795fce</guid><category><![CDATA[Python]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[Data Science]]></category><category><![CDATA[Guides]]></category><dc:creator><![CDATA[Patrick Triest]]></dc:creator><pubDate>Mon, 04 Jul 2016 15:00:00 GMT</pubDate><media:content url="https://cdn.patricktriest.com/blog/images/posts/titanic-ml/titanic.jpg" medium="image"/><content:encoded><![CDATA[<div class="kg-card-markdown"><blockquote>
<img src="https://cdn.patricktriest.com/blog/images/posts/titanic-ml/titanic.jpg" alt="Would You Survive the Titanic? A Guide to Machine Learning in Python"><p><em>I originally wrote this post for the SocialCops engineering blog,<br>
check it out here - <a href="https://blog.socialcops.com/engineering/machine-learning-python/">Would You Survive the Titanic? A Guide to Machine Learning in Python</a></em></p>
</blockquote>
<p>What if machines could learn?</p>
<p>This has been one of the most intriguing questions in science fiction and philosophy since the advent of machines. With modern technology, such questions are no longer bound to creative conjecture. Machine learning is all around us. From deciding which movie you might want to watch next on Netflix to predicting stock market trends, machine learning has a profound impact on how data is understood in the modern era.</p>
<p>This tutorial aims to give you an accessible introduction on how to use machine learning techniques for your projects and data sets. In roughly 20 minutes, you will learn how to use Python to apply different machine learning techniques — from decision trees to deep neural networks — to a sample data set. This is a practical, not a conceptual, introduction; to fully understand the capabilities of machine learning, I highly recommend that you seek out resources that explain the low-level implementations and theory of these techniques.</p>
<p>Our sample dataset: passengers of the RMS Titanic. We will use an open data set with data on the passengers aboard the infamous doomed sea voyage of 1912. By examining factors such as class, sex, and age, we will experiment with different machine learning algorithms and build a program that can predict whether a given passenger would have survived this disaster.</p>
<h2 id="settingupyourmachinelearninglaboratory">Setting Up Your Machine Learning Laboratory</h2>
<p>The best way to learn about machine learning is to follow along with this tutorial on your computer. To do this, you will need to install a few software packages if you do not have them yet:</p>
<ul>
<li>Python (version 3.4.2 was used for this tutorial): <a href="https://www.python.org">https://www.python.org</a></li>
<li>SciPy Ecosystem (NumPy, SciPy, Pandas, IPython, matplotlib): <a href="https://www.scipy.org">https://www.scipy.org</a></li>
<li>SciKit-Learn: <a href="http://scikit-learn.org/stable/">http://scikit-learn.org/stable/</a></li>
<li>TensorFlow: <a href="https://www.tensorflow.org">https://www.tensorflow.org</a></li>
</ul>
<p>There are multiple ways to install each of these packages. I recommend using the “pip” Python package manager, which will allow you to simply run “pip3 install &lt;package-name&gt;” to install each of the dependencies: <a href="https://pip.pypa.io/en/stable/">https://pip.pypa.io/en/stable/</a>.</p>
<p>For actually writing and running the code, I recommend using IPython (which will allow you to run modular blocks of code and immediately view the output values and data visualizations) along with the Jupyter Notebook as a graphical interface: <a href="https://jupyter.org">https://jupyter.org</a>.</p>
<p>You will also need the Titanic dataset that we will be analyzing. You can find it here: <a href="http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic3.xls">http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic3.xls</a>.</p>
<p>With all of the dependencies installed, simply run “jupyter notebook” on the command line, from the same directory as the titanic3.xls file, and you will be ready to get started.</p>
<h2 id="thedataatfirstglancewhosurvivedthetitanicandwhy">The Data at First Glance: Who Survived the Titanic and Why?</h2>
<p>First, import the required Python dependencies.</p>
<script src="https://gist.github.com/triestpa/3b384a15076aeb4ec9cc7bb8c5e494c7.js"></script>
<p>Note: This tutorial was written using TensorFlow version 0.8.0. Newer versions of TensorFlow use different import statements and names. If you’re on a newer version of TensorFlow, check out this Github issue or the latest TensorFlow Learn documentation for the newest import statements.</p>
<p>Once we have read the spreadsheet file into a Pandas dataframe (imagine a hyperpowered Excel table), we can peek at the first five rows of data using the head() command.</p>
<script src="https://gist.github.com/triestpa/63916ed9026f4d94d59453d53784703b.js"></script>
<p>The column heading variables have the following meanings:</p>
<ul>
<li><strong>survival</strong>: Survival (0 = no; 1 = yes)</li>
<li><strong>class</strong>: Passenger class (1 = first; 2 = second; 3 = third)</li>
<li><strong>name</strong>: Name</li>
<li><strong>sex</strong>: Sex</li>
<li><strong>age</strong>: Age</li>
<li><strong>sibsp</strong>: Number of siblings/spouses aboard</li>
<li><strong>parch</strong>: Number of parents/children aboard</li>
<li><strong>ticket</strong>: Ticket number</li>
<li><strong>fare</strong>: Passenger fare</li>
<li><strong>cabin</strong>: Cabin</li>
<li><strong>embarked</strong>: Port of embarkation (C = Cherbourg; Q = Queenstown; S = Southampton)</li>
<li><strong>boat</strong>: Lifeboat (if survived)</li>
<li><strong>body</strong>: Body number (if did not survive and body was recovered)</li>
</ul>
<p>Now that we have the data in a dataframe, we can begin an advanced analysis of the data using powerful single-line Pandas functions. First, let’s examine the overall chance of survival for a Titanic passenger.</p>
<script src="https://gist.github.com/triestpa/4c8a7694a2b7fee5633d99b2a421d5ef.js"></script>
<p>The calculation shows that only 38% of the passengers survived. Not the best odds. The reason for this massive loss of life is that the Titanic was only carrying 20 lifeboats, which was not nearly enough for the 1,317 passengers and 885 crew members aboard. It seems unlikely that all of the passengers would have had equal chances at survival, so we will continue breaking down the data to examine the social dynamics that determined who got a place on a lifeboat and who did not.</p>
<p>Social classes were heavily stratified in the early twentieth century. This was especially true on the Titanic, where the luxurious first-class areas were completely off limits to the middle-class passengers in second class, and especially to those who carried a third class “economy price” ticket. To get a view into the composition of each class, we can group data by class, and view the averages for each column:</p>
<script src="https://gist.github.com/triestpa/b939b78f9c6b37d82f91f72dc36b9185.js"></script>
<p>We can start drawing some interesting insights from this data. For instance, passengers in first class had a 62% chance of survival, compared to a 25.5% chance for those in 3rd class. Additionally, the lower classes generally consisted of younger people, and the ticket prices for first class were predictably much higher than those for second and third class. The average ticket price for first class (£87.5) is equivalent to $13,487 in 2016.</p>
<p>We can extend our statistical breakdown using the grouping function for both class and sex:</p>
<script src="https://gist.github.com/triestpa/7eebb009c3529d3cfb132bd495a8f6f6.js"></script>
<p>While the Titanic was sinking, the officers famously prioritized who was allowed in a lifeboat with the strict maritime tradition of evacuating women and children first. Our statistical results clearly reflect the first part of this policy as, across all classes, women were much more likely to survive than the men. We can also see that the women were younger than the men on average, were more likely to be traveling with family, and paid slightly more for their tickets.</p>
<p>The effectiveness of the second part of this “Women and children first” policy can be deduced by breaking down the survival rate by age.</p>
<script src="https://gist.github.com/triestpa/775c689998337c7afafa9fc7cfe2511c.js"></script>
<p>Here we can see that children were indeed the most likely age group to survive, although this percentage was still tragically below 60%.</p>
<h2 id="whymachinelearning">Why Machine Learning?</h2>
<p>With analysis, we can draw some fairly straightforward conclusions from this data — being a woman, being in 1st class, and being a child were all factors that could boost your chances of survival during this disaster.</p>
<p>Let’s say we wanted to write a program to predict whether a given passenger would survive the disaster. This could be done through an elaborate system of nested if-else statements with some sort of weighted scoring system, but such a program would be long, tedious to write, difficult to generalize, and would require extensive fine tuning.</p>
<p>This is where machine learning comes in: we will build a program that learns from the sample data to predict whether a given passenger would survive.</p>
<h2 id="preparingthedata">Preparing The Data</h2>
<p>Before we can feed our data set into a machine learning algorithm, we have to remove missing values and split it into training and test sets.</p>
<p>If we perform a count of each column, we will see that much of the data on certain fields is missing. Most machine learning algorithms will have a difficult time handling missing values, so we will need to make sure that each row has a value for each column.</p>
<script src="https://gist.github.com/triestpa/257c261111b03fede4e2580017a21727.js"></script>
<p>Most of the rows are missing values for “boat” and “cabin”, so we will remove these columns from the data frame. A large number of rows are also missing the “home.dest” field; here we fill the missing values with “NA”. A significant number of rows are also missing an age value. We have seen above that age could have a significant effect on survival chances, so we will have to drop all rows that are missing an age value. When we run the count command again, we can see that all remaining columns now contain the same number of values.</p>
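<p>The cleanup described above can be sketched in pandas (a minimal illustration on a few toy rows; only the column names are taken from the actual dataset):</p>

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Titanic data frame.
df = pd.DataFrame({
    "survived": [1, 0, 1, 0],
    "age": [29.0, np.nan, 2.0, 30.0],
    "boat": [None, None, "5", None],
    "cabin": [None, "C22", None, None],
    "home.dest": ["St Louis, MO", None, None, "New York, NY"],
})

df = df.drop(columns=["boat", "cabin"])          # mostly empty, drop entirely
df["home.dest"] = df["home.dest"].fillna("NA")   # fill missing destinations
df = df.dropna(subset=["age"])                   # age matters, drop rows lacking it

print(df.count())  # every remaining column now reports the same count
```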
<p>Now we need to format the remaining data in a way that our machine learning algorithms will accept.</p>
<script src="https://gist.github.com/triestpa/e10aff22cb9c2945142735dc0d315723.js"></script>
<p>The “sex” and “embarked” fields are both string values that correspond to categories (i.e., “Male” and “Female”), so we will run each through a preprocessor. This preprocessor will convert these strings into integer keys, making it easier for the classification algorithms to find patterns. For instance, “Female” and “Male” will be converted to 0 and 1 respectively. The “name”, “ticket”, and “home.dest” columns consist of non-categorical string values. These are difficult to use in a classification algorithm, so we will drop them from the data set.</p>
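<p>The encoding step might look like this with scikit-learn (toy values; the gist above operates on the full data frame):</p>

```python
import pandas as pd
from sklearn import preprocessing

df = pd.DataFrame({
    "sex": ["female", "male", "female", "male"],
    "embarked": ["S", "C", "S", "Q"],
    "name": ["Allison, Miss. L", "Astor, Col. JJ", "Abelseth, Mr. O", "Allison, Mr. H"],
})

# Convert each categorical string column into integer keys.
for col in ["sex", "embarked"]:
    df[col] = preprocessing.LabelEncoder().fit_transform(df[col])

# Non-categorical strings are hard to classify on, so drop them.
df = df.drop(columns=["name"])
print(df)  # "female" -> 0, "male" -> 1 (classes are sorted alphabetically)
```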
<script src="https://gist.github.com/triestpa/0bce50e357c6c5f19811b975c5f9b604.js"></script>
<p>Next, we separate the data set into two arrays: “X” containing all of the values for each row besides “survived”, and “y” containing only the “survived” value for that row. The classification algorithms will compare the attribute values of “X” to the corresponding values of “y” to detect patterns in how different attribute values tend to affect the survival of a passenger.</p>
<p>Finally, we break the “X” and “y” array into two parts each — a training set and a testing set. We will feed the training set into the classification algorithm to form a trained model. Once the model is formed, we will use it to classify the testing set, allowing us to determine the accuracy of the model. Here we have made an 80/20 split, such that 80% of the dataset will be used for training and 20% will be used for testing.</p>
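<p>These two steps amount to a column slice followed by scikit-learn’s <code>train_test_split</code>, sketched here on a toy array:</p>

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy rows: pclass, sex, age, fare, survived
data = np.array([
    [1, 0, 29.0, 211.34, 1],
    [3, 1, 25.0,   7.65, 0],
    [2, 0, 34.0,  13.00, 1],
    [3, 1, 19.0,   8.05, 0],
    [1, 1, 47.0,  52.00, 0],
])

X = data[:, :-1]  # every attribute except "survived"
y = data[:, -1]   # the "survived" labels

# Hold out 20% of the rows for testing, train on the remaining 80%.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
```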
<h2 id="classificationthefunpart">Classification – The Fun Part</h2>
<p>We will start off with a simple decision tree classifier. A decision tree examines one variable at a time and splits into one of two branches based on the result of that value, at which point it does the same for the next variable. A fantastic visual explanation of how decision trees work can be found here: <a href="http://www.r2d3.us/visual-intro-to-machine-learning-part-1/">http://www.r2d3.us/visual-intro-to-machine-learning-part-1/</a>.</p>
<p>This is what a trained decision tree for the Titanic dataset looks like if we set the maximum number of levels to 3:</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/titanic-ml/decision-tree.png" alt="Would You Survive the Titanic? A Guide to Machine Learning in Python"></p>
<p>The tree first splits by sex, and then by class, since it has learned during the training phase that these are the two most important features for determining survival. The dark blue boxes indicate passengers who are likely to survive, and the dark orange boxes represent passengers who are almost certainly doomed. Interestingly, after splitting by class, the main deciding factor determining the survival of women is the ticket fare that they paid, while the deciding factor for men is their age (with children being much more likely to survive).</p>
<p>To create this tree, we first initialize an instance of an untrained decision tree classifier. (Here we will set the maximum depth of the tree to 10). Next we “fit” this classifier to our training set, enabling it to learn about how different factors affect the survivability of a passenger. Now that the decision tree is ready, we can “score” it using our test data to determine how accurate it is.</p>
<script src="https://gist.github.com/triestpa/5858dc07caab1e33af10178fd1f236d5.js"></script>
<p>The resulting reading, 0.7703, means that the model correctly predicted the survival of 77% of the test set. Not bad for our first model!</p>
<p>If you are being an attentive, skeptical reader (as you should be), you might be thinking that the accuracy of the model could vary depending on which rows were selected for the training and test sets. We will get around this problem by using a shuffle validator.</p>
<script src="https://gist.github.com/triestpa/e326db921a5400428aeb33130fb3152b.js"></script>
<p>This shuffle validator applies the same random 80/20 split as before, but this time it generates 20 unique permutations of this split. By passing this shuffle validator as a parameter to the “cross_val_score” function, we can score our classifier against each of the different splits, and compute the average accuracy and standard deviation from the results.</p>
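<p>In scikit-learn terms, the shuffle validator looks like this (synthetic data stands in for the Titanic features, so the scores themselves are only illustrative):</p>

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification data in place of the Titanic features.
X, y = make_classification(n_samples=500, n_features=6, random_state=0)

clf = DecisionTreeClassifier(max_depth=10, random_state=0)

# 20 independent random 80/20 splits, each scored separately.
shuffle_validator = ShuffleSplit(n_splits=20, test_size=0.2, random_state=0)
scores = cross_val_score(clf, X, y, cv=shuffle_validator)

print("Accuracy: %0.4f (+/- %0.2f)" % (scores.mean(), scores.std()))
```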
<p>The result shows that our decision tree classifier has an overall accuracy of 77.34%, although it can go up to 80% and down to 75% depending on the training/test split. Using scikit-learn, we can easily test other machine learning algorithms using the exact same syntax.</p>
<script src="https://gist.github.com/triestpa/b6b3db3ac3424b664b59fbbf48d19859.js"></script>
<p>The “Random Forest” classification algorithm will create a multitude of (generally very poor) trees for the data set using different random subsets of the input variables, and will return whichever prediction was returned by the most trees. This helps to avoid “overfitting”, a problem that occurs when a model is so tightly fitted to arbitrary correlations in the training data that it performs poorly on test data.</p>
<p>The “Gradient Boosting” classifier will generate many weak, shallow prediction trees and will combine, or “boost”, them into a strong model. This model performs very well on our data set, but has the drawback of being relatively slow and difficult to optimize, as the model construction happens sequentially so it cannot be parallelized.</p>
<p>A “Voting” classifier can be used to apply multiple conceptually divergent classification models to the same data set and will return the majority vote from all of the classifiers. For instance, if the gradient boosting classifier predicts that a passenger will not survive, but the decision tree and random forest classifiers predict that they will live, the voting classifier will choose the latter.</p>
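<p>All three ensemble approaches share the same fit/score interface, so comparing them takes only a few lines (again on synthetic stand-in data):</p>

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=6, random_state=0)

dt = DecisionTreeClassifier(max_depth=10, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0)     # many randomized trees
gb = GradientBoostingClassifier(n_estimators=50, random_state=0)  # boosted shallow trees

# Majority vote across the three conceptually different models.
voting = VotingClassifier(estimators=[("dt", dt), ("rf", rf), ("gb", gb)])

scores = {name: cross_val_score(clf, X, y, cv=5).mean()
          for name, clf in [("tree", dt), ("forest", rf),
                            ("boost", gb), ("vote", voting)]}
print(scores)
```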
<p>This has been a very brief and non-technical overview of each technique, so I encourage you to learn more about the mathematical implementations of all of these algorithms to obtain a deeper understanding of their relative strengths and weaknesses. Many more classification algorithms are available “out-of-the-box” in scikit-learn and can be explored here: <a href="http://scikit-learn.org/stable/modules/ensemble">http://scikit-learn.org/stable/modules/ensemble</a>.</p>
<h2 id="computationalbrainsanintroductiontodeepneuralnetworks">Computational Brains — An Introduction to Deep Neural Networks</h2>
<p>Neural networks are a rapidly developing paradigm for information processing based loosely on how neurons in the brain process information. A neural network consists of multiple layers of nodes, where each node performs a unit of computation and passes the result onto the next node. Multiple nodes can pass inputs to a single node and vice versa.</p>
<p>The neural network also contains a set of weights, which can be refined over time as the network learns from sample data. The weights are used to describe and refine the connection strengths between nodes. For instance, in our Titanic data set, node connections transmitting the passenger sex and class will likely be weighted very heavily, since these are important for determining the survival of a passenger.</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/titanic-ml/nueral-net.png" alt="Would You Survive the Titanic? A Guide to Machine Learning in Python"></p>
<p>A Deep Neural Network (DNN) is a neural network that works not just by passing data between nodes, but by passing data between layers of nodes. Each layer of nodes is able to aggregate and recombine the outputs from the previous layer, allowing the network to gradually piece together and make sense of unstructured data (such as an image). Such networks can also be heavily optimized due to their modular nature, allowing the operations of each node layer to be parallelized en masse across multiple CPUs and even GPUs.</p>
<p>We have barely begun to skim the surface of explaining neural networks. For a more in depth explanation of the inner workings of DNNs, this is a good resource: <a href="http://deeplearning4j.org/neuralnet-overview.html">http://deeplearning4j.org/neuralnet-overview.html</a>.</p>
<p>This awesome tool allows you to visualize and modify an active deep neural network: <a href="http://playground.tensorflow.org">http://playground.tensorflow.org</a>.</p>
<p>The major advantage of neural networks over traditional machine learning techniques is their ability to find patterns in unstructured data (such as images or natural language). Training a deep neural network on the Titanic data set is total overkill, but it’s a cool technology to work with, so we’re going to do it anyway.</p>
<p>An emerging powerhouse in programming neural networks is an open source library from Google called TensorFlow. This library is the foundation for many of the most recent advances in machine learning, such as being used to train computer programs to create unique works of music and visual art (<a href="https://magenta.tensorflow.org/welcome-to-magenta">https://magenta.tensorflow.org/welcome-to-magenta</a>). The syntax for using TensorFlow is somewhat abstract, but there is a wrapper called “skflow” in the TensorFlow package that allows us to build deep neural networks using the now-familiar scikit-learn syntax.</p>
<script src="https://gist.github.com/triestpa/57eb4118ddff8350bef2f59b03a971e9.js"></script>
<p>Above, we have written the code to build a deep neural network classifier. The “hidden units” of the classifier represent the neural layers we described earlier, with the corresponding numbers representing the size of each layer.</p>
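<p>skflow has since been folded into TensorFlow proper, so as a stand-in for readers following along today, the same idea can be expressed with scikit-learn’s <code>MLPClassifier</code>: three hidden layers of 10, 20, and 10 nodes, mirroring the hidden-units list described above. This is a substitute sketch on synthetic data, not the article’s original code:</p>

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Three hidden layers of 10, 20, and 10 nodes.
dnn = MLPClassifier(hidden_layer_sizes=(10, 20, 10),
                    max_iter=1000, random_state=0)
dnn.fit(X_train, y_train)
print("Test accuracy:", dnn.score(X_test, y_test))
```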
<script src="https://gist.github.com/triestpa/9db250404adde1fbf9a12232e38fa7fc.js"></script>
<p>We can also define our own training model to pass to the TensorFlow estimator function (as seen above). Our defined model is very basic. For more advanced examples of how to work within this syntax, see the skflow documentation here: <a href="https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/skflow">https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/skflow</a>.</p>
<p>Despite the increased power and lengthier runtime of these neural network models, you will notice that the accuracy is still about the same as what we achieved using more traditional tree-based methods. The main advantage of neural networks — unsupervised learning of unstructured data — doesn’t necessarily lend itself well to our Titanic dataset, so this is not too surprising.</p>
<p>I still, however, think that running the passenger data of a 104-year-old shipwreck through a cutting-edge deep neural network is pretty cool.</p>
<h2 id="thesearenotjustdatapointstheyrepeople">These Are Not Just Data Points. They’re People.</h2>
<p>Given that the accuracy for all of our models is maxing out around 80%, it will be interesting to look at specific passengers for whom these classification algorithms are incorrect.</p>
<script src="https://gist.github.com/triestpa/0ccb42341d6f45a3acd456360d527b14.js"></script>
<p>The above code forms a test data set of the first 20 listed passengers for each class, and trains a deep neural network against the remaining data.</p>
<p>Once the model is trained we can use it to predict the survival of passengers in the test data set, and compare these to the known survival of each passenger using the original dataset.</p>
<script src="https://gist.github.com/triestpa/567a25e1cfae530e4cbfd095c6bd085c.js"></script>
<p>The above table shows all of the passengers in our test data set whose survival (or lack thereof) was incorrectly classified by the neural network model.</p>
<p>Sometimes when you are dealing with data sets like this, the human side of the story can get lost beneath the complicated math and statistical analysis. By examining passengers for whom our classification model was incorrect, we can begin to uncover some of the most fascinating, and sometimes tragic, stories of humans defying the odds.</p>
<p>For instance, the first three incorrectly classified passengers are all members of the Allison family, who perished even though the model predicted that they would survive. These first class passengers were very wealthy, as can be evidenced by their far-above-average ticket prices. For Bess (25) and Loraine (2) in particular, not surviving is very surprising, considering that we found earlier that over 96% of first class women lived through the disaster.</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/titanic-ml/family.jpg" alt="Would You Survive the Titanic? A Guide to Machine Learning in Python"><small style="text-align: center; display: block;" markdown="1">From left to right: Hudson (30), Bess (25), Trevor (11 months), and Loraine Allison (2).</small></p>
<p>So what happened? A surprising amount of information on each Titanic passenger is available online; it turns out that the Allison family was unable to find their youngest son Trevor and was unwilling to evacuate the ship without him. Tragically, Trevor was already safe in a lifeboat with his nurse and was the only member of the Allison family to survive the sinking.</p>
<p>Another interesting misclassification is John Jacob Astor, who perished in the disaster even though the model predicted he would survive. Astor was the wealthiest person on the Titanic, an impressive feat on a ship full of multimillionaire industrialists, railroad tycoons, and aristocrats. Given his immense wealth and influence, which the model may have deduced from his ticket fare (valued at over $35,000 in 2016), it seems likely that he would have been among the 35% of men in first class to survive. However, this was not the case: although his pregnant wife survived, John Jacob Astor’s body was recovered a week later, along with a gold watch, a diamond ring with three stones, and no less than $92,481 (2016 value) in cash.</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/titanic-ml/astor.jpg" alt="Would You Survive the Titanic? A Guide to Machine Learning in Python"><small style="text-align: center; display: block;" markdown="1">John Jacob Astor IV</small></p>
<p>On the other end of the spectrum is Olaus Jorgensen Abelseth, a 25-year-old Norwegian sailor. Abelseth, as a man in 3rd class, was not expected to survive by our classifier. Once the ship sank, however, he was able to stay alive by swimming for 20 minutes in the frigid North Atlantic water before joining other survivors on a waterlogged collapsible boat and rowing through the night. Abelseth got married three years later, settled down as a farmer in North Dakota, had 4 kids, and died in 1980 at the age of 94.</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/titanic-ml/olas.jpg" alt="Would You Survive the Titanic? A Guide to Machine Learning in Python"><small style="text-align: center; display: block;" markdown="1">Olaus Jorgensen Abelseth</small></p>
<p>Initially I was disappointed by the accuracy of our machine learning models maxing out at about 80% for this data set. It’s easy to forget that these data points each represent real people, each of whom found themselves stuck on a sinking ship without enough lifeboats. When we looked into the data points for which our model was wrong, we uncovered incredible stories of human nature driving people to defy their logical fate. It is important to never lose sight of the human element when analyzing this type of data set. This principle will be especially important going forward, as machine learning is increasingly applied to human data sets by organizations such as insurance companies, big banks, and law enforcement agencies.</p>
<h2 id="whatnext">What next?</h2>
<p>So there you have it — a primer for data analysis and machine learning in Python. From here, you can fine-tune the machine learning algorithms to achieve better accuracy on this data set, design your own neural networks using TensorFlow, discover more fascinating stories of passengers whose survival does not match the model, and apply all of these techniques to any other data set. (Check out this Game of Thrones dataset: <a href="https://www.kaggle.com/mylesoneill/game-of-thrones">https://www.kaggle.com/mylesoneill/game-of-thrones</a>). When it comes to machine learning, the possibilities are endless and the opportunities are titanic.</p>
</div>]]></content:encoded></item><item><title><![CDATA[Tracking Air-Pollution In Delhi Using Low-Cost IoT Technology]]></title><description><![CDATA[Monitoring air pollution throughout Delhi using low-cost IoT devices attached to to auto-rickshaws.]]></description><link>http://blog.patricktriest.com/how-we-built-our-iot-devices-to-track-air-pollution-in-delhi/</link><guid isPermaLink="false">598eaf94b7d6af1a6a795fd1</guid><category><![CDATA[Internet Of Things]]></category><category><![CDATA[Guides]]></category><dc:creator><![CDATA[Patrick Triest]]></dc:creator><pubDate>Fri, 29 Apr 2016 15:00:00 GMT</pubDate><media:content url="https://cdn.patricktriest.com/blog/images/posts/iot-delhi-pollution/delhi.jpg" medium="image"/><content:encoded><![CDATA[<div class="kg-card-markdown"><blockquote>
<img src="https://cdn.patricktriest.com/blog/images/posts/iot-delhi-pollution/delhi.jpg" alt="Tracking Air-Pollution In Delhi Using Low-Cost IoT Technology"><p><em>This blog post is a synthesis of two articles I originally wrote for the SocialCops engineering blog. Check them out here: <a href="https://blog.socialcops.com/engineering/tracking-air-pollution-in-delhi/">Tracking Air Pollution in Delhi, One Auto Rickshaw at a Time</a> and <a href="http://blog.socialcops.com/open-data/built-iot-devices-track-air-pollution-delhi">How We Built Our IoT Devices to Track Air Pollution in Delhi</a>.</em></p>
</blockquote>
<h2 id="themostpollutedcityonearth">The Most Polluted City On Earth</h2>
<p>Delhi is one of the largest and fastest growing cities on the planet. Home to over 24 million people, the city faces a unifying challenge that is also one of the most basic necessities of life: clean air. Delhi was recently granted the dubious title of the <a href="http://indiatoday.intoday.in/story/delhi-on-top-of-air-pollution-list-according-to-who-report/1/650633.html">“World’s Most Polluted City”</a> by the World Health Organization.</p>
<p>Air pollution has major health consequences for those living in heavily polluted areas. High levels of particulates and hazardous gases can increase the risk of heart disease, asthma, bronchitis, cancer, and more. The WHO estimates that there are <a href="http://www.who.int/mediacentre/news/releases/2014/air-pollution/en/">7 million</a> premature deaths linked to air pollution every year, and the Union Environment Ministry estimates that <a href="http://www.ndtv.com/delhi-news/80-people-die-in-delhi-everyday-from-air-pollution-parliament-is-told-784541">80 people die every day</a> from air pollution in Delhi.</p>
<h2 id="howitstarted">How it Started</h2>
<p>The first week I joined SocialCops, I was given a “hack week” project to present to the rest of the company. My project was simple and open-ended: “build something cool”. As a newcomer to Delhi, I was concerned by the infamous air pollution continually hovering over the city. I decided to build two IoT air pollution sensing devices (one for our office balcony and one for inside our office) to determine how protected we were from the pollution while inside. Over the following weeks, this simple internal hack turned into a much more ambitious project — monitoring air pollution throughout Delhi by attaching these IoT devices to auto rickshaws.</p>
<h2 id="airpollutionsensorsautorickshaws">Air Pollution Sensors + Auto Rickshaws</h2>
<p>Traditional particulate matter measurement devices use very advanced scales and filters to measure the exact mass of ambient particles below a certain size. As such, these devices are prohibitively expensive (<a href="https://www.google.co.in/url?sa=t&amp;rct=j&amp;q=&amp;esrc=s&amp;source=web&amp;cd=1&amp;cad=rja&amp;uact=8&amp;ved=0ahUKEwjD1Oevu7PMAhWEFZQKHTgXD_4QFggbMAA&amp;url=http%3A%2F%2Fwww.cpcb.nic.in%2Fone.ppt&amp;usg=AFQjCNHx3EQEO105gS3m4Nb4QxDfU4dw9w&amp;bvm=bv.120853415,d.dGo">₹1.1 crore</a> or $165,000) and fixed in a single location.</p>
<p>We took a different approach.</p>
<p><img src="https://storage.googleapis.com/cdn.patricktriest.com/blog/images/posts/iot-delhi-pollution/image01.jpg" alt="Tracking Air-Pollution In Delhi Using Low-Cost IoT Technology"><small style="display: block; text-align: center;">Autorickshaws at the Saket Metro Station</small></p>
<p>Auto rickshaws are a very popular source of transportation in Delhi. Popularly called “autos”, these vehicles can be found all over Delhi at all times of day and night, making them an ideal place to deploy our sensors. Unlike traditional air quality readings that sample from one location repeatedly, the sensors deployed for this project sample data for air pollution in Delhi directly from traffic jams, markets, and residential neighborhoods all over the city.</p>
<h2 id="theinternetofairpollutionmonitoringthings">The Internet of (Air Pollution Monitoring) Things</h2>
<p>We developed a custom internet-connected device that attaches to autos and takes pollution readings. Each device contains an airborne particle sensor, a GPS unit, and a cellular antenna to send the data over 2G networks in real time; we were able to construct each device for about ₹6,500 ($100). The greater mobility and reduced cost of these devices come at a price: the particle sensor we are using is less accurate than those used by traditional pollution monitors, because it estimates the number of airborne particles from the opacity of the ambient air instead of measuring the precise mass of the collected particles.</p>
<p>Our solution for this drop in precision is to increase the sample size. A lot.</p>
<p>Each device takes two readings per minute. With five devices deployed, the pollution reading for each hour is an average of 600 data points, and the AQI for each day is calculated from almost 15,000 distinct readings. The up-time for each device is not 100%, as the auto-rickshaw drivers generally drive for 12 hours per day, and we are not always able to transfer the device to another auto driver between shifts. However, the resulting data has still proven sufficient for our experimental purposes.</p>
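<p>The sample-size arithmetic above works out as a quick back-of-the-envelope calculation:</p>

```python
readings_per_minute = 2
devices = 5

per_hour = readings_per_minute * 60 * devices  # 600 data points per hour
per_day = per_hour * 24                        # 14,400 readings per day

print(per_hour, per_day)  # 600 14400 -- i.e. "almost 15,000" per day
```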
<p><img src="https://storage.googleapis.com/cdn.patricktriest.com/blog/images/posts/iot-delhi-pollution/image00.png" alt="Tracking Air-Pollution In Delhi Using Low-Cost IoT Technology"><small style="display: block; text-align: center;">SocialCops Delhi Air Pollution Dashboard</small></p>
<h2 id="harddecisionsoverhardware">Hard Decisions Over Hardware</h2>
<p>We chose the hardware for this pilot project based on ease of construction, cost, and hackability. There were a few decisions to confront first: what data to collect, how to collect it, and how to retrieve it for further analysis. We decided to focus on collecting data for particulate matter (PM) concentration since this is the primary pollutant in Delhi. There are other factors that influence air pollution, such as humidity, but we can pull most of this data from established external sources and integrate it with our primary data.</p>
<p>To track devices and analyze data in real time, we decided to send data via 2G networks using a GPRS (General Packet Radio Service) antenna and SIM card. We decided to go with the LinkIt One development board for the initial deployment, due to its compatibility with the Arduino IDE and firmware libraries and its included GPS and GPRS antennas. (We originally intended to also use its included battery but decided not to, as explained later in this post.)</p>
<p>For the pollution-sensing module, we decided to use the Shinyei PPD42NS because of its low cost and general dependability and durability. This sensor measures the ambient air opacity using an LED, lens, and photodiode. The attached microcontroller can read this opacity and calculate the number of particles per .01 cubic feet by reading the LPO (Low Pulse Occupancy) outputted by the sensor.</p>
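<p>The conversion from Low Pulse Occupancy to a particle count is commonly done with a cubic fit to the curve in the PPD42NS datasheet. The sketch below uses the widely circulated polynomial coefficients from sample Arduino code for this sensor; treat them as an assumption rather than the exact values in our firmware:</p>

```python
def particles_per_001cf(low_pulse_us, sample_ms=30_000):
    """Convert Low Pulse Occupancy time to particles per 0.01 cubic foot.

    low_pulse_us -- total microseconds the sensor output was LOW
    sample_ms    -- length of the sampling window, in milliseconds
    """
    ratio = low_pulse_us / (sample_ms * 10.0)  # LPO time as a percentage
    # Cubic fit to the datasheet characteristics curve (assumed coefficients).
    return 1.1 * ratio**3 - 3.8 * ratio**2 + 520 * ratio + 0.62

# 10% occupancy over a 30-second sample window:
print(particles_per_001cf(3_000_000))
```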
<p><img src="https://cdn.patricktriest.com/blog/images/posts/iot-delhi-pollution/image02.jpg" alt="Tracking Air-Pollution In Delhi Using Low-Cost IoT Technology"><small style="display: block; text-align: center">The prototype hardware, mid-assembly</small></p>
<p>This sensor module is relatively imprecise, as are all dust sensors in this same price range. That’s why one of our main goals with this experiment is to determine whether aggregating a large enough number of readings from this type of sensor can provide pollution data that is as (or more) accurate than the data from traditional air pollution stations. The stations cost ₹1.1 crore ($165,000) each. The final cost of our hardware package, including the case and external power bank, was about ₹6,500 ($100).</p>
<h2 id="firmingupthefirmware">Firming up the Firmware</h2>
<p>The firmware for this experiment was written in C/C++ using the Arduino IDE, which the LinkIt One is designed to be directly compatible with. The basic flow of the firmware is simple: sample data from the pollution sensor for 30 seconds, calculate the particles per .01 cubic foot from these values, retrieve the GPS location, upload the data to the server, and repeat.</p>
<p>One scenario we needed to prepare for was the IoT device being temporarily unable to connect to the cellular data network. To account for this, we implemented a caching system — if the device can’t connect to the server, it logs current readings to its local 10 MB of flash memory. Every 30 seconds, the device takes a reading and attempts to upload that data to the server. If it is able to connect, then it also scans the local filesystem for any cached data; if there is locally stored data, it is uploaded and deleted from the local storage. This feature allows the device to operate virtually anywhere, regardless of connectivity, and also provides interesting insights into cellular dark zones in Delhi.</p>
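<p>The read-upload-cache loop can be summarized in pseudocode (sketched as Python for readability; the real firmware is C/C++ on the LinkIt One, and the helper functions here are hypothetical stand-ins):</p>

```python
import json
import time

cache = []  # stands in for files on the device's 10 MB flash storage

def take_reading():
    # Hypothetical stand-in for 30 seconds of dust-sensor + GPS sampling.
    return {"pm": 312.5, "lat": 28.5245, "lng": 77.2066, "ts": time.time()}

def try_upload(payload):
    # Hypothetical stand-in for the GPRS POST; returns False in a dark zone.
    return True

def loop_once():
    reading = json.dumps(take_reading())
    if try_upload(reading):
        # Connected: flush any readings cached while offline, oldest first.
        while cache and try_upload(cache[0]):
            cache.pop(0)
    else:
        cache.append(reading)  # offline: log the reading to local storage

loop_once()
print(len(cache))  # stays at 0 while connectivity holds
```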
<p>With the firmware functionality complete, one of the immediate problems was that the device would occasionally crash while uploading readings. Unlike debugging a more traditional software program, there was no stack-trace to look through to find the cause. The only clue we had was that the light on the microcontroller would turn from green to red, and the device would stop uploading data. Like most microcontrollers, the LinkIt One has very limited SRAM, leading us to suspect that the crash may be due to an out-of-memory error. The first optimization we made was using the F() macro to cache constant strings in flash memory during compilation instead of storing them in virtual memory at runtime. This optimization simply required replacing, for example, Serial.println(“string”) with Serial.println(F(“string”)).</p>
<p>This optimization was a good practice, but it still did not solve the device crashes. To further optimize the memory usage, we focused on the largest variable being stored in virtual memory during runtime: the JSON string storing the data to be uploaded to the server. By allocating memory more carefully for this string and ensuring that only one instance of this string existed in SRAM at any time, we were able to close any remaining memory leaks and solve the problem of unpredictable device crashes.</p>
<h2 id="backendandsecurity">Backend and Security</h2>
<p>To store the data being uploaded by each IoT device, we built a backend and API using Node.js, Express, and MongoDB. Beyond standard server-side security protocols (cryptographically secure login, restrictive firewalls, rate limiting, etc.), it was also important to build endpoint-based authentication and field verification for incoming device data.</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/iot-delhi-pollution/image01.jpg" alt="Tracking Air-Pollution In Delhi Using Low-Cost IoT Technology"><small style="display: block; text-align: center">Autorickshaws at the Saket Metro station.</small></p>
<p>A unique cryptographically secure key value is hard coded into the firmware for each IoT device. When a reading is uploaded to the server, this key value is included along with the device ID and authenticated against a corresponding key value on the server for that device. This technique of hard coding a password directly into the firmware provides a high degree of security that ensures our database is not corrupted with unauthorized data.</p>
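<p>Server-side, that check amounts to looking up the key stored for the device ID and doing a constant-time comparison. A minimal sketch (the real backend is Node.js/Express; the device IDs and key values below are placeholders):</p>

```python
import hmac

# Per-device keys as stored on the server (placeholder values).
DEVICE_KEYS = {"auto-01": "placeholder-device-key-01"}

def authenticate(device_id, uploaded_key):
    expected = DEVICE_KEYS.get(device_id)
    if expected is None:
        return False
    # compare_digest avoids leaking key bytes through timing differences.
    return hmac.compare_digest(expected, uploaded_key)

print(authenticate("auto-01", "placeholder-device-key-01"))  # True
print(authenticate("auto-01", "wrong-key"))                  # False
```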
<h2 id="powerproblems">Power Problems</h2>
<p>One of our major obstacles was figuring out how to provide enough power for the device to ensure that it would transmit data 24/7. Since the device uses GPS and GPRS antennas, and the PPD42NS dust sensor requires constant back-to-back readings to compute an accurate value, the device is power-hungry for a microcontroller-based project.</p>
<p>The LinkIt One comes with a rechargeable 1000 mAh battery. In our testing, this battery was only able to power the device for about three hours of operation. Even worse, the battery often had difficulty re-charging or holding a charge once it had been fully discharged, leading us to believe that the battery was becoming over-discharged and damaged by being run for too long. Further compounding this problem, the LinkIt One does not have the capability to programmatically trigger a shutdown, making it impossible to prevent this damage from occurring.</p>
<p>Having discarded the included battery as an option, we began testing the IoT device using low-cost mobile power banks designed for smartphones. These power banks generally come in capacities between 3000 mAh and 15,000 mAh (1-5x an average smartphone battery) and can be purchased for under $20 each. One battery we tested included a solar panel for recharging, but unfortunately, the solar panel wasn’t able to recharge the battery quickly enough. We ended up settling on a reputable 10,000 mAh rechargeable battery, which can run the device for 33 hours straight.</p>
<h2 id="brawnoverbeauty">Brawn Over Beauty</h2>
<p>Delhi is not a forgiving environment for hardware devices, especially when they are mounted in auto rickshaws in the height of summer when the temperatures regularly top 40℃ (104℉) or 45℃ (113℉) during heat waves. During our initial prototyping phase, the devices were encased within cardboard boxes, which made it very easy to quickly adjust the component placement and encasing structure (wire hole locations, etc).</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/iot-delhi-pollution/image03.jpg" alt="Tracking Air-Pollution In Delhi Using Low-Cost IoT Technology"><small style="display: block; text-align: center">One of the devices after being deployed for a week.</small></p>
<p>One order of 100 x 100 x 50 mm enclosure boxes and a power drill later, we assembled our new pilot encasings. Protected by white hard plastic with the dust sensor component mounted externally for unrestricted access to ambient air, the device is not pretty. It is, however, durable enough to survive extremely demanding operating conditions, and looks rough enough to reduce the risk of theft associated with shiny new electronics.</p>
<h2 id="fromthelabtothestreets">From the Lab to the Streets</h2>
<p>Beyond the core technology, one of the most difficult aspects of this pilot project was the logistical challenge of actually deploying the devices on auto rickshaws. Our priorities for the deployment were twofold: to keep the device hardware safe, and to keep the IoT devices transmitting data 24/7 for the duration of the experiment.</p>
<p>Recruiting drivers was easy. We simply went down to the nearby metro station and asked the auto drivers whether they would be interested in participating in the pilot project. By paying a modest daily and weekly stipend, we give the drivers a supplemental source of income and a financial incentive to keep the device safe. Since the battery only lasts 33 hours, the drivers return to our office every day to collect their pay and swap their battery for a fully charged one.</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/iot-delhi-pollution/image00.jpg" alt="Tracking Air-Pollution In Delhi Using Low-Cost IoT Technology"><small style="display: block; text-align: center">The SocialCops team preparing the first device for deployment.</small></p>
<p>To keep track of the devices and the incoming data, we built an administrative dashboard. This dashboard enables us to view incoming data, as well as a live map of device locations. Since we have drivers’ phone numbers, we can contact drivers if we see a device malfunctioning on the dashboard.</p>
<h2 id="movingforwardagainstairpollutionindelhi">Moving Forward Against Air Pollution in Delhi</h2>
<p>This experiment could be run at a greater scale in the future, with 100 sensors distributed across a city, for example. The base cost of the required hardware would still be considerably less than acquiring the traditional equipment used to measure air pollution. With 100 sensors deployed, the average pollution level for each day could be calculated from over 250,000 individual readings, and the sensors could cover a much wider area of the city.</p>
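<p>To see where a number like "over 250,000 readings" comes from, here's a quick sketch. The per-sensor reading interval here (one reading roughly every 30 seconds) is an assumption for illustration, not a published spec of the deployment:</p>

```javascript
// Estimating daily readings for a 100-sensor deployment.
// The 30-second reading interval is an assumption for illustration.
const sensors = 100;
const readingIntervalSeconds = 30;

const readingsPerSensorPerDay = (24 * 60 * 60) / readingIntervalSeconds; // 2880
const totalPerDay = sensors * readingsPerSensorPerDay;

console.log(totalPerDay); // 288000, comfortably over 250,000
```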
<p>For the next iteration of these pollution sensors, we already have a few concrete goals in mind. Our first priority is creating a sustainable and scalable system for powering and managing the IoT devices while they are deployed. Replacing five batteries per day is manageable for this deployment, but we want to scale up to having 100+ devices on the streets at the same time. One option is powering the devices from the autos themselves, the same way you could charge a cellphone in a car. The complication here is that auto rickshaws are generally turned off while the drivers are parked and waiting for customers. We would still need to include a power bank that (unlike the current power solution) supports pass-through charging to power the device and charge the battery simultaneously.</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/iot-delhi-pollution/map2.png" alt="Tracking Air-Pollution In Delhi Using Low-Cost IoT Technology"></p>
<p>Another goal is to research alternative carriers for the devices, such as buses and cabs. Minimizing the recurring costs (the drivers’ stipends) will be essential in forming a sustainable system for deploying these devices. Finally, it would be best to move away from DIY-focused prototyping hardware such as the LinkIt One and form partnerships with hardware suppliers to produce and encase each device at scale, with more standardization and lower variable costs.</p>
<p>New technological capabilities enabled by the Internet of Things have the potential to transform how we assess and alleviate some of the world’s most pressing problems. By further developing and refining these low-cost air pollution sensors, we hope to establish a sustainable model for collecting relevant and detailed public health data in hard-to-reach areas.</p>
</div>]]></content:encoded></item><item><title><![CDATA[The Internet of Coffee:  Building a Wifi Coffee Maker]]></title><description><![CDATA[IoT technology can be utilized for some very important causes such as  energy conservation, identity protection, urban monitoring, and…making coffee.]]></description><link>http://blog.patricktriest.com/the-internet-of-coffee-building-a-wifi-coffee-maker/</link><guid isPermaLink="false">598eaf94b7d6af1a6a795fd0</guid><category><![CDATA[Internet Of Things]]></category><category><![CDATA[Guides]]></category><dc:creator><![CDATA[Patrick Triest]]></dc:creator><pubDate>Tue, 15 Mar 2016 15:00:00 GMT</pubDate><media:content url="https://cdn.patricktriest.com/blog/images/posts/wifi-coffee/coffee.jpg" medium="image"/><content:encoded><![CDATA[<div class="kg-card-markdown"><blockquote>
<img src="https://cdn.patricktriest.com/blog/images/posts/wifi-coffee/coffee.jpg" alt="The Internet of Coffee:  Building a Wifi Coffee Maker"><p><em>I originally wrote this post for the SocialCops engineering blog,<br>
check it out here - <a href="https://blog.socialcops.com/engineering/our-experience-building-wifi-coffee-maker/">The Internet of Coffee:  Building a Wifi Coffee Maker</a></em></p>
</blockquote>
<p>Here at SocialCops, we’ve begun using Internet-of-Things technology to research solutions for some of the world’s most pressing problems: monitoring pollution, conserving energy, protecting against identity fraud, and…making coffee.</p>
<p>“[If we could] turn on the coffee machine via Slack, we’d be set,” Christine wistfully wrote one afternoon on our #coffee Slack channel. Since we're located in Delhi, good chai is readily available to us day and night, but sometimes a good cup of coffee in the morning (and afternoon, and evening) is required to keep us going. Starting a fresh brew during our commutes and having a full pot waiting at the office when we arrive sounds like a fantastical dream, but here at SocialCops we thrive on turning these kinds of dreams into reality.</p>
<h2 id="settingupourwificoffeemachine">Setting Up Our WiFi Coffee Machine</h2>
<p>The basic hardware setup was simple: we got a WiFi-enabled Particle Photon microcontroller and hooked it up to a 5V electrical relay to control the power supply to the coffee machine.</p>
<p><em>Warning: this project deals with dangerous high-voltage electrical currents, so NEVER try to follow these steps if any of the components are plugged in, or if you are unsure of what you are doing.</em></p>
<p>First we cut an extension cord in half, connected the “hot wire” from each half to the relay, and spliced the ground and neutral wires from the two sides back together.</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/wifi-coffee/image04.jpg" alt="The Internet of Coffee:  Building a Wifi Coffee Maker"><small style="display:block;text-align:center;">Relay Wiring.</small></p>
<p>Then we hooked our microcontroller up to the relay: the D1 pin on the Photon communicates with the IN1 port on the relay, the VIN pin on the Photon sends 5V power to the VCC port on the relay, and the GND on the relay connects to GND on the Photon to complete the circuit.</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/wifi-coffee/image02.jpg" alt="The Internet of Coffee:  Building a Wifi Coffee Maker"><small style="display:block;text-align:center;">Connecting the Photon to the relay.</small></p>
<p>Finally, we packaged the Photon and relay together in a small plastic enclosure and cut holes for the power cords in the sides of the container. By plugging one end of the extension cord into the wall and the other end into the coffee machine, we could now control power to the coffee machine by sending HTTP requests over WiFi to the Particle Photon microcontroller!</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/wifi-coffee/image01.jpg" alt="The Internet of Coffee:  Building a Wifi Coffee Maker"></p>
<p>There were a few extra wires in this setup, resulting in too much clutter on our kitchen counter, so we packaged the whole setup into a larger cardboard box. Taking advantage of the extra area on top of the box, we gave CoffeeBot a face, complete with LEDs for eyes to monitor the hardware status.</p>
<p><img src="https://cdn.patricktriest.com/blog/images/posts/wifi-coffee/image00.jpg" alt="The Internet of Coffee:  Building a Wifi Coffee Maker"></p>
<h2 id="entercoffeebot">Enter CoffeeBot</h2>
<p>Our team uses Slack for online communication; one of Slack’s many features is the ability to create Slackbots, or automated users that can respond to chat messages and carry out advanced tasks.</p>
<p>Using BotKit, a Node.js module for programming Slackbot behaviors, we built CoffeeBot. CoffeeBot is capable of receiving naturally worded messages on Slack, deciphering them, and sending the corresponding commands to the coffee machine.</p>
<p>For instance:<br>
<img src="https://cdn.patricktriest.com/blog/images/posts/wifi-coffee/image03.png" alt="The Internet of Coffee:  Building a Wifi Coffee Maker"></p>
<p>Other features of CoffeeBot include:</p>
<ul>
<li>Getting the status of the machine and how long ago the last brew was started</li>
<li>Posting to Slack when the coffee’s ready</li>
<li>Automatically switching off the coffee machine after an hour to save energy</li>
<li>And a few other secret behaviors</li>
</ul>
<h2 id="caffeinatedcontinuance">Caffeinated Continuance</h2>
<p>Now every morning before leaving for work, we can start the coffee machine directly from Slack! In the future we want to add more advanced features, such as automatically loading coffee grounds and water, and measuring how many cups of coffee the office consumes per week. Someday we might even have a robot to deliver the fresh coffee directly to our desks, which admittedly sounds like a crazy and fantastical dream right now.</p>
<p>But you know how we feel about fantastical dreams.</p>
</div>]]></content:encoded></item></channel></rss>