<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Andrew Nisbet - Web Developer</title>
    <description>Data crawling, analytics, and visualisation.</description>
    <link>http://localhost:4000/</link>
    <atom:link href="http://localhost:4000/feed.xml" rel="self" type="application/rss+xml"/>
    <pubDate>Sun, 15 Jan 2017 10:10:09 -0800</pubDate>
    <lastBuildDate>Sun, 15 Jan 2017 10:10:09 -0800</lastBuildDate>
    <generator>Jekyll v3.3.0</generator>
    
      <item>
        <title>Choosing a Suburb by Bike Commute Time</title>
        <description>&lt;p&gt;I recently moved to a new city, and I'm looking to live close enough to work that I can bike. Although I can eyeball the straight-line commute distance for a potential house on a map, the time you'd actually spend biking is often quite different: dedicated bike paths are fast to ride, bypassing stop signs and traffic lights, while airports and highways can force time-consuming detours.&lt;/p&gt;

&lt;p&gt;So I put together this map overlaying contours of equal commute time onto the streets and suburbs around Mountain View.&lt;/p&gt;
&lt;!-- break --&gt;





&lt;div class=&quot;wide-container&quot;&gt;
	&lt;div id=&quot;map-contour&quot; class=&quot;map&quot; style=&quot;height: 600px;&quot;&gt;&lt;/div&gt;
    &lt;div class=&quot;points-legend&quot; style=&quot;width: 500px; margin-left: auto; margin-right: auto;&quot;&gt;
        &lt;div class=&quot;legend-entry&quot;&gt;&lt;span class=&quot;legend-color&quot; style=&quot;color:#FF18A7&quot;&gt;━&lt;/span&gt; &lt;span class=&quot;legend-unit&quot;&gt;15 minutes travel time&lt;/span&gt;&lt;/div&gt;
        &lt;div class=&quot;legend-entry&quot;&gt;&lt;span class=&quot;legend-color&quot; style=&quot;color:#FF18A7&quot;&gt;━&lt;/span&gt; &lt;span class=&quot;legend-unit&quot;&gt;30&lt;/span&gt;&lt;/div&gt;
        &lt;div class=&quot;legend-entry&quot;&gt;&lt;span class=&quot;legend-color&quot; style=&quot;color:#FF18A7&quot;&gt;━&lt;/span&gt; &lt;span class=&quot;legend-unit&quot;&gt;45&lt;/span&gt;&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;There are a few neat things that stand out:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;I can live further away if I go south or northwest. This is probably because of bike-friendly routes: the Stevens Creek Trail heads directly south from work, and there are a couple of bike routes from Mountain View to Palo Alto with barriers to cars.&lt;/li&gt;
	&lt;li&gt;It takes 15 minutes to bike to downtown Mountain View, even though it's less than 2km in a straight line. Within that distance there are two major highways as well as train tracks for both Caltrain and VTA, all with limited crossings for cyclists.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The code for this project is on &lt;a href=&quot;https://github.com/ajnisbet/biking-to-work&quot;&gt;GitHub&lt;/a&gt;. My approach was to build a grid of points on the roads around work, compute the bike time for each one, and take contours of the resulting data. &lt;/p&gt;


&lt;h2&gt;Map Data&lt;/h2&gt;

&lt;p&gt;I needed two different geographical data sources:&lt;/p&gt;
&lt;ul&gt;
    &lt;li&gt;Directions data, for querying the real travel time between two points.&lt;/li&gt;
    &lt;li&gt;Map data, for finding valid road points to query.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Google has an API for this sort of thing, but the number of requests is limited and you don't have any control over how the travel times are calculated.&lt;/p&gt;

&lt;p&gt;Instead I used &lt;a href=&quot;https://github.com/Project-OSRM/osrm-backend&quot;&gt;Project OSRM&lt;/a&gt;, an open source routing project that uses &lt;a href=&quot;https://www.openstreetmap.org/about&quot;&gt;OpenStreetMap&lt;/a&gt; data to give travel directions. I pulled all my road bike journeys from Strava and used the average speed for the bike travel times. I also assigned a large penalty for intersections, as I've found that many traffic lights around here don't recognise bikes and have long cycle times.&lt;/p&gt;

&lt;h2&gt;Building a Grid&lt;/h2&gt;

&lt;p&gt;I used a square grid of uniformly spaced latitude and longitude values. Because degrees don't map uniformly to distance on the Earth's surface, such a grid has two errors (which unhelpfully work in the same direction, so they don't cancel out):&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;Longitude values are closer together at higher latitudes.&lt;/li&gt;
	&lt;li&gt;The Earth's radius gets smaller at high latitudes, making longitude values even closer together.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Fortunately, the difference doesn't matter much for this project: it works out to about 0.5% over a 40km grid.&lt;/p&gt;
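&lt;p&gt;The shrinkage can be estimated with a spherical-Earth sketch (which captures the first effect but not the flattening); the latitude bounds here are my assumed values spanning roughly 40km around Mountain View:&lt;/p&gt;

```python
import math

# Assumed latitude bounds spanning roughly 40km north-south near Mountain View.
lat_south, lat_north = 37.20, 37.56

def metres_per_degree_lon(lat_deg):
    # On a sphere, a degree of longitude shrinks with the cosine of latitude.
    earth_radius_m = 6371000.0
    return math.radians(1) * earth_radius_m * math.cos(math.radians(lat_deg))

shrink = 1 - metres_per_degree_lon(lat_north) / metres_per_degree_lon(lat_south)
```

&lt;p&gt;which comes out to just under 0.5%.&lt;/p&gt;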

&lt;p&gt;A bigger problem is that when you pick a point randomly on a map, there's a good chance it won't be somewhere you can bike to. Silicon Valley is full of stuff like buildings, airports, and ponds. So I fed the grid into OSRM and replaced each point with the nearest valid biking location. This mostly worked, though I had to manually tidy up lots of maintenance tracks around the shoreline that are incorrectly tagged as public access. My next project is to figure out how to contribute to OpenStreetMap.&lt;/p&gt;
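&lt;p&gt;The snapping step can be sketched against OSRM's HTTP nearest service; the port and profile name are assumptions about the local setup, and parse_snapped is just a hypothetical helper for pulling the snapped coordinate out of the response:&lt;/p&gt;

```python
import json
from urllib.request import urlopen

OSRM_URL = "http://localhost:5000"  # assumed local osrm-routed instance

def parse_snapped(response):
    # The nearest service returns candidate waypoints; take the first (closest).
    return tuple(response["waypoints"][0]["location"])  # (lon, lat)

def snap_to_road(lon, lat):
    # OSRM v5 nearest service; note the lon,lat coordinate order.
    with urlopen(f"{OSRM_URL}/nearest/v1/bike/{lon},{lat}") as resp:
        return parse_snapped(json.load(resp))
```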

&lt;p&gt;Shifting my grid like this loses the uniformity. Large forbidden areas like the NASA airbase result in heaps of points getting redistributed just outside their boundaries, with much higher density than the rest of the grid. This isn't really an issue, but to keep the plots clean I pruned the resulting grid, dropping any point that was too close to another:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;pruned = []
for point in points:
    # Keep a point only if it isn't too close to any already-kept point.
    if all(euclidean_distance(point, kept) &gt;= threshold for kept in pruned):
        pruned.append(point)
points = pruned&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Here's the development of the grid in an area with some forest and a lake. There's a hikers-only trail in the bottom-left that OSRM correctly avoids snapping to, with the rest of the roads and tracks evenly covered by points.&lt;/p&gt;

&lt;div class=&quot;wide-container&quot;&gt;
    &lt;div class=&quot;img-third&quot;&gt;
        &lt;img src=&quot;/img/map-grid.png&quot; alt=&quot;&quot;&gt;
        &lt;div class=&quot;caption&quot;&gt;Uniform grid.&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class=&quot;img-third&quot;&gt;
        &lt;img src=&quot;/img/map-snapped.png&quot; alt=&quot;&quot;&gt;
        &lt;div class=&quot;caption&quot;&gt;Snapped points.&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class=&quot;img-third&quot;&gt;
        &lt;img src=&quot;/img/map-declustered.png&quot; alt=&quot;&quot;&gt;
        &lt;div class=&quot;caption&quot;&gt;After pruning.&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;

&lt;h2&gt;Travel Times&lt;/h2&gt;
&lt;p&gt;I fed the evenly-spaced grid into OSRM to find the travel times: &lt;/p&gt;

&lt;div class=&quot;wide-container&quot;&gt;
    &lt;div id=&quot;map-points&quot; class=&quot;map&quot; style=&quot;height: 600px;&quot;&gt;&lt;/div&gt;
    &lt;div class=&quot;points-legend&quot;&gt;
        &lt;div class=&quot;legend-entry&quot;&gt;&lt;span class=&quot;legend-color&quot; style=&quot;color:#006837&quot;&gt;●&lt;/span&gt; &lt;span class=&quot;legend-unit&quot;&gt;Up to 5 minutes travel time&lt;/span&gt;&lt;/div&gt;
        &lt;div class=&quot;legend-entry&quot;&gt;&lt;span class=&quot;legend-color&quot; style=&quot;color:#1a9850&quot;&gt;●&lt;/span&gt; &lt;span class=&quot;legend-unit&quot;&gt;10&lt;/span&gt;&lt;/div&gt;
        &lt;div class=&quot;legend-entry&quot;&gt;&lt;span class=&quot;legend-color&quot; style=&quot;color:#66bd63&quot;&gt;●&lt;/span&gt; &lt;span class=&quot;legend-unit&quot;&gt;15&lt;/span&gt;&lt;/div&gt;
        &lt;div class=&quot;legend-entry&quot;&gt;&lt;span class=&quot;legend-color&quot; style=&quot;color:#a6d96a&quot;&gt;●&lt;/span&gt; &lt;span class=&quot;legend-unit&quot;&gt;20&lt;/span&gt;&lt;/div&gt;
        &lt;div class=&quot;legend-entry&quot;&gt;&lt;span class=&quot;legend-color&quot; style=&quot;color:#d9ef8b&quot;&gt;●&lt;/span&gt; &lt;span class=&quot;legend-unit&quot;&gt;25&lt;/span&gt;&lt;/div&gt;
        &lt;div class=&quot;legend-entry&quot;&gt;&lt;span class=&quot;legend-color&quot; style=&quot;color:#fee08b&quot;&gt;●&lt;/span&gt; &lt;span class=&quot;legend-unit&quot;&gt;30&lt;/span&gt;&lt;/div&gt;
        &lt;div class=&quot;legend-entry&quot;&gt;&lt;span class=&quot;legend-color&quot; style=&quot;color:#fdae61&quot;&gt;●&lt;/span&gt; &lt;span class=&quot;legend-unit&quot;&gt;35&lt;/span&gt;&lt;/div&gt;
        &lt;div class=&quot;legend-entry&quot;&gt;&lt;span class=&quot;legend-color&quot; style=&quot;color:#f46d43&quot;&gt;●&lt;/span&gt; &lt;span class=&quot;legend-unit&quot;&gt;40&lt;/span&gt;&lt;/div&gt;
        &lt;div class=&quot;legend-entry&quot;&gt;&lt;span class=&quot;legend-color&quot; style=&quot;color:#d73027&quot;&gt;●&lt;/span&gt; &lt;span class=&quot;legend-unit&quot;&gt;45&lt;/span&gt;&lt;/div&gt;
        &lt;div class=&quot;legend-entry&quot;&gt;&lt;span class=&quot;legend-color&quot; style=&quot;color:#a50026&quot;&gt;●&lt;/span&gt; &lt;span class=&quot;legend-unit&quot;&gt;50&lt;/span&gt;&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;
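&lt;p&gt;Each lookup can be sketched with OSRM's route service (the host, profile name, and parsing helper are my assumptions, not the project's code):&lt;/p&gt;

```python
import json
from urllib.request import urlopen

OSRM_URL = "http://localhost:5000"   # assumed local osrm-routed instance
WORK = (-122.055210, 37.387582)      # the suitcase marker on the maps

def parse_duration(response):
    # Travel time in seconds of the best route found.
    return response["routes"][0]["duration"]

def bike_minutes(lon, lat):
    # OSRM v5 route service: origin and destination as lon,lat pairs.
    url = (f"{OSRM_URL}/route/v1/bike/"
           f"{lon},{lat};{WORK[0]},{WORK[1]}?overview=false")
    with urlopen(url) as resp:
        return parse_duration(json.load(resp)) / 60
```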


&lt;h2&gt;Contouring&lt;/h2&gt;

&lt;p&gt;Contours drawn straight from the scattered points are overly detailed, and their scale varies weirdly thanks to the non-square grid and areas without points. I tidied up the lines by linearly interpolating the travel times back onto a square grid and applying a Gaussian blur.&lt;/p&gt;
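&lt;p&gt;A minimal sketch of that cleanup, assuming points is an (n, 2) array of coordinates and minutes holds the travel time at each point (the names and smoothing parameters are illustrative, not the project's values):&lt;/p&gt;

```python
import numpy as np
from scipy.interpolate import griddata
from scipy.ndimage import gaussian_filter

def smooth_travel_times(points, minutes, n=200, sigma=2.0):
    # Linearly interpolate scattered travel times back onto a square grid.
    xs = np.linspace(points[:, 0].min(), points[:, 0].max(), n)
    ys = np.linspace(points[:, 1].min(), points[:, 1].max(), n)
    grid_x, grid_y = np.meshgrid(xs, ys)
    grid = griddata(points, minutes, (grid_x, grid_y), method="linear")
    # Fill the gaps outside the convex hull before blurring.
    grid = np.where(np.isnan(grid), float(np.max(minutes)), grid)
    # A Gaussian blur smooths out the overly detailed contour lines.
    return gaussian_filter(grid, sigma=sigma)
```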

&lt;div class=&quot;wide-container&quot;&gt;
    &lt;div class=&quot;img-half&quot;&gt;
        &lt;img src=&quot;/img/contour-interp.png&quot; alt=&quot;&quot;&gt;
        &lt;div class=&quot;caption&quot;&gt;Interpolated contours.&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class=&quot;img-half&quot;&gt;
        &lt;img src=&quot;/img/contour-smooth.png&quot; alt=&quot;&quot;&gt;
        &lt;div class=&quot;caption&quot;&gt;After smoothing.&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;



&lt;h2&gt;Stray Observations&lt;/h2&gt;
&lt;ul&gt;
    &lt;li&gt;OpenStreetMap is a fantastic project. I can't think of any open dataset that comes close in terms of size or quality.&lt;/li&gt;
    &lt;li&gt;OSRM is pretty neat too. It calculated thousands of routes per second: over HTTP, in a Docker container, in a VirtualBox machine, on an old laptop. It would be interesting to compare it against the Google Directions API to see how different the travel times are.&lt;/li&gt;
&lt;/ul&gt;




&lt;script&gt;
mapboxgl.accessToken = 'pk.eyJ1IjoiYWpuaXNiZXQiLCJhIjoicFQ5RHp2NCJ9.4J7HVDpTxQ5dKxLEfCIKiA';
var mapContour = new mapboxgl.Map({
    container: 'map-contour',
    style: 'mapbox://styles/ajnisbet/citqj5lp600002imumhsavzqn',
    center: [-122.055410, 37.38],
    zoom: 10.5,
});
var mapPoints = new mapboxgl.Map({
    container: 'map-points',
    style: 'mapbox://styles/ajnisbet/citupvt0t002v2hlb20q6rozd',
    center: [-122.055410, 37.38],
    zoom: 10.5,
});


function addSuitcaseAndScale(map) {
    map.addControl(new mapboxgl.Scale({position: 'bottom-right'}));
    map.on('load', function () {
        map.addSource(&quot;points&quot;, {
            &quot;type&quot;: &quot;geojson&quot;,
            &quot;data&quot;: {
                &quot;type&quot;: &quot;FeatureCollection&quot;,
                &quot;features&quot;: [{
                    &quot;type&quot;: &quot;Feature&quot;,
                    &quot;geometry&quot;: {
                        &quot;type&quot;: &quot;Point&quot;,
                        &quot;coordinates&quot;: [-122.055210, 37.387582]
                    },
                    &quot;properties&quot;: {
                        &quot;icon&quot;: &quot;suitcase&quot;
                    }
                }]
            }
        });

        map.addLayer({
            &quot;id&quot;: &quot;points&quot;,
            &quot;type&quot;: &quot;symbol&quot;,
            &quot;source&quot;: &quot;points&quot;,
            &quot;layout&quot;: {
                &quot;icon-image&quot;: &quot;{icon}-15&quot;
            }
        });
    });
    
}

addSuitcaseAndScale(mapContour);
addSuitcaseAndScale(mapPoints);
&lt;/script&gt;</description>
        <pubDate>Mon, 03 Oct 2016 00:00:00 -0700</pubDate>
        <link>http://localhost:4000/blog/bike-radius</link>
        <guid isPermaLink="true">http://localhost:4000/blog/bike-radius</guid>
        
        
        <category>meta</category>
        
      </item>
    
      <item>
        <title>Modelling Animal Home Ranges</title>
        <description>&lt;p&gt;The goal of this project was to develop a mathematical model for animal behaviour that would result in realistic and stable home ranges. In this writeup I’ll explain my approach and the different models I tried.&lt;/p&gt;

&lt;!-- break --&gt;

&lt;p&gt;The code for the project is in &lt;a href=&quot;https://github.com/ajnisbet/home-range-modelling&quot;&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;approach&quot;&gt;Approach&lt;/h2&gt;

&lt;p&gt;The home range of a herd of animals is simply the area where the animals regularly spend their time. Home ranges cover hunting grounds, nesting sites and breeding sites. This information is useful for researchers in understanding the group movement of animals.&lt;/p&gt;

&lt;p&gt;My aim was to develop a model of individual animal movement that would result in a home range emerging.&lt;/p&gt;

&lt;p&gt;A random walk model was used for its proven accuracy in simulating animal movement. The direction of movement would be biased by the environment and the presence of other animals.&lt;/p&gt;

&lt;h2 id=&quot;defining-a-home-range&quot;&gt;Defining a Home Range&lt;/h2&gt;

&lt;p&gt;Before home range models can be evaluated, a definition for a home range is needed.&lt;/p&gt;

&lt;p&gt;Data on real animals is typically collected in the form of distinct location coordinates at given times.  The models developed for this project produce the same type of data: animal locations in time and space. In defining a home range the time data was discarded, leaving a set of coordinates.&lt;/p&gt;

&lt;p&gt;Two main methods exist for relating this spatial data to a well-defined home range shape: minimum convex polygon fitting, and kernel density estimation.&lt;/p&gt;

&lt;h3 id=&quot;minimum-convex-polygon&quot;&gt;Minimum Convex Polygon&lt;/h3&gt;

&lt;p&gt;The simplest method is to construct the smallest possible convex polygon (shown in black) around the locations visited by the animal (shown in blue):&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/long_walk_hull.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This is a well-defined method that doesn’t require subjectively choosing parameters. However, because the home range includes all locations, it tends to be larger than necessary. This effect is particularly noticeable if the animal ever wanders (perhaps before forming a stable home range):&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/long_walk_hull_wander.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Wandering can be excluded by ignoring initial or uncommon points, but this introduces additional complexity and parameters, defeating the advantages of the method. Because this simulation project looks at animals converging to a stable home range, minimum convex polygon is not suitable as it cannot handle the initial travelling to a home range.&lt;/p&gt;

&lt;h3 id=&quot;kernel-density-estimation&quot;&gt;Kernel Density Estimation&lt;/h3&gt;

&lt;p&gt;It seems reasonable that an animal may occasionally move outside of its home range. If this is the case, a better approach to home range definition would be to consider where an animal spends most of its time, rather than all of it. Alternatively, this can be interpreted as how likely it is to find an animal at a given position.&lt;/p&gt;

&lt;p&gt;This concept leads to a more robust method of defining home ranges: Kernel Density Estimation (KDE). Given a set of points, KDE produces a probability density function for the entire domain. Sampling the density function results in a set of points with similar distribution to those fed into the estimation.&lt;/p&gt;
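&lt;p&gt;As a sketch with synthetic data (a tight cluster of locations plus a short wander, both invented for illustration), scipy’s gaussian_kde produces exactly this kind of density surface:&lt;/p&gt;

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Synthetic animal locations: a tight home cluster plus a brief wander.
home = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(300, 2))
wander = rng.normal(loc=[4.0, 4.0], scale=1.5, size=(30, 2))
locations = np.vstack([home, wander])

kde = gaussian_kde(locations.T)
# Evaluate the density on a grid; a threshold contour of this surface
# is the home range boundary.
xs, ys = np.mgrid[-3:7:100j, -3:7:100j]
density = kde(np.vstack([xs.ravel(), ys.ravel()])).reshape(xs.shape)
```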

&lt;p&gt;Here’s what a KDE looks like on the same wandering walk that the convex hull failed to represent:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/long_walk_hull_kde_red.svg&quot; /&gt;&lt;/p&gt;

&lt;p&gt;where darker red regions are more likely to contain the animal, and therefore more likely to be in the home range. Choosing a threshold probability value gives a single line:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/long_walk_hull_contour.svg&quot; /&gt; 
which appears to be a more accurate representation of a home range.&lt;/p&gt;

&lt;h2 id=&quot;random-walk-models&quot;&gt;Random Walk Models&lt;/h2&gt;

&lt;p&gt;Each location in the grid can be assigned a quality &lt;code class=&quot;highlighter-rouge&quot;&gt;q(x,y)&lt;/code&gt;. This value represents the desirability of a certain position as perceived by the animal, and may incorporate factors such as food availability, predator territory, and climate. These qualities are combined into an environment matrix &lt;code class=&quot;highlighter-rouge&quot;&gt;Q&lt;/code&gt;, with &lt;code&gt;Q&lt;sub&gt;xy&lt;/sub&gt;&amp;nbsp;=&amp;nbsp;q(x,y)&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The environment quality is the only parameter for the walk, and it acts to bias the movement direction. At each timestep the animal moves to a neighbouring grid cell with a probability proportional to the neighbouring cells’ respective quality.&lt;/p&gt;
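&lt;p&gt;A single step of this biased walk can be sketched as follows (the 4-neighbour grid and the boundary handling are my assumptions; the writeup doesn’t specify them):&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(42)

def step(Q, x, y):
    # Candidate moves: the four neighbouring cells that lie inside the grid.
    candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    candidates = [(i, j) for i, j in candidates
                  if 0 <= i < Q.shape[0] and 0 <= j < Q.shape[1]]
    # Move with probability proportional to each neighbour's quality.
    weights = np.array([Q[i, j] for i, j in candidates], dtype=float)
    choice = rng.choice(len(candidates), p=weights / weights.sum())
    return candidates[choice]
```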

&lt;p&gt;So now the model of animal movement becomes a method for defining the quality of the environment.&lt;/p&gt;

&lt;h3 id=&quot;environment-quality&quot;&gt;Environment Quality&lt;/h3&gt;

&lt;p&gt;2D environments were created by taking random noise:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/noise_map.svg&quot; /&gt;&lt;/p&gt;

&lt;p&gt;and applying Gaussian blur:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/smooth_map.svg&quot; /&gt;&lt;/p&gt;

&lt;p&gt;to make the variation more realistic. In this example, dark areas have high quality and attract the animals, which might represent an area with plentiful food or good shelter. Animals are unlikely to move towards the lighter areas, which might represent a predator’s territory or dangerous terrain.&lt;/p&gt;

&lt;p&gt;Let’s release a herd of 10 animals into the centre of one of these environments and see what happens:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/random_good.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;They travel together to a local maximum, with their travel history shown again in blue. The model is a form of stochastic gradient ascent, where the environment is the objective function and the random walk adds stochasticity to the optimisation.&lt;/p&gt;

&lt;p&gt;However, this nice convergence is not always seen. Whether multiple animals converge depends on the steepness, size, and uniqueness of the local maximum. Here’s a different random environment, with the final positions of the animals shown in red:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/dispersed_quality.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Because of the two local maxima found on either side of the starting point, the herd splits and animals end up in different areas with a fractured home range. This seems like an unlikely result; it’s difficult to imagine a small group of tightly packed herding animals splitting in opposite directions.&lt;/p&gt;

&lt;p&gt;It seems that simply heading towards desirable areas isn’t enough for home ranges to emerge.&lt;/p&gt;

&lt;h2 id=&quot;olfactory-model&quot;&gt;Olfactory Model&lt;/h2&gt;

&lt;p&gt;Some form of animal interaction is needed, which is often realised in nature by leaving some sort of scent. This scent diffuses and decays over time, which can be considered as a dynamic environment quality that changes each iteration.&lt;/p&gt;

&lt;p&gt;I assumed the animals produced scent at their current position at a constant rate. The scent was then spread according to the 2D diffusion equation&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/diffusion_equation.png&quot; style=&quot;max-width: 300px;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;which is total overkill for this project, but I wanted to get some use out of my fluid dynamics course.&lt;/p&gt;
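&lt;p&gt;For reference, one explicit finite-difference step of that diffusion, with a uniform decay term added, can be sketched like this (the coefficients and periodic boundaries are illustrative, not the values used in the project):&lt;/p&gt;

```python
import numpy as np

def diffuse(scent, D=0.1, decay=0.01):
    # Five-point Laplacian with periodic boundaries (np.roll wraps around).
    lap = (np.roll(scent, 1, axis=0) + np.roll(scent, -1, axis=0)
           + np.roll(scent, 1, axis=1) + np.roll(scent, -1, axis=1)
           - 4 * scent)
    # One explicit Euler step of the diffusion equation, then uniform decay.
    return (1 - decay) * (scent + D * lap)
```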

&lt;p&gt;The resulting model causes the animals to remain close to each other because they are attracted to the scent of the other animals. They tend to follow each other because of the scent of their tracks. And because the scent is slowly diffusing they remain largely in the same place.&lt;/p&gt;

&lt;p&gt;Here’s what the scent attraction quality looks like at a particular point in time:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/diffusion_group.svg&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;combined-model&quot;&gt;Combined model&lt;/h3&gt;

&lt;p&gt;The next logical step is to combine the environment quality model with scent attraction. This is done by combining the two qualities:&lt;/p&gt;
&lt;center&gt;
	&lt;code&gt;Q&amp;nbsp;=&amp;nbsp;(Q&lt;sub&gt;environment&lt;/sub&gt;)&lt;sup&gt;m&lt;sub&gt;1&lt;/sub&gt;&lt;/sup&gt;&amp;nbsp;+&amp;nbsp;(Q&lt;sub&gt;scent&lt;/sub&gt;)&lt;sup&gt;m&lt;sub&gt;2&lt;/sub&gt;&lt;/sup&gt;
	&lt;/code&gt;
&lt;/center&gt;
&lt;p&gt;with a weighting &lt;code&gt;m&lt;/code&gt; for each one. The result looks good:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/combined2.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This is the same environment that the animals failed on with the basic environment model. And now they remain together in a stable home range. Success!&lt;/p&gt;

&lt;h2 id=&quot;stray-observations&quot;&gt;Stray Observations&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;This is essentially a 2D optimisation problem, which pops up a lot in mathematics. It’s cool to see that animals have to deal with this stuff too.&lt;/li&gt;
&lt;/ul&gt;

</description>
        <pubDate>Sat, 10 Sep 2016 00:00:00 -0700</pubDate>
        <link>http://localhost:4000/blog/home-range-modelling</link>
        <guid isPermaLink="true">http://localhost:4000/blog/home-range-modelling</guid>
        
        
        <category>modelling</category>
        
      </item>
    
      <item>
        <title>A Break From Personal Projects</title>
        <description>&lt;p&gt;One of the first personal projects I worked on (&lt;a href=&quot;https://www.bookfetch.co.nz&quot;&gt;bookfetch&lt;/a&gt;) was massively beneficial for me. What I was trying to do was so far beyond my skill level that I learned an enormous amount about programming and web development. I used these skills and experiences to land my first few jobs in web development with no formal background in software.&lt;/p&gt;

&lt;p&gt;Since then I’ve constantly had some other personal project on the go, mostly in search of a passive income. I’ve spent hundreds of hours on these projects, but I have little to show for that time. The projects I’ve started haven’t been challenging enough for me to learn from, but were too ambitious to easily succeed.&lt;/p&gt;


&lt;p&gt;Going forward, I’d like to spend time working on small blog-post-sized exercises. It’s much easier to find something this size that I find interesting and that I can learn from, while avoiding spending time managing servers and writing boilerplate.&lt;/p&gt;




&lt;!-- break --&gt;
&lt;p&gt;This is a collection of all those sunsetted projects. &lt;/p&gt;



&lt;div class=&quot;sunset-project-wrapper&quot;&gt;
	&lt;div class=&quot;sunset-project&quot;&gt;
		&lt;div class=&quot;sunset-meta&quot;&gt;
			&lt;div class=&quot;sunset-title&quot;&gt;Snip&lt;/div&gt;
			I often bookmark sites with good design or UX, and I wanted to be able to organise that visually without dealing with dead links. There would be an extension to take screenshots of websites, and then a Dribbble-like website to manage them.
		&lt;/div&gt;
		&lt;div class=&quot;sunset-images&quot;&gt;
			&lt;div class=&quot;sunset-image&quot;&gt;
				&lt;a href=&quot;/img/snip-f.png&quot;&gt;&lt;img src=&quot;/img/snip-f.png&quot; alt=&quot;&quot;&gt;&lt;/a&gt;
			&lt;/div&gt;
			&lt;div class=&quot;sunset-image&quot;&gt;
				&lt;a href=&quot;/img/snip-bb.png&quot;&gt;&lt;img src=&quot;/img/snip-bb.png&quot; alt=&quot;&quot;&gt;&lt;/a&gt;
			&lt;/div&gt;
		&lt;/div&gt;
	&lt;/div&gt;
&lt;/div&gt;


&lt;div class=&quot;sunset-project-wrapper&quot;&gt;
	&lt;div class=&quot;sunset-project&quot;&gt;
		&lt;div class=&quot;sunset-meta&quot;&gt;
			&lt;div class=&quot;sunset-title&quot;&gt;Sussed&lt;/div&gt;
			A helpdesk/CMS for freelancers. Design liberally inspired by Stripe and Invision.
		&lt;/div&gt;
		&lt;div class=&quot;sunset-images&quot;&gt;
			&lt;div class=&quot;sunset-image&quot;&gt;
				&lt;a href=&quot;/img/sussed-old.png&quot;&gt;&lt;img src=&quot;/img/sussed-old.png&quot; alt=&quot;&quot;&gt;&lt;/a&gt;
			&lt;/div&gt;
			&lt;div class=&quot;sunset-image&quot;&gt;
				&lt;a href=&quot;/img/sussed-home.png&quot;&gt;&lt;img src=&quot;/img/sussed-home.png&quot; alt=&quot;&quot;&gt;&lt;/a&gt;
			&lt;/div&gt;
			&lt;div class=&quot;sunset-image&quot;&gt;
				&lt;a href=&quot;/img/sussed-register.png&quot;&gt;&lt;img src=&quot;/img/sussed-register.png&quot; alt=&quot;&quot;&gt;&lt;/a&gt;
			&lt;/div&gt;
			&lt;div class=&quot;sunset-image&quot;&gt;
				&lt;a href=&quot;/img/lightdesk-tickets.png&quot;&gt;&lt;img src=&quot;/img/lightdesk-tickets.png&quot; alt=&quot;&quot;&gt;&lt;/a&gt;
			&lt;/div&gt;
			&lt;div class=&quot;sunset-image&quot;&gt;
				&lt;a href=&quot;/img/sussed-tickets.png&quot;&gt;&lt;img src=&quot;/img/sussed-tickets.png&quot; alt=&quot;&quot;&gt;&lt;/a&gt;
			&lt;/div&gt;
			&lt;div class=&quot;sunset-image&quot;&gt;
				&lt;a href=&quot;/img/sussed-ticket.png&quot;&gt;&lt;img src=&quot;/img/sussed-ticket.png&quot; alt=&quot;&quot;&gt;&lt;/a&gt;
			&lt;/div&gt;
		&lt;/div&gt;
	&lt;/div&gt;
&lt;/div&gt;



&lt;div class=&quot;sunset-project-wrapper&quot;&gt;
	&lt;div class=&quot;sunset-project&quot;&gt;
		&lt;div class=&quot;sunset-meta&quot;&gt;
			&lt;div class=&quot;sunset-title&quot;&gt;Spindel&lt;/div&gt;
			A company like ScrapingHub offering custom datasets. I still have the scrapers running, and would like to explore some of the datasets when I have time.
		&lt;/div&gt;
		&lt;div class=&quot;sunset-images&quot;&gt;
			&lt;div class=&quot;sunset-image&quot;&gt;
				&lt;a href=&quot;/img/spindel.png&quot;&gt;&lt;img src=&quot;/img/spindel.png&quot; alt=&quot;&quot;&gt;&lt;/a&gt;
			&lt;/div&gt;
			&lt;div class=&quot;sunset-image&quot;&gt;
				&lt;a href=&quot;/img/scrapers.png&quot;&gt;&lt;img src=&quot;/img/scrapers.png&quot; alt=&quot;&quot;&gt;&lt;/a&gt;
			&lt;/div&gt;
			&lt;div class=&quot;sunset-image&quot;&gt;
				&lt;a href=&quot;/img/services.png&quot;&gt;&lt;img src=&quot;/img/services.png&quot; alt=&quot;&quot;&gt;&lt;/a&gt;
			&lt;/div&gt;
		&lt;/div&gt;
	&lt;/div&gt;
&lt;/div&gt;

&lt;div class=&quot;sunset-project-wrapper&quot;&gt;
	&lt;div class=&quot;sunset-project&quot;&gt;
		&lt;div class=&quot;sunset-meta&quot;&gt;
			&lt;div class=&quot;sunset-title&quot;&gt;Stockyard&lt;/div&gt;
			Free stock photo aggregator.
		&lt;/div&gt;
		&lt;div class=&quot;sunset-images&quot;&gt;
			&lt;div class=&quot;sunset-image&quot;&gt;
				&lt;a href=&quot;/img/syis.png&quot;&gt;&lt;img src=&quot;/img/syis.png&quot; alt=&quot;&quot;&gt;&lt;/a&gt;
			&lt;/div&gt;
			&lt;div class=&quot;sunset-image&quot;&gt;
				&lt;a href=&quot;/img/syi.png&quot;&gt;&lt;img src=&quot;/img/syi.png&quot; alt=&quot;&quot;&gt;&lt;/a&gt;
			&lt;/div&gt;
		&lt;/div&gt;
	&lt;/div&gt;
&lt;/div&gt;

&lt;div class=&quot;sunset-project-wrapper&quot;&gt;
	&lt;div class=&quot;sunset-project&quot;&gt;
		&lt;div class=&quot;sunset-meta&quot;&gt;
			&lt;div class=&quot;sunset-title&quot;&gt;Portfolio&lt;/div&gt;
			So many portfolio redesigns, mostly borrowed from others.
		&lt;/div&gt;
		&lt;div class=&quot;sunset-images&quot;&gt;
			&lt;div class=&quot;sunset-image&quot;&gt;
				&lt;a href=&quot;/img/pbbb.png&quot;&gt;&lt;img src=&quot;/img/pbbb.png&quot; alt=&quot;&quot;&gt;&lt;/a&gt;
			&lt;/div&gt;
			&lt;div class=&quot;sunset-image&quot;&gt;
				&lt;a href=&quot;/img/pchch.png&quot;&gt;&lt;img src=&quot;/img/pchch.png&quot; alt=&quot;&quot;&gt;&lt;/a&gt;
			&lt;/div&gt;
		&lt;/div&gt;
	&lt;/div&gt;
&lt;/div&gt;
















</description>
        <pubDate>Fri, 09 Sep 2016 00:00:00 -0700</pubDate>
        <link>http://localhost:4000/blog/sunset</link>
        <guid isPermaLink="true">http://localhost:4000/blog/sunset</guid>
        
        
        <category>meta</category>
        
      </item>
    
      <item>
        <title>Passing Multichoice Exams Without Studying</title>
        <description>&lt;p&gt;One of my psychology lecturers required our class to buy the textbook &lt;em&gt;Writing for Psychology&lt;/em&gt; (&lt;a href=&quot;https://books.google.se/books/about/Writing_for_Psychology.html?id=WboUAQAACAAJ&quot;&gt;O’Shea et al., 2006&lt;/a&gt;). He referred to it as “The Bible”. The book talked about things like how to structure a written argument and format a research proposal, and it was the key to passing undergraduate psychology assignments.&lt;/p&gt;

&lt;p&gt;The otherwise serious textbook devoted half a page to multiple-choice exam technique, with advice such as “pick the longest answer” and “if all else fails, choose C”. Like all good humour, it was hard to tell if the authors were joking or not. And the information was not referenced, even though the book has an entire chapter on how to reference information. But it got me wondering if there really was any pattern to multichoice tests that could be exploited to give the test taker an edge.&lt;/p&gt;

&lt;p&gt;I decided to test my idea by throwing a couple of machine learning techniques at some sample questions.&lt;/p&gt;

&lt;!-- break --&gt;

&lt;p&gt;The best-case scenario would be to find a model that was both simple enough to use in real life (“always choose the longest answer”) and interpretable enough to reveal something about the psychological biases of a test writer (“lies come to us more easily than the truth”). But I’d settle for p &amp;lt; 0.05.&lt;/p&gt;

&lt;p&gt;The code and a presentation of the results are on &lt;a href=&quot;https://github.com/ajnisbet/multichoice-modelling&quot;&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;finding-a-dataset&quot;&gt;Finding a Dataset&lt;/h2&gt;

&lt;p&gt;Rather than overfitting a model to a particular style of exam (“half the answers to Prof. Smith’s behavioural economics tests are C”), I wanted my results to be applicable to multiple choice tests in general. So I was looking for a dataset of multichoice questions and answers that was typical of a wide range of tests.&lt;/p&gt;

&lt;p&gt;I settled on the &lt;a href=&quot;https://www.asi.edu.au/programs/australian-science-olympiads/&quot;&gt;Australian Science Olympiads&lt;/a&gt;, which I remember taking at school. Here’s the sort of question they ask:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/sample-question.png&quot; style=&quot;max-width: 100%; box-shadow: 0 3px 6px rgba(0,0,0,0.16), 0 3px 6px rgba(0,0,0,0.23);&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The organisation publishes PDFs of past exams and solutions on their website. When I downloaded the data in 2014, there were 400 questions and 1800 answers, spread over 19 exams. The exams cover three subjects: biology, chemistry, and physics. This ought to be broad enough to reduce picking up patterns in a single subject.&lt;/p&gt;

&lt;p&gt;I was also hoping to improve representativeness by ensuring the exams were written by different people. To begin with, it seems unlikely that a single person would write the exams for three diverse subjects over six years. But we can go one step further by looking at the &lt;code class=&quot;highlighter-rouge&quot;&gt;author&lt;/code&gt; field of the PDF documents: the 19 papers show 9 unique authors, with several papers listing no author at all. Now that’s no guarantee of distinct authors: I could be detecting the person who compiled each document or set up the computer. But short of full-on writing-style analysis (which apparently is called &lt;a href=&quot;https://en.wikipedia.org/wiki/Stylometry&quot;&gt;stylometry&lt;/a&gt;), I’m pretty confident that the results won’t pick up on the biases and preferences of a single person.&lt;/p&gt;

&lt;h2 id=&quot;data-wrangling&quot;&gt;Data Wrangling&lt;/h2&gt;

&lt;p&gt;In an extreme display of naivety and laziness, I decided to parse the question and answer data out of the PDFs programmatically, rather than doing it by hand. What should have been a couple of hours of data entry took me several days of programming, and I still ended up with a fairly large error rate.&lt;/p&gt;

&lt;p&gt;First I tried parsing the PDF binary, but was bitten by the “presentation over semantics” philosophy of PDFs. The biggest problem was getting the answer labels to correspond with the answer text. The extracted text would look something like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;a.
Provision of shade for their root systems.
b.
c.
Elimination of excess water that is entering via the roots.
To allow for leaf damage by insects.
Acquisition of as much light as possible for photosynthesis.
d.
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;which is parsable, but each document had around five different variations of this layout, each requiring special parsing logic, all to render an identically-formatted result. So I supplemented the parsing method with optical character recognition, taking whichever result didn’t throw an error and favouring OCR when the two conflicted.&lt;/p&gt;
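&lt;p&gt;To give a flavour of the parsing logic, here’s a sketch of the simplest case, where each label line is immediately followed by its answer text. This isn’t my original parser, just an illustration; anything that doesn’t match this layout gets handed off to the OCR pass:&lt;/p&gt;

```python
import re

LABEL_RE = re.compile(r"^([a-e])\.$")

def pair_labels(lines):
    """Pair answer labels with texts for the simplest layout, where each
    label line is directly followed by its answer text. Returns None for
    interleaved layouts, which fall through to the OCR pass."""
    pairs, i = [], 0
    while i < len(lines):
        m = LABEL_RE.match(lines[i].strip())
        # Bail out if this line isn't a label, or the next line is missing
        # or is itself another label (the interleaved case shown above).
        if not m or i + 1 >= len(lines) or LABEL_RE.match(lines[i + 1].strip()):
            return None
        pairs.append((m.group(1), lines[i + 1].strip()))
        i += 2
    return pairs
```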

&lt;p&gt;Finally, there were lots of questions that were too abstract to be parsed by any method:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/chemistry-question.png&quot; alt=&quot;&quot; style=&quot;max-width: 100%; box-shadow: 0 3px 6px rgba(0,0,0,0.16), 0 3px 6px rgba(0,0,0,0.23); border: 10px solid #fff&quot; /&gt;&lt;/p&gt;

&lt;p&gt;In total, I managed to parse 80% of the questions into a structured digital format. About 10% were impossible to parse, and the remaining 10% are cases where my program failed.&lt;/p&gt;

&lt;h2 id=&quot;feature-extraction&quot;&gt;Feature extraction&lt;/h2&gt;

&lt;p&gt;For each answer I extracted 32 features from both the answer text and the question text, such as&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Number of words&lt;/li&gt;
  &lt;li&gt;Average word length&lt;/li&gt;
  &lt;li&gt;Inverse question logic: the question is worded something like: &lt;em&gt;Which of the following statements is INCORRECT?&lt;/em&gt;&lt;/li&gt;
  &lt;li&gt;Value rank: for numerical answers, the size of the number relative to the other answers&lt;/li&gt;
&lt;/ul&gt;
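&lt;p&gt;These features only take a few lines to compute. Here’s a sketch of the idea for the four listed above, not the full 32-feature extractor; in particular, the keyword list for detecting inverse question logic is an assumption:&lt;/p&gt;

```python
def answer_features(question, answer, all_answers):
    """A sketch of four of the features; the inverse-logic keyword
    list is an assumption, not the original implementation."""
    words = answer.split()
    features = {
        "n_words": len(words),
        "avg_word_len": sum(map(len, words)) / len(words) if words else 0.0,
        "inverse_logic": int(any(k in question.upper()
                                 for k in ("INCORRECT", "NOT", "FALSE"))),
    }
    try:  # value rank only applies when all answers are numerical
        ranked = sorted(float(a) for a in all_answers)
        features["value_rank"] = ranked.index(float(answer))
    except ValueError:
        features["value_rank"] = None  # later imputed with the mean
    return features
```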

&lt;p&gt;I avoided adding metadata about the exams, like the subject or the author. Missing values were replaced with mean values from the other questions. As labels I had whether each answer was correct or incorrect, from the provided answer keys.&lt;/p&gt;

&lt;h2 id=&quot;modelling&quot;&gt;Modelling&lt;/h2&gt;

&lt;p&gt;A regression analysis showed no significant correlation for any one parameter, nor for all the parameters as a whole. This likely rules out the existence of a simple rule-of-thumb model.&lt;/p&gt;

&lt;p&gt;So next I tried a neural network with a single hidden layer of 16 nodes. I suspected there would be some nonlinear interactions between the features I selected - in particular, a feature of the question interacting with a feature from the answer. Neural networks can automatically tune for higher-order features; however, using a neural network practically rules out a psychologically interpretable result.&lt;/p&gt;
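&lt;p&gt;A model like this is only a few lines with scikit-learn. This is a hedged sketch of the setup rather than my original training code; the train/test split fraction and iteration count are assumptions:&lt;/p&gt;

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

def fit_answer_model(X, y, seed=0):
    """X: one row of features per answer; y: 1 if correct, else 0.
    Returns the fitted model and its held-out test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed)
    model = MLPClassifier(hidden_layer_sizes=(16,),  # single hidden layer
                          max_iter=2000, random_state=seed)
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)
```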

&lt;h2 id=&quot;results&quot;&gt;Results&lt;/h2&gt;

&lt;p&gt;If the labels were chosen randomly, you’d expect to score 21%. The neural network had a test score of 26% averaged over ten runs. The model is 5 percentage points better than random choice. So if you had to guess 20 questions in an exam, you would get one more correct with the neural network model than by randomly guessing.&lt;/p&gt;
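&lt;p&gt;For context, the random baseline is just the average of 1/n over all questions: with 1800 answers to 400 questions there are about 4.5 options each. The helper below shows the arithmetic; the exact 21% figure depends on the actual mix of 4- and 5-option questions, which I’m leaving as an assumption here:&lt;/p&gt;

```python
def random_baseline(options_per_question):
    """Expected score from uniform random guessing: the mean of 1/n
    over all questions."""
    return sum(1 / n for n in options_per_question) / len(options_per_question)
```

&lt;p&gt;An even mix of 4- and 5-option questions, for example, gives a baseline of 22.5%.&lt;/p&gt;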

&lt;p&gt;That’s a lot of effort for a small increase. As far as time allocation goes, it’s probably not an improvement on studying. And also not an improvement on actual cheating. However, the results seem significant despite the small effect size, suggesting that there are exploitable biases in multichoice exams.&lt;/p&gt;

&lt;h2 id=&quot;hindsight&quot;&gt;Hindsight&lt;/h2&gt;

&lt;p&gt;I did this project a few years ago, and have learnt a lot about this sort of thing since then. Here’s what I’d do differently today:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Manually define nonlinear features. The nonlinearity was my motivation for using a neural network, but there were only a few interactions that I had in mind. I might have been better off just assigning the ones I could think of, which would allow for use of simple interpretable methods like regression.&lt;/li&gt;
  &lt;li&gt;Use an NLP approach. Rather than looking for relationships between features and labels, another approach I’d like to try is treating the data as two different corpora (correct and incorrect answers) and trying to classify unseen answers into one or the other. You’d miss out on some of the more complicated features like numerical value and anything to do with the question, but it might capture writing style better than I could design into a feature.&lt;/li&gt;
&lt;/ul&gt;
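&lt;p&gt;That corpus-classification idea might look something like the sketch below. Bag-of-words naive Bayes is just one possible starting point I haven’t tested on this data, not a recommendation:&lt;/p&gt;

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

def train_answer_classifier(texts, labels):
    """texts: answer strings; labels: 1 for correct, 0 for incorrect.
    Classifies unseen answers into the 'correct' or 'incorrect' corpus."""
    classifier = make_pipeline(
        CountVectorizer(ngram_range=(1, 2)),  # unigrams + bigrams
        MultinomialNB())
    classifier.fit(texts, labels)
    return classifier
```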

</description>
        <pubDate>Sat, 05 Mar 2016 00:00:00 -0800</pubDate>
        <link>http://localhost:4000/blog/multichoice</link>
        <guid isPermaLink="true">http://localhost:4000/blog/multichoice</guid>
        
        
        <category>data</category>
        
      </item>
    
      <item>
        <title>Static Site Hosting Speed Test</title>
        <description>&lt;p&gt;I’m running this blog as a static site, which offers awesome performance and security for very little effort. Static sites also give you the option of hosting on a static host, which tend to be faster, cheaper, and easier to manage than traditional servers. The question is which host to use?&lt;/p&gt;

&lt;p&gt;I compared the performance of various static hosting offerings from Google, Amazon, GitHub, and CloudFlare - some of the biggest players in the hosting business. 
&lt;!-- break --&gt;&lt;/p&gt;

&lt;p&gt;The analysis is saved in this &lt;a href=&quot;https://github.com/ajnisbet/static-speedtest/blob/master/analysis.ipynb&quot;&gt;IPython Notebook&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;the-hosts&quot;&gt;The Hosts&lt;/h2&gt;

&lt;p&gt;Here are the services I tested:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;h4 id=&quot;amazon-s3&quot;&gt;Amazon S3&lt;/h4&gt;
    &lt;p&gt;&lt;a href=&quot;https://aws.amazon.com/s3/&quot;&gt;S3&lt;/a&gt; is an online file service from Amazon, and it’s super cheap to host a static site.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;h4 id=&quot;amazon-cloudfront&quot;&gt;Amazon Cloudfront&lt;/h4&gt;
    &lt;p&gt;&lt;a href=&quot;https://aws.amazon.com/cloudfront/&quot;&gt;CloudFront&lt;/a&gt; is a content delivery network (CDN) that sits in front of other Amazon offerings (for this test I used S3). CDNs replicate your files on servers in multiple locations, which should mean better speeds and uptime for users. CloudFront is slightly more pricey than S3, but Amazon will give you free SNI SSL for your custom domain, which is pretty neat.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;h4 id=&quot;google-appengine&quot;&gt;Google AppEngine&lt;/h4&gt;
    &lt;p&gt;Google &lt;a href=&quot;https://cloud.google.com/appengine/&quot;&gt;AppEngine&lt;/a&gt; is a PaaS offering, but it’s really easy to set up for static hosting. You get a ton of customisation options, and it’s easy to make parts of your site dynamic later if you need. It comes with a free quota which will cover a significant amount of static traffic.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;h4 id=&quot;google-cloud-storage&quot;&gt;Google Cloud Storage&lt;/h4&gt;
    &lt;p&gt;&lt;a href=&quot;https://cloud.google.com/storage/&quot;&gt;GCS&lt;/a&gt;: basically S3 by Google, with similar features and pricing.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;h4 id=&quot;cloudflare&quot;&gt;CloudFlare&lt;/h4&gt;
    &lt;p&gt;&lt;a href=&quot;https://www.cloudflare.com/&quot;&gt;CloudFlare&lt;/a&gt; is a CDN that will cache requests going to another service. The performance of the host behind CloudFlare shouldn’t matter - for this test I chose AppEngine and Google Cloud Storage because they were the easiest to set up.  CloudFlare comes with a functional free plan including SSL, though you’ll need a paid plan for any serious customisation.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;h4 id=&quot;github-pages&quot;&gt;GitHub Pages&lt;/h4&gt;
    &lt;p&gt;GitHub’s &lt;a href=&quot;https://pages.github.com/&quot;&gt;Pages&lt;/a&gt; serves a git repository on your domain. It’s free, and sites are hosted via &lt;a href=&quot;https://www.fastly.com/&quot;&gt;Fastly’s&lt;/a&gt; CDN. However, Pages has few options for customisation: you can’t set headers, choose cache times, or use SSL. And although using git for deployment seems like a good idea, trying to do something like keeping draft posts private will have you wishing for separation between version control and deployment.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Many of these providers are available in multiple regions, so I spun up services in the US and Europe where possible. This gave a total of 10 hosts, each on a separate subdomain. To each one I uploaded a draft version of my homepage: a 3KB HTML file.&lt;/p&gt;

&lt;h2 id=&quot;method&quot;&gt;Method&lt;/h2&gt;
&lt;p&gt;I expected that the three CDN-backed options (CloudFront, CloudFlare, and GitHub Pages) would outperform the rest. To test this, I set up a script to request my homepage from one of the 10 hosts every 10 seconds. I recorded how long it took to download the page, as well as the response headers and any errors.&lt;/p&gt;
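&lt;p&gt;The measurement itself is simple. Here’s a sketch of a single probe using only the standard library; this is illustrative rather than my actual script, and the loop that calls it every 10 seconds per host is omitted:&lt;/p&gt;

```python
import time
import urllib.request

def measure(url, timeout=30):
    """Fetch a URL once, recording status code, time-to-last-byte in
    seconds, response size, and any error message."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read()  # TTLB: time until the full body arrives
            status = response.status
        return {"url": url, "status": status,
                "ttlb": time.monotonic() - start,
                "bytes": len(body), "error": ""}
    except Exception as exc:
        return {"url": url, "status": None,
                "ttlb": time.monotonic() - start,
                "bytes": 0, "error": str(exc)}
```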

&lt;p&gt;The script ran on servers located in West US, East US, and Europe. The idea was to roughly represent the distribution of “Western” internet users, so that values averaged over all the regions would be a first-order approximation of real usage.&lt;/p&gt;

&lt;p&gt;I rented 3 VPSs each from Digital Ocean and Linode to lessen the effect of one host being particularly close to a requesting server’s datacentre. After running the script for a day to warm up servers and fill caches, I measured 14 days of data.&lt;/p&gt;

&lt;h2 id=&quot;results&quot;&gt;Results&lt;/h2&gt;
&lt;p&gt;In total, almost 600,000 requests were made. This works out at 4000 requests per day for each host, which seems like reasonable traffic for a personal blog.&lt;/p&gt;

&lt;p&gt;To evaluate the hosts, I’ll compare error rates, average performance, and worst-case performance.&lt;/p&gt;
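&lt;p&gt;Concretely, each of these comparisons boils down to a few summary statistics over a host’s recorded times. A sketch with the standard library (the field names are mine, not from the analysis notebook):&lt;/p&gt;

```python
import statistics

def summarize(ttlbs_ms):
    """Summary statistics for one host's recorded TTLBs (milliseconds)."""
    cuts = statistics.quantiles(ttlbs_ms, n=100)  # 99 percentile cut points
    return {
        "mean": statistics.fmean(ttlbs_ms),
        "p90": cuts[89],   # 90th percentile
        "p99": cuts[98],   # 99th percentile
        "under_100ms": sum(t < 100 for t in ttlbs_ms) / len(ttlbs_ms),
    }
```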

&lt;h3 id=&quot;errors&quot;&gt;Errors&lt;/h3&gt;

&lt;p&gt;First, let’s look at when the server responded with a code other than &lt;code class=&quot;highlighter-rouge&quot;&gt;200 OK&lt;/code&gt; (in all plots, lower is better):&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/img/all_errors.png&quot;&gt;&lt;img src=&quot;/img/all_errors.png&quot; alt=&quot;Errors by host&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Aside from GitHub Pages, all hosts perform really well. The 3 errors experienced by AppEngine EU and S3 US still correspond to an uptime of 99.995%, and these errors are spread out in time, so a browser refresh would have fixed any issues.&lt;/p&gt;

&lt;p&gt;GitHub Pages had a massive number of 404 errors, all within a 10 minute period:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/img/gh-error-rate.png&quot;&gt;&lt;img src=&quot;/img/gh-error-rate.png&quot; alt=&quot;GitHub Pages errors&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;All requests from all servers failed during this time, so no users would have been able to access the site. To GitHub’s credit, the issue was &lt;a href=&quot;https://status.github.com/messages/2016-01-23&quot;&gt;reported&lt;/a&gt; on their status page. However, they seem to have a similar issue every month.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Winners&lt;/strong&gt;: AppEngine US, CloudFlare, GCS.&lt;/p&gt;

&lt;h3 id=&quot;average-speed&quot;&gt;Average Speed&lt;/h3&gt;
&lt;p&gt;Time to Last Byte (TTLB) is the time taken to fetch all the data of a webpage, so is a pretty good proxy for when a user will see a completely downloaded site. (I also recorded Time to First Byte, but they weren’t significantly different for such a small payload.)&lt;/p&gt;

&lt;p&gt;The requests are made from well-connected datacentres rather than something more realistic like a mobile network. These timing values represent a lower bound for real-life performance, though the ordering of the results should remain the same even under high latencies.&lt;/p&gt;

&lt;p&gt;Here’s the TTLB for each host:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/img/ttlb-host.png&quot;&gt;&lt;img src=&quot;/img/ttlb-host.png&quot; alt=&quot;TTLB by host&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It looks like S3’s Europe region is really struggling! It’s possible that this is made worse by the majority of the requests coming all the way from the US, so here’s the TTLB from each host to just the European requesting servers:
&lt;a href=&quot;/img/ttlb-eu.png&quot;&gt;&lt;img src=&quot;/img/ttlb-eu.png&quot; alt=&quot;TTLB by host, EU requesting servers&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Even for requests within Europe, the US regions for S3 and AppEngine perform better than the EU regions, and there’s little difference for GCS. Perhaps these EU regions were created for data privacy reasons, though the AppEngine setup page hints at performance:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/img/appengine-region.png&quot;&gt;&lt;img src=&quot;/img/appengine-region.png&quot; alt=&quot;AppEngine EU setup&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I’ll exclude S3 EU for the rest of the speed analysis, as it’s stretching the plot scales too much. Let’s try the TTLB again:
&lt;a href=&quot;/img/ttlb-host-no-s3.png&quot;&gt;&lt;img src=&quot;/img/ttlb-host-no-s3.png&quot; alt=&quot;TTLB by host&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That’s better. Now we can see that CloudFlare and GitHub Pages are both coming close to the all-important &lt;a href=&quot;https://www.nngroup.com/articles/response-times-3-important-limits/&quot;&gt;100ms threshold&lt;/a&gt; for human perception of instantaneous change, with little variation shown by the quantiles. Amazon’s offerings (S3 US and CloudFront) lag behind, along with AppEngine’s EU region.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Winners:&lt;/strong&gt; CloudFlare, Pages.&lt;/p&gt;

&lt;h3 id=&quot;edge-case-responses&quot;&gt;Edge Case Responses&lt;/h3&gt;
&lt;p&gt;Average times are a useful metric for overall performance, but the extremes are important too. A high 90th percentile response time may not affect average TTLB, but could cause that 10% of your userbase to get frustrated and leave.&lt;/p&gt;

&lt;p&gt;Here’s the 90th percentile TTLB for the hosts: 
&lt;a href=&quot;/img/ttlb-90.png&quot;&gt;&lt;img src=&quot;/img/ttlb-90.png&quot; alt=&quot;90th percentile TTLB&quot; /&gt;&lt;/a&gt;
10% of users wait longer than this for a response. And with a typical website requiring tens of requests for scripts and other files to render a complete page, a large delay in any one of these critical assets will delay the entire page.&lt;/p&gt;

&lt;p&gt;I feel like we’ve seen this plot already. GitHub Pages performs extremely well, with Amazon and AppEngine Europe bringing up the rear. It’s unusual to see such a large difference between the two CloudFlare hosts; perhaps this indicates a high level of variability in the data.&lt;/p&gt;

&lt;p&gt;We can look at the 99th percentile too:
&lt;a href=&quot;/img/ttlb-99.png&quot;&gt;&lt;img src=&quot;/img/ttlb-99.png&quot; alt=&quot;99th percentile TTLB&quot; /&gt;&lt;/a&gt;
The numbers are bigger, though these events occur far more rarely. The two CloudFlare hosts are together again. Actually, I’m surprised by how well all the hosts perform, with 99% of requests completing in under a second.&lt;/p&gt;

&lt;p&gt;CloudFront has risen to the highest response times, as it seems to refresh its cache fairly frequently: 2% of requests miss the cache (one every 20 minutes), compared to 0.11% for CloudFlare and 0.3% for GitHub Pages. A cache refresh incurs the regular latency of CloudFront plus a round trip to the S3 backend.&lt;/p&gt;
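&lt;p&gt;The cache-miss figures come from the recorded response headers. Each CDN reports hit/miss status under a different header (&lt;code class=&quot;highlighter-rouge&quot;&gt;X-Cache&lt;/code&gt; for CloudFront, &lt;code class=&quot;highlighter-rouge&quot;&gt;CF-Cache-Status&lt;/code&gt; for CloudFlare), so a crude shared check just searches the header values for the word MISS; a sketch:&lt;/p&gt;

```python
def cache_miss_rate(header_dicts):
    """Fraction of responses whose headers report a cache miss.
    Header conventions vary by CDN (X-Cache, CF-Cache-Status, ...),
    so this crudely searches every header value for 'MISS'."""
    misses = sum(
        any("MISS" in value.upper() for value in headers.values())
        for headers in header_dicts)
    return misses / len(header_dicts)
```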

&lt;p&gt;Finally, the proportion of requests that take longer than an instantaneous-feeling 100ms:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/img/short-requests.png&quot;&gt;&lt;img src=&quot;/img/short-requests.png&quot; alt=&quot;% short requests&quot; /&gt;&lt;/a&gt;
Our leaders return slow-feeling responses in only a quarter of requests. Behold the power of static sites.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Winners&lt;/strong&gt;: Cloudflare, GitHub Pages, GCS.&lt;/p&gt;

&lt;h2 id=&quot;results-1&quot;&gt;Conclusions&lt;/h2&gt;

&lt;p&gt;CloudFlare is the clear winner of this test, even on the free plan. Considering the effect size of the results, you probably also wouldn’t notice any meaningful difference with Google Cloud Storage or AppEngine’s US region.&lt;/p&gt;

&lt;p&gt;GitHub Pages was just as fast as CloudFlare but seems to have downtime issues. I also feel the lack of configuration would be a problem in a lot of use cases.&lt;/p&gt;

&lt;p&gt;S3 is slow, but has a lot else going for it as the backbone of Amazon’s AWS infrastructure. I was surprised how poorly CloudFront performs though, given how frequently I see it deployed.&lt;/p&gt;

&lt;p&gt;So now this blog is hosted with CloudFlare, backed by Google AppEngine for ease of deployment.&lt;/p&gt;

</description>
        <pubDate>Fri, 05 Feb 2016 00:00:00 -0800</pubDate>
        <link>http://localhost:4000/blog/static-site-speedtest</link>
        <guid isPermaLink="true">http://localhost:4000/blog/static-site-speedtest</guid>
        
        
        <category>data</category>
        
      </item>
    
      <item>
        <title>Initial Commit</title>
        <description>&lt;p&gt;We are fortunate in the web development industry that the quality of our work can be so easily assessed. Instead of relying on references and qualifications, it’s easy to share source code and deployed web work.&lt;/p&gt;

&lt;p&gt;What’s missing is the human component: the reasoning and explanation behind what we do. I think this is important for demonstrating competence, and a blog seems an excellent place to share that information.&lt;/p&gt;

&lt;!-- break --&gt;

&lt;p&gt;But mostly this blog is an excuse to experiment with data, writing, and maths. Both for professional development and because I genuinely enjoy these things. For some reason, writing about something you’ve done seems to make the whole activity justifiable amid other deadlines and commitments.&lt;/p&gt;

&lt;p&gt;Not sure why that is. Maybe I’ll write a blog post about it.&lt;/p&gt;
</description>
        <pubDate>Thu, 04 Feb 2016 00:00:00 -0800</pubDate>
        <link>http://localhost:4000/blog/initial-commit</link>
        <guid isPermaLink="true">http://localhost:4000/blog/initial-commit</guid>
        
        
        <category>news</category>
        
      </item>
    
  </channel>
</rss>
