Reposting from kylehodgson.com. The focus of this article is technology, including ADSB.
Since 2020, aircraft flying in North America and Europe must broadcast Automatic Dependent Surveillance-Broadcast (ADSB) data*. Since September I’ve been working on a project, SkyZero.io, that makes use of this data. Because the aircraft broadcasts are, by design, readable by anyone with the right radio hardware and software, several networks have cropped up that aggregate flight data. The bigger commercial players are FlightAware and FlightRadar24, and there are also a number of community networks running the gamut from pseudo-commercial to fairly open source, ADSB.lol, ADSB.fi and airplanes.live among them.
These networks let users (like SkyZero) discover which aircraft are in the air, where they are, how fast they’re going, their altitude and so on. Some require users to install an ADSB feeder (usually a Raspberry Pi with a USB radio) to get data; some simply charge for API access.
At some point, I started analyzing the differences between these networks. In many cases I found that while each network had its own picture of what was happening, assembling data from multiple networks would often fill in the gaps. SkyZero.io now collects data from five different networks.
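As a rough illustration of that gap-filling idea, here is a minimal sketch that interleaves two feeds by timestamp. The `PathPoint` fields, the feed names and the sample coordinates are all made up for the example; they are not SkyZero's actual schema or data.

```python
import heapq
from typing import NamedTuple


class PathPoint(NamedTuple):
    # Hypothetical record layout, for illustration only.
    ts: float     # Unix timestamp in seconds
    source: str   # which network reported the point
    lat: float
    lon: float


def interleave(*feeds):
    """Merge feeds that are each already sorted by timestamp."""
    return list(heapq.merge(*feeds, key=lambda p: p.ts))


# Each network sees only part of the flight.
api_a = [PathPoint(0, "A", 38.40, -0.50), PathPoint(60, "A", 38.45, -0.45)]
api_b = [PathPoint(20, "B", 38.42, -0.48), PathPoint(40, "B", 38.43, -0.47)]

merged = interleave(api_a, api_b)
for point in merged:
    print(point.ts, point.source, point.lat, point.lon)
```

`heapq.merge` assumes each input is already sorted, which suits feeds that arrive in time order. It does nothing to reconcile disagreements between sources.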
SkyZero.io tech stack
I’ve built the project using TimeScale (now TigerData), a Postgres company that’s created some amazing open source extensions to Postgres and a great hosting experience around it. For time series data it’s many times faster than the usual players. This allows you to combine time series features (which TimeScale provides via the hypertable concept) with other Postgres features such as a world class SQL implementation, JSON storage, GIS – basically, everything you’d need to start an application that tracks the locations of aircraft over time.
Combine TigerData’s Postgres with Python, the excellent Polars library, and the Observable Framework with Leaflet for visualizations, and SkyZero is born.
While the tech is fun, and involved some learning, the biggest learning has been around aviation, and the ins and outs of ADSB data.
Interleaving data from multiple ADSB networks
For instance, here is a current challenge: I’ve noticed several flights in my database exceeding 2,000 kilometres, flown by an electric trainer aircraft with a maximum endurance of 90 minutes and maybe 200km of range. This is obviously incorrect, but how did it sneak in?
I took a moment to look into this, and the issue comes from the very strength I set out to build into the project, namely the fact that my data comes from several sources. Isn’t it always the way, your superpower is also your kryptonite?
For instance, below is a set of what I call “flight path points”: details about where an aircraft was, when, how fast it was going etc. In the flights with these large discrepancies, I noticed big jumps in distance between consecutive flight path points whenever the source changed from one API to another.
Things look pretty normal until we get to row six above – as we change over from API B to API C, we get a big jump in the distance between the coordinates. Somehow, in a matter of 13 seconds the aircraft has jumped 173 kilometres! This happens multiple times, too, as we interleave records from API B and C. This could be feeders reporting their location incorrectly, clocks not being synchronized, so many possibilities.
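To put that jump in perspective, the implied ground speed is easy to work out. This small sketch uses only the two figures quoted above (173 kilometres in 13 seconds); the function name is just for illustration.

```python
def implied_speed_kmh(distance_km: float, elapsed_s: float) -> float:
    """Ground speed needed to cover distance_km in elapsed_s seconds."""
    return distance_km / elapsed_s * 3600


# 173 km in 13 s works out to roughly 47,900 km/h.
print(round(implied_speed_kmh(173, 13)))
```

No aircraft flies at anything like that speed, let alone an electric trainer, so the jump has to be a data artifact rather than real movement.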
I’m somewhat used to the networks reporting things differently; sometimes API A will only have low precision GPS data – in this case we’d see lat/lng pairs like 38.4,-0.5 instead of 38.424911,-0.462799. This can happen as aircraft are broadcasting in “pairs” of radio packets, called “even” and “odd”, and if a feeder doesn’t receive both, then they won’t have a full precision GPS.
This definitely can cause big jumps too, as you just end up taking the center of a low precision data point, which might be hundreds of meters away from where the aircraft actually is – but that’s why I filter low precision GPS data out when calculating distance. As you can see, all the above records are fairly high precision GPS data.
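One way such a filter could work is to count how many decimal places a coordinate carries. The sketch below is only a guess at a heuristic, not the filter SkyZero actually uses; the `min_places` threshold of 3 is an arbitrary assumption.

```python
from decimal import Decimal


def decimal_places(value: float) -> int:
    """Number of digits after the decimal point in the value's shortest repr."""
    exponent = Decimal(str(value)).as_tuple().exponent
    return max(0, -exponent)


def is_low_precision(lat: float, lon: float, min_places: int = 3) -> bool:
    # Assumption: a position counts as low precision only when BOTH
    # coordinates carry fewer than min_places decimals, so a precise
    # value that happens to end in zeros is less likely to be dropped.
    return max(decimal_places(lat), decimal_places(lon)) < min_places


print(is_low_precision(38.4, -0.5))            # the low precision example pair
print(is_low_precision(38.424911, -0.462799))  # the full precision example pair
```

A heuristic like this can still misjudge a genuinely precise position whose coordinates both round to short values, so it is better suited to excluding points from distance calculations than to deleting them outright.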
To start, I’ll look into developing an algorithm to detect that the problem has happened, which will let me remove these flights from the data set. Then I’ll work on some sort of fix to neutralize the jumps, likely by ignoring records that cause these kinds of distortions.
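A minimal sketch of what such a detection step might look like: compute the great-circle (haversine) distance between consecutive points and reject any point that would need an implausible ground speed to reach. The `(ts, lat, lon)` tuple layout, the 400 km/h ceiling and the sample coordinates are assumptions for illustration, not the approach SkyZero ended up with.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def split_jumps(points, max_kmh=400.0):
    """points: (ts_seconds, lat, lon) tuples sorted by ts.

    Returns (kept, rejected). Each point is compared with the last KEPT
    point, so the first point is trusted unconditionally.
    """
    kept, rejected = [], []
    for point in points:
        if not kept:
            kept.append(point)
            continue
        prev = kept[-1]
        elapsed_s = point[0] - prev[0]
        if elapsed_s <= 0:
            rejected.append(point)  # duplicate or out-of-order timestamp
            continue
        kmh = haversine_km(prev[1], prev[2], point[1], point[2]) / elapsed_s * 3600
        (rejected if kmh > max_kmh else kept).append(point)
    return kept, rejected


# Made-up track with one implausible point at t=23.
track = [
    (0, 38.4249, -0.4628),
    (10, 38.4260, -0.4620),
    (23, 39.9000, -0.1000),
    (30, 38.4280, -0.4605),
]
kept, rejected = split_jumps(track)
print(len(kept), rejected)
```

Comparing against the last kept point means a single bad record is skipped without dragging its neighbours along. The trade-off is that a bad first point would cause every later, correct point to be rejected, so a real implementation would need some way to choose a trustworthy starting point.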
PostgreSQL to everyone else: hold my beer
The query result in the table above is a nice use of many PostgreSQL features all in one query:
I just sat down and hacked out this query to do the analysis around the distance issues, and it evolved organically – so it’s not the most elegant thing in the world. But it is a decent example of just how much you can do with PostgreSQL. Lag over partition (window analysis), JSONB data, time series, GIS distance, relational joins – it’s really nice having so many world class tools in such a well organized toolbox.
*There are a few exceptions; the EU does not require ultralights to broadcast ADSB-out, for instance (though many do anyway). More notably, drones are also not required to broadcast ADSB-out, though a separate framework called Remote ID (RID) is required.
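For readers more at home in Python than SQL, the “lag over partition” idea amounts to pairing each row with the previous row for the same aircraft. This is a plain-Python sketch of that concept, not a translation of the query; the `icao` and `ts` keys are hypothetical column names.

```python
from itertools import groupby
from operator import itemgetter


def with_previous(rows):
    """Yield (row, previous_row) per aircraft, ordered by timestamp.

    previous_row is None for the first row of each aircraft, much as
    LAG() returns NULL at the start of a partition.
    """
    ordered = sorted(rows, key=itemgetter("icao", "ts"))
    for _, group in groupby(ordered, key=itemgetter("icao")):
        previous = None
        for row in group:
            yield row, previous
            previous = row


# Made-up rows for two aircraft, deliberately out of order.
rows = [
    {"icao": "abc123", "ts": 10},
    {"icao": "def456", "ts": 5},
    {"icao": "abc123", "ts": 0},
]
pairs = list(with_previous(rows))
for row, previous in pairs:
    print(row, previous)
```

In the database, the window function does this pairing in a single pass alongside the joins and distance calculations, which is a large part of its appeal.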