Redo Airport Score


This morning I redid everything mentioned in the 1/8 post. The previous version used an air route dataset of 37,595 records, which turned out to be an earlier release. The latest release has 67,240 records.

As a quick reminder, the "airport score" is a measure built on the global airport network. It shows how central each city is in the global air traffic network. In short, the airport score for each point on earth (or each city) is the sum, over all airports, of the airport's weight multiplied by an inverse function of its distance to that point. The airport weights are obtained through eigenvector centrality, using a dataset of all the airline linkages in the world.
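To make the eigenvector centrality step concrete, here is a minimal Python sketch using plain power iteration. It is not the code behind this post (the weights here came from R's igraph); the function name, the toy routes and the sum-to-one normalization are all assumptions made for illustration, so only the relative sizes of the weights are meaningful.

```python
from collections import defaultdict

def eigenvector_centrality(routes, iterations=200, tol=1e-10):
    """Power iteration on an undirected route graph; returns weights that sum to 1."""
    neighbors = defaultdict(set)
    for a, b in routes:
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    nodes = sorted(neighbors)
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # x <- (A + I) x: the identity shift keeps the iteration from
        # oscillating on bipartite graphs without changing the eigenvector.
        new = {n: score[n] + sum(score[m] for m in neighbors[n]) for n in nodes}
        total = sum(new.values())
        new = {n: v / total for n, v in new.items()}
        delta = sum(abs(new[n] - score[n]) for n in nodes)
        score = new
        if delta < tol:
            break
    return score

# Made-up toy network: one hub linked to four airports, two of which
# are also linked to each other.
toy_routes = [("HUB", "A"), ("HUB", "B"), ("HUB", "C"), ("HUB", "D"), ("A", "B")]
weights = eigenvector_centrality(toy_routes)
top_airport = max(weights, key=weights.get)
```

In this toy network the hub gets the largest weight, and airports A and B outrank C and D because they have an extra connection to a well-connected neighbor.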

Method

1. There are 3,300 airports in total. For each city, calculate the distance x to each airport, then apply an inverse function of the distance, f(x) = 1/(1+x)^p, to penalize airports that are farther away. Different values of the exponent p produce quite different rankings. I chose p = 200 through parameter selection, because it produced the ranking that aligns best with the GaWC ranking of global cities.

2. Multiply f(x) by the weight w of each airport, then sum over the 3,300 airports to get the score for a city. The weights w are the relative importance of each airport in the airport network, obtained through eigenvector centrality. The R package "igraph" can compute these weights easily. The "importance of each node" can be read as a stationary probability distribution over the network. The intuition is that the relative weight of each node is the probability that a traveler who keeps moving among the nodes of the network is found at that location.
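The two steps above can be sketched in a few lines of Python. This is an illustration rather than the code used for the post. The post does not state the unit of x, so the sketch assumes the distance is the great-circle central angle in radians (distance divided by the Earth's radius), a scale on which p = 200 gives a decay that is steep but not degenerate. The function names, coordinates and weights below are made up for the example.

```python
import math

def central_angle(lat1, lon1, lat2, lon2):
    """Great-circle distance as a central angle in radians (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * math.asin(min(1.0, math.sqrt(h)))

def decay(x, p=200):
    """Distance penalty f(x) = 1 / (1 + x)^p."""
    return (1.0 + x) ** (-p)

def airport_score(lat, lon, airports, p=200):
    """Sum of w * f(x) over airports, each given as (lat, lon, weight)."""
    return sum(
        w * decay(central_angle(lat, lon, alat, alon), p)
        for alat, alon, w in airports
    )

# Approximate coordinates with invented weights (assumption: x is in radians).
toy_airports = [
    (40.64, -73.78, 0.6),  # near JFK
    (51.47, -0.45, 0.4),   # near LHR
]
score_new_york = airport_score(40.71, -74.01, toy_airports)
score_mid_atlantic = airport_score(45.0, -40.0, toy_airports)
```

With these toy inputs, a point in New York scores far higher than a point in the middle of the Atlantic, because every airport there is heavily penalized by distance.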

Improvement

I tried the dataset with 67,240 records once before, but the results did not make any sense. Today I realized where the mistake was. That dataset lists longitude before latitude, but I took it for granted that the order was latitude, longitude, as it is almost everywhere. OK, this is an excuse. Lessons learned:
1. I should not take things for granted.
2. I should check the source first.
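A cheap guard against this kind of mix-up is a range check: latitude must lie in [-90, 90], so any value outside that range in the supposed latitude column means the columns are swapped. The Python sketch below is a hypothetical helper, not part of the original workflow, and the sample coordinates are approximate. It cannot detect a swap when every value in both columns happens to fall within [-90, 90].

```python
def looks_like_lon_lat(pairs):
    """True if the first column cannot be latitude (some value outside [-90, 90])."""
    return any(abs(first) > 90 for first, _ in pairs)

def to_lat_lon(pairs):
    """Swap the columns when the first one is out of latitude range."""
    if looks_like_lon_lat(pairs):
        return [(second, first) for first, second in pairs]
    return list(pairs)

# Approximate airport coordinates stored as (longitude, latitude).
raw = [(-84.43, 33.64), (-0.45, 51.47), (121.81, 31.14)]
fixed = to_lat_lon(raw)
```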

Result

The old version seems incomplete, as it leaned toward European cities. The result from the latest air route dataset correctly puts Hartsfield-Jackson Atlanta International Airport (ATL) as the top-ranked airport (it is the world's busiest airport by total movements). Heathrow in London is second, followed by O'Hare in Chicago and John F. Kennedy in New York. In terms of city score, New York is at the top of the list. More American cities, along with Shanghai, are now in the top 10.

On the map, we can also see higher weights in North America and Asia:

Animation


New ranks of airports and cities in terms of importance in the network.

Compared to the old version:

City scores in Europe have weakened too (around Germany and Belgium) compared to the 1/5 post. Animation on the world map:

------
More technical details on how to program the animation in R are discussed in the older post.
Another post discusses how the geographical distances were calculated.
