Posts

Showing posts from May, 2018

How to batch-analyze multiple data files in R

Step-by-step instructions on how to import and process multiple data files in R (based on the accuracy analysis of 200 cities):

- Technical details
- Challenges
- Visualization of results on the map

I have data collected for each city, with one data file per city, and I need to run the same analysis on all 200 data files. I will go through the whole process in one post. It is not too complicated, which I suppose is why most posts I found online only address part of the issue.

Summary of the process:
1. Write a program for a single data file that does all the analysis correctly.
2. Combine the functions into one final function that produces a single line of output containing all the results I need.
3. Use a functional (lapply) or a loop to iterate over the indices of the batch of files, and append every line of results to a data frame or write it to an output .csv file.
4. Start with small batches and track which data file causes an error in the program, modify th
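The four steps in the summary can be sketched in a few lines of R. This is only a minimal illustration under stated assumptions, not the code from the post: the function names `analyze_one` and `safe_analyze`, the `value` column, and the three demo files are all hypothetical stand-ins for the real per-city analysis.

```r
# Step 1-2 (hypothetical): analyze ONE file and return ONE row of results.
analyze_one <- function(path) {
  dat <- read.csv(path)
  if (!"value" %in% names(dat)) stop("missing 'value' column")
  data.frame(
    file = basename(path),
    n = nrow(dat),
    mean_value = mean(dat$value),
    error = NA_character_,
    stringsAsFactors = FALSE
  )
}

# Step 4: wrap the analysis so a bad file is recorded instead of
# stopping the whole batch.
safe_analyze <- function(path) {
  tryCatch(
    analyze_one(path),
    error = function(e) {
      data.frame(
        file = basename(path),
        n = NA_integer_,
        mean_value = NA_real_,
        error = conditionMessage(e),
        stringsAsFactors = FALSE
      )
    }
  )
}

# Demo data (assumption): two well-formed files and one broken file.
demo_dir <- file.path(tempdir(), "batch_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(value = c(1, 2, 3)),
          file.path(demo_dir, "city_a.csv"), row.names = FALSE)
write.csv(data.frame(value = c(10, 20)),
          file.path(demo_dir, "city_b.csv"), row.names = FALSE)
write.csv(data.frame(other = 1),
          file.path(demo_dir, "city_c.csv"), row.names = FALSE)

# Step 3: iterate over the batch with lapply and stack the rows.
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
results <- do.call(rbind, lapply(files, safe_analyze))

# Files that raised an error, to be inspected one by one.
failed <- results$file[!is.na(results$error)]

# Write the combined results outside the input directory.
write.csv(results, file.path(tempdir(), "batch_results.csv"), row.names = FALSE)
```

Recording the error message per file, rather than letting the loop stop, makes it possible to run the full batch once and then fix the problem files listed in `failed`.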

A linear regression model from Urban Compactness with R-squared equal to one

What is the average travel distance in a city calculated from the compactness index?

The compactness of an area (e.g. urban extent) is the ratio between two average distances: d / D. There are two ways to get the numerator d:
- The average distance of a random point to the center of the circle, d = 2/3*R. (P.S., this one is easy to calculate: it is the integral of 2πr/(πR^2) * r = 2r^2/R^2 from 0 to R. The idea is that, for any r in [0, R], the probability density of a point on the disc lying at distance r from the center is 2πr/(πR^2); we then integrate r against that density from 0 to R. See this post.)
- The average distance of a random point to another random point in the circle, d = (128/(45π))*R = 0.9054*R.

The circle is called the "Equal-Area circle": it has the same area as the shape under study (e.g. urban extent). The denominator D is the average distance of a random point to the center, or to another point, in the real shape area. The 1st approach (random distance to center) is called pr
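The two equal-area-circle averages can be checked numerically. The following is a small Monte Carlo sketch, not part of the original post: it samples uniform points in a disc of radius R and compares the simulated mean distance to the center with 2/3*R, and the simulated mean distance between two random points with (128/(45π))*R. The helper `runif_disc`, the sample size, and the seed are illustrative choices.

```r
set.seed(1)
R <- 1        # radius of the equal-area circle (assumed to be 1 here)
n <- 200000   # number of simulated points

# Uniform points in a disc: the radius must be R * sqrt(U) so that
# the density of r is 2r / R^2, matching the integral in the text.
runif_disc <- function(n, R) {
  r <- R * sqrt(runif(n))
  theta <- runif(n, 0, 2 * pi)
  cbind(x = r * cos(theta), y = r * sin(theta))
}

p <- runif_disc(n, R)
q <- runif_disc(n, R)

# Mean distance from a random point to the center.
d_center <- mean(sqrt(rowSums(p^2)))

# Mean distance between two independent random points.
d_pair <- mean(sqrt(rowSums((p - q)^2)))

print(c(simulated = d_center, exact = 2 / 3 * R))
print(c(simulated = d_pair, exact = 128 / (45 * pi) * R))
```

With this many samples, both simulated values should agree with the closed-form ones to about two decimal places.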