Apps crash. Apps leak data. They get pwned, get new features and get updates. Apps have a complex lifecycle, and we want to understand as much of it as possible. In this post, I present our understanding of how frequently applications are updated in the Google Play store. Knowing application update frequencies has many uses, including:
- Determining how often to examine Google Play apps for vulnerabilities
- Assessing how active the development of a particular app is
- Calculating how much mobile internet bandwidth will be used by app updates
The primary objective of this research is the first point. We are more concerned with application update cycles in general than with those of any one particular application.
Getting the date when an app was last updated is as easy as scraping the Google Play store. In the "Additional information" section there is a field called "Updated" that gives the date of the last application update. To determine the trend for a particular application, we would need the dates of all its updates over a period of time. However, since we did not scrape the store every day for a long period, we cannot pin down the exact timing of each update and hence cannot model the update frequency of each individual app. This does not affect us, as our overall objective is to understand the timing of updates in general.
For this research, I randomly selected apps from the Play store and noted the date they were last updated. I had a theory that popular apps get more updates, so I made sure that we collected an equal number of samples (100) from each category of app popularity. (An app's Number of Downloads was used as a proxy for its popularity.) Number of Downloads is not the exact download count of an app but a range in which the count falls. We take the lower limit of this range as the download count for an app, for example 1B+. In total we have 20 non-overlapping, non-uniform ranges. Since the data provided by Google comes in non-uniform ranges, we stick to these ranges. Also, there are only a small number of apps in the 1B+ downloads category, so we selected all of them.
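The sampling step could be sketched in R roughly as follows. This is only an illustration, not the code used for the study: the helper names `lower_bound` and `sample_per_category`, the `downloads` column, and the label format ("1,000 - 5,000" or "1,000,000,000+") are all assumptions, and the tiny data frame is made up.

```r
# Hypothetical helper: turn a download-range label into its numeric lower bound.
# Assumes labels look like "1,000 - 5,000" or "1,000,000,000+".
lower_bound <- function(label) {
  first <- sub("\\s*[-+].*$", "", label)  # keep the text before "-" or "+"
  as.numeric(gsub(",", "", first))        # drop thousands separators
}

# Hypothetical helper: draw up to n rows from every download category.
# Categories with fewer than n apps are kept whole.
sample_per_category <- function(apps, n = 100) {
  groups <- split(seq_len(nrow(apps)), apps$numDownloads)
  keep <- unlist(lapply(groups, function(idx) {
    # sample.int on the positions avoids sample()'s special case
    # when idx happens to be a single number
    idx[sample.int(length(idx), min(n, length(idx)))]
  }), use.names = FALSE)
  apps[sort(keep), , drop = FALSE]
}

# Made-up example data, only to show the shape of the input.
set.seed(1)
apps <- data.frame(
  app = paste0("app", 1:7),
  downloads = c("1 - 5", "1 - 5", "1 - 5", "1,000 - 5,000",
                "1,000 - 5,000", "1,000,000,000+", "1 - 5"),
  stringsAsFactors = FALSE
)
apps$numDownloads <- lower_bound(apps$downloads)
picked <- sample_per_category(apps, n = 2)
```

Capping the draw at the category size is what lets a sparsely populated category contribute all of its apps while the others contribute a fixed number.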
Next, I computed the number of days since the last update for every app. If there is a relationship between download categories and update frequency, we should be able to capture it with this metric. Ideally, we would collect this data on several random days before drawing any conclusions.
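As a minimal sketch, the metric is just a date difference. The function name `days_since_last_update` is hypothetical, the dates are made up, and the scraped "Updated" text is assumed to have been converted to ISO `YYYY-MM-DD` strings beforehand.

```r
# Hypothetical helper: whole days between each app's last update and the
# day the store was scraped. Both arguments are assumed to be ISO
# "YYYY-MM-DD" strings (or Date objects).
days_since_last_update <- function(updated, snapshot) {
  # Subtracting two Date values yields a difference measured in days
  as.numeric(as.Date(snapshot) - as.Date(updated))
}

updated <- c("2014-03-01", "2013-12-25", "2012-10-15")  # made-up dates
daysSinceLastUpdate <- days_since_last_update(updated, snapshot = "2014-03-11")
```

Because the value depends on the day of the scrape, the same app yields a different number on a different snapshot date, which is why collecting on several days would give a more reliable picture.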
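    library(ggplot2)

    # Box plot of days since last update, one box per download category,
    # with the categories ordered by their numeric lower bound.
    ggplot(data, aes(x = reorder(numDownloads, as.numeric(as.character(numDownloads))),
                     y = daysSinceLastUpdate)) +
      geom_boxplot() +
      theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
      labs(x = "Download Counts", y = "Days since last update",
           title = "Distribution of days since last update for Android play store apps")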
The drop in the median time is clearly visible in the graph after the 500+ download category and again after the 50000+ category. This suggests that different categories have different mean days since last update, but for the sake of science we ran an ANOVA to check whether the difference is statistically significant. The p-value came out to be 2e-16 (anything less than 0.05 supports the alternative hypothesis, which in our case means that different categories have different update timings). The F-statistic is 73.54 (the further the F-statistic is above 1, the more it supports the alternative hypothesis). These values allow us to easily reject the null hypothesis and imply that the categories do not all have the same mean days since last update. Of course, I went on to pairwise t-tests, but I will not bore you with more details. Here are the final findings:
On average, there is a Play store update (lower end of the 95% confidence interval):
- every 10 days for apps with 100K or more downloads
- every 17 days for apps with fewer than 100K but more than 1000 downloads
- every 95 days for apps with fewer than 1000 downloads
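For readers who want to try the same kind of analysis, here is a rough sketch in base R. It runs on synthetic data generated on the spot, so none of its numbers relate to the findings above. The three category labels, the exponential distributions, and the Holm adjustment are my assumptions, and taking the lower end of a per-category t-interval is only one possible reading of how the figures above were derived.

```r
set.seed(42)  # synthetic data only; nothing here comes from the study

# Three made-up download categories with different typical update gaps.
sim <- data.frame(
  numDownloads = factor(rep(c("100", "10000", "1000000"), each = 100)),
  daysSinceLastUpdate = c(rexp(100, rate = 1 / 200),
                          rexp(100, rate = 1 / 60),
                          rexp(100, rate = 1 / 20))
)

# One-way ANOVA: do all categories share one mean "days since last update"?
fit <- aov(daysSinceLastUpdate ~ numDownloads, data = sim)
tab <- summary(fit)[[1]]
f_stat  <- tab[["F value"]][1]
p_value <- tab[["Pr(>F)"]][1]

# Pairwise t-tests show which categories differ (Holm-adjusted p-values).
pairs <- pairwise.t.test(sim$daysSinceLastUpdate, sim$numDownloads,
                         p.adjust.method = "holm")

# Lower end of the 95% confidence interval of the mean, per category.
ci_lower <- sapply(split(sim$daysSinceLastUpdate, sim$numDownloads),
                   function(x) t.test(x)$conf.int[1])
```

The ANOVA only says that at least one category differs from the rest; the pairwise tests are what justify merging neighbouring categories into a few broader groups like the three listed above.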
PS:
1. Google seems to have abandoned the Google street app in the 1B+ category, as it has not received an update since Oct 2012. We removed this app from our calculations as an outlier.
2. I used the statistical software R to prepare the graphs and the analysis.