First steps with the iNaturalist API
How many users in NaturalistaUY are Uruguayan?
I wanted to know how many of the users generating records in NaturalistaUY (the Uruguayan site of iNaturalist) are, in fact, Uruguayan. Together with Rodrigo Montiel, I am assessing the profile of observers in Uruguay (check his repo here), so we need to detect the users who are not from Uruguay and remove the data they generated from our dataset. This was a perfect use case for testing the iNat API, which I hadn’t explored until now.
Basically, this API lets you query the iNaturalist database using different methods, parameters and values. For instance, you can search and fetch data on observers, such as the number of observations (observation_count) or the number of species observed (species_count), by providing a username (user_login). The result is a JSON that you can parse and analyse.
Let’s get on with it!
To start with an example, I’m going to use my own username (flo_grattarola) as the value of the user_login parameter in a query to GET /observations/observers (see it in the API), which ‘returns observers of observations matching the search criteria and the count of observations and distinct taxa of rank species they have observed’.
So, by running the following query:
'https://api.inaturalist.org/v1/observations/observers?user_login=flo_grattarola'
We get the following output, in JSON format:
{
  "total_results": 1,
  "page": 1,
  "per_page": 500,
  "results": [
    {
      "user_id": 736016,
      "observation_count": 3649,
      "species_count": 1246,
      "user": {
        "id": 736016,
        "login": "flo_grattarola",
        "spam": false,
        "suspended": false,
        "created_at": "2017-12-15T15:54:34+00:00",
        "login_autocomplete": "flo_grattarola",
        "login_exact": "flo_grattarola",
        "name": "Florencia Grattarola",
        "name_autocomplete": "Florencia Grattarola",
        "orcid": "https://orcid.org/0000-0001-8282-5732",
        "icon": "https://static.inaturalist.org/attachments/users/icons/736016/thumb.jpeg?1513353273",
        "observations_count": 3649,
        "identifications_count": 5779,
        "journal_posts_count": 1,
        "activity_count": 9429,
        "species_count": 1419,
        "universal_search_rank": 3649,
        "roles": [
          "curator"
        ],
        "site_id": 28,
        "icon_url": "https://static.inaturalist.org/attachments/users/icons/736016/medium.jpeg?1513353273"
      }
    }
  ]
}
Great! This query gives us all the data we need, especially the observation_count value for each user.
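As a minimal sketch of how such a response can be read in R, the snippet below builds the query URL and parses a trimmed copy of the JSON shown above, so it runs without calling the API. The helper name build_observers_url() and the object names are illustrative assumptions; only fromJSON() from jsonlite is used for the parsing.

```r
library(jsonlite)

# Build the query URL for one user (hypothetical helper, same endpoint as above).
build_observers_url <- function(user_login) {
  paste0("https://api.inaturalist.org/v1/observations/observers?user_login=",
         user_login)
}

# Trimmed copy of the response shown above, so the example works offline.
response_json <- '{
  "total_results": 1,
  "results": [
    {"user_id": 736016, "observation_count": 3649, "species_count": 1246,
     "user": {"id": 736016, "login": "flo_grattarola"}}
  ]
}'

# flatten = TRUE turns the nested "user" object into columns such as user.login.
parsed <- fromJSON(response_json, flatten = TRUE)

query_url   <- build_observers_url("flo_grattarola")
user_login  <- parsed$results$user.login
total_count <- parsed$results$observation_count
```

With a live request, the JSON text would come from the API instead of the hard-coded string; the parsing step stays the same.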
Assessing users in NaturalistaUY
First, we need to download the data from naturalista.uy (ours was downloaded on 2022-10-27). Then, we calculate the number of observations made by each user in Uruguay (observation_count_UY), and keep the user_id and user_login variables. You can do this by using the functions group_by() and count(). You could also do this using the API, but in our case, we already had the data.
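The counting step could look roughly like this. It is a sketch on made-up toy data: toy_observations stands in for the downloaded NaturalistaUY export (one row per observation), and the logins are placeholders, not real users.

```r
library(dplyr)

# Toy stand-in for the downloaded data: one row per observation (made-up values).
toy_observations <- tibble(
  user_id    = c(1, 1, 1, 2, 2, 3),
  user_login = c("user_a", "user_a", "user_a", "user_b", "user_b", "user_c")
)

# Count the observations per user, keeping user_id and user_login.
users_UY <- toy_observations %>%
  group_by(user_id, user_login) %>%
  count(name = "observation_count_UY") %>%
  ungroup()
```

The result has one row per user, with the count stored in observation_count_UY.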
We found a total of 1,788 users in NaturalistaUY. The user with the largest number of records has 4,755 observations and, on average, users have uploaded 29.9 records to iNat. Here’s a glimpse of the data:
user_id | user_login | observation_count_UY |
---|---|---|
11503 | noelia | 158 |
1255162 | romigaleota | 204 |
1368063 | goncrisdi | 125 |
1469479 | jorgejuanrueda | 132 |
1569449 | beln15 | 133 |
2348924 | ceciliapomboposente | 105 |
2640130 | patriciabidondo | 152 |
2988066 | weba69 | 165 |
3262606 | bert_in_the_skirt | 201 |
4109374 | vanesssa_v | 147 |
Create a function to run the query for multiple users
The idea now is to run the test query from above for all the users of NaturalistaUY, get their observation_count values and compare them with the number of observations these users have in Uruguay (i.e., the proportion of observations recorded in Uruguay vs in the rest of the world).
The function get_observers_num_observations() takes a list of users (user_login) and returns a tibble with user_id, user_login and observation_count_iNat. This last count is the total number of observations each user has on the platform.
An important consideration when using this API is that querying all users at once could exceed its rate limit, as iNat limits API usage to a maximum of 100 requests per minute. So, we need to add a delay to the fetching process. We will do this by pausing for 10 seconds every ten users, using Sys.sleep().
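A quick sanity check of that pacing, as a sketch: the helper needs_pause() and the two constants are illustrative names, not part of the original code. It assumes the pause happens before every tenth request and ignores the time each request takes, so the computed rate is an upper bound.

```r
pause_every   <- 10  # pause before every 10th request (assumed, as described above)
pause_seconds <- 10  # length of each pause, in seconds

# TRUE when the i-th request should be preceded by a pause.
needs_pause <- function(i) i %% pause_every == 0

# Requests that trigger a pause among the first 30.
paused_at <- which(vapply(1:30, needs_pause, logical(1)))

# Each batch of 10 requests costs at least 10 seconds of sleep, so the
# long-run rate is at most 60 requests per minute, below the limit of 100.
max_requests_per_minute <- pause_every / pause_seconds * 60
```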
Here’s the function:
library(httr)
library(jsonlite)
library(tidyverse)

get_observers_num_observations <- function(user_login_list){

  observers_num_observations <- tibble(user_id = numeric(),
                                       user_login = character(),
                                       observation_count_iNat = numeric())
  num_results <- 1 # request counter, used to pace the API calls and to print progress in the console

  for (user_login in user_login_list) {

    # The API limits the request rate: every 10 users, the code stops for 10 seconds
    if (num_results %% 10 == 0) {
      Sys.sleep(10)
    }

    call <- paste0("https://api.inaturalist.org/v1/observations/observers?user_login=", user_login)
    get_json_call <- GET(url = call) %>%
      content(as = "text", encoding = "UTF-8") %>% fromJSON(flatten = TRUE)
    results <- get_json_call$results

    # No usable result for this user: keep the login and fill the rest with NA
    if (is.null(results) || length(results) == 0) {
      observer_num_observations <- tibble(user_id = NA_real_,
                                          user_login = user_login,
                                          observation_count_iNat = NA_real_)
      cat(num_results, 'usuario:', user_login, '--> NOT FOUND', '\n')
    } else {
      observer_num_observations <- tibble(user_id = results$user_id,
                                          user_login = results$user.login,
                                          observation_count_iNat = results$observation_count)
      cat(num_results, 'usuario:', user_login, '--> DONE', '\n')
    }

    observers_num_observations <- bind_rows(observers_num_observations, observer_num_observations)
    num_results <- num_results + 1 # count every request, found or not
  }
  return(observers_num_observations)
}
It is probably more complicated than it needs to be, but it does the job 🥹
Let’s run it
To run the function, we need to provide a list of user_login values. Let’s use the previous list as an example.
[1] "noelia" "romigaleota" "goncrisdi"
[4] "jorgejuanrueda" "beln15" "ceciliapomboposente"
[7] "patriciabidondo" "weba69" "bert_in_the_skirt"
[10] "vanesssa_v"
When we run it, the function prints to the console the user_login it is assessing, so we can get an idea of the progress.
NatUY_users_assessment <- get_observers_num_observations(NatUY_users_selection$user_login)
If the user is found, it will print the name and --> DONE, while if the user is not found, it will print --> NOT FOUND.
1 usuario: noelia --> DONE
2 usuario: romigaleota --> DONE
3 usuario: goncrisdi --> DONE
4 usuario: jorgejuanrueda --> DONE
5 usuario: beln15 --> DONE
6 usuario: ceciliapomboposente --> DONE
7 usuario: patriciabidondo --> DONE
8 usuario: weba69 --> DONE
9 usuario: bert_in_the_skirt --> DONE
10 usuario: vanesssa_v --> DONE
How do results look?
In the end, we get a table with the counts.
user_id | user_login | observation_count_iNat |
---|---|---|
11503 | noelia | 162 |
1255162 | romigaleota | 4218 |
1368063 | goncrisdi | 23141 |
1469479 | jorgejuanrueda | 7001 |
1569449 | beln15 | 133 |
2348924 | ceciliapomboposente | 158 |
2640130 | patriciabidondo | 387 |
2988066 | weba69 | 1728 |
3262606 | bert_in_the_skirt | 204 |
4109374 | vanesssa_v | 179 |
Finally, we merge the results with our original table and calculate the proportion (as a percentage) of each user’s records that were recorded in Uruguay. We can even make a guess at who is Uruguayan (those who made more than 30% of their observations in Uruguay); see the variable esUruguaye.
user_id | user_login | observation_count_UY | observation_count_iNat | proporcion | esUruguaye |
---|---|---|---|---|---|
11503 | noelia | 158 | 162 | 97.531 | si |
1255162 | romigaleota | 204 | 4218 | 4.836 | no |
1368063 | goncrisdi | 125 | 23141 | 0.540 | no |
1469479 | jorgejuanrueda | 132 | 7001 | 1.885 | no |
1569449 | beln15 | 133 | 133 | 100.000 | si |
2348924 | ceciliapomboposente | 105 | 158 | 66.456 | si |
2640130 | patriciabidondo | 152 | 387 | 39.276 | si |
2988066 | weba69 | 165 | 1728 | 9.549 | no |
3262606 | bert_in_the_skirt | 201 | 204 | 98.529 | si |
4109374 | vanesssa_v | 147 | 179 | 82.123 | si |
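The merge and the classification can be sketched as follows, using three rows copied from the tables above. The object names (users_UY, users_iNat, users_assessed, threshold) and the use of left_join() and mutate() are assumptions for illustration, since the original merging code is not shown; the cut-off is set to 30 here and is easy to change.

```r
library(dplyr)

# Three rows copied from the tables above.
users_UY <- tibble(
  user_id = c(11503, 1255162, 2640130),
  user_login = c("noelia", "romigaleota", "patriciabidondo"),
  observation_count_UY = c(158, 204, 152)
)
users_iNat <- tibble(
  user_id = c(11503, 1255162, 2640130),
  user_login = c("noelia", "romigaleota", "patriciabidondo"),
  observation_count_iNat = c(162, 4218, 387)
)

threshold <- 30  # percent; assumed cut-off for guessing who is Uruguayan

users_assessed <- users_UY %>%
  left_join(users_iNat, by = c("user_id", "user_login")) %>%
  mutate(
    # Percentage of each user's observations that were recorded in Uruguay.
    proporcion = round(100 * observation_count_UY / observation_count_iNat, 3),
    esUruguaye = if_else(proporcion > threshold, "si", "no")
  )
```

Users that were not found through the API would carry NA in observation_count_iNat, so both derived columns would be NA for them and they can be filtered out separately.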
Conclusions
Of the total of 1,788 users, 1,282 are Uruguayan (i.e., have recorded more than 1/3 of their observations in Uruguay), while 517 are not. We also found 11 users who seem to have deleted their accounts on the platform and, thus, were not found.