First steps with the iNaturalist API

How many users in NaturalistaUY are Uruguayan?

I wanted to know how many of the users generating records in NaturalistaUY (the Uruguayan site of iNaturalist) are, in fact, Uruguayan. With Rodrigo Montiel, we are assessing the profile of observers in Uruguay (check his repo here). To do that, we need to detect the users who are not from Uruguay and remove the data they generated from our dataset. This was a perfect use case to test the iNat API, which I hadn't explored until now.

Basically, this API lets you query the iNaturalist database using different methods, parameters and values. For instance, you can search and fetch data on observers, such as the number of observations (observation_count) or the number of species observed (species_count), by providing a username (user_login). The result is a JSON object that you can parse and analyse.

Let’s get on with it!

To start with an example, I’m going to use my own username (flo_grattarola) as value for the user_login parameter to do a query, using GET /observations/observers (see it in the API), which ‘returns observers of observations matching the search criteria and the count of observations and distinct taxa of rank species they have observed’.

So, by running the following query:

'https://api.inaturalist.org/v1/observations/observers?user_login=flo_grattarola'

We get the following data as output, in JSON format:

{
  "total_results": 1,
  "page": 1,
  "per_page": 500,
  "results": [
    {
      "user_id": 736016,
      "observation_count": 3649,
      "species_count": 1246,
      "user": {
        "id": 736016,
        "login": "flo_grattarola",
        "spam": false,
        "suspended": false,
        "created_at": "2017-12-15T15:54:34+00:00",
        "login_autocomplete": "flo_grattarola",
        "login_exact": "flo_grattarola",
        "name": "Florencia Grattarola",
        "name_autocomplete": "Florencia Grattarola",
        "orcid": "https://orcid.org/0000-0001-8282-5732",
        "icon": "https://static.inaturalist.org/attachments/users/icons/736016/thumb.jpeg?1513353273",
        "observations_count": 3649,
        "identifications_count": 5779,
        "journal_posts_count": 1,
        "activity_count": 9429,
        "species_count": 1419,
        "universal_search_rank": 3649,
        "roles": [
          "curator"
        ],
        "site_id": 28,
        "icon_url": "https://static.inaturalist.org/attachments/users/icons/736016/medium.jpeg?1513353273"
      }
    }
  ]
}


Great! This query gives us all the data we need, especially the observation_count value for each user.
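
As a minimal sketch of the parsing step, the snippet below reads a trimmed copy of the response shown above with jsonlite::fromJSON() and pulls out the counts. The json_text string is a shortened copy used for illustration, not a live API call; with a real request you would parse the response body instead.

```r
library(jsonlite)

# Trimmed copy of the response shown above (illustration only, not a live call)
json_text <- '{
  "total_results": 1,
  "results": [
    {
      "user_id": 736016,
      "observation_count": 3649,
      "species_count": 1246,
      "user": {"id": 736016, "login": "flo_grattarola"}
    }
  ]
}'

# flatten = TRUE turns the nested "user" object into columns such as user.login
parsed <- fromJSON(json_text, flatten = TRUE)
observer <- parsed$results

observer$user.login         # "flo_grattarola"
observer$observation_count  # 3649
observer$species_count      # 1246
```

Each element of results becomes one row of a data frame, which is why the counts can be read as ordinary columns.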

Assessing users in NaturalistaUY

First, we need to download the data from naturalista.uy (ours was downloaded on 2022-10-27). Then, we calculate the number of observations made by each user in Uruguay (observation_count_UY), and keep the user_id and user_login variables. You can do this by using functions group_by() and count(). You could also do this using the API, but in our case, we already had the data.
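
Here is a minimal sketch of that counting step, using a made-up three-row table in place of the real download. The object names observations_UY and NatUY_users and the rows themselves are assumptions for illustration; the real export has one row per observation and many more columns.

```r
library(dplyr)

# Hypothetical stand-in for the NaturalistaUY download: one row per observation
observations_UY <- tibble(
  user_id    = c(1, 1, 2),
  user_login = c("user_a", "user_a", "user_b")
)

# Count observations per user, keeping user_id and user_login
NatUY_users <- observations_UY %>%
  group_by(user_id, user_login) %>%
  count(name = "observation_count_UY") %>%
  ungroup()

NatUY_users
```

The result has one row per user, with the number of rows that user contributed in the observation_count_UY column.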

We found a total of 1,788 users in NaturalistaUY. The user with the largest number of records has 4,755 observations, and on average users have uploaded 29.9 records to iNat. Here's a glimpse of the data:

user_id   user_login            observation_count_UY
11503     noelia                158
1255162   romigaleota           204
1368063   goncrisdi             125
1469479   jorgejuanrueda        132
1569449   beln15                133
2348924   ceciliapomboposente   105
2640130   patriciabidondo       152
2988066   weba69                165
3262606   bert_in_the_skirt     201
4109374   vanesssa_v            147

Create a function to run the query for multiple users

The idea now is to be able to run the test query from above for all the users of NaturalistaUY, get their observation_counts and compare them with the number of observations these users have in Uruguay (i.e., proportion of observations recorded in Uruguay vs in the rest of the world).

The function get_observers_num_observations() takes a list of users (user_login) and returns a tibble with user_id, user_login and observation_count_iNat. This last count is the total number of observations of the users in the platform.

An important consideration when using this API is that we could overload it by querying all users in one go, as iNat limits API usage to a maximum of 100 requests per minute. So, we need to add a delay to the fetching process. We will do this by pausing for 10 seconds every ten users, using Sys.sleep().
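
As a rough back-of-the-envelope check (a sketch, not a measurement), the snippet below estimates the highest average request rate this pausing scheme allows, assuming each request took no time at all. The helper max_requests_per_minute() is a made-up name for illustration.

```r
# Upper bound on the average request rate if every request were instantaneous:
# `batch_size` requests are sent between two pauses of `pause_seconds` each
max_requests_per_minute <- function(batch_size, pause_seconds) {
  batch_size * 60 / pause_seconds
}

rate <- max_requests_per_minute(batch_size = 10, pause_seconds = 10)
rate        # 60
rate < 100  # TRUE: below the limit quoted above

# Positions in a list of 25 users at which a pause would be triggered
pause_positions <- which(seq_len(25) %% 10 == 0)
pause_positions  # 10 20
```

In practice each request also takes some time, so the real rate should sit below this bound.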

Here’s the function:

library(httr)
library(jsonlite)
library(tidyverse)

# Takes a vector of user logins and returns a tibble with each user's
# total number of observations on iNaturalist
get_observers_num_observations <- function(user_login_list){
  observers_num_observations <- tibble(user_id = numeric(),
                                       user_login = character(),
                                       observation_count_iNat = numeric())

  num_results <- 1  # used to pace the API calls and to print progress to the console

  for (user_login in user_login_list) {

    if (num_results %% 10 == 0) {
      Sys.sleep(10) # The API needs a delay, otherwise it returns an error. Every 10 users, the code pauses for 10 seconds
    }

    call <- paste0("https://api.inaturalist.org/v1/observations/observers?user_login=", user_login)

    get_json_call <- GET(url = call) %>%
      content(as = "text") %>% fromJSON(flatten = TRUE)

    # Treat a response with no results as a user that was not found
    if (length(get_json_call$results) == 0) {
      observer_num_observations <- tibble(user_id = NA,
                                          user_login = user_login,
                                          observation_count_iNat = NA)
      observers_num_observations <- rbind(observers_num_observations, observer_num_observations)
      cat(num_results, 'usuario:', user_login, '--> NOT FOUND', '\n')
    }
    else {
      results <- as_tibble(get_json_call$results)
      observer_num_observations <- tibble(user_id = results$user_id,
                                          user_login = results$user.login,
                                          observation_count_iNat = results$observation_count)

      observers_num_observations <- rbind(observers_num_observations, observer_num_observations)
      cat(num_results, 'usuario:', user_login, '--> DONE', '\n')
    }
    # Position of the next user to process
    num_results <- nrow(observers_num_observations) + 1
  }
  return(observers_num_observations)
}

It is probably more complicated than it needs to be, but it does the job 🥹

Let’s run it

To run the function, we need to provide a list of user_logins. As an example, let's use the users from the table above.

 [1] "noelia"              "romigaleota"         "goncrisdi"          
 [4] "jorgejuanrueda"      "beln15"              "ceciliapomboposente"
 [7] "patriciabidondo"     "weba69"              "bert_in_the_skirt"  
[10] "vanesssa_v"         

When we run it, the function prints in the console the users’ user_login that it is assessing, so we can have an idea of the progress.

NatUY_users_assessment <- get_observers_num_observations(NatUY_users_selection$user_login)

If the user is found, it will print the user_login followed by --> DONE; if not, it will print --> NOT FOUND.

1 usuario: noelia --> DONE
2 usuario: romigaleota --> DONE
3 usuario: goncrisdi --> DONE
4 usuario: jorgejuanrueda --> DONE
5 usuario: beln15 --> DONE
6 usuario: ceciliapomboposente --> DONE
7 usuario: patriciabidondo --> DONE
8 usuario: weba69 --> DONE
9 usuario: bert_in_the_skirt --> DONE
10 usuario: vanesssa_v --> DONE

How do results look?

In the end, we get a table with the counts.

user_id   user_login            observation_count_iNat
11503     noelia                162
1255162   romigaleota           4218
1368063   goncrisdi             23141
1469479   jorgejuanrueda        7001
1569449   beln15                133
2348924   ceciliapomboposente   158
2640130   patriciabidondo       387
2988066   weba69                1728
3262606   bert_in_the_skirt     204
4109374   vanesssa_v            179

Finally, we merge the results with our original table and calculate the proportion of each user's records that were made in Uruguay. We can even make a guess as to who is Uruguayan (those who made more than 30% of their observations in Uruguay); see the variable esUruguaye.

user_id   user_login            observation_count_UY   observation_count_iNat   proporcion   esUruguaye
11503     noelia                158                    162                      97.531       si
1255162   romigaleota           204                    4218                     4.836        no
1368063   goncrisdi             125                    23141                    0.540        no
1469479   jorgejuanrueda        132                    7001                     1.885        no
1569449   beln15                133                    133                      100.000      si
2348924   ceciliapomboposente   105                    158                      66.456       si
2640130   patriciabidondo       152                    387                      39.276       si
2988066   weba69                165                    1728                     9.549        no
3262606   bert_in_the_skirt     201                    204                      98.529       si
4109374   vanesssa_v            147                    179                      82.123       si
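
The merge and proportion step behind this table can be sketched as follows, using three of the users listed above. The object names NatUY_counts, iNat_counts and NatUY_users_proportion are made up for illustration, and the 30% cut-off is taken from the text; this is one way to write the step, not necessarily how the original analysis did it.

```r
library(dplyr)

# Three users copied from the tables above
NatUY_counts <- tibble(
  user_id = c(11503, 1255162, 2640130),
  user_login = c("noelia", "romigaleota", "patriciabidondo"),
  observation_count_UY = c(158, 204, 152)
)
iNat_counts <- tibble(
  user_id = c(11503, 1255162, 2640130),
  user_login = c("noelia", "romigaleota", "patriciabidondo"),
  observation_count_iNat = c(162, 4218, 387)
)

# Join both counts, then compute the percentage of records made in Uruguay
NatUY_users_proportion <- NatUY_counts %>%
  left_join(iNat_counts, by = c("user_id", "user_login")) %>%
  mutate(proporcion = round(observation_count_UY / observation_count_iNat * 100, 3),
         esUruguaye = if_else(proporcion > 30, "si", "no"))

NatUY_users_proportion$proporcion  # 97.531 4.836 39.276
NatUY_users_proportion$esUruguaye  # "si" "no" "si"
```

Joining on both user_id and user_login keeps a single copy of each identifier column in the merged table.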

Conclusions

From the total of 1,788 users, 1,282 are Uruguayan (i.e., have recorded more than 1/3 of their observations in Uruguay), while 517 are not. We also found 11 users who seem to have deleted their accounts on the platform and, thus, were not found.

That’s all folks!

Florencia Grattarola
Postdoc Researcher

Uruguayan biologist doing research in macroecology and biodiversity informatics.