eBird Data Challenge 20



The eBird database for India is now very large, and is growing rapidly. But what can be done with this information? Are you interested in helping generate new ideas, patterns and results from the observations the database contains? If so, read on.

Results now available!

Background

Despite a long history of ornithology in India, many aspects of current bird distribution, abundance and movements are still poorly known. In early 2014, birdwatchers in India started using the eBird platform to record their sightings, and the information aggregated so far forms one of the largest online and publicly available databases on any aspect of Indian biodiversity (see The Database, below).

Collating information into one place is only the beginning. The next task is to convert the raw data into meaningful information which can lead to a better understanding of Indian birds, for education, research and conservation.

The Challenge

The challenge is simple:

  1. Think of a question you wish to answer or a problem you would like to tackle (see the Ideas section below).
  2. Download all eBird data from India, or only a subset (you can specify region, species, and date range if you wish).
  3. Use the eBird data to address this question or problem. You can do so by creating one or more maps, graphs or other kinds of output (e.g. an animation).
  4. Send your findings to us (see Entering the Challenge).

Who can get involved

Anyone interested in birds and/or data!

  • If you have an idea for a question/problem to apply to the data AND you have the technical skills to tackle this, you’re all ready to go ahead!
  • If you are a birdwatcher or a conservation group and have a question/problem you are interested in, but don’t have the training or skills to work with data, please submit your suggested question/problem to the Question Pool. Perhaps someone else who has the technical ability will take up your suggested question and tackle it.
  • If you have the technical skills, but don’t know what question would be meaningful, take a look at the suggestions in the Question Pool – if one of these strikes your fancy, let us know in the comments below and we’ll put you in touch with the suggester so that you can work together on this.

Remember, the challenge can be taken up by individuals as well as groups – so don’t hesitate to join forces. Collaboration is good!

Entering the Challenge

To enter the challenge, first register your team. Your team can be just you, or a larger group, but there should be a single contact person specified. We ask that you register so that we can be sure to keep you informed about any updates to the challenge, and to let you know of ideas contributed to the Question Pool. The deadline for registering has been extended to 20 December 2016.

Once you have registered, download the data (see The Database below) and work on tackling the problem you would like to solve. Your output can be in any form: from one or more maps or charts, to an animation, all the way to an interactive website or app! The main idea is to state a clear question or purpose of your work, to describe in detail how you processed the raw data to arrive at your output, and then to reflect on what the output tells us about the original purpose.

Entries must be submitted by 31 December 2016 (extended from 15 December) and will consist of the following pieces of information:

  1. About you and your group (names, contact details, etc.)
  2. The question or problem that you set out to tackle
  3. A detailed description of how you processed the data, accompanied by programming scripts, if you used them
  4. A link to where we can see or download the output
  5. Your thoughts on the answers to the question or problem stated in #2, based on the output you have generated.

Evaluation Process

Entries will be evaluated by a panel of jury members, whose composition is as follows:

Entries will be evaluated on the following criteria:

  • Novelty, creativity and importance of the question or problem being tackled.
  • Care and thought put into data processing and quality control.
  • Design of the output produced, so that it is understandable to and usable by a broad audience.
  • The connection made between the output and the original question or problem.

Prizes!

As a small incentive for helping understand Indian birds better, we are offering an overall prize for the best entry (to be announced, worth approx. Rs 10,000), plus two special mention prizes (each worth approx. Rs 5,000). Details will be announced soon.

Ideas

A large variety of questions can potentially be tackled with the information in the eBird India database. Some brief examples are given below (each would need to be made more detailed and specific for this challenge), and some more detailed examples are given on the next page.

  • Gaps. Where and when are there gaps in information on birds? These can be gaps in both space and time (e.g. season). From this can we set some priorities to encourage birdwatchers to fill important gaps?
  • Birder behaviour. Where, when and with whom do birders go birdwatching? How far do they travel, and what are their eBird listing habits? Understanding how birdwatchers behave can lead to better design of citizen science projects like eBird.
  • Individual species. What is their distribution and seasonality? Can we detect local movements? Are there influences of habitat?* In flocking or colonial species, how does the number of birds counted vary by month or season? What is the breeding season of different species, and does this change geographically? Is there adequate information from different sites/regions to answer these sorts of questions?
  • Species diversity. How does the number of species change over time and space? What aspects of habitat or weather might influence this?* In a specific region, how many lists (or birds counted) does it take to record the total number of bird species?
  • Locations or regions. What species are found in a specified location or region? Can we assess the adequacy of the data available to answer this question?
  • Detecting potential errors. In a large database like eBird, some errors are bound to creep in, and some (perhaps many) seeming errors are in fact not errors at all. Are there ways to flag possible errors such that observers can be requested to provide more information, to strengthen the database as a whole?

*For these questions, eBird data alone may not be sufficient, and other data sources may need to be brought in.
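One way to approach the species-diversity question above (how many lists does it take to reach the total number of species in a region?) is a species accumulation curve. The Python sketch below is illustrative only: it assumes the observations have already been collapsed into a mapping from list ID to the set of species on that list, and it averages over random list orders, which is one common choice rather than a method prescribed for this challenge. The list IDs and species in the toy data are hypothetical.

```python
import random
from typing import Dict, List, Set


def accumulation_curve(lists: Dict[str, Set[str]], order: List[str]) -> List[int]:
    """Number of distinct species seen after adding each list, in the given order."""
    seen: Set[str] = set()
    curve: List[int] = []
    for list_id in order:
        seen |= lists[list_id]  # add this list's species to the running total
        curve.append(len(seen))
    return curve


def mean_accumulation_curve(
    lists: Dict[str, Set[str]], n_perm: int = 100, seed: int = 1
) -> List[float]:
    """Average the curve over random list orders so no single order dominates."""
    rng = random.Random(seed)
    ids = list(lists)
    totals = [0.0] * len(ids)
    for _ in range(n_perm):
        rng.shuffle(ids)
        for i, n_species in enumerate(accumulation_curve(lists, ids)):
            totals[i] += n_species
    return [t / n_perm for t in totals]


# Hypothetical toy data: three lists from one region.
toy = {
    "S1": {"House Crow", "Common Myna"},
    "S2": {"Common Myna", "Rock Pigeon"},
    "S3": {"Black Kite"},
}
print(accumulation_curve(toy, ["S1", "S2", "S3"]))  # [2, 3, 4]
```

If the averaged curve is still rising steeply at the last list, the region has probably not yet been sampled enough to know its full species total.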

Who knows, perhaps you’ll come up with a great idea and find an interesting answer, and your work could be published in a formal outlet!

The Database

The eBird database currently contains just over 4 million records of birds from India. However, the fundamental unit of observation in eBird is not a single record, but rather a list (and it’s possible that a list contains only a single record). There are around 200,000 lists from India, and many, if not most, useful analyses of eBird data are conducted at the level of a list, rather than that of a record. For example, to look at where eBird data come from, one would plot a map of list locations. Or, if we wanted to plot the distribution of a species, we would ask not only which lists contain the species, but also which lists do not. (For this, we would choose to analyse only ‘complete’ lists, that is, lists on which all species seen or heard were reported.)
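Working at the level of the list, the proportion of complete lists that report a species (often called its reporting frequency) is a simple first summary of where and when it occurs. The Python sketch below assumes each observation is a dict keyed by the dotted column names used on this page; COMMON.NAME as the name of the species column is an assumption. Note that it measures how often a species is reported, not whether it is truly present (see Some Cautions below).

```python
from typing import Dict, Iterable


def reporting_frequency(rows: Iterable[Dict[str, str]], species: str) -> float:
    """Share of complete lists on which `species` was reported.

    Each row is one observation (one species on one list). Column names follow
    the dotted style used on this page; COMMON.NAME is an assumed name for the
    species column.
    """
    complete_lists = set()       # every complete list seen in the data
    lists_with_species = set()   # complete lists that report the species
    for row in rows:
        # Incomplete lists say nothing about where the species was NOT seen.
        if row.get("ALL.SPECIES.REPORTED") != "1":
            continue
        list_id = row["SAMPLING.EVENT.IDENTIFIER"]
        complete_lists.add(list_id)
        if row.get("COMMON.NAME") == species:
            lists_with_species.add(list_id)
    if not complete_lists:
        return float("nan")  # no complete lists, so the frequency is undefined
    return len(lists_with_species) / len(complete_lists)
```

Computing this separately per month, district or grid cell gives the kind of seasonal or spatial pattern described above.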

Most of the data in eBird from India come from 2014 and later, but a number of people have uploaded older records as well. Do consider exploring and using these historical records if they can help to answer your question.

When you download the eBird data, each row in the database corresponds to an observation, with all observations that come from a single list sharing a common ID code (called SAMPLING.EVENT.IDENTIFIER), so that you can collapse the data to the level of the list if you so wish. Various metadata about each list are also available in other columns, including the location (with State, District, latitude and longitude), start time, duration, distance covered, eBird protocol followed, whether the list is complete, and so on. These metadata fields are very important to consider when summarising and analysing eBird data.
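Collapsing the observation rows to one entry per list can be sketched in Python with only the standard library, as below. This is a minimal sketch under stated assumptions: the header names are normalised by replacing spaces with dots, so the code uses the dotted names given on this page (e.g. SAMPLING.EVENT.IDENTIFIER) whichever form your file uses, and the other column names in LIST_FIELDS are assumptions to check against the column description in your download. The file name in the comment is hypothetical.

```python
import csv
from typing import Dict, Iterable, Iterator

# List-level metadata to keep; these dotted names are assumptions based on
# the EBD column descriptions, so check them against your own download.
LIST_FIELDS = (
    "STATE", "COUNTY", "LATITUDE", "LONGITUDE", "OBSERVATION.DATE",
    "PROTOCOL.TYPE", "DURATION.MINUTES", "EFFORT.DISTANCE.KM",
    "ALL.SPECIES.REPORTED",
)


def read_ebd(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per observation from tab-separated EBD lines.

    Header names are normalised to the dotted style used on this page,
    e.g. 'SAMPLING EVENT IDENTIFIER' -> 'SAMPLING.EVENT.IDENTIFIER'.
    """
    # QUOTE_NONE: treat quote characters in free-text fields as plain text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header_row = next(reader, None)
    if header_row is None:
        return  # empty input
    header = [h.strip().upper().replace(" ", ".") for h in header_row]
    for fields in reader:
        yield dict(zip(header, fields))


def collapse_to_lists(rows: Iterable[Dict[str, str]]) -> Dict[str, dict]:
    """Group observations by SAMPLING.EVENT.IDENTIFIER, one entry per list."""
    lists: Dict[str, dict] = {}
    for row in rows:
        list_id = row["SAMPLING.EVENT.IDENTIFIER"]
        if list_id not in lists:
            # Metadata repeats on every row of a list, so copy it only once.
            lists[list_id] = {f: row.get(f, "") for f in LIST_FIELDS}
            lists[list_id]["SPECIES"] = set()
        lists[list_id]["SPECIES"].add(row.get("COMMON.NAME", ""))
    return lists


# Example use (hypothetical file name):
# with open("ebd_IN.txt", encoding="utf-8", newline="") as f:
#     lists = collapse_to_lists(read_ebd(f))
```

Because read_ebd is a generator, the observation rows are streamed; only the per-list summaries are held in memory.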

Please also carefully read the terms of use document that comes together with the data download. In brief, eBird permits use of the data for research and education; any commercial purpose must receive written permission from eBird. Please do not send the downloaded data to others; anyone else who is interested should download the data directly from eBird.

Getting Started

eBird data are made publicly available every quarter through the eBird Basic Dataset (EBD). You can request access to the dataset, and when doing so, please state that this is for use in the eBird-India data challenge. When you are given access, you will receive an email. This may take 2-3 days at most; and do check your spam folder as well!

When downloading the data you can specify which region, species or date range you wish to download. The download format is a tab-separated text file. If you download the full India data, the file is some 110 MB in size (as of September 2016), so consider whether you want to download data for a single State, or for specific species, instead. On the download form, there is also a checkbox to download “unvetted data”. These are records that have been flagged as unusual, but have not yet been verified. For most purposes, one should not use the unvetted data; and in general, any records marked with a ‘0’ in the ‘APPROVED’ column should be ignored/removed.
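If you did tick the unvetted-data checkbox, the records that have not been accepted can be removed in code. A minimal Python sketch, assuming observations are dicts keyed by column name and that APPROVED is coded as 1 for accepted records and 0 for flagged records that have not been accepted (check the column description in your download):

```python
from typing import Dict, Iterable, Iterator


def approved_only(rows: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Yield only records that eBird review has accepted.

    Assumes APPROVED is '1' for accepted records and '0' for records that are
    flagged but not (yet) accepted. Rows with a missing or blank APPROVED
    value are dropped to be safe.
    """
    for row in rows:
        if row.get("APPROVED", "").strip() == "1":
            yield row
```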

The download package contains a detailed description of each of the columns in the tab-separated file; please read these carefully.

Please note that the data are in tabular (spreadsheet-like) format, but several standard spreadsheet programs (including Excel) will not be able to open a file with 4 million rows. For this reason, if you want to work on the full India dataset, you will need to use more flexible software such as R, Python or Matlab. In the coming days we will post some tips for how to handle large volumes of eBird data. If you need specific help, please let us know in the comments below, or on the Facebook event page.
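The sketch below shows one way to cut the full India file down to a single State in Python without ever loading it whole, producing a much smaller file for further work. It reads the file one line at a time and, by default, also drops records whose APPROVED value is not 1 (an assumption about how that column is coded). The STATE and APPROVED column names follow the EBD column description as we understand it; the file names in the comment and the choice of State are hypothetical.

```python
from typing import Iterable, Iterator


def subset_by_state(
    lines: Iterable[str], state: str, require_approved: bool = True
) -> Iterator[str]:
    """Yield the header line plus the data lines for one State.

    Lines are processed one at a time, so the full India file never has to
    fit in memory. The STATE and APPROVED column names are assumptions.
    """
    it = iter(lines)
    header_line = next(it, None)
    if header_line is None:
        return  # empty input
    header = [h.strip().upper() for h in header_line.rstrip("\r\n").split("\t")]
    cols = [header.index("STATE")]  # raises ValueError if the column is missing
    if require_approved:
        cols.append(header.index("APPROVED"))
    yield header_line
    for line in it:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) <= max(cols):
            continue  # skip malformed or truncated lines
        if fields[cols[0]] != state:
            continue
        if require_approved and fields[cols[1]] != "1":
            continue
        yield line


# Example use (file names are hypothetical):
# with open("ebd_IN.txt", encoding="utf-8") as src, \
#         open("ebd_IN_Kerala.txt", "w", encoding="utf-8") as dst:
#     dst.writelines(subset_by_state(src, "Kerala"))
```

Splitting on tabs directly, rather than parsing every field, keeps this pass fast; the smaller output file can then be read with any tool.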

Some Cautions

When analysing data from eBird, do take some time to think about the quality and accuracy of data that you need. eBird has a detailed set of quality control processes, and over 70 volunteer reviewers help with eBird data quality in India. However, despite best efforts, some errors will creep into a database of this size; and there are also other reasons to think carefully about which parts of the database you should use. You may have to subset the overall dataset to include only those lists/records that meet your needs and quality requirements. Some specific cautions:

  • Location accuracy. The geographical precision of lists in eBird varies from list to list. For example, even though each list is associated with a specific latitude and longitude, the birds recorded on that list may come from a very large area. This can usually be discovered by looking at the distance travelled field. If you need high location precision for your project, you may have to filter out ‘Travelling’ lists that cover a large distance, and/or those where distance is not specified. Similarly, lists are often tagged to a particular hotspot location, which means that the precise place where a list was created may be several kilometres from the lat-long of the hotspot. The geographical precision of such lists is likely to be relatively low.
  • Complete lists. If you intend to look not only at the presence of a species, but also at its absence (or, more accurately, non-detection), then you will want to use only those lists that are ‘Complete’ (ie, where all species seen or heard have been reported). This would be needed if, for example, you wanted to examine the frequency of reporting of a species over space or time. ‘Incidental’ or ‘Casual’ lists are by definition incomplete, and should be removed from such analyses. For all other lists, you would want to use only those in which the observer has stated that all species have been reported (ALL.SPECIES.REPORTED = 1 in the data file).
  • Detection probability. Even when using only ‘complete’ lists, as described above, please keep in mind that the absence of a species from a list doesn’t necessarily mean that it was absent from the area. The species may simply not have been detected and identified by the observer, even though it was present. The degree to which this affects your results depends on the probability of detection of the species, which in turn depends on a number of things, including how easy the species is to detect and the duration/distance of the list. Although interesting analyses can be done even while ignoring the complication of varying detection probability, please do keep in mind that true absences cannot be inferred from a list.
  • Possible errors in effort. Some lists will appear odd, for example being of very short duration (e.g. 5 min) but reporting very many species (e.g. 60 species). One can also see the opposite: lists of long duration (e.g. 2 hrs) or long distance (e.g. 5 km) with very few species (e.g. 3 species), even though the list is marked ‘complete’. Such lists most likely result from a mistake made while uploading, and you may want to look for and exclude them, depending on your purpose.
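Several of these cautions translate directly into list-level filters. The Python sketch below assumes each list is a dict of metadata keyed by the dotted column names used on this page (for example, one entry per SAMPLING.EVENT.IDENTIFIER after collapsing observations to lists), and that protocol names contain words such as ‘Travel’, ‘Incidental’ or ‘Casual’; check the exact values in your download. The 2 km cut-off for travelling lists is a hypothetical choice, and the effort thresholds simply reuse the examples given above. The hotspot-location caveat is not handled here.

```python
from typing import Dict, Optional


def to_float(value: Optional[str]) -> Optional[float]:
    """Parse a numeric field; blank or malformed values become None."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def precise_enough(lst: Dict[str, str], max_km: float = 2.0) -> bool:
    """Location accuracy: drop travelling lists that are long or lack a distance."""
    if "travel" in lst.get("PROTOCOL.TYPE", "").lower():
        km = to_float(lst.get("EFFORT.DISTANCE.KM"))
        return km is not None and km <= max_km
    return True


def usable_complete(lst: Dict[str, str]) -> bool:
    """Complete lists: not incidental/casual, and all species reported."""
    protocol = lst.get("PROTOCOL.TYPE", "").lower()
    if "incidental" in protocol or "casual" in protocol:
        return False
    return lst.get("ALL.SPECIES.REPORTED", "") == "1"


def odd_effort(lst: Dict[str, str], n_species: int) -> bool:
    """Flag lists whose species count looks inconsistent with effort.

    Thresholds are the examples from the text (5 min / 60 species;
    2 hrs or 5 km / 3 species), not recommended values.
    """
    minutes = to_float(lst.get("DURATION.MINUTES"))
    km = to_float(lst.get("EFFORT.DISTANCE.KM"))
    too_quick = minutes is not None and minutes <= 5 and n_species >= 60
    long_effort = (minutes is not None and minutes >= 120) or (km is not None and km >= 5)
    too_few = long_effort and n_species <= 3 and lst.get("ALL.SPECIES.REPORTED") == "1"
    return too_quick or too_few
```

Flagged lists need not be discarded outright; looking at a few of them by hand is a good way to decide whether the thresholds suit your question.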

Resources

If you have suggestions for more resources to add in this section, please let us know at skimmer@birdcount.in.



20 thoughts on “eBird Data Challenge”

  • Hiren Khambhayta

    Views on Suggestion no. 5, Mike: eBird has started a profile page for each member, but a way to contact members is missing. It might be an email ID, social media links or similar. This would help in contacting people who submitted checklists from new locations while travelling for birding.

    • Hiren Khambhayta

      Or for getting more data for any sort of research. For example, when I last tried to learn about the year-round migration of the Indian Skimmer, I saw that they move from Gujarat to the Chambal through Rajasthan at the end of October, but there were not many details from Rajasthan. I could have contacted a few local birders who had recorded them previously.

    • Lakshmikant Rajaram Neve

      Yes, communication/discussion with eBirders about checklists, one’s own checklists, or ID issues does not happen. Some method is required. Regarding the monthly challenge declared by eBird: there is great participation by birders, and the eBirder of the month is declared, but the outcome of the eBird monthly challenge is not discussed. Some method or link is essential.

  • Lakshmikant Rajaram Nevine

    A common name should be assigned to each IBA by eBird / BIRD COUNT OF INDIA or any other related agency, with definite identified boundaries. When any eBirder submits a checklist from that IBA, the previously decided name of the IBA should be displayed to the eBirder.
    For example, Hatnur Dam is a recent IBA, and the only one in Jalgaon District, Maharashtra. But eBirders submit their checklists under names like Hatnur dam, Hatnur dam & surrounding area, Tandalwadi, Damleft, Dam right, cannal side, IBA, etc.
    Because of this, some lists from the same area are not identified with the IBA or hotspot. If you assign a fixed ID to each IBA with decided boundaries/area, future data analysis for that IBA will be correct.
    I myself previously used different names for different locations in the same IBA (Hatnur), and I always hesitate over how I can give the same IBA name to different locations in the same area, sometimes more than 5 km apart from each other. When we analyse the IBA as one patch/area, the different locations (in the same IBA) used by eBirders should be identified by a single ID decided by eBird, not by the eBirder.

    • Bird Count India

      Thank you Sir. One of the problems is that we do not have boundaries of all IBAs in digital form, otherwise what you suggest would be possible. At the moment, hotspot naming is not under the user’s control — but any personal location (ie, not suggested or added to a hotspot) is under the user’s control. For personal locations, the name given is less important than giving the precise geographical location (lat-long).

      In the future (we hope soon), eBird will move away from designating hotspots as point locations, and instead delineate hotspot boundaries. When this happens, all lists (including lists from personal locations) with lat-long within the hotspot boundary will be aggregated into that hotspot for output. It will also be possible to have ‘daughter’ hotspots within a single ‘parent’ hotspot. For example, Hatnur Dam hotspot will be able to include Hatnur Backwaters as a daughter.

      • Lakshmikant Rajaram Nevine

        This IBA/hotspot issue is important for analysis of bird counts, migration, nesting and nesting behaviour, interdependency of birds, etc. Its importance increases when the habitat is mixed, e.g. water body, farmland, nearby river bed, wetland, jungle, grassland and much more.

    • Bird Count India

      Sir, is this what you submitted?

      A common name should be assigned to each IBA by eBird / BIRD COUNT OF INDIA with definite identified boundaries. When any eBirder submits a checklist from that IBA, the previously decided name of the IBA should be displayed to the eBirder.

      This was received by us on 2 Nov, but not added to the list of questions as it is a suggestion about an eBird feature rather than a suggestion for how to analyse the data. Please correct me if I’m wrong. Thanks!

  • LAKSHMIKANT RAJARAM NEVE

    How can I get the male, female or juvenile population of a particular bird species with reference to area/location/habitat/field? How can I locate breeding locations of migratory birds in India for observation?

    It is very difficult to identify and record birds as male, female or juvenile, and even more difficult to record non-breeding/breeding stages. Checklists should be linked with the respective male and female photographs and audio calls of each species.

    • Lakshmikant Rajaram Neve

      There is a data collection method for each bird species in tabular format (male/female/juvenile etc., with breeding codes), but 99% of checklists are without this information. Yes, it is also very difficult to submit all such information in the field, since you have to record a bird within fractions of a second of it being in front of you.
      But if there were an immediate link, or an offline data method, next to each bird name to access photographs/audio calls etc. to identify male/female/juvenile/breeding-non-breeding plumage, eBirders would take interest in logging information as male/female after comparing with the data available right next to the bird name.
      We have photographs and audio calls in the media sections, and one can access them by species name. The same data could be made available with a link next to the species name in the checklist. With such information readily available, eBirders would take interest in identifying/logging specific information.
      Our future need is that photographs and audio calls should be segregated with male/female marks. Only then will it be easy for eBirders to observe/access/record/update information as male/female, from which the respective population information becomes possible.

      I think the same thing is possible with breeding codes, if breeding codes are updated properly.

      I have replied with what is in my mind. Please guide me accordingly.

  • Nathan

    Any updates as to when the results will be processed? Would be interesting to also see all the wonderful submissions to this challenge made public once this is done!

    • Aditya Nayak

      Thanks Ajay for the wishes, and Birdcount India for the data challenge! All the entries look great. Would it be possible to link the PDFs/images of all the entries so that they are accessible to us?