Documenting butterflies with the help of citizen science in Darjeeling-Sikkim Himalaya, India

The availability of information on the distribution and occurrence of different species in a landscape is crucial to developing informed conservation and management plans; however, such information is often limited in the Himalaya. Citizen science, which builds on the knowledge and interest of communities to contribute to science, can be a solution to this problem. In this study, we used butterflies in the Darjeeling-Sikkim Himalaya as a model taxon to show how citizen science can aid in documenting biodiversity. The study employed both citizen science and researcher-survey approaches to collect data, and the collective effort resulted in 407 species, the highest number recorded by any study carried out in the region. Results show that citizen science can be a helpful supplementary tool for data collection in biodiversity documentation projects, and can add to the diversity and distribution records of species, including those that are unique, rare, seasonal, and nationally protected. Citizen science outreach was used to muster potential participants from the local community to participate in the study. It is therefore advisable for citizen science projects to find means to recruit a larger pool of contributors, and citizen science outreach can be key to their success.


INTRODUCTION
Citizen Science (CS), an approach that involves the public in scientific research, has long been used to supplement the collection of data required to answer research questions (Spear et al. 2017), or to document rare events in nature (Greenwood 2007). In recent years, CS has increasingly been used as a tool in research, documentation, and monitoring (Feldman et al. 2021), with a number of projects using this approach to create awareness and as a means to engage with local communities. This trend has been facilitated by the availability and development of user-friendly smartphone applications (Land-Zandstra et al. 2016), improved internet facilities, affordable internet access, and, most importantly, the growing popularity and scope of CS activity (Curtis 2014). In addition, funding opportunities to implement CS-related outreach activities may also have contributed to this sharp rise (Johnson et al. 2014). The biggest advantage of CS as a data collection tool is the assumption that a vast amount of data can be collected, as the citizen scientists it targets are mostly members of local communities who have year-round access to areas that researchers cannot frequently survey or monitor because of time and financial constraints (Dickinson et al. 2010).
Participants in CS projects can be volunteers of all age groups and from different walks of life, and can be involved in a variety of roles at different stages of a study (Tulloch et al. 2013; Theobald et al. 2015). CS can be used in almost every field of research, ranging from marine science (van der Velde et al. 2017) to geography (Trojan et al. 2019), and from astronomy (Odenwald 2018) to biology (Greenwood 2007). This wide applicability of CS and the engagement of enthusiastic citizen scientists have enabled data collection over long periods and across larger gradients (Poisson et al. 2020). The use of CS in biodiversity documentation and monitoring is one example of such long-term CS engagement, and it has been dominated by projects involving a few taxa (such as birds, butterflies, moths, and dragonflies), probably because their aesthetic appeal encourages many citizens to participate and contribute (Callaghan et al. 2021). However, despite citizens' interest and willingness to participate, a major challenge that hinders the progress of CS projects is the difficulty of incorporating CS data into a research framework (Tulloch et al. 2013), owing to concerns about the accuracy, precision, spatial and temporal resolution, robustness, and accessibility of the data (Hyder et al. 2015). Yet studies have shown that data collected through CS can be crucial for both the scientific community and decision makers (Paul et al. 2014).
Another challenge associated with CS projects is that not everyone is motivated to contribute, owing to a lack of interest or of material incentives (Land-Zandstra et al. 2016). The only benefit that participants in these projects receive is the opportunity to contribute to science, public information, and conservation (Silvertown 2009). Thus, CS projects that require large sample sizes must assess and understand the shared interests and unique motivations that drive their target citizen scientists to participate (Rotman et al. 2012; Wright et al. 2015), and must also find ways to motivate them to participate (Schulwitz et al. 2021). This is where CS outreach comes into play. CS outreach brings interested people together on one platform and enables them to participate in data collection (Silvertown 2009; Schulwitz et al. 2021). However, the effectiveness of CS outreach needs to be tested rigorously in different fields of research, in different localities, and in studies involving different groups of volunteers.
As part of the project "Key ecosystem services and biodiversity components in socio-ecological landscapes of Darjeeling-Sikkim Himalaya: deriving management & policy inputs and developing mountain biodiversity information system", an online Mountain Biodiversity Database and Information System, or MBDIS (www.mbdis.in), was developed. A large part of the data in MBDIS came from CS activities implemented by the project. MBDIS was developed as a comprehensive and interactive web-based database of the biodiversity of the Darjeeling-Sikkim Himalaya, so that students, academics, researchers, and practitioners working on the biodiversity of the region could benefit from the information it holds. A major component of MBDIS was to train local community members and muster their participation in contributing photographic observations of biodiversity to existing web-based citizen science portals. By involving local community members and nature enthusiasts from the region as citizen scientists, the project aimed to generate new point records of biodiversity from the region and to create a baseline dataset accessible to anyone working on, or interested in learning about, the biodiversity of the Darjeeling-Sikkim Himalaya.
The CS approach to collecting biodiversity information is still a relatively new concept in the Himalaya, but it has the potential to become an important tool in biodiversity documentation (Devictor et al. 2010), as a large swathe of land falls outside the protected area network, in human-modified and human-dominated landscapes where communities are an important source of information. The Himalaya is one of the richest places on earth in terms of species diversity; however, these landscapes are still poorly explored and are vulnerable to increasing anthropogenic pressure, land-use change, and climate change. Developing informed conservation and management plans requires distribution and ecological information on species (Tobler et al. 2008), which is relatively scarce in the Himalaya.
The concept of CS is rapidly gaining popularity in India, and more than 25 CS projects in ecology are estimated to be operational in the country (Sharma 2019; Jaiswara et al. 2022). Distribution and locality records available on web-based CS platforms are also being cited in, and have resulted in, scientific publications (for example, The Biodiversity Atlas - India projects have resulted in more than 20 publications), highlighting the potential and importance of data gathered by citizen scientists in India.
Here, we present how CS can help in biodiversity documentation by adding to the data collected by researchers. We also explore the effectiveness of CS outreach activities in mustering the participation of local communities and nature enthusiasts in such projects. The study uses butterfly observations as a proxy for this purpose because: (1) butterflies are among the most popular taxa with local communities; (2) butterflies can be easily photographed by community members using camera phones, and the photographs can then be uploaded to citizen science portals; and (3) butterflies are among the most diverse taxa in the Darjeeling-Sikkim Himalaya, with 691 species (Haribal 1992; Kamrakar et al. 2021). This paper therefore also aims to add to the limited literature on the distribution, diversity, and status of butterflies in the Darjeeling-Sikkim Himalaya.

Study area
The study was conducted at multiple sites across the Darjeeling-Sikkim Himalaya that fall outside protected areas (Figure 1). These sites are characterized by traditional agricultural systems, historical tea plantations, and residential areas, interspersed with differently managed forests. The landscape is an integral part of the Eastern Himalayan region of the Himalaya Biodiversity Hotspot, and comprises the two hill districts (Darjeeling & Kalimpong) of West Bengal and the Himalayan state of Sikkim in India. The region is also an important transboundary landscape, sharing its borders with Nepal, Bhutan, and China. Elevation ranges from 250 m to more than 5,000 m, and the landscape is traversed by three important river systems: the Teesta, Rangeet, and Balasan.

Data collection
Data collected during the study included the GPS location, date, identity of the observer, a photograph of the observation, and/or the species identity of the butterfly observed. These data were collected using two different approaches: CS and researcher surveys. Overall data were collected until 15 February 2023, while researcher survey data were collected between October 2018 and September 2021. For comparative analysis, the CS data were later filtered to match the survey locations and time period of the researcher survey data.

Citizen Science Approach
In the initial stages of the study, information on local institutions actively working in the region (such as village councils, clubs, committees, and local NGOs) was collected in order to identify key informants and to organize inception-cum-awareness workshops in different villages (n = 22) prior to data collection. These workshops were organized as community consultations to discuss the key components of the study and to seek coordination and partnership with interested groups and local institutions (as done by Pradhan & Khaling 2023). These partners were approached later in the project to organize CS outreach events in the landscape. The CS outreach activities conducted during the study (n = 15) comprised CS workshops (n = 9), butterfly walks (n = 4), and butterfly documentation events (n = 2), carried out at multiple locations across the study area (Figure 1).
CS outreach activities were used to muster the participation of local community members during data collection. Here, CS outreach refers to the workshops, butterfly walks, and online documentation events (described in the following paragraphs) that were organized to reach out to interested local community members in different localities across the landscape. Data collected using the CS approach included all observations uploaded to iNaturalist (www.inaturalist.org) from within the study area and identified to species level. In none of these activities were local community members compelled or paid in any way to contribute to the documentation process. Hence, the participation mustered by the project depended entirely on the personal interest of community members.
CS workshops: These were conducted in nine spatially distinct villages across the study area (Figure 1) and targeted school students, teachers, and community members, with the objective of training them to photograph biodiversity and contribute their observations to iNaturalist, an online citizen science platform. Each workshop consisted of a theory session followed by a hands-on session, in which participants were taken on a short field visit and assisted with registration and the other technicalities of uploading the photographic observations they recorded in the field.
Butterfly walks: These were organized in four different villages across the study area (Figure 1) with the aim of mustering the participation of local community members in documenting butterflies in their respective villages. During each walk, participants were taken to a field location, where members of the research team showed them how to photograph butterflies and how to upload their observations to iNaturalist. Each of these events lasted 3-4 hours in the field.
Butterfly documentation events: These events were organized during the Big Butterfly Month (a national butterfly documentation event held in India during September) of 2020 and 2021. Local communities across the study area were supplied with written and video instructions on how to photographically document butterflies and contribute their observations to iNaturalist. The events were conducted online because of the COVID-19 lockdown and safety restrictions in place in India during this period. They covered the entire landscape, and information about them was spread through the project team's local contacts and through social media.

Researcher survey approach
Two researchers from the project team conducted surveys to document butterflies at different sites across the study area (Figure 1). All butterfly species encountered by the researchers at these locations were recorded along with their GPS coordinates. Butterflies were also photographed whenever possible to help confirm species identities. Butterflies were identified using field guides (Kehimkar 2016) and web-based resources (www.ifoundbutterflies.org). To avoid confusion and double counting of the same species during data curation and analysis, the taxonomic nomenclature used by iNaturalist was followed throughout the study.

Data Analysis
All observations of butterflies from the Darjeeling-Sikkim Himalaya available on iNaturalist (accessed on 15 February 2023) were downloaded (n = 5,026), and those identified to species level (n = 3,746) were retained. Because the two researchers who conducted opportunistic surveys for this study are also active on this CS platform, the observations they added (n = 564) were removed from the final dataset, leaving only records contributed by local communities (n = 3,182). Of these, 101 were added before the project began (in October 2018), 1,291 during the project period, and 1,790 after the project period (after September 2021).
To create the researcher survey dataset, the researchers submitted their data directly to MBDIS as Excel spreadsheets. The dataset contained a checklist of species recorded at spatially distinct sites, accompanied by polygons of the sampling locations at each study site.
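The CS filtering steps described above (species-level records only, researcher-contributed records removed, and the remainder split by project period) can be sketched in Python as follows. This is a minimal illustration, not the pipeline used in the study: the `Observation` record and its field names are hypothetical stand-ins for columns of an iNaturalist export, the researcher logins are invented, and the period boundaries assume the project ran from 1 October 2018 to 30 September 2021 inclusive. The text does not say whether the upload date or the observation date was used for the split, so the sketch uses a single hypothetical `added_on` date.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical record layout; a real iNaturalist export has its own column names.
@dataclass(frozen=True)
class Observation:
    user: str          # uploader's login
    taxon_rank: str    # e.g. "species", "genus"
    added_on: date     # hypothetical date field used for the period split

# Assumed inclusive bounds of the project period (October 2018 to September 2021).
PROJECT_START = date(2018, 10, 1)
PROJECT_END = date(2021, 9, 30)


def community_records(observations, researcher_logins):
    """Keep species-level records that were not uploaded by project researchers."""
    return [
        o for o in observations
        if o.taxon_rank == "species" and o.user not in researcher_logins
    ]


def split_by_period(records):
    """Group records as before, during, or after the project period."""
    groups = {"before": [], "during": [], "after": []}
    for o in records:
        if o.added_on < PROJECT_START:
            groups["before"].append(o)
        elif o.added_on > PROJECT_END:
            groups["after"].append(o)
        else:
            groups["during"].append(o)
    return groups


if __name__ == "__main__":
    sample = [
        Observation("local_a", "species", date(2017, 5, 2)),
        Observation("local_b", "species", date(2019, 9, 14)),
        Observation("local_b", "genus", date(2020, 9, 3)),          # not species-level
        Observation("researcher_1", "species", date(2020, 4, 11)),  # researcher upload
        Observation("local_c", "species", date(2022, 6, 20)),
    ]
    kept = community_records(sample, researcher_logins={"researcher_1"})
    print({k: len(v) for k, v in split_by_period(kept).items()})
    # prints {'before': 1, 'during': 1, 'after': 1}
```

Keeping the two steps as separate functions mirrors the order of operations in the text: the species-level and researcher filters define the community dataset, and only then is it partitioned in time.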
A point-in-polygon analysis was performed in QGIS to determine how many CS records from the study area fell within the study site polygons (with a 500 m buffer). This subset was used to compare the datasets created by the CS and researcher survey approaches. In total, 294 CS observations were found to fall within the study site polygons.
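The study performed this step in QGIS. For readers who want to reproduce the logic outside a GIS, the sketch below shows one way to test whether a record falls within a site polygon expanded by a 500 m buffer: a point lies in the buffered polygon if it is inside the polygon itself or within 500 m of any polygon edge. It assumes that coordinates have already been projected to a metric system (for example, a UTM zone) and that site polygons are simple (non-self-intersecting) vertex lists; the function names are hypothetical.

```python
import math

# Coordinates are (x, y) in metres in a projected CRS (assumed, e.g. a UTM zone).


def inside_polygon(p, polygon):
    """Ray-casting test for a simple polygon given as a list of (x, y) vertices."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray to the right of p cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def distance_to_segment(p, a, b):
    """Shortest Euclidean distance from point p to segment ab."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def within_buffered_polygon(p, polygon, buffer_m=500.0):
    """True if p is inside the polygon or within buffer_m of its boundary."""
    if inside_polygon(p, polygon):
        return True
    n = len(polygon)
    return any(
        distance_to_segment(p, polygon[i], polygon[(i + 1) % n]) <= buffer_m
        for i in range(n)
    )


def records_in_sites(points, site_polygons, buffer_m=500.0):
    """Keep the points that fall within at least one buffered site polygon."""
    return [
        p for p in points
        if any(within_buffered_polygon(p, poly, buffer_m) for poly in site_polygons)
    ]
```

With a 1 km square site, for example, a record 400 m outside one edge is kept, while one 600 m outside is dropped. Testing distance to the boundary avoids constructing the buffered geometry explicitly, at the cost of checking every edge for every point.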
A circular polygon of 1 km radius was drawn around each workshop and butterfly walk location, and the CS records within these polygons were used to evaluate the extent to which local communities participated in the outreach events. Similarly, to determine the level of engagement resulting from the butterfly documentation events, observations from the study area that were added to iNaturalist during the online documentation events in September 2020 and September 2021 were tabulated.
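The 1 km outreach buffers and the September tabulation can be sketched with great-circle distances on unprojected latitude and longitude, as below. This is illustrative only: records and events are hypothetical `(lat, lon, date)` tuples, the haversine formula with a mean Earth radius of 6,371 km stands in for whatever distance computation the GIS used, and the requirement that a record be made on or after the event date is an assumption drawn from the Results wording ("after at least one CS outreach event was organized"), not a stated method. The coordinates in the example are made up.

```python
import math
from datetime import date

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def near_outreach(records, events, radius_m=1000.0):
    """Records within radius_m of an event location and made on or after that event.

    records and events are (lat, lon, date) tuples; the date condition is an assumption.
    """
    return [
        r for r in records
        if any(
            haversine_m(r[0], r[1], e[0], e[1]) <= radius_m and r[2] >= e[2]
            for e in events
        )
    ]


def during_online_events(records, years=(2020, 2021)):
    """Records dated in September of the Big Butterfly Month years."""
    return [r for r in records if r[2].month == 9 and r[2].year in years]


if __name__ == "__main__":
    # Made-up event and records, roughly 0.56 km and 2.2 km north of the event.
    events = [(27.0410, 88.2663, date(2019, 3, 10))]
    records = [
        (27.0460, 88.2663, date(2019, 4, 1)),   # ~0.56 km, after the event
        (27.0460, 88.2663, date(2019, 2, 1)),   # ~0.56 km, before the event
        (27.0610, 88.2663, date(2019, 4, 1)),   # ~2.2 km away
        (27.0460, 88.2663, date(2020, 9, 15)),  # near, and in September 2020
    ]
    print(len(near_outreach(records, events)), len(during_online_events(records)))
    # prints 2 1
```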
To understand the distribution of observations across the study area and the level of engagement of individual citizen scientists, the study area was divided into grid cells measuring 5 × 5 km, and the number of observations made in each cell, as well as the number of cells covered by each participant, were enumerated.
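A minimal sketch of the 5 × 5 km grid enumeration follows. It assumes that records are hypothetical `(user, x, y)` tuples with coordinates in metres in a projected system, and that the grid is anchored at that system's origin (the study does not state its grid origin). Each record is assigned to a cell by flooring its coordinates divided by 5,000 m; observations are then counted per cell, and distinct cells are counted per participant.

```python
import math
from collections import Counter, defaultdict

CELL_M = 5000.0  # 5 km cell edge, in metres


def cell_of(x, y, cell_m=CELL_M):
    """Index of the grid cell containing projected point (x, y); origin assumed at (0, 0)."""
    return (math.floor(x / cell_m), math.floor(y / cell_m))


def observations_per_cell(records):
    """Count observations in each 5 x 5 km cell; records are (user, x, y) tuples."""
    return Counter(cell_of(x, y) for _, x, y in records)


def cells_per_participant(records):
    """Number of distinct cells in which each participant recorded at least once."""
    cells = defaultdict(set)
    for user, x, y in records:
        cells[user].add(cell_of(x, y))
    return {user: len(c) for user, c in cells.items()}


if __name__ == "__main__":
    sample = [("a", 100, 100), ("a", 4900, 200), ("a", 5100, 100),
              ("b", 12000, 3000), ("b", 12500, 3500)]
    print(observations_per_cell(sample))
    print(cells_per_participant(sample))  # prints {'a': 2, 'b': 1}
```

Using `math.floor` rather than `int()` keeps cell indices correct for negative coordinates, which can occur if the grid origin is placed inside the study area.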

CS and Researcher data
Through the combined effort of CS and researcher surveys, 331 species of butterflies across six families were recorded from the socio-ecological landscapes of the Darjeeling-Sikkim Himalaya (407 species when records contributed outside the study period are included) (Table 1). Localities in the landscape from which these species were recorded are shown in Figure 1.
Eighty-six of the recorded butterfly species are protected in India: 12 under Schedule I and 74 under Schedule II of the Wild Life (Protection) Act, 1972 (as amended by the Wild Life (Protection) Amendment Act, 2022). Of the protected species, 66 (38 within the study period) were recorded by citizen scientists, while only 27 were recorded by the researchers.
The CS approach documented 1,717 observations of 260 species belonging to six families within the study period, increasing to 4,307 observations (357 species) when records from before and after the project period are included (Table 1). During the current study, the most common species observed and submitted by citizen scientists from the study area was the Indian Tortoiseshell Aglais caschmirensis, observed 54 times by 37 participants, followed by the Popinjay Stibochiona nicea (31 times by 21 participants), the Red Lacewing Cethosia biblis (28 times by 15 participants), the Straight-banded Treebrown Lethe verma (28 times by 22 participants), and the Punchinello Zemeros flegyas (28 times by 22 participants). Similarly, the researcher survey approach documented 233 of the 265 species belonging to six families across the study area during the study period. Again, the Indian Tortoiseshell was the most common species, observed at all sites surveyed by the researchers. Of the 331 species recorded during the study period, 107 were recorded only in the CS dataset and 71 only in the researcher survey dataset. This may be due to the limited number of sites that the researchers could survey within the study period, whereas CS data were collected from a larger spatial area. A point-in-polygon analysis was performed to compare the two datasets collected from the same study sites (with a 500 m buffer) and within the same time period. This yielded 427 observations made by 33 CS participants, amounting to 131 species, 32 of which were absent from the researcher data.
Table 1 abbreviations: CS, Citizen Science; RS, Researcher Survey; WPAA (2022), Wildlife (Protection) Amendment Act (2022); --, unrecorded or unlisted; *, recorded during the project period; #, recorded outside the project period; PS, recorded from a project site; BW, recorded after butterfly walks; WS, recorded after workshop; OE, recorded during the online documentation event.
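The comparison of unique species between the two datasets reduces to set operations on species lists, sketched below with toy lists that reuse species names from the text but are not the study's records. The function name is hypothetical. The identity total = shared + cs_only + rs_only is a useful consistency check when reporting such figures.

```python
def compare_species(cs_species, rs_species):
    """Summarize overlap between citizen-science (CS) and researcher-survey (RS) species lists."""
    cs, rs = set(cs_species), set(rs_species)
    return {
        "total": len(cs | rs),     # species recorded by either approach
        "shared": len(cs & rs),    # recorded by both
        "cs_only": len(cs - rs),   # unique to the CS dataset
        "rs_only": len(rs - cs),   # unique to the researcher dataset
    }


if __name__ == "__main__":
    # Toy lists for illustration only; not the study's records.
    cs_list = ["Aglais caschmirensis", "Stibochiona nicea", "Cethosia biblis", "Lethe verma"]
    rs_list = ["Aglais caschmirensis", "Zemeros flegyas", "Lethe verma"]
    summary = compare_species(cs_list, rs_list)
    print(summary)  # prints {'total': 5, 'shared': 2, 'cs_only': 2, 'rs_only': 1}
    assert summary["total"] == summary["shared"] + summary["cs_only"] + summary["rs_only"]
```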

CS outreach and participation
One hundred and seventy community members participated as citizen scientists in the butterfly documentation project on iNaturalist during the course of the current study. Forty participants contributed more than 10 observations each (Figure 2), with the highest being 178 submissions from a single participant (of which 120 have been identified to species level to date). Most citizen scientists in the current study contributed their observations from a limited number of spatial locations. Yet a few participants recorded and submitted observations from multiple locations, with four participants submitting observations from more than 11 spatial locations (Figure 3). Three hundred and eighty community members participated in the nine CS workshops, while the four butterfly walks and the two online documentation events had 63 and 81 participants, respectively. The workshops and walks yielded 84 and 492 observations, respectively, while 1,187 observations were made during the online documentation events.
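Tallies such as the number of participants with more than 10 submissions, and the ranked submission counts behind a long-tailed contribution plot, can be computed from a list of uploader names. The sketch below is hypothetical: the function name and logins are invented, and "more than 10" is implemented as a strict inequality to match the wording above.

```python
from collections import Counter


def contribution_summary(uploader_logins, threshold=10):
    """Summarize how submissions are spread across participants."""
    per_user = Counter(uploader_logins)
    return {
        "participants": len(per_user),
        "above_threshold": sum(1 for n in per_user.values() if n > threshold),
        "max_submissions": max(per_user.values(), default=0),
        # Submission counts sorted from most to least active (long-tail view).
        "ranked_counts": [n for _, n in per_user.most_common()],
    }


if __name__ == "__main__":
    logins = ["a"] * 12 + ["b"] * 3 + ["c"] * 11 + ["d"]  # invented uploaders
    print(contribution_summary(logins))
    # prints {'participants': 4, 'above_threshold': 2, 'max_submissions': 12, 'ranked_counts': [12, 11, 3, 1]}
```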
CS outreach during the study in the Darjeeling-Sikkim Himalaya accounted for 62.26% (181 species) of all CS observations made from the study area during the study period: 15.11% (92 species) were recorded from sites after at least one CS outreach event had been organized, and 47.14% (175 species) were contributed during the butterfly documentation events (Tables 1 and 2). The number of butterfly observations contributed to iNaturalist from the Darjeeling-Sikkim Himalaya increased sharply during the study period and continues to increase after the end of the project (Figure 4). Since the end of the project, 144 users have contributed butterfly observations from the region, of whom 127 joined iNaturalist after the project ended.

DISCUSSION
The use of citizen science approaches in biodiversity documentation is gaining pace in both rural and urban settings across the globe, with the most effective programs aiming to engage local communities (Pandya 2012). However, the reliability of CS datasets is still a topic of discussion within the scientific community (Chatzigeorgiou et al. 2016). The current study, which

Usefulness of CS in documenting butterflies across Darjeeling-Sikkim Himalaya
The current study was conducted in one of the global biodiversity hotspots and used one of the most diverse taxa there, the butterflies, for this purpose. Butterflies are among the most diverse taxa in the Himalaya, and the Darjeeling-Sikkim Himalaya, where the study was carried out, is a hotspot of butterfly diversity, harboring 46% of all butterfly species found in India (Sharma et al. 2020). Numerous studies have documented the diversity of butterflies in these landscapes across both protected and non-protected areas; however, no single study has reported even close to 50% of this diversity, the main challenges being the topographical, temporal, logistical, and financial constraints on carrying out surveys at a larger scale. This is where CS can be particularly useful. The current study combined the traditional researcher survey approach (in which the number of researchers carrying out surveys, and the number of sites they could cover, were limited by logistical and financial constraints) with the CS approach (in which the main challenge was to reach out to and recruit as many potential citizen scientists as possible). With this mixed approach, the study documented approximately 48% (331 species) of the total reported butterfly diversity of the region, higher than any other study conducted in the Darjeeling-Sikkim Himalaya to date, the previous highest being 43% (268 species) recorded by Sharma et al. (2020). CS alone contributed 43% of the total, while also recording 107 species absent from the researcher dataset. The high number of unique, rare, seasonal, and nationally protected butterflies observed by citizen scientists in the current study suggests that CS can be an important tool for distribution studies in data-deficient corners of the world, as supported by Amano et al. (2016).
This is also in line with other studies suggesting that CS can effectively supplement data collection in large-scale documentation projects (Spear et al. 2017). However, the result runs contrary to the belief that professional surveys report more endangered species and species of special research interest (Galvan et al. 2021), which may be due to the limited number of professionals involved in the current study. The study also reiterates that CS as the only data collection tool (without the involvement of professionals) may not fully deliver the desired outcomes of a biodiversity documentation project (Pernat et al. 2021).
The use of CS data (in studies of bird breeding ecology, bird migration monitoring, bird counts, etc.) has resulted in a number of publications in recent years (Donnelly et al. 2014; Arjun & Roshnath 2018; State of India's Birds 2020), providing evidence of the usefulness of CS data in scientific studies. However, such publications have often been criticized by the scientific community because of concerns about the value and quality of CS data. Major challenges of incorporating CS in large projects include a lack of organized structure, haphazard coverage, repeat counts, and a lack of coordination (Rahmani et al. 2003). Yet a number of studies have argued that these challenges can be resolved through better research design, adequate training of citizen scientists, and ground truthing (Bird et al. 2014). In light of these debates, this study adds to the limited literature supporting the view that large-scale, long-term monitoring of biodiversity can be achieved through the CS approach. This is especially true when collecting data from a large area by researchers alone requires substantial budget, time, and effort (Dickinson et al. 2010). However, the success of such CS-based projects will depend on the extent of volunteer engagement and training, referred to here as CS outreach (Mason & Arathi 2019).

CS outreach and participation
The current study used outreach materials, theory sessions, field-based training, and online events as part of its CS outreach activities to overcome the challenges of recruiting citizen scientists across a large spatial area. CS outreach activities conducted prior to data collection were found to be an important step in mustering the participation of the target citizen scientists, who in this study were local community members. Similar observations were made by Feldman et al. (2018). CS outreach has been found to be effective in reaching out to and generating interest among potential participants, and is thus useful in mustering local participation (e.g., van der Velde et al. 2017). Among the CS outreach activities used in the current study, butterfly walks (which involved field-based training) were found to be the most effective in mustering local participation. Similar activities have been reported to be successful in other studies (e.g., Matteson et al. 2012). Additionally, online butterfly documentation events supplemented with detailed instructions were found to be an effective form of outreach, capable of reaching a larger number of participants across a larger spatial area, and they contributed substantially to the final CS dataset. Online documentation events have also been highly successful in acquiring large amounts of data elsewhere (Moskowitz & Haramaty 2013); however, they have been associated with the highest number of dropouts, meaning that citizen scientists who participate in these events stop contributing once the event period is over (Aristeidou et al. 2021). This suggests that such events alone may not be sufficient to ensure long-term participation in science.
The outreach activities carried out during the study created awareness among local community members of the importance of biodiversity documentation, while also providing a platform for them to contribute to science. The impact of the study, and the willingness of participants to take part in such CS projects, can be seen in the sharp increase in the number of observations uploaded to iNaturalist from the landscape during the study period, an increase that has continued after the end of the project. However, despite the observable success of the CS outreach in terms of the number of observations, a large portion of the data was contributed by a small number of participants, while the majority contributed only a few records. This long-tailed distribution has also been reported by other similar CS projects (Segal et al. 2015). Also, a select few participants contributed records from multiple locations, while an average participant contributed data only from a small area, suggesting that participants are more interested in documenting biodiversity from localities that are easily accessible to them. This may also be due to differences in skill levels and motivation (West et al. 2021). These findings further suggest the need to reach out to a larger pool of citizen scientists from different corners of the landscape when planning similar biodiversity documentation projects in the future, in order to find the few who can champion the documentation process, and emphasize that reaching out to the right audience makes an immense difference to the success of a CS project.

Conservation implications
Developing informed conservation and management plans requires distribution and ecological information on species (Tobler et al. 2008), which is limited in the Himalaya. The current study shows how CS can contribute important locality records of rare and lesser-known butterfly species that would remain undocumented without local participation. Thus, CS, which effectively accentuates the potential of local communities as knowledge partners, can be a solution to this challenge of limited information on biodiversity. However, as suggested here, this requires good planning and execution as well as an efficient CS outreach program. Apart from being a means to recruit citizen scientists as data contributors, CS outreach also has immense potential for creating awareness and can be effective in bridging the gap between humans and nature. The important role of knowledge-building programs that promote CS in creating positive attitudes and behavior towards biodiversity has also recently been highlighted from the same landscape (Pradhan & Yonle 2022). This further adds to the importance of CS in conservation.

Study perspectives
The study shows how citizen participation in a biodiversity documentation project can add to the diversity and distribution records of different species, including those that are unique, rare, seasonal, and nationally protected. In the current study, citizen participation was purely interest-based and depended on each participant's interest in learning about and recording biodiversity in their locality. Through this study, participants gained knowledge and awareness of local biodiversity and were provided with a platform where they could contribute important biodiversity data. Some of the citizen scientists recruited during the study period are still actively contributing to the platform, which suggests that they would participate and contribute again. Thus, provided that similar future projects manage to reach interested sections of the community, citizens are likely to be willing to participate in such projects.
Although the goal of the study was to muster as many CS participants as possible from the study area, it could muster only limited participation from local communities because of logistical, financial, and time constraints. In addition, limited internet connectivity and the lack of camera phones among a number of interested participants hindered community participation. Hence, in similar future studies, CS outreach events that encourage the participation of local communities and help reach interested participants should be organized at multiple locations and in different seasons. These outreach activities could also be planned so that different events target different potential groups, such as students, teachers, farmers, and nature guides. This would help maximize the number of participants, and thus the number of observations from within the study area. Similarly, gathering basic information about participants, such as gender, age, occupation, and education, would give meaningful insights into the attitudes, behavior, and motivation of the participating citizens.

CONCLUSION
CS can be an important tool for filling spatial gaps in global biodiversity information, and can thus play a crucial role in the data-deficient and poorly explored parts of the Himalaya, a global biodiversity hotspot. The study found that conducting field-level CS outreach activities prior to data collection, together with online events that can reach a larger pool of citizen scientists, is beneficial for the overall success of a CS project. The results show that the CS approach can be a useful supplementary tool for collecting distribution data, as citizen scientists (local communities in this study) have year-round access to sampling sites. The study therefore recommends that other biodiversity documentation projects in data-deficient areas incorporate the CS approach in data collection. Finally, MBDIS, which aims to incorporate both CS and researcher data from the Darjeeling-Sikkim Himalaya, can have immense potential to bring together the scientific community and nature enthusiasts of the region on one platform, creating an opportunity for local communities to contribute to and learn about the biodiversity of the region.