DNA barcoding of a lesser-known catfish, Clupisoma bastari (Actinopterygii: Ailiidae), from the Deccan Peninsula, India

DNA barcoding substantiates species identification and can simultaneously reveal misidentified (misnomer) taxa. Based on morphological descriptions, we identified a lesser-known catfish, Clupisoma bastari, from the Godavari River basin and contributed novel DNA barcode data to GenBank. Kimura 2-parameter (K2P) genetic divergence between species and the neighbour-joining phylogeny clearly resolved a distinct clade of C. bastari in the studied dataset. Clupisoma bastari showed sufficient K2P genetic divergence (8.3% to 11.2%) from its congeners and branched as a sister species of C. garua. The present study highlights the possible existence of a few misnomer taxa in GenBank. We encourage further extensive sampling of Clupisoma congeners from a wide range of habitats to explore the species diversity and phylogenetic relationships of the genus.


INTRODUCTION
The genus Clupisoma Swainson is placed in the recently erected family Ailiidae and currently comprises nine valid species (Wang et al. 2016; Fricke 2020), distributed from the Salween basin in Yunnan, China, westward to the Indus basin in Pakistan (Jayaram 1977; Ferraris 2004; Chen et al. 2005). Of these, four species occur in India, in the Indus, Ganges, Brahmaputra, and Godavari basins. Clupisoma bastari Datta & Karmakar (1980) was described from the Indravathi River, a tributary of the Godavari in peninsular India. Owing to its limited distribution, the species has been poorly studied; it was once categorized as 'Endangered' (Molur & Walker 1998). It is currently categorized as 'Data Deficient' on the International Union for Conservation of Nature (IUCN) Red List and is listed as extant and resident in the state of Chhattisgarh in central India (Dahanukar 2011). Apart from a few studies of its length-weight relationship and food and feeding habits, based on specimens collected during 1997-98 from the upper Godavari basin (Bhowate & Mulgir 2006), the species has occasionally been reported from the Ravi Shankar Sagar reservoir in the central Mahanadi basin and from the Tapti River (Desai & Srivastava 2004; Siddiqui & Pervin 2017). C. bastari was not listed in the updated checklist of the ichthyofauna of the Eastern Ghats, nor in studies from other localities within the Deccan Peninsula (Barman 1993; Devi & Indra 2003; Johnson et al. 2012; Laxammappa & Bakshi 2016). It has presumably been overlooked in earlier studies owing to misidentification among Clupisoma congeners in India.
Besides traditional taxonomy, molecular data have proven effective for identifying and distinguishing freshwater fishes around the world (Hubert et al. 2008; Ward et al. 2009; Steinke et al. 2009; April et al. 2011; Collins et al. 2012). Several small- to large-scale efforts have been made to build DNA barcode reference libraries of freshwater fishes from India and neighboring countries, aiming at quick and reliable species identification and at illuminating species diversity across biogeographic zones (Khedkar et al. 2014; Chen et al. 2015; Barman et al. 2018; Laskar et al. 2018; Kundu et al. 2019; Rahman et al. 2019). Although GenBank holds several publicly available DNA barcode sequences of Clupisoma species, genetic information on C. bastari was lacking. To fill this gap, we studied C. bastari from the central Godavari basin, around its type locality, and generated DNA barcode data for the species.

MATERIALS AND METHODS
Specimens of Clupisoma garua were collected from the Mahanadi River basin, Odisha, and specimens of C. bastari from two localities in the Godavari River basin in the Deccan Peninsula, India (Figure 1). The fish barcoding primer pair of Ward et al. (2005), FishF1 (5′-TCAACCAACCACAAAGACATTGGCAC-3′) and FishR1 (5′-TAGACTTCTGGGTGGCCAAAGAATCA-3′), was used to amplify the partial mitochondrial cytochrome c oxidase subunit I gene (mtCOI) in a Veriti® Thermal Cycler (Applied Biosystems, Foster City, CA). Each 30 µl PCR mixture contained 10 pmol of each primer, 100 ng of DNA template, 1× PCR buffer, 1.0-1.5 mM MgCl2, 0.25 mM of each dNTP, and 1 U of Taq polymerase (Takara Bio Inc., Japan). The thermal profile comprised an initial denaturation of 2 min at 95 °C, followed by 35 cycles of 0.5 min at 94 °C, 0.5 min at 54 °C, and 1 min at 72 °C, then a final extension of 10 min at 72 °C and a hold at 4 °C. The PCR products were purified using the QIAquick® Gel Extraction Kit (Qiagen, Valencia, CA). Cycle sequencing and Sanger sequencing were performed commercially. Forward and reverse chromatograms were checked in SeqScanner v1.0 (Applied Biosystems Inc., CA, USA), and low-quality reads and gaps were trimmed with the help of nucleotide BLAST (https://blast.ncbi.nlm.nih.gov/) and ORF finder (https://www.ncbi.nlm.nih.gov/orffinder/). The COI barcode sequences of C. bastari and C. garua generated in this study are available in GenBank, and their accession numbers are shown in the phylogenetic tree (Figure 2). Sequences of nominal Clupisoma congeners were also downloaded from GenBank to build a combined dataset for genetic distance estimation and phylogenetic analysis. However, four sequences of nominal C. garua (accession numbers KX455904, FJ459470, FJ459471, and MN259175) were excluded from the final dataset, because a preliminary phylogeny covering all available NCBI sequences of the families Ailiidae and Schilbeidae suggested that they are probably conspecific with Silonia silondia.
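ORF finder was used above as one check on the trimmed reads. The underlying idea can be sketched in Python: a protein-coding COI fragment read in its correct frame should contain no internal stop codon under the vertebrate mitochondrial genetic code (NCBI translation table 2), in which TAA, TAG, AGA, and AGG are stops and TGA codes for tryptophan. The helper names below are hypothetical, only the forward strand is scanned, and this is a simplified illustration rather than the ORF finder algorithm.

```python
from __future__ import annotations

# Stop codons of the vertebrate mitochondrial code (NCBI translation table 2).
# TGA is NOT a stop in this code: it codes for tryptophan.
VERT_MITO_STOPS = {"TAA", "TAG", "AGA", "AGG"}


def stop_positions(seq: str, frame: int) -> list[int]:
    """0-based start positions of stop codons in a forward frame (0, 1 or 2)."""
    seq = seq.upper().replace("-", "")  # ungap an aligned sequence before reading codons
    return [
        i
        for i in range(frame, len(seq) - 2, 3)  # only complete codons are examined
        if seq[i:i + 3] in VERT_MITO_STOPS
    ]


def stop_free_frames(seq: str) -> list[int]:
    """Forward frames in which the fragment contains no stop codon."""
    return [frame for frame in range(3) if not stop_positions(seq, frame)]


if __name__ == "__main__":
    # Toy fragment (not a real barcode): TAG in frame 0 makes that frame unusable.
    print(stop_free_frames("ATGTAGCCC"))  # [1, 2]
```

Passing this check is necessary but not sufficient: a short fragment can be stop-free in more than one frame, so it complements, rather than replaces, chromatogram inspection and BLAST comparison.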
The sequence of Ailia coila (MN083152) was used as the outgroup in the phylogenetic analysis of the Clupisoma congeners. The dataset was aligned using ClustalX (Thompson et al. 1997), and Kimura 2-parameter (K2P) genetic distances and a neighbor-joining (NJ) phylogeny based on K2P distances were computed in MEGA X.
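K2P distances were computed in MEGA X. To show what that calculation does, the following Python sketch implements the Kimura 2-parameter formula, d = -½ ln(1 - 2P - Q) - ¼ ln(1 - 2Q), where P and Q are the proportions of transitional and transversional differences between two aligned sequences. Skipping gap and ambiguous sites (pairwise deletion) is an assumption of this sketch and may differ from the MEGA X settings used in the study; the function name is hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Sites where either sequence has a gap or an ambiguity code are skipped
    (pairwise deletion; an assumption of this sketch).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in BASES or b not in BASES:
            continue  # gap or ambiguous base: not a comparable site
        sites += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites    # P: proportion of transitional differences
    q = transversions / sites  # Q: proportion of transversional differences
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("divergence too high for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


if __name__ == "__main__":
    # Toy sequences (not data from this study): one transition in ten sites
    print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))  # 0.1116
```

For ten sites differing by one transition, the uncorrected p-distance is 0.1, whereas the K2P estimate is about 0.1116, because the model corrects for unobserved multiple substitutions. Percentages such as 9.9% in the Results correspond to d × 100.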

RESULTS
The specimens were identified morphologically following the taxonomic descriptions (Hamilton 1822; Hora 1937; Datta & Karmakar 1980; Ferraris 2004). Clupisoma bastari (Image 1) was identified by the following combination of morphological characters: body elongate and compressed; abdominal edge keeled from vent to thorax; snout bluntly pointed; eyes large, visible from ventral surface; mouth subterminal, crescentic, upper jaw slightly longer; teeth villiform, in bands in both jaws; vomero-palatine band interrupted in middle. Median longitudinal groove on upper surface of head extending to hind border of eye. Barbels four pairs: maxillary barbels extending to anal-fin base; inner mandibular barbels longer than outer mandibular barbels, both pairs longer than head; nasal barbels extending to posterior edge of eye. Rayed dorsal-fin inserted above middle of pectoral-fin, with a strong spine serrated internally; adipose dorsal-fin above last quarter of anal-fin base; pectoral-fin with a strong spine serrated internally; pelvic-fin ending before anal opening; caudal-fin deeply forked.
Although the length of the maxillary barbel and the extent of the keel along the abdominal edge place C. bastari between C. garua (Hamilton, 1822) and C. prateri (Hora, 1937), it is sufficiently distinct from both by a combination of other morphological characters, such as the lengths of the pectoral fins and maxillary barbels. In C. garua, the adipose fin is absent and the anal fin is short, whereas in the Burmese C. prateri the branched anal-fin rays number 37 to 42 (modally 39) and the abdominal edge is keeled throughout. In addition, in C. prateri the maxillary barbel extends to the middle of the pelvic fin, the mandibular barbel reaches the pectoral-fin base, and the pectoral fin reaches the pelvic-fin origin. These morphological differences are sometimes difficult to discern, which can lead to misidentification among the three species.
The DNA barcodes of C. bastari generated here (accession numbers MF601325 and MT821302) showed 9.9% K2P genetic divergence from our sequence of C. garua (accession number MG572775) and from the database sequences of topotypic C. garua, and comparable divergence from the other congeners (Table 1). The NJ phylogeny resolved four species-level clades, with C. bastari forming a distinct lineage (Clade-2) in the studied dataset (Figure 2). Unexpectedly, Clade-1 included sequences of three nominal taxa with a very low genetic divergence of 0.6%: C. garua (our own sequence from the Mahanadi River basin, and database sequences from the Barak, Ganges, and Narmada river basins in India and from the Surma and Meghna river basins and the Sundarbans in Bangladesh), C. prateri (Narmada River basin in India, and Surma River basin and Sundarbans in Bangladesh), and C. longianalis (Huang, 1981) (Mekong River, near its type locality).
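The NJ tree in this study was built in MEGA X from the K2P distance matrix. The agglomeration step that neighbour-joining repeats can be sketched in Python: at each step the pair minimising Q(i, j) = (n - 2)d(i, j) - r(i) - r(j) is joined, where r(i) is the sum of distances from i to all remaining nodes. This is a hypothetical, unoptimised sketch that returns an unrooted tree as nested (subtree, branch length) tuples; it performs no bootstrapping and no outgroup rooting (the study rooted the tree with Ailia coila), and ties are broken by input order.

```python
from itertools import combinations


def neighbor_joining(names, matrix):
    """Return an unrooted NJ tree as a 3-tuple of (subtree, branch_length) pairs."""
    if len(names) < 3:
        raise ValueError("NJ needs at least three taxa")
    # d[x][y]: current distance between clusters x and y (leaves are name strings)
    d = {a: {b: matrix[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        r = {x: sum(d[x][y] for y in nodes) for x in nodes}  # row sums
        # choose the pair with the smallest Q criterion
        best = None
        for i, j in combinations(nodes, 2):
            q = (n - 2) * d[i][j] - r[i] - r[j]
            if best is None or q < best[0]:
                best = (q, i, j)
        _, i, j = best
        # branch lengths from i and j to their new parent node u
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = ((i, li), (j, lj))
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (i, j):
                # distance from the new node to every remaining node
                d[u][k] = d[k][u] = (d[i][k] + d[j][k] - d[i][j]) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # three nodes left: join them at a single central node
    a, b, c = nodes
    la = (d[a][b] + d[a][c] - d[b][c]) / 2
    return ((a, la), (b, d[a][b] - la), (c, d[a][c] - la))


if __name__ == "__main__":
    # Toy additive distances for the tree ((A,B),(C,D)), not data from this study
    names = ["A", "B", "C", "D"]
    matrix = [[0, 5, 7, 8], [5, 0, 8, 9], [7, 8, 0, 3], [8, 9, 3, 0]]
    print(neighbor_joining(names, matrix))
```

The toy matrix is additive for the tree ((A,B),(C,D)), so NJ recovers both the topology and the branch lengths exactly; with real K2P distances the tree is an estimate, and NJ branch lengths can even come out slightly negative.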
The present genetic analysis indicates that the sequences labelled C. prateri and C. longianalis that nest within the C. garua clade (Clade-1) are misnomers (Figure 2). The studied species, C. bastari (Clade-2), together with one database sequence (Clupisoma sp. JX260854 from the Godavari River), showed 0.2% intraspecific genetic divergence, 9.9% K2P genetic divergence from C. garua (Clade-1), and 10.0-11.2% from the other two clades (Clade-3 and Clade-4) (Table 1). Clade-3 comprises three database sequences of C. sinense from the Mekong basin. Clade-4 comprises two database sequences (MN178280 and KY909150) labelled C. garua; this clade is distinct from topotypic C. garua (Clade-1) and shows species-level genetic distances from the other congeners. No sequence under the name C. montana is available in the NCBI database. We therefore presume that these two sequences (MN178280 from the Ghaghara River, Nepal; KY909150 from the Ranganadi River, Arunachal Pradesh, India) may represent a lineage of C. montana, whose type locality is in the Teesta River, India, and tentatively assign them to that species.
The Barcode Index Number (BIN) list in the public data portal of BOLD Systems revealed four distinct BINs within Clupisoma. C. bastari was assigned a distinct BIN (BOLD:ABY1142). Sequences labelled C. garua fall into two different BINs. Some of the sequences labelled C. prateri are included in one of the C. garua BINs. Similarly, two sequences included in a C. garua BIN appear to be misidentifications; we tentatively assign them to C. montana.

DISCUSSION
Among the congeners, C. garua is widely distributed and frequently listed in freshwater fish inventories (Gupta & Banerjee 2016; Bhakta & Sonia 2020). However, reports of C. garua from the Godavari basin are doubtful. One of the C. garua sequences in Clade-1, from the Barak River basin (JN628921), was also identified morphologically as C. garua by the first author in a previous study (Bhattacharjee et al. 2012). Further, the sequences labelled C. prateri from the Narmada basin (JX983272 to JX983278) have since been re-identified as C. garua (Khedkar et al. 2014). Notably, C. prateri was originally described from the Irrawaddy drainage in Myanmar. Later, C. roosae Ferraris (2004) was described from the same river, but no sequence data are available for it. Although many studies report C. garua from southern Indian waters, we did not observe any such specimen in the Krishna River in Andhra Pradesh or the Godavari River in Telangana. We suggest further examination of C. garua from southern Indian waters using molecular data. On morphological grounds, C. montana and C. naziri Mirza & Awan (1973) (type locality: Indus River basin, Pakistan) were placed in a group with a rounded abdominal edge, whereas the abdominal edge is keeled in C. garua, C. bastari, and C. prateri (Datta & Karmakar 1980). Clupisoma montana is also poorly known and has occasionally been reported from central India (Johnson et al. 2012), Bihar (Gunasekar & Isaac 2017), and part of the lower Brahmaputra basin in Assam (Saha & Bordoloi 2009). A few haematological and biological studies of C. montana are also available (Grover et al. 1999). DNA barcode data from C. montana collected at its type locality would therefore help clarify the phylogeny and distribution of this species.
DNA barcoding uses sequence information from an agreed-upon segment of the mtCOI gene to discriminate efficiently among animal taxa at the species level (Hebert et al. 2003). The technique makes taxonomic comparison considerably easier (Tautz et al. 2002). It is also used for identification below the species level and for detecting cryptic species or species complexes by assessing the barcode gap between intra- and interspecific distances (Blaxter 2003). With the growth of DNA barcoding, ichthyofaunal diversity has been extensively explored worldwide, including in India.
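A barcode gap assessment of the kind mentioned above compares, for each species, the largest intraspecific distance with the smallest distance to any other species; a gap exists when the former is smaller. The following is a minimal Python sketch, assuming a precomputed symmetric matrix of pairwise distances (for example K2P distances) and one species label per sequence; the function names and the toy values in the example are hypothetical and are not results from this study.

```python
from itertools import combinations


def barcode_gaps(labels, dist):
    """Map each species to (max intraspecific distance, min distance to another species)."""
    n = len(labels)
    gaps = {}
    for sp in sorted(set(labels)):
        members = [i for i in range(n) if labels[i] == sp]
        others = [i for i in range(n) if labels[i] != sp]
        intra = [dist[i][j] for i, j in combinations(members, 2)]
        inter = [dist[i][j] for i in members for j in others]
        # singletons have no intraspecific pair; species with no neighbour get infinity
        gaps[sp] = (max(intra, default=0.0), min(inter, default=float("inf")))
    return gaps


def species_without_gap(gaps):
    """Species whose intraspecific spread reaches or exceeds the nearest-neighbour distance."""
    return [sp for sp, (intra, inter) in gaps.items() if intra >= inter]


if __name__ == "__main__":
    # Toy distance matrix for two hypothetical species with two sequences each
    labels = ["sp_A", "sp_A", "sp_B", "sp_B"]
    dist = [
        [0.0, 0.002, 0.099, 0.100],
        [0.002, 0.0, 0.098, 0.099],
        [0.099, 0.098, 0.0, 0.006],
        [0.100, 0.099, 0.006, 0.0],
    ]
    gaps = barcode_gaps(labels, dist)
    print(gaps)
    print(species_without_gap(gaps))
```

Evaluating the gap per species, rather than comparing one global maximum with one global minimum, keeps a single highly variable species from masking clear separation elsewhere in the dataset.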
As of 3 August 2020, a total of 11,613 DNA barcode sequences of the class Actinopterygii from different biogeographic realms in India had been deposited in the Barcode of Life Data System (BOLD), and GenBank holds even more. The present study contributes novel barcode sequences of the morphologically identified, lesser-known C. bastari to GenBank.

Data availability
The data that support the findings of this study are openly available in the NCBI GenBank database (https://www.ncbi.nlm.nih.gov) under accession numbers MF601325, MG572775, and MT821302.
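As a convenience, the three accessions can be retrieved programmatically through NCBI's E-utilities efetch service. The sketch below only builds the request URL (db=nuccore, rettype=fasta, retmode=text); the helper name is hypothetical, and NCBI's E-utilities usage guidelines (for example, request-rate limits) should be checked before scripting downloads.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accessions):
    """Build an E-utilities efetch URL that returns nucleotide records as FASTA text."""
    params = {
        "db": "nuccore",             # NCBI nucleotide database
        "id": ",".join(accessions),  # comma-separated accession list
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)  # urlencode escapes the commas as %2C


if __name__ == "__main__":
    print(efetch_fasta_url(["MF601325", "MG572775", "MT821302"]))
```

The resulting URL can be opened with urllib.request.urlopen (network access required) and the response decoded as text to obtain the three FASTA records.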

www.threatenedtaxa.org
The Journal of Threatened Taxa (JoTT) is dedicated to building evidence for conservation globally by publishing peer-reviewed articles online every month at a reasonably rapid rate at www.threatenedtaxa.org. All articles published in JoTT are registered under Creative Commons Attribution 4.0 International License unless otherwise mentioned. JoTT allows unrestricted use, reproduction, and distribution of articles in any medium by providing adequate credit to the author(s) and the source of publication.