Identification

Title

A new machine learning approach to seabed biotope classification

Abstract

Files for use with the R script accompanying the paper Cooper (2020). Note that this script also uses files from https://doi.org/10.14466/CefasDataHub.34 (details provided in script). Cooper, K.M. (2020). A new machine learning approach to seabed biotope classification. Science Advances.

Resource type

dataset

Resource locator

https://data.cefas.co.uk/view/19921

name: Cefas Data Portal

description: The Cefas Data Portal contains metadata records and data sets available to download and connect to in support of our commitment to open science. Data is available in the following formats: Binary download.

function: download

Unique resource identifier

code

CEFAS19921

codeSpace

https://data.cefas.co.uk

Dataset language

eng

Spatial reference system

code identifying the spatial reference system

Classification of spatial data and services

Topic category

environment

Keywords

Keyword set

keyword value

originating controlled vocabulary

title

GEMET, version 1.0

reference date

date type

publication

effective date

2008-06-01

Keyword set

keyword value

originating controlled vocabulary

title

SeaDataNet P03 parameter discovery vocabulary

reference date

date type

revision

effective date

2011-03-25

Keyword set

keyword value

originating controlled vocabulary

title

SeaDataNet P02 parameter discovery vocabulary

reference date

date type

revision

effective date

2011-03-25

Keyword set

keyword value

originating controlled vocabulary

title

GEMET - INSPIRE themes, version 1.0

reference date

date type

publication

effective date

2008-06-01

Keyword set

keyword value

originating controlled vocabulary

title

SeaVoX Vertical Co-ordinate Coverages

reference date

date type

revision

effective date

2010-05-18

Keyword set

keyword value

originating controlled vocabulary

title

MEDIN metadata record availability

reference date

date type

publication

effective date

2012-01-11

Geographic location

West bounding longitude

1.73881

East bounding longitude

1.74086

North bounding latitude

52.4595

South bounding latitude

52.4581

Temporal reference

Temporal extent

Begin position

1969-03-30

End position

2018-01-11

Dataset reference date

date type

publication

effective date

2019-07-05

date type

revision

effective date

2024-07-12

date type

creation

effective date

2019-07-05

Frequency of update

notPlanned

Quality and validity

Lineage

Files include: BiotopePredictionScript.R (R script), EUROPE.shp (European coastline), EuropeLiteScoWal.shp (European coastline with UK boundaries), DEFRADEMKC8.shp (seabed bathymetry), C5922DATASETFAM13022017.csv (training dataset), PARTC16112018.csv (test dataset), PARTCAGG16112018.csv (aggregation data).

Description of C5922DATASETFAM13022017.csv: This file is based on the RSMP dataset (see https://www.cefas.co.uk/cefas-data-hub/dois/rsmp-baseline-dataset/), but with macrofaunal data output at the level of family or above. A variety of gear types were used for sample collection, including grabs (0.1 m² Hamon, 0.2 m² Hamon, 0.1 m² Day, 0.1 m² Van Veen and 0.1 m² Smith-McIntyre) and cores. Across these devices, 93% of samples were acquired using either a 0.1 m² Hamon grab or a 0.1 m² Day grab. Sieve sizes used in sample processing include 1 mm and 0.5 mm, reflecting the conventional preference for 1 mm offshore and 0.5 mm inshore. Of the samples collected using either a 0.1 m² Hamon grab or a 0.1 m² Day grab, 88% were processed using a 1 mm sieve.

Taxon names were standardised according to the WoRMS (World Register of Marine Species) list using the Taxon Match Tool (http://www.marinespecies.org/aphia.php?p=match). Of the initial 13,449 taxon names, only 774 remained after correction and aggregation to family level. The final dataset comprises a single-sheet comma-separated values (.csv) file. Colonials accounted for less than 20% of the total number of taxa and, where present, were given a value of 1 in the dataset. This component of the fauna was missing from 325 of the 777 surveys, reflecting either a true absence or simply that colonial taxa were ignored by the analyst.

Sediment particle size data were provided as percentage weight by sieve mesh size, with the dataset including 99 different sieve sizes. Sediment samples were processed either by sieving alone or by a combination of sieving and laser diffraction.

Key metadata fields include: sample coordinates (Latitude & Longitude), Survey Name, Gear, Date, Grab Sample Volume (litres) and Water Depth (m). A number of additional explanatory variables are also provided (salinity, temperature, chlorophyll a, suspended particulate matter, water depth, wave orbital velocity, average current, bed stress). In total, the dataset dimensions are 33,198 rows (samples) x 900 columns (variables/factors), yielding a matrix of 29,878,200 individual data values.
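The dimension figures above can be checked with simple arithmetic. The following is a minimal sketch in Python (not the R used by the accompanying script) that multiplies the stated row and column counts and shows one way the shape of a CSV file might be counted. The helper name csv_shape, the tiny in-memory sample and the assumption of a single header row are all illustrative and are not taken from this record or from the script.

```python
import csv
import io

# Figures quoted in the lineage text.
ROWS = 33_198     # samples
COLUMNS = 900     # variables/factors


def csv_shape(handle):
    """Return (data_rows, columns) for a CSV stream.

    Assumes (hypothetically) that the first row is a header.
    """
    reader = csv.reader(handle)
    header = next(reader)
    data_rows = sum(1 for _ in reader)
    return data_rows, len(header)


# Hypothetical three-column sample standing in for the real training file.
sample = io.StringIO("Sample,Latitude,Longitude\nS1,52.0,1.7\nS2,52.1,1.8\n")

total_values = ROWS * COLUMNS   # size of the rows x columns matrix
sample_shape = csv_shape(sample)

print(total_values)
print(sample_shape)
```

To apply the same count to the downloaded training file, the in-memory sample could be replaced with an opened file handle, for example open("C5922DATASETFAM13022017.csv", newline=""), assuming the file is in the working directory.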

Conformity

Conformity report

specification

title

INSPIRE Data Specification on Habitats and Biotopes – Technical Guidelines

reference date

date type

publication

effective date

2013-12-10

degree

false

explanation

See the referenced specification

Conformity report

specification

title

reference date

date type

publication

effective date

2010-12-08

degree

true

explanation

See the referenced specification

Data format

name of format

Unknown

version of format

Constraints related to access and use

Constraint set

Limitations on public access

Responsible organisations

Responsible party

organisation name

Centre for Environment, Fisheries and Aquaculture Science, Lowestoft Laboratory (CEFAS)

full postal address

Cefas Lowestoft Laboratory

Pakefield Road

Lowestoft

NR33 0HT

UK

email address

data.manager@cefas.co.uk

responsible party role

originator

Responsible party

organisation name

Centre for Environment, Fisheries and Aquaculture Science, Lowestoft Laboratory (CEFAS)

full postal address

Cefas Lowestoft Laboratory

Pakefield Road

Lowestoft

NR33 0HT

UK

email address

data.manager@cefas.co.uk

responsible party role

custodian

Responsible party

organisation name

Centre for Environment, Fisheries and Aquaculture Science, Lowestoft Laboratory (CEFAS)

full postal address

Cefas Lowestoft Laboratory

Pakefield Road

Lowestoft

NR33 0HT

UK

email address

data.manager@cefas.co.uk

responsible party role

distributor

Responsible party

organisation name

Department for Environment, Food and Rural Affairs (DEFRA)

email address

defra.helpline@defra.gov.uk

responsible party role

owner

Metadata on metadata

Metadata point of contact

organisation name

Centre for Environment, Fisheries and Aquaculture Science, Lowestoft Laboratory (CEFAS)

full postal address

Cefas Lowestoft Laboratory

Pakefield Road

Lowestoft

NR33 0HT

UK

email address

data.manager@cefas.co.uk

responsible party role

pointOfContact

Metadata date

2024-07-12T12:51:20

Metadata language

eng