|
Titles |
Links |
| Computer Science |
Natural Language Processing (NLP) |
https://rajpurkar.github.io/SQuAD-explorer/ |
| Computer Vision |
https://cocodataset.org/#download |
| Algorithms & Data Structures |
https://snap.stanford.edu/data/ |
| Programming Languages / Code Analysis |
https://github.com/github/CodeSearchNet |
| Operating Systems |
https://github.com/google/cluster-data |
| Databases & Data Mining |
https://datasets.imdbws.com/ |
| Computer Architecture / Hardware |
https://github.com/felixsteinke/cpu-spec-dataset |
| Information Technology |
Cloud Computing |
https://github.com/google/cluster-data |
| Software Engineering |
https://www.kaggle.com/datasets/syedmharis/software-engineering-interview-questions-dataset |
| IT Service Management |
https://www.kaggle.com/datasets/swapniljadhav96/itsm-dataset |
| Cybersecurity |
https://www.kaggle.com/datasets/teamincribo/cyber-security-attacks |
| User Behavior / Web Analytics |
https://archive.org/details/datasets |
| Electrical Engineering |
Power Systems |
https://ieee-dataport.org/documents/power-system-multi-source-events-dataset |
| Renewable Energy (Solar/Wind) |
https://www.nrel.gov/grid/solar-power-data.html |
| Smart Grid |
https://www.kaggle.com/datasets/ziya07/smart-grid-monitoring-dataset/data |
| Electrical Machines |
https://ieee-dataport.org/open-access/industrial-machines-dataset-electrical-load-disaggregation |
| Control Systems |
https://ieee-dataport.org/documents/dataset-bundle-building-automation-and-control-systems-security-analysis# |
| Electronics and Communication Engineering |
Digital Signal Processing (DSP) |
https://www.kaggle.com/datasets/emirhanai/advanced-signal-processing-dataset-from-ai-sensors |
| Wireless Communication |
https://catalog.data.gov/dataset/?tags=wireless-communications-and-networks |
| 5G / Cellular Networks |
https://www.kaggle.com/datasets/vinothkannaece/5g-network-data |
| Antenna & RF Systems |
https://www.kaggle.com/datasets/suraj520/rf-signal-data |
| VLSI / IC Design |
https://github.com/vlsi/calcite-test-dataset |
| Biomedical |
PhysioNet (ECG, EEG, Vital Signs) |
https://physionet.org/ |
| MIMIC-IV Clinical Database |
https://www.kaggle.com/datasets/montassarba/mimic-iv-clinical-database-demo-2-2 |
| BraTS (Brain Tumor Segmentation) |
https://www.med.upenn.edu/cbica/brats2020/data.html |
| COVID-19 Radiography Database |
https://www.kaggle.com/datasets/tawsifurrahman/covid19-radiography-database |
| Renewable Energy |
NREL Solar Power Data |
https://github.com/Charlie5DH/Solar-Power-Datasets-and-Resources |
| NREL Wind Integration Datasets |
https://www.nrel.gov/grid/wind-toolkit.html |
| Global Energy Forecasting Competition (GEFCom) |
https://www.kaggle.com/competitions/GEF2012-wind-forecasting |
| Open Power System Data (Renewables) |
https://data.open-power-system-data.org/ |
| Mechanical Engineering |
NASA Turbofan Engine Degradation Simulation |
https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/ |
| IC Engine Vibration / Sound Datasets |
https://github.com/Charlie5DH/PredictiveMaintenance-and-Vibration-Resources |
| Structural Health Monitoring Sensor Data |
https://www.kaggle.com/datasets/ziya07/building-structural-health-sensor-dataset |
| Robotics / Control Benchmark Datasets |
https://github.com/mint-lab/awesome-robotics-datasets |
| Autonomous Vehicle Engineering |
KITTI Vision Benchmark Suite |
http://www.cvlibs.net/datasets/kitti/ |
| Waymo Open Dataset |
https://waymo.com/open/ |
| ApolloScape |
http://apolloscape.auto/ |
| Civil Engineering |
Building Energy Dataset |
https://www.kaggle.com/c/ashrae-energy-prediction |
| Pavia University Remote Sensing |
https://www.kaggle.com/datasets/syamkakarla/pavia-university-hsi |
| UCI Concrete Compressive Strength |
https://archive.ics.uci.edu/ml/datasets/Concrete+Compressive+Strength |
| Chemical Engineering |
Catalysis Reaction Data (QM9 Molecules) |
https://quantum-machine.org/datasets/ |
| Chemical Process Simulation (DREAM Challenge) |
https://zenodo.org/records/3735364 |
| Industrial Chemical Sensor Data |
https://archive.ics.uci.edu/dataset/45/heart+disease |
| Process Systems Engineering Datasets (PSE) |
https://data.world/briannielsen/process-systems-engineering |
| Aerospace Engineering |
NASA Airfoil Self-Noise Dataset |
https://www.kaggle.com/datasets/fedesoriano/airfoil-selfnoise-dataset |
| UCI Flight Delay Dataset |
https://www.transtats.bts.gov/OT_Delay/ |
| NASA Turbofan Engine Degradation (C-MAPSS) |
https://github.com/kpeters/exploring-nasas-turbofan-dataset |
| OpenAeroStruct (Aero-Struct Optimization) |
https://github.com/mdolab/OpenAeroStruct |
| ERA5 Atmospheric Reanalysis Data |
https://cds.climate.copernicus.eu/datasets |
| Industrial Engineering |
UCI Manufacturing Failure Detection |
https://www.kaggle.com/datasets/ziya07/smart-manufacturing-iot-cloud-monitoring-dataset |
| SECOM Semiconductor Manufacturing Data |
https://archive.ics.uci.edu/ml/datasets/SECOM |
| Tennessee Eastman Process Simulation |
https://github.com/jonathanwvd/awesome-industrial-datasets/blob/master/markdown/tennessee_eastman_process_simulation_dataset.md |
| Open Jobs/Workforce Data (BLS) |
https://www.bls.gov/data/ |
| Assembly Line Sensor Data |
https://universe.roboflow.com/wd-rohcm/dataset-s7uii |
| Metallurgical Engineering |
Materials Data Repository (NIST) |
https://github.com/sedaoturak/data-resources-for-materials-science |
| Materials Project (Crystallography & Properties) |
https://materialsproject.org/ |
| Open Quantum Materials Database (OQMD) |
https://colab.research.google.com/github/Tony-Y/oqmd-v1.2-dataset-for-cgnn/blob/main/OQMD_v1_2_dataset_for_CGNN.ipynb |
| Materials Science Engineering |
Materials Project Database |
https://materialsproject.org/ |
| NIST Thermo-Calc Datasets |
https://www.nist.gov/programs/projects/thermo-calc-data |
| Citrine Materials Data |
https://citrine.io/media-post/data-highlight-materials-project-dataset/ |
| JARVIS-DFT Database |
https://jarvis.nist.gov/ |
| Mechatronics Engineering |
UCI Human Activity Recognition Using Smartphones |
https://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones |
| Robotic Grasping Dataset (Cornell) |
https://www.kaggle.com/datasets/oneoneliu/cornell-grasp |
| OpenAI Gym Robotics Environments |
https://github.com/openai/robogym |
| Mobile Robot Navigation (TurtleBot Logs) |
https://zenodo.org/records/1188976 |
| Inertial Measurement Unit (IMU) Motion Data |
https://www.kaggle.com/datasets/ziya07/ai-powered-imu-motion-dataset |
| Automobile Engineering |
KITTI Autonomous Driving Dataset |
http://www.cvlibs.net/datasets/kitti/ |
| nuScenes AV Dataset |
https://www.kaggle.com/datasets/mitanshuchakrawarty/nuscenes |
| Car Evaluation Dataset |
https://archive.ics.uci.edu/ml/datasets/Car+Evaluation |
| Vehicle Fuel Consumption (EPA) |
https://www.fueleconomy.gov/feg/download.shtml |
| Open Traffic Data (HERE) |
https://developer.here.com/products/traffic |
| Control Systems Engineering |
UCI PID Controller Benchmark Data |
https://archive.ics.uci.edu/ml/datasets/Servo |
| MATLAB/Simulink Control Test Cases (CORA) |
https://in.mathworks.com/matlabcentral/fileexchange/68551-cora |
| Benchmark Control System Models (DAE) |
https://www.cds.caltech.edu/~murray/wiki/ |
| Aircraft Control Simulation Logs |
https://data.nas.nasa.gov/ |
| Instrumentation & Control Engineering |
UCI Servo Control Dataset |
https://archive.ics.uci.edu/ml/datasets/Servo |
| PID Tuning Benchmark (MATLAB/Simulink logs) |
https://github.com/contractor-core/cora-benchmarks |
| Industrial Process Control Data (Tennessee Eastman) |
https://github.com/jonathanwvd/awesome-industrial-datasets/blob/master/markdown/tennessee_eastman_process_simulation_dataset.md |
| Embedded Systems Engineering |
WISDM Smartphone Data (Embedded Sensors) |
https://www.kaggle.com/datasets/antonandreenko/industrial-control-system-ics-alarm-text-dataset |
| OpenEmbedded Benchmark Dataset |
https://github.com/openembedded/ |
| IoT Traffic Dataset (UCI) |
https://github.com/thieu1995/iot_dataset/blob/master/ReadMe.md |
| Arduino Sensor Dataset (UCI) |
https://archive.ics.uci.edu/dataset/506/human+activity+recognition+from+continuous+ambient+sensor+data |
| VLSI Design Engineering |
ISCAS Circuits Benchmark (VLSI Testing) |
https://www.kaggle.com/datasets/hemanthhari/vlsi-data |
| OpenROAD VLSI Data (Layout/Synthesis) |
https://github.com/The-OpenROAD-Project |
| ISPD Contest Benchmark Suites |
https://universe.roboflow.com/casproject/ispd |
| Microelectronics Engineering |
Microelectronic Failure Analysis Data |
https://www.kaggle.com/datasets/umerrtx/machine-failure-prediction-using-sensor-data |
| SEM Image Dataset (Materials) |
https://github.com/BAMresearch/automatic-sem-image-segmentation |
| Power Electronics Engineering |
Power Electronics Converter Data (Simulation) |
https://data.world/briannielsen/power-electronics |
| PEC Dataset (Inverter/Converter Logs) |
https://www.kaggle.com/datasets/rusuanjun/pec-dataset |
| Electric Vehicle Powertrain Data |
https://data.gov/transportation/ |
| Grid-Connected Inverter Dataset |
https://www.nrel.gov/grid/solar-power-data.html |
| Biotechnology Engineering |
Genomic Data (NCBI SRA) |
https://www.ncbi.nlm.nih.gov/sra |
| TCGA Cancer Genomics Dataset |
https://portal.gdc.cancer.gov/ |
| Human Microbiome Project Data |
https://github.com/awslabs/open-data-registry/tree/main/datasets |
| Protein Data Bank (PDB) |
https://catalog.data.gov/dataset/protein-data-bank-pdb |
| KEGG Pathway Database |
https://www.genome.jp/kegg/ |
| Pharmaceutical Engineering |
PubChem BioAssay |
https://archive.ics.uci.edu/dataset/209/pubchem+bioassay+data |
| DrugBank (Drug Data) |
https://www.kaggle.com/datasets/aryelbezerra/drugbank-approved-drugs-dataset |
| ChEMBL (Bioactive Molecules) |
https://github.com/awslabs/open-data-registry/blob/main/datasets/chembl.yaml |
| Genetic Engineering |
NCBI Gene Expression Omnibus (GEO) |
https://www.ncbi.nlm.nih.gov/geo/ |
| 1000 Genomes Project |
https://www.internationalgenome.org/data |
| Drosophila RNA-Seq (modENCODE) |
https://www.kaggle.com/datasets/tianjiechen/tcga-rna-datasets |
| Food Technology Engineering |
Food Composition Database (USDA) |
https://fdc.nal.usda.gov/ |
| Food Quality & Safety Data (UCI) |
https://www.kaggle.com/datasets/nikhitaganiger/food-safety |
| Food Microbiology (Fermentation) |
https://data.world/makefoodsafe/fermentation |
| Agricultural Engineering |
UCI Crop Dataset (Plant Seedlings) |
https://www.kaggle.com/competitions/plant-seedlings-classification |
| FAOSTAT Agriculture Data |
https://www.fao.org/faostat/en/ |
| Weather & Yield (Global) |
https://www.ecmwf.int/en/forecasts/datasets |
| Irrigation & Soil Moisture Data (USDA) |
https://catalog.data.gov/dataset/?tags=irrigation |
| Dairy Technology Engineering |
Milk Composition Database (IDF) |
https://github.com/saideepaknagaraj/Milk-spectra-analysis |
| UCI Fermented Dairy Dataset |
https://www.kaggle.com/datasets/suraj520/dairy-goods-sales-dataset |
| Power Systems Engineering |
NREL Renewable Integration Data |
https://registry.opendata.aws/nrel-pds-wtk/ |
| European Transmission System Data |
https://www.entsoe.eu/data/ |
| UCI Electrical Grid Stability |
https://archive.ics.uci.edu/dataset/471/electrical+grid+stability+simulated+data |
| Geological Engineering |
USGS Earthquake Catalog |
https://www.kaggle.com/datasets/rupindersinghrana/usgs-earthquakes-2024 |
| OneGeology Global Geoscience Data |
https://www.kaggle.com/competitions/geology-forecast-challenge-open |
| Mineral Resources Data System (USGS) |
https://github.com/DOI-USGS/dataretrieval-python |
| Geo-Environmental Engineering |
Global Soil Data (ISRIC-World Soil Information) |
https://data.nasa.gov/dataset/global-data-set-of-derived-soil-properties-0-5-degree-grid-isric-wise-3fec2 |
| Air Quality Open Data (OpenAQ) |
https://openaq.org/ |
| Water Quality Portal (USGS + EPA) |
https://www.waterqualitydata.us/ |
| NASA Earthdata Environmental Datasets |
https://www.kaggle.com/datasets/ivansher/nasa-nearest-earth-objects-1910-2024 |
| Nanotechnology Engineering |
Nanomaterial Registry |
https://github.com/NanoCommons/datasets |
| Materials Project (Nano/Crystalline Data) |
https://materialsproject.org/ |
| PubChem Nanomaterials |
https://pubchem.ncbi.nlm.nih.gov/ |
| Networking |
CAIDA Internet Traffic Dataset |
https://www.caida.org/data/passive/ |
| MAWI Working Group Traffic Archive |
https://mawi.wide.ad.jp/mawi/ |
| Internet Traffic Archive (ITA) |
https://www.kaggle.com/datasets/ravikumargattu/network-traffic-dataset/data |
| Cybersecurity |
CSE-CIC-IDS 2018 |
https://www.unb.ca/cic/datasets/ids-2018.html |
| MITRE ATT&CK Evaluations Dataset |
https://github.com/mitre-attack/attack-stix-data |
| DARPA Intrusion Detection Dataset |
https://www.ll.mit.edu/r-d/datasets |
| Network Security |
UNSW-NB15 Dataset |
https://research.unsw.edu.au/projects/unsw-nb15-dataset |
| NSL-KDD Dataset |
https://www.unb.ca/cic/datasets/nsl.html |
| Bot-IoT Dataset |
https://research.unsw.edu.au/projects/bot-iot-dataset |
| Wireless Sensor Network (WSN) |
Intel Berkeley Research Lab Sensor Dataset |
http://db.csail.mit.edu/labdata/labdata.html |
| LUCE Environmental Sensor Dataset |
https://www.kaggle.com/datasets/garystafford/environmental-sensor-data-132k |
| UCI Sensorless Drive Diagnosis |
https://archive.ics.uci.edu/datasets?search=Sensorless_drive_diagnosis |
| Wireless Communication |
DeepSig RadioML Dataset |
https://github.com/sofwerx/deepsig_datasets |
| Wireless InSite Channel Dataset |
https://www.qualcomm.com/developer/software/wireless-indoor-simulations-dataset |
| NYU Wireless mmWave Channel Measurements |
https://archive.ics.uci.edu/datasets?search=Sensorless_drive_diagnosis |
| Network Communication |
Stanford SNAP Communication Networks |
https://snap.stanford.edu/data/ |
| Email Communication Network (Enron) |
https://www.cs.cmu.edu/~enron/ |
| EU Email Communication Dataset |
https://snap.stanford.edu/data/email-Eu-core.html |
| Satellite Communication |
NASA Space Communications Dataset |
https://data.nasa.gov/ |
| ESA Satellite Telemetry Data |
https://www.kaggle.com/datasets/sammahoney/esa-anomaly-dataset |
| SATCOM Channel Measurement Dataset |
https://github.com/clarkzjw/LENS |
| Telecommunication |
ITU Telecommunication Indicators |
https://data360.worldbank.org/en/dataset/ITU_DH |
| Ofcom Telecom Market Data |
https://www.ofcom.org.uk/research-and-data |
| Telecom Italia Network Traffic Dataset |
https://doi.org/10.7910/DVN/3QBYB5 |
| Broadband Measurement Data (FCC MBA) |
https://www.fcc.gov/general/measuring-broadband-america |
| CAIDA Internet Topology Data |
https://www.caida.org/data/ |
| Edge Computing |
IoT Edge Analytics Dataset (UCI) |
https://www.kaggle.com/datasets/mohamedamineferrag/edgeiiotset-cyber-security-dataset-of-iot-iiot |
| Edge AI Sensor Dataset (WISDM) |
https://archive.ics.uci.edu/ml/datasets/WISDM+Smartphone+and+Smartwatch+Activity+and+Biometry+Dataset |
| Azure IoT Edge Telemetry Data |
https://www.kaggle.com/code/chaozhuang/iot-telemetry-sensor-data-analysis |
| Fog Computing |
Smart City Fog Computing Dataset |
https://github.com/IBM/smart-city-analytics |
| IoT-Fog Resource Usage Dataset |
https://www.kaggle.com/datasets/ziya07/multi-tier-iot-resource-allocation-dataset |
| Vehicular Fog Computing Dataset |
https://github.com/aniketmaurya/vehicular-fog-dataset |
| Cloud Fog Task Scheduling Dataset |
https://data.world/uci/fog-computing |
| Smart Healthcare Fog Dataset |
https://www.kaggle.com/datasets/acharyakamal/smart-healthcare-prediction-management-system |
| Optical Communication |
Optical Fiber Channel Dataset |
https://github.com/functions-lab/COSMOS-EDFA-Dataset |
| Coherent Optical Communication Dataset |
https://zenodo.org/records/4553836 |
| Optical Signal Modulation Dataset |
https://ieee-dataport.org/open-access/optical-communication-datasets |
| Nonlinear Fiber Optics Dataset |
https://catalog.data.gov/dataset/?tags=fiber-optic |
| Optical Noise Measurement Dataset |
https://zenodo.org/records/8392622 |
| Optical Network |
GÉANT Network Topology Data |
https://ieee-dataport.org/documents/traffic-datsets-abilene-geant-taxibj |
| Optical Transport Network Dataset (OTN) |
https://ieee-dataport.org/open-access/optical-network-datasets |
| WDM Network Benchmark Dataset |
https://www.kaggle.com/competitions/ofc-2026-ml-challenge |
| Flex-Grid Optical Network Dataset |
https://zenodo.org/records/3696817 |
| Cellular Network |
CRAWDAD Cellular Network Traces |
https://crawdad.org/ |
| OpenCellID (Cell Tower Data) |
https://www.opencellid.org/ |
| MIT Reality Mining Dataset |
http://realitycommons.media.mit.edu/realitymining.html |
| Telecom Italia Mobile Dataset |
https://dandelion.eu/datamine/open-big-data/ |
| 5G Dataset (InterDigital) |
https://www.kaggle.com/datasets/vinothkannaece/5g-network-data |
| Mobile Communication |
UCI Human Mobility Dataset |
https://archive.ics.uci.edu/dataset/240/human+activity+recognition+using+smartphones |
| Mobile Phone Usage Dataset (D4D Orange) |
https://www.kaggle.com/datasets/bhadramohit/smartphone-usage-and-behavioral-dataset |
| MIT Smartphone Sensing Dataset |
https://www.kaggle.com/datasets/prince7489/smartphone-usage-dataset |
| Wireless Mobility Traces (CRAWDAD) |
https://crawdad.org/ |
| Cell Phone Activity Dataset (Telecom Italia) |
https://www.kaggle.com/code/ijfezika/mobile-phone-activity-exploratory-analysis |
| Distributed Computing |
Google Cluster Workload Traces |
https://github.com/google/cluster-data |
| Alibaba Cluster Trace Dataset |
https://github.com/alibaba/clusterdata |
| Grid Workload Archive |
https://gwa.ewi.tudelft.nl/ |
| HPC Job Scheduling Dataset |
https://zenodo.org/records/3634616 |
| Distributed Systems Benchmark (DeathStarBench) |
https://github.com/delimitrou/DeathStarBench |
| Cloud Computing |
Google Cloud Trace Dataset |
https://github.com/google/cluster-data |
| Azure Public Cloud Dataset |
https://www.kaggle.com/datasets/rishi2123/oragnizations-expenses-2023-2024 |
| AWS Open Data Registry |
https://registry.opendata.aws/ |
| OpenDC Cloud Workload Dataset |
https://github.com/atlarge-research/opendc |
| Bitbrains Cloud Workload Traces |
https://www.kaggle.com/datasets/gauravdhamane/gwa-bitbrains |
| Computer Vision |
COCO (Common Objects in Context) |
https://cocodataset.org/ |
| Pascal VOC |
http://host.robots.ox.ac.uk/pascal/VOC/ |
| Cityscapes |
https://www.cityscapes-dataset.com/ |
| Open Images Dataset |
https://storage.googleapis.com/openimages/web/index.html |
| Pattern Recognition |
MNIST Handwritten Digits |
http://yann.lecun.com/exdb/mnist/ |
| EMNIST Extended Digits & Letters |
https://www.nist.gov/itl/products-and-services/emnist-dataset |
| UCI Optical Recognition of Handwritten Digits |
https://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits |
| ISOLET Spoken Letter Dataset |
https://archive.ics.uci.edu/ml/datasets/isolet |
| Caltech 101 |
https://data.caltech.edu/records/20086 |
| Remote Sensing |
Landsat Satellite Imagery |
https://landsat.gsfc.nasa.gov/data/ |
| Sentinel-2 Satellite Data |
https://www.kaggle.com/datasets/salmaadell/eurosat-rgb |
| UC Merced Land Use Dataset |
https://www.kaggle.com/datasets/abdulhasibuddin/uc-merced-land-use-dataset |
| ISPRS Aerial Image Dataset |
https://github.com/whuwuteng/Aerial_Stereo_Dataset |
| MODIS Earth Observation Data |
https://modis.gsfc.nasa.gov/data/ |
| Natural Language Processing (NLP) |
SQuAD (Question Answering) |
https://rajpurkar.github.io/SQuAD-explorer/ |
| WikiText Language Modeling Dataset |
https://www.kaggle.com/datasets/rohitgr/wikitext |
| Common Crawl Text Corpus |
https://www.kaggle.com/datasets/jyesawtellrickson/commoncrawl |
| IMDB Movie Reviews |
https://ai.stanford.edu/~amaas/data/sentiment/ |
| Image Processing |
Berkeley Segmentation Dataset (BSDS500) |
https://www2.eecs.berkeley.edu/Research/Projects/CS/vision/bsds/ |
| USC SIPI Image Database |
https://sipi.usc.edu/database/ |
| Set12 Image Denoising Dataset |
https://github.com/cszn/DnCNN |
| DIV2K High-Resolution Images |
https://data.vision.ee.ethz.ch/cvl/DIV2K/ |
| Kodak Image Dataset |
https://r0k.us/graphics/kodak/ |
| Signal Processing |
MIT-BIH Arrhythmia ECG Dataset |
https://physionet.org/content/mitdb/ |
| Speech Commands Dataset |
https://www.tensorflow.org/datasets/catalog/speech_commands |
| RadioML Signal Modulation Dataset |
https://www.kaggle.com/datasets/pinxau1000/radioml2018 |
| UCI Gas Sensor Array Drift Dataset |
https://archive.ics.uci.edu/dataset/224/gas+sensor+array+drift+dataset |
| EEG Motor Movement Dataset |
https://physionet.org/content/eegmmidb/ |
| Biomedical |
PhysioNet Clinical Signals |
https://physionet.org/ |
| MIMIC-IV Clinical Database |
https://www.kaggle.com/datasets/montassarba/mimic-iv-clinical-database-demo-2-2 |
| BraTS Brain Tumor Dataset |
https://www.med.upenn.edu/cbica/brats2020/data.html |
| NIH Chest X-ray Dataset |
https://nihcc.app.box.com/v/ChestXray-NIHCC |
| ADNI Alzheimer’s Dataset |
https://adni.loni.usc.edu/ |
| Big Data |
Google Cluster Workload Traces |
https://github.com/google/cluster-data |
| Amazon Reviews Dataset |
https://registry.opendata.aws/amazon-reviews/ |
| NYC Taxi Trip Records |
https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page |
| Wikipedia Data Dumps |
https://dumps.wikimedia.org/ |
| Common Crawl Web Data |
https://www.kaggle.com/datasets/jyesawtellrickson/commoncrawl |
| Software Engineering |
PROMISE Software Defect Dataset |
https://github.com/feiwww/PROMISE-backup |
| CodeSearchNet |
https://github.com/github/CodeSearchNet |
| Apache Software Logs |
https://www.kaggle.com/datasets/omduggineni/loghub-apache-log-data |
| GitHub Public Dataset (BigQuery) |
https://cloud.google.com/bigquery/public-data/github |
| NASA Software Defect Dataset |
https://github.com/klainfo/NASADefectDataset |
| Power Electronics |
Power Converter Dataset (Zenodo) |
https://zenodo.org/records/3606180 |
| PEMS Power Electronics Measurements |
https://www.kaggle.com/datasets/sepandhaghighi/proton-exchange-membrane-pem-fuel-cell-dataset |
| Inverter Fault Diagnosis Dataset |
https://www.kaggle.com/datasets/ziya07/fault-diagnosis-dataset-for-new-energy-vehicles |
| Electric Drive & Converter Dataset (UCI) |
https://archive.ics.uci.edu/dataset/321/electricityloaddiagrams20112014 |
| Power Systems |
IEEE Power System Test Cases |
https://labs.ece.uw.edu/pstca/ |
| MATPOWER Power Grid Data |
https://matpower.org/download/ |
| ENTSO-E Electricity Network Data |
https://www.entsoe.eu/data/ |
| NREL Power System Data |
https://www.nrel.gov/grid/data-tools.html |
| UCI Electrical Grid Stability Dataset |
https://archive.ics.uci.edu/ml/datasets/Electrical+Grid+Stability+Simulated+Data |
| Open Power System Data |
https://data.open-power-system-data.org/ |
| Wind Turbine / Solar Energy |
NREL Wind Toolkit |
https://www.nrel.gov/grid/wind-toolkit.html |
| NREL Solar Power Data |
https://www.nrel.gov/grid/solar-power-data.html |
| Wind Turbine SCADA Dataset |
https://data.world/energi/wind-turbine-scada |
| Global Solar Atlas Data |
https://gee-community-catalog.org/projects/gsa/ |
| GEFCom Renewable Forecasting Dataset |
https://www.kaggle.com/competitions/GEF2012-wind-forecasting |
| COCO Dataset |
https://cocodataset.org/ |
| Open Images Dataset |
https://storage.googleapis.com/openimages/web/index.html |
| AI2 ARC Reasoning Dataset |
https://allenai.org/data/arc |
| CLEVR Reasoning Dataset |
https://cs.stanford.edu/people/jcjohns/clevr/ |
| Artificial Intelligence |
UCI Machine Learning Repository |
https://archive.ics.uci.edu/ml/index.php |
| OpenML Benchmark Datasets |
https://www.openml.org/ |
| LIBSVM Dataset Collection |
https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/ |
| StatLib Datasets |
https://lib.stat.cmu.edu/datasets/ |
| Deep Learning |
MNIST |
https://git-disl.github.io/GTDLBench/datasets/mnist_datasets/ |
| CIFAR-10 / CIFAR-100 |
https://www.cs.toronto.edu/~kriz/cifar.html |
| SVHN Dataset |
http://ufldl.stanford.edu/housenumbers/ |
| CelebA Face Dataset |
https://mmlab.ie.cuhk.edu.hk/projects/CelebA.html |
| ImageNet-21K |
https://www.image-net.org/download |
| DeepSig RadioML Dataset |
https://www.deepsig.ai/datasets/ |
| AI LLM (Large Language Models) |
The Pile (Massive Text Corpus) |
https://pile.eleuther.ai/ |
| C4 (Colossal Clean Crawled Corpus) |
https://www.tensorflow.org/datasets/catalog/c4 |
| WikiText-103 |
https://huggingface.co/datasets/Salesforce/wikitext/tree/main/wikitext-103-raw-v1 |
| OpenWebText |
https://github.com/jcpeterson/openwebtext |
| BigScience ROOTS Corpus |
https://huggingface.co/bigscience-corpus |
| AI SLM (Small Language Models) |
Alpaca Instruction Dataset |
https://github.com/tatsu-lab/stanford_alpaca |
| FLAN Instruction Dataset |
https://github.com/google-research/FLAN |
| Dolly Instruction Dataset |
https://huggingface.co/datasets/databricks/databricks-dolly-15k |
| TinyStories Dataset |
https://huggingface.co/datasets/roneneldan/TinyStories |
| Artificial General Intelligence |
ARC (Abstraction and Reasoning Corpus) |
https://allenai.org/data/arc |
| BabyAI Platform & Dataset |
https://github.com/mila-iqia/babyai |
| bAbI Reasoning Tasks |
https://github.com/facebookarchive/bAbI-tasks |
| CLEVR Compositional Reasoning Dataset |
https://cs.stanford.edu/people/jcjohns/clevr/ |
| Neuro-Symbolic AI |
CLEVRER (Causal & Symbolic Reasoning) |
http://clevrer.csail.mit.edu/ |
| DeepProbLog Datasets |
https://github.com/ML-KULeuven/deepproblog |
| Abduction and Argumentation Dataset |
https://github.com/AbductiveLearning/ABLSim |
| Logic Tensor Networks Benchmarks |
https://github.com/logictensornetworks/ltntorch |
| Cognitive Computing |
OpenCog Cognitive Datasets |
https://github.com/opencog/opencog |
| ATOMIC Commonsense Knowledge Graph |
https://allenai.org/data/atomic |
| ConceptNet |
https://conceptnet.io/ |
| MindBigData (Human Thought Data) |
http://www.mindbigdata.com/opendb/ |
| Self-Supervised Learning |
ImageNet (Unlabeled / SSL Use) |
https://www.image-net.org/ |
| STL-10 Dataset |
https://cs.stanford.edu/~acoates/stl10/ |
| AudioSet (Self-Supervised Audio) |
https://research.google.com/audioset/ |
| Kinetics Video Dataset |
https://github.com/cvdfoundation/kinetics-dataset |
| LibriSpeech (SSL for Speech) |
https://www.openslr.org/12 |
| Federated Learning |
LEAF Federated Learning Benchmark |
https://leaf.cmu.edu/ |
| FedScale Dataset Suite |
https://github.com/SymbioticLab/FedScale |
| Google Federated EMNIST |
https://figshare.com/articles/dataset/Federated_EMNIST_Dataset/26308777 |
| NIID-Bench Federated Dataset |
https://github.com/Xtra-Computing/NIID-Bench |
| Explainable AI |
UCI Adult Dataset (XAI Benchmark) |
https://archive.ics.uci.edu/ml/datasets/adult |
| COMPAS Recidivism Dataset |
https://www.kaggle.com/datasets/danofer/compass |
| OpenML Explainability Benchmarks |
https://www.openml.org/search?type=data |
| MIMIC-IV (Clinical XAI) |
https://mimic.physionet.org/ |
| FICO Explainable ML Challenge Dataset |
https://www.kaggle.com/datasets/lhagiimn/fico-dataset |
| Quantum Machine Learning |
QML Benchmark Datasets (IBM) |
https://huggingface.co/datasets/Cohaerence/ibm-qml-kernel |
| Quantum Data Sets (UCI-style) |
https://quantum-machine.org/datasets/ |
| QASM Circuit Dataset |
https://github.com/FujiiLabCollaboration/MNISQ-quantum-circuit-dataset |
| QML Toy Datasets (PennyLane) |
https://pennylane.ai/datasets/collection/qml-benchmarks |
| Edge AI / TinyML |
Google Speech Commands (TinyML) |
https://www.tensorflow.org/datasets/catalog/speech_commands |
| MLPerf Tiny Benchmark Dataset |
https://github.com/mlcommons/tiny |
| Edge Impulse Public Datasets |
https://docs.edgeimpulse.com/datasets |
| UCI HAR (Embedded Sensors) |
https://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones |
| WISDM Wearable Dataset |
https://archive.ics.uci.edu/dataset/507/wisdm+smartphone+and+smartwatch+activity+and+biometrics+dataset |
| Generative AI |
LAION-5B Multimodal Dataset |
https://laion.ai/blog/laion-5b/ |
| The Pile (Text Generation) |
https://pile.eleuther.ai/ |
| Common Crawl |
https://commoncrawl.org/ |
| CelebA (Image Generation) |
https://mmlab.ie.cuhk.edu.hk/projects/CelebA.html |
| MusicNet (Audio Generation) |
https://homes.cs.washington.edu/~thickstn/musicnet.html |
| CodeSearchNet (Code Generation) |
https://github.com/github/CodeSearchNet |
| Neuromorphic Computing |
Spiking Heidelberg Digits (SHD) |
https://ieee-dataport.org/open-access/heidelberg-spiking-datasets |
| Spiking Speech Commands |
https://zenkelab.org/resources/spiking-heidelberg-datasets-shd/ |
| DVS Gesture Dataset |
https://github.com/VicenteAlex/DVS-Gesture-Chain?tab=readme-ov-file |
| N-MNIST |
https://www.garrickorchard.com/datasets/n-mnist |
| Data Science and Analytics |
UCI Adult Income Dataset |
https://archive.ics.uci.edu/ml/datasets/adult |
| OpenML Benchmark Suite |
https://www.openml.org/search?type=data |
| World Bank Open Data |
https://data.worldbank.org/ |
| NYC Open Data |
https://www.kaggle.com/datasets/nycopendata/new-york |
| Robotics |
YCB Object and Model Set |
https://www.ycbbenchmarks.com/ |
| RoboNet Dataset |
https://github.com/SudeepDasari/RoboNet |
| DROID Robot Manipulation Dataset |
https://droid-dataset.github.io/ |
| Oxford RobotCar Dataset |
https://robotcar-dataset.robots.ox.ac.uk/ |
| Signals and Systems |
MIT-BIH Arrhythmia Database |
https://physionet.org/content/mitdb/ |
| ECG-ID Database |
https://physionet.org/content/ecgiddb/ |
| NOAA Signal Data |
https://www.ngdc.noaa.gov/ |
| UCR Time Series Archive |
https://www.timeseriesclassification.com/ |
| PhysioNet Signal Archive |
https://physionet.org/about/database/ |
| Blockchain |
Ethereum Blockchain Dataset |
https://www.kaggle.com/datasets/bigquery/ethereum-blockchain |
| Bitcoin Historical Data |
https://www.kaggle.com/datasets/mczielinski/bitcoin-historical-data |
| Elliptic Bitcoin Dataset |
https://www.kaggle.com/datasets/ellipticco/elliptic-data-set |
| Blockchain.com Charts Data |
https://www.blockchain.com/charts |
| 5G Network |
ITU IMT-2020 Evaluation Data |
https://www.itu.int/md/meetingdoc.asp?lang=en&parent=R19-IMT.2020.SAT-C&source=WP4B |
| Open5GCore Datasets |
https://github.com/OPENAIRINTERFACE/openairinterface5g |
| 5G NR Channel Model Data |
https://zenodo.org/records/15210986 |
| VANET |
VeReMi Dataset |
https://veremi-dataset.github.io/ |
| Luxembourg SUMO Traffic Dataset |
https://github.com/lcodeca/LuSTScenario |
| NGSIM Vehicle Trajectories |
https://www.kaggle.com/datasets/nigelwilliams/ngsim-vehicle-trajectory-data-us-101 |
| TAPAS Cologne Traffic Dataset |
https://sumo.dlr.de/docs/Data/Scenarios/TAPASCologne.html |
| V2X Communication |
OPV2V Autonomous Driving Dataset |
https://mobility-lab.seas.ucla.edu/opv2v/ |
| DAIR-V2X Dataset |
https://github.com/AIR-THU/DAIR-V2X?tab=readme-ov-file#dataset |
| V2X-Sim Dataset |
https://github.com/ai4ce/V2X-Sim |
| OFDM Wireless Communication |
COST 207 Channel Model Data |
https://onlinelibrary.wiley.com/doi/epdf/10.1002/0470847808.app5 |
| IEEE 802.11 OFDM Signal Dataset |
https://ieee-dataport.org/open-access/deepwiphy-synthetic-and-real-world-ieee-80211ax-ofdm-symbol-dataset |
| Wireless InSite Ray-Tracing Dataset |
https://github.com/sowang46/mmWave_V2X_dataset |
| MANET (Mobile Ad Hoc Networks) |
CRAWDAD MANET Traces |
https://crawdad.org/ |
| MIT MANET Mobility Dataset |
https://tracebase.org/tracebase/ |
| MANET Routing Dataset |
https://www.kaggle.com/datasets/gymprathap/manet-routing-dataset |
| SDN (Software Defined Networking) |
ARP SDN Traffic Dataset |
https://github.com/nisha077/ARP-SDN-Dataset |
| InSDN Dataset |
https://www.kaggle.com/datasets/badcodebuilder/insdn-dataset |
| SDN DDoS Dataset |
https://ieee-dataport.org/documents/sdn-ddos-attack-image-dataset |
| Mininet Traffic Dataset |
https://data.mendeley.com/datasets/9hz6f62gtk/1 |
| Underwater Sensor Network |
SFI Smart Ocean Acoustic Dataset |
https://ieee-dataport.org/open-access/sfi-smart-ocean-dataset-underwater-acoustic-communications |
| Underwater Acoustic Communication Dataset |
https://ieee-dataport.org/documents/band-full-duplex-underwater-acoustic-communication-measurements-lake-environment |
| UUV Simulator Data |
https://catalog.data.gov/dataset/teamer-electrically-engaged-undulation-system-for-unmanned-underwater-vehicles-7ffb5 |
| Sea Trial Acoustic Dataset |
https://zenodo.org/records/6372728 |
| IoT (Internet of Things) |
IoT-23 Dataset |
https://www.stratosphereips.org/datasets-iot23 |
| TON_IoT Dataset |
https://research.unsw.edu.au/projects/toniot-datasets |
| Intel Lab IoT Sensor Dataset |
https://db.csail.mit.edu/labdata/labdata.html |
| Smart Home IoT Dataset |
https://www.kaggle.com/datasets/taranvee/smart-home-dataset-with-weather-information |
| Quantum Networking |
Quantum Internet Dataset |
https://zenodo.org/records/17504715 |
| IBM Quantum Network Data |
https://quantum-computing.ibm.com/services/resources |
| Quantum Entanglement Network Dataset |
https://zenodo.org/records/8279583 |
| QKD Experimental Dataset |
https://archive.researchdata.leeds.ac.uk/1285/ |
| 6G Networks |
6G Channel Measurement Dataset |
https://github.com/ocatak/6g-channel-estimation-dataset |
| Terahertz Communication Dataset |
https://ieee-dataport.org/documents/measurement-based-parameterization-physics-reflection-models-terahertz-communication-s21 |
| Hexa-X 6G Dataset |
https://zenodo.org/records/17396743 |
| AI-enabled 6G Network Dataset |
https://www.kaggle.com/datasets/ziya07/dynamic-network-slicing-dataset-in-6g-networks |
| Network Routing |
Rocketfuel ISP Topology Dataset |
https://www.cs.washington.edu/research/networking/rocketfuel/ |
| CAIDA Internet Topology Data |
https://www.caida.org/catalog/datasets/ |
| INET Routing Dataset |
https://www.kaggle.com/datasets/asfandyar250/network |
| Open-Source Routing Traces |
https://github.com/BNN-UPC/NetworkModelingDatasets |
| Intrusion Detection System |
CIC-IDS2017 |
https://www.unb.ca/cic/datasets/ids-2017.html |
| UNSW-NB15 |
https://research.unsw.edu.au/projects/unsw-nb15-dataset |
| NSL-KDD |
https://www.unb.ca/cic/datasets/nsl.html |
| TII-SSRC-23 |
https://ieee-dataport.org/documents/tii-ssrc-23-dataset-edited |
| DARPA IDS Dataset |
https://www.ll.mit.edu/r-d/datasets/1998-darpa-intrusion-detection-evaluation-dataset |
| MIMO (Multiple Input Multiple Output) |
DeepMIMO Dataset |
https://github.com/DeepMIMO/DeepMIMO |
| NYU Wireless mmWave MIMO Dataset |
https://github.com/nyu-wireless/mmwRobotNav |
| Massive MIMO Channel Measurements |
https://ieee-dataport.org/open-access/beamspace-channel-dataset-mmwave-massive-mimo |
| COST 2100 MIMO Dataset |
https://www.kaggle.com/datasets/forment/cost2100 |
| Cognitive Radio Networks |
CRAWDAD Spectrum Occupancy Measurements |
https://crawdad.org/ |
| Spectrum Measurement Dataset |
https://www.kaggle.com/datasets/ajithdari/cass-spectrum-dataset |
| Electrosense Radio Spectrum Dataset |
https://zenodo.org/records/7521246 |
| IEEE 802.22 WRAN Simulation Data |
https://www.ieee802.org/22/ |
| Digital Forensics |
Digital Corpora Forensic Images |
https://digitalcorpora.org/ |
| DFRWS Forensic Challenge Datasets |
https://www.dfrws.org/forensic-challenges/ |
| NIST CFReDS Dataset |
https://cfreds.nist.gov/ |
| UC Irvine Memory Forensics Dataset |
https://daniyyell.com/datasets/Memory-Forensics-Attack-Simulation-Dataset/ |
| Wireless Body Area Network (WBAN) |
MHEALTH Dataset |
https://archive.ics.uci.edu/ml/datasets/mhealth+dataset |
| WISDM Wearable Sensor Dataset |
https://www.cis.fordham.edu/wisdm/dataset.php |
| BSN Challenge Dataset |
https://physionet.org/content/bhi-2018-challenge/1.0/ |
| LTE (Long Term Evolution) |
OpenAirInterface LTE Dataset |
https://data.europa.eu/data/datasets/oai-zenodo-org-10811147?locale=en |
| LTE Drive Test Dataset |
https://ieee-dataport.org/open-access/technical-university-denmark-lte-drive-test-measurements |
| Vienna LTE-A Link Level Simulator Data |
https://arxiv.org/html/2603.02638v1 |
| Ad Hoc Networks |
MIT Reality Mining Dataset |
http://realitycommons.media.mit.edu/realitymining.html |
| FAN-GHETS24 Ad Hoc Dataset |
https://zenodo.org/records/13315419 |
| Helsinki Mobility Traces |
https://www.tracebase.org/tracebase/ |
| Forensic Science |
Digital Corpora Forensic Images |
https://digitalcorpora.org/ |
| NIST CFReDS |
https://cfreds.nist.gov/ |
| DFRWS Forensic Challenge Datasets |
https://www.dfrws.org/forensic-challenges/ |
| GovDocs1 Forensic Corpus |
https://digitalcorpora.org/corpora/govdocs |
| Psychology |
Open Psychometrics Data |
https://openpsychometrics.org/_rawdata/ |
| Human Connectome Project |
https://www.humanconnectome.org/ |
| Child Mind Institute Dataset |
https://www.kaggle.com/competitions/child-mind-institute-problematic-internet-use/data |
| MIDUS Psychological Study |
https://midus.wisc.edu/data-access/ |
| APA Open Data Repository |
https://www.apa.org/pubs/databases |
| Public Administration |
World Bank Governance Indicators |
https://www.worldbank.org/en/publication/worldwide-governance-indicators |
| OECD Public Governance Data |
https://oecd-public-integrity-indicators.org/indicators/ |
| UN Public Administration Dataset |
https://publicadministration.un.org/ |
| USA Government Open Data |
https://www.data.gov/ |
| European Open Government Data |
https://data.europa.eu/ |
| Economics |
Penn World Table |
https://www.rug.nl/ggdc/productivity/pwt/ |
| IMF World Economic Outlook Data |
https://www.data.imf.org/en |
| World Bank Development Indicators |
https://databank.worldbank.org/source/world-development-indicators |
| OECD Economic Outlook |
https://data-explorer.oecd.org/ |
| FRED Economic Data |
https://fred.stlouisfed.org/ |
| International Relations |
Correlates of War Dataset |
https://correlatesofwar.org/ |
| UCDP Conflict Dataset |
https://ucdp.uu.se/ |
| GDELT Global Events Database |
https://www.gdeltproject.org/ |
| World Trade Organization Statistics |
https://stats.wto.org/ |
| SIPRI Military Expenditure Database |
https://www.sipri.org/databases/milex |
| Education |
National Center for Education Statistics |
https://catalog.data.gov/dataset?publisher=NationalCenterforEducationStatistics%28NCES%29 |
| OECD PISA Dataset |
https://www.oecd.org/pisa/data/ |
| World Bank Education Statistics |
https://databank.worldbank.org/source/education-statistics |
| Open University Learning Analytics Dataset |
https://analyse.kmi.open.ac.uk/open-dataset |
| UCI Student Performance Dataset |
https://archive.ics.uci.edu/ml/datasets/student+performance |
| Commerce |
UN Comtrade International Trade Data |
https://comtrade.un.org/ |
| World Bank Enterprise Surveys |
https://data360.worldbank.org/en/dataset/WB_ES |
| Retail Scanner Data (US Census) |
https://www.kaggle.com/datasets/census/retail-and-retailers-sales-time-series-collection |
| Eurostat Business Statistics |
https://ec.europa.eu/eurostat |
| Global Financial Data |
https://github.com/JerBouma/FinanceDatabase |
| Business Administration |
Harvard Dataverse Business Datasets |
https://dataverse.harvard.edu/ |
| Crunchbase Open Data Map |
https://data.crunchbase.com/ |
| Compustat Financial Dataset |
https://www.marketplace.spglobal.com/ |
| IBM HR Analytics Dataset |
https://www.kaggle.com/datasets/pavansubhasht/ibm-hr-analytics-attrition-dataset |
| Wharton Research Data Services |
https://wrds.wharton.upenn.edu/ |
| Physics |
CERN Open Data Portal |
https://opendata.cern.ch/ |
| NASA Planetary Data System |
https://pds.nasa.gov/ |
| Gravitational Wave Open Science Center (formerly LIGO Open Science Center) |
https://gwosc.org/ |
| Materials Project Dataset |
https://materialsproject.org/ |
| NIST Physical Measurement Data |
https://www.nist.gov/data |
| Chemistry |
PubChem Database |
https://pubchem.ncbi.nlm.nih.gov/ |
| ChemSpider |
https://www.chemspider.com/ |
| NIST Chemistry WebBook |
https://webbook.nist.gov/chemistry/ |
| Harvard Clean Energy Project Dataset |
https://cepdb.molecularspace.org/ |
| QM9 Molecular Dataset |
https://deepchemdata.s3-us-west-1.amazonaws.com/datasets/qm9.csv |
| Mathematics |
OEIS Integer Sequences |
https://oeis.org/ |
| L-Functions and Modular Forms Database |
https://www.lmfdb.org/ |
| UCI Mathematical Datasets |
https://archive.ics.uci.edu/ml/index.php |
| Numerical Dataset Archive |
https://www.kaggle.com/datasets/subhashinimariappan/numerical-dataset |
| Kaggle Mathematical Modeling Data |
https://www.kaggle.com/datasets/xinyilea/mathematical-modeling-data/data |
| Computational Science |
NERSC Scientific Data Repository |
https://www.nersc.gov |
| Argonne Leadership Computing Facility Data |
https://ieee-dataport.org/documents/argonne-leadership-computing-facility-data-catalog |
| NASA High-End Computing Data |
https://www.nas.nasa.gov/hecc/ |
| LANL Earthquake Prediction Dataset |
https://www.kaggle.com/c/LANL-Earthquake-Prediction |
| Statistics |
StatLib Data Archive |
http://lib.stat.cmu.edu/datasets/ |
| UCI Machine Learning Repository |
https://archive.ics.uci.edu/ml/ |
| World Bank Statistical Data |
https://databank.worldbank.org/ |
| OECD Statistics |
https://stats.oecd.org/ |
| US Census Bureau Statistics |
https://www.kaggle.com/datasets/census/census-bureau-usa |
| Biology |
NCBI BioProject |
https://www.ncbi.nlm.nih.gov/bioproject/ |
| Ensembl Genome Database |
https://www.ensembl.org/ |
| Human Protein Atlas |
https://www.proteinatlas.org/ |
| PDB Biological Structures |
https://www.rcsb.org/ |
| BioStudies Database |
https://www.ebi.ac.uk/biostudies/ |
| Botany |
TRY Plant Trait Database |
https://www.try-db.org/ |
| GBIF Plant Occurrence Data |
https://www.gbif.org/ |
| USDA PLANTS Database |
https://plants.sc.egov.usda.gov/ |
| Global Biodiversity Information Facility (Plants) |
https://www.gbif.org/dataset |
| Plant Phenotyping Dataset |
https://www.plant-phenotyping.org/datasets |
| Zoology |
GBIF Animal Occurrence Data |
https://www.gbif.org/ |
| PanTHERIA Mammal Traits Dataset |
https://esapubs.org/archive/ecol/E090/184/ |
| Animal Diversity Web Data |
https://animaldiversity.org/ |
| Movebank Animal Tracking Data |
https://www.kaggle.com/datasets/pulkit8595/movebank-animal-tracking |
| VertNet Vertebrate Dataset |
https://vertnet.org/ |
| Microbiology |
NCBI Genome Database |
https://www.ncbi.nlm.nih.gov/genome/ |
| PATRIC Bacterial Bioinformatics Resource |
https://www.patricbrc.org/ |
| IMG/M Microbial Genome Database |
https://img.jgi.doe.gov/ |
| Human Microbiome Project |
https://www.hmpdacc.org/ |
| MicrobiomeDB |
https://microbiomedb.org/ |
| Genetics |
NCBI Gene Database |
https://www.ncbi.nlm.nih.gov/gene/ |
| 1000 Genomes Project |
https://www.internationalgenome.org/ |
| GWAS Catalog |
https://www.ebi.ac.uk/gwas/ |
| ClinVar Genetic Variants |
https://www.ncbi.nlm.nih.gov/clinvar/ |
| OMIM Genetic Disorders Database |
https://www.omim.org/ |
| Genomics |
ENCODE Project Dataset |
https://www.encodeproject.org/ |
| GenBank |
https://www.ncbi.nlm.nih.gov/genbank/ |
| TCGA Genomic Data |
https://portal.gdc.cancer.gov/ |
| UCSC Genome Browser Data |
https://genome.ucsc.edu/ |
| ArrayExpress Genomics Data |
https://www.ebi.ac.uk/arrayexpress/ |
| Molecular Biology |
Protein Data Bank |
https://www.rcsb.org/docs/general-help/organization-of-3d-structures-in-the-protein-data-bank |
| UniProt Protein Database |
https://www.uniprot.org/ |
| BioGRID Interaction Dataset |
https://downloads.thebiogrid.org/BioGRID |
| STRING Protein Interaction Data |
https://string-db.org/ |
| Gene Expression Omnibus |
https://www.ncbi.nlm.nih.gov/sites/GDSbrowser/ |
| Immunology |
ImmPort Immunology Data |
https://www.immport.org/ |
| IEDB Immune Epitope Database |
https://www.iedb.org/ |
| Human Cell Atlas Immune Data |
https://www.data.humancellatlas.org/ |
| Vaccine Adverse Event Reporting System |
https://vaers.hhs.gov/data.html |
| FlowRepository Cytometry Data |
https://flowrepository.org/ |
| Neurobiology |
Allen Brain Atlas |
https://brain-map.org/ |
| OpenNeuro |
https://openneuro.org/ |
| Human Connectome Project |
https://www.humanconnectome.org/ |
| Neurodata Without Borders |
https://www.nwb.org/ |
| CRCNS Neural Data Repository |
https://crcns.org/data-sets |
| Bioinformatics |
NCBI Sequence Read Archive |
https://www.ncbi.nlm.nih.gov/sra/docs/sradownload/ |
| KEGG Pathway Database |
https://www.genome.jp/kegg/ |
| Reactome Pathway Dataset |
https://reactome.org/download-data |
| BioMart Data Portal |
https://asia.ensembl.org/info/data/biomart/index.html |
| NCBI Gene Expression Omnibus (GEO) |
https://www.ncbi.nlm.nih.gov/geo/ |
| Marine Biology |
NOAA Oceanographic Data |
https://www.nodc.noaa.gov/ |
| Coral Reef Monitoring Dataset |
https://www.kaggle.com/datasets/jxwleong/coral-reef-dataset |
| World Ocean Atlas |
https://www.ncei.noaa.gov/products/world-ocean-atlas |
| Marine Microbial Eukaryote Transcriptome Project |
https://gold.jgi.doe.gov/sraexperiment?id=SRX554091 |
| Wildlife Biology |
Movebank Wildlife Tracking Data |
https://www.movebank.org/ |
| Global Biodiversity Information Facility |
https://www.gbif.org/ |
| IUCN Red List Data |
https://www.iucnredlist.org/resources/spatial-data-download |
| Wildlife Insights Camera Trap Data |
https://www.wildlifeinsights.org/ |
| Living Planet Database |
https://livingplanetindex.org/data_portal |
| Human Biology |
UK Biobank |
https://www.ukbiobank.ac.uk/ |
| Human Protein Atlas |
https://www.proteinatlas.org/ |
| NHANES Health Dataset |
https://www.kaggle.com/datasets/cdc/national-health-and-nutrition-examination-survey |
| Human Cell Atlas |
https://www.humancellatlas.org/ |
| GTEx Gene Expression Dataset |
https://gtexportal.org/home/ |
| Robotics and Automation |
YCB Object and Model Set |
https://www.ycbbenchmarks.com/ |
| RoboNet Dataset |
https://github.com/SudeepDasari/RoboNet |
| DROID Robot Manipulation Dataset |
https://droid-dataset.github.io/ |
| Oxford RobotCar Dataset |
https://robotcar-dataset.robots.ox.ac.uk/ |