Research Made Reliable

Research Datasets

Titles Links

Computer Science
Natural Language Processing (NLP) https://rajpurkar.github.io/SQuAD-explorer/
Computer Vision https://cocodataset.org/#download
Algorithms & Data Structures https://snap.stanford.edu/data/
Programming Languages / Code Analysis https://github.com/github/CodeSearchNet
Operating Systems https://github.com/google/cluster-data
Databases & Data Mining https://datasets.imdbws.com/
Computer Architecture / Hardware https://github.com/felixsteinke/cpu-spec-dataset

Information Technology
Cloud Computing https://github.com/google/cluster-data
Software Engineering https://www.kaggle.com/datasets/syedmharis/software-engineering-interview-questions-dataset
IT Service Management https://www.kaggle.com/datasets/swapniljadhav96/itsm-dataset
Cybersecurity https://www.kaggle.com/datasets/teamincribo/cyber-security-attacks
User Behavior / Web Analytics https://archive.org/details/datasets

Electrical Engineering
Power Systems https://ieee-dataport.org/documents/power-system-multi-source-events-dataset
Renewable Energy (Solar/Wind) https://www.nrel.gov/grid/solar-power-data.html
Smart Grid https://www.kaggle.com/datasets/ziya07/smart-grid-monitoring-dataset/data
Electrical Machines https://ieee-dataport.org/open-access/industrial-machines-dataset-electrical-load-disaggregation
Control Systems https://ieee-dataport.org/documents/dataset-bundle-building-automation-and-control-systems-security-analysis#

Electronics and Communication Engineering
Digital Signal Processing (DSP) https://www.kaggle.com/datasets/emirhanai/advanced-signal-processing-dataset-from-ai-sensors
Wireless Communication https://catalog.data.gov/dataset/?tags=wireless-communications-and-networks
5G / Cellular Networks https://www.kaggle.com/datasets/vinothkannaece/5g-network-data
Antenna & RF Systems https://www.kaggle.com/datasets/suraj520/rf-signal-data
VLSI / IC Design https://github.com/vlsi/calcite-test-dataset

Biomedical
PhysioNet (ECG, EEG, Vital Signs) https://physionet.org/
MIMIC-IV Clinical Database https://www.kaggle.com/datasets/montassarba/mimic-iv-clinical-database-demo-2-2
BraTS (Brain Tumor Segmentation) https://www.med.upenn.edu/cbica/brats2020/data.html
COVID-19 Radiography Database https://www.kaggle.com/datasets/tawsifurrahman/covid19-radiography-database

Renewable Energy
NREL Solar Power Data https://github.com/Charlie5DH/Solar-Power-Datasets-and-Resources
NREL Wind Integration Datasets https://www.nrel.gov/grid/wind-toolkit.html
Global Energy Forecasting Competition (GEFCom) https://www.kaggle.com/competitions/GEF2012-wind-forecasting
Open Power System Data (Renewables) https://data.open-power-system-data.org/

Mechanical Engineering
NASA Turbofan Engine Degradation Simulation https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/
IC Engine Vibration / Sound Datasets https://github.com/Charlie5DH/PredictiveMaintenance-and-Vibration-Resources
Structural Health Monitoring Sensor Data https://www.kaggle.com/datasets/ziya07/building-structural-health-sensor-dataset
Robotics / Control Benchmark Datasets https://github.com/mint-lab/awesome-robotics-datasets

Autonomous Vehicle Engineering
KITTI Vision Benchmark Suite http://www.cvlibs.net/datasets/kitti/
Waymo Open Dataset https://waymo.com/open/
ApolloScape http://apolloscape.auto/

Civil Engineering
Building Energy Dataset https://www.kaggle.com/c/ashrae-energy-prediction
Pavia University Remote Sensing https://www.kaggle.com/datasets/syamkakarla/pavia-university-hsi
UCI Concrete Compressive Strength https://archive.ics.uci.edu/ml/datasets/Concrete+Compressive+Strength

Chemical Engineering
Catalysis Reaction Data (QM9 Molecules) https://quantum-machine.org/datasets/
Chemical Process Simulation (DREAM Challenge) https://zenodo.org/records/3735364
Industrial Chemical Sensor Data https://archive.ics.uci.edu/dataset/45/heart+disease
Process Systems Engineering Datasets (PSE) https://data.world/briannielsen/process-systems-engineering

Aerospace Engineering
NASA Airfoil Self-Noise Dataset https://www.kaggle.com/datasets/fedesoriano/airfoil-selfnoise-dataset
Flight Delay Dataset (BTS) https://www.transtats.bts.gov/OT_Delay/
NASA Turbofan Engine Degradation (C-MAPSS) https://github.com/kpeters/exploring-nasas-turbofan-dataset
OpenAeroStruct (Aero-Struct Optimization) https://github.com/mdolab/OpenAeroStruct
ERA5 Atmospheric Reanalysis Data https://cds.climate.copernicus.eu/datasets

Industrial Engineering
UCI Manufacturing Failure Detection https://www.kaggle.com/datasets/ziya07/smart-manufacturing-iot-cloud-monitoring-dataset
SECOM Semiconductor Manufacturing Data https://archive.ics.uci.edu/ml/datasets/SECOM
Tennessee Eastman Process Simulation https://github.com/jonathanwvd/awesome-industrial-datasets/blob/master/markdown/tennessee_eastman_process_simulation_dataset.md
Open Jobs/Workforce Data (BLS) https://www.bls.gov/data/
Assembly Line Sensor Data https://universe.roboflow.com/wd-rohcm/dataset-s7uii

Metallurgical Engineering
Materials Data Repository (NIST) https://github.com/sedaoturak/data-resources-for-materials-science
Materials Project (Crystallography & Properties) https://materialsproject.org/
Open Quantum Materials Database (OQMD) https://colab.research.google.com/github/Tony-Y/oqmd-v1.2-dataset-for-cgnn/blob/main/OQMD_v1_2_dataset_for_CGNN.ipynb

Materials Science Engineering
Materials Project Database https://materialsproject.org/
NIST Thermo-Calc Datasets https://www.nist.gov/programs/projects/thermo-calc-data
Citrine Materials Data https://citrine.io/media-post/data-highlight-materials-project-dataset/
JARVIS-DFT Database https://jarvis.nist.gov/

Mechatronics Engineering
UCI Human Activity Recognition Using Smartphones https://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones
Robotic Grasping Dataset (Cornell) https://www.kaggle.com/datasets/oneoneliu/cornell-grasp
OpenAI Gym Robotics Environments https://github.com/openai/robogym
Mobile Robot Navigation (TurtleBot Logs) https://zenodo.org/records/1188976
Inertial Measurement Unit (IMU) Motion Data https://www.kaggle.com/datasets/ziya07/ai-powered-imu-motion-dataset

Automobile Engineering
KITTI Autonomous Driving Dataset http://www.cvlibs.net/datasets/kitti/
nuScenes AV Dataset https://www.kaggle.com/datasets/mitanshuchakrawarty/nuscenes
Car Evaluation Dataset https://archive.ics.uci.edu/ml/datasets/Car+Evaluation
Vehicle Fuel Consumption (EPA) https://www.fueleconomy.gov/feg/download.shtml
Open Traffic Data (HERE) https://developer.here.com/products/traffic

Control Systems Engineering
UCI PID Controller Benchmark Data https://archive.ics.uci.edu/ml/datasets/Servo
MATLAB/Simulink Control Test Cases (CORA) https://in.mathworks.com/matlabcentral/fileexchange/68551-cora
Benchmark Control System Models (DAE) https://www.cds.caltech.edu/~murray/wiki/
Aircraft Control Simulation Logs https://data.nas.nasa.gov/

Instrumentation & Control Engineering
UCI Servo Control Dataset https://archive.ics.uci.edu/ml/datasets/Servo
PID Tuning Benchmark (MATLAB/Simulink logs) https://github.com/contractor-core/cora-benchmarks
Industrial Process Control Data (Tennessee Eastman) https://github.com/jonathanwvd/awesome-industrial-datasets/blob/master/markdown/tennessee_eastman_process_simulation_dataset.md

Embedded Systems Engineering
WISDM Smartphone Data (Embedded Sensors) https://www.kaggle.com/datasets/antonandreenko/industrial-control-system-ics-alarm-text-dataset
OpenEmbedded Benchmark Dataset https://github.com/openembedded/
IoT Traffic Dataset (UCI) https://github.com/thieu1995/iot_dataset/blob/master/ReadMe.md
Arduino Sensor Dataset (UCI) https://archive.ics.uci.edu/dataset/506/human+activity+recognition+from+continuous+ambient+sensor+data

VLSI Design Engineering
ISCAS Circuits Benchmark (VLSI Testing) https://www.kaggle.com/datasets/hemanthhari/vlsi-data
OpenROAD VLSI Data (Layout/Synthesis) https://github.com/The-OpenROAD-Project
ISPD Contest Benchmark Suites https://universe.roboflow.com/casproject/ispd

Microelectronics Engineering
Microelectronic Failure Analysis Data https://www.kaggle.com/datasets/umerrtx/machine-failure-prediction-using-sensor-data
SEM Image Dataset (Materials) https://github.com/BAMresearch/automatic-sem-image-segmentation

Power Electronics Engineering
Power Electronics Converter Data (Simulation) https://data.world/briannielsen/power-electronics
PEC Dataset (Inverter/Converter Logs) https://www.kaggle.com/datasets/rusuanjun/pec-dataset
Electric Vehicle Powertrain Data https://data.gov/transportation/
Grid-Connected Inverter Dataset https://www.nrel.gov/grid/solar-power-data.html

Biotechnology Engineering
Genomic Data (NCBI SRA) https://www.ncbi.nlm.nih.gov/sra
TCGA Cancer Genomics Dataset https://portal.gdc.cancer.gov/
Human Microbiome Project Data https://github.com/awslabs/open-data-registry/tree/main/datasets
Protein Data Bank (PDB) https://catalog.data.gov/dataset/protein-data-bank-pdb
KEGG Pathway Database https://www.genome.jp/kegg/

Pharmaceutical Engineering
PubChem BioAssay https://archive.ics.uci.edu/dataset/209/pubchem+bioassay+data
DrugBank (Drug Data) https://www.kaggle.com/datasets/aryelbezerra/drugbank-approved-drugs-dataset
ChEMBL (Bioactive Molecules) https://github.com/awslabs/open-data-registry/blob/main/datasets/chembl.yaml

Genetic Engineering
NCBI Gene Expression Omnibus (GEO) https://www.ncbi.nlm.nih.gov/geo/
1000 Genomes Project https://www.internationalgenome.org/data
Drosophila RNA-Seq (modENCODE) https://www.kaggle.com/datasets/tianjiechen/tcga-rna-datasets

Food Technology Engineering
Food Composition Database (USDA) https://fdc.nal.usda.gov/
Food Quality & Safety Data (UCI) https://www.kaggle.com/datasets/nikhitaganiger/food-safety
Food Microbiology (Fermentation) https://data.world/makefoodsafe/fermentation

Agricultural Engineering
UCI Crop Dataset (Plant Seedlings) https://www.kaggle.com/competitions/plant-seedlings-classification
FAOSTAT Agriculture Data https://www.fao.org/faostat/en/
Weather & Yield (Global) https://www.ecmwf.int/en/forecasts/datasets
Irrigation & Soil Moisture Data (USDA) https://catalog.data.gov/dataset/?tags=irrigation

Dairy Technology Engineering
Milk Composition Database (IDF) https://github.com/saideepaknagaraj/Milk-spectra-analysis
UCI Fermented Dairy Dataset https://www.kaggle.com/datasets/suraj520/dairy-goods-sales-dataset

Power Systems Engineering
NREL Renewable Integration Data https://registry.opendata.aws/nrel-pds-wtk/
European Transmission System Data https://www.entsoe.eu/data/
UCI Electrical Grid Stability https://archive.ics.uci.edu/dataset/471/electrical+grid+stability+simulated+data

Geological Engineering
USGS Earthquake Catalog https://www.kaggle.com/datasets/rupindersinghrana/usgs-earthquakes-2024
OneGeology Global Geoscience Data https://www.kaggle.com/competitions/geology-forecast-challenge-open
Mineral Resources Data System (USGS) https://github.com/DOI-USGS/dataretrieval-python

Geo-Environmental Engineering
Global Soil Data (ISRIC-World Soil Information) https://data.nasa.gov/dataset/global-data-set-of-derived-soil-properties-0-5-degree-grid-isric-wise-3fec2
Air Quality Open Data (OpenAQ) https://openaq.org/
Water Quality Portal (USGS + EPA) https://www.waterqualitydata.us/
NASA Earthdata Environmental Datasets https://www.kaggle.com/datasets/ivansher/nasa-nearest-earth-objects-1910-2024

Nanotechnology Engineering
Nanomaterial Registry https://github.com/NanoCommons/datasets
Materials Project (Nano/Crystalline Data) https://materialsproject.org/
PubChem Nanomaterials https://pubchem.ncbi.nlm.nih.gov/

Networking
CAIDA Internet Traffic Dataset https://www.caida.org/data/passive/
MAWI Working Group Traffic Archive https://mawi.wide.ad.jp/mawi/
Internet Traffic Archive (ITA) https://www.kaggle.com/datasets/ravikumargattu/network-traffic-dataset/data

Cybersecurity
CSE-CIC-IDS 2018 https://www.unb.ca/cic/datasets/ids-2018.html
MITRE ATT&CK Evaluations Dataset https://github.com/mitre-attack/attack-stix-data
DARPA Intrusion Detection Dataset https://www.ll.mit.edu/r-d/datasets

Network Security
UNSW-NB15 Dataset https://research.unsw.edu.au/projects/unsw-nb15-dataset
NSL-KDD Dataset https://www.unb.ca/cic/datasets/nsl.html
Bot-IoT Dataset https://research.unsw.edu.au/projects/bot-iot-dataset

Wireless Sensor Network (WSN)
Intel Berkeley Research Lab Sensor Dataset http://db.csail.mit.edu/labdata/labdata.html
LUCE Environmental Sensor Dataset https://www.kaggle.com/datasets/garystafford/environmental-sensor-data-132k
UCI Sensorless Drive Diagnosis https://archive.ics.uci.edu/datasets?search=Sensorless_drive_diagnosis

Wireless Communication
DeepSig RadioML Dataset https://github.com/sofwerx/deepsig_datasets
Wireless InSite Channel Dataset https://www.qualcomm.com/developer/software/wireless-indoor-simulations-dataset
NYU Wireless mmWave Channel Measurements https://archive.ics.uci.edu/datasets?search=Sensorless_drive_diagnosis

Network Communication
Stanford SNAP Communication Networks https://snap.stanford.edu/data/
Email Communication Network (Enron) https://www.cs.cmu.edu/~enron/
EU Email Communication Dataset https://snap.stanford.edu/data/email-Eu-core.html

Satellite Communication
NASA Space Communications Dataset https://data.nasa.gov/
ESA Satellite Telemetry Data https://www.kaggle.com/datasets/sammahoney/esa-anomaly-dataset
SATCOM Channel Measurement Dataset https://github.com/clarkzjw/LENS

Telecommunication
ITU Telecommunication Indicators https://data360.worldbank.org/en/dataset/ITU_DH
Ofcom Telecom Market Data https://www.ofcom.org.uk/research-and-data
Telecom Italia Network Traffic Dataset https://doi.org/10.7910/DVN/3QBYB5
Broadband Measurement Data (FCC MBA) https://www.fcc.gov/general/measuring-broadband-america
CAIDA Internet Topology Data https://www.caida.org/data/

Edge Computing
IoT Edge Analytics Dataset (UCI) https://www.kaggle.com/datasets/mohamedamineferrag/edgeiiotset-cyber-security-dataset-of-iot-iiot
Edge AI Sensor Dataset (WISDM) https://archive.ics.uci.edu/ml/datasets/WISDM+Smartphone+and+Smartwatch+Activity+and+Biometry+Dataset
Azure IoT Edge Telemetry Data https://www.kaggle.com/code/chaozhuang/iot-telemetry-sensor-data-analysis

Fog Computing
Smart City Fog Computing Dataset https://github.com/IBM/smart-city-analytics
IoT-Fog Resource Usage Dataset https://www.kaggle.com/datasets/ziya07/multi-tier-iot-resource-allocation-dataset
Vehicular Fog Computing Dataset https://github.com/aniketmaurya/vehicular-fog-dataset
Cloud Fog Task Scheduling Dataset https://data.world/uci/fog-computing
Smart Healthcare Fog Dataset https://www.kaggle.com/datasets/acharyakamal/smart-healthcare-prediction-management-system

Optical Communication
Optical Fiber Channel Dataset https://github.com/functions-lab/COSMOS-EDFA-Dataset
Coherent Optical Communication Dataset https://zenodo.org/records/4553836
Optical Signal Modulation Dataset https://ieee-dataport.org/open-access/optical-communication-datasets
Nonlinear Fiber Optics Dataset https://catalog.data.gov/dataset/?tags=fiber-optic
Optical Noise Measurement Dataset https://zenodo.org/records/8392622

Optical Network
GÉANT Network Topology Data https://ieee-dataport.org/documents/traffic-datsets-abilene-geant-taxibj
Optical Transport Network Dataset (OTN) https://ieee-dataport.org/open-access/optical-network-datasets
WDM Network Benchmark Dataset https://www.kaggle.com/competitions/ofc-2026-ml-challenge
Flex-Grid Optical Network Dataset https://zenodo.org/records/3696817

Cellular Network
CRAWDAD Cellular Network Traces https://crawdad.org/
OpenCellID (Cell Tower Data) https://www.opencellid.org/
MIT Reality Mining Dataset http://realitycommons.media.mit.edu/realitymining.html
Telecom Italia Mobile Dataset https://dandelion.eu/datamine/open-big-data/
5G Dataset (InterDigital) https://www.kaggle.com/datasets/vinothkannaece/5g-network-data

Mobile Communication
UCI Human Mobility Dataset https://archive.ics.uci.edu/dataset/240/human+activity+recognition+using+smartphones
Mobile Phone Usage Dataset (D4D Orange) https://www.kaggle.com/datasets/bhadramohit/smartphone-usage-and-behavioral-dataset
MIT Smartphone Sensing Dataset https://www.kaggle.com/datasets/prince7489/smartphone-usage-dataset
Wireless Mobility Traces (CRAWDAD) https://crawdad.org/
Cell Phone Activity Dataset (Telecom Italia) https://www.kaggle.com/code/ijfezika/mobile-phone-activity-exploratory-analysis

Distributed Computing
Google Cluster Workload Traces https://github.com/google/cluster-data
Alibaba Cluster Trace Dataset https://github.com/alibaba/clusterdata
Grid Workload Archive https://gwa.ewi.tudelft.nl/
HPC Job Scheduling Dataset https://zenodo.org/records/3634616
Distributed Systems Benchmark (DeathStarBench) https://github.com/delimitrou/DeathStarBench

Cloud Computing
Google Cloud Trace Dataset https://github.com/google/cluster-data
Azure Public Cloud Dataset https://www.kaggle.com/datasets/rishi2123/oragnizations-expenses-2023-2024
AWS Open Data Registry https://registry.opendata.aws/
OpenDC Cloud Workload Dataset https://github.com/atlarge-research/opendc
Bitbrains Cloud Workload Traces https://www.kaggle.com/datasets/gauravdhamane/gwa-bitbrains

Computer Vision
COCO (Common Objects in Context) https://cocodataset.org/
Pascal VOC http://host.robots.ox.ac.uk/pascal/VOC/
Cityscapes https://www.cityscapes-dataset.com/
Open Images Dataset https://storage.googleapis.com/openimages/web/index.html

Pattern Recognition
MNIST Handwritten Digits http://yann.lecun.com/exdb/mnist/
EMNIST Extended Digits & Letters https://www.nist.gov/itl/products-and-services/emnist-dataset
UCI Optical Recognition of Handwritten Digits https://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits
ISOLET Spoken Letter Dataset https://archive.ics.uci.edu/ml/datasets/isolet
Caltech 101 https://data.caltech.edu/records/20086

Remote Sensing
Landsat Satellite Imagery https://landsat.gsfc.nasa.gov/data/
Sentinel-2 Satellite Data https://www.kaggle.com/datasets/salmaadell/eurosat-rgb
UC Merced Land Use Dataset https://www.kaggle.com/datasets/abdulhasibuddin/uc-merced-land-use-dataset
ISPRS Aerial Image Dataset https://github.com/whuwuteng/Aerial_Stereo_Dataset
MODIS Earth Observation Data https://modis.gsfc.nasa.gov/data/

Natural Language Processing (NLP)
SQuAD (Question Answering) https://rajpurkar.github.io/SQuAD-explorer/
WikiText Language Modeling Dataset https://www.kaggle.com/datasets/rohitgr/wikitext
Common Crawl Text Corpus https://www.kaggle.com/datasets/jyesawtellrickson/commoncrawl
IMDB Movie Reviews https://ai.stanford.edu/~amaas/data/sentiment/

Image Processing
Berkeley Segmentation Dataset (BSDS500) https://www2.eecs.berkeley.edu/Research/Projects/CS/vision/bsds/
USC SIPI Image Database https://sipi.usc.edu/database/
Set12 Image Denoising Dataset https://github.com/cszn/DnCNN
DIV2K High-Resolution Images https://data.vision.ee.ethz.ch/cvl/DIV2K/
Kodak Image Dataset https://r0k.us/graphics/kodak/

Signal Processing
MIT-BIH Arrhythmia ECG Dataset https://physionet.org/content/mitdb/
Speech Commands Dataset https://www.tensorflow.org/datasets/catalog/speech_commands
RadioML Signal Modulation Dataset https://www.kaggle.com/datasets/pinxau1000/radioml2018
UCI Gas Sensor Array Drift Dataset https://archive.ics.uci.edu/dataset/224/gas+sensor+array+drift+dataset
EEG Motor Movement Dataset https://physionet.org/content/eegmmidb/

Biomedical
PhysioNet Clinical Signals https://physionet.org/
MIMIC-IV Clinical Database https://www.kaggle.com/datasets/montassarba/mimic-iv-clinical-database-demo-2-2
BraTS Brain Tumor Dataset https://www.med.upenn.edu/cbica/brats2020/data.html
NIH Chest X-ray Dataset https://nihcc.app.box.com/v/ChestXray-NIHCC
ADNI Alzheimer’s Dataset https://adni.loni.usc.edu/

Big Data
Google Cluster Workload Traces https://github.com/google/cluster-data
Amazon Reviews Dataset https://registry.opendata.aws/amazon-reviews/
NYC Taxi Trip Records https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page
Wikipedia Data Dumps https://dumps.wikimedia.org/
Common Crawl Web Data https://www.kaggle.com/datasets/jyesawtellrickson/commoncrawl

Software Engineering
PROMISE Software Defect Dataset https://github.com/feiwww/PROMISE-backup
CodeSearchNet https://github.com/github/CodeSearchNet
Apache Software Logs https://www.kaggle.com/datasets/omduggineni/loghub-apache-log-data
GitHub Public Dataset (BigQuery) https://cloud.google.com/bigquery/public-data/github
NASA Software Defect Dataset https://github.com/klainfo/NASADefectDataset

Power Electronics
Power Converter Dataset (Zenodo) https://zenodo.org/records/3606180
PEMS Power Electronics Measurements https://www.kaggle.com/datasets/sepandhaghighi/proton-exchange-membrane-pem-fuel-cell-dataset
Inverter Fault Diagnosis Dataset https://www.kaggle.com/datasets/ziya07/fault-diagnosis-dataset-for-new-energy-vehicles
Electric Drive & Converter Dataset (UCI) https://archive.ics.uci.edu/dataset/321/electricityloaddiagrams20112014

Power Systems
IEEE Power System Test Cases https://labs.ece.uw.edu/pstca/
MATPOWER Power Grid Data https://matpower.org/download/
ENTSO-E Electricity Network Data https://www.entsoe.eu/data/
NREL Power System Data https://www.nrel.gov/grid/data-tools.html
UCI Electrical Grid Stability Dataset https://archive.ics.uci.edu/ml/datasets/Electrical+Grid+Stability+Simulated+Data
Open Power System Data https://data.open-power-system-data.org/

Wind Turbine / Solar Energy
NREL Wind Toolkit https://www.nrel.gov/grid/wind-toolkit.html
NREL Solar Power Data https://www.nrel.gov/grid/solar-power-data.html
Wind Turbine SCADA Dataset https://data.world/energi/wind-turbine-scada
Global Solar Atlas Data https://gee-community-catalog.org/projects/gsa/
GEFCom Renewable Forecasting Dataset https://www.kaggle.com/competitions/GEF2012-wind-forecasting
COCO Dataset https://cocodataset.org/
Open Images Dataset https://storage.googleapis.com/openimages/web/index.html
AI2 ARC Reasoning Dataset https://allenai.org/data/arc
CLEVR Reasoning Dataset https://cs.stanford.edu/people/jcjohns/clevr/

Artificial Intelligence
UCI Machine Learning Repository https://archive.ics.uci.edu/ml/index.php
OpenML Benchmark Datasets https://www.openml.org/
LIBSVM Dataset Collection https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/
StatLib Datasets https://lib.stat.cmu.edu/datasets/

Deep Learning
MNIST https://git-disl.github.io/GTDLBench/datasets/mnist_datasets/
CIFAR-10 / CIFAR-100 https://www.cs.toronto.edu/~kriz/cifar.html
SVHN Dataset http://ufldl.stanford.edu/housenumbers/
CelebA Face Dataset https://mmlab.ie.cuhk.edu.hk/projects/CelebA.html
ImageNet-21K https://www.image-net.org/download
DeepSig RadioML Dataset https://www.deepsig.ai/datasets/

AI LLM (Large Language Models)
The Pile (Massive Text Corpus) https://pile.eleuther.ai/
C4 (Colossal Clean Crawled Corpus) https://www.tensorflow.org/datasets/catalog/c4
WikiText-103 https://huggingface.co/datasets/Salesforce/wikitext/tree/main/wikitext-103-raw-v1
OpenWebText https://github.com/jcpeterson/openwebtext
BigScience ROOTS Corpus https://huggingface.co/bigscience-corpus

AI SLM
Alpaca Instruction Dataset https://github.com/tatsu-lab/stanford_alpaca
FLAN Instruction Dataset https://github.com/google-research/FLAN
Dolly Instruction Dataset https://huggingface.co/datasets/databricks/databricks-dolly-15k
TinyStories Dataset https://huggingface.co/datasets/roneneldan/TinyStories

Artificial General Intelligence
ARC (AI2 Reasoning Challenge) https://allenai.org/data/arc
BabyAI Platform & Dataset https://github.com/mila-iqia/babyai
bAbI Reasoning Tasks https://github.com/facebookarchive/bAbI-tasks
CLEVR Compositional Reasoning Dataset https://cs.stanford.edu/people/jcjohns/clevr/

Neuro-Symbolic AI
CLEVRER (Causal & Symbolic Reasoning) http://clevrer.csail.mit.edu/
DeepProbLog Datasets https://github.com/ML-KULeuven/deepproblog
Abduction and Argumentation Dataset https://github.com/AbductiveLearning/ABLSim
Logic Tensor Networks Benchmarks https://github.com/logictensornetworks/ltntorch

Cognitive Computing
OpenCog Cognitive Datasets https://github.com/opencog/opencog
ATOMIC Commonsense Knowledge Graph https://allenai.org/data/atomic
ConceptNet https://conceptnet.io/
MindBigData (Human Thought Data) http://www.mindbigdata.com/opendb/

Self-Supervised Learning
ImageNet (Unlabeled / SSL Use) https://www.image-net.org/
STL-10 Dataset https://cs.stanford.edu/~acoates/stl10/
AudioSet (Self-Supervised Audio) https://research.google.com/audioset/
Kinetics Video Dataset https://github.com/cvdfoundation/kinetics-dataset
LibriSpeech (SSL for Speech) https://www.openslr.org/12

Federated Learning
LEAF Federated Learning Benchmark https://leaf.cmu.edu/
FedScale Dataset Suite https://github.com/SymbioticLab/FedScale
Google Federated EMNIST https://figshare.com/articles/dataset/Federated_EMNIST_Dataset/26308777
NIID-Bench Federated Dataset https://github.com/Xtra-Computing/NIID-Bench

Explainable AI
UCI Adult Dataset (XAI Benchmark) https://archive.ics.uci.edu/ml/datasets/adult
COMPAS Recidivism Dataset https://www.kaggle.com/datasets/danofer/compass
OpenML Explainability Benchmarks https://www.openml.org/search?type=data
MIMIC-IV (Clinical XAI) https://mimic.physionet.org/
FICO Explainable ML Challenge Dataset https://www.kaggle.com/datasets/lhagiimn/fico-dataset

Quantum Machine Learning
QML Benchmark Datasets (IBM) https://huggingface.co/datasets/Cohaerence/ibm-qml-kernel
Quantum Data Sets (UCI-style) https://quantum-machine.org/datasets/
QASM Circuit Dataset https://github.com/FujiiLabCollaboration/MNISQ-quantum-circuit-dataset
QML Toy Datasets (PennyLane) https://pennylane.ai/datasets/collection/qml-benchmarks

Edge AI / TinyML
Google Speech Commands (TinyML) https://www.tensorflow.org/datasets/catalog/speech_commands
MLPerf Tiny Benchmark Dataset https://github.com/mlcommons/tiny
Edge Impulse Public Datasets https://docs.edgeimpulse.com/datasets
UCI HAR (Embedded Sensors) https://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones
WISDM Wearable Dataset https://archive.ics.uci.edu/dataset/507/wisdm+smartphone+and+smartwatch+activity+and+biometrics+dataset

Generative AI
LAION-5B Multimodal Dataset https://laion.ai/blog/laion-5b/
The Pile (Text Generation) https://pile.eleuther.ai/
Common Crawl https://commoncrawl.org/
CelebA (Image Generation) https://mmlab.ie.cuhk.edu.hk/projects/CelebA.html
MusicNet (Audio Generation) https://homes.cs.washington.edu/~thickstn/musicnet.html
CodeSearchNet (Code Generation) https://github.com/github/CodeSearchNet

Neuromorphic Computing
Spiking Heidelberg Digits (SHD) https://ieee-dataport.org/open-access/heidelberg-spiking-datasets
Spiking Speech Commands https://zenkelab.org/resources/spiking-heidelberg-datasets-shd/
DVS Gesture Dataset https://github.com/VicenteAlex/DVS-Gesture-Chain?tab=readme-ov-file
N-MNIST https://www.garrickorchard.com/datasets/n-mnist

Data Science and Analytics
UCI Adult Income Dataset https://archive.ics.uci.edu/ml/datasets/adult
OpenML Benchmark Suite https://www.openml.org/search?type=data
World Bank Open Data https://data.worldbank.org/
NYC Open Data https://www.kaggle.com/datasets/nycopendata/new-york

Self-Supervised Learning
YCB Object and Model Set https://www.ycbbenchmarks.com/
RoboNet Dataset https://github.com/SudeepDasari/RoboNet
DROID Robot Manipulation Dataset https://droid-dataset.github.io/
Oxford RobotCar Dataset https://robotcar-dataset.robots.ox.ac.uk/

Signals and Systems
MIT-BIH Arrhythmia Database https://physionet.org/content/mitdb/
ECG-ID Database https://physionet.org/content/ecgiddb/
NOAA Signal Data https://www.ngdc.noaa.gov/
UCR Time Series Archive https://www.timeseriesclassification.com/
PhysioNet Signal Archive https://physionet.org/about/database/

Blockchain
Ethereum Blockchain Dataset https://www.kaggle.com/datasets/bigquery/ethereum-blockchain
Bitcoin Historical Data https://www.kaggle.com/datasets/mczielinski/bitcoin-historical-data
Elliptic Bitcoin Dataset https://www.kaggle.com/datasets/ellipticco/elliptic-data-set
Blockchain.com Charts Data https://www.blockchain.com/charts

5G Network
ITU IMT-2020 Evaluation Data https://www.itu.int/md/meetingdoc.asp?lang=en&parent=R19-IMT.2020.SAT-C&source=WP4B
Open5GCore Datasets https://github.com/OPENAIRINTERFACE/openairinterface5g
5G NR Channel Model Data https://zenodo.org/records/15210986

VANET
VeReMi Dataset https://veremi-dataset.github.io/
Luxembourg SUMO Traffic Dataset https://github.com/lcodeca/LuSTScenario
NGSIM Vehicle Trajectories https://www.kaggle.com/datasets/nigelwilliams/ngsim-vehicle-trajectory-data-us-101
TAPAS Cologne Traffic Dataset https://sumo.dlr.de/docs/Data/Scenarios/TAPASCologne.html

V2X Communication
OPV2V Autonomous Driving Dataset https://mobility-lab.seas.ucla.edu/opv2v/
DAIR-V2X Dataset https://github.com/AIR-THU/DAIR-V2X?tab=readme-ov-file#dataset
V2X-Sim Dataset https://github.com/ai4ce/V2X-Sim

OFDM Wireless Communication
COST 207 Channel Model Data https://onlinelibrary.wiley.com/doi/epdf/10.1002/0470847808.app5
IEEE 802.11 OFDM Signal Dataset https://ieee-dataport.org/open-access/deepwiphy-synthetic-and-real-world-ieee-80211ax-ofdm-symbol-dataset
Wireless InSite Ray-Tracing Dataset https://github.com/sowang46/mmWave_V2X_dataset

MANET (Mobile Ad Hoc Networks)
CRAWDAD MANET Traces https://crawdad.org/
MIT MANET Mobility Dataset https://tracebase.org/tracebase/
MANET Routing Dataset https://www.kaggle.com/datasets/gymprathap/manet-routing-dataset

SDN (Software Defined Networking)
ARP SDN Traffic Dataset https://github.com/nisha077/ARP-SDN-Dataset
InSDN Dataset https://www.kaggle.com/datasets/badcodebuilder/insdn-dataset
SDN DDoS Dataset https://ieee-dataport.org/documents/sdn-ddos-attack-image-dataset
Mininet Traffic Dataset https://data.mendeley.com/datasets/9hz6f62gtk/1

Underwater Sensor Network
SFI Smart Ocean Acoustic Dataset https://ieee-dataport.org/open-access/sfi-smart-ocean-dataset-underwater-acoustic-communications
Underwater Acoustic Communication Dataset https://ieee-dataport.org/documents/band-full-duplex-underwater-acoustic-communication-measurements-lake-environment
UUV Simulator Data https://catalog.data.gov/dataset/teamer-electrically-engaged-undulation-system-for-unmanned-underwater-vehicles-7ffb5
Sea Trial Acoustic Dataset https://zenodo.org/records/6372728

IoT (Internet of Things)
IoT-23 Dataset https://www.stratosphereips.org/datasets-iot23
TON_IoT Dataset https://research.unsw.edu.au/projects/toniot-datasets
Intel Lab IoT Sensor Dataset https://db.csail.mit.edu/labdata/labdata.html
Smart Home IoT Dataset https://www.kaggle.com/datasets/taranvee/smart-home-dataset-with-weather-information

Quantum Networking
Quantum Internet Dataset https://zenodo.org/records/17504715
IBM Quantum Network Data https://quantum-computing.ibm.com/services/resources
Quantum Entanglement Network Dataset https://zenodo.org/records/8279583
QKD Experimental Dataset https://archive.researchdata.leeds.ac.uk/1285/

6G Networks
6G Channel Measurement Dataset https://github.com/ocatak/6g-channel-estimation-dataset
Terahertz Communication Dataset https://ieee-dataport.org/documents/measurement-based-parameterization-physics-reflection-models-terahertz-communication-s21
Hexa-X 6G Dataset https://zenodo.org/records/17396743
AI-enabled 6G Network Dataset https://www.kaggle.com/datasets/ziya07/dynamic-network-slicing-dataset-in-6g-networks

Network Routing
Rocketfuel ISP Topology Dataset https://www.cs.washington.edu/research/networking/rocketfuel/
CAIDA Internet Topology Data https://www.caida.org/catalog/datasets/
INET Routing Dataset https://www.kaggle.com/datasets/asfandyar250/network
Open-Source Routing Traces https://github.com/BNN-UPC/NetworkModelingDatasets

Intrusion Detection System
CIC-IDS2017 https://www.unb.ca/cic/datasets/ids-2017.html
UNSW-NB15 https://research.unsw.edu.au/projects/unsw-nb15-dataset
NSL-KDD https://www.unb.ca/cic/datasets/nsl.html
TII-SSRC-23 https://ieee-dataport.org/documents/tii-ssrc-23-dataset-edited
DARPA IDS Dataset https://www.ll.mit.edu/r-d/datasets/1998-darpa-intrusion-detection-evaluation-dataset
MIMO (Multiple Input Multiple Output) DeepMIMO Dataset https://github.com/DeepMIMO/DeepMIMO
NYU Wireless mmWave MIMO Dataset https://github.com/nyu-wireless/mmwRobotNav
Massive MIMO Channel Measurements https://ieee-dataport.org/open-access/beamspace-channel-dataset-mmwave-massive-mimo
COST 2100 MIMO Dataset https://www.kaggle.com/datasets/forment/cost2100
Cognitive Radio Networks CRAWDAD Spectrum Occupancy Measurements https://crawdad.org/
Spectrum Measurement Dataset https://www.kaggle.com/datasets/ajithdari/cass-spectrum-dataset
Electrosense Radio Spectrum Dataset https://zenodo.org/records/7521246
IEEE 802.22 WRAN Simulation Data https://www.ieee802.org/22/
Digital Forensics Digital Corpora Forensic Images https://digitalcorpora.org/
DFRWS Forensic Challenge Datasets https://www.dfrws.org/forensic-challenges/
NIST CFReDS Dataset https://cfreds.nist.gov/
UC Irvine Memory Forensics Dataset https://daniyyell.com/datasets/Memory-Forensics-Attack-Simulation-Dataset/
Wireless Body Area Network (WBAN) MHEALTH Dataset https://archive.ics.uci.edu/ml/datasets/mhealth+dataset
WISDM Wearable Sensor Dataset https://www.cis.fordham.edu/wisdm/dataset.php
BSN Challenge Dataset https://physionet.org/content/bhi-2018-challenge/1.0/
LTE (Long Term Evolution) OpenAirInterface LTE Dataset https://data.europa.eu/data/datasets/oai-zenodo-org-10811147?locale=de
LTE Drive Test Dataset https://ieee-dataport.org/open-access/technical-university-denmark-lte-drive-test-measurements
Vienna LTE-A Link Level Simulator Data https://arxiv.org/html/2603.02638v1
Ad Hoc Networks MIT Reality Mining Dataset http://realitycommons.media.mit.edu/realitymining.html
FAN-GHETS24 Ad Hoc Dataset https://zenodo.org/records/13315419
Helsinki Mobility Traces https://www.tracebase.org/tracebase/
Forensic Science Digital Corpora Forensic Images https://digitalcorpora.org/
NIST CFReDS https://cfreds.nist.gov/
DFRWS Forensic Challenge Datasets https://www.dfrws.org/forensic-challenges/
GovDocs1 Forensic Corpus https://digitalcorpora.org/corpora/govdocs
Psychology Open Psychometrics Data https://openpsychometrics.org/_rawdata/
Human Connectome Project https://www.humanconnectome.org/
Child Mind Institute Dataset https://www.kaggle.com/competitions/child-mind-institute-problematic-internet-use/data
MIDUS Psychological Study https://midus.wisc.edu/data-access/
APA Open Data Repository https://www.apa.org/pubs/databases
Public Administration World Bank Governance Indicators https://www.imf.org/en/publications/sprolls/world-economic-outlook-databases
OECD Public Governance Data https://oecd-public-integrity-indicators.org/indicators/
UN Public Administration Dataset https://publicadministration.un.org/
US Government Open Data https://www.data.gov/
European Open Government Data https://data.europa.eu/
Economics Penn World Table https://www.rug.nl/ggdc/productivity/pwt/
IMF World Economic Outlook Data https://www.data.imf.org/en
World Bank Development Indicators https://databank.worldbank.org/source/world-development-indicators
OECD Economic Outlook https://data-explorer.oecd.org/
FRED Economic Data https://fred.stlouisfed.org/
International Relations Correlates of War Dataset https://correlatesofwar.org/
UCDP Conflict Dataset https://ucdp.uu.se/
GDELT Global Events Database https://www.gdeltproject.org/
World Trade Organization Statistics https://stats.wto.org/
SIPRI Military Expenditure Database https://www.sipri.org/databases/milex
Education National Center for Education Statistics https://catalog.data.gov/dataset?publisher=NationalCenterforEducationStatistics%28NCES%29
OECD PISA Dataset https://www.oecd.org/pisa/data/
World Bank Education Statistics https://databank.worldbank.org/source/education-statistics
Open University Learning Analytics Dataset https://analyse.kmi.open.ac.uk/open-dataset
UCI Student Performance Dataset https://archive.ics.uci.edu/ml/datasets/student+performance
Commerce UN Comtrade International Trade Data https://comtrade.un.org/
World Bank Enterprise Surveys https://data360.worldbank.org/en/dataset/WB_ES
Retail Scanner Data (US Census) https://www.kaggle.com/datasets/census/retail-and-retailers-sales-time-series-collection
Eurostat Business Statistics https://ec.europa.eu/eurostat
Global Financial Data https://github.com/JerBouma/FinanceDatabase
Business Administration Harvard Dataverse Business Datasets https://dataverse.harvard.edu/
Crunchbase Open Data Map https://data.crunchbase.com/
Compustat Financial Dataset https://www.marketplace.spglobal.com/
IBM HR Analytics Dataset https://www.kaggle.com/datasets/pavansubhasht/ibm-hr-analytics-attrition-dataset
Wharton Research Data Services https://wrds.wharton.upenn.edu/
Physics CERN Open Data Portal https://opendata.cern.ch/
NASA Physical Sciences Data https://pds.nasa.gov/
LIGO Open Science Center https://losc.ligo.org/
Materials Project Dataset https://materialsproject.org/
NIST Physical Measurement Data https://www.nist.gov/data
Chemistry PubChem Database https://pubchem.ncbi.nlm.nih.gov/
ChemSpider https://www.chemspider.com/
NIST Chemistry WebBook https://webbook.nist.gov/chemistry/
Harvard Clean Energy Project Dataset https://cepdb.molecularspace.org/
QM9 Molecular Dataset https://deepchemdata.s3-us-west-1.amazonaws.com/datasets/qm9.csv
Mathematics OEIS Integer Sequences https://oeis.org/
L-Functions and Modular Forms Database https://www.lmfdb.org/
UCI Mathematical Datasets https://archive.ics.uci.edu/ml/index.php
Numerical Dataset Archive https://www.kaggle.com/datasets/subhashinimariappan/numerical-dataset
Kaggle Mathematical Modeling Data https://www.kaggle.com/datasets/xinyilea/mathematical-modeling-data/data
Computational Science NERSC Scientific Data Repository https://www.nersc.gov
Argonne Leadership Computing Facility Data https://ieee-dataport.org/documents/argonne-leadership-computing-facility-data-catalog
NASA High-End Computing Data https://www.nas.nasa.gov/hecc/
LANL Simulation Datasets https://www.kaggle.com/c/LANL-Earthquake-Prediction
Statistics StatLib Data Archive http://lib.stat.cmu.edu/datasets/
UCI Machine Learning Repository https://archive.ics.uci.edu/ml/
World Bank Statistical Data https://databank.worldbank.org/
OECD Statistics https://stats.oecd.org/
US Census Bureau Statistics https://www.kaggle.com/datasets/census/census-bureau-usa
Biology NCBI BioProject https://www.ncbi.nlm.nih.gov/bioproject/
Ensembl Genome Database https://www.useast.ensembl.org/
Human Protein Atlas https://www.proteinatlas.org/
PDB Biological Structures https://www.rcsb.org/
BioStudies Database https://www.ebi.ac.uk/biostudies/
Botany TRY Plant Trait Database https://www.try-db.org/
GBIF Plant Occurrence Data https://www.gbif.org/
USDA PLANTS Database https://plants.sc.egov.usda.gov/
Global Biodiversity Information Facility (Plants) https://www.gbif.org/dataset
Plant Phenotyping Dataset https://www.plant-phenotyping.org/datasets
Zoology GBIF Animal Occurrence Data https://www.gbif.org/
PanTHERIA Mammal Traits Dataset https://esapubs.org/archive/ecol/E090/184/
Animal Diversity Web Data https://animaldiversity.org/
Movebank Animal Tracking Data https://www.kaggle.com/datasets/pulkit8595/movebank-animal-tracking
VertNet Vertebrate Dataset https://vertnet.org/
Microbiology NCBI Genome Database https://www.ncbi.nlm.nih.gov/genome/
PATRIC Bacterial Bioinformatics Resource https://www.patricbrc.org/
IMG/M Microbial Genome Database https://img.jgi.doe.gov/
Human Microbiome Project https://www.hmpdacc.org/
MicrobiomeDB https://microbiomedb.org/
Genetics NCBI Gene Database https://www.ncbi.nlm.nih.gov/gene/
1000 Genomes Project https://www.internationalgenome.org/
GWAS Catalog https://www.ebi.ac.uk/gwas/
ClinVar Genetic Variants https://www.ncbi.nlm.nih.gov/clinvar/
OMIM Genetic Disorders Database https://www.omim.org/
Genomics ENCODE Project Dataset https://www.encodeproject.org/
GenBank https://www.ncbi.nlm.nih.gov/genbank/
TCGA Genomic Data https://portal.gdc.cancer.gov/
UCSC Genome Browser Data https://genome.ucsc.edu/
ArrayExpress Genomics Data https://www.ebi.ac.uk/arrayexpress/
Molecular Biology Protein Data Bank https://www.rcsb.org/docs/general-help/organization-of-3d-structures-in-the-protein-data-bank
UniProt Protein Database https://www.uniprot.org/
BioGRID Interaction Dataset https://downloads.thebiogrid.org/BioGRID
STRING Protein Interaction Data https://string-db.org/
Gene Expression Omnibus https://www.ncbi.nlm.nih.gov/sites/GDSbrowser/
Immunology ImmPort Immunology Data https://www.immport.org/
IEDB Immune Epitope Database https://www.iedb.org/
Human Cell Atlas Immune Data https://www.data.humancellatlas.org/
Vaccine Adverse Event Reporting System https://vaers.hhs.gov/data.html
FlowRepository Cytometry Data https://flowrepository.org/
Neurobiology Allen Brain Atlas https://brain-map.org/
OpenNeuro https://openneuro.org/
Human Connectome Project https://www.humanconnectome.org/
Neurodata Without Borders https://www.nwb.org/
CRCNS Neural Data Repository https://crcns.org/data-sets
Bioinformatics NCBI Sequence Read Archive https://www.ncbi.nlm.nih.gov/sra/docs/sradownload/
KEGG Pathway Database https://www.genome.jp/kegg/
Reactome Pathway Dataset https://reactome.org/download-data
BioMart Data Portal https://asia.ensembl.org/info/data/biomart/index.html
Zenodo Bioinformatics Datasets https://www.ncbi.nlm.nih.gov/geo/
Marine Biology NOAA Oceanographic Data https://www.nodc.noaa.gov/
Coral Reef Monitoring Dataset https://www.kaggle.com/datasets/jxwleong/coral-reef-dataset
World Ocean Atlas https://www.ncei.noaa.gov/products/world-ocean-atlas
Marine Microbial Eukaryote Transcriptome Project https://gold.jgi.doe.gov/sraexperiment?id=SRX554091
Wildlife Biology Movebank Wildlife Tracking Data https://www.movebank.org/
Global Biodiversity Information Facility https://www.gbif.org/
IUCN Red List Data https://www.iucnredlist.org/resources/spatial-data-download
Wildlife Insights Camera Trap Data https://www.wildlifeinsights.org/
Living Planet Database https://livingplanetindex.org/data_portal
Human Biology UK Biobank https://www.ukbiobank.ac.uk/
Human Protein Atlas https://www.proteinatlas.org/
NHANES Health Dataset https://www.kaggle.com/datasets/cdc/national-health-and-nutrition-examination-survey
Human Cell Atlas https://www.humancellatlas.org/
GTEx Gene Expression Dataset https://gtexportal.org/home/
Robotics and Automation YCB Object and Model Set https://www.ycbbenchmarks.com/
RoboNet Dataset https://github.com/SudeepDasari/RoboNet
DROID Robot Manipulation Dataset https://droid-dataset.github.io/
Oxford RobotCar Dataset https://robotcar-dataset.robots.ox.ac.uk/