{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Analyze dataset\n", "In this notebook, we analyze the dataset structure and statistics.\n", "\n", "## Verify dataset and create statistics\n", "We verify that every folder (= session) contains exactly three subfolders:\n", "* Lapse (time-lapse images taken every hour)\n", "* Motion (images triggered by motion)\n", "* Full (pre-selected motion images with actual moving entities, should be a subset of Motion)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from py.Dataset import Dataset, DatasetStatistics\n", "from py.FileUtils import list_jpegs_recursive\n", "\n", "from tqdm import tqdm\n", "import os" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Found 32 sessions\n" ] } ], "source": [ "DIR = '/home/AMMOD_data/camera_traps/BayerWald/Vielkadaver-Projekt/'\n", "\n", "ds = Dataset(DIR)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['Beaver_01', 'Marten_01', 'Raccoon_01', 'Reddeer_01', 'Roedeer_01', 'Wildboar_01', 'Badger_02', 'Beaver_02', 'Fox_02', 'Marten_02', 'Raccoon_02', 'Rat_02', 'Reddeer_02', 'Roedeer_02', 'Badger_03', 'Fox_03', 'Raccoon_03', 'Reddeer_03', 'Wildboar_03', 'Badger_04', 'Rat_04', 'Reddeer_04', 'Wildboar_04', 'Badger_05', 'Beaver_05', 'Ermine_05', 'Fox_05', 'Marten_05', 'Raccoon_05', 'Reddeer_05', 'Roedeer_05', 'Wildboar_05']\n" ] } ], "source": [ "print(ds.get_sessions())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have 32 sessions with unique names. Each name consists of the animal type and a session number. We will now count the images in all subfolders to create more advanced statistics."
] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "100%|██████████| 32/32 [00:34<00:00, 1.09s/it]\n" ] } ], "source": [ "# Only run this when statistics have not been created yet.\n", "# Otherwise, use the code in the cell below to load the stats from file.\n", "stats = ds.create_statistics()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Creating the statistics takes a while (about 34 seconds for the 32 sessions here), so it makes sense to save them for later use. Saved statistics can be restored using the `load()` method or via the constructor." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Loaded from dataset_stats.npy.\n" ] } ], "source": [ "stats = DatasetStatistics(load_from_file=\"dataset_stats.npy\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's look at the stats." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table border=\"1\" class=\"dataframe\">\n", "<thead>\n", "<tr><th></th><th>Lapse</th><th>Motion</th><th>Full</th><th>Total</th></tr>\n", "</thead>\n", "<tbody>\n",
"<tr><th>Badger_02</th><td>1728</td><td>4715</td><td>202</td><td>6645</td></tr>\n",
"<tr><th>Badger_03</th><td>46</td><td>4245</td><td>67</td><td>4358</td></tr>\n",
"<tr><th>Badger_04</th><td>56</td><td>480</td><td>192</td><td>728</td></tr>\n",
"<tr><th>Badger_05</th><td>1174</td><td>3860</td><td>108</td><td>5142</td></tr>\n",
"<tr><th>Beaver_01</th><td>1734</td><td>695</td><td>200</td><td>2629</td></tr>\n",
"<tr><th>Beaver_02</th><td>1727</td><td>2890</td><td>270</td><td>4887</td></tr>\n",
"<tr><th>Beaver_05</th><td>1321</td><td>2415</td><td>32</td><td>3768</td></tr>\n",
"<tr><th>Ermine_05</th><td>867</td><td>2380</td><td>135</td><td>3382</td></tr>\n",
"<tr><th>Fox_02</th><td>957</td><td>1110</td><td>200</td><td>2267</td></tr>\n",
"<tr><th>Fox_03</th><td>38</td><td>5495</td><td>206</td><td>5739</td></tr>\n",
"<tr><th>Fox_05</th><td>1083</td><td>753</td><td>65</td><td>1901</td></tr>\n",
"<tr><th>Marten_01</th><td>2462</td><td>3105</td><td>200</td><td>5767</td></tr>\n",
"<tr><th>Marten_02</th><td>1726</td><td>883</td><td>200</td><td>2809</td></tr>\n",
"<tr><th>Marten_05</th><td>890</td><td>16170</td><td>70</td><td>17130</td></tr>\n",
"<tr><th>Raccoon_01</th><td>850</td><td>375</td><td>120</td><td>1345</td></tr>\n",
"<tr><th>Raccoon_02</th><td>1639</td><td>332</td><td>162</td><td>2133</td></tr>\n",
"<tr><th>Raccoon_03</th><td>35</td><td>475</td><td>100</td><td>610</td></tr>\n",
"<tr><th>Raccoon_05</th><td>1089</td><td>1600</td><td>116</td><td>2805</td></tr>\n",
"<tr><th>Rat_02</th><td>626</td><td>845</td><td>118</td><td>1589</td></tr>\n",
"<tr><th>Rat_04</th><td>39</td><td>640</td><td>96</td><td>775</td></tr>\n",
"<tr><th>Reddeer_01</th><td>1628</td><td>9380</td><td>200</td><td>11208</td></tr>\n",
"<tr><th>Reddeer_02</th><td>1635</td><td>675</td><td>200</td><td>2510</td></tr>\n",
"<tr><th>Reddeer_03</th><td>1027</td><td>1770</td><td>141</td><td>2938</td></tr>\n",
"<tr><th>Reddeer_04</th><td>46</td><td>1690</td><td>89</td><td>1825</td></tr>\n",
"<tr><th>Reddeer_05</th><td>1078</td><td>18251</td><td>166</td><td>19495</td></tr>\n",
"<tr><th>Roedeer_01</th><td>1380</td><td>38820</td><td>18</td><td>40218</td></tr>\n",
"<tr><th>Roedeer_02</th><td>888</td><td>2770</td><td>206</td><td>3864</td></tr>\n",
"<tr><th>Roedeer_05</th><td>1090</td><td>13176</td><td>79</td><td>14345</td></tr>\n",
"<tr><th>Wildboar_01</th><td>1732</td><td>2895</td><td>137</td><td>4764</td></tr>\n",
"<tr><th>Wildboar_03</th><td>46</td><td>515</td><td>102</td><td>663</td></tr>\n",
"<tr><th>Wildboar_04</th><td>39</td><td>2250</td><td>107</td><td>2396</td></tr>\n",
"<tr><th>Wildboar_05</th><td>903</td><td>22263</td><td>86</td><td>23252</td></tr>\n",
"<tr><th>Z_Total</th><td>31579</td><td>167918</td><td>4390</td><td>203887</td></tr>\n",
"</tbody>\n", "</table>
\n", "
" ], "text/plain": [ " Lapse Motion Full Total\n", "Badger_02 1728 4715 202 6645\n", "Badger_03 46 4245 67 4358\n", "Badger_04 56 480 192 728\n", "Badger_05 1174 3860 108 5142\n", "Beaver_01 1734 695 200 2629\n", "Beaver_02 1727 2890 270 4887\n", "Beaver_05 1321 2415 32 3768\n", "Ermine_05 867 2380 135 3382\n", "Fox_02 957 1110 200 2267\n", "Fox_03 38 5495 206 5739\n", "Fox_05 1083 753 65 1901\n", "Marten_01 2462 3105 200 5767\n", "Marten_02 1726 883 200 2809\n", "Marten_05 890 16170 70 17130\n", "Raccoon_01 850 375 120 1345\n", "Raccoon_02 1639 332 162 2133\n", "Raccoon_03 35 475 100 610\n", "Raccoon_05 1089 1600 116 2805\n", "Rat_02 626 845 118 1589\n", "Rat_04 39 640 96 775\n", "Reddeer_01 1628 9380 200 11208\n", "Reddeer_02 1635 675 200 2510\n", "Reddeer_03 1027 1770 141 2938\n", "Reddeer_04 46 1690 89 1825\n", "Reddeer_05 1078 18251 166 19495\n", "Roedeer_01 1380 38820 18 40218\n", "Roedeer_02 888 2770 206 3864\n", "Roedeer_05 1090 13176 79 14345\n", "Wildboar_01 1732 2895 137 4764\n", "Wildboar_03 46 515 102 663\n", "Wildboar_04 39 2250 107 2396\n", "Wildboar_05 903 22263 86 23252\n", "Z_Total 31579 167918 4390 203887" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stats.view()" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Total 1.000000\n", "Full 0.021532\n", "Lapse 0.154885\n", "Motion 0.823584\n", "Name: Z_Total, dtype: float64" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stats.df.iloc[-1] / 203887" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have a total of 203,887 images. Motion images make up roughly 82% of them, Lapse images about 15%, and Full images about 2%, so there are more than five times as many Motion images as Lapse images. We can take a closer look at the individual sessions by plotting a bar chart:" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "stats.plot_sessions(exclude_last_row=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Verify that 'Full' folder is a subset of 'Motion'\n", "We expect the 'Full' folder to be a subset of 'Motion'. The following code checks that by iterating over all files in 'Full' for every session and looking for them in 'Motion' of the same session." ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "1cac1c192a0149ab8c5cc0a2d52247a6", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0/32 [00:00 fullFilesFound:\n", " ok = False\n", "if not ok:\n", " print(\"There were files missing!\")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Check sessions for duplicates\n", "\n", "Using the check_lapse_duplicates() function, the Lapse folder is checked for duplicate dates and duplicate files. We call a duplicate inconsistent or deviant, if the two images show different scenes but have the same associated date." 
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Found 3 sessions\n", "Session 'Marten_01' at folder: ./ResizedSessions_NoBackup/VIELAAS_Spring_Session01-VIELAAS_Marten_01\n", "Loaded scans.\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "100%|██████████| 1733/1733 [00:00<00:00, 10210.85it/s]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "* 1733 lapse dates\n", "* 729 duplicates\n", "* 0 multiples (more than two files per date)\n", "* 108 deviant duplicates: [datetime.datetime(2021, 6, 17, 23, 0), datetime.datetime(2021, 6, 18, 0, 0), datetime.datetime(2021, 6, 18, 1, 0), datetime.datetime(2021, 6, 18, 2, 0), datetime.datetime(2021, 6, 18, 3, 0), datetime.datetime(2021, 6, 18, 4, 0), datetime.datetime(2021, 6, 18, 5, 0), datetime.datetime(2021, 6, 18, 6, 0), datetime.datetime(2021, 6, 18, 7, 0), datetime.datetime(2021, 6, 18, 8, 0), datetime.datetime(2021, 6, 18, 9, 0), datetime.datetime(2021, 6, 18, 10, 0), datetime.datetime(2021, 6, 18, 11, 0), datetime.datetime(2021, 6, 18, 12, 0), datetime.datetime(2021, 6, 18, 13, 0), datetime.datetime(2021, 6, 18, 14, 0), datetime.datetime(2021, 6, 18, 15, 0), datetime.datetime(2021, 6, 18, 16, 0), datetime.datetime(2021, 6, 18, 17, 0), datetime.datetime(2021, 6, 18, 18, 0), datetime.datetime(2021, 6, 18, 19, 0), datetime.datetime(2021, 6, 18, 20, 0), datetime.datetime(2021, 6, 18, 21, 0), datetime.datetime(2021, 6, 18, 22, 0), datetime.datetime(2021, 6, 18, 23, 0), datetime.datetime(2021, 6, 19, 0, 0), datetime.datetime(2021, 6, 19, 1, 0), datetime.datetime(2021, 6, 19, 2, 0), datetime.datetime(2021, 6, 19, 3, 0), datetime.datetime(2021, 6, 19, 4, 0), datetime.datetime(2021, 6, 19, 5, 0), datetime.datetime(2021, 6, 19, 6, 0), datetime.datetime(2021, 6, 19, 7, 0), datetime.datetime(2021, 6, 19, 8, 0), datetime.datetime(2021, 6, 19, 9, 0), datetime.datetime(2021, 6, 19, 10, 0), datetime.datetime(2021, 6, 19, 11, 
0), datetime.datetime(2021, 6, 19, 12, 0), datetime.datetime(2021, 6, 19, 13, 0), datetime.datetime(2021, 6, 19, 14, 0), datetime.datetime(2021, 6, 19, 15, 0), datetime.datetime(2021, 6, 19, 16, 0), datetime.datetime(2021, 6, 19, 17, 0), datetime.datetime(2021, 6, 19, 18, 0), datetime.datetime(2021, 6, 19, 19, 0), datetime.datetime(2021, 6, 19, 20, 0), datetime.datetime(2021, 6, 19, 21, 0), datetime.datetime(2021, 6, 19, 22, 0), datetime.datetime(2021, 6, 19, 23, 0), datetime.datetime(2021, 6, 20, 0, 0), datetime.datetime(2021, 6, 20, 1, 0), datetime.datetime(2021, 6, 20, 2, 0), datetime.datetime(2021, 6, 20, 3, 0), datetime.datetime(2021, 6, 20, 4, 0), datetime.datetime(2021, 6, 20, 5, 0), datetime.datetime(2021, 6, 20, 6, 0), datetime.datetime(2021, 6, 20, 7, 0), datetime.datetime(2021, 6, 20, 8, 0), datetime.datetime(2021, 6, 20, 9, 0), datetime.datetime(2021, 6, 20, 10, 0), datetime.datetime(2021, 6, 20, 11, 0), datetime.datetime(2021, 6, 20, 12, 0), datetime.datetime(2021, 6, 20, 13, 0), datetime.datetime(2021, 6, 20, 14, 0), datetime.datetime(2021, 6, 20, 15, 0), datetime.datetime(2021, 6, 20, 16, 0), datetime.datetime(2021, 6, 20, 17, 0), datetime.datetime(2021, 6, 20, 18, 0), datetime.datetime(2021, 6, 20, 19, 0), datetime.datetime(2021, 6, 20, 20, 0), datetime.datetime(2021, 6, 20, 21, 0), datetime.datetime(2021, 6, 20, 22, 0), datetime.datetime(2021, 6, 20, 23, 0), datetime.datetime(2021, 6, 21, 0, 0), datetime.datetime(2021, 6, 21, 1, 0), datetime.datetime(2021, 6, 21, 2, 0), datetime.datetime(2021, 6, 21, 3, 0), datetime.datetime(2021, 6, 21, 4, 0), datetime.datetime(2021, 6, 21, 5, 0), datetime.datetime(2021, 6, 21, 6, 0), datetime.datetime(2021, 6, 21, 7, 0), datetime.datetime(2021, 6, 21, 8, 0), datetime.datetime(2021, 6, 21, 9, 0), datetime.datetime(2021, 6, 21, 10, 0), datetime.datetime(2021, 6, 21, 11, 0), datetime.datetime(2021, 6, 21, 12, 0), datetime.datetime(2021, 6, 21, 13, 0), datetime.datetime(2021, 6, 21, 14, 0), datetime.datetime(2021, 6, 
21, 15, 0), datetime.datetime(2021, 6, 21, 16, 0), datetime.datetime(2021, 6, 21, 17, 0), datetime.datetime(2021, 6, 21, 18, 0), datetime.datetime(2021, 6, 21, 19, 0), datetime.datetime(2021, 6, 21, 20, 0), datetime.datetime(2021, 6, 21, 21, 0), datetime.datetime(2021, 6, 21, 22, 0), datetime.datetime(2021, 6, 21, 23, 0), datetime.datetime(2021, 6, 22, 0, 0), datetime.datetime(2021, 6, 22, 1, 0), datetime.datetime(2021, 6, 22, 2, 0), datetime.datetime(2021, 6, 22, 3, 0), datetime.datetime(2021, 6, 22, 4, 0), datetime.datetime(2021, 6, 22, 5, 0), datetime.datetime(2021, 6, 22, 6, 0), datetime.datetime(2021, 6, 22, 7, 0), datetime.datetime(2021, 6, 22, 8, 0), datetime.datetime(2021, 6, 22, 9, 0), datetime.datetime(2021, 6, 22, 10, 0)]\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "ds = Dataset(\"./ResizedSessions_NoBackup/\")\n", "res = ds.create_session(\"marten_01\").check_lapse_duplicates()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Copyright © 2023 Felix Kleinsteuber and Computer Vision Group, Friedrich Schiller University Jena" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.6.9 64-bit", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" }, "orig_nbformat": 4, "vscode": { "interpreter": { "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6" } } }, "nbformat": 4, "nbformat_minor": 2 }