Description
I downloaded the Core-S2L2A data and found 7,197 repeated grid cells in the metadata. Some rows sharing a grid cell are completely identical, while others differ in fields such as product id and cloud cover. However, the corresponding rows in the parquet files contain exactly the same image with the same product id, regardless of whether the metadata rows differ or are identical. Also, because the product id from the parquet is used to name the image files, some files were overwritten: after downloading all images with the provided script, I got only 2,245,886 - 7,197 = 2,238,689 images.
This suggests that some mismatches probably occurred when the dataset was generated. I am fine with ignoring these images, but I would like to confirm that the rest of the metadata and images match perfectly.
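A minimal sketch of the kind of check I have in mind, assuming the parquet rows can be loaded into a table with the same `parquet_url`, `parquet_row` and `product_id` columns as the metadata. The two metadata rows are 667 and 668 from the listing in this issue; the parquet-side values are hypothetical stand-ins (I am assuming here that both parquet rows carry the first product id), and the shortened `parquet_url` strings are only for illustration.

```python
import pandas as pd

PID_A = "S2A_MSIL2A_20160116T193712_N0201_R113_T57CWJ_20160116T193710"
PID_B = "S2B_MSIL2A_20201023T174439_N0500_R069_T57CWJ_20230307T175341"

# Metadata side: rows 667 and 668 (URL shortened for illustration).
meta = pd.DataFrame({
    "parquet_url": ["part_00002.parquet", "part_00002.parquet"],
    "parquet_row": [167, 168],
    "product_id": [PID_A, PID_B],
})

# Parquet side: HYPOTHETICAL values. Which product id the parquet
# actually holds for these two rows is an assumption of this sketch.
parquet = pd.DataFrame({
    "parquet_url": ["part_00002.parquet", "part_00002.parquet"],
    "parquet_row": [167, 168],
    "product_id": [PID_A, PID_A],
})

# Join on the row locator and compare the product ids from both sides.
merged = meta.merge(
    parquet,
    on=["parquet_url", "parquet_row"],
    suffixes=("_meta", "_parquet"),
    validate="one_to_one",
)
mismatched = merged[merged["product_id_meta"] != merged["product_id_parquet"]]

# Files named by product id collide whenever the parquet repeats one.
n_overwritten = int(parquet["product_id"].duplicated().sum())

print(mismatched[["parquet_row", "product_id_meta", "product_id_parquet"]])
print("files lost to overwriting:", n_overwritten)
```

Run over the full metadata, an empty `mismatched` table outside the repeated grid cells would be the confirmation I am asking for.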
The first 30 rows with repeated grid cells (15 pairs) in the metadata are shown below. Most of these paired rows are not identical, but some are; for example, rows 6174 and 6175 are completely identical. However, for every pair, the corresponding rows in the parquet have the same image content and the same image name (product id).
      grid_cell  grid_row_u  grid_col_r
 667  917D_239R  -917   239
 668  917D_239R  -917   239
1819  907D_75L   -907   -75
1820  907D_75L   -907   -75
1976  906D_58L   -906   -58
1977  906D_58L   -906   -58
2678  902D_38L   -902   -38
2679  902D_38L   -902   -38
2927  901D_224R  -901   224
2928  901D_224R  -901   224
3126  900D_227R  -900   227
3127  900D_227R  -900   227
3178  900D_305R  -900   305
3179  900D_305R  -900   305
3388  899D_315R  -899   315
3389  899D_315R  -899   315
3756  897D_278R  -897   278
3757  897D_278R  -897   278
5056  890D_332L  -890  -332
5057  890D_332L  -890  -332
5437  888D_325L  -888  -325
5438  888D_325L  -888  -325
5490  888D_156L  -888  -156
5491  888D_156L  -888  -156
5640  887D_317L  -887  -317
5641  887D_317L  -887  -317
5918  886D_61L   -886   -61
5919  886D_61L   -886   -61
6174  884D_339L  -884  -339
6175  884D_339L  -884  -339

      product_id
 667  S2A_MSIL2A_20160116T193712_N0201_R113_T57CWJ_20160116T193710
 668  S2B_MSIL2A_20201023T174439_N0500_R069_T57CWJ_20230307T175341
1819  S2B_MSIL2A_20211016T083939_N0301_R035_T23CMK_20211016T115409
1820  S2B_MSIL2A_20221204T081929_N0400_R092_T23CMK_20221204T101225
1976  S2B_MSIL2A_20220124T083939_N0301_R035_T25CDK_20220124T113510
1977  S2B_MSIL2A_20200221T082939_N0500_R135_T25CDK_20230427T224239
2678  S2B_MSIL2A_20221130T070229_N0400_R034_T27CVL_20221130T081600
2679  S2B_MSIL2A_20191117T080929_N0500_R049_T27CVL_20230612T124130
2927  S2A_MSIL2A_20200103T221841_N0500_R086_T52CDR_20230424T192507
2928  S2B_MSIL2A_20191226T220859_N0500_R043_T52CDR_20230601T164232
3126  S2B_MSIL2A_20210201T221839_N0500_R086_T52CDR_20230516T155514
3127  S2B_MSIL2A_20200121T222829_N0500_R129_T52CDR_20230426T100350
3178  S2B_MSIL2A_20200203T191459_N9999_R027_T59CNL_20230904T043145
3179  S2B_MSIL2A_20181217T190459_N9999_R127_T59CNL_20230421T175343
3388  S2B_MSIL2A_20200205T181509_N9999_R055_T60CVR_20230905T125206
3389  S2B_MSIL2A_20201127T183459_N0500_R141_T60CVR_20230321T114502
3756  S2B_MSIL2A_20221104T182459_N0400_R098_T56CMR_20221104T213737
3757  S2B_MSIL2A_20210127T194529_N0500_R013_T56CMR_20230603T074316
5056  S2B_MSIL2A_20181221T184459_N9999_R041_T02CNS_20230421T232504
5057  S2A_MSIL2A_20160209T155812_N0201_R025_T02CNS_20160209T155809
5437  S2B_MSIL2A_20221206T172409_N0509_R126_T03CWM_20221206T190829
5438  S2B_MSIL2A_20230220T174429_N0509_R069_T03CWM_20230220T223723
5490  S2B_MSIL2A_20210215T115259_N0500_R137_T17CNM_20230518T001413
5491  S2B_MSIL2A_20201012T113309_N0500_R051_T17CNM_20230325T024812
5640  S2B_MSIL2A_20211209T164349_N0301_R097_T04CES_20211209T201608
5641  S2B_MSIL2A_20210201T171409_N0500_R083_T04CES_20230530T090343
5918  S2B_MSIL2A_20211012T085959_N0301_R121_T25CEM_20211012T121006
5919  S2B_MSIL2A_20220119T093009_N0301_R107_T25CEM_20220119T123235
6174  S2B_MSIL2A_20201203T171359_N0500_R083_T03CVM_20230303T124834
6175  S2B_MSIL2A_20201203T171359_N0500_R083_T03CVM_20230303T124834

      timestamp            cloud_cover  nodata  centre_lat  centre_lon
 667  2016-01-16 19:37:12   0.000000  1.0  -82.318521   161.746594
 668  2020-10-23 17:44:39   0.431957  0.0  -82.318521   161.746594
1819  2021-10-16 08:39:39  15.424540  0.0  -81.422448   -45.075915
1820  2022-12-04 08:19:29  23.754629  0.0  -81.422448   -45.075915
1976  2022-01-24 08:39:39   0.000000  0.0  -81.333697   -34.436068
1977  2020-02-21 08:29:39   0.000000  0.0  -81.333697   -34.436068
2678  2022-11-30 07:02:29  22.641203  0.0  -80.973720   -21.563397
2679  2019-11-17 08:09:29  15.894633  0.0  -80.973720   -21.563397
2927  2020-01-03 22:18:41  20.046308  0.0  -80.884333   127.884393
2928  2019-12-26 22:08:59   0.000000  0.0  -80.884333   127.884393
3126  2021-02-01 22:18:39   0.000000  0.0  -80.794284   128.172593
3127  2020-01-21 22:28:29  19.668094  0.0  -80.794284   128.172593
3178  2020-02-03 19:14:59   2.530369  0.0  -80.792787   172.106859
3179  2018-12-17 19:04:59   0.000000  0.0  -80.792787   172.106859
3388  2020-02-05 18:15:09  22.548710  0.0  -80.704521   176.096834
3389  2020-11-27 18:34:59   6.223085  0.0  -80.704521   176.096834
3756  2022-11-04 18:24:59  12.573381  0.0  -80.524480   152.603947
3757  2021-01-27 19:45:29  22.404491  0.0  -80.524480   152.603947
5056  2018-12-21 18:44:59   0.000000  0.0  -79.894843  -170.246226
5057  2016-02-09 15:58:12   0.000000  1.0  -79.894843  -170.246226
5437  2022-12-06 17:24:09   9.481915  0.0  -79.714898  -163.848452
5438  2023-02-20 17:44:29   7.681234  0.0  -79.714898  -163.848452
5490  2021-02-15 11:52:59  20.495623  0.0  -79.713908   -78.524767
5491  2020-10-12 11:33:09   0.003244  0.0  -79.713908   -78.524767
5640  2021-12-09 16:43:49   2.377208  0.0  -79.625552  -158.472930
5641  2021-02-01 17:14:09  14.692396  0.0  -79.625552  -158.472930
5918  2021-10-12 08:59:59  19.077803  0.0  -79.533921   -30.054839
5919  2022-01-19 09:30:09  21.544348  0.0  -79.533921   -30.054839
6174  2020-12-03 17:13:59   6.117003  0.0  -79.356587  -165.121839
6175  2020-12-03 17:13:59   6.117003  0.0  -79.356587  -165.121839

      crs
 667  EPSG:32757
 668  EPSG:32757
1819  EPSG:32723
1820  EPSG:32723
1976  EPSG:32725
1977  EPSG:32725
2678  EPSG:32727
2679  EPSG:32727
2927  EPSG:32752
2928  EPSG:32752
3126  EPSG:32752
3127  EPSG:32752
3178  EPSG:32759
3179  EPSG:32759
3388  EPSG:32760
3389  EPSG:32760
3756  EPSG:32756
3757  EPSG:32756
5056  EPSG:32702
5057  EPSG:32702
5437  EPSG:32703
5438  EPSG:32703
5490  EPSG:32717
5491  EPSG:32717
5640  EPSG:32704
5641  EPSG:32704
5918  EPSG:32725
5919  EPSG:32725
6174  EPSG:32703
6175  EPSG:32703

      parquet_url
 667  https://huggingface.co/datasets/Major-TOM/Core-S2L2A/resolve/main/images/part_00002.parquet
 668  https://huggingface.co/datasets/Major-TOM/Core-S2L2A/resolve/main/images/part_00002.parquet
1819  https://huggingface.co/datasets/Major-TOM/Core-S2L2A/resolve/main/images/part_00004.parquet
1820  https://huggingface.co/datasets/Major-TOM/Core-S2L2A/resolve/main/images/part_00004.parquet
1976  https://huggingface.co/datasets/Major-TOM/Core-S2L2A/resolve/main/images/part_00004.parquet
1977  https://huggingface.co/datasets/Major-TOM/Core-S2L2A/resolve/main/images/part_00004.parquet
2678  https://huggingface.co/datasets/Major-TOM/Core-S2L2A/resolve/main/images/part_00006.parquet
2679  https://huggingface.co/datasets/Major-TOM/Core-S2L2A/resolve/main/images/part_00006.parquet
2927  https://huggingface.co/datasets/Major-TOM/Core-S2L2A/resolve/main/images/part_00006.parquet
2928  https://huggingface.co/datasets/Major-TOM/Core-S2L2A/resolve/main/images/part_00006.parquet
3126  https://huggingface.co/datasets/Major-TOM/Core-S2L2A/resolve/main/images/part_00007.parquet
3127  https://huggingface.co/datasets/Major-TOM/Core-S2L2A/resolve/main/images/part_00007.parquet
3178  https://huggingface.co/datasets/Major-TOM/Core-S2L2A/resolve/main/images/part_00007.parquet
3179  https://huggingface.co/datasets/Major-TOM/Core-S2L2A/resolve/main/images/part_00007.parquet
3388  https://huggingface.co/datasets/Major-TOM/Core-S2L2A/resolve/main/images/part_00007.parquet
3389  https://huggingface.co/datasets/Major-TOM/Core-S2L2A/resolve/main/images/part_00007.parquet
3756  https://huggingface.co/datasets/Major-TOM/Core-S2L2A/resolve/main/images/part_00008.parquet
3757  https://huggingface.co/datasets/Major-TOM/Core-S2L2A/resolve/main/images/part_00008.parquet
5056  https://huggingface.co/datasets/Major-TOM/Core-S2L2A/resolve/main/images/part_00011.parquet
5057  https://huggingface.co/datasets/Major-TOM/Core-S2L2A/resolve/main/images/part_00011.parquet
5437  https://huggingface.co/datasets/Major-TOM/Core-S2L2A/resolve/main/images/part_00011.parquet
5438  https://huggingface.co/datasets/Major-TOM/Core-S2L2A/resolve/main/images/part_00011.parquet
5490  https://huggingface.co/datasets/Major-TOM/Core-S2L2A/resolve/main/images/part_00011.parquet
5491  https://huggingface.co/datasets/Major-TOM/Core-S2L2A/resolve/main/images/part_00011.parquet
5640  https://huggingface.co/datasets/Major-TOM/Core-S2L2A/resolve/main/images/part_00012.parquet
5641  https://huggingface.co/datasets/Major-TOM/Core-S2L2A/resolve/main/images/part_00012.parquet
5918  https://huggingface.co/datasets/Major-TOM/Core-S2L2A/resolve/main/images/part_00012.parquet
5919  https://huggingface.co/datasets/Major-TOM/Core-S2L2A/resolve/main/images/part_00012.parquet
6174  https://huggingface.co/datasets/Major-TOM/Core-S2L2A/resolve/main/images/part_00013.parquet
6175  https://huggingface.co/datasets/Major-TOM/Core-S2L2A/resolve/main/images/part_00013.parquet

      parquet_row  geometry
 667  167  POINT (161.747 -82.319)
 668  168  POINT (161.747 -82.319)
1819  319  POINT (-45.076 -81.422)
1820  320  POINT (-45.076 -81.422)
1976  476  POINT (-34.436 -81.334)
1977  477  POINT (-34.436 -81.334)
2678  178  POINT (-21.563 -80.974)
2679  179  POINT (-21.563 -80.974)
2927  427  POINT (127.884 -80.884)
2928  428  POINT (127.884 -80.884)
3126  126  POINT (128.173 -80.794)
3127  127  POINT (128.173 -80.794)
3178  178  POINT (172.107 -80.793)
3179  179  POINT (172.107 -80.793)
3388  388  POINT (176.097 -80.705)
3389  389  POINT (176.097 -80.705)
3756  256  POINT (152.604 -80.524)
3757  257  POINT (152.604 -80.524)
5056   56  POINT (-170.246 -79.895)
5057   57  POINT (-170.246 -79.895)
5437  437  POINT (-163.848 -79.715)
5438  438  POINT (-163.848 -79.715)
5490  490  POINT (-78.525 -79.714)
5491  491  POINT (-78.525 -79.714)
5640  140  POINT (-158.473 -79.626)
5641  141  POINT (-158.473 -79.626)
5918  418  POINT (-30.055 -79.534)
5919  419  POINT (-30.055 -79.534)
6174  174  POINT (-165.122 -79.357)
6175  175  POINT (-165.122 -79.357)
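For reference, the duplicate check behind the listing above can be sketched roughly as follows with pandas. This is a minimal illustration rather than the exact script I ran: the small in-memory table stands in for the full metadata, its first four rows are copied from the listing, and the last row (index 9999, grid cell `100U_1R`) is a made-up non-duplicated entry added only so the filter has something to exclude.

```python
import pandas as pd

# Stand-in for the metadata table, reduced to three columns.
meta = pd.DataFrame(
    {
        "grid_cell": ["917D_239R", "917D_239R", "884D_339L", "884D_339L", "100U_1R"],
        "product_id": [
            "S2A_MSIL2A_20160116T193712_N0201_R113_T57CWJ_20160116T193710",
            "S2B_MSIL2A_20201023T174439_N0500_R069_T57CWJ_20230307T175341",
            "S2B_MSIL2A_20201203T171359_N0500_R083_T03CVM_20230303T124834",
            "S2B_MSIL2A_20201203T171359_N0500_R083_T03CVM_20230303T124834",
            "HYPOTHETICAL_PRODUCT_ID",  # made-up row, not from the dataset
        ],
        "cloud_cover": [0.0, 0.431957, 6.117003, 6.117003, 1.0],
    },
    index=[667, 668, 6174, 6175, 9999],
)

# Every row whose grid cell appears more than once.
dup_rows = meta[meta.duplicated(subset="grid_cell", keep=False)]

# Rows that are exact copies of another row across all columns.
exact = meta[meta.duplicated(keep=False)]

# Rows that share a grid cell but differ in some other field.
differing = dup_rows.drop(exact.index)

# Number of surplus rows, counting each repeated grid cell once.
n_extra = int(meta.duplicated(subset="grid_cell").sum())

print(dup_rows)
print("identical pairs:", exact.index.tolist())
print("differing pairs:", differing.index.tolist())
print("surplus rows:", n_extra)
```

On the full metadata, `n_extra` is the quantity that I would expect to correspond to the 7,197 figure reported above.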