Merge master-2.x commits 2023/06/01~2023/06/30 into main #17717
Merged
alluxio-bot merged 18 commits into Alluxio:main on Jul 3, 2023
Fix LocalPageStore NPE.

I encountered the following exception while using the local cache:

```
2023-05-31T17:25:13.453+0800 ERROR 20230531_092513_00010_uqx2a.1.0.0-7-153 alluxio.client.file.cache.NoExceptionCacheManager Failed to put page PageId{FileId=76f9c79d5d43c725de31295c263291e0, PageIndex=534}, cacheContext CacheContext{cacheIdentifier=null, cacheQuota=alluxio.client.quota.CacheQuota$1@1f, cacheScope=CacheScope{id=.}, hiveCacheContext=null, isTemporary=false}
java.lang.NullPointerException: Cannot invoke "String.contains(java.lang.CharSequence)" because the return value of "java.lang.Exception.getMessage()" is null
    at alluxio.client.file.cache.store.LocalPageStore.put(LocalPageStore.java:80)
    at alluxio.client.file.cache.LocalCacheManager.putAttempt(LocalCacheManager.java:345)
    at alluxio.client.file.cache.LocalCacheManager.putInternal(LocalCacheManager.java:274)
    at alluxio.client.file.cache.LocalCacheManager.put(LocalCacheManager.java:234)
    at alluxio.client.file.cache.CacheManagerWithShadowCache.put(CacheManagerWithShadowCache.java:52)
    at alluxio.client.file.cache.NoExceptionCacheManager.put(NoExceptionCacheManager.java:55)
    at alluxio.client.file.cache.CacheManager.put(CacheManager.java:196)
    at alluxio.client.file.cache.LocalCacheFileInStream.localCachedRead(LocalCacheFileInStream.java:218)
    at alluxio.client.file.cache.LocalCacheFileInStream.bufferedRead(LocalCacheFileInStream.java:144)
    at alluxio.client.file.cache.LocalCacheFileInStream.readInternal(LocalCacheFileInStream.java:242)
    at alluxio.client.file.cache.LocalCacheFileInStream.positionedRead(LocalCacheFileInStream.java:287)
    at alluxio.hadoop.HdfsFileInputStream.read(HdfsFileInputStream.java:153)
    at alluxio.hadoop.HdfsFileInputStream.readFully(HdfsFileInputStream.java:170)
    at org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:111)
    at io.trino.filesystem.hdfs.HdfsInput.readFully(HdfsInput.java:42)
    at io.trino.plugin.hive.parquet.TrinoParquetDataSource.readInternal(TrinoParquetDataSource.java:64)
    at io.trino.parquet.AbstractParquetDataSource.readFully(AbstractParquetDataSource.java:120)
    at io.trino.parquet.AbstractParquetDataSource$ReferenceCountedReader.read(AbstractParquetDataSource.java:330)
    at io.trino.parquet.ChunkReader.readUnchecked(ChunkReader.java:31)
    at io.trino.parquet.reader.ChunkedInputStream.readNextChunk(ChunkedInputStream.java:149)
    at io.trino.parquet.reader.ChunkedInputStream.read(ChunkedInputStream.java:93)
```

User-facing changes: No.

pr-link: Alluxio#17552
change-id: cid-f35c64b837748c1d46ba7092dae5ad6ef5003bb7

### What changes are proposed in this pull request?

[DOCFIX] Update cn version of docs/_data/table/cn/master-metrics.yml

### Why are the changes needed?

The Chinese docs/_data/table/cn/master-metrics.yml doc is problematic: the description of Master.CreateDirectoryOps is wrong. This PR synchronizes these updates.

### Does this PR introduce any user facing changes?

Developers can get to know Alluxio in Chinese easily.

pr-link: Alluxio#17011
change-id: cid-15f7792d87a6a9c2d537ce5ff82f697dc9b38117
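Returning to the LocalPageStore NPE reported above: the trace shows `String.contains` being invoked on the result of `Exception.getMessage()`, which is allowed to return null. A minimal sketch of a null-safe check follows. The class name, method name, and marker string are hypothetical and chosen for illustration only; they are not taken from the Alluxio code or from the actual fix.

```java
final class NullSafeMessageCheck {
  // Hypothetical marker text; the string the real code looks for may differ.
  static final String NO_SPACE_MARKER = "No space left on device";

  /** Returns true only when the exception carries a message containing the marker. */
  static boolean isNoSpaceError(Exception e) {
    String message = e.getMessage();
    // getMessage() may return null, so guard before calling contains().
    return message != null && message.contains(NO_SPACE_MARKER);
  }
}
```

With this shape, an exception created without a message is simply treated as "not a no-space error" instead of raising a second exception inside the error handler.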
[DOCFIX] Update cn version of OSS doc

The Chinese ufs/OSS doc has not been updated with the latest changes. This PR synchronizes these updates.

Developers can get to know Alluxio in Chinese easily.

pr-link: Alluxio#16932
change-id: cid-fcfee42cb97534141c02c74b8011f31f015dce17
[DOCFIX] Update cn version of Azure-Data-Lake-Gen2 doc

The Chinese ufs/Azure-Data-Lake-Gen2 doc has not been updated with the latest changes. This PR synchronizes these updates.

Developers can get to know Alluxio in Chinese easily.

pr-link: Alluxio#16929
change-id: cid-f0c0ac2323fda070c69a1a1b4ea0d7be76c4f0f3
[DOCFIX] Update cn version of Azure-Data-Lake doc

Chinese readers can more easily understand the Azure Data Lake doc.

Developers can get to know Alluxio in Chinese easily.

pr-link: Alluxio#16921
change-id: cid-3b03ba284873b8d99e789be9acde4fc98028993b
[DOCFIX] Update cn version of Deep-Leaning doc

The Chinese solutions/Deep-Leaning doc has not been updated with the latest changes. This PR synchronizes these updates.

Developers can get to know Alluxio in Chinese easily.

pr-link: Alluxio#16920
change-id: cid-c75d057724dd6cd9db3da4605acc7bd7c9ece702
### What changes are proposed in this pull request?

Fix some incorrect code in WorkerWebUILogs.

### Why are the changes needed?

WorkerWebUILogs contains some incorrect code that should be fixed.

### Does this PR introduce any user facing changes?

Please list the user-facing changes introduced by your change, including

1. change in user-facing APIs
2. addition or removal of property keys
3. webui

pr-link: Alluxio#17052
change-id: cid-47d099e6705ded625466492faa846a3ae6232cf1
### What changes are proposed in this pull request?

When a standby master sends a heartbeat request to the leader master, the exchange often leaves no trace. We should record it when necessary.

### Why are the changes needed?

Adding some logs is necessary for the following reasons:

1. When the master node fails, logs can help troubleshoot the cause.
2. Logs make the communication between the standby master and the leader master clearer.

### Does this PR introduce any user facing changes?

Please list the user-facing changes introduced by your change, including

1. change in user-facing APIs
2. addition or removal of property keys
3. webui

pr-link: Alluxio#17204
change-id: cid-ec012f3a6736d030f9f15a141254df8cf77d861b
### What changes are proposed in this pull request?

Improve the comments related to DefaultMetaMaster and add some `@link` tags.

### Why are the changes needed?

Some of the documentation in DefaultMetaMaster is not clear enough and should be fixed as far as possible.

### Does this PR introduce any user facing changes?

Please list the user-facing changes introduced by your change, including

1. change in user-facing APIs
2. addition or removal of property keys
3. webui

pr-link: Alluxio#17202
change-id: cid-3ca5c75c9374dfd5e92be806d115811a737c3aff
### What changes are proposed in this pull request?

Remove table operations from user operation docs.

### Why are the changes needed?

The SDS (table) service is deprecated. The link to `#table-operations` is dead.

### Does this PR introduce any user facing changes?

Yes.

pr-link: Alluxio#17581
change-id: cid-ea608c71ef24c22da3399ce744c6937ccbc7eb45
…reads to fail

Fix the read failure that occurs when the block size recorded in memory by `BlockMeta` does not match the length of the actual physical block file. When such a mismatch is detected, the block is removed from worker storage and the worker falls back to reading from UFS.

Without this fix, the mismatch causes reading the block to fail.

User-facing changes: No.

pr-link: Alluxio#17564
change-id: cid-2758ff97e5016c1aae7ededb244e919233ae6d3b
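The mismatch check described in this commit can be sketched as a comparison between the recorded length and the on-disk file size, with a fallback decision when they differ. This is a simplified illustration, not the worker's actual code: `BlockReadPlanner`, `Source`, and `chooseSource` are hypothetical names, and removing the stale block from worker storage is left out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class BlockReadPlanner {
  /** Where a read should be served from (hypothetical enum for this sketch). */
  enum Source { LOCAL, UFS }

  /**
   * Chooses the local block file only when it exists and its size equals the
   * length recorded in metadata; otherwise signals a fallback to the UFS.
   */
  static Source chooseSource(long recordedLength, Path blockFile) throws IOException {
    if (Files.isRegularFile(blockFile) && Files.size(blockFile) == recordedLength) {
      return Source.LOCAL;
    }
    // A real worker would also evict the inconsistent block here.
    return Source.UFS;
  }
}
```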
In Section 6.4, Write to UFS Only (THROUGH), write completion and persistence are described in the wrong order. When only the UFS is used, the write should complete after persistence so that the written data cannot be lost.

The suggested change is to update the sentence “This write type ensures that data will be persisted after the write completes” to “This write type ensures that data will be persisted before the write completes”.

pr-link: Alluxio#17536
change-id: cid-76189e192afafd96afe410aeedec73eb65e3e161
…[DOCFIX] Fix some descriptions related to the FileSystemMaster module

Fix some descriptions related to the FileSystemMaster module.

The FileSystemMaster module contains some inappropriate descriptions or comments that should be fixed as far as possible.

Please list the user-facing changes introduced by your change, including

1. change in user-facing APIs
2. addition or removal of property keys
3. webui

pr-link: Alluxio#17215
change-id: cid-361c9b28832a3797941987663080f7e2a57d4965
…adSleeper

Implements a light thread sleeper that supports waking a sleeping sleeper so that it can determine whether it needs to continue sleeping.

Without this feature, a refreshed interval for the heartbeat thread cannot take effect.

User-facing changes: No.

pr-link: Alluxio#17298
change-id: cid-a4c471eb06e6389868278fab088556c8d7a23986
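One way to picture a sleeper that can be woken to re-check its interval is a condition wait in a loop. The sketch below is an assumption-laden illustration, not the class added by this commit: `ResettableSleeper`, `sleep`, and `wakeUp` are hypothetical names, and the interval is read from a caller-provided `Supplier`.

```java
import java.time.Duration;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

final class ResettableSleeper {
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition wake = lock.newCondition();
  private final Supplier<Duration> interval;

  ResettableSleeper(Supplier<Duration> interval) {
    this.interval = interval;
  }

  /** Sleeps until the current interval, measured from the start of this call, has elapsed. */
  void sleep() throws InterruptedException {
    long startNanos = System.nanoTime();
    lock.lock();
    try {
      while (true) {
        // Re-read the interval on every pass so a refreshed value takes effect.
        long remaining = interval.get().toNanos() - (System.nanoTime() - startNanos);
        if (remaining <= 0) {
          return;
        }
        // Returns on timeout, on wakeUp(), or spuriously; the loop re-checks either way.
        wake.awaitNanos(remaining);
      }
    } finally {
      lock.unlock();
    }
  }

  /** Wakes a sleeping thread so it re-evaluates whether it still needs to sleep. */
  void wakeUp() {
    lock.lock();
    try {
      wake.signalAll();
    } finally {
      lock.unlock();
    }
  }
}
```

A caller that shortens the interval would update the value behind the supplier first and then call `wakeUp()`, so the sleeping thread observes the new interval on its next check.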
### What changes are proposed in this pull request?

Calls shutdown on the executor service in the S3 UFS class.

### Why are the changes needed?

If shutdown is not called, the threads will not be garbage collected.

### Does this PR introduce any user facing changes?

No

pr-link: Alluxio#15748
change-id: cid-86d16e80118882044bf529a619b7915c8451eb03
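The pattern behind this change can be illustrated with a small class that owns an executor and shuts it down in `close()`. This is a generic sketch, not the S3 UFS code: `UploadClient` and its methods are hypothetical, and the "upload" just returns the payload size.

```java
import java.io.Closeable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class UploadClient implements Closeable {
  // The pool's worker threads stay alive until shutdown() is called.
  private final ExecutorService executor = Executors.newFixedThreadPool(2);

  /** Submits a stand-in "upload" task that reports the number of bytes handled. */
  Future<Integer> submitUpload(byte[] data) {
    return executor.submit(() -> data.length);
  }

  boolean isShutdown() {
    return executor.isShutdown();
  }

  @Override
  public void close() {
    // Lets queued tasks finish, then allows the worker threads to terminate.
    executor.shutdown();
  }
}
```

Because the class implements `Closeable`, callers can use try-with-resources so the pool is released even when an upload throws.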
### What changes are proposed in this pull request?

Fix a deadlock that occurs when the master process exits.

### Why are the changes needed?

```
stackTrace:
java.lang.Thread.State: BLOCKED (on object monitor)
    at java.lang.Shutdown.exit(java.base@11.0.12-ga/Shutdown.java:173)
    - **waiting to lock <0x00007f55e98d1920> (a java.lang.Class for java.lang.Shutdown)**
    at java.lang.Runtime.exit(java.base@11.0.12-ga/Runtime.java:116)
    at java.lang.System.exit(java.base@11.0.12-ga/System.java:1752)
    at alluxio.ProcessUtils.fatalError(ProcessUtils.java:83)
    at alluxio.ProcessUtils.fatalError(ProcessUtils.java:63)
    at alluxio.master.journal.MasterJournalContext.waitForJournalFlush(MasterJournalContext.java:99)
    at alluxio.master.journal.MasterJournalContext.close(MasterJournalContext.java:109)
    - locked <0x00007f5602000010> (a alluxio.master.journal.MasterJournalContext)
    at alluxio.master.journal.StateChangeJournalContext.close(StateChangeJournalContext.java:55)
    at alluxio.master.block.DefaultBlockMaster.getNewContainerId(DefaultBlockMaster.java:906)
    - locked <0x00007f594a000298> (a alluxio.master.block.BlockContainerIdGenerator)
    at alluxio.master.file.meta.InodeDirectoryIdGenerator.initialize(InodeDirectoryIdGenerator.java:83)
    at alluxio.master.file.meta.InodeDirectoryIdGenerator.getNewDirectoryId(InodeDirectoryIdGenerator.java:57)
    - **locked <0x00007f55e197de68> (a alluxio.master.file.meta.InodeDirectoryIdGenerator)**
    at alluxio.master.file.meta.InodeTree.createPath(InodeTree.java:979)
    at alluxio.master.file.DefaultFileSystemMaster.createDirectoryInternal(DefaultFileSystemMaster.java:2746)
    at alluxio.master.file.InodeSyncStream.loadDirectoryMetadataInternal(InodeSyncStream.java:1374)
    at alluxio.master.file.InodeSyncStream.loadDirectoryMetadata(InodeSyncStream.java:1294)
    at alluxio.master.file.TxInodeSyncStream.concurrentLoadMetadata(TxInodeSyncStream.java:123)
    at alluxio.master.file.TxInodeSyncStream.loadMetadataForPath(TxInodeSyncStream.java:98)
    at alluxio.master.file.InodeSyncStream.syncInodeMetadata(InodeSyncStream.java:743)
    at alluxio.master.file.InodeSyncStream.syncInternal(InodeSyncStream.java:491)
    at alluxio.master.file.InodeSyncStream.sync(InodeSyncStream.java:409)
    at alluxio.master.file.DefaultFileSystemMaster.syncMetadata(DefaultFileSystemMaster.java:4075)
    at alluxio.master.file.TxFileSystemMaster.syncMetadata(TxFileSystemMaster.java:298)
    at alluxio.master.file.DefaultFileSystemMaster.listStatus(DefaultFileSystemMaster.java:1111)
    at alluxio.master.file.TxFileSystemMaster.listStatus(TxFileSystemMaster.java:827)
    ...
```

```
stackTrace:
java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(java.base@11.0.12-ga/Native Method)
    - waiting on <no object reference available>
    at java.lang.Thread.join(java.base@11.0.12-ga/Thread.java:1300)
    - waiting to re-lock in wait() <0x00007f55e98d0918> (a java.lang.Thread)
    at java.lang.Thread.join(java.base@11.0.12-ga/Thread.java:1375)
    at java.lang.ApplicationShutdownHooks.runHooks(java.base@11.0.12-ga/ApplicationShutdownHooks.java:107)
    at java.lang.ApplicationShutdownHooks$1.run(java.base@11.0.12-ga/ApplicationShutdownHooks.java:46)
    at java.lang.Shutdown.runHooks(java.base@11.0.12-ga/Shutdown.java:130)
    at java.lang.Shutdown.exit(java.base@11.0.12-ga/Shutdown.java:174)
    - locked <0x00007f55e98d1920> (a java.lang.Class for java.lang.Shutdown)
    at java.lang.Runtime.exit(java.base@11.0.12-ga/Runtime.java:116)
    at java.lang.System.exit(java.base@11.0.12-ga/System.java:1752)
    at alluxio.ProcessUtils.fatalError(ProcessUtils.java:83)
    at alluxio.ProcessUtils.fatalError(ProcessUtils.java:63)
    at alluxio.master.journal.MasterJournalContext.waitForJournalFlush(MasterJournalContext.java:99)
    at alluxio.master.journal.MasterJournalContext.close(MasterJournalContext.java:109)
    - locked <0x00007f5c0a60bda0> (a alluxio.master.journal.MasterJournalContext)
    at alluxio.master.journal.StateChangeJournalContext.close(StateChangeJournalContext.java:55)
    at alluxio.master.journal.FileSystemMergeJournalContext.close(FileSystemMergeJournalContext.java:90)
    - locked <0x00007f5c0a60bde0> (a alluxio.master.journal.FileSystemMergeJournalContext)
    at alluxio.master.file.RpcContext.closeQuietly(RpcContext.java:141)
    at alluxio.master.file.RpcContext.close(RpcContext.java:129)
    at alluxio.master.file.DefaultFileSystemMaster.listStatus(DefaultFileSystemMaster.java:1227)
    at alluxio.master.file.TxFileSystemMaster.listStatus(TxFileSystemMaster.java:827)
```

```
java.lang.Thread.State: BLOCKED (on object monitor)
    at alluxio.master.file.meta.InodeDirectoryIdGenerator.peekDirectoryId(InodeDirectoryIdGenerator.java:76)
    - **waiting to lock <0x00007f55e197de68> (a alluxio.master.file.meta.InodeDirectoryIdGenerator)**
    at alluxio.master.file.DefaultFileSystemMaster.stop(DefaultFileSystemMaster.java:787)
    at alluxio.master.file.TxFileSystemMaster.stop(TxFileSystemMaster.java:245)
    at alluxio.master.AbstractMaster.close(AbstractMaster.java:156)
    at alluxio.master.file.DefaultFileSystemMaster.close(DefaultFileSystemMaster.java:800)
    at alluxio.master.file.TxFileSystemMaster.close(TxFileSystemMaster.java:581)
    at alluxio.Registry.close(Registry.java:156)
    at alluxio.master.AlluxioMasterProcess.stop(AlluxioMasterProcess.java:412)
    - locked <0x00007f55e51e2ba8> (a java.util.concurrent.atomic.AtomicBoolean)
    at alluxio.ProcessUtils.lambda$stopProcessOnShutdown$0(ProcessUtils.java:98)
    at alluxio.ProcessUtils$$Lambda$363/0x00007f54f02dc840.run(Unknown Source)
    at java.lang.Thread.run(java.base@11.0.12-ga/Thread.java:829)
```

The blocked listStatus thread (thread1) is waiting for lock `<0x00007f55e98d1920>`, which is held by another listStatus thread (thread2) that is also trying to exit and is in the WAITING state. thread2 waits for the shutdown hook (alluxio-process-shutdown-hook) to finish before it continues exiting. The alluxio-process-shutdown-hook in turn waits for lock `<0x00007f55e197de68>`, which is held by thread1.

### Does this PR introduce any user facing changes?

No

pr-link: Alluxio#17628
change-id: cid-d077517ba20445ebe95520a1710c173b41810f6d
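The cycle above closes because the shutdown hook blocks indefinitely on a monitor owned by a thread that is itself stuck in `System.exit`. One generic way to keep a shutdown step from joining such a cycle is to bound its wait for the lock. The sketch below illustrates that idea only and is not the fix made in this PR: `BoundedShutdownStep`, `idLock`, and `stop` are hypothetical names, and an explicit `ReentrantLock` stands in for the `synchronized` monitor seen in the traces.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

final class BoundedShutdownStep {
  // Stand-in for the monitor guarding the ID generator in the traces above.
  private final ReentrantLock idLock = new ReentrantLock();

  ReentrantLock idLock() {
    return idLock;
  }

  /**
   * Attempts the lock-guarded part of shutdown, waiting at most timeoutMillis.
   * Returns true if it ran under the lock, false if the lock stayed busy.
   */
  boolean stop(long timeoutMillis) throws InterruptedException {
    if (!idLock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
      // Give up instead of blocking the shutdown hook forever.
      return false;
    }
    try {
      // Placeholder for reading final state that the lock protects.
      return true;
    } finally {
      idLock.unlock();
    }
  }
}
```

The trade-off is that the guarded step may be skipped during shutdown, so this only suits work that is safe to omit when the process is already exiting.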
dbw9580 approved these changes Jul 3, 2023
elega approved these changes Jul 3, 2023
alluxio-bot, sync-merge this please
### What changes are proposed in this pull request?
Merge missing commits from master-2.x to main. The commits in 2023/06/01~2023/06/30 from main...master-2.x will be included by this PR.
Some commits are skipped because they have been manually ported to main:

- 4a57bab
- a84b6e6
- ec066dc
Commit 11af4ee is not a cherry-pick but cleanup work that was missed in #17638, where a test should have been removed together with the Block API.