For my project I use an STM32L4R5 (custom PCB) and need to drive 8 parallel wires that set addresses in a sensor. I toggle 8 GPIO pins in parallel to provide the addresses, but the speed is far too low (I need 10~20MHz). I checked several similar problems on other sites and optimized as much as I can. I guess it may be a very simple error, but what I have at the moment is as follows:
Basically, I have a 48MHz crystal and use the PLL to go up to 100MHz for the system clock and 80MHz for the ADC (a different topic...). I touched the code a few times, testing at 48MHz and below, so I also paste the final clock settings:

    void SystemClock_Config(void)
    {
        RCC_OscInitTypeDef RCC_OscInitStruct = {0};
        RCC_ClkInitTypeDef RCC_ClkInitStruct = {0};
        RCC_PeriphCLKInitTypeDef PeriphClkInit = {0};

        if (HAL_PWREx_ControlVoltageScaling(PWR_REGULATOR_VOLTAGE_SCALE1) != HAL_OK) {
            Error_Handler();
        }

        RCC_OscInitStruct.OscillatorType = RCC_OSCILLATORTYPE_HSI48 | RCC_OSCILLATORTYPE_HSE;
        RCC_OscInitStruct.HSEState = RCC_HSE_ON;
        RCC_OscInitStruct.HSI48State = RCC_HSI48_ON;
        RCC_OscInitStruct.PLL.PLLState = RCC_PLL_ON;
        RCC_OscInitStruct.PLL.PLLSource = RCC_PLLSOURCE_HSE;
        RCC_OscInitStruct.PLL.PLLM = 12;
        RCC_OscInitStruct.PLL.PLLN = 50;
        RCC_OscInitStruct.PLL.PLLP = RCC_PLLP_DIV2;
        RCC_OscInitStruct.PLL.PLLQ = RCC_PLLQ_DIV2;
        RCC_OscInitStruct.PLL.PLLR = RCC_PLLR_DIV2;
        if (HAL_RCC_OscConfig(&RCC_OscInitStruct) != HAL_OK) {
            Error_Handler();
        }

        RCC_ClkInitStruct.ClockType = RCC_CLOCKTYPE_HCLK | RCC_CLOCKTYPE_SYSCLK
                                    | RCC_CLOCKTYPE_PCLK1 | RCC_CLOCKTYPE_PCLK2;
        RCC_ClkInitStruct.SYSCLKSource = RCC_SYSCLKSOURCE_PLLCLK; //RCC_SYSCLKSOURCE_HSE;
        RCC_ClkInitStruct.AHBCLKDivider = RCC_SYSCLK_DIV1;
        RCC_ClkInitStruct.APB1CLKDivider = RCC_HCLK_DIV1;
        RCC_ClkInitStruct.APB2CLKDivider = RCC_HCLK_DIV1; //RCC_HCLK_DIV4;
        if (HAL_RCC_ClockConfig(&RCC_ClkInitStruct, FLASH_LATENCY_4) != HAL_OK) {
            Error_Handler();
        }

        if (vers == 1) {
            PeriphClkInit.PeriphClockSelection = RCC_PERIPHCLK_I2C1 | RCC_PERIPHCLK_USB
                                               | RCC_PERIPHCLK_ADC;
            PeriphClkInit.I2c1ClockSelection = RCC_I2C1CLKSOURCE_PCLK1;
            PeriphClkInit.AdcClockSelection = RCC_ADCCLKSOURCE_PLLSAI1; //RCC_ADCCLKSOURCE_SYSCLK
            PeriphClkInit.UsbClockSelection = RCC_USBCLKSOURCE_HSI48;
            PeriphClkInit.PLLSAI1.PLLSAI1Source = RCC_PLLSOURCE_HSE;
            PeriphClkInit.PLLSAI1.PLLSAI1M = 6;
            PeriphClkInit.PLLSAI1.PLLSAI1N = 20;
            PeriphClkInit.PLLSAI1.PLLSAI1P = RCC_PLLP_DIV2;
            PeriphClkInit.PLLSAI1.PLLSAI1Q = RCC_PLLQ_DIV2;
            PeriphClkInit.PLLSAI1.PLLSAI1R = RCC_PLLR_DIV2;
            PeriphClkInit.PLLSAI1.PLLSAI1ClockOut = RCC_PLLSAI1_ADC1CLK;
        }
        if (vers == 2) {
            PeriphClkInit.PeriphClockSelection = RCC_PERIPHCLK_I2C1 | RCC_PERIPHCLK_ADC;
            PeriphClkInit.I2c1ClockSelection = RCC_I2C1CLKSOURCE_SYSCLK; //RCC_I2C1CLKSOURCE_PCLK1;
            PeriphClkInit.AdcClockSelection = RCC_ADCCLKSOURCE_PLLSAI1; //RCC_ADCCLKSOURCE_SYSCLK
            PeriphClkInit.PLLSAI1.PLLSAI1Source = RCC_PLLSOURCE_HSE;
            PeriphClkInit.PLLSAI1.PLLSAI1M = 6;
            PeriphClkInit.PLLSAI1.PLLSAI1N = 20;
            PeriphClkInit.PLLSAI1.PLLSAI1P = RCC_PLLP_DIV2;
            PeriphClkInit.PLLSAI1.PLLSAI1Q = RCC_PLLQ_DIV2;
            PeriphClkInit.PLLSAI1.PLLSAI1R = RCC_PLLR_DIV2;
            PeriphClkInit.PLLSAI1.PLLSAI1ClockOut = RCC_PLLSAI1_ADC1CLK;
        }
        if (HAL_RCCEx_PeriphCLKConfig(&PeriphClkInit) != HAL_OK) {
            Error_Handler();
        }

        //__HAL_FLASH_SET_LATENCY(FLASH_LATENCY_4);
        __HAL_FLASH_PREFETCH_BUFFER_ENABLE();
        __HAL_FLASH_DATA_CACHE_ENABLE();
    }

I set 4 flash wait states according to the reference manual. Now we reach the funny part. My original code computed the value and wrote it into the BSRR register (left commented out in the loop). I noticed that this line takes about 30 ticks to execute, so I made an array "lineindex" holding all the 32-bit values to be written to BSRR, so that no calculations are required. The oscilloscope on the lowest pin gave me this:
I was surprised by the low value. Running from flash versus RAM made only about 100-200kHz difference. So I spent 2 days studying how to do the same using DMA. Now I use TIM1 to trigger the DMA for a memory-to-peripheral transfer (the memory source is the 32-bit word array, and the DMA writes directly to BSRR). If I understand correctly, the CPU is not involved during the burst (160 transfers at once), and the manual says the bus can run at 120MHz (set to 100MHz now), so... what happens? Code:

    //Timer
    static void MX_TIM1_Init(void)
    {
        /* 3 (Timer TIM1) */
        __HAL_RCC_TIM1_CLK_ENABLE();
        htim1.Instance = TIM1;
        htim1.Init.Prescaler = 0;
        htim1.Init.CounterMode = TIM_COUNTERMODE_UP;
        htim1.Init.Period = 7;
        htim1.Init.ClockDivision = TIM_CLOCKDIVISION_DIV1;
        htim1.Init.RepetitionCounter = 0;
        HAL_TIM_Base_Init(&htim1);

        /* 7 (Enable TIM for DMA events) */
        __HAL_TIM_ENABLE_DMA(&htim1, TIM_DMA_UPDATE);

        TIM_ClockConfigTypeDef sClockSourceConfig = {0};
        sClockSourceConfig.ClockSource = TIM_CLOCKSOURCE_INTERNAL;
        if (HAL_TIM_ConfigClockSource(&htim1, &sClockSourceConfig) != HAL_OK) {
            Error_Handler();
        }
    }

    // DMA2 (includes ADC code on channel3, channel6 is used for the current loop)
    static void MX_DMA_Init(void)
    {
        /* DMA controller clock enable */
        __HAL_RCC_DMAMUX1_CLK_ENABLE();
        __HAL_RCC_DMA2_CLK_ENABLE();
        __DMA2_CLK_ENABLE();

        /* ADC1 on DMA2 Channel3 */
        hdma_adc1.Instance = DMA2_Channel3;
        hdma_adc1.Init.Direction = DMA_PERIPH_TO_MEMORY;
        hdma_adc1.Init.PeriphInc = DMA_PINC_DISABLE;
        hdma_adc1.Init.MemInc = DMA_MINC_ENABLE;
        hdma_adc1.Init.PeriphDataAlignment = DMA_PDATAALIGN_HALFWORD;
        hdma_adc1.Init.MemDataAlignment = DMA_MDATAALIGN_HALFWORD;
        hdma_adc1.Init.Mode = DMA_CIRCULAR;
        hdma_adc1.Init.Priority = DMA_PRIORITY_HIGH;
        //HAL_DMA_DeInit(&hdma_adc1);
        HAL_DMA_Init(&hdma_adc1);
        __HAL_LINKDMA(&hadc1, DMA_Handle, hdma_adc1);
        DMAMUX1_Channel9->CCR &= ~( DMAMUX_CxCR_DMAREQ_ID );
        DMAMUX1_Channel9->CCR = 0x5;

        /* 4 (DMA2 Channel6) */
        hdma_tim1_uev.Instance = DMA2_Channel6;
        hdma_tim1_uev.Init.Direction = DMA_MEMORY_TO_PERIPH;
        hdma_tim1_uev.Init.PeriphInc = DMA_PINC_DISABLE;
        hdma_tim1_uev.Init.MemInc = DMA_MINC_ENABLE;
        hdma_tim1_uev.Init.PeriphDataAlignment = DMA_PDATAALIGN_WORD; // 32-bit word
        hdma_tim1_uev.Init.MemDataAlignment = DMA_MDATAALIGN_WORD;
        hdma_tim1_uev.Init.Mode = DMA_NORMAL;
        hdma_tim1_uev.Init.Priority = DMA_PRIORITY_LOW;
        HAL_DMA_Init(&hdma_tim1_uev);
        __HAL_LINKDMA(&htim1, hdma[TIM_DMA_ID_UPDATE], hdma_tim1_uev);
        DMAMUX1_Channel12->CCR &= ~( DMAMUX_CxCR_DMAREQ_ID );
        DMAMUX1_Channel12->CCR = 46U;
        //DMAMUX1_Channel9->CCR &= ~( DMAMUX_CxCR_DMAREQ_ID );
        //DMAMUX1_Channel9->CCR |= ( 0x8 << DMAMUX_CxCR_DMAREQ_ID_Pos );

        /* DMA2 channel interrupt configuration */
        HAL_NVIC_SetPriority(DMA2_Channel3_IRQn, 0, 0);
        HAL_NVIC_EnableIRQ(DMA2_Channel3_IRQn);
        HAL_NVIC_SetPriority(DMA2_Channel6_IRQn, 0, 0);
        HAL_NVIC_EnableIRQ(DMA2_Channel6_IRQn);

        /* DMAMUX1_OVR_IRQn interrupt configuration */
        HAL_NVIC_SetPriority(DMAMUX1_OVR_IRQn, 0, 0);
        HAL_NVIC_EnableIRQ(DMAMUX1_OVR_IRQn);
    }

    // Loop in main
    for (counteri = 1; counteri < 100000; counteri++) {
        HAL_DMA_Start_IT(htim1.hdma[TIM_DMA_ID_UPDATE], (uint32_t)lineindex,
                         (uint32_t)&GPIOF->BSRR, 160);
        __HAL_TIM_ENABLE(&htim1);
        while ((TIM1->CR1 & 1U));
    }

I know the timer Period is a magic number. I varied the timer rate from about 5MHz to 50MHz. The magic number shown gives the maximum loop speed I got, 6MHz, so I just left it there.
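As a sanity check on that 6MHz figure, the timer arithmetic can be sketched in plain C. This is only a back-of-the-envelope model under two assumptions that are not confirmed above: that TIM1 is clocked at the full 100MHz (APB2 divider 1), and that "lineindex" holds consecutive addresses, so the lowest pin flips on every transfer. The helper names `tim_update_hz` and `lsb_square_wave_hz` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Update-event rate of an up-counting timer:
 * f_update = f_tim / ((PSC + 1) * (ARR + 1)).
 * Assumption: tim_clk_hz is the timer kernel clock (100 MHz here). */
static uint32_t tim_update_hz(uint32_t tim_clk_hz, uint32_t psc, uint32_t arr)
{
    assert(tim_clk_hz > 0u);
    uint64_t div = ((uint64_t)psc + 1u) * ((uint64_t)arr + 1u);
    return (uint32_t)(tim_clk_hz / div);
}

/* A pin that flips on every write needs two writes per full period,
 * so the square wave seen on a scope is half the write rate.
 * Assumption: the table counts up by one address per entry. */
static uint32_t lsb_square_wave_hz(uint32_t writes_per_second)
{
    return writes_per_second / 2u;
}
```

With Prescaler = 0 and Period = 7 this gives 12.5 million update events per second, and a lowest-pin square wave of 6.25MHz. If both assumptions hold, a 6MHz reading on the lowest pin would correspond to roughly 12.5 million address writes per second rather than 6 million.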
Now the questions. What am I doing wrong? In other posts I saw reasonable speeds of 15-20MHz with a pin toggle function. Writing directly to the registers, and even taking the CPU out of the loop, did not help. I guess it is either some clock error I don't see or just bus waiting, but that looks strange since nothing else is going on there at the moment. I noticed that in debug mode one asm("NOP"); takes 4 ticks, with no change regardless of whether I run from flash or RAM (I thought it was wait states at first). On a slower clock I couldn't even reach 1MHz. Another big worry is the transient peaks: -0.7~-0.8V to 4V? I wonder whether this is some oscilloscope settings issue or whether it is normal.
To add more, the code below is what I use, since I read analog data from the sensor:

    for (counteri = 1; counteri < 100000; counteri++) {
        // 16-bit counter: a uint8_t would wrap at 255 and never reach 321
        for (uint16_t vc = 1; vc < 321; vc++) {
            GPIOF->BSRR = lineindex[vc - 1]; //(vc) | (((~vc) & 0xFF) << (16)); //col
            asm("NOP");
            ADC1->CR = 268435461U; //|= 4U; // start ADC reading
            while (!(ADC1->ISR & ADC_ISR_EOS)); // wait for ADC conversion to complete
            ADC1->ISR &= 14U;
        }
    }

More magic numbers, but I tried to eliminate all possible calculations, even for the values written to registers. So, with the ADC reading 4 channels using DMA at 80MHz, 2.5 cycles sampling, 12 bit, I got less than 500kHz. The ADC should do 5.88Msps; with 4 channels and some overhead I would expect at least double what I get. If you have any ideas, please correct me.
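The expected ADC figure can be cross-checked with a short calculation. The sketch below assumes the usual SAR timing of sampling time plus 12.5 ADC clock cycles for a 12-bit conversion; that 12.5-cycle figure is an assumption here and should be verified against the reference manual. Times are passed in half-cycles so the arithmetic stays in integers, and the helper name `adc_samples_per_second` is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Samples per second for one conversion of
 * (sample_half_cycles + convert_half_cycles) / 2 ADC clock cycles.
 * Assumption: 2.5 cycles sampling = 5 half-cycles,
 *             12.5 cycles conversion = 25 half-cycles (12-bit). */
static uint32_t adc_samples_per_second(uint32_t adc_clk_hz,
                                       uint32_t sample_half_cycles,
                                       uint32_t convert_half_cycles)
{
    uint32_t total_half_cycles = sample_half_cycles + convert_half_cycles;
    assert(total_half_cycles > 0u);
    return (uint32_t)(((uint64_t)adc_clk_hz * 2u) / total_half_cycles);
}
```

Under those assumptions an 80MHz ADC clock gives about 5.33Msps, so a 4-channel sequence could complete at most about 1.33 million times per second before any loop or trigger overhead.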


// Loop in main ... for (counteri=1;counteri<100000;counteri++){ ... it looks like you're waiting for the timer with while((TIM1->CR1 & 1U));. But surely you should be waiting for the DMA to complete rather than the timer? Each timer event will trigger a single DMA transfer, so as you have it now it seems to me that you only get a single DMA transfer before restarting it all over again.
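One way to wait on the DMA rather than the timer is to poll the channel's remaining-transfer counter until it reaches zero. The following is a minimal sketch, not the asker's code: the helper `wait_dma_remaining_zero` and its poll limit are hypothetical, and it takes a plain pointer so it can be exercised off-target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Poll a "transfers remaining" register until it reads zero.
 * Returns true once it reads zero, false if max_polls reads pass first.
 * Assumption: the register counts down to 0 when a normal-mode
 * transfer finishes, as a DMA channel's CNDTR does. */
static bool wait_dma_remaining_zero(volatile const uint32_t *remaining,
                                    uint32_t max_polls)
{
    assert(remaining != NULL);
    while (max_polls-- > 0u) {
        if (*remaining == 0u) {
            return true;
        }
    }
    return false;
}
```

On the target this might be called as wait_dma_remaining_zero(&DMA2_Channel6->CNDTR, 1000000u) in place of the while((TIM1->CR1 & 1U)); line. The poll limit is arbitrary and only guards against hanging if the transfer never starts.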