Yes, BASIC is much slower than assembly for many operations. For an easy example, try out this program on a Commodore 64 or emulator:
for i = 1024 to 2023 : poke i,peek(i) or 128 : next
You will see each character on the screen reverse, row by row, over the course of about ten seconds. By contrast, the same routine in machine language inverts the entire screen in a fraction of a second; there's almost no perceptible gap between the first character and the last character being inverted. (The source and a BASIC loader for it are appended below, if you want to see how it works or run it yourself.)
The two main issues that make it much slower are that each line of BASIC is read and interpreted before it's executed, and the data formats used by BASIC often have much higher overhead than the wider variety of formats one can use in machine language.
In some cases the latter is due to BASIC not using the most efficient formats it has available. For example, BASIC always uses floating point for the index of a for loop rather than having extra code to determine whether it could use integer variables instead. Thus, adding one to i in the code above ends up executing machine-language procedures to copy several bytes of data to the FAC (floating point accumulator), do the floating point addition, and copy the result back out. This is many dozens of instructions, whereas a loop that meets the restrictions that allow integers to be used (as in the machine-language routine below) can do its math in a small handful of instructions.
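To make "several bytes of data" concrete, here is a rough Python sketch (a model, not something that runs on the C64) of the five-byte layout Commodore BASIC uses for a floating point variable: one exponent byte biased by 128, then a four-byte mantissa whose top bit is replaced by the sign. The function name to_cbm_float is made up for illustration, and the sketch ignores rounding and out-of-range exponents.

```python
import math

def to_cbm_float(value: float) -> bytes:
    """Pack a number into a 5-byte exponent + mantissa layout (illustrative model)."""
    if value == 0:
        return bytes(5)                      # an exponent byte of 0 means the value is zero
    # abs(value) == mantissa * 2**exponent with 0.5 <= mantissa < 1
    mantissa, exponent = math.frexp(abs(value))
    bits = int(mantissa * (1 << 32))         # 32-bit mantissa; its top bit is always 1
    if value > 0:
        bits &= 0x7FFFFFFF                   # the always-1 top bit is reused as the sign (0 = positive)
    return bytes([exponent + 128]) + bits.to_bytes(4, "big")

# The loop index at the start of screen RAM, and one step later.
for n in (1024, 1025):
    print(n, to_cbm_float(n).hex(" "))
```

Every pass through next has to move and update a five-byte value like this, while the machine-language routine keeps its counter in the single-byte Y register.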
In other cases, BASIC just doesn't support at all the kind of techniques and formats you can use in assembler. As Harper points out in a comment below, unrolling the loop in the following assembly routine would save some arithmetic and several memory lookups, probably doubling the speed of the routine. That kind of optimization is something that assembler programmers can do in the right circumstances, and you can't really work at that level at all in BASIC.
Appendix
The following is a machine language routine to invert the screen on a Commodore 64 in a way similar to how it was done in BASIC above. Note that this is deliberately not optimized; it's written instead with an eye towards clarity and generality. (For example, a simple change could make this update 32 KB, rather than just 1 KB.)
All numbers in the listing are in hexadecimal (base 16). The # in front of some of them means to load that actual number itself into the A or Y register; otherwise it's loading data from the address in memory specified by that number. In the case of the [addr],Y references, it's loading a 16-bit address from addr, adding the Y register to that value, and that determines the memory location of the load or store. We need to do this because the Y register is only 8 bits, holding values up to only FF (255 decimal), so we need to count through all 256 values of Y four times to read and write all 1024 screen addresses. (Actually, there are only 1000 displayed on the screen, but we do 4×256 to keep the code simple.)
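If the [addr],Y addressing is hard to picture, the following Python sketch models the same loop structure: a 16-bit pointer kept in two zero-page bytes, an 8-bit Y that wraps from FF back to 0, and a bump of the pointer's high byte after each page. It is only a model for reasoning about the routine, not C64 code; the bytearray stands in for the machine's 64 KB of RAM, and invert_screen is a name chosen for this example.

```python
def invert_screen(mem: bytearray) -> None:
    """Python model of the four-page loop in the machine-language routine."""
    ADDR = 0xFC                    # zero-page pointer: low byte at $FC, high byte at $FD
    mem[ADDR] = 0x00               # low byte of screen RAM start ($0400)
    mem[ADDR + 1] = 0x04           # high byte of screen RAM start
    while True:                    # corresponds to the nextpage label
        y = 0
        while True:                # corresponds to the nextchar label
            base = mem[ADDR] | (mem[ADDR + 1] << 8)   # 16-bit address read from zero page
            mem[base + y] |= 0x80                     # set bit 7 of the character
            y = (y + 1) & 0xFF                        # Y is 8 bits: FF + 1 wraps to 0
            if y == 0:
                break
        mem[ADDR + 1] = (mem[ADDR + 1] + 1) & 0xFF    # move the pointer on by 256 bytes
        if mem[ADDR + 1] == 0x08:                     # pointer has reached $0800: done
            break

ram = bytearray(65536)
ram[0x0400:0x0800] = bytes([0x20]) * 1024   # fill the screen area with spaces (screen code $20)
invert_screen(ram)
print(hex(ram[0x0400]), hex(ram[0x07FF]))   # both should now have bit 7 set
```

Note that only the high byte of the pointer ever changes; Y supplies the low eight bits of the offset, which is why four passes of 256 cover the whole 1 KB.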
    00FC            addr      .equ 00fc     ; unused zero-page location
    C000 A9 00      invscr:   lda #00       ; screen RAM start low byte
    C002 85 FC                sta addr      ; unused zero-page location
    C004 A9 04                lda #04       ; screen RAM start high byte
    C006 85 FD                sta addr+1    ; unused zero-page location
    C008 A0 00      nextpage: ldy #00       ; set 8-bit register Y to 0
    C00A B1 FC      nextchar: lda [addr],Y  ; load character from addr + Y
    C00C 09 80                ora #80       ; set bit 7 to make it inverse
    C00E 91 FC                sta [addr],y  ; store modified character
    C010 C8                   iny           ; increment Y
    C011 D0 F7                bne nextchar  ; branch back if y != 0
    C013 E6 FD                inc addr+1    ; increment 16-bit screen address by 256
    C015 A5 FD                lda addr+1
    C017 C9 08                cmp #08       ; reached end of screen?
    C019 D0 ED                bne nextpage
    C01B 60                   rts
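As a back-of-the-envelope check on "a fraction of a second", the Python sketch below adds up the standard 6502 cycle counts for the five instructions in the inner loop and multiplies by the 1024 bytes processed. The 1 MHz clock is an assumption (the C64 runs at roughly that speed), and the estimate deliberately ignores the per-page bookkeeping, the setup code, and any cycles the video chip takes from the CPU, so treat it as a lower bound rather than a measurement.

```python
# Cycle counts for the inner-loop instructions of the routine above.
INNER_LOOP_CYCLES = {
    "lda [addr],Y": 5,   # no page-crossing penalty: the pointer's low byte is always 0
    "ora #80": 2,
    "sta [addr],y": 6,
    "iny": 2,
    "bne nextchar": 3,   # branch taken, target on the same page
}

CHARS = 4 * 256          # bytes touched by the routine
CLOCK_HZ = 1_000_000     # assumption: roughly 1 MHz CPU clock

def estimate_seconds(chars: int = CHARS, clock_hz: int = CLOCK_HZ) -> float:
    """Rough run time of the inner loop alone, in seconds."""
    cycles_per_char = sum(INNER_LOOP_CYCLES.values())
    return chars * cycles_per_char / clock_hz

print(f"{sum(INNER_LOOP_CYCLES.values())} cycles per character")
print(f"about {estimate_seconds() * 1000:.1f} ms for the whole screen")
```

Under these assumptions the whole screen takes somewhere around 18 ms, about the length of one video frame, which fits with there being almost no visible gap. Against the roughly ten seconds of the BASIC loop, that is a difference on the order of several hundred times.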
And here's a BASIC program that will load the routine; you can run it after that with sys 49152.
    10 loc=49152 : rem store the routine at $c000
    20 read v: if v = -1 then end
    30 poke loc,v : loc = loc + 1 : goto 20
    50 data 169,0,133,252,169,4,133,253
    60 data 160,0,177,252,9,128,145,252,200,208,247
    70 data 230,253,165,253,201,8,208,237,96
    90 data -1
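If you want to convince yourself that the decimal DATA values are the same bytes as the hex column of the assembly listing, a small Python check like the following will do it. The names DATA, LISTING_HEX, and load are just for this example, and load only mimics what the poke loop does rather than touching a real C64.

```python
# The values from the loader's DATA lines, without the -1 terminator.
DATA = [
    169, 0, 133, 252, 169, 4, 133, 253,
    160, 0, 177, 252, 9, 128, 145, 252, 200, 208, 247,
    230, 253, 165, 253, 201, 8, 208, 237, 96,
]

# The object-code column of the assembly listing, in address order.
LISTING_HEX = "A900 85FC A904 85FD A000 B1FC 0980 91FC C8 D0F7 E6FD A5FD C908 D0ED 60"

def load(values, start=49152):
    """Mimic the poke loop: map each address to the byte stored there."""
    return {start + offset: value for offset, value in enumerate(values)}

image = load(DATA)
print(bytes(DATA) == bytes.fromhex(LISTING_HEX))   # do the loader bytes match the listing?
print(hex(min(image)), hex(max(image)))            # first and last address written
```

The routine occupies 28 bytes starting at 49152 ($C000), so the last byte written is the rts at $C01B.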
... PEEK at and POKE into addresses anywhere in RAM. You were basically doing pointer arithmetic and writing to addresses directly. But your programs were stored as strings which were interpreted, so it was much slower than assembly.
... % suffix) in Applesoft (Apple II Basic); after all, the first Apple II Basic ("Integer Basic") only used integer variables in the first place. Of course you had to use them, or your loops would still be slow.