To assert that the tuples are in fact one element each (so you get an exception if that assumption is violated, rather than silently discarding data), you can use iterable unpacking in any of the following three forms to extract the element from a known single-element iterable:
    [el for el, in p]
    [el for (el,) in p]
    [el for [el] in p]
All three are 100% equivalent in behavior and performance; it's mostly a question of which form you find most readable.
As a bonus, unpacking is typically a little faster than indexing (only about 5-10% on Python 3.11, not enough to worry about if you actually wanted to ignore any extra elements). But if your code's correctness relies on there being exactly one element per tuple, unpacking gets you that check automatically, at negative cost.
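For example, if a stray two-element tuple sneaks in, the unpacking form fails loudly instead of quietly taking the first element (p here is made-up stand-in data, not your actual query results):

    p = [(1,), (2,), (3, 4)]   # the third tuple violates the one-element assumption
    [row[0] for row in p]      # silently produces [1, 2, 3], discarding the 4
    [el for el, in p]          # raises ValueError: too many values to unpack (expected 1)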
If the tuples might be variable length and you want to keep the contents, itertools.chain.from_iterable is the way to go:
    from itertools import chain

    list(chain.from_iterable(p))  # Or [*chain.from_iterable(p)] if you prefer
which will avoid discarding any data.
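For instance, with made-up variable-length tuples, everything survives in order:

    from itertools import chain

    p = [(1,), (2, 3), (4, 5, 6)]        # variable-length tuples (hypothetical example data)
    print(list(chain.from_iterable(p)))  # [1, 2, 3, 4, 5, 6]; nothing is discarded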
Lastly, if you're just looking to optimize taking the first element, and you're really okay with silently discarding data if the one-element assumption is violated, you could micro-optimize this a tiny bit for large inputs:
    from operator import itemgetter

    list(map(itemgetter(0), p))  # Or [*map(itemgetter(0), p)] if you prefer
but that's a small and shrinking benefit over the listcomp (it gets smaller as the interpreter gets faster), and probably not worth the trouble unless you're sure this is your hot loop and there is no way to improve it algorithmically.
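Just be aware that, unlike the unpacking forms, itemgetter(0) silently grabs index 0 and ignores anything else in the tuple (again, made-up data):

    from operator import itemgetter

    p = [(1,), (2, 3)]                  # second tuple has an extra element
    print(list(map(itemgetter(0), p)))  # [1, 2]; the 3 is silently dropped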
On Performance
Some performance notes to address your concern about for loops, prompted by "my Teacher in university suggested me to avoid using for loop when it is possible to use 'built in' function cause they are built in C so they are really really faster".
A plain for loop that repeatedly appends to a list, like this:
    lst = []
    for x, in p:
        lst.append(x)
will lose out on performance due to the cost of repeatedly loading lst from the stack, looking up its append method, and calling it through generalized code paths. That said, as of 3.11 the interpreter gained a lot of cached and self-modifying/specializing bytecode, so the extra cost is much smaller than it used to be.
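If you're ever stuck with an explicit loop in a genuinely hot path, the classic (ugly) workaround for that method-lookup cost is to hoist the bound method out of the loop; a sketch, not a recommendation:

    lst = []
    append = lst.append  # cache the bound method so the loop body skips the repeated attribute lookup
    for x, in p:
        append(x)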
map with a built-in function implemented in C can win. But even before 3.11 the gains were small (when the result is unpacked to a list, typically no more than 10% faster than an equivalent listcomp, and only slightly better than that against the higher-CPU-overhead generator expression equivalent). As of 3.11, with the massive interpreter speed improvements, the differences have shrunk even further.
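Concretely, the spellings being compared there are (same result, at least for one-tuples, just different machinery):

    from operator import itemgetter

    # p is the list of one-tuples from above
    as_map      = list(map(itemgetter(0), p))  # map with a C-level callable
    as_listcomp = [x for x, in p]              # list comprehension
    as_genexpr  = list(x for x, in p)          # generator expression fed to list(); extra iterator overhead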
For a concrete example, here are the timings I get running IPython microbenchmarks on Python 3.11.6 on my local machine, building a 10K-element list of one-tuples and then converting it to a list of their contents with each of the three approaches given above, plus the explicit for loop I mentioned as being slower:
    >>> import random
    >>> from itertools import chain; from operator import itemgetter
    >>> %%timeit p = [(random.randrange(1000),) for _ in range(10000)]
    ... [x for x, in p]
    ...
    346 µs ± 2.07 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
    >>> %%timeit p = [(random.randrange(1000),) for _ in range(10000)]
    ... [*map(itemgetter(0), p)]
    ...
    344 µs ± 2.29 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
    >>> %%timeit p = [(random.randrange(1000),) for _ in range(10000)]
    ... [*chain.from_iterable(p)]
    ...
    491 µs ± 1.8 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
    >>> %%timeit p = [(random.randrange(1000),) for _ in range(10000)]
    ... lst = []
    ... for x, in p:
    ...     lst.append(x)
    ...
    388 µs ± 12.8 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
As you can see:
map with itemgetter(0) (both of which are C-level built-ins, though itemgetter could stand to be optimized a touch better for special cases like being given a single integer index) does win, but only barely: a mere 2 µs difference per loop, less than 1% of the total time (and the standard deviation is larger than the difference; in further runs map tended to win more often than not, but it wasn't a sure thing).
Using chain.from_iterable lost badly, despite being a built-in implemented entirely in C, presumably because chain assumes the sub-iterables can be anything and has no optimizations for tuples, let alone one-tuples (it constructs an iterator for each of them, pulls from it twice, with the second pull failing each time, and moves on to the next). Being a C built-in is no guarantee of speed if it can't specialize to the task at hand.
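For intuition, the roughly equivalent pure-Python version given in the itertools documentation makes that per-sub-iterable overhead visible:

    def from_iterable(iterables):
        # Rough pure-Python equivalent of chain.from_iterable (per the itertools docs):
        # a fresh inner iteration is set up for every sub-iterable, even a one-tuple,
        # which is exactly the per-element overhead described above.
        for it in iterables:
            for element in it:
                yield element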
While the plain for loop, without a listcomp involved, did lose, it only took about 12% longer than the listcomp and 13% longer than map. I'd favor the listcomp over the plain for loop for brevity and clarity, reserving the for loop for more complicated work, but speed-wise? It's not going to be your problem. Actually processing the data will almost certainly involve an order of magnitude or so more effort (you're getting this from a database which, unless it's an in-memory SQLite DB, involves disk or network access, either of which will be much slower than Python); unpacking the rows is going to be pretty much irrelevant to the overall performance of your code.
One final caution: don't flatten with sum using an empty tuple as the base; it's repeated concatenation, throwing away the old tuple each time to build new tuples at every step. It's O(n²) where it could be O(n).
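For reference, the pattern being warned against looks like this (shown only so you can recognize it; prefer chain.from_iterable from above):

    flattened = sum(p, ())  # starts from an empty tuple and re-concatenates at every step: O(n²) total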