I think you should be able to get your "Updating one point took about 15 minutes" down to a few seconds by using arcpy.env.extent.
With 8,000 points, the approach I am suggesting should complete within a day in the worst case, even if you write everything to disk (which I would do during initial testing); an in_memory workspace should trim that further.
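To put the "within a day" estimate into numbers, here is the plain arithmetic (no arcpy involved). The 3-second figure is just an illustrative "few seconds" value, not a measurement:

```python
# Runtime budget: how long can each point take if all 8,000 must finish in a day?
N_POINTS = 8000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s


def total_hours(seconds_per_point, n_points=N_POINTS):
    """Total wall-clock hours for n_points at a fixed per-point cost."""
    return seconds_per_point * n_points / 3600.0


budget_per_point = SECONDS_PER_DAY / N_POINTS

print(f"Budget per point for one day: {budget_per_point:.1f} s")   # 10.8 s
print(f"At 15 minutes per point: {total_hours(15 * 60) / 24:.0f} days")  # 83 days
print(f"At 3 seconds per point: {total_hours(3):.1f} hours")  # 6.7 hours
```

So anything under roughly 10 seconds per point keeps the whole run inside 24 hours.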
- arcpy.da.SearchCursor iterates your point feature class to read each point's coordinates and an identifier
- Select_analysis uses that identifier to copy the current point into a one-point feature class
- Set arcpy.env.extent to a rectangle extending, say, a tenth of a metre around the coordinate
- CopyFeatures_management copies out the OSMM features within the geoprocessing extent, i.e. almost always a single polygon, although if you occasionally strike a boundary you may get a few. This should take only a second or two; I frequently use this procedure on a 3.5 million polygon cadastre
- Intersect_analysis your one-point feature class with your one (or few) polygon feature class. If you do not need to transfer all of the polygons' attributes, reading and writing just the ones you need via cursors may speed this up too.
- Append_management your 8,000 intersected point feature classes back into a single feature class or, preferably, use arcpy.da.InsertCursor to do this part a lot faster.
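The steps above can be sketched in arcpy roughly as follows. This is a sketch under stated assumptions, untested against real OSMM data: it assumes arcpy with the `da` module (ArcGIS 10.1+ or ArcGIS Pro), that the points and OSMM polygons share a metre-based projected coordinate system, and that `out_fc` is in a file geodatabase. The function names `extent_around` and `intersect_points_with_polygons` and all paths are hypothetical. It takes the InsertCursor route for the final step rather than Append_management.

```python
import os


def extent_around(x, y, half_width=0.05):
    """Return (xmin, ymin, xmax, ymax) for a small square centred on (x, y).

    half_width=0.05 gives a box 0.1 units across, i.e. a tenth of a metre,
    assuming a metre-based projected coordinate system.
    """
    return (x - half_width, y - half_width, x + half_width, y + half_width)


def intersect_points_with_polygons(points_fc, polys_fc, out_fc,
                                   scratch="in_memory"):
    """Hypothetical driver: intersect each point with the polygon(s) under it."""
    import arcpy  # deferred so extent_around() can be tried without ArcGIS

    arcpy.env.overwriteOutput = True
    one_point = os.path.join(scratch, "one_point")
    local_polys = os.path.join(scratch, "local_polys")
    one_result = os.path.join(scratch, "one_result")

    oid_name = arcpy.Describe(points_fc).OIDFieldName
    oid_sql = arcpy.AddFieldDelimiters(points_fc, oid_name)

    arcpy.env.extent = "MAXOF"  # full extent while the point cursor opens
    insert_cursor = None
    fields = None
    try:
        with arcpy.da.SearchCursor(points_fc, ["OID@", "SHAPE@XY"]) as points:
            for oid, (x, y) in points:
                # Set the tiny extent first: Select, CopyFeatures and Intersect
                # all honour it, and the current point lies inside it.
                arcpy.env.extent = arcpy.Extent(*extent_around(x, y))
                arcpy.Select_analysis(points_fc, one_point,
                                      "{} = {}".format(oid_sql, oid))
                # Only the polygon(s) touching the tiny box are copied.
                arcpy.CopyFeatures_management(polys_fc, local_polys)
                arcpy.Intersect_analysis([one_point, local_polys],
                                         one_result, "ALL")

                if insert_cursor is None:
                    # The first result defines the output schema.
                    arcpy.CopyFeatures_management(one_result, out_fc)
                    fields = ["SHAPE@"] + [
                        f.name for f in arcpy.ListFields(one_result)
                        if f.type not in ("OID", "Geometry")]
                    insert_cursor = arcpy.da.InsertCursor(out_fc, fields)
                    continue
                # Later results are appended row by row, not via Append.
                with arcpy.da.SearchCursor(one_result, fields) as rows:
                    for row in rows:
                        insert_cursor.insertRow(row)
    finally:
        if insert_cursor is not None:
            del insert_cursor  # releases the lock on out_fc
        arcpy.env.extent = "MAXOF"
```

Setting the extent before Select_analysis (rather than after, as in the list) avoids the previous point's tiny extent filtering out the next point, since Select honours the extent too.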
All in all, focus on testing the CopyFeatures_management step (step 4) first: if that takes more than a few seconds per point, multiplying it by 8,000 becomes an issue.
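One way to test that step in isolation is to time it on a handful of points and extrapolate. The helper below is a hypothetical, pure-Python sketch: `step` would be your own function that sets arcpy.env.extent and runs CopyFeatures_management for one coordinate; here a `time.sleep` stand-in is used so it runs anywhere.

```python
import time


def estimate_total_hours(step, sample_args, n_total=8000):
    """Time step(*args) for each sample and extrapolate to n_total calls.

    Returns (mean seconds per call, projected total hours).
    """
    if not sample_args:
        raise ValueError("need at least one sample")
    durations = []
    for args in sample_args:
        start = time.perf_counter()
        step(*args)
        durations.append(time.perf_counter() - start)
    mean_seconds = sum(durations) / len(durations)
    return mean_seconds, mean_seconds * n_total / 3600.0


# Stand-in for the real per-point CopyFeatures step.
def fake_step(x, y):
    time.sleep(0.01)


mean_s, hours = estimate_total_hours(fake_step, [(0, 0), (1, 1), (2, 2)])
print(f"{mean_s:.3f} s per point -> about {hours:.1f} h for 8,000 points")
```

Pick sample points from different parts of the dataset (dense urban areas as well as rural ones) so the mean is representative.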
Take care to turn arcpy.env.extent back to "MAXOF" once you have finished processing.
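If you would rather not rely on remembering, a small context manager can guarantee the extent is restored even when a geoprocessing call fails. This is a hypothetical helper, not part of arcpy: `env` is meant to be `arcpy.env`, and it assumes whatever `arcpy.env.extent` returns can be assigned back to it (I believe this holds, but check on your version). A stand-in object is used below so it runs without ArcGIS.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def temporary_extent(env, extent):
    """Set env.extent for the duration of a with-block, then restore it.

    `env` is expected to be arcpy.env (assumption); any object with an
    `extent` attribute works, which makes it testable without ArcGIS.
    """
    previous = env.extent
    env.extent = extent
    try:
        yield env
    finally:
        # Runs even if a tool raises, so the extent is never left stuck
        # on a tiny rectangle.
        env.extent = previous


# Demo with a stand-in for arcpy.env.
fake_env = SimpleNamespace(extent="MAXOF")
with temporary_extent(fake_env, "0 0 1 1"):
    print(fake_env.extent)  # 0 0 1 1
print(fake_env.extent)  # MAXOF
```

In a real script this would wrap the per-point CopyFeatures_management call, e.g. `with temporary_extent(arcpy.env, arcpy.Extent(xmin, ymin, xmax, ymax)): ...`.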