I'm building a neural network model to predict which student in a class will achieve the highest score on an upcoming exam. (This isn't my actual task; I modified it to maintain confidentiality, but it's very similar to what I'm working on.) Each example consists of the following structured features:
1. Per-Student Features (N × M features per example)
Each class has N students, and for each student we track:
[Student ID, Attendance Rate, Homework Completion, Participation Score, Study Hours, Previous Exam Score, Quiz Average, Extra Credit, Stress Level]
Since there are N students in the class, and each student has M attributes, this results in N × M features.
2. Class-Level Features (K features per example)
The entire class has shared attributes that might impact all students, such as:
[Class ID, Average GPA of Class, Class Size, Instructor Experience, Subject Difficulty]
This results in K features per class.
3. Exam-Level Features (L features per example)
The specific exam being analyzed has attributes like:
[Exam Difficulty, Exam Length, Number of Questions, Type of Questions (MCQ vs Essay)]
This results in L features per exam.
4. Target Label
The goal is to predict which student will achieve the highest score on the exam.
This is represented as a one-hot encoded vector of size N, where the position of the top-scoring student is marked as 1.
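To make the shapes concrete, here's roughly how I'd lay out one batch as tensors (the specific numbers are arbitrary placeholders; I've used M = 9, K = 5, L = 4 to match the feature lists above):

```python
import torch

B, N, M, K, L = 32, 30, 9, 5, 4     # batch size, students per class, feature counts

students = torch.randn(B, N, M)      # per-student features
class_feats = torch.randn(B, K)      # class-level features, shared within an example
exam_feats = torch.randn(B, L)       # exam-level features
top_idx = torch.randint(0, N, (B,))  # index of the top-scoring student
target = torch.nn.functional.one_hot(top_idx, N)  # equivalent one-hot vector of size N
```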
My Question:
Given these structured inputs, what is the best way to represent them in a neural network?
- Should I concatenate all features into a single flat vector (N × M + K + L features)? Since student features are relative to the class and exam, does it make sense to flatten everything into one input vector?
- Would it be beneficial to treat per-student features as a sequence of shape (N, M) instead of concatenating everything?
- Could an LSTM (even though order doesn't matter here) or a Transformer help better model dependencies between students?
- Should I use separate encoders for per-student, class-level, and exam-level features, then merge them later? For example, a feedforward network for class/exam features and an LSTM/Transformer for per-student features before concatenation (a rough sketch of this option is below).

Has anyone worked with structured data like this before? What are the best practices for feeding this type of input into a neural network?
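To make that last option concrete, here's the kind of thing I have in mind in PyTorch. Everything here is a placeholder I made up for illustration (the class name, d_model=64, nhead=4, num_layers=2, and the layer sizes), not a settled design. Since student order is meaningless, I've left out positional encodings so the Transformer stays permutation-equivariant over students:

```python
import torch
import torch.nn as nn

class TopStudentPredictor(nn.Module):
    def __init__(self, m_student, k_class, l_exam, d_model=64):
        super().__init__()
        # Per-student encoder: maps each student's M features to d_model.
        self.student_enc = nn.Linear(m_student, d_model)
        # Context encoder: class- and exam-level features, shared by all students.
        self.context_enc = nn.Sequential(
            nn.Linear(k_class + l_exam, d_model), nn.ReLU()
        )
        # Transformer encoder WITHOUT positional encodings, so student order
        # carries no information and the model is permutation-equivariant.
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.interactions = nn.TransformerEncoder(layer, num_layers=2)
        # Scoring head: one logit per student.
        self.score = nn.Linear(d_model, 1)

    def forward(self, students, class_feats, exam_feats):
        # students: (B, N, M); class_feats: (B, K); exam_feats: (B, L)
        context = self.context_enc(torch.cat([class_feats, exam_feats], dim=-1))
        h = self.student_enc(students) + context.unsqueeze(1)  # broadcast to (B, N, d_model)
        h = self.interactions(h)            # students attend to each other
        return self.score(h).squeeze(-1)    # (B, N) logits, one per student

# The one-hot target is equivalent to a class index over the N students,
# so this can be trained with ordinary cross-entropy over the logits:
model = TopStudentPredictor(m_student=9, k_class=5, l_exam=4)
students = torch.randn(8, 30, 9)            # 8 classes, 30 students each
class_feats = torch.randn(8, 5)
exam_feats = torch.randn(8, 4)
logits = model(students, class_feats, exam_feats)
loss = nn.functional.cross_entropy(logits, torch.randint(0, 30, (8,)))
```

The part I'm least sure about is whether self-attention between students actually buys anything here over just encoding each student independently with the shared context added, so pointers on that would be especially welcome.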