
I'm building a neural network model to predict which student in a class will achieve the highest score on an upcoming exam. (This isn't my actual task; I've modified it to maintain confidentiality, but it's very similar to what I'm working on.) Each example consists of the following structured features:

1. Per-Student Features (N × M features per example). Each class has N students, and for each student we track:

[Student ID, Attendance Rate, Homework Completion, Participation Score, Study Hours, Previous Exam Score, Quiz Average, Extra Credit, Stress Level]

Since there are N students in the class and each student has M attributes, this results in N × M features.

2. Class-Level Features (K features per example). The entire class has shared attributes that might impact all students, such as:

[Class ID, Average GPA of Class, Class Size, Instructor Experience, Subject Difficulty]

This results in K features per class.

3. Exam-Level Features (L features per example). The specific exam being analyzed has attributes like:

[Exam Difficulty, Exam Length, Number of Questions, Type of Questions (MCQ vs Essay)]

This results in L features per exam.

4. Target Label: The goal is to predict the student who will achieve the highest score on the exam.

This is represented as a one-hot encoded vector of size N, where the student with the highest score is marked as 1.
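For concreteness, here's roughly how I picture the shapes of one training example (N = 30 is arbitrary; M, K, L just mirror the feature lists above; the target index is made up):

```python
import torch

N, M, K, L = 30, 9, 5, 4  # students, per-student, class-level, exam-level features

student_feats = torch.randn(N, M)  # one row per student
class_feats = torch.randn(K)       # shared by the whole class
exam_feats = torch.randn(L)        # shared by the whole exam
target = torch.zeros(N)            # one-hot: 1 at the top-scoring student's index
target[7] = 1.0                    # e.g. student 7 scored highest
```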

My Question:

Given these structured inputs, what is the best way to represent them in a neural network?

Should I concatenate all features into a single vector (N × M + K + L features)?

Since student features are relative to the class and exam, does it make sense to flatten everything into one input vector? Or would it be beneficial to treat the per-student features as a matrix of shape (N, M) instead of concatenating everything? (A sketch of the flat option is below.)
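To pin down what I mean by the flat option, here's a minimal sketch (the hidden size is just a placeholder):

```python
import torch
import torch.nn as nn

class FlatBaseline(nn.Module):
    """Flatten everything into one (N*M + K + L)-vector, predict N-way."""
    def __init__(self, n_students, m, k, l, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_students * m + k + l, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_students),  # one logit per student
        )

    def forward(self, student_feats, class_feats, exam_feats):
        # student_feats: (B, N, M), class_feats: (B, K), exam_feats: (B, L)
        x = torch.cat([student_feats.flatten(1), class_feats, exam_feats], dim=1)
        return self.net(x)  # (B, N) logits, trained with CrossEntropyLoss
```

My worry with this is that it hard-wires the ordering of the student slots, so the network would have to relearn the same per-student patterns separately for every slot.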

Could a Transformer, or an LSTM (though order doesn't matter here), help model dependencies between students? Should I use separate encoders for per-student, class-level, and exam-level features, then merge them later?

For example: a feedforward network for the class/exam features and an LSTM/Transformer over the per-student features, concatenated before a final head. Has anyone worked with structured data like this before? What are the best practices for feeding this type of input into a neural network?
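Here's a rough sketch of the separate-encoder idea as I currently understand it: a weight-shared encoder applied to each student, the class/exam context broadcast to every student, optional self-attention so students can interact, and one logit per student (the layer sizes and the choice of a Transformer layer are just placeholders):

```python
import torch
import torch.nn as nn

class SetScorer(nn.Module):
    """Shared per-student encoder + broadcast class/exam context + per-student logit."""
    def __init__(self, m, k, l, d=64):
        super().__init__()
        self.student_enc = nn.Sequential(nn.Linear(m, d), nn.ReLU())
        self.context_enc = nn.Sequential(nn.Linear(k + l, d), nn.ReLU())
        # Self-attention over students; no positional encoding is added,
        # so the layer stays order-agnostic.
        self.attn = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
        self.score = nn.Linear(d, 1)

    def forward(self, student_feats, class_feats, exam_feats):
        # student_feats: (B, N, M), class_feats: (B, K), exam_feats: (B, L)
        h = self.student_enc(student_feats)                              # (B, N, d)
        ctx = self.context_enc(torch.cat([class_feats, exam_feats], 1))  # (B, d)
        h = h + ctx.unsqueeze(1)   # broadcast shared context to every student
        h = self.attn(h)           # let students attend to each other
        return self.score(h).squeeze(-1)  # (B, N) logits, softmax over the class
```

If I understand correctly, since no positional encoding is used, this should be permutation-equivariant over students, which seems to match the fact that student order carries no meaning here. Is this the right way to think about it?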
