I am following this tutorial for language detection using machine learning. The dataset I am using, however, has multiple feature variables: message, fingers, and tail. In place of the tutorial's X = data["Text"], I tried X = df["message", "fingers", "tail"], but it throws a KeyError:
Traceback (most recent call last):
  File "C:\Users\usr\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\indexes\base.py", line 3805, in get_loc
    return self._engine.get_loc(casted_key)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "index.pyx", line 167, in pandas._libs.index.IndexEngine.get_loc
  File "index.pyx", line 196, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\\_libs\\hashtable_class_helper.pxi", line 7081, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\\_libs\\hashtable_class_helper.pxi", line 7089, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: ('message', 'fingers', 'tail')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:\Users\usr\Downloads\thecode.py", line 13, in <module>
    X = df["message", "fingers", "tail"]
        ~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\usr\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\frame.py", line 4102, in __getitem__
    indexer = self.columns.get_loc(key)
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\usr\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\indexes\base.py", line 3812, in get_loc
    raise KeyError(key) from err
KeyError: ('message', 'fingers', 'tail')

How should I change the code so that it uses all three features without throwing errors?
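The KeyError comes from the single pair of brackets. df["message", "fingers", "tail"] asks pandas for one column whose label is the tuple ('message', 'fingers', 'tail'), which is exactly the key shown in the traceback. Putting a list inside the brackets (df[["message", "fingers", "tail"]]) selects several columns and returns a DataFrame. A minimal sketch, using a small made-up DataFrame in place of the real dataset (the values and the "label" column name are hypothetical):

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset
df = pd.DataFrame({
    "message": ["hello", "bonjour", "hola"],
    "fingers": [5, 4, 5],
    "tail": [0, 1, 0],
    "label": ["en", "fr", "es"],
})

features = ["message", "fingers", "tail"]

# Single brackets with several names: pandas looks up ONE key,
# the tuple ('message', 'fingers', 'tail'), which is not a column
try:
    df["message", "fingers", "tail"]
except KeyError as err:
    print("KeyError:", err)

# Double brackets: the inner list selects several columns as a DataFrame
X = df[features]
y = df["label"]
print(X.shape, y.shape)
```

Keeping the column names in a list variable such as features also makes it easy to reuse the same selection elsewhere.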
With X = df[["message", "fingers", "tail"]] and then x_train, x_test, y_train, y_test = train_test_split(X, y, test_size = 0.20), it says it found input variables with inconsistent numbers of samples: [3, 500]. Could you help with that too?

X and y are not the same length. From the error message, one of your variables has 3 rows and the other has 500.

X, the entire data frame, and y, a vector of labels, don't have the same length. For example, it looks like your data frame X only has 3 rows and you have 500 labels in y. I'd check your code to make sure you haven't accidentally sliced your data frame with something like df = df.head() or df = df.dropna() before defining X (and that your data frame is complete).
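scikit-learn reports the lengths in the order the arguments were passed, so [3, 500] means whatever was passed as X has 3 rows. One possible cause, offered only as a guess because the question doesn't show that step: if X is then fed to a text vectorizer such as scikit-learn's CountVectorizer (as many text-classification tutorials do), iterating over a DataFrame yields its column names, so the vectorizer builds one row per column (3) rather than one per sample. The sketch below, on hypothetical toy data, reproduces that mismatch, then vectorizes only the text column, appends the numeric columns, and checks the lengths before splitting:

```python
import pandas as pd
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Hypothetical toy data; the real dataset has 500 rows
df = pd.DataFrame({
    "message": ["hello there", "bonjour tout le monde", "hola amigo",
                "good morning", "merci beaucoup", "buenos dias"],
    "fingers": [5, 4, 5, 3, 4, 5],
    "tail": [0, 1, 0, 1, 1, 0],
    "label": ["en", "fr", "es", "en", "fr", "es"],
})

X = df[["message", "fingers", "tail"]]
y = df["label"]

# Pitfall: iterating a DataFrame yields its column names, so the
# vectorizer sees 3 "documents" instead of one per row
bad = CountVectorizer().fit_transform(X)
print(bad.shape[0], len(y))  # 3 vs 6: train_test_split would reject this

# Vectorize only the text column, then append the numeric columns
cv = CountVectorizer()
text_features = cv.fit_transform(X["message"])
numeric_features = csr_matrix(X[["fingers", "tail"]].to_numpy())
X_all = hstack([text_features, numeric_features]).tocsr()

# Sanity check: features and labels must have the same number of rows
assert X_all.shape[0] == len(y), (X_all.shape, len(y))

x_train, x_test, y_train, y_test = train_test_split(
    X_all, y, test_size=0.20, random_state=0
)
print(x_train.shape, x_test.shape)
```

Stacking by hand keeps the example short; scikit-learn's ColumnTransformer is a tidier alternative when the preprocessing needs to live inside a pipeline.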