How to convert generated data into a pandas DataFrame

Question

from sklearn.datasets import make_classification
df = make_classification(n_samples=10000, n_features=9, n_classes=1, random_state = 18, class_sep=2, n_informative=4)

After creating the data, I get a tuple, which I then try to convert into a pandas DataFrame:

 df = pd.DataFrame(data, columns=["1","2","3","4","5","6","7","8","9"])

I have 9 features (columns), but when I try to pass 9 column names it says:

ValueError: Shape of passed values is (2, 1), indices imply (2, 9)

Basically, I want to generate data and convert it into a pandas DataFrame, but I could not get it to work.

jhmt · Accepted Answer · 2021-04-26 12:18:57Z

The first entry of the tuple contains the feature data and the second entry contains the class labels. So if you want to make a pd.DataFrame of the feature data, you should use pd.DataFrame(df[0], columns=["1","2","3","4","5","6","7","8","9"]).
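A minimal sketch of that unpacking, assuming the default two-element return value of make_classification. The names X, y and features, the smaller n_samples, and the default n_classes are illustrative choices and not taken from the question:

```python
import pandas as pd
from sklearn.datasets import make_classification

# make_classification returns (features, labels); unpack both instead of
# passing the whole tuple to pd.DataFrame.
X, y = make_classification(
    n_samples=100,      # smaller than the question's 10000, for illustration
    n_features=9,
    n_informative=4,
    class_sep=2,
    random_state=18,
)

columns = ["1", "2", "3", "4", "5", "6", "7", "8", "9"]
features = pd.DataFrame(X, columns=columns)  # X is 2-D: samples x features
print(features.shape)
```

Unpacking into two names keeps the labels available for later use, whereas indexing with df[0] alone leaves them inside the tuple.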

2er0 · Accepted Answer · 2021-04-26 12:21:24Z

make_classification returns a tuple with two NumPy arrays. Just use the first element of that tuple.

Have a look at the return type in the Sklearn documentation.
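The return type can also be checked at runtime. The sketch below is only an illustration: the variable names and the reduced n_samples are assumptions, not part of the question.

```python
from sklearn.datasets import make_classification

result = make_classification(n_samples=100, n_features=9,
                             n_informative=4, random_state=18)

# The result is a plain tuple with two entries.
print(type(result), len(result))

X, y = result
print(X.shape)  # one row per sample, one column per feature
print(y.shape)  # one label per sample
```

The tuple has length 2, which appears consistent with the "2" in the shape reported by the ValueError: pandas was handed the two-element tuple rather than the 9-column feature array.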

import pandas as pd
pd.DataFrame(df[0])

result:

             0         1         2  ...         6         7         8
0     1.223113 -1.962002 -0.288322  ... -2.152126  1.563291  2.790191
1    -0.239416 -3.782512 -1.587514  ... -0.519075  1.218147 -0.543413
2    -1.275076 -1.354999 -1.030673  ... -0.866303  1.915653  2.526826
3    -0.516765 -2.098868 -1.034506  ...  0.470277  1.917153  0.849975
4    -0.893197 -2.489030  1.012410  ...  3.562431  2.806255 -2.825570
...        ...       ...       ...  ...       ...       ...       ...
9995 -1.665167 -1.106121 -0.381195  ...  0.543236  2.406625  2.216029
9996 -0.783265 -1.405607  0.257606  ... -0.251951  2.167685  2.461260
9997  2.341676 -3.382589 -0.120150  ...  0.066099  2.453412 -0.758382
9998 -0.662257 -1.531187 -0.709562  ...  0.156203  2.495238  2.452315
9999 -0.756892 -4.895147 -0.385215  ...  0.898117  2.624591 -2.188389

Plus: There is a mismatch between the import and the usage:

!!! from sklearn.datasets import make_regression !!!
df = make_classification(…)

Parham Hassani · Accepted Answer · 2024-06-13 05:05:58Z

make_classification from sklearn.datasets returns a tuple with two arrays. The first entry of this tuple holds the feature values and the second holds the class label for each sample. This code converts the data into a pandas DataFrame.

from sklearn.datasets import make_classification
import pandas as pd

df = make_classification(n_samples=2, n_features=3, n_redundant=0,
                         n_clusters_per_class=1, weights=[0.99],
                         flip_y=0, random_state=1)
df_list = list(df[1])
df = pd.DataFrame(df[0], columns=['Item1', 'Item2', 'Item3'])
df = df.assign(Results=df_list)

This is the result of this code:

index	Item1	Item2	Item3	Results
0	0.4542479646686235	-0.49801066491099416	1.3198274314649199	0
1	1.2497447542635314	-1.0534352345831717	0.5920838260510906	0
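A variant of the same idea, sketched under the assumption that the column names can be generated from the array width instead of typed out. The names X, y, feature_names and frame, and the sample count, are hypothetical:

```python
import pandas as pd
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=50, n_features=3, n_redundant=0,
                           n_clusters_per_class=1, random_state=1)

# Build column names from the number of feature columns.
feature_names = [f"Item{i + 1}" for i in range(X.shape[1])]

frame = pd.DataFrame(X, columns=feature_names)
frame["Results"] = y  # 1-D label array, assigned row by row
print(frame.head())
```

Assigning the label array directly as a new column avoids the intermediate list and keeps the feature matrix and the labels under separate names.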

How to Ask says no pictures when posting code. How to Answer probably as well! Please paste the output of print(df) instead.

user12188405 · Accepted Answer · 2021-04-26 12:23:27Z

df = make_classification(n_samples=10000, n_features=9, n_classes=1, random_state = 18, class_sep=2, n_informative=4)

This line returns a tuple in which the first entry holds the feature values ('X') and the second entry holds the target values.

So, to make it a pandas DataFrame, you have to index the tuple like this:

df = pd.DataFrame(df[0], columns=["1","2","3","4","5","6","7","8","9"])

Full Code:

from sklearn.datasets import make_classification
import pandas as pd

df = make_classification(n_samples=10000, n_features=9, n_classes=1, random_state=18, class_sep=2, n_informative=4)
df = pd.DataFrame(df[0], columns=["1","2","3","4","5","6","7","8","9"])
print(df)

