$\begingroup$

I'm trying to manipulate some data in Biolabs Orange, using the built-in Python Script widget and the information in the Biolabs Orange tutorial on scripting.

However, I'm struggling to take the results of SMOTE and put them into a format that Orange accepts.

This is the code I'm using in the Python Script widget:

    # Get libraries
    import Orange
    import numpy as np
    from Orange.data import Domain, Table
    from imblearn.over_sampling import SMOTE

    #in_data = Orange.data.Table('WORKING_temp.csv')
    df = in_data.copy()

    # set variables for SMOTE
    sm = SMOTE(random_state=42)

    # get table of data (X) and class variables (y)
    X, y = df.X, df.Y

    # resample data and classes
    X_res, y_res = sm.fit_sample(X, y)

    df.X = X_res
    df.Y = y_res

    temp = Orange.data.Table(df.X, df.Y)
    temp.domain = df.domain

    out_data = Orange.data.Table(temp)

The result is a ValueError, which I think comes from changing the length of the data table and class variables while the original index keeps its old length:

"ValueError: could not broadcast input array from shape (3724,10) into shape (3724)" 
$\endgroup$

2 Answers

$\begingroup$

With some help from the Orange team, I was able to solve the problem! Link to thread on GitHub

The following code can be pasted into a Python Script widget, and will balance your classes before training a classifier in Orange:

    # Get libraries
    import Orange
    import numpy as np
    from Orange.data import Domain, Table
    from imblearn.over_sampling import SMOTE

    # How Orange passes data to widget
    df = in_data.copy()

    # set variables for SMOTE
    sm = SMOTE(random_state=42)

    # get table of data (X) and class variables (y)
    X, y = df.X, df.Y

    # resample data and classes
    X_res, y_res = sm.fit_sample(X, y)

    # Get the target and feature variables
    d = Domain(df.domain.attributes, df.domain.class_vars)

    # Create a new Orange Table object with the appropriate headers
    # This is how Orange passes the data on to the next widget
    out_data = Orange.data.Table(d, X_res, y_res)
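To confirm that the resampling really balanced the classes, it can help to count the labels before and after SMOTE. The sketch below is only an illustration: `class_counts` is a hypothetical helper (not part of Orange or imblearn), and the toy lists stand in for `df.Y` and `y_res`, assuming both are one-dimensional label sequences.

```python
from collections import Counter


def class_counts(labels):
    """Count how many rows carry each class label (hypothetical helper)."""
    return dict(Counter(labels))


# Toy labels standing in for df.Y (before SMOTE) and y_res (after SMOTE).
y_before = [0, 0, 0, 0, 1]
y_after = [0, 0, 0, 0, 1, 1, 1, 1]

counts_before = class_counts(y_before)
counts_after = class_counts(y_after)
print("before:", counts_before)
print("after: ", counts_after)
```

Inside the Python Script widget, the same idea would presumably be `print(class_counts(y), class_counts(y_res))` placed right after the resampling line.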
$\endgroup$
  • $\begingroup$ Thanks, that helps me too. I appreciate you looping back and posting the Orange team's help here! (Interestingly, the scikit-learn used by my updated Orange installation 13.3 was an outdated 0.18, so I had to upgrade to sklearn 0.19.1 within the Orange installation first, and then install the latest imblearn package, before SMOTE as used in the script would work.) $\endgroup$ Commented May 4, 2018 at 1:25
  • $\begingroup$ This answer no longer works with the current version of imblearn $\endgroup$ Commented May 28, 2019 at 2:37
$\begingroup$

This works with the latest version of imblearn; the only change is that `fit_sample` is now `fit_resample`. You do have to install imblearn first, though, from Orange's Add-ons dialog using the "Add more..." option.

    # Get libraries
    import Orange
    import numpy as np
    from Orange.data import Domain, Table
    from imblearn.over_sampling import SMOTE

    # How Orange passes data to widget
    df = in_data.copy()

    # set variables for SMOTE
    sm = SMOTE(random_state=42)

    # get table of data (X) and class variables (y)
    X, y = df.X, df.Y

    # resample data and classes
    X_res, y_res = sm.fit_resample(X, y)

    # Get the target and feature variables
    d = Domain(df.domain.attributes, df.domain.class_vars)

    # Create a new Orange Table object with the appropriate headers
    # This is how Orange passes the data on to the next widget
    out_data = Orange.data.Table(d, X_res, y_res)
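If the same script has to run on Orange installations with different imblearn versions, one option is to pick whichever method the sampler actually has. This is a minimal sketch, not something from the Orange team's thread: `resample` is a hypothetical helper, and the two stub classes are stand-ins for an older and a newer `SMOTE` object so the example runs without imblearn installed.

```python
def resample(sampler, X, y):
    """Call fit_resample when the sampler has it, otherwise fall back to fit_sample."""
    method = getattr(sampler, "fit_resample", None)
    if method is None:
        # Older samplers only expose fit_sample; raises AttributeError if neither exists.
        method = getattr(sampler, "fit_sample")
    return method(X, y)


class OldStyleSampler:
    """Stand-in for a sampler that only has fit_sample (hypothetical)."""

    def fit_sample(self, X, y):
        return X, y


class NewStyleSampler:
    """Stand-in for a sampler that only has fit_resample (hypothetical)."""

    def fit_resample(self, X, y):
        return X, y


old_result = resample(OldStyleSampler(), [[1.0, 2.0]], [0])
new_result = resample(NewStyleSampler(), [[1.0, 2.0]], [0])
print(old_result, new_result)
```

In the widget, the resampling line would then read `X_res, y_res = resample(sm, X, y)`, with the rest of the script unchanged.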
$\endgroup$
