# MemoryError on clf.fit(X) with a ~500k-row dataset
# I cannot explore the data if I cannot create a pickle, so how do I process this data set in batches and get accurate results?
###################################################
import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn.cluster import MeanShift

df = pd.read_csv('uberunixtime.csv')
df.drop(['Base', 'DateTime'], axis=1, inplace=True)
df.convert_objects(convert_numeric=True).dtypes
df.dropna(inplace=True)
df['timeSeconds'] = df['timeSeconds'] / 10
X = np.array(df)
X = preprocessing.scale(X)
clf = MeanShift()
clf.fit(X)  # MemoryError raised here
It seems you're not seeing any deprecation warnings: in newer versions of pandas, convert_objects/convert_numeric is deprecated (I presume partly because of memory issues).
If you're not seeing a deprecation warning, upgrade pandas:
pip freeze > freeze.txt   # snapshot your current package versions first
pip install --upgrade pandas
Then re-run your Python file and you'll see the deprecation warnings.
Instead of convert_objects(convert_numeric=True), try pd.to_numeric(). You only need timeSeconds converted to a numeric type, and converting one column instead of every column is one clearly visible memory optimization.
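A minimal sketch of that change, assuming the column names from your snippet; errors='coerce' turns unparseable values into NaN, so your existing dropna still removes them:

import pandas as pd

df = pd.read_csv('uberunixtime.csv')
df.drop(['Base', 'DateTime'], axis=1, inplace=True)
# convert only the one column that needs it, not the whole frame
df['timeSeconds'] = pd.to_numeric(df['timeSeconds'], errors='coerce')
df.dropna(inplace=True)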
To find other memory issues, try a memory profiler; it will tell you which line numbers are hogging resources. memory_profiler and line_profiler are the two profilers I use when a bottleneck needs resolving (see the sketch after the install commands).
pip install memory_profiler
pip install line_profiler
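A minimal usage sketch for memory_profiler, assuming your script is saved as cluster.py (the function name load_and_cluster is just for illustration). Decorating a function with @profile and running the script under the profiler prints per-line memory usage:

import numpy as np
import pandas as pd
from memory_profiler import profile
from sklearn import preprocessing
from sklearn.cluster import MeanShift

@profile  # memory_profiler reports memory use line by line for this function
def load_and_cluster():
    df = pd.read_csv('uberunixtime.csv')
    df.drop(['Base', 'DateTime'], axis=1, inplace=True)
    df['timeSeconds'] = pd.to_numeric(df['timeSeconds'], errors='coerce')
    df.dropna(inplace=True)
    X = preprocessing.scale(np.array(df))
    MeanShift().fit(X)

if __name__ == '__main__':
    load_and_cluster()

Run it with python -m memory_profiler cluster.py and look for the lines with the largest memory increments.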