#8: ChatGPT Code Interpreter for Data Science Workflow
AI-Assisted Programming using Cursor IDE and intelligent code completion tools!
Most existing LLMs that can generate text can also generate code, but code-focused models are further trained and fine-tuned on code for better performance. So today, let’s look at several use cases these LLMs have opened up in programming.
⚙️ Tool: Mintlify - Code Documentation AI
Mintlify is a free extension that you can install in popular IDEs such as VS Code and PyCharm.
Let’s say you want to document a function you have written. Select the whole function and a “generate docs” prompt appears. Click it and, boom, Mintlify generates documentation for that particular function. Start automating documentation for your codebase today!
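Mintlify's exact output depends on the language and the docstring format you pick. So treat the snippet below as a hand-written illustration, not real Mintlify output. It shows a hypothetical `moving_average` function carrying the kind of Google-style docstring a documentation tool aims to produce. Whatever the tool generates, check it against what the code actually does before you commit it.

```python
def moving_average(values: list[float], window: int) -> list[float]:
    """Compute the simple moving average of a sequence.

    Args:
        values: The numbers to average.
        window: Number of consecutive values in each average; must be >= 1.

    Returns:
        A list of len(values) - window + 1 averages, one per full window.

    Raises:
        ValueError: If window is smaller than 1 or larger than len(values).
    """
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    # Sum the first full window, then slide it forward one element at a time.
    current = sum(values[:window])
    averages = [current / window]
    for i in range(window, len(values)):
        current += values[i] - values[i - window]
        averages.append(current / window)
    return averages
```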
😇 Today’s Recipe: ChatGPT Code Interpreter for Data Science Workflow
Code Interpreter is an experimental version of ChatGPT that has access to a Python interpreter in a sandboxed environment. It lets you upload files and perform operations on them. Because it can write and run Python, it improves ChatGPT’s logical and analytical abilities and even lets you develop simple games or software tools.
Today, we will use Code Interpreter for a data science workflow. Note that Code Interpreter is available only to ChatGPT Plus subscribers. Below is the prompt formula we will be using:
I want you to act as a data analyst. You are performing data analysis on the XYZ dataset. Perform the following task: Task1
{ Add more instructions if needed }
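If you reuse this formula often, it helps to keep it as a template. Here is a minimal Python sketch. `TEMPLATE` and `build_prompt` are hypothetical names and are not part of ChatGPT or any library. The dataset name in the usage line is only a placeholder.

```python
from typing import Optional

# Hypothetical template mirroring the prompt formula above.
TEMPLATE = (
    "I want you to act as a data analyst. "
    "You are performing data analysis on the {dataset} dataset. "
    "Perform the following task: {task}"
)


def build_prompt(dataset: str, task: str, extra: Optional[str] = None) -> str:
    """Return the prompt for one task, optionally followed by extra instructions."""
    prompt = TEMPLATE.format(dataset=dataset, task=task)
    if extra:
        # Extra instructions go on their own line, after the task.
        prompt += "\n" + extra
    return prompt


# Placeholder dataset name, for illustration only.
print(build_prompt("housing", "Provide a simple description of the columns in this dataset."))
```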
The 5 tasks below follow the data science workflow:
Data Preparation → Exploration and visualization (EDA) → Data Storytelling
(These tasks must be followed sequentially. Note also that several of the prompts explicitly ask ChatGPT not to execute anything yet. This is where your domain expertise comes into play: rather than treating Code Interpreter as an autonomous agent, treat it as a complement to your own data science proficiency.)
Provide a simple description of the columns in this dataset.
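Code Interpreter answers a request like this by writing and running pandas code in its sandbox. Knowing roughly what that code looks like makes its answer easier to check. The minimal sketch below assumes the uploaded file has already been loaded into a DataFrame, for example with `pd.read_csv`. `describe_columns` is a hypothetical helper name.

```python
import pandas as pd


def describe_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Summarise each column: dtype, non-null count, missing count, distinct values."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "non_null": df.notna().sum(),
            "missing": df.isna().sum(),
            "unique": df.nunique(),  # NaN is not counted as a distinct value
        }
    )
```

For numeric columns, `df.describe()` adds the count, mean, standard deviation and quartiles on top of this.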
Identify data cleaning issues and bring them to my attention. Do not perform any data cleaning tasks without further instructions. Identify other ways we can improve the quality of the dataset.
{ I would like you to handle missing values as follows: drop the rows where data in columns X and Y is missing. For column Z, mark missing values as “NaN”. }
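Here is a similar pandas sketch of the cleaning step, useful for sanity-checking what Code Interpreter reports and does. It assumes columns named `X`, `Y` and `Z`, as in the prompt. `report_issues` and `handle_missing` are hypothetical helpers. Treating blank strings in `Z` as missing is our own reading of "mark missing values as NaN". By default, `dropna(subset=...)` drops a row if *either* X or Y is missing. Pass `how="all"` instead if you only want to drop rows where both are missing.

```python
import numpy as np
import pandas as pd


def report_issues(df: pd.DataFrame) -> dict:
    """Collect common data-cleaning issues without modifying df."""
    return {
        "missing_per_column": df.isna().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
        # Columns holding a single value (NaN included) carry no information.
        "constant_columns": [c for c in df.columns if df[c].nunique(dropna=False) <= 1],
    }


def handle_missing(df: pd.DataFrame, required=("X", "Y"), placeholder_col="Z") -> pd.DataFrame:
    """Drop rows missing a required column; turn blank strings in placeholder_col into NaN."""
    # how="any" (the default): a row is dropped if any required column is missing.
    cleaned = df.dropna(subset=list(required)).copy()
    # Assumption: empty or whitespace-only strings also count as missing in placeholder_col.
    cleaned[placeholder_col] = cleaned[placeholder_col].replace(r"^\s*$", np.nan, regex=True)
    return cleaned
```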
Next, I would like you to handle outliers in columns X and Y. Can you please provide strategies for dealing with them? Do not perform any tasks without further instructions.
{ I would like you to winsorize the columns, replacing the outliers with the 99th-percentile value. Can you perform this, please? }
{ Can you visualize the newly created columns in a boxplot? }
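Winsorizing caps extreme values instead of deleting rows. The sketch below follows the prompt: it caps only the upper tail at the 99th percentile. Textbook winsorizing often caps both tails, for example at the 1st and 99th percentiles. The results go into new columns so the boxplot can compare them with the originals. `winsorize_upper`, `plot_boxplots` and the `_winsorized` suffix are hypothetical names, and plotting requires matplotlib to be installed.

```python
import pandas as pd


def winsorize_upper(df: pd.DataFrame, columns, upper: float = 0.99,
                    suffix: str = "_winsorized") -> pd.DataFrame:
    """Return a copy with new columns whose values are capped at the `upper` quantile."""
    out = df.copy()
    for col in columns:
        cap = out[col].quantile(upper)  # e.g. the 99th percentile of this column
        out[col + suffix] = out[col].clip(upper=cap)
    return out


def plot_boxplots(df: pd.DataFrame, columns):
    """Draw side-by-side boxplots; pandas imports matplotlib only when this runs."""
    return df[list(columns)].plot.box()
```

If you prefer a library function, SciPy also provides `scipy.stats.mstats.winsorize`.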
Next, we can perform the analysis. Provide at least 5 subquestions we should investigate to further understand the dataset. Do not perform any data analysis without further instructions and without my signing off on the questions.
{ I would like you to answer the following 3 questions about the dataset: Q1, Q2, Q3. }
I would now like you to create a report aimed at executives that contains the following:
a. A description of the dataset
b. Structure the report into headers and subheaders.
c. Provide placeholders for where to use images to make for an easy reading experience.
d. Tailor the language you use for an executive.
🔗 Article: AI-Assisted Programming
Artificial intelligence has transformed various industries, including software development. AI-powered code suggestion tools learn patterns from existing codebases (primarily open source) and offer real-time suggestions, such as completing code snippets, proposing function signatures, and making other context-aware recommendations. This can greatly reduce the time and effort needed to write high-quality code.
Since Copilot, a wave of VS Code plugins (including Cody, Tabnine, Codium and Codeium) has competed for developers' attention, only to be challenged by Copilot X itself. GitHub has reported that, for developers who use Copilot, it writes close to half of their code, and with the right combination of tools this share can go even higher!
When selecting a specific tool, it is important to understand its capabilities. For instance:
With CodiumAI, you can quickly create comprehensive test suites that help ensure the reliability and correctness of your software (supports Python, JavaScript and TypeScript).
Tabnine enhances the coding experience with intelligent code completion, error detection and fixes, refactoring assistance, and automatic code documentation, aiding developers in writing efficient and high-quality code.
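To make the CodiumAI point concrete, here is a hand-written example (not CodiumAI output) of the kind of edge-case-focused test suite such tools aim to generate. It uses Python's built-in `unittest`. `parse_percentage` is a hypothetical function chosen for illustration.

```python
import unittest


def parse_percentage(text: str) -> float:
    """Convert strings like '45%' or ' 12.5 % ' to a fraction between 0 and 1."""
    cleaned = text.strip().rstrip("%").strip()
    value = float(cleaned)  # raises ValueError for non-numeric input
    if not 0 <= value <= 100:
        raise ValueError(f"percentage out of range: {value}")
    return value / 100


class TestParsePercentage(unittest.TestCase):
    def test_plain_value(self):
        self.assertAlmostEqual(parse_percentage("45%"), 0.45)

    def test_whitespace_and_decimal(self):
        self.assertAlmostEqual(parse_percentage(" 12.5 % "), 0.125)

    def test_boundaries(self):
        self.assertEqual(parse_percentage("0%"), 0.0)
        self.assertEqual(parse_percentage("100%"), 1.0)

    def test_out_of_range(self):
        with self.assertRaises(ValueError):
            parse_percentage("150%")

    def test_non_numeric(self):
        with self.assertRaises(ValueError):
            parse_percentage("abc%")
```

Save it as a module and run it with `python -m unittest <module_name>`. The value of a generated suite lies mostly in the boundary and failure cases, so review those first.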
Next, if you haven’t heard of Cursor, you may have been living under a rock. It’s an AI-powered IDE, built as a fork of VS Code, that can do amazing things:
a. Want to make complex changes across several files at once? Cursor can suggest editable multifile diffs.
b. Want to ask questions about your existing codebase or external docs/code? Cursor can help with both.
c. Accidentally introduced subtle bugs? AI-powered linting will flag problematic regions of code and suggest quick fixes.
Because Cursor supports VS Code extensions, you can install most of the plugins discussed earlier within it. Here's a brief video showcasing Cursor in action:
Remember that code completion tools work best when they understand the context of your code. Clear, concise comments, descriptive variable names, and well-defined function signatures help the tool grasp the purpose and requirements of your code. This improves the accuracy and relevance of its suggestions, so you can write code faster and with fewer errors. And as Mintlify shows, even writing that documentation can be handled by AI-enabled plugins, resulting in illegal levels of productivity for you!