
Tutor for robotics RL project

Upwork · US Contractor, Remote · $30 - 70 · Posted 23 hours ago

Job Description

I'm looking for a teacher to help me design AI networks for my robotics projects and walk me through the core concepts.

I'm a second-year university student and have been building robots for over six years, mostly self-taught in electronics, 3D design, and programming. While I love figuring things out on my own, going solo means I'm burning time and money on mistakes caused by knowledge gaps.

I'm hoping to find a teacher with robotics and AI experience (specifically RL) who knows how to combine robotics, AI and simulation to train actual robots. I'm currently working with a custom Python simulator for my latest project.

Just so you know where I'm coming from: I've built around ten robots using Arduinos, Raspberry Pis, and ATtinys, and I've tackled AI projects like balancing a quadruple inverted pendulum in realistic physics simulations. Getting it to work on the actual robot is where I got stuck, though: I never reached the point of deploying the AI policy, because I hit a wall during the characterization process and with RF.

The project I'm hoping to get help with right now is an automatic drawing robot. Here's a short summary of the project:

### Project Summary

My project is to build a robot able to replicate any image using an airbrush and a fine liner, applying artistic techniques like colour layering. With those tools, it will be able to draw on many different media: paper, clothing, leather, etc. To generate the movements, the robot uses an open-loop control scheme (with limit switches to calibrate) that integrates RL with a differentiable physics simulation, replicating images through subtractive colour mixing (CMY) and vector-based inking (cubic Bézier curves).
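For reference, sampling points along a cubic Bézier stroke like the ones described above takes only a few lines of NumPy (a minimal sketch; the control points below are arbitrary placeholders, not values from the project):

```python
import numpy as np

def cubic_bezier(p0, p1, p2, p3, n=50):
    """Sample n points along a cubic Bezier curve from four control points."""
    t = np.linspace(0.0, 1.0, n)[:, None]          # (n, 1) parameter values
    return ((1 - t) ** 3 * p0
            + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2
            + t ** 3 * p3)                          # (n, 2) XY points

# Example stroke: starts at (0, 0), ends at (0, 10)
pts = cubic_bezier(np.array([0, 0]), np.array([10, 0]),
                   np.array([10, 10]), np.array([0, 10]))
```

The sampled points would then be converted to G-code moves for the toolhead.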

The robot is a 3-axis Cartesian manipulator (similar to a CNC machine or 3D printer) driven by Klipper (software that parses G-code, controls the 11 motors simultaneously, and receives sensor inputs). The controller is a UNIX computer that communicates with an Octopus Pro (a motor-control board running Klipper) via a Unix Domain Socket API, ensuring deterministic execution of the G-code streams (this will also allow for robot vision in the future, enabling a digital twin as a potential feature). For now, the plan is to rely on offline calibration, generating LUTs for colour dynamics, point spread functions (the "spread" of the airbrush), and alpha transparency (colour intensity), ensuring the simulator accurately mirrors the physical system. Right now, the design is similar to a digital twin, but cannot be qualified as one due to the lack of a real-time information stream from the system to the twin.
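The offline-calibration idea can be illustrated with a minimal sketch: a 1D LUT mapping commanded ink feed to measured deposited alpha, looked up by linear interpolation (every number below is made up purely for illustration, not from real calibration data):

```python
import numpy as np

# Hypothetical calibration table, gathered offline from test swatches:
# commanded ink feed (0-1) -> measured deposited alpha on the medium.
feed_cmd   = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
alpha_meas = np.array([0.0, 0.10, 0.35, 0.70, 0.92])   # made-up numbers

def alpha_lut(feed):
    """Piecewise-linear LUT lookup so the simulator matches the hardware."""
    return np.interp(feed, feed_cmd, alpha_meas)
```

The same pattern extends to the colour-dynamics and point-spread LUTs, with higher-dimensional tables where needed.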

### AI Architecture Design

The plan for the core intelligence is to employ a multi-resolution architecture designed to optimize compute while preserving the twin's fidelity. Right now, the plan for the stroke-generation pipeline consists of two distinct AI networks:

1. **Strategist (Policy Network):** Uses a ResNet-34 (or smaller) CNN to extract features from the multiple image inputs before passing them through the policy network. Instead of the standard global average pooling, the network employs heatmaps with a 2D soft-argmax to output a 15D continuous stroke-parameter vector (cubic Bézier control points, Z-depth, and ink feed rate). This preserves sub-pixel spatial precision for coordinate prediction.

2. **Technician (Optimization):** Given that there will be variability in each stroke (the RL policy will likely not generate optimal brushstrokes due to a lack of resolution), an additional optimizer (the Technician) will refine each stroke via gradient descent to achieve the lowest LPIPS score (Learned Perceptual Image Patch Similarity, a measure of how close the canvas is to the target image that correlates with human perception). It would leverage a differentiable renderer that rasterizes strokes and composites them using the calibrated LUTs. This allows the system to back-propagate error gradients through the rendering process to fine-tune stroke trajectories before execution.
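A 2D soft-argmax of the kind the Strategist uses can be sketched in plain NumPy (a minimal illustration, not the actual network head; `beta` is an assumed sharpness parameter controlling how strongly the softmax concentrates on the peak):

```python
import numpy as np

def soft_argmax_2d(heatmap, beta=100.0):
    """Differentiable sub-pixel (x, y) coordinate from a 2D heatmap.

    Softmax over all pixels, then the expected coordinate under that
    distribution. Unlike a hard argmax, this is smooth, so gradients
    can flow back into the network that produced the heatmap.
    """
    h, w = heatmap.shape
    flat = heatmap.ravel() * beta
    probs = np.exp(flat - flat.max())     # numerically stable softmax
    probs /= probs.sum()
    ys, xs = np.mgrid[0:h, 0:w]
    x = (probs * xs.ravel()).sum()        # expected column index
    y = (probs * ys.ravel()).sum()        # expected row index
    return x, y

hm = np.zeros((8, 8))
hm[3, 5] = 1.0                            # peak at row 3, column 5
x, y = soft_argmax_2d(hm)                 # recovers (5.0, 3.0)
```

With a soft (lower-beta) heatmap, the expectation interpolates between pixels, which is where the sub-pixel precision comes from.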
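The Technician's refine-by-gradient-descent loop can be illustrated with a toy stand-in (everything here is a placeholder for illustration: a Gaussian blob replaces the real stroke renderer, plain MSE replaces LPIPS, and finite differences replace autograd):

```python
import numpy as np

def render_stroke(cx, cy, size=32, sigma=3.0):
    """Toy differentiable renderer: one Gaussian airbrush dab centred at
    (cx, cy). The real renderer would rasterize a full Bezier stroke."""
    ys, xs = np.mgrid[0:size, 0:size]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

target = render_stroke(20.0, 12.0)        # patch we want to reproduce
params = np.array([14.0, 10.0])           # policy's imperfect stroke guess

def loss(p):
    # Stand-in for LPIPS: plain MSE between rendered stroke and target.
    return ((render_stroke(*p) - target) ** 2).mean()

lr, eps = 200.0, 1e-4
for _ in range(400):
    # Central-difference gradient (autograd would do this in practice).
    grad = np.array([(loss(params + d) - loss(params - d)) / (2 * eps)
                     for d in np.eye(2) * eps])
    params -= lr * grad                   # refine the stroke parameters
```

The loop pulls the stroke centre onto the target, which is exactly the "fine-tune before execution" step, just with toy components.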

Right now, I am thinking of training the policy using PPO, APPO, or SAC. For the software stack, I plan to manage the environments through Gym, use RL Games (or Sample Factory) for policy optimization, and Optuna for hyperparameter optimization. To prevent reward hacking and ensure perceptual quality, the objective function minimizes LPIPS rather than pixel-wise MSE. A distinct "tiled" architecture separates the physics simulation grid (high resolution) from the policy observation input (downsampled), allowing high-fidelity reward calculation without overloading the inputs and reducing the computational load.
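The split between a high-resolution simulation grid and a downsampled policy observation can be as simple as block-mean pooling (a minimal sketch; the grid contents here are placeholder values):

```python
import numpy as np

def downsample(grid, factor):
    """Block-mean pooling: high-res simulation canvas -> low-res policy
    observation. Reward is computed on `grid`; the policy sees the result."""
    h, w = grid.shape
    return grid.reshape(h // factor, factor,
                        w // factor, factor).mean(axis=(1, 3))

canvas = np.arange(64.0).reshape(8, 8)    # stand-in high-res paint state
obs = downsample(canvas, 4)               # 2x2 observation for the policy
```

This keeps the reward signal high-fidelity while the network input stays small.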