We address a robotic flow shop scheduling problem where two part types are processed on each given set of dedicated machines. A single robot moving on a fixed rail transports one part at a time, and the processing times of the parts vary on the machines within a given time interval. We use a reinforcement learning (RL) approach to obtain efficient robot task sequences to minimise makespan. We model the problem with a Petri net used for a RLenvironment and develop a lower bound for the makespan. We then define states, actions, and rewards based on the Petri net model; further, we show that the RL approach works better than the first-in-first-out (FIFO) rule and the reverse sequence (RS), which is extensively used for cyclic scheduling of a robotic flow shop; moreover, the gap between the makespan from the proposed algorithm and a lower bound is not large; finally, the makespan from the RL method is compared to an optimal solution in a relaxed problem. This research shows the applicability of RL for the scheduling of robotic flow shops and its efficiency by comparing it to FIFO, RS and a lower bound. This work can be easily extended to several other variants of robotic flow shop scheduling problems.